\scrollmode \documentclass[twocolumn]{article} \begin{document} \title{\textbf{Information resonance and pattern recognition in classical and quantum systems: toward a `language model' of hierarchical neural structure and process}} \author{Rodrick Wallace, Ph.D.\\The New York State Psychiatric Institute\\and\\PISCS Inc.\thanks{Address correspondence to R Wallace, PISCS Inc., 549 W. 123 St., Suite 16F, New York, NY, 10027. Tel. (212) 865-4766, rdwall@ix.netcom.com. Affiliations are for identification only.}} \date{May, 2000} \maketitle \begin{abstract} Recent applications of the Shannon-McMillan Theorem to arrays of nonlinear components undergoing what is effectively an `information resonance' (R Wallace, 2000a) may be extended to include many neural models, both classical and quantum. Some consideration reduces the threefold interacting complex of sensory activity, ongoing activity, and nonlinear oscillator to a single object, a parametrized ergodic information source. Invocation of the `large deviations' program of applied probability, which unifies the treatment of dynamical fluctuations, statistical mechanics, and information theory, allows a `natural' transfer of thermodynamic and renormalization arguments from statistical physics to information theory, permitting a markedly simplified analysis of neural dynamics. This suggests an inherent language-based foundation, in a large sense, to neural structure and process, and implies that approaches without an intimate relation to language may be seriously incomplete. \end{abstract} \textbf{Key Words:} Coevolution, information resonance, information theory, large deviations, multitasking, neural networks, Onsager relations, phase transition, quantum neural networks, renormalization, state space algebra, stochastic resonance. \begin{center} \textbf{Introduction} \end{center} Researchers have begun to adopt an information-theoretic approach to the study of stochastic resonance, usually involving the maximization of noise-dependent `mutual information' between input and output (Deco and Schurmann, 1998; Heneghan et al., 1996; Godivier and Chapeau-Blondeau, 1998; Neiman et al., 1996). Similarly, other groups are examining the neural code and neural networks from an information theory viewpoint, as summarized by Rieke et al. (1997) and Deco and Obradovic (1996) respectively. Recently Wallace (2000a, b) and Wallace and Fullilove (1999) effectively invoked an `information resonance' combining both stochastic resonance and neural models under the intellectual umbrella of the `large deviations program' of applied probability (e.g. Dembo and Zeitouni, 1998), permitting the transfer of renormalization methods and thermodynamic formalism from statistical mechanics to information theory as an expression of underlying `architecture.'
Here we will describe these results as they apply to hierarchical neural structures and outline a research agenda based on the questions this effort raises. The general context is given by Schurmann in his foreword to the recent book by Deco and Obradovic (1996, p. vii). Schurmann writes: \begin{quotation} ``Chronological milestones in the history of artificial neural networks are Hebb's book on the organization of behavior, Rosenblatt's book on principles of neurodynamics in which he defines the perceptrons, Hopfield's discovery of the analogy of certain types of neural networks to spin glasses and the exploitation of the associated energy function, the generalization of simple perceptrons to feedforward multi-layer perceptrons accompanied by the backpropagation learning algorithm of Rumelhart and others and its extension to multi-layer perceptrons with feedback accompanied by the recurrent backpropagation learning algorithm of Almeida, Pineda and others.'' \end{quotation} Schurmann sees the application of information theory methods as the next natural step in neural network theory, and finds ``particularly high potential'' if information theory treatments are explicitly ``linked with the methods of nonlinear dynamics... [which] remains a topic for future research...'' Below we will outline such a linkage. Several particulars, however, distinguish our approach: First, we attempt a draconian simplification, seeking to employ information theory concepts only as they directly relate to the basic limit theorems of the subject. That is, message uncertainty and information source uncertainty are interesting \textit{only because they obey the Coding and Source Coding Theorems}. `Information Theory' treatments which do not sufficiently center on these theorems are, in our view, off the mark. From this perspective most discussion of `complexity,' `entropy maximization,' other definitions of `entropy,' and so forth, simply does not appear on the horizon. In the words of William of Occam, ``Entities ought not be multiplied without necessity.'' The second matter is more complicated: Rojdestvenski and Cottam (2000, p. 44), following Wallace and Wallace (1998), see the linkage between information theory and statistical mechanics in quite general terms as involving \begin{quotation} ``...[Homological] mapping... between ... unrelated ... problems that share the same mathematical basis... [whose] similarities in mathematical formalisms... become powerful tools for [solving]... traditional problems.'' \end{quotation} We believe the relation of information theory to neural structure and process to be somewhat more sharply constrained, revolving around two homologies, in the above sense: (1) a `linguistic' equipartition of probable paths consistent with the Shannon-McMillan Theorem, which serves as the formal connection with nonlinear mechanics and large fluctuation theory, and (2) a correspondence between information source uncertainty and statistical mechanical free energy, not statistical mechanical entropy. Indeed, Bennett (1988), among others, long ago realized that ``...[T]he value of a message is the amount of...work plausibly done by its originator, which the receiver is saved from having to repeat.'' We discuss the first point in more detail below, and invoke the second in exploring the connection between neural architecture and dynamics, obtaining deep results in a relatively elementary manner.
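To fix ideas, the second correspondence can be written schematically, suppressing temperature and Boltzmann-constant factors which do not affect the formal parallel. For a physical system with partition function $Z(V)$ the free energy density is \[ F = -\lim_{V \rightarrow \infty} \frac{\log[Z(V)]}{V}, \] while for the information sources considered below the source uncertainty is \[ H = \lim_{n \rightarrow \infty} \frac{\log[N(n)]}{n}, \] where $N(n)$ is the number of grammatically `meaningful' sequences of length $n$, defined precisely in the next section. The number of meaningful paths thus plays the role of the partition function, and path length the role of volume; it is this parallel, and not any identification of source uncertainty with physical entropy, which licenses the transfer of renormalization and thermodynamic formalism invoked below.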
\begin{center} \textbf{Information resonance} \end{center} The central idea of stochastic resonance is that the addition of `noise' to a weak input, usually taken as a sinusoidal or other repeated train of excitations, can raise the amplitude of the combined signal so as to exceed the triggering threshold of a powerful nonlinear oscillator, resulting in an amplified output train. Proper choice of noise amplitude can maximize the signal-to-noise ratio of the combined system (e.g. Gammaitoni et al., 1998): too little noise fails to reach threshold, while too much washes out the signal. Using the `prehistory probability density' concept (McClintock and Luchinsky, 1999) as the critical starting point, we have carried out a somewhat expanded development (R. Wallace, 2000a, b) leading to the more general concept of an `information resonance.' We consider a generalized ongoing activity `noise,' which may in fact be very highly structured, and a sensory activity `signal' mixed together, in some possibly complicated manner, to produce a compound intermediate which is then sent into the nonlinear oscillator to produce an amplified output. Figure 1 of Wallace (2000a) shows this two-step process: The `signal' and `noise' are convoluted to produce a sequence of discrete states $a_{i}$, where $i$ is a non-negative integer. A relatively small number of sequential patterns of these convoluted states having the form $a_{0}, a_{1}, ... a_{n}$, which we call paths $x$, lead to discontinuous observable events, a generalized `information resonance' analogous to the enhanced clicking of a switch -- the nonlinear oscillator -- by a weak signal in the presence of noise. That is, each path $x$ has associated with it a discontinuous function $h(x)$ taking possible values $0, 1$: $h(x)=1$ represents the triggering of the oscillator by the path $x$, while $h(x)=0$ implies that $x$ did not trigger the oscillator. The definition can be extended, under proper conditions, to a stochastic system in which $h(x)$ is the probability that the nonlinear oscillator fires, provided a disjunction can be made between paths $x$ which have high and low probabilities of triggering the oscillator. We make an application to the stochastic neuron: A series of inputs $y_{i}^{j}, i=1...m$ from $m$ nearby neurons at time $j$ is convoluted with `weights' $w_{i}^{j}, i=1...m$, using an inner product (e.g. Deco and Obradovic, 1996, p. 24) \begin{equation} a_{j} = \mathbf{y^{j} \cdot w^{j}}=\sum_{i=1}^m y_{i}^{j}w_{i}^{j}, \end{equation} in the context of a `transfer function' $f(\mathbf{y^{j} \cdot w^{j}})$ such that the probability of the neuron firing and having a discrete output $z^{j}=1$ is \[ P(z^{j}=1) = f(\mathbf{y^{j} \cdot w^{j}}). \] The probability the neuron does not fire at time $j$ is thus \[ P(z^{j}=0)=1 - P(z^{j}=1) = 1 - f(\mathbf{y^{j} \cdot w^{j}}). \] From our viewpoint the $m$ values $y_{i}^{j}$ constitute the `sensory activity' and the $m$ weights $w_{i}^{j}$ the `ongoing activity' at time $j$, with $a_{j} = \mathbf{y^{j} \cdot w^{j}}$ and $x=a_{0}, a_{1}, ... a_{n}$. The $a_{j}$ will almost always be serially correlated, e.g. for `integrate-and-fire' dynamics. It would appear that many neural models fall under what we have called an information resonance, although extension of the concept to address architecture and learning paradigms requires some work.
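As a minimal numerical illustration (the logistic transfer function used here is a common convention, not something required by the formalism), take $m=2$, inputs $\mathbf{y}^{j}=(1,1)$, weights $\mathbf{w}^{j}=(1.5,-0.5)$ and $f(a)=1/(1+\exp(-a))$. Then \[ a_{j} = \mathbf{y^{j} \cdot w^{j}} = (1)(1.5)+(1)(-0.5) = 1.0, \] and the firing probability is \[ P(z^{j}=1) = f(1.0) = \frac{1}{1+e^{-1}} \approx 0.73. \] Shifting the single `ongoing activity' weight $w_{2}^{j}$ from $-0.5$ to $+0.5$ gives $a_{j}=2.0$ and $P(z^{j}=1) \approx 0.88$ for exactly the same `sensory' input: the ongoing activity tunes the response of the oscillator to a fixed signal, the theme developed at length below.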
Given a fixed initial state $a_{0}$ such that $h(a_{0})=0$, we examine all possible subsequent paths $x$ beginning with $a_{0}$ and leading exactly once to the event $h(x)=1$. Thus $h(a_{0},..a_{j})=0$ for all $j < m$ but $h(a_{0},...a_{m})=1$. For each positive integer $n$ let $N(n)$ be the number of paths of length $n$ which begin with some fixed $a_{0}$ having $h(a_{0})=0$, and lead to the condition $h=1$, or, in a stochastic system, to a state $a_{n}$ with a very high probability of firing the oscillator. We shall call such paths `meaningful.' In general we assume $N(n)$ to be considerably less than the number of all possible paths of length $n$ -- information resonance transitions are comparatively rare -- and in particular assume that the finite limit \begin{equation} H = \lim_{n \rightarrow \infty} \frac{\log [N(n)]}{n} \end{equation} exists and is independent of the path $x$. Thus meaningful paths are asymptotically equiprobable, with each of length $n$ having probability $P(n) \propto \exp(-n H)$. We shall, in accordance with the standard treatment of information theory (e.g. Khinchine, 1957), call an information resonance satisfying this condition \textit{ergodic}. It seems likely that the underlying space defined by the $a_{i}$ can be partitioned into disjoint equivalence classes according to whether states can be connected by meaningful paths. This would be analogous to a partition into `domains of attraction' for a chaotic system, and indeed much of the subsequent development, including questions of hierarchical structure, can be rephrased in terms of the algebraic structure of that space, a matter to which we will return repeatedly. Such state space partitioning implies the possibility of a `neural multitasking' where nonlinear oscillators participate in disjoint but (nearly) simultaneous information resonance processes. Imposition of an inverse algebraic relation results in a group of finite order. This implies the possibility of even more complicated multitasking structure, since the number of different groups of a given finite order is powerfully determined by the prime factorization of that order. We can envision, then, not only a number of disjoint multitasking processes corresponding to a given group's equivalence class structure, but, in all, as many distinct possibilities as there are different groups of that order. By the Asymptotic Equipartition Theorem, otherwise known as the Shannon-McMillan Theorem (SMT), for a certain class of information resonances, as we have defined them, there will be an ergodic information source $\mathbf{X}$ associated with stochastic variates $X_{i}$ taking the values $a_{i}$ with joint and conditional probabilities $P[a_{0}, ..., a_{n}]$ and $P[a_{n}|a_{0}, a_{1}, ... a_{n-1}]$ such that appropriate joint and conditional Shannon uncertainties may be defined satisfying the relations \begin{equation} \begin{array}{l} H[\mathbf{X}] = \lim_{n \rightarrow \infty} \frac{\log [N(n)]}{n} \\ = \lim_{n \rightarrow \infty}H(X_{n}|X_{0}...X_{n-1}) \\ = \lim_{n \rightarrow \infty}\frac{H(X_{0},...X_{n})}{n+1}, \end{array} \end{equation} where $H(X|Y)$ and $H(X,Y)$ represent, respectively, the \textit{conditional} and \textit{joint} uncertainties of the variates $X$ and $Y$. The joint uncertainty of stochastic variates $X$ and $Y$, taking possible values $x_{i}$ and $y_{j}$, is defined in terms of their joint probabilities as \begin{equation} H(X, Y) = -\sum_{i}\sum_{j} P(x_{i}, y_{j})\log [P(x_{i}, y_{j})]. \end{equation} The conditional uncertainty of $X$ given $Y$ is \begin{equation} H(X|Y)=-\sum_{i}\sum_{j} P(x_{i},y_{j})\log[P(x_{i}|y_{j})]. \end{equation} See Khinchine (1957), pp.
117-120, Ash (1990) or Cover and Thomas (1991) for essential details. We will define the information source $\mathbf{X}$, provided it exists, to be \textit{dual} to the information resonance. We have thus reduced three complex, synergistically interacting components -- signal, noise and nonlinear oscillator -- into a single object that can be appropriately parametized and on which we can impose important structure and symmetry. The utility of this approach will become more apparent as we proceed. Source uncertainty is a language function with an important heuristic interpretation (Ash, 1990, p. 206): \begin{quotation} ``...[W]e may regard a portion of text in a particular language as being produced by an information source. The [conditional] probabilities $P[X_{n}=a_{n}|X_{0}=a_{0}...,X_{n-1}=a_{n-1}]$ may be estimated from the available data about the language; in this way we can estimate the uncertainty associated with the language. A large uncertainty means... a large number of `meaningful' sequences. Thus given two languages with uncertainties $H_{1}$ and $H_{2}$ respectively, if $H_{1} > H_{2}$ then in the absence of noise it is easier to communicate in the first language; more can be said in the same amount of time. On the other hand, it will be easier to reconstruct a scrambled portion of text in the second language, since fewer of the possible sequences of length $n$ are meaningful.'' \end{quotation} Languages are most fundamentally characterized by strict patterns of internal relationship, for example grammar, syntax, and higher levels of organization. Our development suggests that many information resonance phenomena, in this larger sense, are very highly structured and may be studied and perhaps predicted by understanding the `metalanguage' in which they are embedded and indeed which they define. According to this development, then, `nonsense' paths $x = a_{0},..., a_{n}$ which violate the grammar and syntax of a particular information resonance cannot trigger it. What we have done is, in the sense of Schurmann above, closely related to current research in nonlinear dynamics: The condition $h(x)=1$ represents, in this formulation, a `large fluctuation' of the system in the sense of Dykman et al. (1996). To paraphrase that work, large fluctuations, although infrequent, are fundamental in a broad range of processes, and it was recognized by Onsager and Machlup (1953) that insight into the problem could be gained from studying the distribution of fluctuational paths along which the system moves to a given state. This distribution is a fundamental characteristic of the fluctuational dynamics, and its understanding leads toward control of fluctuations. Fluctuational motion from the vicinity of a stable state may occur along different paths. For large fluctuations, the distribution of these paths peaks sharply along an optimal, i.e. most probable, path. In the theory of large fluctuations, the pattern of optimal paths plays a role similar to that of the phase portrait in nonlinear dynamics. For our development the information-theoretic `meaningful' statements $x=a_{0},...,a_{n}$ play the role of `optimal' paths in the theory of large fluctuations, and we have given them an information theory treatment consistent with large deviation theory (Dembo and Zeitouni, 1998; Ellis, 1985). The first real step in this direction was made more than sixty years ago by Cramer (1938). 
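A deliberately schematic example may make concrete both the equipartition property and its kinship with the dominance of optimal fluctuational paths (the counts below are chosen purely for arithmetic convenience). Suppose the convoluted states are drawn from an alphabet of four symbols, so that there are $4^{n} = 2^{2n}$ possible paths of length $n$, but that only $N(n) \approx 2^{n/2}$ of them are meaningful in the sense above. Then \[ H = \lim_{n \rightarrow \infty} \frac{\log[N(n)]}{n} = \frac{1}{2}\log 2, \] and the equipartition statement $P(n) \propto \exp(-nH) = 2^{-n/2} = 1/N(n)$ says that, asymptotically, the meaningful paths are equally probable and jointly carry essentially all of the probability, while constituting an exponentially small fraction, $2^{-3n/2}$, of all possible paths. This is the sense in which information resonance transitions are both rare and sharply patterned.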
The ergodic theorem, in the context of recent generalizations of Cramer's results by Gartner and Ellis, permits derivation of the Shannon-McMillan Theorem as the `zero error limit' under the rubric of `rate distortion theory' (Dembo and Zeitouni, 1998). Our analysis suggests that neural phenomena and other generalizations may fit `naturally' into this larger framework. See Appendix 1 for details of the most elementary large deviations argument. \begin{center} \textbf{Tuning an information resonance: learning paradigms and the Shannon Coding Theorem} \end{center} Here we explore the extraordinary utility of an information resonance as a detector of subtle pattern. It is indeed this behavior which suggests characterization as a resonance. `Learning paradigms' in neural networks evidently represent one systematic means of constructing such detectors, but application of a simple information theory argument appears to suggest room for improvement. We now focus on the Shannon Coding Theorem, the other fundamental result of information theory, rather than on the Source Coding Theorem. The properties of an information resonance combining `signal,' ongoing activity `noise,' convolution operation and nonlinear oscillators are the synergistic result of a subtle, multifactorial interaction. Effective control of such a system will likely involve a similarly synergistic tuning of more than one component. In particular, assume we have some complicated pattern, represented by a stochastic variate of sensory input $X$, which we wish to detect. To reiterate, we feed that signal into a generalized information resonator by (1) convoluting it with the ongoing activity `noise,' which may itself be highly structured, and (2) feeding the combined result into a system of nonlinear oscillators, producing, under proper circumstances, an `encoded' train of output spikes or more subtle coherent spatiotemporal patterns, which we characterize as a stochastic variate $Y$. Proceeding somewhat schematically, let $H(X)$ be the Shannon uncertainty of the signal $X$, defined simply in terms of its probability distribution, $p_{i}=Pr[X = x_{i}]$, in the usual manner as \[ H(X) = -\sum_{i} p_{i} \log[p_{i}]. \] Let $H(X|Y)$ be the conditional uncertainty of $X$ given the output pattern $Y$, again defined in terms of joint and conditional probabilities as \[ H(X|Y)= -\sum_{i}\sum_{j} p(x_{i},y_{j})\log[p(x_{i}|y_{j})]. \] The information transmitted by the information resonance as a communication channel is, classically (e.g. Ash, 1990), \begin{equation} I(X|Y) \equiv H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y), \end{equation} where $H(X,Y)$ is the joint Shannon uncertainty of the stochastic variates $X$ and $Y$. Note that if there is no uncertainty in $X$ given $Y$, then $H(X|Y)=0$ and the information is transmitted without loss. If we fix the ongoing activity `noise,' convolution, and nonlinear oscillator properties, then we may vary the probability distribution of the signal variate $X$, which we write $P(X)$. The \textit{capacity} of the channel is defined as \begin{equation} C \equiv \max_{P(X)} I(X|Y), \end{equation} where we vary the probability distribution of $X$ so as to maximize $I(X|Y)$. The essential content of the Shannon Coding Theorem (Ash, 1990; Khinchine, 1957) is that, for any rate of transmission of signals along the channel, $R$, such that $R < C$, it is possible to find a coding scheme for which the probability of error in transmission is arbitrarily small, while no such scheme exists for rates above the capacity. It is this possibility of nearly error-free transmission at rates below capacity which suggests that a suitably tuned information resonance can serve as a highly efficient detector of pattern.
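As a textbook illustration of the capacity definition (the binary symmetric channel is invoked here only as the simplest standard case, not as a model of the resonator), consider a channel carrying binary symbols, each of which is flipped with probability $\epsilon$. The maximizing distribution $P(X)$ is the uniform one, and \[ C = \log 2 + \epsilon \log \epsilon + (1-\epsilon)\log(1-\epsilon), \] which for $\epsilon = 0.1$ is about $0.37$ nats, or $0.53$ bits, per symbol. The force of the Coding Theorem is that any rate $R < C$, however close to $C$, can be sustained with arbitrarily small probability of error, provided the coding (in the present setting, the joint `tuning' of signal, ongoing activity and oscillator properties) is chosen appropriately.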
\begin{center} \textbf{Quantum information sources and quantum neural networks} \end{center} Following King and Lesniewski (1995), consider an ensemble $\cal S$ of quantum signals, a set $|\psi_{1}>,...,|\psi_{s}>$ of normalized vectors in a Hilbert space $\cal H$. We take $\cal H$ as spanned by $\cal S$, so that $\cal H$ is of dimension $d \leq s$. Unlike the classical case, this system can entertain a superposition of states. Let $p_{j}$ be the given probability of the state $|\psi_{j}>$ being sent. The density matrix corresponding to the ensemble of signals $\cal S$ is then \begin{equation} \rho = \sum_{1 \leq j \leq s} p_{j}|\psi_{j}><\psi_{j}|, \end{equation} with $tr(\rho)\equiv 1$. While $\cal S$ and the distribution of $p_{j}$ uniquely determine the density matrix, each such matrix corresponds to an infinite number of possible sets of states. The observables associated with quantum signals are $d \times d$ Hermitian matrices, the elements of a $C^{*}$-algebra ${\cal A} = {\cal L}({\cal H})$ of linear observables on $\cal H$. The state on the algebra of observables ${\cal A}$ associated with the density matrix $\rho$ is, for any given $A \in {\cal A}$, \begin{equation} \tau_{1}(A) \equiv tr(A\rho) = \sum_{1 \leq j \leq s}p_{j}<\psi_{j}|A|\psi_{j}> . \end{equation} Appropriate generalizations can be given for infinite dimensional tensor products, and ergodic quantum information sources can be defined. The density matrix of order $n$ becomes, in terms of the states $\psi_{j}$ which span $\cal S$, \begin{equation} \begin{array}{l} \Pi_{n}= \\ \sum_{1 \leq j_{1},..., j_{n} \leq s}p_{j_{1},...,j_{n}}|\psi_{j_{1}}><\psi_{j_{1}}|\otimes...\otimes|\psi_{j_{n}}><\psi_{j_{n}}|. \end{array} \end{equation} The entropy associated with a sequence of $n$ signals is defined as \begin{equation} H_{n}(\Pi) \equiv -tr_{{\cal H}^{\otimes n}}(\Pi_{n} \log \Pi_{n}). \end{equation} Some development gives \[ H_{m+n}(\Pi) \leq H_{m}(\Pi) + H_{n}(\Pi), \] so that the limit \begin{equation} h(\Pi) = \lim_{n \rightarrow \infty} \frac{H_{n}(\Pi)}{n} \end{equation} exists. We call $h(\Pi)$ the entropy of the quantum source. For a Bernoulli source $\Pi_{n} = \rho \otimes ... \otimes \rho$ and $h(\Pi) = -tr_{{\cal H}}(\rho \log \rho)$. General sources with internal serial correlations have far more complex expressions for $h$. Let $\mathbf{A} = [A_{1},...,A_{r}], r < \infty$, be a family of observables on $\cal H$ such that $A_{j} \geq 0$ for all $j$, and \begin{equation} A_{1} + ... + A_{r} = I, \end{equation} where $I$ is the identity. We call the set $\chi_{\mathbf{A}} = [ 1, ..., r]$ the classical alphabet associated with $\mathbf{A}$, and denote by $\chi_{\mathbf{A}}^\infty$ the space of all infinite messages over the alphabet $\chi_{\mathbf{A}}$. In this way we can associate a classical information source with each quantum information source. Let ${\cal H}^{\otimes n}$ be the space of all signals of length $n$ for an ergodic quantum information source. According to the quantum Shannon-McMillan Theorem, it can be decomposed into two orthogonal subspaces \begin{equation} {\cal H}^{\otimes n} = {\cal S}_{n} \oplus {\cal S}_{n}^{\perp}, \end{equation} whose relative dimensions are constrained in a precise manner by the uncertainty $h_{\textbf{A}}$ of the classical information source associated with the quantum source. If the $|\psi_{j}>$ are orthogonal, $h_{\textbf{A}}$ is just the Von Neumann entropy of the source, since the density operators all commute. Let $P_{{\cal S}_{n}}$ be the orthogonal projection onto the relatively small subspace ${\cal S}_{n}$. Let $C$ be an observable $C \in {\cal L}({\cal H}^{\otimes n})$, where the signal is of length $n$. Then, according to the quantum Shannon-McMillan Theorem, the difference \[ |\tau(CP_{{\cal S}_{n}}) - \tau(C)| \] can be made arbitrarily small as $n$ increases without limit, where $\tau$ is an appropriate infinite-dimensional generalization of $\tau_{1}$ above, defined in terms of the density matrices $\Pi$.
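A two-state example, with signal states chosen purely for illustration, shows what is at stake in the orthogonality condition. Let $s=2$, $p_{1}=p_{2}=1/2$, $|\psi_{1}> = |0>$ and $|\psi_{2}> = (|0>+|1>)/\sqrt{2}$, so that $|<\psi_{1}|\psi_{2}>| = 1/\sqrt{2}$. The density matrix $\rho$ then has eigenvalues $(1 \pm 1/\sqrt{2})/2 \approx 0.85, 0.15$, and for the Bernoulli source built on this ensemble \[ h(\Pi) = -tr_{{\cal H}}(\rho \log \rho) \approx 0.42 \ \mathrm{nats} \approx 0.60 \ \mathrm{bits}, \] strictly less than the $\log 2$ uncertainty of the classical distribution $p_{j}$. If instead $|\psi_{2}>$ were orthogonal to $|\psi_{1}>$, the two values would coincide: non-orthogonality of the signal states reduces the effective entropy of the quantum source relative to its classical counterpart.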
Again paraphrasing King and Lesniewski (1995), in the case where the $|\psi_{j}>$ are orthogonal, there is a direct correspondence with the classical Shannon-McMillan theorem, and the quantum theory is simply a restatement of the classical result, with the associated classical source uncertainty $h_{\textbf{A}}$, which constrains the dimensionality of the significant space ${\cal S}_{n}$, given by the Von Neumann entropy. Although we do not have equivalently full quantum forms of the Shannon Coding Theorem and its `Learning Theorem' variant, these considerations nonetheless suggest a possible `correspondence principle' generalization of the classical neural network results given in the earlier sections: Parametrization of the quantum information source corresponding to a QNN must reflect the underlying structural hierarchy of the system, incorporated in the renormalization symmetry and other inherent properties of the information source. Measurement must give an appropriately parametrized classical information source with appropriate renormalization and generalized Onsager properties. The parametrization of the quantum information source might well be complicated, for example simultaneously involving both quantized and unquantized physical quantities: one imagines simultaneously macroscopic external signals and an array of quantum oscillators coupled by some kind of quantized field - phonons, photons, etc. Since a quantum information source is still a `language,' in the sense of the earlier sections of this work, its renormalization and generalized Onsager properties may not be simple extensions or reflections of commonly understood physical systems, but characterize, in no small part, the patterns of internal correlations defining that language -- the jointly defined grammar and syntax of the coupling of sensory signal, neural weights and array of nonlinear oscillators constituting the system: Neural networks, quantum or classical, are defined by their `meaning' even more than by their physical structure. Considerations of the various possible `natural' relations between neural architecture, learning paradigms, renormalization symmetry and generalized Onsager relations, as applied to classical systems, would seem appropriate to the pure quantum case as well. \begin{center} \textbf{Density matrix and path integral} \end{center} Rojdestvenski and Cottam (2000), in their explicit extension of Wallace and Wallace (1998) to physical processes, end with the following `simple' observation: \begin{quotation} ``If one takes an `evolution' equation of any system..., it may always be written in the following differential form \[ \psi(t + dt) = (1 + \mathbf{E}dt)\psi(t) \] where $\textbf{E}$ is called the `evolution operator.' If the evolution has different `channels,' i.e. \[ \mathbf{E} = \sum_{i=1}^{N_{0}}\mathbf{E}_{i}, \] then [the first equation] takes the following recursive form: \[\psi(t+mdt)=\] \[(1+\mathbf{E}dt(...(1+\mathbf{E}dt(1+\mathbf{E}dt(1+\mathbf{E}dt)))...))\psi(t)\] \[=\sum_{r=1}^{m}(dt)^{m}\sum_{C_{r}}K(C_{r})[\mathbf{E}_{i_{1}}...\mathbf{E}_{i_{r}}]\psi(t) \] and again we deal with the `sentence' representation. In a certain sense, any temporal evolution, if only it is describable by equations, is a message [from some information source] in its own right.'' \end{quotation} Behrman et al.
(1996) open their description of a quantum dot neural network in a similar manner: \begin{quotation} ``In most artificial neural network implementations, the neurons receive inputs from other processors \textit{via} weighted connections and calculate an output which is passed on to other neurons. The calculated output... of the $i^{th}$ neuron [is determined from] the signals from the other neurons in the network... Similarly we can write the expression for the time evolution of the quantum mechanical state of a system: \[|\psi(x_{f},T)>=G(x_{f},T;x_{0},0)|\psi(x_{0},0)> ... \] Here $|\psi(x_{0},0)>$ is the input state, the initial state of the quantum system. $|\psi(x_{f},T)>$ is the output state, the state of the system at $t=T$. $G$ is the Green's function, which propagates the system forward in time, from initial position $x_{0}$ at time $t=0$ to final position $x_{f}$ at time $t=T$. [$G$ can be expressed] in the Feynman path integral formulation of quantum mechanics (Feynman, 1965), in which $G$ is thought of as the infinite sum over all possible paths that the system could possibly take to get from $x_{0}$ to $x_{f}$... Each path is weighted by the complex exponential of the phase contributed by that path, given by the classical action for that path;... Each of the $N$ [quantum] neurons' different possible states contribute to the final measured state; the amount it contributes can be adjusted by changing the potential energy...'' \end{quotation} Those paths with higher weighting thus have higher probability -- are `meaningful,' in our terminology -- than the others. For an `ergodic' information source such paths would be equiprobable. Using this formalism, Behrman et al. (1996) conclude that \begin{quotation} ``Potentially, a quantum neural network would be an extremely powerful computational tool... capable, at least in principle, of performing computations that cannot be done, classically... an actual working quantum neural net would likely want to take advantage of the greater multiplicity and connectivity inherent in an entire array of quantum dot molecules, by placing molecules physically close enough to each other that nearest neighbors can interact directly...'' \end{quotation} The path integral formulation of quantum density matrices (Feynman, 1998) thus seems to form the natural linkage between quantum mechanics and quantum information theory in much the same way that the Large Deviations Program of applied probability connects statistical mechanics, fluctuations and information theory in classical systems. Imposition of appropriate renormalization symmetry on the ergodic quantum information source dual to the QNN, in the context of a similarly appropriate `generalized Onsager relation' and associated algebras, would indeed seem to be the most natural means of expressing the unique architecture of the network, hierarchical or otherwise. By analogy, it seems that a Landau-like `two fluid' model of superconductivity and superfluidity is likely to apply to the general QNN, with a classical information source uncertainty playing the role of a `phonon gas excitation' of the purely quantum QNN (Feynman, 1998). It is difficult, at this point, to imagine any other outcome. 
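To make the parallel slightly more explicit, note that the propagator quoted above has the standard configuration-space path integral form (written here only schematically) \[ G(x_{f},T;x_{0},0) = \int {\cal D}[x(t)]\, \exp(iS[x(t)]/\hbar), \] a sum over all paths from $x_{0}$ at $t=0$ to $x_{f}$ at $t=T$, each weighted through its classical action $S[x(t)]$, while in the classical information resonance the meaningful paths of length $n$ carry weight $P(n) \propto \exp(-nH[\mathbf{X}])$. In both cases a relatively small set of highly weighted paths dominates the sum, and it is this structural correspondence, rather than any detailed dynamical identity, which suggests that density matrix methods play, for the quantum case, the role played by the large deviations program in the classical one.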
We find that the work of King and Lesniewski (1995), a rigorous extension of the Shannon-McMillan Theorem to quantum systems, in conjunction with the material described in the first part of this paper, suggests a direction for development of a purely quantum neural network formalism, in contrast, for example, with the quasi-classical results of Toth et al. (1996). Quantum neural networks, like their classical counterparts, should be reducible to the convolution of sensory activity, ongoing activity `neural weights' and an array of nonlinear components into a single quantum information source parametized by continuous or quantized variates. `Tuning' the parameters and the `ongoing activity' should, as for classical systems, result in highly efficient pattern recognition, depending on the inherent grammar and syntax of the associated quantum information source: data consistent with the system's linguistic rules are recognized, others are not. The inherently parallel nature of pure quantum computation should provide some significant advantages over classical neural network pattern recognition. Quantum neural architecture should, as in the classical case, express itself in the renormalization symmetry of the dual quantum information source, its `generalized Onsager relations,' and the algebraic structure of the underlying state space. Thus, for a certain class of QNN, high probability paths will define a quantum information source having grammar, syntax and higher order structures which will define the characteristics of the system for pattern recognition. We speculate that a quantum linguistics -- the extended algebra of $\Pi$ operators corresponding to quantized neural networks -- will be a principal growth technology for the 21st century. \begin{center} \textbf{Discussion and conclusions} \end{center} From an information theory base, we have created a very general phenomenological picture of hierarchical neural structure and process with several distinct pieces which can be modified and assembled in different ways. The approach is recognizably similar to the macroscopic spring-weight-and-dashpot models which 19th century physicists used successfully to predict a surprisingly large part of the subtle viscoelastic behaviors of materials without necessity of a detailed reductionist understanding of their microscopic structure. That picture can be summarized as follows: (1) Systems undergoing `information resonance,' as we have described it, including but not limited to neural process, are characterized by an inherent information source and its associated language. Paths in state space which are not consistent with the grammar and syntax of that language do not trigger `fundamental events.' (2) The underlying space of the information source may well have a structure much like the division of the state space of a nonlinear dynamic system into domains of attraction: states connectable by `meaningful' paths may form disjoint equivalence classes, a reflexive, symmetric and transitive algebraic relation between them. Existence of an inverse mapping imposes a group structure of finite order, possibly leading to exceedingly complex `multitasking' ability, related to the prime number partitioning of the group size. Existence of an order relation between regimes imposes a `natural' hierarchical multitasking structure in which there is a tradeoff between pattern recognition capacity and complexity of functional dynamics. 
(3) Distributed systems subject to information resonance, including neural structures, are likely to undergo precipitate phase transitions. Indeed, neural architecture -- in a large sense -- may well be, at least in part, characterizable by the renormalization symmetry which describes the behavior of the system near such a phase transition. Renormalization symmetries of such structures are not necessarily those of simple physical systems. (4) Systems subject to information resonance are likely to have a `thermodynamics,' an `equation of state' derived from the source uncertainty of the defining language by imposition of a Legendre transform, and associated `generalized Onsager relations' describing the role of architecture, through the disorder construct, in driving system dynamics. The generalized Onsager relations are, again, unlikely to be constrained to simple physical analogs. (5) We have described learning paradigms for arrays undergoing information resonance in terms of an information-theoretic `tuning' which may permit more efficient pattern detection than simple least-squares or `infomax' treatments which do not utilize the internal syntactic structures of the incoming signal constituting the pattern to be recognized. It is this tuning, in fact, which allows us to speak of information resonance. (6) We have explored `coevolutionary' (in the large sense) and order-relation-based interactions between disparate information sources, which may permit a `natural' hierarchical and/or multitasking nesting of function in arrays undergoing information resonance, including neural networks. Many other linking mechanisms seem possible for building larger from smaller parametrized information sources. A number of research questions emerge from these discussions, several in particular concerning the relations between the points above. These are, in essence, a search for `natural' arrangements of our phenomenological building blocks. First, the algebra of state space partitioning seems intimately related to important `language' structures, perhaps even the question of coevolutionary (in the large sense) hierarchy and nesting. If that algebra goes beyond a single reflexive, symmetric, and transitive relation, then hierarchy seems implicit: envision the additive structure of the integers supplemented first by multiplication and division to give fractions, and then extended to real and complex numbers. Any integer is at the same time a rational, a real and a complex number. On the other hand, an integer is either even or odd, and the integers modulo two form a group of order two. That is, there is evidently some relation between state space partitioning and algebra, and language grammar and hierarchy for systems undergoing information resonance, including neural structures. Explicating that relation seems an important topic for further research, particularly in view of the possibility of multitasking inherent in any disjoint partition. Imposition of an inverse mapping to give a group structure further opens the multitasking vista, since markedly different groups may have the same order, their number depending, roughly, on the prime factorization of that order. This set of questions is related to a second point: What are the relations between renormalization properties, order relations, hierarchy and neural architecture?
The simple renormalization relation we chose for a two-stage hierarchy had `language richness,' and presumably computing capacity, growing as some power of the clumping parameter, $\propto R^{D}$ according to equation (13). What happens to order-relation-nested hierarchical systems undergoing phase transition? What kind of renormalization is appropriate for higher iterations of hierarchy? Can renormalization symmetries be nested along with structural hierarchy? Recent work in landscape ecology (Ritchie and Olff, 1999; Milne, 1992) suggests that hierarchically nested structures may have nested scaling laws. That is, if the structures are nested and follow respective scaling laws $\propto R^{D}, R^{F}, R^{Q}$, then necessarily \[ D \geq F \geq Q. \] This suggests that renormalization properties and state space algebra may both constrain architecture. What other forms of renormalization symmetry are appropriate for information sources, in particular information resonances? Equation (13) is taken by `abduction' from physical systems. Other qualitatively different expressions may be possible and necessary. How are the two points above related to the generalized Onsager relations which define the response of the system to the Legendre transform of the source uncertainty of the underlying language? Onsager relations are serious business for technological applications; how are they influenced by neural architecture, as we have characterized it, in particular hierarchy and coevolutionary condensation or fragmentation? Specification of state space algebra, renormalization symmetry and generalized Onsager relations may, in fact, be equivalent to specification of architecture. Is there indeed a formal `tuning theorem' inverse to the Shannon Coding Theorem for systems undergoing information resonance which would allow use of the internal correlations or other structures of scanned input signals to greatly improve the efficiency of learning paradigms for neural networks? Does `error-free' pattern recognition loom for appropriate input rates? The discussion thus far has been in terms of fairly abstract structures. Can explicit application be made to some of the current neural models? For example, what is the relation between this work and currently popular spin glass and other models of neural networks? If, however, it is not possible to identify explicit `language' structures underlying or associated with these models, does that not, perhaps, suggest a serious weakness in those approaches? This conjecture leads to the next point: How can the development be generalized to non-ergodic information sources? The discussion in Appendix 2 indicates that the primary difference between the two kinds of information sources is that the limit $H[\textbf{X}]$ exists for all sources, but is independent of path $x=a_{0}, a_{1}, ... a_{n}, ... $ only for ergodic sources. What can be done otherwise? Three lines of attack seem obvious. If the underlying state space can indeed be partitioned into disjoint equivalence classes of meaningful paths, then we may break the system into mutually disjoint information sources, and proceed as above, much like working with separate domains of attraction in a nonlinear system. We will call such a system `disjointly' ergodic. Imposition of an order relation gives a `natural' induced information source. A second way of proceeding is `locally,' i.e. imposing a `manifold' structure on the underlying state space, the collection of paths $x$.
That is, we assume a topology for the state space such that each path $x$ has an open neighborhood which can be mapped by a homeomorphism onto a reference state space which has an ergodic information source. With appropriate topology, each open covering of the state space has a finite sub covering which patches the thing together in the standard manner (e.g. Sternberg, 1964; Thirring, 1992). This differential geometry approach is recognizably similar to, but would seem to generalize, use of an `information metric' to derive asymptotic statistical results (Amari, 1982; Kass, 1989). A somewhat different attack is to suppose that, given one particular highly probable path, $x_{0}$, we can reasonably define the source uncertainty associated with nearby paths in terms of their distance from $x_{0}$. Let $x = x_{0} + \delta x$, where $\delta x$ represents a `small variation,' and make the usual formal series expansion near $H(x_{0}) \equiv H_{0}$ in terms of a generalized derivative: \[ H(x) = H_{0} + \delta H(x) \approx H_{0} + \frac{\delta H}{\delta x} \delta x, \] where we assume $\delta H \ll H_{0}$. We might well call such a system `nearly' ergodic. Extension of our results to `slightly less than' ergodic information sources appears direct. It may well be, however, to expand the suggestion above, that arrays undergoing `ergodic' information resonance, i.e. in duality with an ergodic information source, are of great interest precisely because of their intimate relation to `language,' in the large sense, and if the most popular neural network and information resonance array models do not have an inherent association with language, this may well be a serious deficiency of current approaches. We suggest that language is utterly fundamental to any realistic understanding of neural process, and that treatments without direct involvement of language are missing the forest and, indeed, most of the trees. The next question is implied by those previous: Suppose we specify the grammar and syntax of the underlying language, the associated state-space algebra, the nested renormalization or hierarchical order symmetry defining behavior at phase transition, and the generalized Onsager relations giving more subtle behaviors. Do these, then, specify an optimal architecture and learning paradigm? That is, can we create an algorithmic `neural compiler' which will spit out `optimal' circuit diagrams, given an appropriate specification of desired behaviors? A principal outcome of our `ordered hierarchy' analysis is the inference of a tradeoff between the capacity of an uncorrelated parallel multitasking structure for pattern recognition, and the more complicated behavioral dynamics possible for a hierarchically ordered system. This may well be a far deeper result than our specialized treatment might indicate. Pastor et al. (2000) have recently proposed a purely algebraic phenomenological model for information processing in large-scale cerebral networks. It seems likely our results have some relation to theirs. Finally, the quantum generalization we have proposed seems worthy of further exploration, particularly the search for a `two fluid' model, although the current state of quantum information theory remains a serious constraint. While some of these matters can be addressed, if not quite really answered, using the kinds of specific case-history models popular with physicists, more formal treatments seem necessary, if not precisely our program, then something recognizably similar. 
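A small arithmetic example may be useful for the group-theoretic multitasking theme recurring above (the orders chosen are purely illustrative). Up to isomorphism there are two groups of order $4$, five of order $8$, but only one of order $15$, since $15 = 3 \cdot 5$ with $3$ not dividing $5-1$. An array whose state space algebra supports a group structure of order $8$ thus admits five inequivalent multitasking organizations in the sense discussed above, while one of order $15$ admits only the cyclic organization: the richness of possible multitasking is governed by the prime factorization of the order rather than by its size alone.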
In sum, the information resonance approach represents, in our view, a theoretical advance which could well translate into a broad spectrum of significant technology. \begin{center} \textbf{Appendix 1: `Large Deviations' and entropy} \end{center} We can place our development in the context of `large deviations' as follows (Dembo and Zeitouni, 1998, p. 2): Let $X_{1}, X_{2},... X_{n}$ be a sequence of independent, standard Normal, real-valued random variables and let \begin{equation} S_{n} = \frac{1}{n} \sum_{j=1}^{n}X_{j}. \end{equation} Since $S_{n}$ is again a Normal random variable with zero mean and variance $1/n$, for all $\delta >0$ \begin{equation} \lim_{n \rightarrow \infty} P(|S_{n}| \geq \delta)=0, \end{equation} where $P$ is the probability that the absolute value of $S_{n}$ is greater than or equal to $\delta$. Some manipulation, however, gives \begin{equation} P(|S_{n}| \geq \delta) = 1 - \frac{1}{\sqrt{2 \pi}}\int_{-\delta \sqrt{n}}^{\delta \sqrt{n}} \exp(-x^2/2) dx, \end{equation} so that, using the standard large-$z$ estimate $1 - \Phi(z) \approx \exp(-z^{2}/2)/(z \sqrt{2 \pi})$ for the Normal distribution function $\Phi$, \begin{equation} \lim_{n \rightarrow \infty} \frac{\log P(|S_{n}| \geq \delta)}{n} = -\delta^2/2. \end{equation} We can rewrite this for large $n$ as \begin{equation} P(|S_{n}| \geq \delta) \approx \exp(-n\delta^2/2). \end{equation} That is, for large $n$, the probability of a large deviation in $S_{n}$ follows something much like what follows from equation (2), i.e. that meaningful paths of length $n$ all have approximately the same probability $P(n) \propto \exp(-n H[\mathbf{X}])$. Our questions about `meaningful paths' thus appear suddenly as formally isomorphic to one of the central developments in an emerging sector of applied probability termed `large deviation theory,' which brings statistical mechanics, what the physicists call fluctuation theory, and information theory into a single structure (Dembo and Zeitouni, 1998). A cardinal tenet of large deviation theory is that the `rate function' $-\delta^2/2$ above can often be expressed as a mathematical `entropy' having the form \begin{equation} -\sum_{k}p_{k}\log p_{k}, \end{equation} for some set of probabilities $p_{k}$. This result goes under various names at various levels of approximation -- Sanov's Theorem, Cramer's Theorem, the Gartner-Ellis Theorem, the Shannon-McMillan Theorem, and so on (Dembo and Zeitouni, 1998). \begin{center} \textbf{Appendix 2: Ergodic and non-ergodic information sources} \end{center} Following the treatment of Cover and Thomas (1991, p. 474), the Shannon-McMillan Theorem on which we have based our analysis is predicated on having a stationary ergodic information source -- one whose long-time pattern of emitted symbols follows the strong law of large numbers. An ergodic source is defined on some probability space $(\Omega, \mathcal{B}, \mu)$, where $\mathcal{B}$ is a sigma algebra of subsets of the space $\Omega$ and $\mu$ is a probability measure. A random variable $X$ is defined as a function $X(\omega), \omega \in \Omega$, on the probability space. There is also a time translation operator, $T:\Omega \rightarrow \Omega$. Let $\mu(A)$ be the probability measure of a set $A \in \mathcal{B}$. Then the transformation is \textit{stationary} if $\mu(TA)=\mu(A)$ for all $A \in \mathcal{B}$. The transformation is \textit{ergodic} if every set $A$ such that $TA=A$ almost everywhere satisfies $\mu(A)=0$ or $1$. That is, almost everything flows.
If the transformation $T$ is stationary and ergodic, we call the process defined by $X_{n}(\omega)=X(T^{n}\omega)$ stationary and ergodic. For a stationary ergodic source with a finite expected value, the Ergodic Theorem concludes that \[\frac{1}{n}\sum_{i=1}^{n}X_{i}(\omega) \rightarrow E(X) = \int X d\mu \] with probability $1$. This is the generalized law of large numbers for ergodic processes: the arithmetic mean in time converges to the mathematical expectation in `space.' Beginning here, after some considerable mathematical travail, the Shannon-McMillan Theorem, as we have described it, follows (Khinchine, 1957; Petersen, 1995; Cover and Thomas, 1991). The essential point is that for a stationary, ergodic information source the limit \[H[\mathbf{X}] = \lim_{n \rightarrow \infty} \frac{H[X_{0}, ... X_{n}]}{n+1} \] not only exists, but \textit{is independent of path}. That is, as $x=a_{0},...,a_{n}$ gets longer and longer, all paths converge to the same value of $H[\mathbf{X}]$ regardless of their origin or meandering. This is the fundamental information theory simplification, onto which we have imposed parametrization and on which we have further grafted invariance under renormalization at phase transition as an expression of architecture. A careful reading of the proof of the Shannon-McMillan Theorem (Khinchine, 1957; Petersen, 1995) shows that non-ergodic information sources still converge to some value $\lim_{n \rightarrow \infty} H(x)$, where $x$ is a path of increasing length, but the value $H(x)$ is now \textit{path dependent}. That is, each increasing path $x$ converges to its own value of $H(x)$, depending, thus, on both the overall `language' and on the particular path chosen. \begin{center} \textbf{Acknowledgments} \end{center} This work benefited from support under an Investigator Award in Health Policy Research given by the Robert Wood Johnson Foundation and under NIEHS Grant 1-P50-ES09600-02. \begin{center} \textbf{References} \end{center} Amari S, 1982, ``Differential geometry of curved exponential families -- curvature and information loss,'' \textit{Annals of Statistics}, \textbf{10}, 357-387. Ash R, 1990, \textit{Information Theory}, Dover, New York. Behrman E, J Niemel, J Steck and S Skinner, 1996, ``A quantum dot neural network,'' \textit{Proceedings of the Workshop on Physics of Computation}, New England Complex Systems Institute, Cambridge, MA, pp. 22-24. Bennett C, 1988, ``Logical depth and physical complexity.'' In \textit{The Universal Turing Machine: A Half-Century Survey}, R Herkin ed., pp. 227-257, Oxford University Press, Oxford. Binney J, N Dowrick, A Fisher and M Newman, 1995, \textit{The theory of critical phenomena; An introduction to the renormalization group}, Oxford Science Publications, Oxford. Boyd R and P Richerson, 1985, \textit{Culture and Evolutionary Theory}, University of Chicago Press, Chicago. Braiman Y, J Linder and W Ditto, 1995, ``Taming spatiotemporal chaos with disorder,'' \textit{Nature}, \textbf{378}, 465. Caswell H, 1999, \textit{Matrix Population Models}, Sinauer Associates, New York. Cavalli-Sforza L and M Feldman, 1981, \textit{Cultural Transmission and Evolution: A Quantitative Approach}, Monographs in Population Biology, 16, Princeton University Press, Princeton, NJ. Cover T and J Thomas, 1991, \textit{Elements of Information Theory}, John Wiley and Sons, New York. Cramer H, 1938, ``Sur un nouveau theoreme-limite de la theorie des probabilites,'' in \textit{Actualites Scientifiques et Industrielles}, No.
736 in Colloque consacre a la theorie des probabilites, pp. 5-23, Hermann, Paris. Deco G and D Obradovic, 1996, \textit{An Information-Theoretic Approach to Neural Computing}, Springer-Verlag, New York. Deco G and B Schurmann, 1998, ``Stochastic resonance in the mutual information between input and output spike trains of noisy central neurons,'' \textit{Physica D}, \textbf{117}, 276-282. Dembo A and O Zeitouni, 1998, \textit{Large Deviations: Techniques and Applications, 2nd. Ed.}, Springer-Verlag, New York. Durham W, 1991, \textit{Coevolution: Genes, Culture and Human Diversity}, Stanford University Press, Palo Alto, CA. Dykman M, D Luchinsky, P McClintock and V Smelyansky, 1996, ``Corrals and critical behavior of the distribution of fluctuational paths,'' \textit{Physical Review Letters}, \textbf{77}, 5229-5232. Ellis R, 1985, \textit{Large Deviations and Statistical Mechanics}, Springer-Verlag, New York. Feller W, 1977, \textit{An Introduction to Probability Theory and its Applications, Vol. II}, second edition, John Wiley and Sons, New York. Feynman R and A Hibbs, 1965, \textit{Quantum Mechanics and Path Integrals}, McGraw-Hill, New York, NY. Feynman R, 1998, \textit{Statistical Mechanics}, Perseus Books, Reading, MA. Freidlin M and A Wentzell, 1998, \textit{Random Perturbations of Dynamical Systems}, Springer-Verlag, New York. Gammaitoni L, P Hanggi, P Jung and F Marchesoni, 1998, ``Stochastic resonance,'' \textit{Reviews of Modern Physics}, \textbf{70}, 223-287. Godivier X and F Chapeau-Blondeau, 1998, ``Stochastic resonance in the information capacity of a nonlinear dynamic system,'' \textit{International Journal of Bifurcation and Chaos}, \textbf{8}, 581-589. Granovetter M, 1973, ``The strength of weak ties,'' \textit{American Journal of Sociology}, \textbf{78}, 1360-1380. Griffiths R, 1972, ``Rigorous results and theorems'' in \textit{Phase Transitions and Critical Phenomena}, C Domb and M Green, eds., Academic Press, London. Heneghan C, C Chow, J Collins, T Imhoff, S Lowen and M Teich, 1996, ``Information measures quantifying aperiodic stochastic resonance,'' \textit{Physical Review A}, \textbf{54}, 2366-2377. Holevo A, 1973, ``Some estimates for information quantity transmitted by quantum communication channels,'' \textit{Problems of Information Transmission}, \textbf{9}, 177-183. Holevo A, 1998, ``Coding theorems for Quantum Channels,'' xyz.lanl.gov/quant-ph/9809023. Ives A, 1995, ``Measuring resilience in stochastic systems,'' \textit{Ecological Monographs}, \textbf{65}, 217-233. Kass R, 1989, ``The geometry of asymptotic inference,'' \textit{Statistical Science}, \textbf{4}, 188-234. Khinchine A, 1957, \textit{The Mathematical Foundations of Information Theory}, Dover, New York. King C and A Lesniewski, 1995, ``Quantum sources and a quantum coding theorem,'' xxx.LANL.gov quant-phy 9511019. Kadtke J and A Bulsara, 1997, \textit{Applied Nonlinear Dynamics and Stochastic Systems Near the Millennium}, AIP Conference Proceedings, American Institute of Physics, New York. Linder J, B Meadows and W Ditto, 1995, ``Array enhanced stochastic resonance and spatiotemporal synchronization,'' \textit{Physical Review Letters}, \textbf{75}, 3-6. Linder J, B Meadows, W Ditto, M Inchiosa and A Bulsara, 1996, ``Scaling laws for spatiotemporal synchronization and array enhanced stochastic resonance,'' \textit{Physical Review A}, \textbf{53}, 2081-2086.
Luchinsky D, 1997, ``On the nature of large fluctuations in equilibrium systems: observations of an optimal force,'' \textit{Journal of Physics A Letters}, \textbf{30}, L577-583. McCauley L, 1993, \textit{Chaos, Dynamics and Fractals: an algorithmic approach to deterministic chaos}, Cambridge University Press, Cambridge. McClintock P and D Luchinsky, 1999, ``Glorious noise,'' \textit{The New Scientist}, \textbf{161}, No. 2168, January, 36-39. Milne B, 1992, ``Spatial aggregation and neutral models in fractal landscapes,'' \textit{American Naturalist}, \textbf{139}, 32-57. Neiman A, B Shulgin, V Anishchenko, W Ebeling, L Schimansky-Geier and J Freund, 1996, ``Dynamical entropies applied to stochastic resonance,'' \textit{Physical Review Letters}, \textbf{76}, 4299-4302. Neiman A, B Shulgin, V Anishchenko, W Ebeling, L Schimansky-Geier and J Freund, 1996, ``Correction,'' \textit{Physical Review Letters}, \textbf{77}, 4851. Onsager L and S Machlup, 1953, ``Fluctuations and irreversible processes,'' \textit{Physical Review}, \textbf{91}, 1501-1512. Pastor J, M Lafon, L Trave-Massuyes, J Demonet, B Doyon and P Celsis, 2000, ``Information processing in large-scale cerebral networks: the causal connectivity approach,'' \textit{Biological Cybernetics}, \textbf{82}, 49-59. Petersen K, 1995, \textit{Ergodic Theory}, Cambridge University Press, Cambridge, UK. Ritchie M and H Olff, 1999, ``Spatial scaling laws yield a synthetic theory of biodiversity,'' \textit{Nature}, \textbf{400}, 557-560. Rojdestvenski I and M Cottam, 2000, ``Mapping of statistical physics to information theory with applications to biological systems,'' \textit{J. Theor. Biol.}, \textbf{202}, 43-54. Schimansky-Geier L, J Freund, U Siewert and A Neiman, 1996, ``Stochastic resonance: informational aspects and distributed systems,'' ICND-96 book of abstracts. Schumacher B, 1995, ``Quantum Coding,'' \textit{Physical Review A}, \textbf{51}, 2738-2747. Schumacher B, 1996, ``Sending entanglement through noisy quantum channels,'' \textit{Physical Review A}, \textbf{55}, 2614-2628. Sternberg S, 1964, \textit{Lectures on Differential Geometry}, Prentice-Hall, New York. Thirring W, 1991, \textit{Classical Dynamical Systems and Classical Field Theory}, 2nd. ed., Springer-Verlag, New York. Toth G, C Lent, P Tougaw, Y Brazhnik, W Weng, W Porod, R Liu and Y Huang, 1996, ``Quantum cellular neural networks,'' \textit{Superlattices and Microstructures}, \textbf{20}, 473-478. Wallace R, Y Huang, P Gould and D Wallace, 1997, ``The hierarchical diffusion of AIDS and violent crime among US metropolitan regions: inner-city decay, stochastic resonance and reversal of the mortality transition,'' \textit{Social Science and Medicine}, \textbf{44}, 935-947. Wallace R and RG Wallace, 1998, ``Information theory, scaling laws and the thermodynamics of evolution,'' \textit{Journal of Theoretical Biology}, \textbf{192}, 545-559. Wallace R and RG Wallace, 1999, ``Organisms, organizations and interactions: An information theory approach to biocultural evolution,'' \textit{BioSystems}, \textbf{51}, 101-119. Wallace R and J Ullmann, 1999, ``Pentagon capitalism and the killing of the Red Queen: How the US lost the coevolutionary arms race between civilian firms and technology.'' Submitted. Wallace R, 2000a, ``Language and coherent neural amplification in hierarchical systems: Renormalization and the dual information source of a generalized spatiotemporal stochastic resonance,'' \textit{International Journal of Bifurcation and Chaos}, \textbf{10}, 493-502.
Wallace R, 2000b, ``Language and coherent neural amplification in hierarchical systems: `Thermodynamics,' generalized Onsager relations, and the detection of `abnormal' pattern.'' Submitted. Wallace R, 2000c, ``Quantum linguistics: information theory and quantum neural networks.'' Submitted. Wallace R and M Fullilove, 1999, ``Culturally-dependent canonical patterns of mental disorder in the context of socioeconomic stress: the mathematical epidemiology of madness.'' Submitted. Wilson K, 1971, ``Renormalization group and critical phenomena. I Renormalization group and the Kadanoff scaling picture,'' \textit{Physical Review B}, \textbf{4}, 3174-3183. Zurek W, 1985, ``Cosmological experiments in superfluid helium?'' \textit{Nature}, \textbf{317}, 505-508. Zurek W, 1996, ``The shards of broken symmetry,'' \textit{Nature}, \textbf{382}, 296-298. \end{document}