\documentstyle[12pt]{article}
\textheight=25.7cm
\textwidth=15.6cm
\topmargin=-20mm
\oddsidemargin=0mm
\pagestyle{headings}
\renewcommand{\theequation}{\arabic{section}.\arabic{equation}}
\renewcommand{\baselinestretch}{1}
\newcommand{\ra}{\rightarrow}
\newcommand{\bra}{\langle}
\newcommand{\ket}{\rangle}
\newcommand{\be}{\begin{equation}}
\newcommand{\ee}{\end{equation}}
\newcommand{\bea}{\begin{eqnarray}}
\newcommand{\eea}{\end{eqnarray}}
\newcommand{\ds}{\displaystyle}
\newtheorem{lem}{Lemma}[section]
\newtheorem{thm}{Theorem}[section]
\newtheorem{cor}{Corollary}[section]
\renewcommand{\thefootnote}{\alph{footnote}}
\catcode`\@=11
\font\fivmsy=msbm5
\font\sevmsy=msbm7
\font\tenmsy=msbm10
\font\hfivmsy=msbm5\@halfmag
\font\hsevmsy=msbm7\@halfmag
\font\htenmsy=msbm10\@halfmag
\font\bfivmsy=msbm5\@magscale1
\font\bsevmsy=msbm7\@magscale1
\font\btenmsy=msbm10\@magscale1
\newfam\msyfam
\textfont\msyfam=\tenmsy \scriptfont\msyfam=\sevmsy
\scriptscriptfont\msyfam=\fivmsy
\def\smallBbb#1{\fam\msyfam #1}%
\newfam\hmsyfam
\textfont\hmsyfam=\htenmsy \scriptfont\hmsyfam=\hsevmsy
\scriptscriptfont\hmsyfam=\hfivmsy
\def\halfBbb#1{\fam\hmsyfam #1}%
\newfam\bmsyfam
\textfont\bmsyfam=\btenmsy \scriptfont\bmsyfam=\bsevmsy
\scriptscriptfont\bmsyfam=\bfivmsy
\def\bigBbb#1{\fam\bmsyfam #1}%
%\let\Bbb\smallBbb %%%%for 10pt documents
%\let\Bbb\halfBbb %%%%for 11pt documents
\let\Bbb\bigBbb %%%%for 12pt documents
\catcode`\@=12
\def\M{{\Bbb M}}
\def\R{{\Bbb R}}
\def\N{{\Bbb N}}
\def\Z{{\Bbb Z}}
\begin{document}
\null
\vspace{4cm}\noindent
{\bf
LARGE DEVIATIONS AND THE THERMODYNAMIC FORMALISM:\\
A NEW PROOF OF THE EQUIVALENCE OF ENSEMBLES\footnotemark}
\footnotetext{Lecture delivered by J.T. Lewis}
\\ \\ \\
J.T. Lewis\footnotemark[2], C.--E. Pfister\footnotemark[3], W.G. Sullivan
\footnotemark[2],\footnotemark[4]
\\ \\
\hspace*{2.6cm}\setcounter{footnote}{1}
\footnotemark Dublin Institute for Advanced Studies\newline\hspace*{2.6cm}
10 Burlington Road\newline\hspace*{2.6cm}
Dublin 4, Ireland\newline\hspace*{2.6cm}
\footnotemark Ecole Polytechnique F\'ed\'erale de Lausanne\newline
\hspace*{2.6cm}
D\'epartement de Math\'ematiques\newline\hspace*{2.6cm}
CH--1015 Lausanne Switzerland\newline\hspace*{2.6cm}
\footnotemark University College\newline\hspace*{2.6cm}
Department of Mathematics\newline\hspace*{2.6cm}
Belfield, Dublin 4, Ireland\hspace*{2.6cm}
\renewcommand{\thefootnote}{\arabic{footnote}}
\setcounter{footnote}{0}
\\\\
\section{The Equivalence of Ensembles}
In statistical mechanics the problem of the equivalence of ensembles
goes back to Boltzmann and Gibbs. Here it is the problem of proving that,
in the thermodynamic limit, the microcanonical measures and the grand
canonical measures are equivalent; making precise the meaning
of ``equivalent'' is part of the problem. It is commonly believed that in
good statistical mechanical models such an equivalence holds, even
in the presence of a phase--transition. On the other hand,
it is believed that equivalence of ensembles fails in mean--field
models such as the Curie--Weiss model.
There is a second statement which is also known as the equivalence of
ensembles: in the thermodynamic limit, the negative of the entropy
and the pressure are conjugate functions in the sense of convexity theory.
In statistical mechanics, the entropy function is
defined directly in the microcanonical setting and the pressure in
the grand canonical setting. We refer to this statement as the
equivalence of ensembles at the level of thermodynamic functions.
This form of the equivalence of ensembles is known to hold for good statistical
mechanical models and to fail for mean--field models. One version of our main
result may be stated roughly as: {\it equivalence of ensembles
holds at the level of measures whenever it holds at the
level of thermodynamic functions.}
The problem of the equivalence of ensembles is not confined to
statistical mechanics; it can be found in other areas of applied
probability theory -- in information theory,
for example. Here the problem is to prove that a sequence of conditioned
measures is
equivalent, in an appropriate sense, to a sequence of ``tilted'' measures.
Our choice of setting is sufficiently general to cover such applications.
Probabilistic methods have been used for at least fifty years to prove
results about the equivalence of ensembles: Khinchin (1943) used a
local central limit theorem to prove it for a classical ideal
(non--interacting) gas; Dobrushin and Tirozzi (1977) proved it for
lattice gas models for which they were able to
establish a local central limit theorem -- a restriction which
ruled out models which exhibit first--order phase transitions.
Typically, local central limit theorems hold on the scale of the
square--root of the volume. The right scale for the investigation
of the equivalence of ensembles, however, turns out to be that of the
volume itself; this is the scale on which large deviation
principles hold. Deuschel et al.
(1991) and Georgii (1993) used a large deviation
principle for empirical measures to prove the equivalence of ensembles.
One drawback of this approach is that it is technically difficult:
since it involves measures on a space of measures, there are subtle
points to be settled. Another is that
the connection with thermodynamic functions is obscured. Our approach
is more elementary
and direct: we go back to the common origin of large deviation
theory and statistical
mechanics, the Principle of the Largest Term, and prove a result
about the specific
information gain of a sequence of conditioned measures with respect to
a sequence of tilted
measures. This is a ``soft'' theorem -- it uses nothing deeper
than the order--completeness
of the reals, but it has wide applicability. For non--interacting systems,
the equivalence of
ensembles for measures then follows from an inequality relating the
information gain
${\cal H}(\mu|\nu)$ of $\mu$ with respect to $\nu$ to the total
variation norm $\|\cdot \|_{TV}$
of the difference of the two measures:
\be\label{1.1}
2{\cal H}(\mu|\nu)\geq \|\mu-\nu\|_{TV}^2.
\ee
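[ As a check of (\ref{1.1}) in the simplest case, take
$\mu=({1\over 2},{1\over 2})$ and $\nu=({1\over 4},{3\over 4})$ on a
two--point space. We assume here, for the purpose of this illustration only,
the convention that $\|\mu-\nu\|_{TV}$ is the total mass of $|\mu-\nu|$, so
that it lies in $[0,2]$ for probability measures. Then
\[
{\cal H}(\mu|\nu)={1\over 2}\ln 2+{1\over 2}\ln{2\over 3}
={1\over 2}\ln{4\over 3}\;,\;\;\;
\|\mu-\nu\|_{TV}={1\over 4}+{1\over 4}={1\over 2}\;,
\]
so that $2{\cal H}(\mu|\nu)=\ln{4\over 3}\approx 0.288$, which exceeds
$\|\mu-\nu\|_{TV}^2=0.25$. ]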
For interacting systems, our ``soft'' theorem has to be supplemented
by a ``hard'' theorem,
proved using the combinatorial devices introduced in Sullivan (1973) and
perfected by Preston (1976); using it, we prove the equivalence of ensembles
at the level of measures for a lattice gas with translation invariant
summable potentials.
In order to state this result precisely, we have to describe this
setting in detail; this
we do in ${\cal \char 120}$ 2. In ${\cal \char 120}$ 3 we discuss the
Principle of the Largest Term
and its consequences, sketching the proof of our ``soft'' theorem. In
${\cal \char 120}$ 4,
we give an application to the non--interacting case. In ${\cal \char 120}$
5, we state precisely the general result for the lattice gas.
Detailed proofs will be
published elsewhere.
\section { Conditioning and Tilting}
\setcounter{equation}{0}
Let $\{(\Omega_n,{\cal F}_n,\rho_n)\}_{n\geq 1}$ be a sequence of
measure spaces; here $\rho_n$
is a positive measure referred to as the {\it reference measure}, which
may or may not be
normalized. Let $V_{\circ}:=\{V_n\in (0,\infty)\}_{n\geq 1}$ be a {\it scale},
a sequence of positive numbers diverging to $+\infty$ as $n\ra \infty$.
Typically, in the applications
to statistical mechanics, $V_n$ will be the volume of a region
$\Lambda_n$ in a Euclidean
space $\R^{d}$ or the number of lattice sites in a box $\Lambda_n$ in
an integer lattice
$\Z^d$, and $\Omega_n$ will be a configuration space associated with
$\Lambda_n$.
Let $T_{\circ}:=\{T_n:\Omega_n\ra X\}_{n\geq 1}$ be a sequence
of random variables
taking values in $X$, a closed convex subset of $E$, a locally convex
topological vector space;
we denote the Borel subsets of $X$ by ${\cal B}(X)$ and the topological dual
of $E$ by $E^*$.
In this exposition we will assume that $X$ is compact and that
$E=\R^k$ $(k\geq 1)$.
These assumptions are not necessary (for the general case, see Lewis et al.
(1993)) but they simplify the proofs and yet are adequate to cover the
applications we make to the lattice gas.
For $C\in {\cal B}(X)$ such that $0<\rho_n[T_n^{-1}C]<\infty$ for all $n$
sufficiently large,
we define the {\it conditioned measures} on ${\cal F}_n$ by
\be
\nu^{C}_{n}[d\omega]:={1_{T_n^{-1}C}(\omega)\rho_n[d\omega]\over
\rho_n[T_n^{-1}C]}\;;
\ee
for $t\in E^*$ such that
$0<\int_{\Omega_n}\exp(V_n\bra t,T_n(\omega)\ket)\rho_n[d\omega]<\infty$
for all $n$ sufficiently large, we define the {\it tilted measures} on
${\cal F}_n$ by
\be
\gamma^t_n[d\omega]:=
{\exp(V_n\bra t,T_n(\omega)\ket)\rho_n[d\omega]\over
\int_{\Omega_n}\exp(V_n\bra t,T_n(\omega')\ket)\rho_n[d\omega']}\; .
\ee
We shall compute the specific information gain $\lim_{n\ra \infty}{1\over V_n}
{\cal H}(\nu^C_n|\gamma^t_n)$; recall that ${\cal H}(\lambda_1|\lambda_2)$,
the information gain
of $\lambda_1$ with respect to $\lambda_2$, is defined by
\be
{\cal H}(\lambda_1|\lambda_2):=
\cases{\int_{\Omega_n}\ln
\displaystyle{d\lambda_1\over d\lambda_2}(\omega)\lambda_1[d\omega],
& $\lambda_1\ll \lambda_2$,\cr
+\infty ,& otherwise.\cr}
\ee
In the statistical mechanical applications, the $T_n$ are $k$--tuples
of functions such as energy--per--site and
magnetization--per--site; then $\nu^C_n$ is the microcanonical
measure conditioned on $T_n$
taking values in $C$ and $\gamma^t_n$ is the grand canonical
measure at generalized
chemical potential $t$.
Notice that both $\nu_n^C$ and $\gamma_n^t$ are absolutely
continuous with respect to the
reference measure $\rho_n$ and their densities are both functions of $T_n$;
we exploit this by using the change of variable formula in computing
the specific
information gain. Define the distribution $\M_n$ of $T_n$ under $\rho_n$ by
$\M_n:=\rho_n\circ T_n^{-1}$; we have
\bea
\nu^C_n\circ T_n^{-1}&=&\M_n[\;\cdot\;|C]:=
{\M_n[\;\cdot\;\cap C]\over \M_n[C]}\;,\\
\gamma^t_n\circ T_n^{-1}&=&\M_n^t[\;\cdot\; |X]:=
{\M_n^t[\;\cdot\;]\over \M_n^t[X]}\;,
\eea
where $\M_n^t[dx]:=\exp(V_n\bra t,x\ket)\M_n[dx]$. Thus we have
\be\label{2.6}
{\cal H}(\nu^C_n|\gamma_n^t)={\cal H}(\M_n[\;\cdot\;|C]|\M_n^t[\;\cdot\;|X])\;.
\ee
We shall see that this formula is the basic manoeuvre in our treatment;
it reduces an
integral over $\Omega_n$ to an integral over $X$ and relates the
information gain
${\cal H}(\nu^C_n|\gamma^t_n)$ to the thermodynamic functions which we
are about to
define in this setting.
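[ The reduction can be made explicit. The following is a sketch, assuming
$0<\M_n[C]<\infty$ and $0<\M_n^t[X]<\infty$, which is how we read the
standing conditions on $C$ and $t$. On $C$, the density of
$\M_n[\;\cdot\;|C]$ with respect to $\M_n^t[\;\cdot\;|X]$ is
\[
{d\M_n[\;\cdot\;|C]\over d\M_n^t[\;\cdot\;|X]}(x)=
{\M_n^t[X]\over \M_n[C]}\exp(-V_n\bra t,x\ket)\;,\;\;x\in C\;,
\]
so that, taking logarithms and integrating against $\M_n[\;\cdot\;|C]$,
\[
{\cal H}(\nu^C_n|\gamma^t_n)=
-V_n\int_{C}\bra t,x\ket\M_n[dx|C]+\ln\M_n^t[X]-\ln\M_n[C]\;.
\]
Each of the three terms is expected to be of order $V_n$, which is why the
information gain is studied after division by $V_n$. ]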
\section{The Principle of the Largest Term}
\setcounter{equation}{0}
We need to examine the behaviour as $n\ra\infty$ of
the measures on $X$ defined in
${\cal \char 120}$ 2. Since the spaces $(\Omega_n,{\cal F}_n,\rho_n)$ and
the random variables $T_n$ play no part in the considerations
of this section it is best to start afresh.
Let $\M_{\circ}:=\{\M_n\}_{n\geq1}$ be a sequence of locally
finite positive measures on
${\cal B}(X)$, the Borel subsets of $X$, a
compact convex subset of $E=\R^k$. Let $V_{\circ}$
be a scale; define set--functions $m_n$, $\underline{m}$,
$\overline{m}$ on ${\cal B}(X)$:
\bea
m_n[B]&:=&{1\over V_n}\ln\M_n[B]\;,\label{3.1}\\
\underline{m}[B]&:=&\liminf_{n\ra\infty}m_n[B]\;,\\
\overline{m}[B]&:=&\limsup_{n\ra\infty}m_n[B]\;.
\eea
The following properties are straightforward consequences of the definitions:
\bea
& & \underline{m}[B]\leq \overline{m}[B]\;\;{\rm for\; all}
\;\,B\in{\cal B}(X)\;;\\
& & \underline{m}\;\;{\rm and}\;\;\overline{m}\;\;{\rm are\; increasing\; on }
\;\; {\cal B}(X)\;.
\eea
The next property is an abstract version of the Principle of the Largest
Term, well--known in traditional accounts of statistical mechanics
(see, for example, Huang (1963)). Since
it is central to our development, we give a proof.
(For $a$, $b\in \R$, we denote the
maximum of $a$ and $b$ by $a\vee b$.)
\begin{lem}\label{lem3.1}
On ${\cal B}(X)$, we have
\be\label{3.6}
\overline{m}[B_1\cup B_2]=\overline{m}[B_1]\vee\overline{m}[B_2]\;.
\ee
\end{lem}
{\bf Proof:}
\newline\noindent
For $j=1,2$, we have
\be
\M_n[B_j]\leq\M_n[B_1\cup B_2]\leq\M_n[B_1]+\M_n[B_2]
\ee
so that
\be
\M_n[B_1]\vee\M_n[B_2]\leq\M_n[B_1\cup B_2]\leq
2\,(\M_n[B_1]\vee \M_n[B_2])\;;
\ee
it follows that
\be\label{3.9}
\overline{m}[B_1\cup B_2]=\limsup_{n\ra\infty}(m_n[B_1]\vee m_n[B_2])\;.
\ee
But for each pair $\{a_n\}_{n\geq 1}$, $\{b_n\}_{n\geq 1}$ of sequences
of real numbers, we have
\be\label{3.10}
\limsup_{n\ra\infty}(a_n\vee b_n)=
(\limsup_{n\ra\infty}a_n)\vee(\limsup_{n\ra\infty}b_n)\;.
\ee
Thus (\ref{3.6}) follows from (\ref{3.9}) and (\ref{3.10}).\hfill$\Box$
\newline
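[ A concrete instance, given only as an illustration: let $V_n=n$ and
suppose that $B_1$, $B_2$ are disjoint with $\M_n[B_1]={\rm e}^{-na}$ and
$\M_n[B_2]={\rm e}^{-nb}$, where $0\leq a<b$. Then
\[
m_n[B_1\cup B_2]={1\over n}\ln({\rm e}^{-na}+{\rm e}^{-nb})
=-a+{1\over n}\ln(1+{\rm e}^{-n(b-a)})\ra -a
=\overline{m}[B_1]\vee\overline{m}[B_2]\;;
\]
the correction term is at most $(\ln 2)/n$, so on the scale $V_{\circ}$ only
the larger of the two terms survives. ]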
Define functions $\underline{\mu}$, $\overline{\mu}$ on $X$ as follows:
\bea
\underline{\mu}(x)&:=&\inf_{G\ni x}\underline{m}[G]\;\;,\;\;G\;
{\rm open}\;,\label{3.11}\\
\overline{\mu}(x)&:=&\inf_{G\ni x}\overline{m}[G]\;\;,\;\;G\;
{\rm open}\;.\label{3.12}
\eea
The following properties are direct consequences of the definitions:
\bea
& &\underline{\mu}\;\;{\rm and}\;\;\overline{\mu}\;\;
{\rm are\;upper\;semicontinuous\;functions}\;;\\
& &\overline{m}[G]\geq\sup_{x\in G}\overline{\mu}(x)\;,\;\;G\;
{\rm open}\;,\label{3.14}\\
& &\underline{m}[G]\geq\sup_{x\in G}\underline{\mu}(x)\;,\;\;G\;
{\rm open}\;.\label{3.15}
\eea
The lower bound (\ref{3.14}) for $\overline{m}$ on open sets is rarely used; of
greater importance is the following upper bound for
$\overline{m}$ on compact sets, a
consequence of the Principle of the Largest Term (\ref{3.6}):
\be\label{3.16}
\overline{m}[K]\leq\sup_{x\in K}\overline{\mu}(x)\;,\;\;K\;{\rm compact}\;.
\ee
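[ A sketch of how (\ref{3.16}) can be obtained from (\ref{3.6}), given for
orientation only; the detailed proof is not reproduced here. Let $K$ be
non--empty; if $\sup_{x\in K}\overline{\mu}(x)=+\infty$ there is nothing to
prove. Otherwise fix a real number $\alpha>\sup_{x\in K}\overline{\mu}(x)$.
By (\ref{3.12}), each $x\in K$ has an open neighbourhood $G_x$ with
$\overline{m}[G_x]<\alpha$. By compactness, $K$ is covered by finitely many
of these, say $G_{x_1},\ldots ,G_{x_r}$; since $\overline{m}$ is increasing,
repeated use of (\ref{3.6}) gives
\[
\overline{m}[K]\leq\overline{m}[G_{x_1}\cup\cdots\cup G_{x_r}]
=\max_{1\leq i\leq r}\overline{m}[G_{x_i}]<\alpha\;.
\]
Letting $\alpha$ decrease to $\sup_{x\in K}\overline{\mu}(x)$ gives the
bound. ]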
Our first application of (\ref{3.16}) is to the
{\it concentration of measures}. Let $\M_{\circ}$
be a sequence of probability measures on ${\cal B}(X)$; if
$\M_{\circ}$ converges weakly to a
Dirac measure $\delta_{x}$ at some point $x\in X$,
we say $\M_{\circ}$ obeys a weak law of
large numbers (WLLN). In the absence of a first--order phase
transition, a WLLN holds in the
grand canonical ensemble. We
require a substitute for a WLLN which holds regardless of phase
transitions. We say that a
sequence $\M_{\circ}$ of probability measures on
${\cal B}(X)$ is {\it eventually concentrated
on a set} $A$ if, for each open neighbourhood $G$ of $A$, we have
\be
\lim_{n\ra\infty}\M_n[G]=1\;.
\ee
[ If $A=\{x\}$ and $\M_{\circ}$ is eventually concentrated on
$A$, then $\M_{\circ}$ converges
weakly to the Dirac measure $\delta_{x}$.]
We shall need the following
\begin{lem}\label{lem3.2}
Let $\M_{\circ}$ be a sequence of probability measures
on ${\cal B}(X)$ which is eventually
concentrated on a set $A$; if $f:X\ra\R$ is lower semicontinuous
and bounded below on $X$,
then
\be
\inf_{x\in A}f(x)\leq\liminf_{n\ra\infty}\int_{X}f(x)\M_n[dx]\;.
\ee
\end{lem}
[ There is an obvious complementary upper bound;
together they yield the usual characterization
of the WLLN in terms of bounded continuous functions when
$A$ reduces to a single point.]
The function $\overline{\mu}$, defined at (\ref{3.12})
for the pair $(\M_{\circ},V_{\circ})$,
enables us to determine a concentration--set for the
sequence $\M_{\circ}$. (How useful it is
depends on how well we have chosen the scale $V_{\circ}$.)
Notice that, for probability
measures, the function $\overline{\mu}$ is bounded above
by zero; in fact, it always attains
this bound and the set on which it attains it is a
concentration--set for $\M_{\circ}$.
Let $N_{\overline{\mu}}$ be the set defined by
\be\label{3.19}
N_{\overline{\mu}}:=\{x\in X:\overline{\mu}(x)=0\}\;.
\ee
\begin{lem}\label{lem3.3}
Let $\M_{\circ}$ be a sequence of probability measures and
$V_{\circ}$ a scale. Then\newline
\hspace*{1cm}(a) $N_{\overline{\mu}}$ is compact and non--empty;\newline
\hspace*{1cm}(b) the sequence $\M_{\circ}$ is eventually
concentrated on $N_{\overline{\mu}}$.
\end{lem}
The proofs of both (a) and (b) make use of the bound (\ref{3.16}).
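[ A simple example, constructed here for illustration, shows what a
concentration--set records when a WLLN fails. Let $X=[0,1]$, fix
$a\neq b$ in $X$ and put $\M_n:={1\over 2}\delta_a+{1\over 2}\delta_b$ for
every $n$. For any scale $V_{\circ}$ and any open $G$ containing $a$, we have
\[
-{\ln 2\over V_n}\leq m_n[G]\leq 0\;,
\]
so that $\overline{\mu}(a)=0$, and likewise $\overline{\mu}(b)=0$; if
$x\notin\{a,b\}$, every sufficiently small open $G$ containing $x$ has
$\M_n[G]=0$, so that $\overline{\mu}(x)=-\infty$. Hence
$N_{\overline{\mu}}=\{a,b\}$: the sequence is eventually concentrated on
$\{a,b\}$ although it does not converge to a Dirac measure. ]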
Let $\overline{\mu}^t$, $\underline{\mu}^t$ be the upper
and lower functions determined by
the pair $(\M_{\circ}^t,V_{\circ})$; they are related to
$\overline{\mu}$ and $\underline{\mu}$
as follows:
\bea
\overline{\mu}^t(x)&=&\overline{\mu}(x)+\bra t,x\ket\,,\label{3.20}\\
\underline{\mu}^t(x)&=&\underline{\mu}(x)+\bra t,x\ket\;.\label{3.21}
\eea
These relations are a consequence of the continuity of the function
$x\mapsto \bra t,x\ket$. We
are now ready for our third application of the bound (\ref{3.16}):
we prove a special case of
Varadhan's Theorem (see Varadhan (1966)). If
$\overline{\mu}(x)=\underline{\mu}(x)$ for all
$x\in X$, we say the {\it Ruelle--Lanford function (RL--function) $\mu$
exists} for the pair
$(\M_{\circ},V_{\circ})$ and is given by
\be
\mu(x):=\underline{\mu}(x)=\overline{\mu}(x)\;.
\ee
When the RL--function exists, the bounds (\ref{3.15}) and
(\ref{3.16}) can be restated as
\bea
\overline{m}[K]&\leq&\sup_{x\in K}\mu(x)\;,\;\;K\;
{\rm compact}\;,\label{3.23}\\
\underline{m}[G]&\geq&\sup_{x\in G}\mu(x)\;,\;\;G\;{\rm open}\;.\label{3.24}
\eea
When (\ref{3.23}) and (\ref{3.24}) hold, we say
(following Varadhan (1966)) that a large deviation
principle (LDP) holds with rate--function $I=-\mu$ for
the pair $(\M_{\circ},V_{\circ})$.
This means that the sequence $m_{\circ}$ of set--functions $m_n$, defined
at (\ref{3.1}), converges to the set--function
\be\label{RL}
B\mapsto\sup_{x\in B}\mu(x)
\ee
in {\it exactly the same sense} that a sequence of probability
measures $\M_{\circ}$ converges
to a measure $\delta_{x}$ in a WLLN (remember that $X$ is
assumed to be compact). [ We have
given $\mu$ the name ``Ruelle--Lanford function'' because, in
the setting of a lattice gas with
translation--invariant summable potentials, our definition coincides
with the definition of
entropy given by Ruelle (1965) and Lanford (1973). Ruelle
and Lanford understood that
giving precise meaning to Boltzmann's formula
\be
S=k\ln W\;,
\ee
relating the entropy $S$ of a macroscopic equilibrium state
to the number $W$ of corresponding
microscopic states is the {\it same problem} as that of
making sense of the convergence of the
sequence $m_{\circ}$ to the set--function (\ref{RL}); by so
doing, they introduced a new
technique to the theory of large deviations (compare
Bahadur and Zabell (1979)).]
We are now ready to begin the calculation of the specific
information gain using (\ref{2.6}).
First we have a result which is proved using (\ref{3.23}) and (\ref{3.24}):
\begin{lem}\label{lem3.4}
Suppose the RL--function $\mu$ exists for the pair
$(\M_{\circ},V_{\circ})$ and the set
$C\in{\cal B}(X)$ is such that
\be\label{*}
-\infty<\sup_{x\in C}\mu(x)=\underline{m}[C]=\overline{m}[C]=
\sup_{x\in\overline{C}}\mu(x)\;;
\ee
then the sequence $\M_{\circ}[\;\cdot\;|C]$ of
probability measures is eventually concentrated on
the set
\be
X_{\overline{C}}:=\{x\in \overline{C}:\mu(x)=\sup_{y\in\overline{C}}\mu(y)\}\;.
\ee
\end{lem}
\begin{lem}\label{lem3.5}
Suppose that the RL--function $\mu$ exists for the pair
$(\M_{\circ},V_{\circ})$; then\newline
\hspace*{1cm}(a) the RL--function $\mu^t$ exists for the pair
$(\M_{\circ}^t,V_{\circ})$;\newline
\hspace*{1cm}(b) the pair $(\M_{\circ}^t,V_{\circ})$ obeys an LDP:
\bea
\overline{m}^t[K]&\leq&\sup_{x\in K}\mu^t(x)\;,\;\;K\;
{\rm compact}\;,\label{LDPU}\\
\underline{m}^t[G]&\geq&\sup_{x\in G}\mu^t(x)\;,\;\;G\;{\rm open}\;;\label{LDPL}
\eea
\hspace*{1cm}(c) $\mu^t$ is given by
\be
\mu^t(x)=\bra t,x\ket +\mu(x)\;.
\ee
\end{lem}
If $\overline{m}^t[X]=\underline{m}^t[X]$ for all $t\in E^*$, we say that the
{\it scaled generating function} $p$ exists for the pair
$(\M_{\circ},V_{\circ})$ and is
given by
\be\label{3.25}
p(t):=\overline{m}^t[X]=\underline{m}^t[X]\;.
\ee
(In the statistical mechanical setting, $p$ is called the
{\it grand canonical pressure.})
Recall that if $f:X\ra \overline{\R}$, then $f^*:E^*\ra\overline{\R}$ is
defined by
\be\label{3.26}
f^*(t):=\sup_{x\in X}\{\bra t,x\ket-f(x)\}\;.
\ee
\begin{cor}\label{cor3.6}
Suppose the RL--function $\mu$ exists for the pair $(\M_{\circ},V_{\circ})$;
then the
scaled generating function $p$ exists and is given by
\be\label{3.27}
p(t)=(-\mu)^*(t)\;.
\ee
\end{cor}
{\bf Proof:}
\newline
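Since $X$ is both compact and open (as a topological space),
(\ref{LDPL}) and (\ref{LDPU}) give
\be
\sup_{x\in X}\mu^t(x)\leq\underline{m}^t[X]\leq\overline{m}^t[X]
\leq\sup_{x\in X}\mu^t(x)\;;
\ee
hence equality holds throughout, so that $p(t)$ exists and, by
Lemma \ref{lem3.5}(c) and (\ref{3.26}),
$p(t)=\sup_{x\in X}\{\bra t,x\ket+\mu(x)\}=(-\mu)^*(t)$.
\hspace*{\fill} $\Box$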
\newline
We define the set $X^t$ for $t\in E^*$ by
\be\label{3.30}
X^t:=\{x\in X:p(t)=\bra t,x\ket +\mu(x)\}\;.
\ee
\begin{thm}\label{thm3.7}
Suppose the RL--function $\mu$ exists for the pair
$(\M_{\circ},V_{\circ})$ and condition
(\ref{*}) holds; if $X_{\overline{C}}\subset X^t$, then the
specific information gain is zero:
\be
\lim_{n\ra\infty}{1\over V_n}{\cal H}(\nu_n^C|\gamma_n^t)=0\;.
\ee
\end{thm}
{\bf Proof:}
\newline
By (\ref{2.6}), we have
\bea
{1\over V_n}{\cal H}(\nu_n^C|\gamma_n^t)&=&{1\over V_n}{\cal H}
(\M_n[\;\cdot\;|C]|\M_n^t[\;\cdot\;|X])\\
&=&-\int\bra t,x\ket\M_n[dx|C]+m_n^t[X]-m_n[C]\;.\nonumber
\eea
By Lemmas \ref{lem3.4}, \ref{lem3.2}, Corollary
\ref{cor3.6} and condition (\ref{*}), we have
\bea
0\leq\limsup_{n\ra\infty}{1\over V_n}{\cal H}(\nu_n^C|\gamma_n^t)&\leq&
-\inf_{y\in X_{\overline{C}}}\bra t,y\ket +p(t)-
\sup_{y\in X_{\overline{C}}}\mu(y)\\
&=&\sup_{y\in X_{\overline{C}}}\{ p(t)-\bra t,y\ket-\mu(y) \}\nonumber\\
&=& 0\nonumber
\eea
if $X_{\overline{C}}\subset X^t$.\hspace*{\fill} $\Box$
\section{An Application}
\setcounter{equation}{0}
To illustrate how Theorem \ref{thm3.7} may be applied,
we consider a case of sums of independent
identically distributed random variables. We set
$\Lambda_n:=\{1,\ldots ,n\}$, and in this example
$V_n:=|\Lambda_n|=n$, $\Omega_n:=\{0,1\}^{\Lambda_n}$,
${\cal F}_n:={\cal P}(\Omega_n)$. For
$\omega\in\Omega_n$, put $\xi_j(\omega):=\omega(j)$, $j\in\Lambda_n$, and set
$\rho_n[\xi_j=0]={1\over 2}=\rho_n[\xi_j=1]$.
Then $T_n:=V_n^{-1}\sum_{j\in\Lambda_n}\xi_j$,
$X:=[0,1]$, $E:=\R=E^*$. Define $s:X\ra [0,1]$ by
\be
s(x)=-x\ln x-(1-x)\ln(1-x)\;,\;x\in(0,1)\;, \;s(0)=s(1)=0\;.
\ee
Choose $C=(c_1,c_2)\subset [0,1]$; the RL--function $\mu$ exists for the pair
$(\M_{\circ},V_{\circ})$ and is given by
\be
\mu(x)=s(x)-\ln 2\;;
\ee
the set $X_{\overline{C}}=\{x^*\}$ where
\be
x^*=\cases{c_1,& ${1\over 2}\leq c_1$,\cr
{1\over 2},& $c_1<{1\over 2}< c_2$,\cr
c_2, & $c_2\leq {1\over 2}$;\cr}
\ee
$p$ is given by
\be
p(t)=\ln(1+{\rm e}^t)-\ln 2\;;
\ee
and the set $X^t=\{x_t\}$ where
\be
x_t=p'(t)={{\rm e}^t\over 1+{\rm e}^t}\;.
\ee
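[ These formulae can be checked by hand; what follows is a sketch rather
than a full proof. Here $\M_n$ is the binomial distribution,
$\M_n[\{k/n\}]={n\choose k}2^{-n}$, and Stirling's formula gives, for
$k/n\ra x\in[0,1]$,
\[
{1\over n}\ln{n\choose k}\ra s(x)\;,
\]
which is the source of the formula $\mu(x)=s(x)-\ln 2$. For $p$, the
supremum of $tx+s(x)-\ln 2$ over $x\in[0,1]$ is attained where
$t=\ln{x\over 1-x}$, that is, at $x=x_t$. Since
$\ln x_t=t-\ln(1+{\rm e}^t)$ and $\ln(1-x_t)=-\ln(1+{\rm e}^t)$, we get
$s(x_t)=-tx_t+\ln(1+{\rm e}^t)$ and hence
\[
(-\mu)^*(t)=tx_t+s(x_t)-\ln 2=\ln(1+{\rm e}^t)-\ln 2\;,
\]
in agreement with the direct computation
${1\over n}\ln\int_X{\rm e}^{ntx}\M_n[dx]=\ln{1+{\rm e}^t\over 2}$. ]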
Given $C$, we can find $t^*$ such that $X_{\overline{C}}=X^{t^*}$; thus we have
\be\label{4.1}
\lim_{n\ra\infty}{1\over V_n}{\cal H}(\nu_n^C|\gamma_n^{t^*})=0\;.
\ee
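[ Explicitly, solving $x_t=x^*$ gives
\[
t^*=\ln{x^*\over 1-x^*}\;,
\]
which is well defined because $0\leq c_1<c_2\leq 1$ forces $0<x^*<1$.
For instance, with $C=(0.6,0.8)$ (values chosen purely for illustration) we
have $x^*=0.6$ and $t^*=\ln{3\over 2}\approx 0.405$: conditioning the fair
coin on an empirical mean in $(0.6,0.8)$ is matched, at the level of
specific information gain, by tilting to a coin with bias $0.6$. If
$c_1<{1\over 2}<c_2$, then $x^*={1\over 2}$ and $t^*=0$, so that no tilting
is needed. ]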
We can use (\ref{4.1}) to obtain a result on the limit of the sequence
$\{\nu_{n,\Delta}^{C}\}_{n\geq 1}$, where
$\nu_{n,\Delta}^{C}$ is the restriction to a finite
subset $\Delta$ of $\N$. Notice that $\gamma_n^t$ is
a product measure; this has two
important consequences:
\begin{enumerate}
\item the restriction of $\gamma^t_n$ to $\Delta\subset \{1,\ldots ,n\}$
is independent
of $n$ and we denote it by $\gamma^t_{\Delta}$;
\item if $\Delta_1$ and $\Delta_2$ are disjoint copies of $\Delta$ such that
$\Delta_1\cup\Delta_2\subset\{1,\ldots ,n\}$, then
\be
{\cal H}(\nu^C_{n,\Delta_1\cup\Delta_2}|\gamma^t_{\Delta_1\cup\Delta_2})\geq
{\cal H}(\nu^C_{n,\Delta_1}|\gamma^t_{\Delta_1})+
{\cal H}(\nu^C_{n,\Delta_2}|\gamma^t_{\Delta_2})\;.
\ee
\end{enumerate}
But
\be\label{4.3}
{\cal H}(\nu^C_{n,\Delta_1}|\gamma^t_{\Delta_1})={\cal H}
(\nu^C_{n,\Delta_2}|\gamma^t_{\Delta_2})\;,
\ee
so that
\be
{\cal H}(\nu^C_{n}|\gamma^t_{n})\geq \left [{V_n\over |\Delta|}\right ]
{\cal H}(\nu^C_{n,\Delta}|\gamma^t_{\Delta})\;;
\ee
hence ({\ref{4.1}}) implies that
\be
\lim_{n\ra\infty}{\cal H}(\nu^C_{n,\Delta}|\gamma^{t^*}_{\Delta})=0\;.
\ee
It now follows from (\ref{1.1}) that $\{\nu^{C}_{n,\Delta}\}_{n\geq 1}$
converges in total
variation norm to the product measure $\gamma_{\Delta}^{t^*}$.
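[ The last two steps can be spelled out as follows; this is a sketch of the
arithmetic only. Since $\left[{V_n\over |\Delta|}\right]>
{V_n\over |\Delta|}-1$, for $V_n>|\Delta|$ the inequality above gives
\[
0\leq{\cal H}(\nu^C_{n,\Delta}|\gamma^{t^*}_{\Delta})\leq
{|\Delta|\,V_n\over V_n-|\Delta|}\cdot
{1\over V_n}{\cal H}(\nu^C_{n}|\gamma^{t^*}_{n})\;,
\]
and the right--hand side tends to zero by (\ref{4.1}) because the first
factor tends to $|\Delta|$. Then (\ref{1.1}) yields
\[
\|\nu^C_{n,\Delta}-\gamma^{t^*}_{\Delta}\|_{TV}\leq
\sqrt{2{\cal H}(\nu^C_{n,\Delta}|\gamma^{t^*}_{\Delta})}\ra 0\;.\;]
\]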
\section{The Lattice Gas}
\setcounter{equation}{0}
We consider the lattice gas model: let $\Z^d$ $(d\geq 1)$
be an integer--lattice, let
$\{\Lambda_n\}_{n\geq 1}$ be an increasing sequence of cubes in $\Z^d$ with
$V_n:=|\Lambda_n|\ra\infty$ as $n\ra\infty$; at each site
$j\in\Lambda_n$ we have a
configuration space $S_j$ which is a copy of some
fixed compact Hausdorff space $S$.
For each $n\geq 1$, the configuration space $\Omega_n$ is the space
$\Omega_n=\prod_{j\in \Lambda_n}S_j$
which we regard as a subspace of the product space
$\Omega=\prod_{j\in \Z^d}S_j$
equipped with the product topology, hence which
is compact; the $\sigma$--field ${\cal F}_n$
is the $\sigma$--field of Borel subsets of
$\Omega$ generated by the coordinate projections
$\Omega\ra S_j$. For each $j\in\Z^d$ we have
the action of $\Z^d$ on itself given by
$i\mapsto i+j$, $i\in\Z^d$; this lifts to
$\theta_j:\Omega\ra\Omega$ given by
$(\theta_j\omega)(i)=\omega(i-j)$ for each
configuration $\omega\in\Omega$.
On each $S_j$ we define a reference measure
$\rho^j$, a copy of a fixed positive measure on
$S$ with $\rho^j(S_j)=1$; on $\Omega$ we define the product measure
$\rho=\prod_{j\in\Z^d}\rho^j$
and we take $\rho_n$ to be the restriction of $\rho$
to ${\cal F}_n$. The interaction in the
model is given by a $k$--dimensional vector
of translation--invariant absolutely summable
potentials with either free or fixed boundary conditions.
Using these potentials, we define
mappings $T_n:\Omega_n\ra X$ which give
the energy per site of a configuration; here $X$ is
a compact convex subset of $E=\R^k$. We now
define the conditioned measures $\nu^C_{n}$ and
the tilted measures $\gamma^t_n$ as in
${\cal \char 120}$ 2; in this setting, the measure
$\nu^C_{n}$ is the microcanonical measure on
the cube $\Lambda_n$ conditioned on $T_n$ taking
values in $C$ (if $C$ is an open neighbourhood of
a point in $X$, then $T^{-1}_nC$ is what
is sometimes called a ``thin energy--shell'' in
$\Omega_n$) and $\gamma_n^t$ is a Gibbs
measure on $\Lambda_n$ with generalized chemical
potential $t\in E^*=\R^k$. Using
standard methods, we prove that $\overline{\mu}$ and
$\underline{\mu}$ are independent of
boundary conditions. Let $B_{\varepsilon}(x)$ be
an open ball of radius $\varepsilon$ and
centre $x$ in $X$; we prove, in the case of
free boundary conditions, the following result.
\begin{lem}\label{lem5.1}
Let $x_0$, $x_1$, $x_2\in X$ satisfy $x_0+x_1=2x_2$
and let $0<\varepsilon '<\varepsilon$;
then
\be
2\underline{m}[B_{\varepsilon}(x_2)]\geq \overline{m}[B_{\varepsilon'}(x_0)]+
\overline{m}[B_{\varepsilon '}(x_1)]\;.
\ee
\end{lem}
From this and the independence of $\overline{\mu}$ and
$\underline{\mu}$ of the boundary
conditions, we deduce the
\begin{cor}\label{cor5.2}
The RL--function $\mu$ exists for the pair
$(\M_{\circ},V_{\circ})$ and is concave on $X$.
\end{cor}
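[ The deduction runs roughly as follows; this is a sketch which uses only
the facts that the balls $B_{\varepsilon}(x)$ form a neighbourhood base at
$x$ and that $\underline{m}$ is increasing. By (\ref{3.12}) we have
$\overline{m}[B_{\varepsilon '}(x_j)]\geq\overline{\mu}(x_j)$ for $j=0,1$,
so that Lemma \ref{lem5.1} gives
$2\underline{m}[B_{\varepsilon}(x_2)]\geq
\overline{\mu}(x_0)+\overline{\mu}(x_1)$ for every $\varepsilon>0$; taking
the infimum over $\varepsilon$ and using (\ref{3.11}), we get
\[
2\underline{\mu}(x_2)\geq\overline{\mu}(x_0)+\overline{\mu}(x_1)\;.
\]
The choice $x_0=x_1=x_2=x$ gives $\underline{\mu}(x)\geq\overline{\mu}(x)$;
the reverse inequality always holds, so the RL--function exists. The
displayed inequality then reads
$\mu({1\over 2}(x_0+x_1))\geq{1\over 2}(\mu(x_0)+\mu(x_1))$, and
mid--point concavity together with upper semicontinuity gives concavity. ]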
We have reserved the name ``entropy'' for the
RL--functions which are concave; henceforth
in this section, we refer to $\mu$ as the
entropy of the pair $(\M_{\circ},V_{\circ})$ and
to $p$, given by $p(t)=(-\mu)^*(t)$, as the
grand canonical pressure. We now choose $C$
to be an open convex subset of $X$; using convexity theory, we
prove
\begin{lem}\label{lem5.2}
Let $C$ be an open convex subset of $X$; if $\mu$ is concave, then\newline
\hspace*{1cm} (a)
$\sup_{x\in C}\mu(x)=\underline{m}[C]=\overline{m}[\overline{C}]=
\sup_{x\in \overline{C}}\mu(x)$;\newline
\hspace*{1cm} (b) the entropy
$\mu_{C}$ of the pair $(\M_{\circ}[\;\cdot\;|C],V_{\circ})$
is given by
\be
\mu_{C}(x)=\cases{\mu(x)-\sup_{y\in \overline{C}}\mu(y)\;,&
$x\in \overline{C}$,\cr
-\infty\;,& $x\in X\backslash \overline{C}$.\cr}
\ee
\end{lem}
We see from (a) that, provided $C$ is chosen so that
it contains a point at which $\mu$ is finite,
condition (\ref{*}) is satisfied. Part (b)
gives an interpretation of $X_{\overline{C}}$ in this
case: $X_{\overline{C}}=N_{\mu_{C}}$, the set
on which the entropy attains its supremum. There
is also an interpretation of the set $X^t$ which
follows from the concavity of $\mu$:
using convexity theory we can show that
\be
X^t=\partial p(t)\;,
\ee
($\partial f(t)$ denotes the set of subgradients of a convex function $f$
at $t$;
when $\dim X=1$, the interval $\partial p(t)$
is ``a phase--transition segment'' in the grand
canonical ensemble; it reduces to a point in
the absence of a first order transition.)
We see that Theorem \ref{thm3.7} now yields
\begin{thm}\label{thm5.1}
Let $\mu$ be the entropy of a lattice gas with
translation invariant summable potential. Let
$C$ be an open convex neighbourhood of a point
at which $\mu$ is finite. Then there exists
$t^*$ such that
\be\label{5.7}
\lim_{n\ra\infty}{1\over V_n}{\cal H}(\nu_n^C|\gamma_n^{t^*})=0\;.
\ee
\end{thm}
Because, in the presence of a non--trivial interaction, the
Gibbs measures $\gamma_n^t$ are not
product measures, the subadditivity argument used in
${\cal \char 120}$ 4 fails. There
is a second difficulty: in ${\cal \char 120}$ 4 we
exploited permutation--invariance (exchangeability) at
(\ref{4.3}); here we must replace it by
translation--invariance, but the measures $\nu_n^C$ associated
with the cubes $\Lambda_n$ are
not translation--invariant. The way out is to introduce
translation--averages: define
\be
\overline{\nu}_n^C:=
{1\over V_n}\sum_{j\in\Lambda_n}\nu_n^C\circ \theta_j^{-1}\;,
\ee
where $\nu_n^C$ is extended to $\Omega$
in the usual way. We are able to prove
\begin{thm}\label{thm5.2}
Suppose that (\ref{5.7}) holds; then any weak limit point of the sequence
$\{\overline{\nu}_n^C\}_{n\geq 1}$ is a Gibbs
state with respect to the specification associated
with $\{\gamma_n^{t^*}\}_{n\geq 1}$.
\end{thm}
The statement of this theorem makes precise the
sense in which the measures $\overline{\nu}_n^C$
and $\gamma_n^{t^*}$ are ``equivalent'' in the
thermodynamic limit -- something we said in
${\cal \char 120}$ 1 was part of the problem.
Putting Theorems \ref{thm5.1} and \ref{thm5.2}
together, we see that the entropy $\mu$ can be
used to find a value $t^*$ of the chemical potential
such that any weak limit of the sequence
$\{\overline{\nu}_n^C\}_{n\geq 1}$ is a Gibbs
state with respect to the specification
determined by $\{\gamma_n^{t^*}\}_{n\geq 1}$.
This is possible because, as a consequence of the
concavity of $\mu$, we have
$\mu(x)=-p^*(x)$
as well as
$p(t)=(-\mu)^*(t)$;
but these statements together constitute the
equivalence of ensembles at the level of
thermodynamic functions. It is in this sense that
{\it equivalence of ensembles holds at
the level of measures whenever it holds at the
level of thermodynamic functions.}
\\
\noindent{\bf REFERENCES}
\small \newcounter{ref}
\vspace*{.2cm}
\begin{list}{\arabic{ref}.}{\usecounter{ref}\leftmargin=.5cm
\labelwidth=.2cm \labelsep=.3cm
\rightmargin=.5cm}
\item R.R. Bahadur and S.L. Zabell,
Large deviations of the sample mean in general vector
spaces, {\em Ann. Prob.} 7: 587 (1979).
\item J.--D. Deuschel, D.W. Stroock and H. Zessin,
Microcanonical distribution for lattice gases,
{\em Commun. Math. Phys.} 139: 83 (1991).
\item R.L. Dobrushin and B. Tirozzi,
The central limit theorem and the problem of
equivalence of ensembles, {\em Commun. Math. Phys.} 54: 173 (1977).
\item H.--O. Georgii,
Large deviations and maximum entropy principle for interacting random
fields on $\Z^d$, {\em Ann. Prob.} to appear (1993).
\item K. Huang, Statistical Mechanics, Wiley, New York (1963).
\item A.Ya. Khinchin, Matematicheskie Osnovaniya Statisticheskoi Mekhaniki,
Gostekhizdat,
Moscow--Leningrad (1943). Translation: Mathematical Foundations
of Statistical Mechanics,
Dover, New York (1949).
\item O.E. Lanford,
Entropy and equilibrium states in classical mechanics, in Statistical
Mechanics and Mathematical Problems, A. Lenard, ed.,
Lecture Notes in Physics 20, Springer (1973).
\item J.T. Lewis, C.--E. Pfister and W.G. Sullivan, DIAS preprint (1993).
\item C.J. Preston,
Random Fields, Lecture Notes in Mathematics 534, Springer, Berlin (1976).
\item D. Ruelle, Correlation functionals, {\em J. Math. Physics} 6: 201 (1965).
\item W.G. Sullivan,
Potentials for almost Markovian random fields, {\em Commun. Math. Phys.}
33: 61 (1973).
\item S.R.S. Varadhan,
Asymptotic probabilities and differential equations, {\em Comm.
Pure Appl. Math.} 19: 261 (1966).
\end{list}
\end{document}