\documentstyle {amsppt}
\magnification \magstep1
\openup3\jot
\NoBlackBoxes
\pageno=1
\hsize 6 truein
%\hoffset -.5 truein
%\voffset -.5 truein
%here are the needed definitions
\font\slv=cmsl10 scaled \magstep1
\def\missingstuff{\vskip1cm\centerline{\bf MISSING STUFF}\vskip1cm}
\def\wt{\widetilde}
\def\wh{\widehat}
\def\ov{\overline}
\def\R{\Cal S}
\def\E{\Bbb E}
\def\F{\Cal F}
\def\M{\Cal M}
\def\ve{\varepsilon}
\def\C{\Cal C}
\def\Z{\Bbb Z}
\def\To{\Bbb T}
\font\sob=cmss10
\def\Id{{\hbox{\sob 1\kern-.8mm l}}} %this is the identity for matrices
%here I redefine "\enddemo"
\predefine\enddimost{\enddemo}
\redefine\enddemo {\penalty 5000\hskip15pt plus1pt minus5pt\penalty1000
\qed\enddimost}
\font\bit=cmmi10 scaled\magstep 1
\def\e{\hbox{\bit e}}
\def\tb{|\hskip-.08em |\hskip-.08em |}
\def\today{\ifcase\month\or
January\or February\or March\or April\or May\or June\or July\or August\or
September\or October\or November\or December\fi
\space\number\day, \number\year}
%here starts the document
\topmatter
\title
CENTRAL LIMIT THEOREM FOR DETERMINISTIC SYSTEMS
\endtitle
\author
Carlangelo Liverani
\endauthor
\affil University of Rome {\sl Tor Vergata}
\endaffil
\address
Liverani Carlangelo,
Mathematics Department,
University of Rome II, Tor Vergata,
00133 Rome, Italy.
\endaddress
\email
liverani\@mat.utovrm.it
\endemail
\date
July 27, 1995
\enddate
\abstract
A unified approach to obtaining the central limit theorem for hyperbolic
dynamical systems is presented. It builds on previous results for one dimensional maps
but it applies to the multidimensional case as well.
\endabstract
\thanks
\bf This paper originated out of discussions with D. Szasz and A. Kramli, and was
made possible by D. Szasz's key suggestion to use K-partitions. I wish to thank E.
Olivieri, E. Presutti, B. Tot, and L. Triolo for helpful discussions. In addition, I
am indebted to S. Olla for explaining to me the subtleties of the Kipnis-Varadhan
approach. This work has been partially supported by grant CIPA-CT92-4016 of
the Commission of the European Community.
\endthanks
\endtopmatter
\vskip -.5cm
\centerline{\bf CONTENTS}
%\vskip -.5cm
\newdimen\riga
\newdimen\rigat
\riga=\baselineskip
\rigat=\lineskip
\baselineskip=.5\baselineskip
\lineskip=.5\lineskip
\roster
\item"0." Introduction\dotfill p. \ 2
\item"1." A general probabilistic result\dotfill p.\ 3
\item"2." Non invertible maps\dotfill p. 11
\item"3." Invertible maps\dotfill p. 12
\item" " References\dotfill p. 16
\endroster
\baselineskip=\riga
\lineskip=\rigat
\vfil\par\newpage
\document
\vskip1cm
\subhead \S 0 Introduction
\endsubhead
\vskip1cm
A discrete time dynamical system consists of a measurable space $X$ together with a
$\sigma$-algebra $\Cal F$, a measurable map $T:X\to X$
which describes the dynamics, and a probability measure $P$ invariant with
respect to $T$. This setting is particularly well suited to study
problems involving statistical properties of the motion of deterministic
systems.
Typically the properties of interest are ergodicity, mixing, bounds on the
decay of correlations, Central Limit Theorems (CLT) and so on. Several approaches
have been developed to tackle such problems at various levels. Given a
system, one first explores the weaker statistical
properties and then tries to investigate the stronger ones using the already
obtained results plus some extra properties.
The position of this paper in the above mentioned hierarchy is between obtaining
bounds on the decay of correlations and CLT. In other words we discuss a general
approach that gives checkable conditions under which, in a mixing system, an
observable enjoys the CLT. Such general approaches already exist but they are
either limited to one dimensional systems \cite{Ke} or rely on the existence of
special partitions of the phase space \cite{Ch}, partitions whose concrete
construction may be far from trivial \cite{BSC1}, \cite{BSC2}; for a very
nice review of the state of affairs up to 1989 (but still current) see \cite{De}.
Here, I want to put forward the following point of view: the above described
dynamical systems are most naturally viewed as giving rise to a (deterministic)
Markov process. It is therefore tempting to think that there should exist some
general probabilistic theorem that states abstract conditions for the validity of
the CLT, and that all the concrete cases can simply be obtained by the direct
application of such a theorem to the system under consideration (without having to
code the system in some symbolic type dynamics). General theorems of this type are
well known in probability theory but they are normally not well suited for
applications to the case at hand. Two such general theorems, tailored for dynamical
systems, can be found in this paper.
Attempts in this direction have existed for some time \cite{Go}, \cite{IL},
but they are satisfactory only for the one dimensional case (the equivalent of
Theorem 1.1 in this paper). Particular mention must be given to \cite{DG}: the
results obtained there are essentially comparable to the ones presented here in
section 1 and could be applied to the multidimensional case. Unfortunately, not
much attention is given there to applications, so that the possibility to bypass a
symbolic representation of the system is completely overlooked.
The approach used here is a martingale approximation inspired by \cite{KV}.
Since this is a typical probabilistic technique, I think it underlines very
well the purely probabilistic nature of the result, thereby clarifying which
characteristics of a deterministic system yield such a drastic
statistical behavior.
As we will see, a major difference with the analogous results in probability is
that the CLT holds for a much smaller class of observables than the square summable
ones. This is not an artifact of the proof: it is an inevitable consequence of the
deterministic nature of the systems under consideration, so that only observables
that operate some ``coarse graining'' (and therefore enjoy some degree of smoothness)
can yield strong statistical behavior. Here no particular attempt is made to find
the most general class of observables to which the Theorems apply;
nonetheless, the technique
put forward lends itself to an extension in such a direction.
The paper includes some
concrete examples as well. Their aim is to show how the general theorems can be
applied in special cases. The cases discussed belong to quite general classes
(expanding one dimensional maps, area preserving piecewise smooth uniformly
hyperbolic maps in two and more dimensions), yet no genuinely new result is contained in
such examples. This reflects the spirit of the paper of presenting an approach to
the problem rather than new implementations. Nevertheless, the application of the
present results in technically complex situations (e.g., hyperbolic billiards)
greatly simplifies the proof of the validity of the CLT. In addition, it is
conceivable that some new results can be obtained by this approach since the two
above mentioned theorems hold in more general cases than the ones already present in
the literature (a brief comparison with previously known results is inserted after
the proof of each theorem).
The plan of the paper is as follows.
Section 1 contains two probabilistic theorems that are well suited for the
study of dynamical systems. In fact, they may seem a bit unnatural from the
pure probabilistic point of view. On the one hand, both theorems deal only
with functions in $L^\infty$ rather than $L^2$. The reason is that
normally the decay of correlations in dynamical systems can be obtained
only for classes of functions with some amount of smoothness, which makes
them automatically bounded. The issue is not purely a matter of taste: a
look at the proofs will show that such an hypothesis has really been used
and that many key estimates would not hold in $L^2$.
On the other hand, Theorem 1.2 introduces $\sigma$-algebras $\F_i$
that behave nicely with respect to the dynamics. This may make little
sense from the purely probabilistic point of view but it is instead a
cornerstone in the treatment of hyperbolic dynamical systems.
In the above sense the results of section 1, although purely probabilistic in
nature, are expressly developed for applications to dynamical systems.
Section 2 describes how the technique applies to non-invertible maps. The
case of piecewise smooth expanding maps of the interval is discussed in
detail.
Section 3 deals with the most interesting applications: the
multidimensional case. As an example I treat a large subclass of piecewise
smooth symplectic maps. Such maps are well studied in the literature,
since some relevant physical models (e.g. billiards) are naturally
described in their terms. It is shown that very general considerations imply the
applicability of the results developed in section one.
\vskip1cm
\subhead \S 1 A general probabilistic result
\endsubhead
\vskip1cm
Let $X$ be a complete separable metric space, $\F$ a
$\sigma$-algebra, $P$ a probability
measure ($P(X)=1$) and $T:X\to X$ a measurable map.\footnote{Actually, we
assume that, for each $A\in\F$, not only $T^{-1}A\in\F$ but also $TA\in\F$.}
We will call $\E$ the expectation with respect to $P$.
In addition, we require that $P$ is invariant with respect to $T$ (i.e.,
$P(T^{-1}A)=P(A)$ for all $A\in\F$), and that the dynamical
system $(T,\,X,\,P)$ be ergodic.
For each $\phi\in L^2(X)$ define $\wh T:L^2(X)\to L^2(X)$ by
$$
\wh T\phi=\phi\circ T ,
$$
and let $\wh T^* :L^2(X)\to L^2(X)$ be the dual of $\wh T$.
If $\E(f)=0$, then by ergodicity $\lim\limits_{n\to\infty}\frac 1n
\sum_{i=0}^{n-1}\wh T^i f=\E(f)=0$ almost surely.
The CLT gives us information on the speed of
convergence; namely, the conditions under which there exists $\sigma\in\Bbb R^+$ such that,
for each interval $I\subset\Bbb R$,
$$
\lim_{n\to\infty}P\left(\left\{\frac1{\sqrt n}\sum_{i=0}^{n-1}\wh T^i
f\in I\right\}\right)=
\frac 1{\sqrt {2\pi}\sigma}\int_I e^{-\frac {x^2}{2\sigma^2}}dx ;
$$
this is called ``convergence in law (or distribution)'' to a Gaussian random variable
of zero mean and variance $\sigma^2$.
Consider a sub-$\sigma$-algebra $\F_0$ of $\F$ and define
$\F_i=T^{-i}\F_0$, $i\in\Bbb Z$, then the following holds.
\proclaim{Theorem 1.1} If $\F_i$ is coarser than $\F_{i-1}$ and, for each
$\phi\in L^\infty(X)$, we have
$$
\E(\wh T\wh T^*\phi|\F_1)=\E(\phi|\F_1),
$$
then, for each $f\in L^\infty(X)$ with $\E(f)=0$ and $\E(f|\F_0)=f$ such that
\roster
\item $\sum_{n=0}^\infty|\E(f\wh T^n f)|<\infty$,
\item the series $\sum_{n=0}^\infty\E(\wh T^{*n}f|\F_0)$ converges absolutely
almost surely,\footnote{As we will see in the proof, this implies that there exists
an almost everywhere finite $\F_0$-measurable function
$g$, such that $f=g-\E(\wh T^*g|\F_0)$.}
\endroster
the sequence
$$
\frac 1{\sqrt n}\sum_{i=0}^{n-1}\wh T^i f
$$
converges in law to a Gaussian random variable of zero mean and finite
variance $\sigma^2\leq -\E(f^2)+2\sum_{n=0}^\infty\E(f\wh T^n f)$.
In addition, $\sigma=0$ if and only if there exists an $\F_0$--measurable
function $g$ such that
$$
\wh Tf=\wh Tg -g .
$$
Finally, if the series in (2) converges in $L^1(X)$, then
$\sigma^2= -\E(f^2)+2\sum_{n=0}^\infty\E(f\wh T^n f)$.
\endproclaim
\demo{Proof}
The key idea is to use a Martingale
approximation. That is, to find
$Y_i\in L^2(X)$ and $g$ $\F_0$--measurable, and almost everywhere finite,
such that
$$
\E(Y_{i-1}|\F_{i})=Y_{i-1}\; ;\;\;\;\E(Y_{i}|\F_i)=0 ,
\tag 1.1
$$
(i.e., $Y_i$ is a reverse Martingale difference with respect to the filtration
$\{\F_i\}_{i=0}^\infty$), and
$$
\wh T^if=Y_{i}+\wh T^ig-\wh T^{i-1}g \quad \forall i>0.
\tag 1.2
$$
Accordingly,
$$
\frac 1{\sqrt n}\sum_{i=0}^{n-1}\wh T^i f=
\frac 1{\sqrt n}\sum_{i=0}^{n-1} Y_i+
\frac 1{\sqrt n}[\wh T^ng -g].
\tag 1.3
$$
Equation (1.3) shows that we can obtain the central limit theorem for our random
variable provided we have the central limit theorem for the martingale difference
$Y_i$. In fact, $\frac 1{\sqrt n}[\wh T^ng -g]$ converges to zero in probability when
$n\to\infty$.
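For the reader's convenience, here is a sketch of the two elementary facts behind this claim (the symbol $\delta$ is introduced only for this remark). Summing (1.2) over $i=1,\dots,n$, the terms involving $g$ telescope:
$$
\sum_{i=1}^{n}\left[\wh T^ig-\wh T^{i-1}g\right]=\wh T^ng-g ,
$$
while the sum of $\wh T^if$ over $i=1,\dots,n$ differs from the one over $i=0,\dots,n-1$ by $\wh T^nf-f$, which is bounded by $2\|f\|_\infty$ and hence negligible after division by $\sqrt n$. Moreover, by the invariance of $P$, for each $\delta>0$,
$$
P\left(\left\{|\wh T^n g|>\delta\sqrt n\right\}\right)=
P\left(\left\{|g|>\delta\sqrt n\right\}\right)\longrightarrow 0
\quad\text{as } n\to\infty ,
$$
since $g$ is almost everywhere finite; the term $\frac 1{\sqrt n}g$ converges to zero almost surely for the same reason.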
Note that $(1.1)$ and $(1.2)$ are equivalent to
$$
\E(\wh T^if|\F_{i})=\E(\wh T^ig|\F_{i})-\E(\wh T^{i-1}g|\F_{i})
\quad
\forall i>0.
$$
Since it follows from the definition of $\F_i$ that, for each $\phi\in
L^1(X)$,
$$
\E(\wh T^i\phi|\F_{i})=\wh T^i\E(\phi|\F_0) \quad\forall i>0 ,
$$
and because the invariance of $\E$ with respect to $T$ implies
$\wh T^*\wh T=\Id$,
we have
$$
\aligned
f&=\E(g|\F_0)-\wh T^*\E(g|\F_1)=g-\wh T^*\E(\wh T\wh
T^*g|\F_1)\\
&=g-\E(\wh T^*g|\F_0).
\endaligned
\tag 1.4
$$
It is immediate to see that $g=\sum\limits_{n=0}^\infty\E(\wh T^{*n}f|\F_0)$
(the convergence of the series is the hypothesis (2) in the statement of
the theorem) is a solution
of the above equation, and therefore of (1.2), (clearly, $Y_i=\wh T^{i-1}
Y_1$).\footnote{It is remarkable that, once we have $g$, the $Y_i$ are defined by
(1.2) itself, and will automatically satisfy (1.1).}
In fact, setting $T_0\phi=\Bbb E(\wh T^*\phi|\Cal F_0)$, the
solution of $(1.4)$ is given by the Neumann series $\sum_{n=0}^\infty
T_0^nf$. But $T^n_0f=\Bbb E(\wh T^{*n}f|\Cal F_0)$ since
$$
\align
\E(\wh T^{*}\E(\wh T^{*n}f|\F_0)|\F_0)=& \wh T^*\wh T\E(\wh T^*\E(\wh
T^{*n}f|\F_0)|\F_0)
=\wh T^*\E(\wh T\wh T^*\E(\wh T^{*n}f|\F_0)|\F_1)\\
=&\wh T^*\E(\E(\wh T^{*n}f|\F_0)|\F_1)=\wh T^*\E(\wh T^{*n}f|\F_1)\\
=&\wh T^*\E(\wh T\wh T^{*(n+1)}f|\F_1)=\E(\wh T^{*(n+1)}f|\F_0) .
\endalign
$$
To ensure that the central limit theorem for $Y_i$ holds, we need
only to show that $Y_i$ is square integrable, due to the following \cite{Ne}:
\proclaim{Theorem} Let $(Y_n)_{n\geq 1}$ be a stationary, ergodic, martingale
difference (or reversed martingale difference) with respect to the filtration
$\{\F_n\}_{n\geq 1}$. If $Y_1\in L^2(X)$, then $\sigma^2=\E(Y_1^2)$ and the CLT
holds.
\endproclaim
The above theorem applies to our case since the stationarity of $(Y_n)$ is implied
by the invariance of the measure with respect to $T$, while the ergodicity follows
from the ergodicity of the dynamical system $(X,\,T,\,P)$.
If the series $\sum_{n=0}^\infty\E(\wh T^{*n}f|\F_0)$ converged in
$L^2(X)$, then $Y_1\in L^2(X)$ would hold and the Theorem would be proven. It
is however a remarkable fact that $Y_i$ can be in $L^2(X)$ without $g$
being even integrable \cite{KV}.
Unfortunately, the road to this result is a bit indirect and
consists in carrying out an argument similar to the one above but producing a sequence
of martingale differences $Y_i(\lambda)$ that approximate $Y_i$.
Let us look for $Y_i(\lambda)$, $\lambda>1$, such that
$$
\E(Y_{i-1}(\lambda)|\F_{i})=Y_{i-1}(\lambda)\; ;\;\;\;\E(Y_{i}(\lambda)|\F_i)=0 ;
\tag 1.5
$$
and
$$
\wh T^i f=Y_{i}(\lambda)+\wh T^ig(\lambda)-\lambda^{-1}\wh T^{i-1}g(\lambda)
\quad
\forall i>0,\;\lambda>1.
\tag 1.6
$$
In analogy with what we have seen before
$g(\lambda)=\sum\limits_{n=0}^\infty\lambda^{-n}\E(\wh T^{*n}f|\F_0)$, only now
$g(\lambda)\in L^2(X)$ for each $\lambda>1$. Since $\lim\limits_{\lambda\to
1}g(\lambda)= g(1)=g$ almost surely, it follows that $\lim\limits_{\lambda\to
1}Y_i(\lambda)=Y_i$ almost surely. In addition,
$$
\align
\E(Y_i(\lambda)^2)=&\E(Y_1(\lambda)^2)=\E([\wh Tf-\wh T
g(\lambda)+\lambda^{-1}g(\lambda)]^2)\\
=&\E(\wh Tf[\wh Tf-\wh T g(\lambda)+\lambda^{-1}g(\lambda)])\\
&-\E([\wh Tg(\lambda)-
\lambda^{-1}g(\lambda)][\wh Tf-\wh T g(\lambda)+\lambda^{-1}g(\lambda)]),
\endalign
$$
since $\E(\wh Tf-\wh T g(\lambda)+\lambda^{-1}g(\lambda)|\F_1)=\E(Y_1(\lambda)|
\F_1)=0$. Hence,
$$
\align
\E(Y_i(\lambda)^2)=&-\E((\wh T f)^2)+\E([\wh T
g(\lambda)-\lambda^{-1}g(\lambda)]^2)\\
=&-\E(f^2)+\E(\wh Tg(\lambda)[\wh T g(\lambda)-\lambda^{-1}g(\lambda)])\\
&-\lambda^{-1}\E(g(\lambda)\wh T g(\lambda))+\lambda^{-2}\E(\wh Tg(\lambda)^2)\\
=&-\E(f^2)+2\E(\wh Tg(\lambda)[\wh T g(\lambda)-\lambda^{-1}g(\lambda)])
-(1-\lambda^{-2})\E(g(\lambda)^2)\\
=&-\E(f^2)+ 2\E(\wh Tg(\lambda)\wh T f)-(1-\lambda^{-2})\E(g(\lambda)^2)\\
=&-\E(f^2)+2\E(g(\lambda) f)-(1-\lambda^{-2})\E(g(\lambda)^2)\\
\leq&-\E(f^2)+ 2\sum_{n=0}^\infty\lambda^{-n}\E(f\wh T^n f)\leq
-\E(f^2)+2\sum_{n=0}^\infty|\E(f\wh T^n f)| .
\endalign
$$
The desired estimate follows from Fatou's lemma and the previous bound:
$$
\E(Y_1^2)=\E(\liminf_{\lambda\to 1} Y_1(\lambda)^2)\leq
\liminf_{\lambda\to 1} \E(Y_1(\lambda)^2)\leq-\E(f^2)+
2\sum_{n=0}^\infty\E(f\wh T^n f) .
$$
In conclusion, we have seen that the random variable under consideration converges in
law to a Gaussian of variance $\sigma^2=\E(Y_1^2)<\infty$. If $\sigma=0$ then
the second assertion of the statement follows since
$$
\E(Y_1^2)=\E([\wh Tf-\wh T g+g]^2) .
$$
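Spelled out (a sketch of the implication used here): by (1.2) with $i=1$ we have $Y_1=\wh Tf-\wh Tg+g$, so that
$$
\sigma^2=\E(Y_1^2)=0\;\Longrightarrow\; Y_1=0 \ \text{ almost surely}
\;\Longrightarrow\;\wh Tf=\wh Tg-g \ \text{ almost surely}.
$$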
If we assume that the series in (2) converges in $L^1(X)$, then it is
possible to obtain the much sharper result
$$
\lim_{\lambda\to 1}\E(Y_1(\lambda)^2)=-\E(f^2)+
2\sum_{n=0}^\infty\E(f\wh T^n f).
$$
In fact, for each $\varepsilon>0$
$$
\aligned
\bigg|\E(Y_1(\lambda)^2)+&\E(f^2)-2\sum_{n=0}^\infty
\E(f\wh T^n f)\bigg|
\leq 2\sum_{n=0}^\infty (1-\lambda^{-n})|\E(f\wh T^n f)|\\
&+(1-\lambda^{-2})\E(g(\lambda)^2)
\leq 2(1-\lambda^{-M})\sum_{n=0}^\infty|\E(f\wh T^{n}f)|+2\sum_{n=M}^\infty
|\E(f\wh T^n f)|\\
&+(1-\lambda^{-2})\E(g(\lambda)^2)
\leq \varepsilon+(1-\lambda^{-2})\E(g(\lambda)^2)
\endaligned
$$
where $M$ has been chosen sufficiently large and $\lambda$ sufficiently close
to one. In order to continue we need to estimate the last term in the above
expression. For further use we will deal with a more general estimate: for
each $\lambda,\,\mu\in(1,\,\infty)$ we have
$$
\aligned
\E(&g(\lambda)g(\mu))=\sum_{n,m=0}^\infty\lambda^{-n}\mu^{-m}
\E(\wh T^{*n}f\E(\wh T^{*m}f|\F_0))\\
&\leq\sum_{n=0}^\infty\lambda^{-n}\sum_{m=0}^{M-1}\|f\|_\infty\E(|\E(\wh
T^{*n}f|\F_0)|)
+\sum_{n=0}^\infty\lambda^{-n}\sum_{m=M}^\infty\|f\|_\infty\E(|\E(\wh
T^{*m}f|\F_0)|)\\
&\leq M\|f\|_\infty\sum_{n=0}^\infty\E(|\E(\wh
T^{*n}f|\F_0)|)+\frac{\|f\|_\infty}{1-\lambda^{-1}}\sum_{m=M}^\infty\E(|\E(\wh
T^{*m}f|\F_0)|).
\endaligned
\tag 1.7
$$
That is, choosing again $M$ large and $\lambda$ sufficiently close to 1,
$$
(1-\lambda^{-1})\E(g(\lambda)^2)\leq 2\varepsilon.
$$
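In more detail (a sketch, obtained by setting $\mu=\lambda$ in (1.7) and multiplying by $1-\lambda^{-1}$),
$$
(1-\lambda^{-1})\E(g(\lambda)^2)\leq
(1-\lambda^{-1})M\|f\|_\infty\sum_{n=0}^\infty\E(|\E(\wh T^{*n}f|\F_0)|)
+\|f\|_\infty\sum_{m=M}^\infty\E(|\E(\wh T^{*m}f|\F_0)|) :
$$
one first chooses $M$ so that the second term is smaller than $\varepsilon$ (this is where the convergence of $\sum_{n}\E(|\E(\wh T^{*n}f|\F_0)|)$ is used) and then $\lambda$ so close to 1 that the first term is smaller than $\varepsilon$ as well. Since
$$
1-\lambda^{-2}=(1-\lambda^{-1})(1+\lambda^{-1})\leq 2(1-\lambda^{-1}),
$$
the same conclusion holds, up to a factor 2, for the term $(1-\lambda^{-2})\E(g(\lambda)^2)$ left over in the previous estimate.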
This is not the end of the story: it is possible to prove that
$Y_1$ is the limit of $Y_1(\lambda)$ in $L^2(X)$.
To see this it suffices to estimate
$$
\aligned
\E([Y_1(\lambda)-Y_1(\mu)]^2)&=\E([\lambda^{-1}g(\lambda)-\mu^{-1}g(\mu)]
[Y_1(\lambda)-Y_1(\mu)])\\
&= \E([\lambda^{-1}g(\lambda)-\mu^{-1}g(\mu)]^2)-
\E([g(\lambda)-g(\mu)]^2)\\
&\leq 2(1-\mu^{-1}\lambda^{-1})\E(g(\lambda)g(\mu)).
\endaligned
$$
Since no generality is lost by choosing $\lambda\geq\mu>1$, the result
follows thanks to the estimate $(1.7)$.
\enddemo
Let us discuss briefly how the above result compares with the ones present in the
literature. In the work of Gordin \cite{Go}, used by Keller \cite{Ke}, a very
similar theorem is present. The main difference is that conditions (1) and (2) are
replaced by the much stronger condition
$$
\sum_{n=0}^\infty\E(\E(\wh T^{*n}f|\F_0)^2)<\infty .
$$
A similar comment applies to \cite{DG}, where moreover there is no discussion of the
case $\sigma=0$.
Theorem 1.1 is often
applicable in cases in which
$T$ is not invertible, where sometimes it is possible to choose $\F_0=\F$ (see \S 2).
When $T$ is invertible the choice $\F_0=\F$ is likely to
yield $\F_i=\F$ for each $i\in\Bbb Z$; this would undermine the possibility of
capturing any type of dynamical coarse graining effect, thereby nullifying the
hope of obtaining an interesting statistical behavior. In such a case, there
are situations in which a natural choice for
$\F_0$ exists (see \S 3), but it would be too restrictive to require
$f $ to be $\F_0$--measurable.
The above difficulties can be dealt with by the following Theorem.
\proclaim{Theorem 1.2} Suppose $T$ is one to one and onto. If $\F_i$ is coarser than
$\F_{i-1}$, then, for each $f\in L^\infty(X)$ with $\E(f)=0$ such that
\roster
\item $\sum_{n=0}^\infty|\E(f\wh T^n f)|<\infty$,
\item the series $\sum_{n=0}^\infty|\E(\wh T^{*n}f|\F_0)|$ converges in
$L^1$,
\item $\exists$ $\alpha>1$: $\sup\limits_{k\in\Bbb N}k^\alpha
\E(|\E(f|\F_{-k})-f|)<\infty$,\footnote{This condition is not optimal,
as can be seen by looking at the proof, yet I do not know of any
application in which a weaker condition could be of interest.}
\endroster
the sequence
$$
\frac 1{\sqrt n}\sum_{i=0}^{n-1}\wh T^i f
$$
converges in law to a Gaussian random variable of zero mean and finite
variance $\sigma^2=-\E(f^2)+2\sum_{n=0}^\infty\E(f\wh T^n f)$.
In addition, if $\sum_{n=0}^\infty n|\E(f\wh T^n f)|<\infty$, then
$\sigma=0$ if and only if there exists $g\in L^2(X)$ such that
$$
\wh Tf=\wh Tg -g .
$$
\endproclaim
\demo{Proof} The key idea is to first approximate $f$ by $\E(f|\F_{-k})$ and then use
the same type of Martingale approximation introduced in Theorem 1.1. That is, to find
$Y_i(k,\,\lambda)\in L^2(X)$ and $g(k,\,\lambda)\in L^2(X)$ such that, given $k>0$,
for each $i>0$ and $\lambda>1$
$$
\E(Y_{i-1}(k,\,\lambda)|\F_{i-k})=Y_{i-1}(k,\,\lambda)\; ;\;\;\;
\E(Y_{i}(k,\,\lambda)|\F_{i-k})=0 ,
\tag 1.8
$$
(i.e., $Y_i(k,\,\lambda)$ is a reverse Martingale difference with respect to the
filtration $\{\F_i\}_{i=-k}^\infty$) and
$$
\wh T^i\E(f|\F_{-k})=Y_{i}(k,\,\lambda)+\wh T^ig(k,\,\lambda)-\lambda^{-1}\wh
T^{i-1}g(k,\,\lambda)
\quad
\forall i>0,\;\lambda \geq 1.
\tag 1.9
$$
Note that $(1.8)$ and $(1.9)$ are equivalent to
$$
\E(f|\F_{-k})=g(k,\,\lambda)-\lambda^{-1}\E(\wh T^*g(k,\,\lambda)|\F_{-k}).
$$
It is immediate to see that $g(k,\,\lambda)=\sum\limits_{n=0}^\infty
\lambda^{-n}\E(\wh
T^{*n}f|\F_{-k})\in L^2(X)$ for each $\lambda>1$ and in $L^1(X)$ for $\lambda=1$
(this is a consequence of hypothesis (2) in the statement of the Theorem) is a
solution of the above equation (see the analogous discussion in Theorem 1.1).
Again we want to show that the $Y_i(k,\,1)$ are square integrable; actually, in this
case, we need a uniform estimate in $k$.
In partial analogy with Theorem 1.1, we have
$$
\align
\E(Y_i(k,\,\lambda)^2)=&\E(Y_1(k,\,\lambda)^2)=-\E(\E(f|\F_{-k})^2)+
\E([\wh Tg(k,\,\lambda)-\lambda^{-1}g(k,\,\lambda)]^2)\\
=&-\E(\E(f|\F_{-k})^2)+2\E(g(k,\,\lambda)\E(f|\F_{-k}))-(1-\lambda^{-2})
\E(g(k,\,\lambda)^2).
\endalign
$$
In addition, for each $\lambda>1$,
$$
\align
\E(g(k,\,\lambda)\E(f|\F_{-k}))
\leq&\sum_{n=0}^\infty|\E(f\E(\wh T^{*n}f|\F_{-k}))|
=\sum_{n=0}^\infty|\E(\wh T^{*n}f\E(f|\F_{-k}))|\\
\leq&2\|f\|_\infty k\E(|\E(f|\F_{-k})-f|)+\sum_{n=k}^\infty\E(\wh T^k f\E(\wh
T^{*n}f|\F_0))\\
&+\sum_{n=0}^{2k-1}\E(\wh T^n ff)<\infty,
\endalign
$$
where the uniform bound follows from the hypotheses (1), (2), (3) of the
Theorem.
The previous estimates show that $Y_i(k,\,1)$ are uniformly
square integrable martingale differences. Moreover,
$$
\lim_{k \to \infty}\lim_{\lambda\to 1}\E(Y_1(k,\,\lambda)^2)=-\E(f^2)+
2\sum_{n=0}^\infty\E(f\wh T^n f)=\sigma^2.
$$
To see this, it is enough to compute
$$
\aligned
\E(g(k,\,\lambda)g(k,\,\mu))&=\sum_{n,m=0}^\infty\lambda^{-n}\mu^{-m}
\E(\wh T^{*n}f\E(\wh T^{*m}f|\F_{-k}))\\
&\leq\sum_{n=0}^\infty\lambda^{-n}M\|f\|_\infty\E(|\E(\wh T^{*n}f|\F_{-k})|
)\\
&+\sum_{n=0}^\infty\lambda^{-n}\|f\|_\infty
\sum_{m=M}^\infty\E(|\E(\wh T^{*m}f|\F_{-k})|)\\
&\leq M\|f\|_\infty^2k+M\|f\|_\infty\sum_{n=0}^\infty\E(|\E(\wh T^{*n}f|
\F_0)|)\\
&+(1-\lambda^{-1})^{-1}\|f\|_\infty
\sum_{m=M-k}^\infty\E(|\E(\wh T^{*m}f|\F_0)|),
\endaligned
$$
so, since $M$ can be chosen arbitrarily large,
$\lim\limits_{\lambda\to 1}(1-\lambda^{-1})\E(g(k,\,\lambda)^2)=0$.
Furthermore, in analogy with Theorem 1.1, it easily follows that $Y_1(k,\,\lambda
)$ converges to $Y_1(k,\,1)$ in $L^2(X)$.
This implies that, defining
$$
S_n=\frac 1{\sqrt n}\sum_{i=0}^{n-1}\wh T^i f\; ;\quad
S_n^k=\frac 1{\sqrt n}\sum_{i=0}^{n-1}\wh T^i \E(f|\F_{-k}),
$$
the $S_n^k$ converge in law to a Gaussian with zero mean and variance
$\E(Y_1(k,\,1)^2)$.
The next step is to obtain the needed convergence as $k$ goes to infinity.
$$
\aligned
\E([S_n^k-S_n]^2)=&\frac 1n \sum_{i,\,j=0}^{n-1}\E(\wh T^i[f-\E(f|\F_{-k})]
\wh T^j[f-\E(f|\F_{-k})])\\
\leq& \E([f-\E(f|\F_{-k})]^2)+2\sum_{i=1}^{n-1}
|\E([f-\E(f|\F_{-k})]\wh T^i[f-\E(f|\F_{-k})])|\\
\leq& 2\|f\|_\infty\E(|f-\E(f|\F_{-k})|)+2\sum_{i=1}^{n-1}|\E(\wh T^if
[f-\E(f|\F_{-k})])|\\
=&2\|f\|_\infty\E(|f-\E(f|\F_{-k})|)+2\sum_{i=1}^{n-1}|\E(\wh T^{*i}f
[f-\E(f|\F_{-k-i})])|\\
\leq& 2\|f\|_\infty\sum_{i=k}^\infty\E(|f-\E(f|\F_{-i})|),
\endaligned
$$
which is smaller than $\varepsilon$, uniformly in $n$, for $k$ sufficiently large, since (3) implies
the convergence of the series $\sum_{i=0}^\infty\E(|f-\E(f|\F_{-i})|)$.
Collecting the previous estimates, it follows that $S_n$ converges in law to a
Gaussian of zero mean and variance $\sigma^2$.
Next, suppose that $\sigma^2=0$ and $\sum_{n=0}^\infty n|\E(f\wh T^n f)|
<\infty$; then
$$
\aligned
(1-\lambda^{-2})\E(g(k,\,\lambda)^2)&+\E(Y_1(k,\,\lambda)^2)\leq
-\E(\E(f|\F_{-k})^2)+2\E(g(k,\,\lambda)f)\\
&\leq\E(f^2)-\E(\E(f|\F_{-k})^2)+2\sum_{n=1}^\infty(1-\lambda^{-n})\E(f\wh
T^nf)\\
&+ 2\sum_{n=0}^\infty\lambda^{-n}\left[\E(\E(\wh T^{*n}f|\F_{-k})f)-
\E(f\wh T^n f)\right]\\
&\leq\|f\|_\infty\E(|f-\E(f|\F_{-k})|)+2(1-\lambda^{-1})\sum_{n=0}^\infty
n|\E(f\wh T^n f)|\\
&\;+ 2\|f\|_\infty\left(\sum_{n=0}^{2k-1}\E(|\E(f|\F_{-k})-f|)+
\sum_{n=k}^\infty\E(|\E(\wh T^{*n}f|\F_0)|)\right)\\
&+2\sum_{n=2k}^\infty\E(f\wh T^n f).
\endaligned
$$
Accordingly, it is possible to define $\phi:(1,\,\infty)\to\Bbb N$,
$\lim_{\lambda\to 1}\phi(\lambda)=\infty$, such that
$$
\aligned
&\E(g(\phi(\lambda),\,\lambda)^2)\leq M\quad\forall \lambda>1\\
&\lim_{\lambda \to 1}\E(Y_1(\phi(\lambda),\,\lambda)^2)=0 ,
\endaligned
$$
where $M$ is some fixed positive number.
Since $L^2(X)$ is a Hilbert space, and therefore reflexive, its closed balls are
compact in the weak topology, so $\{g(\phi(\lambda),\,\lambda)\}_{\lambda>1}$ is a
weakly relatively compact set and we can extract a subsequence $\{\lambda_j\}$,
$\lim_{j\to\infty}\lambda_j=1$, such that
$\{g(\phi(\lambda_j),\,\lambda_j)\}$ converges weakly to a function $g\in L^2(X)$.
In addition, (1.9) implies, for each $\varphi\in L^2(X)$,
$$
\E(\wh T^*\varphi\E(f|\F_{-\phi(\lambda_j)}))=\E(Y_1(\phi(\lambda_j),\,\lambda_j)\varphi)+
\E(\wh T^*\varphi g(\phi(\lambda_j),\,\lambda_j))-\lambda_j^{-1}\E(\varphi
g(\phi(\lambda_j),\,\lambda_j)),
$$
and taking the limit $j\to\infty$ yields
$$
\E(\wh T^*\varphi f)=\E(\wh T^*\varphi g)-\E(\varphi g)
\quad \forall \varphi\in L^2(X).
$$
That is
$$
\wh Tf=\wh T g - g.
$$
\enddemo
This theorem is rather similar to Theorem 4.4 in \cite{DG}; the
main difference is the absence, in \cite{DG}, of a discussion of the degenerate
case $\sigma=0$.
The only other results known to the author that have a breadth similar
to Theorem 1.2 are contained in
\cite{Ch}. The comparison is not so easy because the results in
\cite{Ch} are stated directly in the language of special families of finite
partitions. This language is well suited for applications to the case in which
the system is studied by the type of coding called Markov sieves, but it is not so
transparent in an abstract context. At any rate, an evident difference is that
Chernov's result requires the existence of the first moment of the correlations
(i.e., $\sum_{n=0}^\infty n\E(ff\circ T^n)<\infty$) in order to obtain the
CLT, while in Theorem 1.2 such a condition is not necessary, unless one
wants the coboundary characterization of the functions that yield a
degenerate limit.
\vskip1cm
\subhead \S 2 Non invertible maps
\endsubhead
\vskip1cm
In this section we will see how the results of the previous section apply
to the case in which $T$ is onto but not one to one.
We choose $\F=\F_0$, so $\F_i=\F$ for all $i\leq 0$. Note that if
$\E(\phi|\F_1)=\phi$, then $g(x)=\phi(T^{-1}x)$ is well defined, hence
Range$(\wh T)$ is
exactly the set of $\F_1$-measurable functions. Moreover, $\wh T\wh T^*$ is an
orthogonal projection onto Range$(\wh T)$, while $\E(\cdot|\F_1)$ is an
orthogonal projection onto the $\F_1$-measurable functions. That is, for
each $\phi\in L^1(X)$
$$
\wh T\wh T^*\phi=\E(\phi|\F_1).
$$
The first condition of Theorem 1.1 is then satisfied quite generally.
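The projection property of $\wh T\wh T^*$ can be checked directly; the following is a sketch that uses only the identity $\wh T^*\wh T=\Id$ from \S 1:
$$
(\wh T\wh T^*)^2=\wh T(\wh T^*\wh T)\wh T^*=\wh T\wh T^* ,\qquad
(\wh T\wh T^*)^*=\wh T\wh T^* ,
$$
and $\wh T\wh T^*\wh T\phi=\wh T\phi$, so that $\wh T\wh T^*$ acts as the identity on the range of $\wh T$.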
To see how the theorem works let us apply it to the case of one dimensional
maps (i.e. $X=[0,\,1]$).
Let us consider a partition of $[0,\,1]$ into finitely many intervals
$\{I_k\}_{k=1}^p$ and a map $T:[0,\,1]\to[0,\,1]$ such that
\roster
\item $T\big|_{\overline I_k}\in\Cal C^{(2)}$ for each $k\in\{1,\,...,\,p
\}$
\item $\inf\limits_{x\in[0,\,1]}|D_xT|\geq \lambda>1$.
\endroster
That is, $T$ is a piecewise smooth expanding map.
If the reader wants to consider a concrete example, here is a very simple one:
the piecewise linear map $T:[0,\,1]\to[0,\,1]$ defined by
$$
T(x)=\left\{\aligned
\frac 92 \left(\frac 19 -x\right)&\quad x\in\left(0,\,\frac
19\right)\\
\frac 92 \left(x-\frac 19\right) &\quad x\in\left(\frac 19,\,
\frac 39\right)\\
\frac 92 \left(\frac 59-x\right)&\quad x\in\left(\frac 39,\,
\frac 59\right)\\
\frac 92 \left(x-\frac 59\right)&\quad x\in\left(\frac 59,\,
\frac 79\right)\\
\frac 92 \left(1-x\right)&\quad x\in\left(\frac 79,\,1\right)
\endaligned
\right.
$$
The map satisfies our assumptions since $|DT|=\frac 92>1$.
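As a quick sanity check of the example (a sketch, computed directly from the formula above), the one sided limits of $T$ at the endpoints of the five intervals are
$$
T(0^+)=\frac 12,\quad T\left(\frac 19\right)=0,\quad T\left(\frac 39\right)=1,\quad
T\left(\frac 59\right)=0,\quad T\left(\frac 79\right)=1,\quad T(1^-)=0,
$$
where at each interior breakpoint the limits from the left and from the right coincide (for instance, $\frac 92\left(\frac 39-\frac 19\right)=\frac 92\left(\frac 59-\frac 39\right)=1$). Hence $T$ extends continuously to $[0,\,1]$; the first branch covers $\left(0,\,\frac 12\right)$, while each of the remaining four branches covers $(0,\,1)$.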
The following result is well known \cite{HK}:
\proclaim{Theorem 2.1} There exists a unique probability measure $\mu$,
absolutely continuous with respect to Lebesgue, which is invariant with
respect to the map $T$. In addition, there exist $\Lambda\in(0,\,1)$ and
$K>0$ such
that, for each $f\in BV([0,\,1])$ (the space of functions of bounded
variation), and $g\in L^1([0,\,1],\,\mu)$
$$
\left|\int_0^1 f g\circ T^n d\mu-\int_0^1 fd\mu\int_0^1 gd\mu\right|
\leq K\Lambda^n\|f\|_{\text{BV}}\|g\|_1
$$
\endproclaim
Since $\mu$ is absolutely continuous with respect to the Lebesgue measure
$m$ the Radon--Nicod\'ym derivative $h=\frac{d\mu}{dm}$ is in $L^1([0,\,1],
\,m)$. For simplicity assume $h\geq\varepsilon>0$,\footnote{This is always verified
if $T$ is continuous, like in our example; but see \cite{L2} for a discussion of the
general case.} then it follows
$$
\wh T^*f(x)=h(x)^{-1}\sum_{y\in T^{-1}(x)}h(y)f(y)|D_yT|^{-1} .
$$
Such a representation implies that the last statement of Theorem 2.1
can be rephrased as follows: for each $f\in BV([0,\,1])$, $\int_0^1 fd\mu=
0$
$$
\|\wh T^{*n}f\|_\infty\leq K\Lambda^n\|f\|_{\text{BV}}.
$$
It is then immediate to see that Theorem 1.1 applies to this situation
yielding the central limit theorem for all functions of bounded variation.
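A sketch of the verification, under the choice $\F_0=\F$ (so that $\E(\cdot|\F_0)$ is the identity) and with $\E$ denoting the expectation with respect to $\mu$: let $f\in BV([0,\,1])$, $\int_0^1fd\mu=0$; recall that functions of bounded variation are bounded. By the bound above,
$$
\sum_{n=0}^\infty|\E(f\wh T^nf)|=\sum_{n=0}^\infty|\E(f\,\wh T^{*n}f)|
\leq\|f\|_1\sum_{n=0}^\infty\|\wh T^{*n}f\|_\infty
\leq\frac{K\|f\|_{\text{BV}}\|f\|_1}{1-\Lambda}<\infty ,
$$
which is hypothesis (1) of Theorem 1.1, while hypothesis (2) follows from
$$
\sum_{n=0}^\infty|\E(\wh T^{*n}f|\F_0)|=\sum_{n=0}^\infty|\wh T^{*n}f|
\leq\frac{K\|f\|_{\text{BV}}}{1-\Lambda}\quad\text{almost surely.}
$$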
The reader can easily see that such a result can be improved obtaining
the central limit theorem for functions with less regularity (e.g., by an
approximation argument) but this is not the main focus here. In addition, similar
results can be obtained for several cases in which the map $T$ consists of
infinitely many smooth pieces.
It is also immediate to verify that the theorem will yield the CLT for BV
functions also for some non-hyperbolic maps (such as the quadratic family
\cite {Y}) or maps that are non-uniformly hyperbolic (\cite{LSV}).
\vskip1cm
\subhead \S 3 Invertible maps
\endsubhead
\vskip1cm
In this case it would be useless to choose $\F_0=\F$: typically this would
yield $\F_i=\F$ for each $i\in\Bbb Z$. So the choice of $\F_0$ must be
motivated by dynamical considerations. Here we will discuss a general class
of systems for which such a choice is quite natural: the hyperbolic
systems.\footnote{More generally this strategy can be applied to K-systems.}
For simplicity I will confine the discussion to the case in which $X$ is a
compact symplectic manifold with a Riemannian structure that yields a
volume form equivalent to the symplectic one
and $T$ a piecewise $\Cal C^2$ symplectic map, but see
\cite{KS} and \cite{LW} for more general possibilities. By hypothesis the
symplectic (or Riemannian) volume $\mu$ is invariant. (The more general case of
dissipative systems can also be treated with the same arguments; again, the details
are left to the reader.)
We will assume $T$ uniformly hyperbolic, since almost nothing is known about
the decay of correlations for non-uniformly hyperbolic systems.
By this we mean that at each point $x\in X$ there exist two subspaces
$E^u(x),\,E^s(x)\subset\Cal T_x X$, with $E^u(x)\cap E^s(x)=\{0\}$ and
$E^u(x)\oplus E^s(x)=\Cal T_xX$, invariant (i.e., $D_xT E^{u,s}(x)=E^{u,s}(Tx)$),
and there exists $\lambda>1$ such that
for each $x\in X$,
$$
\aligned
&\|D_xTv\|\geq \lambda \|v\|\quad\forall v\in E^u(x)\\
&\|D_xTv\|\leq \lambda^{-1} \|v\|\quad\forall v\in E^s(x).
\endaligned
$$
Also, we assume that $E^{u, s}(x)$ depend continuously on $x$
(the above systems are called Anosov, in the smooth case).
In the smooth case such systems are known to be ergodic (in fact,
Bernoulli); see \cite{LW} for sufficient conditions that ensure ergodicity
also in the non-smooth case.
To help the reader in better visualizing the following discussion let us consider
the simplest possible non-trivial example.
We consider a family of linear maps of the plane
defined by
$$
\aligned
x_1' &= x_1 + a x_2 \\
x_2' &= x_2 ,
\endaligned
$$
where $a$ is a real parameter. We use these linear maps to define
(discontinuous if $a\not\in\Z$) maps of the torus by restricting the formulas to
the strip $\{ 0 \leq x_2 \leq 1 \}$ and further taking them modulo 1. In
this way we define a mapping $T_1$ of the torus $\To ^2 =
\Bbb R^2/\Z^2$ which is discontinuous on the circle $\{ x_2 \in \Z
\}$ (except when $a$ is equal to an integer) and preserves the
Lebesgue measure $\mu$.
Similarly we define another family of maps depending on the same
parameter $a$ by restricting the formulas
$$
\aligned
x_1' &= x_1 \\
x_2' &= a x_1 + x_2
\endaligned
$$
to the strip $\{ 0 \leq x_1 \leq 1 \}$ and then taking them modulo 1. Thus
for each
$a$ we get a mapping $T_2$ of the torus which is discontinuous on the circle
$\{ x_1 \in \Z \}$ (except when $a$ is equal to an integer) and
preserves the Lebesgue measure $\mu$.
Finally we introduce the composition of these maps $T =
T_2 T_1$ which depends on one real parameter $a$.
An alternative way of describing the map
$T$ is by introducing two fundamental domains for the
torus
$\M^+ = \{ 0 \leq x_1 + a x_2 \leq 1,\, 0 \leq x_2 \leq 1 \}$
and $\M^- = \{ 0 \leq x_1 \leq 1, \,0 \leq - a x_1 + x_2 \leq 1 \}$.
The linear map defined by the matrix
$$
\left(\matrix 1 & a \\ a & 1 + a^2\endmatrix\right)=
\left(\matrix 1 & 0 \\ a& 1 \endmatrix\right)
\left(\matrix 1 & a \\ 0 & 1 \endmatrix\right)
$$
takes $\M^+$ onto $\M^-$ thus defining a map of the torus which is
discontinuous at most on the boundary of $\M^+$ and preserves the
Lebesgue measure. This is the map $T$ that constitutes our toy model.
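The hyperbolicity of the toy model can be made explicit by a direct
computation (a sketch, for $a\neq 0$; the symbols $A$, $\lambda_\pm$ and
$v_\pm$ are introduced only for this remark). Calling $A$ the above matrix,
we have $\det A=1$ and $\text{tr}\,A=2+a^2$, so its eigenvalues solve
$t^2-(2+a^2)t+1=0$, that is
$$
\lambda_\pm=\frac{2+a^2\pm|a|\sqrt{a^2+4}}2 ,\qquad \lambda_+\lambda_-=1 ,\qquad
v_\pm=(a,\,\lambda_\pm-1) ,
$$
where $v_\pm$ are the corresponding eigenvectors. Hence
$\lambda_+>1>\lambda_->0$ and, since $A$ is symmetric, $v_+$ and $v_-$ are
orthogonal. Wherever $T$ is differentiable its derivative is $A$, so $T$
expands by $\lambda_+$ along $v_+$ and contracts by
$\lambda_-=\lambda_+^{-1}$ along $v_-$.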
Let us go back to the more general case.
According to \cite{KS} such systems have a
natural measurable partition (in fact a K-partition): the partition into stable
manifolds.
Such a partition $\Cal P$ can be constructed so as to satisfy the following
requirements:
\roster
\item there exists a finite number of codimension one smooth
manifolds $\{S_i\}_{i=1}^{m_0}$,
transversal to the stable direction, such that each $p\in\Cal P$ has its
boundary points belonging to the set
$\cup_{i=1}^{m_0}\cup_{n=0}^\infty T^{-n}S_i$;\footnote{In the discontinuous
case such manifolds can be simply chosen as the set of points at which $T$
is not $\Cal C^{(2)}$.}
\item for each $p\in\Cal P$ diam$(p)\leq 2\delta$;\footnote{$\delta$ is
some previously fixed number.}
\item for each $p\in\Cal P$ there exists $\{p_i\}_{i=1}^k\subset\Cal P$
such that $T^{-1}p= \cup_{i=1}^k p_i$.
\endroster
The above properties imply that, choosing as $\F_0$ the $\sigma$-algebra
generated by the partition $\Cal P$, the family $\{\F_i\}_{i=0}^\infty$
has the dynamical
properties requested in the hypotheses of Theorem 1.2.
To make the previous statement clearer, let us see what such a partition looks like
in the concrete example mentioned above.
The map $T$ is piecewise linear and it has constant contracting direction $v$. Let
us call $S$ the discontinuity set of $T^{-1}$ and
$S_\infty=\cup_{n=0}^\infty T^{-n}S$. Then the stable partition is made of
segments along the direction $v$ with the endpoints belonging to
$S_\infty$.\footnote{See \cite{LW} for the details of such a construction and the
proof that almost every point belongs to one such segment.} Since $S_\infty$ is
an invariant set, properties (1)-(3) are readily verified.
Further, we will assume that the manifolds
$\{S_i\}$ satisfy the following property:
\proclaim{Property 0} For each $i\neq j$ $\overline{S}_i\cap \overline{S}_j$
is either empty or
consists of smooth submanifolds $I_{ij}$ of codimension at least two.
Moreover, setting\footnote{By $\sharp B$ I mean the cardinality of the set
$B$.}
$$
M\equiv\sup_{ij}\sharp\{k\in\{1,\,...,\,m_0\}\;|\;
\overline{S}_k\cap I_{ij}\neq\emptyset\},
$$
we require
$$
\nu\equiv\lambda^{-1} M<1 .
$$
\endproclaim
Note that Property 0 may not be satisfied by $T$ but may be enjoyed by
$T^q$, for some $q>1$. In fact, it is not so hard to see that
``generically'' this will be the case (i.e., Property 0 will hold for some
iterate of the map). In such a situation we can apply all the following to
the dynamical system $(X,\,\mu,\,T^q)$ obtaining the same conclusions as
far as the CLT is concerned. Here, for simplicity, we restrict ourselves to
the case $q=1$.
If we think of our model example we see that $M=2$, so that $\lambda^{-1}M<1$ if
$|a|>\frac 1{\sqrt{2}}$. The reader can easily compute $M$ for powers of $T$ and
see that Property 0 is satisfied for smaller and smaller values of $a$. Of course,
$a=0$ corresponds to the identity, for which no hyperbolicity is present.
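The threshold $\frac 1{\sqrt 2}$ can be checked as follows (a sketch, taking
for $\lambda$ the expanding eigenvalue of the matrix that defines $T$, whose
trace is $2+a^2$ and whose determinant is $1$). Since
$\lambda+\lambda^{-1}=2+a^2$ and $t\mapsto t+t^{-1}$ is increasing for $t>1$,
with $M=2$ we have
$$
\lambda^{-1}M<1\iff\lambda>2\iff 2+a^2=\lambda+\lambda^{-1}>2+\frac 12
\iff |a|>\frac 1{\sqrt 2} .
$$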
For the
systems under consideration the following holds (see
\cite{KS} for details):
\proclaim{Property 1}For each $p\in\Cal P$ define the measure $\mu_p$ by
$$
\Bbb E(g|\Cal F_0)(x)\equiv\int_p g d\mu_p,
$$
for $g\in\Cal C^{(0)}(X)$, and $x\in p$.
Then, calling $m_p$ the measure induced by the Riemannian structure on $p$,
and $\phi_p=\frac{d\mu_p}{dm_p}$ the Radon--Nicod\'ym derivative, there
exists $c_0$ such that $\sup_p\|\phi_p\|_\infty\leq c_0$.
\endproclaim
For our simple example we see that $\mu_p=\frac 1{m_p(p)}m_p$.
The map is invertible, thus $\wh T^* f=f\circ T^{-1}$. A very important
consequence of Property 1 is that, if $p\in\Cal P$ and $\Cal P'\subset
\Cal P$ is such that $\bigcup\limits_{q\in\Cal P'}q=T^{-n}p$, then for each $f\in
L^1(X,\,\mu)$
$$
\int_pf\circ T^{-n}d\mu_p=\sum_{q\in\Cal P'}\mu_p(T^nq)\int_q fd\mu_q .
$$
In addition, one can prove the following (see \cite{L1} for a complete
discussion of the two-dimensional case).
\proclaim{Property 2} There exist $K\in \Bbb R$ and $\Lambda>1$, such that, for each
$x\in X$ that belongs to $p\in\Cal P$ with diam$(p)\geq\delta$, and for each
$g,\,f\in\Cal C^{\alpha}(X)$ (H\"older continuous of class $\alpha>0$),
$\int_Xg=0$,\footnote{By $\|f\|_\alpha$ we mean the usual $\Cal C^{(\alpha)}$ norm,
while $\|f\|_\alpha^s=\sup
\limits_{p\in\Cal P}\sup\limits_{x,\,y\in p}\frac{|f(x)-f(y)|}
{\|x-y\|^\alpha}+\|f\|_\infty$; and $\|f\|^u_\alpha$ is defined analogously
by using the unstable partition. Essentially, these norms measure the
H\"older derivative in the stable (or unstable) direction only.}
$$
\E(f\wh T^{*n}g|\F_0)(x)\leq K \Lambda^{-n}\|g\|_\alpha^s\|f\|_\alpha^u
$$
\endproclaim
In the rest of the section we will see that Properties 0-2 imply, for the
systems under consideration, the hypotheses of Theorem 1.2.
\proclaim{Lemma 3.1} Calling $\Cal A_\varepsilon=\{x\in X\;|\; \text{diam}(p(x))\leq
\varepsilon\}$ we have\footnote{By $m(\cdot)$ we mean the symplectic or
Riemannian volume that, according to our hypotheses, is the invariant measure
of the system.}
$$
m(\Cal A_\varepsilon)\leq C\varepsilon
$$
for some fixed $C\in\Bbb R^+$.
\endproclaim
\demo{Proof}
Since $\partial p$ is made up of points belonging to the preimages
of the manifolds $S_i$, it follows
that if diam$(p)\leq\varepsilon$ then there exist $z\in\partial p$,
$n\in\Bbb N$ and $i\in\{1,\,...,\,m_0\}$ such that $T^n z\in S_i$. Accordingly,
$T^np$ must lie in a $\lambda^{-n}\varepsilon$ neighborhood of $S_i$. Such a
neighborhood has measure $c_1\lambda^{-n}\varepsilon$, for some fixed $c_1$.
It is then clear that
$$
m(\Cal A_\varepsilon)\leq \sum_{n=0}^\infty m_0c_1\lambda^{-n}\varepsilon=
\frac{m_0 c_1}{1-\lambda^{-1}}\varepsilon.
$$
\enddemo
The problem in applying our theorem comes from the possible presence in
$\Cal P$ of very small elements. On such elements Property 2 does not
provide any direct control. Lemma 3.1 works to our advantage instead:
it informs us that the total measure of the very small pieces is small.
Yet, small pieces may be present.\footnote{In fact, this is certainly the
case in the non-smooth case. If $T$ is smooth, then it is possible to
construct $\Cal P$ in such a way that diam$(\Cal P)\geq \delta$ for some
fixed $\delta$, by using Markov partitions. When finite Markov partitions are
available the present method boils down to a repackaging of well known facts.} The
idea for dealing with them is to iterate them: if
$T^{-n}|_p$ is smooth, then diam$(T^{-n}p)\geq\lambda^n
\text{diam}(p)$. Unfortunately, in general $T$ is not smooth so we have to
handle the iteration with more care.
Fix $p\in\Cal P$; by construction there exists $\Cal P_1\subset\Cal P$ such
that $T^{-1}p=\bigcup\limits_{q\in\Cal P_1}q$. Call $\Cal P_1^-=\{q\in\Cal P_1
\;|\;\text{diam}(q)\leq\delta\}$ and
$p_1=\bigcup\limits_{q\in\Cal P_1^-}Tq\subset p$.
In other words, $p_1$ consists of the part of $p$ that, under the action
of $T^{-1}$, does not give rise to sufficiently large elements of the
partition. The process can obviously be iterated: let $\Cal P_2$ be the
collection such that $T^{-2}p_1=\bigcup\limits_{q\in\Cal P_2}q$,
$\Cal P_2^-=\{q\in\Cal P_2\;|\;\text{diam}(q)\leq\delta\}$,
$p_2=\bigcup\limits_{q\in\Cal P_2^-}T^2q\subset p_1$ and so on.
\proclaim{Lemma 3.2} If $\delta$ is chosen sufficiently small and
$p\in\Cal A_\varepsilon$, then for
$n\geq \frac{\log\varepsilon^{-1}\delta}{\log\nu^{-1}}+m$,
$$
m_p(p_n)\leq\varepsilon\nu^m .
$$
\endproclaim
\demo{Proof}
By choosing $\delta$ sufficiently small we can ensure, thanks to Property 0,
that each element with diameter less than $\delta$ can intersect at most
$M$ manifolds $S_i$. Since the $S_i$ describe all the possible
discontinuities in our system, it follows that $\sharp \Cal P_1\leq M$.
But the same argument applies to each connected piece of $p_j$: since the
diameter of $T^{-l}p_j$ is, by definition, less than $\delta$, for $l