instructions:
This paper is a plain TeX file followed by a post-script file which creates a
figure,refered in the text as Fig.1.
I inserted dashed lines specifying the beginning and the end of the TeX file
and of the ps file.
BODY
---------------------------BEGINNING OF TeX FILE------------------------------
\magnification=1200
\def\doublespace{\baselineskip=\normalbaselineskip \multiply\baselineskip by 2}
\def\singlespace{\baselineskip=\normalbaselineskip}
\def\today{\ifcase\month\or
January\or February\or March\or May\or April\or June\or
July\or August\or September\or October\or November\or December\fi
\space\number\day, \number\year}
\rightline {RU--91--46}
\rightline {October 15, 1991}
\vskip 1.5cm
\centerline {\bf A possible barrier at $z\approx 1$ for local algorithms. }
\vskip 1cm
\centerline {by}
\vskip 1cm
\centerline{Georgios Bathas and Herbert Neuberger}
\vskip .2cm
\centerline{\it Department of Physics and Astronomy}
\centerline{\it Rutgers University}
\centerline{\it Piscataway, NJ 08855--0849}
\vskip 2cm
\centerline{\bf Abstract}
\doublespace
It is shown that a certain class of generalizations of over-relaxation
algorithms is incapable to further reduce the dynamical
exponent $z$ below its standard over-relaxed value of $z\approx 1$. The $\approx$ sign means that
the mean field value is one, while the true value
can be somewhat different in certain dimensions. The generalizations
are obtained by viewing over-relaxation as a slightly deformed deterministic
algorithm and should, therefore, hold for Hybrid Monte Carlo algorithms as well.
\vfill
\eject
\vskip 5cm
Massively parallel computer architectures are best utilized by local Monte-Carlo
algorithms [1,2]. By locality we mean the following: At each iteration an old
site variable, generically denoted by $\phi (x) $, where $x$ is the site, is replaced by a
new site variable, $\phi \prime (x) $, and $\phi \prime (x) = F_x ( \phi (y))$
with $y$ restricted to a finite vicinity of $x$. By a finite vicinity we mean one
that contains a finite, volume independent, number of sites. Such an algorithm is
parallelizable because we can split the lattice into a finite
number of equal sub-lattices (depending on the range of site interdependence at updates)
such that updates of a variable in any given sub-lattice can depend
only on the variables in the other sub-lattices and thus perform the update
of an entire sub-lattice in parallel. The sub-lattices intertwine building
up the
complete lattice.
Let us denote by $\phi(x,n)$ the site variable at $x$ during the $n$-th iteration.
In most applications the system being simulated is translational invariant.
This implies that the autocorrelation matrix for the site variables, asymptotically, decouples
into finite blocks of size determined by the number of sub-lattices, $N$:
By translational invariance we mean that, assuming that we
started from a uniform configuration, the probability distribution of the variable $\phi(x,n)
\phi(y,n+m)~~m > 0$ remains unchanged under a simultaneous shift of $x$ and $y$ restricted to keep each site
in its original sub-lattice. It's easy to see that ordinary algorithms respect this, as a
one to one mapping between configurations can be established by simply shifting around the random numbers
generated when updating the sites. As a consequence we obtain that the autocorrelation matrix
can be block diagonalized for $n$ large enough that the probability for the set $\phi(x,n)$ be invariant
under shifts in $x$ [3]. Each block is of size $N \times N$, corresponding to the $N$ regions the Brillouin zone becomes devided into as a result of the
restriction on the allowed translations.
Since the algorithm is local it is reasonable to accept the supposition that there exists a field
theoretical representative of its dynamical universality class [4]. We are trying to find then the
dynamical renormalization group trajectory connecting the ordinary dynamical GL fixed point to the true fixed point.
Thus it makes some sense to start at the single place that we have under control calculationally,
namely the free field case. One might be suspicious at having something interesting to say about a free field system;
however the problem is not entirely trivial because finding a
{\sl local} algorithm (equivalently: dynamics) that has a critical exponent $z$ smaller than 2 isn't that easy even for the free field case. Once we know the behavior for the free field case
we ought to proceed to investigate the effect of interactions by the ordinary field theoretical
tools, namely, characterizing the strength of perturbations, finding the dimension where the
theory is renormalizable, etc. In the present note we shall only deal with the simplest first step.
As we shall come up with a negative result there is no room for the above mentioned subsequent steps.
However, our result does indicate that no new sort of dynamical fixed point, with a $z$ smaller than
1, and continuously connected to the dynamical GL fixed point can exist, leaving us only with ordinary
Langevin dynamics and with Over-relaxation. We don't have a rigorous proof of this assertion because we shall
be making some additional assumptions on the way. We do suspect however that these assumptions are
not restrictive in reality and that the assertion holds as stated.
Let us review first how over-relaxation works [5]. It is sufficient to have only two sub-lattices. The
block that contains the slowest modes is the one coupling the two sub-lattice field averages, $\phi_{1,2}$. In the
critical case the mass term is zero and the energy depends only on the difference of the modes. Hence
the Hamiltonian, when restricted to the $\phi_{1,2}$ subspace, can be written as:
$$
H_0 =(\phi_1 - \phi_2 )^2 = \phi_i h_{ij} \phi_j;~~~~~~~h_{11}=h_{22}=-h_{12}=1
\eqno{(1)}
$$
A mass perturbation will change the Hamiltonian by
$$
\delta H_0 = m^2 \phi_j \phi_j\eqno{(2)}
$$
The structure of the auto-correlation matrix is compatible with the assumption that the algorithm at each
iteration only couples the two modes of the block (every mode couples only to the one whose crystal momentum is shifted by $(\pi, \pi,\ldots, \pi)$). Over-relaxation algorithms fulfill this and,
moreover, have a limit (when an over-relaxation parameter is taken to an extreme value) where they become deterministic.
In this limit the algorithm becomes a mapping under which $H$ is preserved. This mapping should be viewed
as some discrete generalization of Hamiltonian mechanics and hence is required to preserve the measure
$d\phi_1 d\phi_2$ as an analogue of Liouville's theorem. This is needed for
the algorithm to produce the right limiting distribution. The simplest way to
ensure this is to make the mapping a linear operator $T$:
$$
\phi_{\alpha}(n+1) = T_{\alpha,\beta} \phi_\beta (n) \eqno{(3)}
$$
The preservation of the measure means that $T$ has unit determinant and,
for convergence, we forbid any of the eigenvalues of $T$ to be outside the unit circle. This implies that
all of $T$'s eigenvalues are on the unit circle.
When the mass is zero, we expect no convergence (we shall consider oscillatory
behavior with a constant amplitude as ``convergence'' and the
time scale governing this convergence as given by the related phase) and therefore one of the eigenvalues of $T$ (the one corresponding to the total average) is real and equal to 1. When the mass term is turned on this eigenvalue of $T$ changes, and for
small $m$ should go as $\lambda = \exp (i\theta ) ~{\rm with}~ \theta \propto m^{z}$ where $z$
is the dynamical exponent. Ordinarily, $T$ would depend linearly on $H$ and the order $m^2$ perturbation would
translate into order $m^2$ changes in the eigenvalues hence $z=2$. We assume that, from the point of
view of matrix perturbation theory, the perturbation induced by the mass term in $T$ is generic.\footnote{*}{Even if this
is not true for one particular block containing a soft mode it can't be true of all
the blocks that contain soft modes.} Essentially the same
picture holds in all the blocks that contain a very low momentum (instead of
strictly zero) mode. To achieve something else
we need a situation where a generic order $m^2$ perturbation of $T$ causes a
{\sl larger} change in the eigenvalues
of $T$. This can happen only if, at $m=0$, $T$ is {\sl nondiagonalizable}, because, otherwise, we can use
ordinary perturbation theory. In our case this leaves us with
the single possibility, that in some basis, $T$ contains a sub-block of Jordan form with unity on the diagonal, i.e.:
$$T=\pmatrix{\underbrace{\pmatrix{1&1&0&0&\ldots&0\cr
0&1&1&0&\ldots&0\cr
0&0&1&\ddots& &\vdots\cr
&&&\ddots&\ddots& \cr
&&&0&1&1\cr
0&0&\ldots&&0&1\cr}}_{n \times n
\rm\;Jordan\rm\; block}&0
\cr
0&
\pmatrix{ &\ldots & \cr
&\ldots & \cr
&\ldots & \cr}} \eqno{(4)}$$
For example, with ordinary over-relaxation applied to
$$
S[\phi ]={1\over 2} \sum_{x,\mu } [\phi (x+\mu ) - \phi (x) ]^2 \eqno{(5)}
$$
where $x$ are sites on a periodic $d$ dimensional hypercubic lattice we have, in the deterministic limit,
$$
T=\pmatrix {1&-4\cr 0&1 } \eqno{(6)}
$$
for the sub-block operating in the space spanned by $\phi_1 \pm \phi_2 $. This matrix is similar to a $2\times2$
Jordan block.
When $m \neq 0 $ there are two conjugate eigenvalues, $\exp ({\pm i\theta})$ and $\theta$ is of order $m$. When
the over-relaxation parameter is tuned down from the deterministic limit both eigenvalues decrease in
absolute magnitude until, at some point, they become real. Throughout the range where they are complex they
are at an order $m$ distance from unity in the complex plane. The range the over-relaxation parameter is varying in
is of the same order $m$ also so its effect on the eigenvalues is of the same order as that of the direct mass
perturbation. The main observation is that the variability of the over-relaxation parameter away from the
deterministic limit is not the main mechanism by which the exponent $z$ gets reduced, but rather, it is the
deterministic limit that de ``main job'' of reducing $z$ and the variability of the over-relaxation
parameter only permits the introduction of dissipation (noise) without disrupting this ``achievement'' of the deterministic limit.
It is clear now how one might try to generalize and do even better. We wish to
make the arc AO in Fig. 1 of length $ ~O(m^z)$ with $z$ now less than unity. When there are more sub-lattices one will have
higher dimensional sub-blocks in the matrix $T$ and to get a smaller $z$ we would like to arrange for the sub-block of $T$ containing the zero-mode to be similar to a
matrix with a larger irreducible Jordan block. If the whole matrix is a single block of size $n \times n$ we could get
$z={2 \over n}$. A perturbation of order $m^2$ at every entry of an $ n \times n$ Jordan
block will change its characteristic equation, from $(\lambda -1)^n = 0$ to
$(\lambda - 1)^n = const. \cdot m^2 + ~O(m^2 (1-\lambda))$ and hence $(\lambda-1) \sim ~O(m^{2/n})$\footnote{*}{It is easy to see that
$const. \neq 0$ if a certain matrix element of the perturbation (the lower-left
corner) doesn't vanish; this will happen in the generic case. To get a given
$z={2 \over n}$ the minimal needed size of the Jordan block is $n \times n$.}.
We shall show below that, regretfully, any acceptable $T$ cannot contain a block with $n > 2$.
\eject
Since $H$ is translationally invariant we take it as being block diagonal in the same basis that we assume $T$ is
and we again concentrate only on the block containing the slowest mode. In it we have:
$$
H_0 =\phi_\alpha h_{\alpha , \beta } \phi_\beta ~~~~~~\alpha , \beta = 1,2,.....,n. \eqno{(7)}
$$
In the same block we assume that there exists a nonsingular matrix $S$ such that $S^{-1} T S=J$ where $J_{\alpha , \beta} =
\delta_{\alpha ,\beta} +\delta_{\alpha +1 ,\beta}$ for $\alpha, \beta =1,2,...,n$. Energy conservation, $T^T h T= h$,
implies:
$$
J^T k J = k. \eqno{(8)}
$$
The above equations can be viewed as a set of equations for the matrix elements of $S$ where $h$ is given and $k=S^T
h S$. Explicitly, they read:
$$
k_{\alpha , \beta-1 } +k_{\alpha -1 , \beta } + k_{{\alpha-1}, {\beta-1} }=0
\eqno{(9a)}
$$
$$
k_{0,\alpha} \equiv k_{\alpha , 0} \equiv k_{0,0} \equiv 0 \eqno{(9b)}
$$
The relation between the bilinear forms $h$ and $k$ means that they have equal numbers of positive, vanishing and
negative eigenvalues. On the other hand we know that $h$ has a single zero eigenvalue corresponding to the total
field average and all other eigenvalues are strictly positive. Therefore this must also be true of $k$. The above
equations in conjunction with the semi-definiteness of $k$ lead to a
contradiction if $n > 2$.
To see this let us agree that we shall call a line of elements in the matrix $k$ defined by $\{k_{\alpha \beta} ;~
\alpha + \beta =m\}$ an $m$ ``off-diagonal''. The equation on $k$ implies that the $m$ off-diagonals with $m < n$
vanish. The $n$ off-diagonal also has to vanish because the determinant of $k$ vanishes. The first nonvanishing
off-diagonal has, as a result of the equation for $k$, entries that are equal in magnitude, but alternate in
sign. Since $k$ is symmetric it must be that the $m$ of the first nonvanishing off-diagonal is odd. So $k$ looks like:
\eject
\def\a{\matrix{0&0&0\cr\cdot&\cdot&\cdot\cr\cdot&\cdot&\cdot}}
\def\b{\matrix{0&\cdot&\cdot\cr0&\cdot&\cdot\cr0&\cdot&\cdot}}
\def\c{\matrix{0&0&\gamma\cr0&-\gamma&\star\cr\gamma&\star&\star}}
\def\e{\matrix{\cdot&\cdot&\cdot\cr\cdot&\cdot&\cdot\cr\cdot&\cdot&\cdot}}
\def\d{\matrix{0&0&0\cr0&\cdot&\cdot\cr0&\cdot&\cdot}}
\def\f{\pmatrix{0&0&\gamma\cr0&(-\gamma)&\star\cr\gamma&\star&\star}}
\def\g{\matrix{\star&\star&\star\cr\star&\star&\star\cr\star&\star&\star}}
$$k=\pmatrix{\d&\a&\a&\a&\a&\a\cr
\b&\e&\e&\e&\e&\c\cr
\b&\e&\e&\e&\e&\g\cr
\b&\e&\e&\f&\g&\g\cr
\b&\e&\e&\g&\g&\g\cr
\b&\c&\g&\g&\g&\g\cr}
\eqno{(10)}
$$
The determinants of the sub-matrices of size $l \times l$ ($l$ odd),
containing a piece of the original nonvanishing $m$ off-diagonal as an $l$ off-diagonal must all be positive. This cannot be true because their signs alternate with $l$. The
single way out is that the largest $l$ available be $l=1$ and this happens in ordinary over-relaxation, when $n=2$.
The previous argument also excludes the possibility that the sub-block of $T$ containing the zero mode is not fully reducible but only has a further sub-block of Jordan form with $n > 2$. One simply applies the above logic to the appropriate principal sub-matrix of $k$
(by ``principal'' we mean that it contains a piece of the original diagonal as its diagonal). This sub-matrix must also be positive semi-definite (it could no longer have a zero mode, but this doesn't help) and the above argument applies again.
We implicitly assumed in the above that $S$ is real; we know that $T$ is real but $S$ will be complex if the
eigenvalues of $T$ have imaginary parts. It is easy to see that in this case the equations $k$ has to satisfy are
even more restrictive and again no solution exists.
In conclusion we have shown that, quite generally, one cannot come up with a deterministic local algorithm whose
$z$ would be lower than unity in the free field case. Intuitively, since a local algorithm must change
just one spin in a vicinity at a time, we would expect that the time to reverse a long wavelength excitation would grow at least linearly with wavelength [6]. Therefore the result seems reasonable. The main new point of view presented in this note is that over-relaxation works
just because it is a delicate deformation of a deterministic algorithm.\footnote{*}{As shown by Creutz [7], this deformation may not be needed at all in practice.} This puts over-relaxation closer conceptually to the Hybrid Monte Carlo algorithms [2,8] currently in use - the single difference is that here the system itself supplies also the conjugate momenta, while in Hybrid Monte Carlo, extra variables are introduced. For example, the zero mode of the
field ($\Phi$) in Hybrid Monte Carlo is coupled to the external conjugate variable ($\Pi$) by
a $2\times 2$ matrix, which, in the massless limit, with leap frog updating, takes the form:
$$
T=\pmatrix {1&{\delta \tau }\cr 0&1 } \eqno{(11)}
$$
Here the basis is $\Pi, \Phi$ and $\delta \tau$ is the time step. This equation
is the direct analogue of the corresponding equation in the over-relaxed case (eq. (6)). We have little doubt that the main conclusion regarding the dynamical critical exponent $z$ of the present note holds for all possible local generalizations of Hybrid Monte Carlo algorithms as well.
\vfill
\eject
\noindent {\bf Acknowledgements.} This research was supported in part by the DOE under grant \# DE-FG05-90ER40559. We would like to thank Gyan Bhanot, Urs Heller and Sorin Solomon for useful comments.
\vskip 1cm
\leftline {\bf References}
\vskip .5cm
\item {[1]} S. Adler, {\bf Nucl. Phys. B. (Proc. Suppl.) 9} (1989) 437.
\item {[2]} A. Kennedy, {\bf Nucl. Phys. B. (Proc. Suppl.) 4} (1988) 576.
\item {[3]} U. Heller, H. Neuberger, {\bf Phys. Rev. D39} (1989) 616.
\item {[4]} P. C. Hohenberg, B. I. Halperin, {\bf Rev. Mod. Phys. 49} (1977) 435.
\item {[5]} S. Adler, {\bf Phys. Rev. D37} (1988) 458; H. Neuberger, {\bf
Phys. Rev. Lett. 59} (1987) 1877.
\item {[6]} A. Sokal, {\bf Nucl. Phys. B. (Proc. Suppl.) 20} (1991) 55.
\item {[7]} M. Creutz, {\bf Phys. Rev. D36} (1987) 515.
\item {[8]} A. Kennedy, B. Pendleton, {\bf Nucl. Phys. B. (Proc. Suppl.) 20} (1991) 118.
\vskip 1cm
\leftline {\bf Figure Caption}
\vskip .5cm
\noindent\bf {Fig. 1: }\sl{$T$-eigenvalue movement in the complex plane: On $OA$ and $OB$ the over-relaxation parameter is set to its deterministic limit and the mass is increased from $m=0$ corresponding to $O$. On $AC$ and $BC$ the mass is fixed to a non-zero value and the over-relaxation parameter is varied away from the deterministic limit introducing dissipation. The whole line ACB is at a distance of order $m$ from $O$, a property inherited from the point $A$.}
\vfill \eject \input psfig \centerline{\psfig{figure=ky19.ps}} \vskip .5cm \centerline{\bf Fig. 1} \vfill \eject
\bye
--------------------------END OF TeX FILE-------------------------------------
THIS SPACE SEPARATES THE TeX FILE FROM THE ps FILE USED FOR FIG.1
--------------------------BEGINNING OF ps FILE--------------------------------
%!
%%Title: ky19.fig
%%Creator: fig2dev
%%CreationDate: Mon Oct 14 09:03:13 1991
%%For: bathas@upquark.rutgers.edu (George Bathas)
%%Pages: 0
%%BoundingBox: 0 0 216 216
%%EndComments
/$F2psDict 32 dict def
$F2psDict begin
$F2psDict /mtrx matrix put
/DrawEllipse {
/endangle exch def
/startangle exch def
/yrad exch def
/xrad exch def
/y exch def
/x exch def
/savematrix mtrx currentmatrix def
x y translate xrad yrad scale 0 0 1 startangle endangle arc
savematrix setmatrix
} def newpath 0 0 0 0 0 1 DrawEllipse stroke
/DrawSplineSection {
/y3 exch def
/x3 exch def
/y2 exch def
/x2 exch def
/y1 exch def
/x1 exch def
/xa x1 x2 x1 sub 0.666667 mul add def
/ya y1 y2 y1 sub 0.666667 mul add def
/xb x3 x2 x3 sub 0.666667 mul add def
/yb y3 y2 y3 sub 0.666667 mul add def
x1 y1 lineto
xa ya xb yb x3 y3 curveto
} def
end
/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
/$F2psEnd {$F2psEnteredState restore end} def
%%EndProlog
$F2psBegin
1 setlinecap 1 setlinejoin
-180 126 translate
0.000000 216.000000 translate 0.900 -0.900 scale
3.000 setlinewidth
newpath 313.375 259.000 85.625 -39.966 39.966 arc
stroke
1.000 setlinewidth
% Ellipse
newpath 319 259 81 81 0 360 DrawEllipse stroke
% Polyline
newpath 369 264 moveto 389 264 lineto stroke
stroke
newpath 381.000 262.000 moveto 389.000 264.000 lineto 381.000 266.000 lineto stroke
% Polyline
newpath 369 254 moveto 349 254 lineto stroke
stroke
newpath 357.000 256.000 moveto 349.000 254.000 lineto 357.000 252.000 lineto stroke
% Polyline
newpath 319 379 moveto 319 139 lineto stroke
stroke
newpath 317.000 147.000 moveto 319.000 139.000 lineto 321.000 147.000 lineto stroke
% Polyline
newpath 199 259 moveto 439 259 lineto stroke
stroke
newpath 431.000 257.000 moveto 439.000 259.000 lineto 431.000 261.000 lineto stroke
% Open spline
newpath 379.000 204.000 moveto 376.500 211.500 lineto
376.500 211.500 374.000 219.000 371.500 229.000 DrawSplineSection
371.500 229.000 369.000 239.000 369.000 246.500 DrawSplineSection
369.000 254.000 lineto stroke
% Open spline
newpath 379.000 314.000 moveto 376.500 306.500 lineto
376.500 306.500 374.000 299.000 371.500 289.000 DrawSplineSection
371.500 289.000 369.000 279.000 369.000 271.500 DrawSplineSection
369.000 264.000 lineto stroke
/Palatino-BoldItalic findfont 18.000 scalefont setfont
409 284 moveto
1 -1 scale
(O) gsave 0.000 rotate show grestore 1 -1 scale
/Palatino-BoldItalic findfont 18.000 scalefont setfont
344 284 moveto
1 -1 scale
(C) gsave 0.000 rotate show grestore 1 -1 scale
/Palatino-BoldItalic findfont 18.000 scalefont setfont
379 334 moveto
1 -1 scale
(A) gsave 0.000 rotate show grestore 1 -1 scale
/Palatino-BoldItalic findfont 18.000 scalefont setfont
379 199 moveto
1 -1 scale
(B) gsave 0.000 rotate show grestore 1 -1 scale
$F2psEnd
----------------------------END OF ps FILE-----------------------------------
ENDBODY