Before examining the effect of statistical errors in histogram reweighting,
it is instructive to review our understanding of errors in MC simulations.
Because of the finite number of measurements, any quantity measured in a
simulation will suffer from statistical and systematic errors [4].
This is further complicated by the fact that the measurements
are not, in general, independent. The first careful study of statistical
errors in MC simulations was performed by Müller-Krumbhaar and Binder
[2] more than 20 years ago. They considered the statistical error in
the average value of some quantity $f$ measured in a simulation. If
$f_i$ is the value of $f$ at the $i$-th step of the simulation, the average
value of $f$, $\langle f\rangle$, computed from a simulation consisting of $N$
measurements (after discarding a sufficient number of measurements for
equilibration), is

$$\langle f\rangle = \frac{1}{N}\sum_{i=1}^{N} f_i . \tag{1}$$
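To calculate the statistical error in $\langle f\rangle$, Müller-Krumbhaar and Binder started with the expression for the variance of a sum of $N$ correlated random variables[8],

$$(\delta\langle f\rangle)^2 = \frac{1}{N^2}\sum_{i=1}^{N}\left(\langle f_i^2\rangle - \langle f_i\rangle^2\right) + \frac{2}{N^2}\sum_{i=1}^{N}\sum_{j=i+1}^{N}\left(\langle f_i f_j\rangle - \langle f_i\rangle\langle f_j\rangle\right),$$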
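and related the covariance term to a sum of time-displaced averages. Their final expression for the square of the statistical error is traditionally written as

$$(\delta\langle f\rangle)^2 = \frac{1}{N}\left(\langle f^2\rangle - \langle f\rangle^2\right)\left(1 + 2\tau_f\right),$$

where $\tau_f$ is the integrated correlation time[9] for the
quantity $f$,

$$\tau_f = \sum_{t=1}^{N-1}\left(1 - \frac{t}{N}\right)\phi_f(t),$$

and $\phi_f(t)$ is the time-displaced autocorrelation function

$$\phi_f(t) = \frac{\langle f(0)\,f(t)\rangle - \langle f\rangle^2}{\langle f^2\rangle - \langle f\rangle^2}.$$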
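The Müller-Krumbhaar and Binder error estimate can be evaluated directly from a stored time series of measurements. The following is a minimal sketch rather than the authors' code: the function names and the truncation window `t_max` are assumptions introduced here (in practice the sum over the time displacement is cut off once the autocorrelation function has decayed into the noise), and the series is assumed not to be constant.

```python
import numpy as np

def integrated_time(f, t_max):
    """Integrated correlation time: sum over t = 1..t_max of (1 - t/N) * phi_f(t).

    t_max is a hypothetical truncation window chosen by the user.
    """
    f = np.asarray(f, dtype=float)
    n = f.size
    df = f - f.mean()
    c0 = np.mean(df * df)                 # <f^2> - <f>^2 (must be non-zero)
    tau = 0.0
    for t in range(1, t_max + 1):
        # time-displaced average over the N - t available pairs
        phi = np.mean(df[:-t] * df[t:]) / c0
        tau += (1.0 - t / n) * phi
    return tau

def error_of_mean(f, t_max):
    """Statistical error of the mean: sqrt(variance / N * (1 + 2 * tau_f))."""
    f = np.asarray(f, dtype=float)
    var = np.mean((f - f.mean()) ** 2)
    return np.sqrt(var / f.size * (1.0 + 2.0 * integrated_time(f, t_max)))
```

With `t_max = 0` the correlation time vanishes and `error_of_mean` reduces to the familiar estimate for independent measurements.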

Our description of statistical errors in histogram reweighting will
follow the Müller-Krumbhaar--Binder formalism rather closely. To
see how this is possible, we first point out that MC data can be
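reweighted without using histograms. Consider an MC simulation
performed at inverse temperature $\beta_0 = 1/k_B T_0$. The average value of some
quantity $f$ calculated using the single histogram method is

$$\langle f\rangle_\beta = \frac{\sum_E f(E)\,H(E)\,e^{-\Delta\beta E}}{\sum_E H(E)\,e^{-\Delta\beta E}}, \tag{4}$$

where $\Delta\beta = \beta - \beta_0$ and $E$ is the total energy of
a configuration. The histogram $H(E)$ is constructed from the time sequence
of energies $E_i$ generated during the simulation,

$$H(E) = \sum_{i=1}^{N}\delta_{E,E_i}, \tag{5}$$

where $\delta_{E,E_i}$ is the Kronecker delta function and the sum runs over the $N$
measurements made during the simulation. By inserting the definition
of $H(E)$, (5), into (4), and performing the sum over
$E$ first, we get the equation for ``reweighting on the fly'', or
reweighting without histograms,

$$\langle f\rangle_\beta = \frac{\sum_{i=1}^{N} f_i\,e^{-\Delta\beta E_i}}{\sum_{i=1}^{N} e^{-\Delta\beta E_i}}. \tag{6}$$

When $\Delta\beta = 0$, this reduces to the standard expression for the
average of a quantity (1). The ``reweighting on the fly''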
approach is useful for analyzing data requiring a multi-dimensional
histogram, or for continuous systems to avoid the need to bin the data.
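Reweighting on the fly amounts to a weighted average over the stored time series. The sketch below assumes the reweighting factor has the usual Boltzmann form exp(-delta_beta * E_i), with `delta_beta` the shift in inverse temperature away from the simulated one; the function name and argument names are hypothetical, chosen here for illustration.

```python
import numpy as np

def reweight_on_the_fly(f, energy, delta_beta):
    """Reweighted average sum_i f_i w_i / sum_i w_i, with w_i = exp(-delta_beta * E_i).

    f and energy are equal-length time series from a single simulation.
    """
    f = np.asarray(f, dtype=float)
    energy = np.asarray(energy, dtype=float)
    log_w = -delta_beta * energy
    # Subtract the largest exponent before exponentiating: the common factor
    # cancels between numerator and denominator and avoids overflow.
    w = np.exp(log_w - log_w.max())
    return np.sum(f * w) / np.sum(w)
```

Because no binning is involved, the same function works unchanged for continuous energies or for quantities `f` that are not functions of the energy alone.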
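To simplify the formalism, we define a ``curly-bracket'' notation for
averages that include the reweighting factor $e^{-\Delta\beta E}$. Each
term inside the $\{\;\}$ carries along a reweighting factor. Examples of this
notation are:

$$\{f\} = \frac{1}{N}\sum_{i=1}^{N} f_i\,e^{-\Delta\beta E_i}, \qquad \{1\} = \frac{1}{N}\sum_{i=1}^{N} e^{-\Delta\beta E_i},$$

$$\{f^2\} = \frac{1}{N}\sum_{i=1}^{N} f_i^2\,e^{-2\Delta\beta E_i}, \qquad \{f(0)f(t)\} = \frac{1}{N-t}\sum_{i=1}^{N-t} f_i\,e^{-\Delta\beta E_i}\,f_{i+t}\,e^{-\Delta\beta E_{i+t}}.$$

Note that these averages can also be calculated using the corresponding histograms. The single-histogram equation itself (4) expressed in this notation is

$$\langle f\rangle_\beta = \frac{\{f\}}{\{1\}}.$$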
The analysis of errors is complicated by the fact that once reweighting has
been performed, both the numerator $\{f\}$ and the denominator $\{1\}$ in
(6) will suffer from statistical error (in (1), the
denominator is simply the number of measurements, $N$, which has no error).
We represent the square of these errors by $(\delta\{f\})^2$ and $(\delta\{1\})^2$,
respectively. In addition, we expect that the error in the numerator is
correlated with that of the denominator because both are calculated from the
same set of measurements. It is important to note that this correlation is
present even if there is no correlation between measurements during the
simulation.
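If $\{f\}$ and $\{1\}$ were independent, the square of the
statistical error in $\langle f\rangle_\beta$ would be given by

$$(\delta\langle f\rangle_\beta)^2 = \left(\frac{\partial\langle f\rangle_\beta}{\partial\{f\}}\right)^2(\delta\{f\})^2 + \left(\frac{\partial\langle f\rangle_\beta}{\partial\{1\}}\right)^2(\delta\{1\})^2$$

or

$$(\delta\langle f\rangle_\beta)^2 = \frac{(\delta\{f\})^2}{\{1\}^2} + \frac{\{f\}^2\,(\delta\{1\})^2}{\{1\}^4}, \tag{7}$$

which is the standard expression for the propagation of error in a function
of two independent variables [10]. However, because $\{f\}$ and $\{1\}$
are not independent, (7) is not correct,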
and in fact overestimates the true error. To properly take this
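correlation into account, we must include the covariance
$\mathrm{cov}(\{f\},\{1\})$. The
correct expression for the square of the statistical error in $\langle f\rangle_\beta$ is
then given by

$$(\delta\langle f\rangle_\beta)^2 = \left(\frac{\partial\langle f\rangle_\beta}{\partial\{f\}}\right)^2(\delta\{f\})^2 + \left(\frac{\partial\langle f\rangle_\beta}{\partial\{1\}}\right)^2(\delta\{1\})^2 + 2\,\frac{\partial\langle f\rangle_\beta}{\partial\{f\}}\,\frac{\partial\langle f\rangle_\beta}{\partial\{1\}}\,\mathrm{cov}(\{f\},\{1\})$$

or

$$(\delta\langle f\rangle_\beta)^2 = \frac{(\delta\{f\})^2}{\{1\}^2} + \frac{\{f\}^2\,(\delta\{1\})^2}{\{1\}^4} - \frac{2\,\{f\}\,\mathrm{cov}(\{f\},\{1\})}{\{1\}^3}.$$

The square of the relative error in $\langle f\rangle_\beta$ takes on a particularly
simple form:

$$\left(\frac{\delta\langle f\rangle_\beta}{\langle f\rangle_\beta}\right)^2 = \frac{(\delta\{f\})^2}{\{f\}^2} + \frac{(\delta\{1\})^2}{\{1\}^2} - \frac{2\,\mathrm{cov}(\{f\},\{1\})}{\{f\}\,\{1\}}.$$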
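One way to see where the covariance term comes from is a first-order expansion of a ratio. As a sketch, write the numerator and denominator as $A$ and $B$ (placeholder symbols introduced here, not the original notation), with small fluctuations $\delta A$ and $\delta B$ about their mean values:

```latex
\begin{align*}
\frac{A+\delta A}{B+\delta B}
  &\approx \frac{A}{B}\left(1 + \frac{\delta A}{A} - \frac{\delta B}{B}\right), \\
\left\langle\left(\frac{\delta(A/B)}{A/B}\right)^{2}\right\rangle
  &\approx \frac{\langle\delta A^{2}\rangle}{A^{2}}
   + \frac{\langle\delta B^{2}\rangle}{B^{2}}
   - \frac{2\,\langle\delta A\,\delta B\rangle}{A\,B}.
\end{align*}
```

The last term is the covariance contribution. It vanishes only when the fluctuations of $A$ and $B$ are uncorrelated, and it lowers the estimated error whenever the covariance has the same sign as the product $AB$.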

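To facilitate this derivation, let us consider the covariance $\mathrm{cov}(R,Q)$
of
two arbitrary functions $R$ and $Q$ which have the same form as $\{f\}$
and $\{1\}$:

$$R = \frac{1}{N}\sum_{i=1}^{N} r_i\,w_i, \qquad Q = \frac{1}{N}\sum_{i=1}^{N} q_i\,w_i, \qquad w_i = e^{-\Delta\beta E_i}.$$

From this, we can then easily calculate $(\delta\{f\})^2$, $(\delta\{1\})^2$
and $\mathrm{cov}(\{f\},\{1\})$
by replacing the functions $r$ and $q$ with $f$ and 1
appropriately. By generalizing the analysis of Müller-Krumbhaar and
Binder, we can define the covariance $\mathrm{cov}(R,Q)$
as

$$\mathrm{cov}(R,Q) = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left[\langle r_i w_i\,q_j w_j\rangle - \langle r_i w_i\rangle\langle q_j w_j\rangle\right]. \tag{9}$$

The double sum over $i$ and $j$ can be replaced by a single sum of time-displaced averages,

$$\sum_{i=1}^{N}\sum_{j=1}^{N}\langle r_i w_i\,q_j w_j\rangle = N\,\{rq\} + \sum_{t=1}^{N-1}(N-t)\left[\{r(0)q(t)\} + \{q(0)r(t)\}\right],$$

where $t$ is the time displacement index. The covariance (9) is thus expressed as

$$\mathrm{cov}(R,Q) = \frac{1}{N}\left[\{rq\} - \{r\}\{q\}\right] + \frac{1}{N}\sum_{t=1}^{N-1}\left(1 - \frac{t}{N}\right)\left[\{r(0)q(t)\} + \{q(0)r(t)\} - 2\,\{r\}\{q\}\right].$$

To complete the generalization of the Müller-Krumbhaar--Binder formalism, we define a reweighted time-displaced cross-correlation function

$$\phi_{rq}(t) = \frac{\tfrac{1}{2}\left[\{r(0)q(t)\} + \{q(0)r(t)\}\right] - \{r\}\{q\}}{\{rq\} - \{r\}\{q\}}$$

and a reweighted correlation time

$$\tau_{rq} = \sum_{t=1}^{N-1}\left(1 - \frac{t}{N}\right)\phi_{rq}(t)$$

to finally obtain

$$\mathrm{cov}(R,Q) = \frac{1}{N}\left(\{rq\} - \{r\}\{q\}\right)\left(1 + 2\tau_{rq}\right). \tag{13}$$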
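With the proper substitutions for $r$ and $q$ in (13) we can now
evaluate $(\delta\{f\})^2$, $(\delta\{1\})^2$
and $\mathrm{cov}(\{f\},\{1\})$:

$$(\delta\{f\})^2 = \frac{1}{N}\left(\{f^2\} - \{f\}^2\right)\left(1 + 2\tau_{ff}\right),$$

$$(\delta\{1\})^2 = \frac{1}{N}\left(\{1^2\} - \{1\}^2\right)\left(1 + 2\tau_{11}\right),$$

$$\mathrm{cov}(\{f\},\{1\}) = \frac{1}{N}\left(\{f\cdot 1\} - \{f\}\{1\}\right)\left(1 + 2\tau_{f1}\right),$$

where, because each term carries its own reweighting factor, $\{1^2\} = \frac{1}{N}\sum_i e^{-2\Delta\beta E_i}$ and $\{f\cdot 1\} = \frac{1}{N}\sum_i f_i\,e^{-2\Delta\beta E_i}$,
and the relative error in $f$

$$\left(\frac{\delta\langle f\rangle_\beta}{\langle f\rangle_\beta}\right)^2 = \frac{1}{N}\left[\frac{\{f^2\} - \{f\}^2}{\{f\}^2}\left(1 + 2\tau_{ff}\right) + \frac{\{1^2\} - \{1\}^2}{\{1\}^2}\left(1 + 2\tau_{11}\right) - 2\,\frac{\{f\cdot 1\} - \{f\}\{1\}}{\{f\}\{1\}}\left(1 + 2\tau_{f1}\right)\right]. \tag{14}$$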
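The reweighted error estimate can be sketched in a few lines of code. This is an illustration under stated assumptions, not the authors' implementation: the function names, the truncation window `t_max`, and the symmetrized treatment of the time-displaced cross terms are choices made here, and the reweighting factor is taken to be exp(-delta_beta * E_i). The sum c(0) + 2 * sum_t (1 - t/N) c(t) is accumulated directly, which equals c(0) * (1 + 2 * tau) but avoids dividing by a c(0) that may be close to zero.

```python
import numpy as np

def reweighted_cov(r, q, w, t_max):
    """Estimate the covariance of the reweighted averages of r and q.

    Computes (1/N) * [c(0) + 2 * sum_{t=1}^{t_max} (1 - t/N) * c(t)], where c(t)
    is the time-displaced covariance of r_i * w_i and q_i * w_i.
    """
    a = np.asarray(r, dtype=float) * w    # r_i carrying its reweighting factor
    b = np.asarray(q, dtype=float) * w    # q_i carrying its reweighting factor
    n = a.size
    da, db = a - a.mean(), b - b.mean()
    total = np.mean(da * db)              # equal-time term
    for t in range(1, t_max + 1):
        # symmetrized time-displaced cross term
        c_t = 0.5 * (np.mean(da[:-t] * db[t:]) + np.mean(db[:-t] * da[t:]))
        total += 2.0 * (1.0 - t / n) * c_t
    return total / n

def reweighted_relative_error(f, energy, delta_beta, t_max):
    """Relative statistical error of the reweighted average of f."""
    f = np.asarray(f, dtype=float)
    energy = np.asarray(energy, dtype=float)
    log_w = -delta_beta * energy
    w = np.exp(log_w - log_w.max())       # common factor cancels in every ratio
    one = np.ones_like(f)
    num, den = np.mean(f * w), np.mean(w)
    rel2 = (reweighted_cov(f, f, w, t_max) / num**2
            + reweighted_cov(one, one, w, t_max) / den**2
            - 2.0 * reweighted_cov(f, one, w, t_max) / (num * den))
    return np.sqrt(max(rel2, 0.0))        # guard against round-off below zero
```

A useful sanity check is a constant `f`: its reweighted average has no error at all, and the three terms cancel exactly only because the covariance term is included.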
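The error in a thermodynamic quantity like the energy can be obtained
simply by replacing $f$ with $E$ in (14). To compute the
error in a response function, for example the specific heat, we need
to find the appropriate function $c$
whose average value gives us
the desired quantity. For the specific heat, the function is

$$c = k_B\,\beta^2\left(E - \langle E\rangle_\beta\right)^2$$

so that the specific heat $C$ is given by

$$C = \langle c\rangle_\beta = \frac{\{c\}}{\{1\}}.$$

The quantities of interest are then:

$$(\delta\{c\})^2 = \frac{1}{N}\left(\{c^2\} - \{c\}^2\right)\left(1 + 2\tau_{cc}\right), \qquad \mathrm{cov}(\{c\},\{1\}) = \frac{1}{N}\left(\{c\cdot 1\} - \{c\}\{1\}\right)\left(1 + 2\tau_{c1}\right).$$
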
The time-displaced correlation functions are quite complex. For example, to calculate

for the specific heat, we need
which is given by

and
which is given by

where

A different approach to calculating the error in $C$ is to make two
independent passes through the data, calculating $\langle E\rangle_\beta$
in the first
pass, then directly evaluating $c$
in the second. We found
that this second approach is easier to implement and is more stable
numerically.
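The two-pass approach can be sketched as follows. The normalization is an assumption made here for illustration: the specific heat is taken as k_B * beta^2 times the energy variance of the whole system (divide by the number of sites for a per-site value), and the function name and arguments are hypothetical.

```python
import numpy as np

def specific_heat_two_pass(energy, beta, beta0, k_B=1.0):
    """Return (C, c): the reweighted specific heat and the per-measurement series c_i.

    energy is the time series from a simulation at inverse temperature beta0;
    beta is the inverse temperature being reweighted to.
    """
    energy = np.asarray(energy, dtype=float)
    log_w = -(beta - beta0) * energy
    w = np.exp(log_w - log_w.max())       # common factor cancels in the ratios
    # Pass 1: reweighted mean energy at the target beta.
    e_mean = np.sum(energy * w) / np.sum(w)
    # Pass 2: evaluate c_i = k_B * beta^2 * (E_i - <E>)^2 for every measurement.
    c = k_B * beta**2 * (energy - e_mean) ** 2
    return np.sum(c * w) / np.sum(w), c
```

The returned series `c` can then be treated like any other measured quantity when estimating its variance, covariance with the weights, and correlation times.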
The expressions for the error have two different kinds of terms: some that
depend on the simulation algorithm used, containing the reweighted
correlation times, and others that represent equilibrium averages and are
therefore independent of the simulation algorithm. However, unlike the
non-reweighted case, we cannot simply factor out the correlation time
dependence; this will lead to non-trivial differences in how the error
increases with $|\Delta\beta|$
when we change from one simulation algorithm to
another! Examples making use of the formalism developed here are given in
the next two sections.