In [85], a general method has been proposed to create random data which fulfill specified constraints. With this method, the artefacts and remaining imprecision of the Fourier based randomization schemes can be avoided by specifying the autocorrelation function rather than the Fourier transform. The former does not assume periodic continuation. Maybe more importantly, the restriction to a rather narrow null hypothesis can be relaxed since in principle arbitrary statistical observables can be imposed on the surrogates. A desired property of the data has to be formulated in terms of a cost function which assumes an absolute minimum when the property is fulfilled. States arbitrarily close to this minimal cost can be reached by the method of simulated annealing. The cost function is minimised among all possible permutations of the data. See [85] for a description of the approach.

The TISEAN package contains the building blocks for a library of surrogate data routines implementing user specified cost functions. Currently, only the autocorrelation function with and without periodic continuation have been implemented. Further, a template is given from which the user may derive her/his own routines. A module is provided that drives the simulated annealing process through an exponential cooling scheme. The user may replace this module by other scheme of her/his choice. A module that performs random pair permutations is given which allows to exclude a list of points from the permutation scheme. More sophisticated permutation schemes can be substituted if desired. Most importantly, the cost function has to be given as another module. The autocorrelation modules use , where is the autocorrelation function with or without periodic continuation.

Figure:Upper trace: Data from a stationary Gaussian linear stochastic process () measured by . Samples 200-220 are an artefact. With the Fourier based scheme (middle trace) the artefact results in an increased number of spikes in the surrogates and reduced predictability. In the lower trace, the artefact has been preserved along with the distribution of values and lags 1,...,25 of the autocorrelation function.

In Fig. we show an example fulfilling the null hypothesis of a rescaled stationary Gaussian linear stochastic process which has been contaminated by an artefact at samples 200-220. The Fourier based schemes are unable to implement the artefact part of the null hypothesis. They spread the structure given by the artefact evenly over the whole time span, resulting in more spikes and less predictability. In fact, the null hypothesis of a stationary rescaled Gaussian linear stochastic process can be rejected at the 95% level of significance using nonlinear prediction errors. The artefact would spuriously be mistaken for nonlinearity. With the program randomize_auto_exp_random, we can exclude the artefact from the randomization scheme and obtain a correct test.

Figure:Randomization of 500 points generated by the the Hénon map.(a)Original data;(b)Same autocorrelations and distribution;(c)-(f)Different stages of annealing with a cost functionCinvolving three and four-point correlations.(c)A random shuffle,C=2400;(d)C=150;(e)C=15;(f)C=0.002. See text.

As an example of a more exotic cost function, let us show the randomization
of 500 iterates of the Hénon map, Fig. **(a)**. Panel **
(b)** shows the output of surrogates having the same spectrum and
distribution. Starting from a random permutation **(c)**, the cost function

is minimized (randomize_generic_exp_random). It involves are all the higher order
autocorrelations which would be needed for a least squares fit with the ansatz
and in this sense fully specifies the quadratic
structure of the data. The random shuffle yields *C*=2400, panels
**(c)**-**(f)** correspond to *C*=150,15,0.002 respectively.

Since the annealing process can be very CPU time consuming, it is important to
provide efficient code for the cost function. Specifying
lags for *N* data points requires
multiplications for the calculation of the
cost function. An update after a pair has been exchanged, however, can be
obtained with multiplications. Often, the
full sum or supremum can be truncated since after the first terms it
is clear that a large increase of the cost is unavoidable. The driving
Metropolis algorithm provides the current maximal permissable cost for that
purpose.

The computation time required to reach the desired accuracy depends on the
choice and implementation of the cost function but also critically on the
annealing schedule. There is a vast literature on simulated annealing which
cannot be reviewed here. Experimentation with cooling schemes should keep in
mind the basic concept of simulated annealing. At each stage, the system -
here the surrogate to be created - is kept at a certain ``temperature''.
Like in thermodynamics, the temperature determines how likely fluctuations
around the mean energy - here the value of the cost function *C* - are. At
temperature *T*, a deviation of size occurs with the Boltzmann
probability . In a Metropolis simulation, this is
achieved by accepting *all* downhill changes (), but also
uphill changes with probability . Here the changes are
permutations of two randomly selected data items. The present implementation
offers an exponential cooling scheme, that is, the temperature is lowered by a
fixed factor whenever one of two conditions is fulfilled: Either a specified
number of changes has been *tried*, or a specified number of changes has
been *accepted*. Both these numbers and the cooling factor can be chosen
by the user. If the state is cooled too fast it gets stuck, or ``freezes'' in
a false minimum. When this happens, the system must be ``melted'' again and
cooling is taken up at a slower rate. This can be done automatically until a
goal accuracy is reached. It is, however, difficult to predict how many steps
it will take. The detailed behavior of the scheme is still subject to ongoing
research and in all but the simplest cases, experimentation by the user will
be necessary. To facilitate the supervision of the cooling, the current state
is written to a file whenever a substantial improvement has been
made. Further, the verbosity of the diagnostic output can be selected.

Wed Jan 6 15:38:27 CET 1999