next up previous
Next: Computational issues of simulated Up: General constrained randomisation Previous: General constrained randomisation

Null hypotheses, constraints, and cost functions

As we have discussed previously, we will often have to specify a null hypothesis in terms of a complete set of observable properties of the data. Only in specific cases (e.g. the two point autocorrelation function), there is a one-to-one correspondence to a class of models (here the ARMA process). In any case, if tex2html_wrap_inline1962 denotes a surrogate time series, the constraints will most often be of (or can be brought into) the form
 equation1064
Such constraints can always be turned into a cost function
 equation1066
The fact that tex2html_wrap_inline2102 has a global minimum when the constraints are fulfilled is unaffected by the choice of the weights tex2html_wrap_inline2104 and the order q of the average. The least squares or tex2html_wrap_inline2108 average is obtained at q=2, tex2html_wrap_inline2112 at q=1 and the maximum distance when tex2html_wrap_inline2116. Geometric averaging is also possible (and can be formally obtained by taking the limit tex2html_wrap_inline2118 in a proper way). We have experimented with different choices of q but we haven't found a choice that is uniformly superior to others. It seems plausible to give either uniform weights or to enhance those constraints which are particularly difficult to fulfil. Again, conclusive empirical results are still lacking.

Consider as an example the constraint that the sample autocorrelation function of the surrogate tex2html_wrap_inline2122 (data rescaled to zero mean and unit variance) are the same as those of the data, tex2html_wrap_inline2124. This is done by specifying zero discrepancy as a constraint tex2html_wrap_inline2126. If the correlations decay fast, tex2html_wrap_inline2128 can be restricted, otherwise tex2html_wrap_inline2130 (the largest available lag). Thus, a possible cost function could read
 equation1072
Other choices of q and the weights are of course also possible.

In all the cases considered in this paper, one constraint will be that the surrogates take on the same values as the data but in different time order. This ensures that data and surrogates can equally likely be drawn from the same (unknown) single time probability distribution. This particular constraint is not included in the cost function but identically fulfilled by considering only permutations without replacement of the data for minimisation.

By introducing a cost function, we have turned a difficult nonlinear, high dimensional root finding problem (21) into a minimisation problem (22). This leads to extremely many false minima whence such a strategy is discouraged for general root finding problems [42]. Here, the situation is somewhat different since we need to solve Eq.(21) only over the set of all permutations of tex2html_wrap_inline1972. Although this set is big, it is still discrete and powerful combinatorial minimisation algorithms are available that can deal with false minima very well. We choose to minimise tex2html_wrap_inline2102 among all permutations tex2html_wrap_inline1962 of the original time series tex2html_wrap_inline1972 using the method of simulated annealing. Configurations are updated by exchanging pairs in tex2html_wrap_inline1962. The annealing scheme will decide which changes to accept and which to reject. With an appropriate cooling scheme, the annealing procedure can reach any desired accuracy. Apart from simulated annealing, genetic algorithms [35] have become very popular for this kind of problems and there is no reason why they couldn't be used for the present purpose as well.


next up previous
Next: Computational issues of simulated Up: General constrained randomisation Previous: General constrained randomisation

Thomas Schreiber
Mon Aug 30 17:31:48 CEST 1999