Rainer Hegger |
Holger Kantz |
Thomas Schreiber |

Part II: Linear models and simple prediction

Download the data set

- Inspect the time series visually, e.g. by gnuplot (amount of data, obvious
artefacts, typical time scales, qualitative behaviour on short times)

- Compute the autocorrelation function (corr)

- What is a reasonable order for an AR-model?
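The quantity that corr estimates can be sketched in a few lines of Python. Since amplitude.dat itself is not reproduced here, the snippet uses a synthetic stand-in (an oscillatory AR(2) process); the data and all parameters are illustrative assumptions, not TISEAN internals.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for amplitude.dat: an oscillatory AR(2) process.
n = 5000
x = np.zeros(n)
for t in range(2, n):
    x[t] = 1.8 * x[t - 1] - 0.9 * x[t - 2] + rng.normal()

def autocorr(x, max_lag):
    """Normalized autocorrelation C(tau), tau = 0..max_lag (the quantity corr estimates)."""
    x = x - x.mean()
    var = x @ x / len(x)
    return np.array([x[: len(x) - tau] @ x[tau:] / (len(x) * var)
                     for tau in range(max_lag + 1)])

c = autocorr(x, 100)
# A reasonable AR order is of the order of the lag beyond which C(tau) has decayed noticeably.
print(c[0], c[1])
```

By construction C(0) = 1; the decay (and oscillation) of C(tau) for increasing tau is what guides the choice of the model order.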

Use ar-model to fit AR-models to the data.

Study the residuals, i.e. the differences between the deterministic part of the AR-model and the actually observed next values. Inside gnuplot:

plot [0:1000] '< ar-model amplitude.dat -p10' u ($0+10):1, '< ar-model amplitude.dat -p50' u ($0+50):1

Plot the data also in reversed order (since one curve partly hides the other), and together with amplitude.dat. Read the description of ar-model to understand what you see in the plot, and reduce and increase the order of the model (controlled by the -p option) as far as your patience allows (the computation time increases quadratically with p).

- Result: the residuals have pronounced spikes at certain points of the time
series even for very large order of the model.
This demonstrates that the data do not stem from a linear
stochastic process. Nonetheless, their magnitude compared to the
amplitude of the signal is small. Hence,
if one wants to use a linear model,
p=10 is a reasonable compromise between
model complexity and performance.
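The residual study can be mimicked outside TISEAN with a minimal least-squares AR(p) fit. The snippet below is a sketch on synthetic stand-in data (a linear AR(2) core plus a weak nonlinearity, all made up for illustration; the numbers are not what ar-model reports for amplitude.dat):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-in data: a stable linear AR(2) core plus a weak nonlinearity.
n = 3000
x = np.zeros(n)
for t in range(2, n):
    x[t] = 1.7 * x[t - 1] - 0.8 * x[t - 2] + 0.2 * np.sin(x[t - 1]) + 0.5 * rng.normal()

def fit_ar(x, p):
    """Least-squares fit of x[t] = sum_k a_k x[t-k] + e[t]; returns coefficients and residuals."""
    y = x[p:]
    X = np.column_stack([x[p - k : len(x) - k] for k in range(1, p + 1)])
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a, y - X @ a

# Larger order p shrinks the residuals, but with diminishing returns:
for p in (2, 10, 50):
    a, resid = fit_ar(x, p)
    print(p, resid.std() / x.std())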

- Now use ar-model to produce a new time series:

ar-model -s5000 amplitude.dat -p10 -o

With the -s5000 option, the output file amplitude.dat.ar now contains the iterated model time series of length 5000 (copied to ar.dat in the commands below).
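What iterating the model amounts to, schematically: run the fitted coefficients forward with fresh Gaussian innovations. The coefficients and noise amplitude below are invented for illustration, not taken from a fit to amplitude.dat.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical AR(2) coefficients and residual amplitude, as a fit might return them.
a1, a2, sigma = 1.7, -0.8, 0.5
n = 5000
y = np.zeros(n)
for t in range(2, n):
    # iterate the model with fresh Gaussian innovations
    y[t] = a1 * y[t - 1] + a2 * y[t - 2] + sigma * rng.normal()
# The surrogate has zero mean by construction (up to finite-sample fluctuations),
# and its variance exceeds that of the innovations.
print(y.mean(), y.std())
```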

- Compare the two time series in the time domain.
Also, compute the histograms using the
routine histogram:

mycomputer> histogram amplitude.dat -b0

Using amplitude.dat as datafile, reading column 1

Use 5000 lines.

Writing to stdout

#interval of data: [-1.463000e+01:1.727000e+01]

#average= 1.463300e-01

#standard deviation= 7.994755e+00

The ar-data have zero mean by construction. If you wish to superimpose the two histograms, you should thus shift one with respect to the other by the mean value of the data:

set data style histeps

plot '< histogram amplitude.dat' u ($1-.146):2,'< histogram ar.dat'

Result: The data sets are different: the distribution of ar.dat is closer to a Gaussian (and converges to a Gaussian for longer time series; try plot '< ar-run -l100000 amplitude.dat.ar | histogram').
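The shift-and-superimpose step in Python terms, with hypothetical stand-in data (a skewed "measured" series versus a zero-mean Gaussian surrogate; none of these numbers come from amplitude.dat):

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical stand-ins: a non-Gaussian 'measured' series vs. a Gaussian AR surrogate.
data = rng.exponential(1.0, 5000) - 1.0 + 0.146   # skewed, nonzero mean
ar = rng.normal(0.0, 1.0, 5000)                   # zero mean by construction

# Subtract the sample mean of the data before superimposing the two histograms.
h_data, edges = np.histogram(data - data.mean(), bins=50, density=True)
h_ar, _ = np.histogram(ar, bins=edges, density=True)
print(h_data.sum() * np.diff(edges)[0])   # density normalization: integrates to 1
```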

- Compute the auto-correlation functions and the power
spectra (by either mem_spec or
spectrum) of both of them:

corr amplitude.dat -D500 -o

corr ar.dat -D500 -o

set data style lines

plot 'ar.dat.cor','amplitude.dat.cor'

spectrum amplitude.dat -o

spectrum ar.dat -o

set logscale y

plot 'amplitude.dat_sp','ar.dat_sp'

Result: The AR data reproduce the same temporal correlations, but these decay much faster than in amplitude.dat.

The spectra have to be compared with both linear and logarithmic y-scale. The frequency around 0.03 is dominant in both data sets, the harmonics of that visible in amplitude.dat_sp are suppressed in ar.dat_sp. This reflects that the AR-model contains the relevant time scales, but has shortcomings in a quantitative comparison. However, these are not too dramatic when only viewed with second order statistics. The differences will be more evident in the higher order correlations and other nonlinear concepts.
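A periodogram of the kind spectrum estimates can be sketched with an FFT. The test signal below is an invented stand-in with a dominant frequency near 0.03 and one harmonic, mimicking the structure described above:

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical stand-in: dominant frequency near 0.03 plus a harmonic at 0.06, plus noise.
n = 4096
t = np.arange(n)
x = (np.sin(2 * np.pi * 0.03 * t)
     + 0.3 * np.sin(2 * np.pi * 0.06 * t)
     + 0.1 * rng.normal(size=n))

# Periodogram: squared modulus of the Fourier transform of the mean-subtracted data.
freqs = np.fft.rfftfreq(n)
power = np.abs(np.fft.rfft(x - x.mean()))**2 / n
peak = freqs[np.argmax(power)]
print(round(peak, 3))   # close to the dominant frequency 0.03
```

On a logarithmic y-scale the harmonic at twice the dominant frequency becomes visible above the noise floor, which is the comparison suggested above.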

- Repeat the exercise starting from the ar-data you generated (file ar.dat). You should observe that fitting an AR-model to ar-data yields residuals with a Gaussian distribution, and that the histograms, auto-correlation functions and power spectra of the model data are identical to those of the input data, provided the order of the fit (-p) is not smaller than the order of the model by which the data were produced.

- Visualize both amplitude.dat and ar.dat in a delay embedding (do not forget to reset gnuplot first, e.g., set nologscale y), using delay:

Start with -d1 and increase it, at least up to 50. What is optimal by a) visual impression, and what should be optimal when b) considering the auto-correlation function?

**Answers:**

amplitude.dat: a) About 8, when the unfolding is good but the overlap is still small. b) About 8: the first zero of the autocorrelation function would be optimal for a harmonic, periodic signal embedded in 2 dimensions.

ar.dat: a) for delay 8, the shape of the blob of lines comes close to circular, hence indicating sufficient decorrelation of the components of the delay vectors. b) The auto-correlation function yields about the same as for amplitude.dat.
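What the delay tool constructs, in a few lines: delay vectors (x[t], x[t-d], ...). The quarter-period delay that makes a pure sine's two-dimensional embedding exactly circular illustrates answer a) above; the signal here is a hypothetical noise-free sine, not amplitude.dat.

```python
import numpy as np

def delay_embed(x, dim, d):
    """Delay vectors (x[t], x[t+d], ..., x[t+(dim-1)d]), one per row."""
    n = len(x) - (dim - 1) * d
    return np.column_stack([x[k * d : k * d + n] for k in range(dim)])

t = np.arange(1000)
x = np.sin(2 * np.pi * t / 100)   # hypothetical periodic signal, period 100
emb = delay_embed(x, 2, 25)       # d = quarter period: components are sin/cos pairs
print(emb.shape)                  # (975, 2); the points lie on the unit circle
```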

- compute the false nearest neighbour statistics
(false_nearest):

false_nearest amplitude.dat -M8 -d8 -o -t200 -f5

Study the output, amplitude.dat.fnn, and observe the invariance of the result (namely that the embedding dimension 3 is insufficient but 4 is o.k.) under change of the time lag.
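The idea behind false_nearest, as a brute-force sketch: a neighbour in m dimensions is "false" if adding one more delay coordinate separates the pair by more than a factor f. The test data here are the Henon map, a standard example; the escape factor and the neighbour search are simplified assumptions, and TISEAN's implementation differs in details such as normalisation.

```python
import numpy as np

def fnn_fraction(x, m, d, f=10.0):
    """Fraction of false nearest neighbours in an m-dimensional delay embedding
    with delay d: the (m+1)-th coordinate separates a 'false' pair by > f times
    their m-dimensional distance."""
    n = len(x) - m * d
    emb = np.column_stack([x[k * d : k * d + n] for k in range(m)])
    extra = x[m * d : m * d + n]          # the additional coordinate
    false = 0
    for i in range(n):
        dist = np.linalg.norm(emb - emb[i], axis=1)
        dist[i] = np.inf                  # exclude the point itself
        j = np.argmin(dist)
        if abs(extra[i] - extra[j]) > f * dist[j]:
            false += 1
    return false / n

# Henon map: a two-dimensional deterministic system.
n = 2000
x = np.zeros(n)
x[0] = x[1] = 0.1
for t in range(2, n):
    x[t] = 1.0 - 1.4 * x[t - 1]**2 + 0.3 * x[t - 2]

f1, f2 = fnn_fraction(x, 1, 1), fnn_fraction(x, 2, 1)
print(f1, f2)   # the fraction drops sharply from m=1 to m=2
```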

- Use the zeroth-order predictor
(zeroth)
on amplitude.dat and on ar.dat.

zeroth amplitude.dat -m1,4 -d8 -o -s250

zeroth ar.dat -m1,4 -d8 -o -s250

plot [][0:1.5] 'amplitude.dat.zer','ar.dat.zer',.05*exp(.02*x)

You should be able to verify the following observations:

For increasing prediction horizon, the prediction errors of amplitude.dat show three regimes: an exponential increase of the error due to chaos (the regime of nonlinear deterministic dynamics); a slow linear increase due to the loss of phase locking (the regime of linear correlations arising from the rather constant period of the oscillations); and a constant level once the predictions have lost all correlation with the actual values (the limit of unpredictability, where the relative prediction error saturates at 1; reaching it requires a larger prediction horizon than can be computed with this data set by default. To arrive at prediction horizons larger than one half of the data set, you must switch off the causality window with the -C0 option of zeroth).

No successful prediction for ar.dat beyond the linear correlations: since ar.dat is a linear stochastic data set, it does not contain phase space information.
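The locally constant ("zeroth order") prediction scheme can be sketched directly: forecast each reference point with the future of its nearest embedded neighbour, using only past neighbours (a causality constraint similar in spirit to zeroth's causality window). The deterministic test signal is the Henon map and the stochastic one is white noise, both invented stand-ins for amplitude.dat and ar.dat.

```python
import numpy as np

def zeroth_predict(x, m, d, steps, n_ref=200):
    """Relative rms error of locally-constant forecasts `steps` ahead:
    each reference point is predicted by the image of its nearest
    (strictly earlier) neighbour in an m-dimensional delay embedding."""
    n = len(x) - (m - 1) * d - steps
    emb = np.column_stack([x[k * d : k * d + n] for k in range(m)])
    last = (m - 1) * d                    # offset of the latest coordinate
    errs = []
    for i in range(n - n_ref, n):
        dist = np.linalg.norm(emb[: i - steps] - emb[i], axis=1)  # past neighbours only
        j = np.argmin(dist)
        pred = x[j + last + steps]        # the neighbour's future value
        errs.append((pred - x[i + last + steps])**2)
    return np.sqrt(np.mean(errs)) / x.std()

rng = np.random.default_rng(6)
n = 2000
h = np.zeros(n)
h[0] = h[1] = 0.1
for t in range(2, n):                     # deterministic: Henon map
    h[t] = 1.0 - 1.4 * h[t - 1]**2 + 0.3 * h[t - 2]
w = rng.normal(size=n)                    # stochastic: white noise

e_h, e_w = zeroth_predict(h, 2, 1, 1), zeroth_predict(w, 2, 1, 1)
print(e_h, e_w)   # small error for deterministic data, error near sqrt(2) for noise
```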