
Uneven sampling

Let us show how the constrained randomisation method can be used to test for nonlinearity in time series taken at time intervals of different length. Unevenly sampled data are quite common; examples include drill core data, astronomical observations, and stock price notations. Most observables and algorithms cannot easily be generalised to this case, which is particularly true for nonlinear time series methods. (See [41] for material on irregularly sampled time series.) Interpolating the data to equally spaced sampling times is not advisable for a test for nonlinearity, since one could not a posteriori distinguish between genuine structure and nonlinearity introduced spuriously by the interpolation process. Note that zero padding is also a nonlinear operation in the sense that stretches of zeroes are unlikely to be produced by any linear stochastic process.

For data that are evenly sampled except for a moderate number of gaps, surrogate sequences can be produced relatively straightforwardly by assigning the value zero during the gaps and minimising a standard cost function like Eq. (23) while excluding the gaps from the permutations tried. The error made in estimating correlations would then be identical for the data and the surrogates and could not affect the validity of the test. Of course, one would have to modify the nonlinearity measure to avoid the gaps. For data sampled at incommensurate times, such a strategy can no longer be adopted. We then need different means to specify the linear correlation structure.

Two different approaches are viable, one residing in the spectral domain and one in the time domain. Consider a time series sampled at times $\{t_n\}$ that need not be equally spaced. The power spectrum can then be estimated by the Lomb periodogram, as discussed for example in Ref. [42]. For time series sampled at constant time intervals, the Lomb periodogram reduces to the standard periodogram, the squared modulus of the Fourier transform. Except for this particular case, it does not have any inverse transformation, which makes it impossible to use the standard surrogate data algorithms mentioned in Sec. 4. In Ref. [43], we used the Lomb periodogram of the data as a constraint for the creation of surrogates. Unfortunately, imposing a given Lomb periodogram is very time consuming because at each annealing step, the $O(N)$ spectral estimator has to be computed at $N_f$ frequencies, with $N_f \propto N$. Press et al. [42] give an approximation algorithm that uses the fast Fourier transform to compute the Lomb periodogram in $O(N \log N)$ time rather than $O(N^2)$. The resulting code is still quite slow.
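To make this cost concrete, here is a minimal Python/NumPy sketch of the direct evaluation of the classical Lomb periodogram; the function name and interface are our own illustration, not taken from Ref. [42] or the original code.

    import numpy as np

    def lomb_periodogram(t, s, freqs):
        """Classical Lomb periodogram of s sampled at arbitrary times t,
        evaluated directly at strictly positive frequencies freqs.
        Each frequency costs O(N) operations."""
        s = s - s.mean()
        var = s.var()
        P = np.empty(len(freqs))
        for k, f in enumerate(freqs):
            w = 2.0 * np.pi * f
            # the phase tau makes the estimator invariant under time shifts
            tau = np.arctan2(np.sum(np.sin(2*w*t)),
                             np.sum(np.cos(2*w*t))) / (2*w)
            c = np.cos(w * (t - tau))
            sn = np.sin(w * (t - tau))
            P[k] = ((s @ c)**2 / (c @ c)
                    + (s @ sn)**2 / (sn @ sn)) / (2.0 * var)
        return P

With $N_f \propto N$ frequencies, a single call already costs $O(N^2)$ by this direct route, and it would have to be repeated at every annealing step.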

As a more efficient alternative to the commonly used but computationally costly Lomb periodogram, let us suggest to use binned autocorrelations. They are defined as follows. For a continuous signal $s(t)$ (take $\overline{s}=0$, $\overline{s^2}=1$ for simplicity of notation here), the autocorrelation function is $C(\tau)=\overline{s(t)\,s(t-\tau)}$. It can be binned to a bin size $\Delta$, giving $C_\Delta(\tau)=\frac{1}{\Delta}\int_{\tau-\Delta/2}^{\tau+\Delta/2} C(\tau')\,d\tau'$. We now have to approximate all integrals using the available values of $s(t_n)$. In general, we estimate
\begin{displaymath}
  \overline{f}_{(a,b]} = \frac{1}{b-a}\int_a^b f(t)\,dt
    \;\approx\; \frac{1}{\#(a,b]} \sum_{n:\; t_n \in (a,b]} f(t_n)\,.
\end{displaymath}
Here, $(a,b]$ denotes the bin ranging from $a$ to $b$, and $\#(a,b]$ the number of its elements. We could improve this estimate by some interpolation of $s(t)$, as is customary with numerical integration, but the accuracy of the estimate is not the central issue here. For the binned autocorrelation, this approximation simply gives
\begin{displaymath}
  \hat C_\Delta(\tau) = \frac{1}{\#\mathcal{P}_\Delta(\tau)}
    \sum_{(i,j)\in\mathcal{P}_\Delta(\tau)} s(t_i)\, s(t_j)\,.
\end{displaymath}
Here, $\mathcal{P}_\Delta(\tau) = \{(i,j):\; \tau-\Delta/2 < t_i - t_j \le \tau+\Delta/2\}$. Of course, empty bins lead to undefined autocorrelations. If we have evenly sampled data and unit bins, $t_n = n$ and $\Delta = 1$, then the binned autocorrelations coincide with the ordinary autocorrelations at integer lags.
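The estimator translates directly into code. The following Python sketch (function names and interface are our own) precomputes the pair sets $\mathcal{P}_\Delta(\tau)$, which depend only on the sampling times, and then averages the products $s(t_i)\,s(t_j)$ within each bin; empty bins are returned as NaN since the estimate is then undefined.

    import numpy as np

    def lag_bins(t, n_bins, delta):
        """Index pairs (i, j) with t_i - t_j in the bin around tau = k*delta,
        for k = 1 .. n_bins. The pair sets depend only on the sampling
        times, so they can be computed once and reused."""
        dt = t[:, None] - t[None, :]        # all pairwise differences t_i - t_j
        bins = []
        for k in range(1, n_bins + 1):
            tau = k * delta                 # bin centre
            mask = (dt > tau - delta/2) & (dt <= tau + delta/2)
            bins.append(np.nonzero(mask))   # (i-indices, j-indices) of the bin
        return bins

    def binned_autocorr(s, bins):
        """Average of s(t_i)*s(t_j) over the pairs in each bin; empty bins
        yield NaN, matching the remark that they are undefined."""
        s = (s - s.mean()) / s.std()        # zero mean, unit variance as in the text
        return np.array([s[i] @ s[j] / len(i) if len(i) else np.nan
                         for i, j in bins])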

Once we are able to specify the linear properties of a time series, we can also define a cost function as usual and generate surrogates that realise the binned autocorrelations of the data. A delicate point, however, is the choice of bin size. If we take it too small, we get bins that are almost empty. Within the space of permutations, there may then be only a few ways to generate precisely that value of $\hat C_\Delta(\tau)$; in other words, we over-specify the problem. If we take the bin size too large, we might not capture important structure in the autocorrelation function.
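A minimal sketch of such a scheme, reusing lag_bins and binned_autocorr from above, might look as follows. The maximum deviation over all lag bins is one plausible cost function, and the acceptance rule and cooling schedule are simplified illustrations rather than the tuned annealing scheme used in practice.

    def anneal_surrogate(t, s, n_bins, delta, n_steps=200000, T0=0.1, rng=None):
        """Toy simulated annealing: permute the measured values over the
        fixed sampling times until the binned autocorrelations match those
        of the data. The full cost is recomputed after every trial swap for
        transparency; an efficient code would update only the products
        involving the two swapped indices."""
        rng = np.random.default_rng(rng)
        bins = lag_bins(t, n_bins, delta)
        C_data = binned_autocorr(s, bins)
        surr = rng.permutation(s)           # random start: same value distribution
        E = np.nanmax(np.abs(binned_autocorr(surr, bins) - C_data))
        T = T0
        for _ in range(n_steps):
            i, j = rng.integers(len(s), size=2)
            surr[i], surr[j] = surr[j], surr[i]              # trial swap
            E_new = np.nanmax(np.abs(binned_autocorr(surr, bins) - C_data))
            if E_new < E or rng.random() < np.exp((E - E_new) / T):
                E = E_new                                    # accept the swap
            else:
                surr[i], surr[j] = surr[j], surr[i]          # undo the swap
            T *= 0.9999                                      # exponential cooling
        return surr

Note that swapping two values leaves the empirical distribution untouched, so only the correlation constraint has to be enforced by the cost function.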

As an application, let us construct randomised versions of part of an ice core data set, taken from the Greenland Ice Sheet Project Two (GISP2) [44]. An extensive database resulting from the analysis of physical and chemical properties of Greenland ice up to a depth of 3028.8 m has been published by the National Snow and Ice Data Center together with the World Data Center-A for Palaeoclimatology, National Geophysical Data Center, Boulder, Colorado [45]. A long ice core is usually cut into equidistant slices, and initially all measurements are made versus depth. Considerable expertise then goes into the dating of each slice [46]. Since the density of the ice, as well as the annual total deposition, changes with time, the final time series data are necessarily unevenly sampled. Furthermore, a few values are often missing from the record. We will study a subset of the data ranging back 10000 years in time, corresponding to a depth of 1564 m, and continuing until 2000 years before present. Figure 14 shows the sampling rate versus time for the particular ice core considered.

Figure 14: Sampling rate versus time for an ice core time series.

We use the $\delta^{18}$O time series, which indicates the deviation of the $^{18}$O/$^{16}$O ratio $r$ from its standard value $r_0$: $\delta^{18}\mathrm{O} = 10^3\,(r - r_0)/r_0$. Since the ratio of the condensation rates of the two isotopes depends on temperature, the isotope ratio can be used to derive a temperature time series. The upper trace in Fig. 15 shows the recording from 10000 years to 2000 years before present, comprising 538 data points.

In order to generate surrogates with the same linear properties, we estimate autocorrelations up to a lag of 1000 years by binning to a resolution of 5 years. A typical surrogate is shown as the lower trace in Fig. 15. We have not been able to detect any nonlinear structure by comparing this recording with 19 surrogates, neither using time asymmetry nor prediction errors. It should be admitted, however, that we have not attempted to provide nonlinearity measures optimised for the unevenly sampled case. For that purpose, some interpolation is also permissible, since it is then part of the nonlinear statistic. Of course, in terms of geophysics, we are asking a very simplistic question here. We would not really expect strong nonlinear signatures or even chaotic dynamics in such a single probe of the global climate. All the interesting information (and expected nonlinearity) lies in the interrelation between various measurements and in the assessment of long term trends, which we have deliberately excluded by selecting a subset of the data.
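As an aside on the statistics of this comparison: with 19 surrogates, a one-sided rank test operates at the 95% level of significance. A minimal Python sketch follows; the simple increment-based asymmetry statistic is our own stand-in, not a measure optimised for uneven sampling.

    import numpy as np

    def time_asymmetry(s):
        """Mean cubed increment over consecutive samples, a simple
        time-reversal asymmetry statistic; it ignores the uneven spacing,
        consistent with the remark that no optimised measures were tried."""
        return np.mean(np.diff(s)**3)

    def rejects_null(s_data, surrogates, statistic=time_asymmetry):
        """One-sided rank test: under the null hypothesis, the data
        statistic exceeds all 19 surrogate values with probability 1/20,
        i.e. rejection happens at the 95% level of significance."""
        return statistic(s_data) > max(statistic(s) for s in surrogates)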

Figure 15: Oxygen isotope ratio time series derived from an ice core (upper trace) and a corresponding surrogate (lower trace) that has the same binned autocorrelations up to a lag of 1000 years at a resolution of 5 years.

