logo SBA

ETD

Archivio digitale delle tesi discusse presso l’Università di Pisa

Tesi etd-10052020-173535


Tipo di tesi
Tesi di laurea magistrale
Autore
SCARCIGLIA, ANDREA
Indirizzo email
a.scarciglia2@studenti.unipi.it, andrea.scarciglia1@gmail.com
URN
etd-10052020-173535
Titolo
Embedding and Complexity of Time Series: Theory and Applications
Dipartimento
MATEMATICA
Corso di studi
MATEMATICA
Relatori
relatore Prof. Bonanno, Claudio
correlatore Ing. Valenza, Gaetano
Parole chiave
  • AIC
  • Approximate entropy
  • approximate entropy of order 2
  • Fractal Delay Prevalence Embedding Theorem
  • Kolmogorov-Sinai entropy
  • Lempel-Ziv
  • Sample entropy
  • Time Series Analysis
Data inizio appello
23/10/2020
Consultabilità
Completa
Riassunto
In this thesis we introduce an approach to the study of time series study by nonlinear dynamical systems.
The main assumption we make throughout this work is that time series samples can be associated to points of a nonlinear ergodic dynamical system through a map, called measurement map.
We are interested in showing how to classify time series in terms of predictability of their behavior.
Firstly, we give a mathematical framework for some of the quantifiers that will be described later. We face the embedding problem: "almost any" measurement map has to be an embedding, that is an injective immersion. The choice of the embedding is motivated by the fact that differential structure and injectivity are preserved in order to compute quantifiers (e.g. Lyapunov exponents). Moreover, "almost every" has to be intended in the sense of prevalence, a generalization of the notion of probability measure. Since attractors may not have integer dimension, fractal sets are introduced. The main theorem we use to solve the embedding problem is the Fractal Delay Embedding Prevalence Theorem, a slight modification of Witney's embedding theorems and more suitable for time series: it states that "almost every" measurement map is an embedding, provided that the samples are taken and organized in a particular manner.
After introducing notions of ergodic theory and discrete-time stochastic processes, we describe the Kolmogorov-Sinai (K-S) entropy as a quantifier of the predictability of an ergodic dynamical system (and consequently of a time series). Since the K-S entropy computation may be difficult, we obtain K-S entropy by using the Algorithmic Information Content (AIC) of strings.
We convert time series in strings composed by a finite number of symbols. For each of the symbolic string, we define its AIC as the length of the shortest binary program that reproduces the string as output, while its complexity as the mean AIC for each symbol. The notion of complexity is related to the notion of K-S entropy by Brudno's theorem.
It seems reasonable to determine K-S entropy by the AIC, but this is not possible since AIC cannot be performed by any algorithm. In order to approximate the AIC, simple Lempel-Ziv data compression is introduced: it is a lossless prefix-free code which asymptotically compresses almost every string from an ergodic stochastic process, using the less storage as possible, in such a way that the expected binary code length per symbol is equal to the entropy of the process and that it is possible to reconstruct the original series from the encoded one: these properties allow us to think of Lempel-Ziv data compression as an approximation of the AIC.
All tools introduced for computing K-S entropy require limits and consequently a large number of samples which is not always available. Other quantifiers of unpredictability are shown: generalized entropy of order 2 (K2), approximate entropy and sample entropy. They all rely on the geometrical attractor reconstruction and they all estimate the probability of finding similar finite length patterns in a given time series. K2 is a lower bound to the K-S entropy while the other two entropies are seen to work well with a small number of points. Finally, in numerical simulations, we test algorithms of approximate entropy, sample entropy and Lempel-Ziv data compression on circle rotations, logistic maps and physiological data. The aim is to relate entropies to the behavior of the dynamical system underlying the time series. In the case of physiological data, these quantifiers are shown to distinguish clinical behaviors.
File