A time series is a data set collected through time.
Two things set it apart from the datasets used for regular regression problems:
- It is time dependent, so the basic assumption of a linear regression model, that the observations are independent, doesn’t hold in this case.
- Most time series have some form of trend – either an increasing or decreasing trend – or some kind of seasonality pattern, i.e. variations specific to a particular time frame.
Basically, this means that the present is correlated with the past.
A value at time T is correlated with the value at T minus 1, but it may also be correlated with the value at T minus 2, though perhaps not quite as strongly.
Even 20 time steps back, we could still know something about the value at T because the two are still correlated, depending on the kind of time series.
This is not true of independent random data, where past values tell us nothing about the present one.
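To make the idea of correlation with the past concrete, here is a minimal sketch that compares independent noise with a simulated series in which each value depends on the previous one. The autoregressive series, its coefficient `phi = 0.9`, and the helper `autocorrelation` are assumptions chosen for illustration; they are not taken from the associated notebook.

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive `lag`."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total squared deviation from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    # Numerator: co-movement between the series and itself shifted by `lag`.
    num = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return num / denom


rng = random.Random(0)  # fixed seed so the run is reproducible

# Independent noise: each value ignores everything that came before.
noise = [rng.gauss(0, 1) for _ in range(5000)]

# Illustrative AR(1) series (an assumption): 0.9 * previous value + fresh noise.
phi = 0.9
ar1 = [0.0]
for _ in range(4999):
    ar1.append(phi * ar1[-1] + rng.gauss(0, 1))

for lag in (1, 2, 20):
    print(
        f"lag {lag:2d}: noise={autocorrelation(noise, lag):+.3f} "
        f"ar1={autocorrelation(ar1, lag):+.3f}"
    )
```

For the simulated series the correlation should be strong at lag 1, somewhat weaker at lag 2, and much weaker at lag 20, while for the independent noise it should stay close to zero at every lag.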
Time series are everywhere, for example in:
- Financial data (stocks, currency exchange rates, interest rates)
- Marketing (click-through rates for web advertising)
- Economics (sales and demand forecasts)
- Natural phenomena (water flow, temperature, precipitation, wind speed, animal species abundance, heart rate)
- Demographics and population, and so on.
What might you want to do with time series?
- Smoothing – extract an underlying signal (a trend) from noise.
- Modelling – explain how the time series arose, for example to decide on an intervention.
- Forecasting – predict the values of the time series in the future.
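As a rough illustration of smoothing and forecasting, the sketch below builds a synthetic series (trend + seasonality + noise), smooths it with a trailing moving average, and extrapolates the smoothed curve a few steps ahead. The synthetic series, the window of 12, and the straight-line extrapolation are all assumptions made for this example; real analyses usually rely on more careful models.

```python
import math
import random


def moving_average(series, window):
    """Trailing moving average: mean of the last `window` values at each step."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


rng = random.Random(1)  # fixed seed so the run is reproducible

# Hypothetical series: upward trend + period-12 seasonality + noise.
series = [
    0.5 * t + 5 * math.sin(2 * math.pi * t / 12) + rng.gauss(0, 1)
    for t in range(120)
]

# Smoothing: a window equal to the seasonal period averages out the
# seasonal cycle and most of the noise, leaving roughly the trend.
smoothed = moving_average(series, window=12)

# Crude forecasting: estimate the recent slope of the smoothed curve and
# extend that curve three steps ahead. A trailing average lags the raw
# series, so this extrapolates the smoothed level, not the raw values.
slope = (smoothed[-1] - smoothed[-13]) / 12
forecast = [smoothed[-1] + slope * h for h in range(1, 4)]

print("estimated slope per step:", round(slope, 3))
print("extrapolated smoothed values:", [round(v, 2) for v in forecast])
```

The estimated slope should land near the 0.5 used to generate the data, which is the kind of underlying signal that smoothing is meant to recover.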
We will first look at the specific characteristics of time series (TS), and then, in a second part, work through a concrete example of TS analysis (smoothing + modelling + forecasting).
You can follow along with the associated notebook in GitHub.