A time series is a data set collected over time.

Two things set it apart from the datasets we use for regular regression problems:

- It is **time dependent**. So the basic assumption of a linear regression model, that the observations are **independent**, doesn’t hold in this case.
- Most time series have some form of **trend**, either increasing or decreasing, or some kind of seasonality **pattern**, i.e. variations specific to a particular time frame.

Basically, this means that the present is correlated with the past.

A value at time T is correlated with the value at T minus 1, but it may also be correlated with the value at T minus 2, though perhaps not quite as strongly.

And even 20 time steps back, we could still know something about the value at T, because the two values may still be correlated, depending on the kind of time series.

This is obviously not true of ordinary random data, where each observation is independent of the others.
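To make this concrete, here is a minimal sketch that compares the correlation between a series and its own past (the autocorrelation) for two synthetic series. Both are assumptions made purely for illustration: independent Gaussian noise, and a simple autoregressive series where each value is 0.9 times the previous one plus noise. The helper `autocorrelation` is a hypothetical name and uses only the Python standard library.

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return covariance / variance


rng = random.Random(0)  # fixed seed so the run is reproducible

# Ordinary random data: every draw is independent of the others.
noise = [rng.gauss(0, 1) for _ in range(5000)]

# Illustrative time-dependent series: x_t = phi * x_{t-1} + noise.
phi = 0.9
ar1 = [0.0]
for _ in range(4999):
    ar1.append(phi * ar1[-1] + rng.gauss(0, 1))

for lag in (1, 2, 20):
    print(
        f"lag {lag:2d}: "
        f"time series = {autocorrelation(ar1, lag):.2f}, "
        f"random data = {autocorrelation(noise, lag):.2f}"
    )
```

For the autoregressive series the correlation should be strongest at lag 1 and fade as the lag grows, while for the independent data it should stay close to zero at every lag.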

Time series are **everywhere**, for example in:

- Financial data (stocks, currency exchange rates, interest rates)
- Marketing (click-through rates for web advertising)
- Economics (sales and demand forecasts)
- Natural phenomena (water flow, temperature, precipitation, wind speed, animal species abundance, heart rate)
- Demographics and population, and so on.

What might you want to do with time series?

- **Smoothing**: extract an underlying signal (a trend) from noise.
- **Modelling**: explain how the time series arose, for example to plan an intervention.
- **Forecasting**: predict the values of the time series in the future.
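As a small illustration of the first task, here is a minimal sketch of smoothing with a centered moving average. This is just one simple smoothing technique among many, chosen here for illustration. The synthetic data (a linear trend plus Gaussian noise), the window size, and the helper names `moving_average` and `mean_abs_error` are all assumptions.

```python
import random


def moving_average(series, window):
    """Centered moving average; the window is truncated near the edges."""
    half = window // 2
    smoothed = []
    for t in range(len(series)):
        lo = max(0, t - half)
        hi = min(len(series), t + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


def mean_abs_error(a, b):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


rng = random.Random(42)  # fixed seed so the run is reproducible

# Synthetic example: a known upward trend hidden under noise.
trend = [0.05 * t for t in range(200)]
noisy = [value + rng.gauss(0, 1) for value in trend]

smoothed = moving_average(noisy, window=21)

print("error of raw series vs. trend:     ", mean_abs_error(noisy, trend))
print("error of smoothed series vs. trend:", mean_abs_error(smoothed, trend))
```

Because the true trend is known in this toy setup, we can check that the smoothed series sits closer to it than the raw observations do. With real data the underlying signal is unknown, and the window size becomes a trade-off between removing noise and blurring genuine changes.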

We will first look at the specific characteristics of time series (TS), and then, in a second part, work through a concrete example of TS analysis (smoothing + modelling + forecasting).

You can follow along with the associated notebook on GitHub.