Introduction to time series – Part II: an example

Exploring a milk production Time Series

Time series models are used in a wide range of applications, particularly for forecasting, which is the goal of this example, performed in four steps:

– Explore the characteristics of the time series data.
– Decompose the time series into trend, seasonal components, and remainder components.
– Apply time series models.
– Forecast the production for a 12 month period.

Part I of this mini-series explored the basic terms of time series analysis.

Load and clean the data

The dataset is the production amount of several diary products in California, month by month, for 18 years.
Our goal: forecast the next year production for one of those products: milk.

You can follow along with the associated notebook in GitHub

As usual, first we load the data into a Pandas dataframe.

import pandas as pd

X = pd.read_csv('../../datasets/cadairydata.csv', index_col=0)
Year.Month Month.Number Year Month Cotagecheese.Prod Icecream.Prod Milk.Prod N.CA.Fat.Price
1 1995.01 1 1995 Jan 4.370 51.595 2.112 0.9803
2 1995.02 2 1995 Feb 3.695 56.086 1.932 0.8924
3 1995.03 3 1995 Mar 4.538 68.453 2.162 0.8924
4 1995.04 4 1995 Apr 4.280 65.722 2.130 0.8967
5 1995.05 5 1995 May 4.470 73.730 2.227 0.8967
 Int64Index: 228 entries, 1 to 228
 Data columns (total 8 columns):
 Year.Month        228 non-null float64
 Month.Number      228 non-null int64
 Year              228 non-null int64
 Month             228 non-null object
 Cotagecheese.Prod 228 non-null float64
 Icecream.Prod     228 non-null float64
 Milk.Prod         228 non-null float64
 N.CA.Fat.Price    228 non-null float64
 dtypes: float64(5), int64(2), object(1)
 memory usage: 16.0+ KB

The first step is to have a Pandas date column which could be used as the time index.

Add Pandas date-time column

The fastest way is to convert year and month to strings, then create a new Pandas datetime column that combine them.

X["Year"] = X["Year"].astype(str)
X["Month.Number"] = X["Month.Number"].astype(str)
X["YearMonth"] = pd.to_datetime(X["Year"] + X["Month.Number"], format="%Y%m") # new column

Now can drop the other date columns:

X.drop(["Year","Month", "Year.Month"], axis=1, inplace=True)

and I prefer to rename some of the other columns, for clarity:

                  'Month.Number' : 'MonthNumber'},
MonthNumber CottageCheeseProd IceCreamProd MilkProd Price YearMonth
1 1 4.370 51.595 2.112 0.9803 1995-01-01
2 2 3.695 56.086 1.932 0.8924 1995-02-01
3 3 4.538 68.453 2.162 0.8924 1995-03-01
4 4 4.280 65.722 2.130 0.8967 1995-04-01
5 5 4.470 73.730 2.227 0.8967 1995-05-01
MonthNumber CottageCheeseProd IceCreamProd MilkProd Price YearMonth
224 8 2.169 78.382 3.400 1.6273 2013-08-01
225 9 2.193 74.871 3.197 1.5519 2013-09-01
226 10 2.284 70.011 3.338 1.5566 2013-10-01
227 11 1.865 61.394 3.275 1.7537 2013-11-01
228 12 2.345 53.653 3.473 1.6747 2013-12-01
 Int64Index: 228 entries, 1 to 228
 Data columns (total 6 columns):
 MonthNumber 228 non-null object
 CottageCheeseProd 228 non-null float64
 IceCreamProd 228 non-null float64
 MilkProd 228 non-null float64
 Price 228 non-null float64
 YearMonth 228 non-null datetime64[ns]
 dtypes: datetime64[ns](1), float64(4), object(1)
 memory usage: 12.5+ KB

Now, we can perform both arithmetic and logical operations on datetime objects (“YearMonth“).
As an example, here is how to select the last 12 months of dairy data:

X[X.YearMonth > "2012-12-01"]
MonthNumber CottageCheeseProd IceCreamProd MilkProd Price YearMonth
217 1 2.282 62.612 3.462 1.7744 2013-01-01
218 2 1.941 66.274 3.231 1.6224 2013-02-01
219 3 2.150 75.770 3.676 1.6968 2013-03-01
220 4 2.217 76.396 3.622 1.7444 2013-04-01
221 5 2.341 80.899 3.720 1.8552 2013-05-01
222 6 2.694 92.091 3.489 1.8314 2013-06-01
223 7 2.433 84.026 3.373 1.6923 2013-07-01
224 8 2.169 78.382 3.400 1.6273 2013-08-01
225 9 2.193 74.871 3.197 1.5519 2013-09-01
226 10 2.284 70.011 3.338 1.5566 2013-10-01
227 11 1.865 61.394 3.275 1.7537 2013-11-01
228 12 2.345 53.653 3.473 1.6747 2013-12-01

The Milk production should be in million of pounds.


There are no NaNs. The dataset is ready to be explored.

Explore the data

We concentrate now on the milk production (the MilkProd column).

Create a Time Series Plot

First useful thing to do is always to plot the data:

import matplotlib.pyplot as plt
%matplotlib inline
fig,ax = plt.subplots()
ax.plot(X.YearMonth, X.MilkProd)
fig.suptitle("Time series of milk production");
Our dataset

For most of the time period shown, the production of milk increased year over year. However, there is a decline in milk production starting in 2009 as a results of the recession. Also, notice that this time series exhibits a strong seasonal component with an annual cycle.

Since the time series has both a visible trend and a seasonality it is clearly a non-stationary series.

Statistical Properties of the Time Series

Autocorrelation is a fundamental property of time series. The Autocorrelation Function or ACF provides information on the dependency of the time series values of previous values.

import as sm_pl
sm_pl.plot_acf(X.MilkProd, lags=24);


Lags are the months in this case.
Note that the values of the ACF at the various lags decays only slowly. This indicates there is considerable auto correlation between the time series values at the various lags, mostly likely from the trend. Plotting a histogram provides information on the distribution of values of the time series.

fig,ax = plt.subplots()
ax.hist(X.MilkProd, bins=40)
fig.suptitle("Distribution of Milk Production");


The histogram of the full milk production time series shows considerable dispersion. Again this behaviour is likely the result of the trend.

Stationarity: are mean and standard deviation constant over time?

From the plot is pretty clear that there is trend and seasonality, therefore is not stationary. Anyway, can be checked, via the rolling() function from Pandas:

rollingMean = X.MilkProd.rolling(window=12).mean()
rollingStd = X.MilkProd.rolling(window=12).std()

fig,ax1 = plt.subplots()
ax1.plot(X.MilkProd, color="black", label="Milk production")
ax1.plot(rollingMean, color="red", label="Average")
ax1.set_xlabel("Time in months")
ax1.legend(loc="upper left")

ax2 = ax1.twinx() # Create a twin Axes sharing the xaxis
ax2.plot(rollingStd, color = "blue", label = "StD")
ax2.set_ylabel("Standard Deviation")
ax2.legend(loc="lower right")
fig.suptitle("Milk - Average and Standard Deviation");


The mean is clearly increasing over time (the production is rising) but also the standard deviation of milk production increased in the time frame.

Decomposition of the Time Series

Time series are typically decomposed into three components: trend, seasonal, and the remainders, or residuals.
Trend can be modelled by several methods. We will start by decomposing the time series using a simple moving average model.


The code in the cell below uses moving window method to compute the average of the time series over specified span, or order of the operator. As the moving window operator moves over the data, the average of the values in the windows is calculated.

import numpy as np
def movingAverage(values, order):
    end = len(values)
    out = np.zeros(len(values))

    out[0] = values[1]
    for i in range(1,end):
        if (i - order <= 1):
            j = 1
            j += 1
        out[i] = sum(values[j-1:i]) / (i-j+1)

    return out

One of the best ways to make a series stationary on variance is through transforming the original series through log transform. We go back to our original data series and add a new column, the logarithm of the milk production:

X['LogMilk'] = pd.Series(np.log(X.MilkProd), index = X.index)

Now we perform a 12 months moving average, to extract the trend:

trendM = movingAverage(X.LogMilk, 12)

fig,ax = plt.subplots()
ax.plot(X.LogMilk, color="black", label="Milk production")
ax.plot(trendM, color="blue", label="Trend")

ax.set_xlabel("Time in months")
ax.legend(loc="upper left")

fig.suptitle("Milk & trend – log");


Get the Trend with Lowess

Having tried a simple moving average decomposition, we will now use a Lowess model to determine the trend. Lowess is a sophisticated non-linear regression. The statsmodels package includes a function to calculate the Lowess regression:

import statsmodels.nonparametric.smoothers_lowess as sm

trendL = sm.lowess(X.LogMilk, X.YearMonth, frac=0.25,return_sorted=False)

fig,ax = plt.subplots()
ax.plot(X.YearMonth, X.LogMilk)
ax.plot(X.YearMonth, trendM, label='Trend from Moving Average')
ax.plot(X.YearMonth, trendL, label="Trend from Lowess")
ax.legend(loc="lower right")
fig.suptitle("Trends extracted from milk production");


The time series charts show the original time series along with the two trends. The trend from Lowess regression is a bit smoother than the one was obtained with the simple moving average decomposition.


Now we apply a linear regression on the Milk production using the months as variables to discover the seasonality. But first need to convert the months into categorical variables using the One Hot Encoder (OHE) approach.

monthsOHE = pd.get_dummies(X.MonthNumber)

The order doesn’t matter in this case (being numerical, October and November come before February but it’s ok).

prodWithoutTrend = X.LogMilk - trendM # the difference
fig,ax = plt.subplots()

ax.set_xlabel("Time in months")
ax.set_ylabel("Milk production without trend")

fig.suptitle("Difference looks more stationary");


The milk production – when the trend is removed – seems more stationary, without any immediate pattern. We create now a temporary data frame to perform the regression:

    # a new tmp data frame for decomposition
decX = pd.DataFrame({'LogMilkNoTrend': prodWithoutTrend})
decX = decX.join(monthsOHE)

The regression is applied to the production without trend:

from sklearn import linear_model
model = linear_model.LinearRegression(fit_intercept = False), decX.LogMilkNoTrend)
LinearRegression(copy_X=True, fit_intercept=False, n_jobs=1, normalize=False)

It’s a model with 12 variables (one for each month, keep in mind that they are not in chronological order):

array([ 0.02471208, 0.01099585, -0.02349834, 0.01714495, 
 -0.04996016, 0.06783954, 0.03872409, 0.06733682, 
 0.01530435, 0.02160618,  0.01572798, -0.02670753])

We get the seasonality by predicting the value month by month:

seasonalM = model.predict(monthsOHE)


If we remove the seasonality from the difference between milk production and the trend, we will get the residuals or remainders:

remainderM = prodWithoutTrend - seasonalM

Finally, we put all together in a data frame that contains all decomposition elements: trend, seasonality and residuals

  # a data frame with all decompositions
decomposed = pd.DataFrame({'t': trendM, 's': seasonalM, 'r': remainderM})
r s t
1 -0.024712 0.024712 0.747635
2 -0.039119 -0.049960 0.747635
3 0.000099 0.067840 0.703096
4 -0.008344 0.038724 0.725742
5 -0.000018 0.067337 0.733337
fig,ax = plt.subplots(4, sharex=True, figsize=(15,15))





As we have seen, the trend is of an increasing milk production, beside the last months when it is stable; while there is a clear yearly seasonality.
The remainders in an ideal situation are like white noise, i.e. every pattern has been removed and what is left is only the random variability.

Decomposing using statsmodels

The statsmodels package has also a function to decompose a time series:

from statsmodels.tsa.seasonal import seasonal_decompose

# need to pass the time series as array (values) because of error
# in statsmodels, should be fixed in next version 0.9
decomposition = seasonal_decompose(X.MilkProd.values, freq=12)

trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid
fig,ax = plt.subplots(4, sharex=True, figsize=(15,15))





You can see the trend and seasonal components clearly separated in the above plots. The remainder plot looks fairly random, as expected.

Is now stationary?

The question remains, is the remainder from this decomposition stationary? To find out, we plot its ACF:

sm_pl.plot_acf(decomposed.r, lags=24);


sm_pl.plot_pacf(decomposed.r, lags=24);


The ACF has at least 5 significant lag values, indicating the remainder is NOT, in fact, stationary.
Compared to the behavior of the ACF for the simple moving average decomposition, the behavior of the remainder is improved.

Now we plot the histogram of the remainder:

fig,ax = plt.subplots()
ax.hist(decomposed.r, bins=40)
fig.suptitle("Distribution of Residuals");


The distribution of the remainder values is much closer to a Normal distribution than for the original time series we created earlier.
This result, combined with the ACF plot, indicates that the decomposition is effective.

We can further investigate the remainder (non-seasonal residual) component by making a box plot by month of the year:

remaindersByMonth = pd.DataFrame({'resid': decomposed.r, 'month': X["MonthNumber"]})



The remainder component shows only limited variation from month to month. The differences are within the interquartile range, indicating that the seasonal model is a reasonably good fit.

Model a tme series

Now that we have explored the decomposition of the time series we will now construct and test Autoregressive Moving Average (ARMA) models for the remainder of the time series.
We will create and test these models in three steps, first creating a moving average (MA) model, then creating an autoregressive (AR) model and finally an autoregressive moving average (ARMA) model.

Moving Average Models

The ACF of the remainder from the decomposition of the milk production time series had at least 5 significant lag values. As an initial model, we therefore now create an MA model of order 5 and examine the model summary.

from statsmodels.tsa.arima_model import ARIMA
modelMA5 = ARIMA(decomposed.r.values, order=(0,0,5))
resultsMA5 = # disp: do not display covergence info
ARMA Model Results
Dep. Variable: y No. Observations: 228
Model: ARMA(0, 5) Log Likelihood 629.357
Method: css-mle S.D. of innovations 0.015
Date: Sun, 17 Jun 2018 AIC -1244.713
Time: 14:24:03 BIC -1220.708
Sample: 0 HQIC -1235.028
coef std err z P>|z| [0.025 0.975]
const -6.083e-05 0.003 -0.018 0.985 -0.007 0.006
ma.L1.y 0.7585 0.074 10.264 0.000 0.614 0.903
ma.L2.y 0.6238 0.086 7.278 0.000 0.456 0.792
ma.L3.y 0.4338 0.070 6.205 0.000 0.297 0.571
ma.L4.y 0.2543 0.073 3.488 0.001 0.111 0.397
ma.L5.y 0.2421 0.077 3.139 0.002 0.091 0.393

Examining the values of the model coefficients and their standard errors (SE), we notice that the SE of the last two coefficients is the same order than the value of the coefficient itself. This may indicate that the value of this coefficient is poorly determined and could likely be set to zero.
The foregoing result indicates that the order of the MA model could be reduced. Generally, the order of an MA model is reduced in unit steps until all the coefficients appear to be significant.
We try with 3 coefficients and examine the model summary.

modelMA3 = ARIMA(decomposed.r.values, order=(0,0,3))
resultsMA3 =
ARMA Model Results
Dep. Variable: y No. Observations: 228
Model: ARMA(0, 3) Log Likelihood 623.018
Method: css-mle S.D. of innovations 0.016
Date: Sun, 17 Jun 2018 AIC -1236.036
Time: 14:24:53 BIC -1218.889
Sample: 0 HQIC -1229.118
coef std err z P>|z| [0.025 0.975]
const -2.526e-05 0.003 -0.009 0.993 -0.006 0.006
ma.L1.y 0.8000 0.063 12.608 0.000 0.676 0.924
ma.L2.y 0.6155 0.061 10.074 0.000 0.496 0.735
ma.L3.y 0.3563 0.065 5.459 0.000 0.228 0.484

The AIC is slightly lower and the standard error compared to the magnitude of the coefficients is also slightly improved; indicates that the order of the model is more reasonable but probably not yet perfect.
To test how well this model fits the data, and results in a stationary result, we plot the ACF of the residuals of the MA(3) model:

sm_pl.plot_acf(resultsMA3.resid, lags=24);


sm_pl.plot_pacf(resultsMA3.resid, lags=24);


Autoregressive Models

Next, we test an autoregressive model. The PACF of the reminder indicates that an AR model might not be the best choice. None the less, a low order AR model might fit these data.

In general, AR orders will tend to present themselves by a sharp cutoff in the PACF plot and a slow trending or sinusoidal degradation in the ACF plot. The opposite is usually true for MA orders.

modelAR2 = ARIMA(decomposed.r.values, order=(2,0,0))
resultsAR2 =
ARMA Model Results
Dep. Variable: y No. Observations: 228
Model: ARMA(2, 0) Log Likelihood 632.334
Method: css-mle S.D. of innovations 0.015
Date: Sun, 17 Jun 2018 AIC -1256.668
Time: 14:25:50 BIC -1242.951
Sample: 0 HQIC -1251.133
coef std err z P>|z| [0.025 0.975]
const -0.0005 0.005 -0.099 0.921 -0.011 0.010
ar.L1.y 0.7742 0.066 11.709 0.000 0.645 0.904
ar.L2.y 0.0389 0.066 0.584 0.560 -0.091 0.169

The AIC value is the worst among the models tried.
Examining the values of the coefficients and their standard errors, we note that the standard error of the second coefficient is actually bigger than the coefficient. Clearly, the AR(2) model is over parameterized.

Next, we try an AR(1) model.

modelAR1 = ARIMA(decomposed.r.values, order=(1,0,0))
resultsAR1 =
ARMA Model Results
Dep. Variable: y No. Observations: 228
Model: ARMA(1, 0) Log Likelihood 632.163
Method: css-mle S.D. of innovations 0.015
Date: Sun, 17 Jun 2018 AIC -1258.327
Time: 14:26:22 BIC -1248.039
Sample: 0 HQIC -1254.176
coef std err z P>|z| [0.025 0.975]
const -0.0004 0.005 -0.088 0.930 -0.010 0.009
ar.L1.y 0.8055 0.039 20.777 0.000 0.729 0.881

The standard error of the AR(1) model is an order of magnitude less than the value of the coefficient, which is promising.

sm_pl.plot_acf(resultsAR1.resid, lags=24);


sm_pl.plot_pacf(resultsAR1.resid, lags=24);


Note that only the 0 lag of the ACF is significant and that there are no significant lags for the PACF. These observations indicate that the AR(1) model is a good fit. Compare these results to those of the MA(3) model, noting that they are nearly identical. Either the MA(3) or AR(1) model could be a good choice for these data.

Autoregressive Moving Average (ARMA) Models

We have found that both MA(3) and AR(1) models are possibly good fits to the remainder series.
We will now investigate the use of autoregressive Integrated moving average (ARIMA) models on the remainder series, a combination of both MA and AR models.
As a starting point we try an ARIMA(1,3) model.

modelARMA = ARIMA(decomposed.r.values, order=(1,0,3))
resultsARMA =
ARMA Model Results
Dep. Variable: y No. Observations: 228
Model: ARMA(1, 3) Log Likelihood 632.544
Method: css-mle S.D. of innovations 0.015
Date: Sun, 17 Jun 2018 AIC -1253.087
Time: 14:42:32 BIC -1232.511
Sample: 0 HQIC -1244.785
coef std err z P>|z| [0.025 0.975]
const -0.0005 0.005 -0.092 0.927 -0.010 0.009
ar.L1.y 0.8033 0.069 11.705 0.000 0.669 0.938
ma.L1.y -0.0279 0.097 -0.287 0.774 -0.218 0.163
ma.L2.y 0.0548 0.084 0.649 0.517 -0.111 0.220
ma.L3.y -0.0116 0.081 -0.144 0.886 -0.170 0.147

In each case, the standard error is of the same order of magnitude as the value of the coefficient, indicating this model is a poor fit to the data.

Automatic search for the best ARIMA parameters

There are many guidelines and best practices to get the most suitable parameters for an ARIMA model, yet the correct parametrisation of ARIMA models can be a painstaking manual process that requires domain expertise and time. Other statistical programming languages such as R provide automated ways to solve this issue, but not yet in Python.
We can solve this issue by writing Python code to programmatically select the optimal parameter values for our ARIMA(p,d,q) time series model.

We would perform a grid search, a sort of hyper-parameter tuning. We can loop through non-trivial values of the ARIMA orders and see which model has the lowest Akaike Information Criterion (AIC).

import itertools

# Define the p and q parameters to take any value between 0 and 3
p = q = range(0, 4)

# Generate all different combinations of p, d=0 and q triplets
pdq = [(a,0,b) for a,b in itertools.product(p, q)]
best_score = np.inf
best_pdq = None
temp_model = None

for params in pdq:


        temp_model = ARIMA(decomposed.r.values, order=params)
        resultsARMA = 

        if resultsARMA.aic  NEW best AIC: ", best_score)
        #print("Unexpected error:", sys.exc_info()[0])
 (0, 0, 0) -> NEW best AIC: -1020.86399991
 (0, 0, 1) -> NEW best AIC: -1157.4900345419903
 (0, 0, 2) -> NEW best AIC: -1207.5618740712084
 (0, 0, 3) -> NEW best AIC: -1236.036064300405
 (1, 0, 0) -> NEW best AIC: -1258.326773880575

The lowest AIC is from an ARIMA(1,0) model.

bestAICmodel = ARIMA(decomposed.r.values, order=(1,0,0))
resultsARMA =
ARMA Model Results
Dep. Variable: y No. Observations: 228
Model: ARMA(1, 0) Log Likelihood 632.163
Method: css-mle S.D. of innovations 0.015
Date: Sun, 17 Jun 2018 AIC -1258.327
Time: 14:44:00 BIC -1248.039
Sample: 0 HQIC -1254.176
coef std err z P>|z| [0.025 0.975]
const -0.0004 0.005 -0.088 0.930 -0.010 0.009
ar.L1.y 0.8055 0.039 20.777 0.000 0.729 0.881

The standard error of the ar1 coefficient is good.
Next, we plot the ACF and PACF of the model.

sm_pl.plot_acf(resultsARMA.resid, lags=24);


sm_pl.plot_pacf(resultsARMA.resid, lags=24);


fig,ax = plt.subplots()
ax.hist(resultsARMA.resid, bins=70)
fig.suptitle("Distribution of Residuals");



The mean in the residuals is almost zero showing that the bias has been almost totally removed.

How does the model behave?

Now we could split the initial dataset into the usual train and test sets and check how the trained model behaves on the test set.

# split into train and test (last two years) sets
testSize = 24
milkData = X.LogMilk.values
train, test = milkData[0:-testSize], milkData[-testSize:]

We use a one-step-ahead training. For each new observation in the test dataset, a prediction is claculated, then it is added back to the training test and used to improve the model.
Here is an example on the last model that we checked, the ARIMA(1,0).

history = [x for x in train]
predictions = list()
for t in range(len(test)):
    modelARMA = ARIMA(history, order=(1,0,0))
    resultsARMA =
    yhat = resultsARMA.forecast()[0][0] # one step forecast
rmse = np.sqrt(sum((test - predictions)**2)/testSize)

The Rooted Mean Square Error (a classic metric to measure the model accuracy) is quite low.
Let’s see the predictions plotted:

predictions.pop(0) # remove the first element
fig,ax = plt.subplots()
ax.plot(test, label="Observed")
ax.plot(predictions, label='Predictions', color='#ff0066');

ax.legend(loc='upper right')
fig.suptitle("Last months milk production");


The ARMA(1,0) is not bad: the predictions are quite good and the RMSE is quite low.

Seasonal ARIMA model

The statsmodels package (from version 0.8) includes a function to model a seasonal arima.
We see how it works first using as parameters AR=1 and MA=0, plus the difference set to 1 to indicate that we want a data transformation to remove the trend and seasonality.

import statsmodels.api as sm_sa
firstModel = sm_sa.tsa.statespace.SARIMAX(X.LogMilk.values, order=(1,1,0),
results =
Statespace Model Results
Dep. Variable: y No. Observations: 228
Model: SARIMAX(1, 1, 0)x(1, 1, 0, 12) Log Likelihood 551.410
Date: Sun, 17 Jun 2018 AIC -1096.821
Time: 18:30:54 BIC -1086.533
Sample: 0 HQIC -1092.670
– 228
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
ar.L1 -0.1913 0.068 -2.832 0.005 -0.324 -0.059
ar.S.L12 -0.5278 0.059 -8.995 0.000 -0.643 -0.413
sigma2 0.0003 2.75e-05 12.378 0.000 0.000 0.000
Ljung-Box (Q): 105.30 Jarque-Bera (JB): 8.26
Prob(Q): 0.00 Prob(JB): 0.02
Heteroskedasticity (H): 0.81 Skew: 0.08
Prob(H) (two-sided): 0.39 Kurtosis: 3.95

Again we use a “grid search” to iteratively explore different combinations of parameters. For each combination of parameters, we fit a new seasonal ARIMA model with the SARIMAX() function from the statsmodels module and assess its overall quality.
This time is a double loop so it will take much longer time.

  # Define the p and q parameters to take any value between 0 and 3
p = q = range(0, 4)
sp = sq = range(0, 3)

  # Generate all different combinations of p, d=1 and q triplets
pdq = [(a,1,b) for a,b in itertools.product(p, q)]
  # Generate all different combinations of seasonal p, q and q triplets
seasonal_pdq = [(a, 1, b, 12) for a,b in list(itertools.product(sp, sq))]
best_score = np.inf
best_pdq = None
best_season_pdq = None
temp_model = None

for params in pdq:
    for params_seasonal in seasonal_pdq:
            temp_model = sm_sa.tsa.statespace.SARIMAX(X.LogMilk.values,
                                            order = params,
                                            seasonal_order = params_seasonal)

            results =

            if results.aic  NEW best AIC: ", best_score)

 (0, 1, 0) (0, 1, 1, 12) -> NEW best AIC: -1145.67087571
 (0, 1, 0) (2, 1, 1, 12) -> NEW best AIC: -1146.24623229
 (0, 1, 1) (0, 1, 1, 12) -> NEW best AIC: -1147.77581717
 (0, 1, 1) (2, 1, 1, 12) -> NEW best AIC: -1147.84213857
 (1, 1, 1) (0, 1, 1, 12) -> NEW best AIC: -1147.96663351
 (1, 1, 1) (2, 1, 1, 12) -> NEW best AIC: -1148.79899832

The model with lowest AIC is the (1,1,1)(2,1,1) but they have all very close AIC values so it could be worth to check them more in detail one by one.
Here we just take the lowest AIC:

bestModel = sm_sa.tsa.statespace.SARIMAX(X.LogMilk.values, order=(1,1,1),
results =
Statespace Model Results
Dep. Variable: y No. Observations: 228
Model: SARIMAX(1, 1, 1)x(2, 1, 1, 12) Log Likelihood 580.399
Date: Sun, 17 Jun 2018 AIC -1148.799
Time: 18:47:26 BIC -1128.223
Sample: 0 HQIC -1140.497
– 228
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
ar.L1 0.5334 0.265 2.013 0.044 0.014 1.053
ma.L1 -0.6859 0.230 -2.984 0.003 -1.136 -0.235
ar.S.L12 -0.1900 0.133 -1.429 0.153 -0.450 0.071
ar.S.L24 -0.1710 0.095 -1.791 0.073 -0.358 0.016
ma.S.L12 -0.7035 0.097 -7.241 0.000 -0.894 -0.513
sigma2 0.0002 1.77e-05 14.007 0.000 0.000 0.000
Ljung-Box (Q): 45.23 Jarque-Bera (JB): 27.85
Prob(Q): 0.26 Prob(JB): 0.00
Heteroskedasticity (H): 0.84 Skew: 0.02
Prob(H) (two-sided): 0.45 Kurtosis: 4.76

The coef column shows the weight (i.e. importance) of each feature and how each one impacts the time series. The P>|z| column informs us of the significance of each feature weight. Here, a couple of weights have a p-value higher than 0.05, so it could be reasonable to remove from the model.

The results object has a convenient diagnostic method named plot_diagnostics() to get a rundown of the fitted model.

results.plot_diagnostics(figsize=(16, 10));


The residual plot of the fitted model in the upper left corner appears to be white noise as it does not display obvious seasonality or trend behaviour.
The histogram plot in the upper right corner pair with the kernel density estimation (red line) indicates that the time series is almost normally distributed. This is compared to the density of the standard normal distribution (green line).
Not a perfect normal distribution tough, as the Q-Q chart in the bottom left shows.
The correlogram (autocorrelation plot) confirms this results, since the time series residuals show low correlations with lagged residuals.

The model produces a satisfactory fit that describes the underlying time series data well enough to forecast future values. Although the fit so far appears to be fine, a better fit could be achieved with a more complex model, i.e. a better model might have been von when increasing the parameter space for the grid search.


It is easy to forecast values using the previously fitted model.
The get_prediction and conf_int methods calculate predictions for future points in time for the previously fitted model and the confidence intervals associated with a prediction, respectively.

the last 12 months

First, we get the predictions for the last 12 months, to see how the model behaves:

HISTORIC = X.shape[0]
PRED = 12
last12m = results.get_prediction(start=HISTORIC-PRED+1,
                                 dynamic=False, full_results=True)

The dynamic=False argument ensures that we produce one-step ahead forecasts, meaning that forecasts at each point are generated using the full history up to that point.

fig,ax = plt.subplots()
ax.plot(X.YearMonth[-PRED:], X.LogMilk[-PRED:], label="Observed")
ax.plot(X.YearMonth[-PRED:], last12m.predicted_mean, label='Modelled', color='#ff0066');
  # avoid overlapping of ticks on x ax
plt.setp(ax.get_xticklabels(), rotation=30, horizontalalignment='right')

ax.legend(loc='upper right')
fig.suptitle("Last year milk production");


rmse = ((last12m.predicted_mean - X.LogMilk[-PRED:]) ** 2).mean()

Both the plot and the RMSE are quite good.

the next 24 months

To get the forecast we can call the get_forecast function:

next24m = results.get_forecast(steps=FORECAST)
next24m_ci = next24m.conf_int()

There is a point forecast (the expected value) along with 95 percent confidence intervals. Notice, that the confidence intervals generally get wider for forecasts further out in time. It is hardly suprising that the forecast has more uncertaintly as time increases from the present.

Finally, we can plot the forecast but need first to create new dataframe with the related dates, for convenience.

futureDates = pd.date_range(start = '2014-01-01', periods=FORECAST, freq='MS')
next24m_ci.index = futureDates

futureDF = pd.DataFrame(next24m.predicted_mean, index=futureDates)
2014-01-01 1.258927
2014-02-01 1.193670
2014-03-01 1.304142
2014-04-01 1.279353
2014-05-01 1.308496
fig,ax = plt.subplots()
ax.plot(X.YearMonth, X.LogMilk, label="Observed")
ax.plot(futureDF, label='Forecast', color='#ff0066');
  # draw confidence bound
                next24m_ci.iloc[:, 0],
                next24m_ci.iloc[:, 1], color='#ff0066', alpha=.25);


ax.legend(loc='upper left')
fig.suptitle("Milk production with forecast");


The original time series of milk production is show in blue in the plot above. The forecast is show in red. The 95 percent confidence intervals are shown in lighter shades of red.

It is obvious that, the further we try to extrapolate the time series evolution into the future, the less confident our prediction becomes. Hence, the confidence bound widens in the course of time.

You can try yourself to decompose the Icecream.Prod column of the diary data frame and answer the following questions:
-Does icecream production have a noticable seasonal component or are the values all close to the average over time? Is there a strong seasonal component for icecream production?
-Does the acf plot indicate that the remainder series is stationary?
-Do the values in the histogram have an approximately normal distribution?
-Does the interquartile range for each month overlap, indicating that the decomposition has produced a reasonably good model of the seasonal variation?

Note: this post is part of a series about Machine Learning with Python.

One thought on “Introduction to time series – Part II: an example

  1. Pingback: Introduction to time series – Part I: the basics – Look back in respect

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s