Recover audio using linear regression

In this example, we will use linear regression to recover, or ‘fill in’, a completely deleted portion of an audio file!
For this, we use the Free Spoken Digit Dataset (FSDD), an audio dataset put together by Zohar Jackson:

cleaned-up audio samples (no dead space, roughly the same length, same bitrate, same samples-per-second rate, same speaker, etc.) ready for machine learning.
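The full walk-through is in the notebook; as a minimal sketch of the idea (synthetic data, and an autoregressive formulation that may differ from the post’s exact set-up), one can fit a linear model that predicts each sample from the k samples preceding it, then fill the gap one prediction at a time:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for a real recording: audio[start:end] plays
# the role of the deleted portion.
rng = np.random.default_rng(0)
audio = np.sin(np.linspace(0, 100, 4000)) + 0.05 * rng.standard_normal(4000)
start, end, k = 2000, 2200, 32  # k = number of past samples per prediction

# Fit an autoregressive linear model on the intact part before the gap.
X = np.array([audio[i - k:i] for i in range(k, start)])
y = audio[k:start]
model = LinearRegression().fit(X, y)

# Fill the gap one sample at a time, feeding predictions back in.
recovered = audio.copy()
for i in range(start, end):
    recovered[i] = model.predict(recovered[i - k:i].reshape(1, -1))[0]
```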

You can follow along with the associated notebook on GitHub. Continue reading “Recover audio using linear regression”


Regularisation in neural networks

We have seen the concept of regularisation and how it is applied to linear regression; let’s now see another example, for logistic regression done with artificial neural networks.

The task is to recognise handwritten digits, a classic one-vs-all logistic regression problem.

The dataset contains 5000 training examples of handwritten digits and is a subset of the MNIST handwritten digit dataset.

Each training example is a 20-by-20-pixel grayscale image of a digit. Each pixel is represented by a floating-point number indicating the grayscale intensity at that location.

The 20 by 20 grid of pixels is unrolled into a 400-dimensional vector. Each of these training examples becomes a single row in our data matrix X. This gives us a 5000 by 400 matrix X where every row is a training example for a handwritten digit image.
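The unrolling step itself is just a reshape; here is a minimal numpy sketch (the random `images` array is only a stand-in for the real data):

```python
import numpy as np

# Stand-in for the real data: 5000 20x20 grayscale images.
images = np.random.rand(5000, 20, 20)

# Unroll each 20x20 grid into a 400-dimensional row vector.
X = images.reshape(5000, 400)
print(X.shape)  # (5000, 400)
```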

Let’s get more familiar with the dataset.
You can follow along with the associated notebook.
Continue reading “Regularisation in neural networks”


The basic idea of regularisation is to penalise or shrink the large coefficients of a regression model.
This can help with the bias/variance trade-off (shrinking the coefficient estimates can significantly reduce their variance and thus improve the prediction error) and can help with model selection by automatically removing irrelevant features (that is, by setting the corresponding coefficient estimates to zero).
Its main drawback is that this approach can be computationally demanding.

There are several ways to perform the shrinkage; the regularisation models that we will see are Ridge regression and the Lasso.
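As a quick illustration (not taken from the post itself), scikit-learn’s `Ridge` and `Lasso` show both behaviours on toy data, with `alpha` controlling the strength of the shrinkage:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Toy data: only the first two of ten features actually matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # shrinks all coefficients towards zero
lasso = Lasso(alpha=0.1).fit(X, y)  # sets irrelevant coefficients exactly to zero

print(np.round(ols.coef_, 3))
print(np.round(ridge.coef_, 3))
print(np.round(lasso.coef_, 3))
```

The Lasso’s exactly-zero coefficients are what make the automatic feature removal mentioned above possible. Continue reading “Regularisation”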



We have seen previously how learning the parameters of a prediction function and testing it on the same data would yield a perfect score but fail to predict anything useful on yet-unseen data. This situation is called overfitting. To avoid it, it is common practice to hold out part of the available data as a test set.
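With scikit-learn, holding out a test set is a single call; here is a minimal sketch (the digits dataset merely stands in for whatever data is at hand):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

# Hold out 25% of the data as a test set, never seen during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.score(X_train, y_train))  # training accuracy (optimistic)
print(model.score(X_test, y_test))    # test accuracy (honest estimate)
```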

The test error is then the average error that results when predicting the response for a new observation, one that was not used to train the learning method.
In contrast, the training error is calculated by applying the learning method to the observations used in its training.
But the training error rate is often quite different from the test error rate and, in particular, can dramatically underestimate the generalisation error.

The best solution would be to use a large designated test set, but one is often not available.

Here we look at a class of methods that estimate the test error by holding out a subset of the training observations from the fitting process, and then applying the learning method to those held-out observations.
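Reusing the set-up from the sketch above, scikit-learn’s `cross_val_score` does exactly this: each fold is held out once while the model is fitted on the remaining folds:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

# 5-fold cross-validation: five test-error estimates from five held-out folds.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())
```

Continue reading “Cross-validation”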


Cat or not cat?

This post shows two simple image-recognition algorithms that can correctly classify pictures as cat or non-cat.
The first is a classic logistic regression; the second, more accurate, is a deep neural network.
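As a rough sketch of the first approach (the notebook has the real version; the arrays below are hypothetical stand-ins), logistic regression treats each flattened image as one long feature vector:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-ins: 64x64 RGB images with binary cat / non-cat labels.
rng = np.random.default_rng(0)
train_images = rng.random((200, 64, 64, 3))
train_labels = rng.integers(0, 2, 200)

# Flatten each image into a single feature vector, then fit the classifier.
X_train = train_images.reshape(len(train_images), -1)
clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
print(clf.predict(X_train[:5]))  # 1 = cat, 0 = non-cat
```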

You can also follow along using the notebook on GitHub. This post is part of a series about Machine Learning with Python. Continue reading “Cat or not cat?”


The overfitting problem and the bias vs. variance dilemma

We have seen what linear regression is, how to build models and algorithms for estimating the parameters of such models, and how to measure the loss.

Now we will see how to assess how well the chosen method should perform in predicting new data, and how to select the best-performing among the possible models.

We will first explore the concepts of training and test error, how they vary with model complexity and how they can be used to form a valid assessment of predictive performance. This leads directly to an important bias-variance trade-off, which is fundamental to machine learning.
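A minimal sketch of that pattern on synthetic data: fitting polynomials of increasing degree, the training error keeps falling as complexity grows, while the test error eventually turns back up:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy sine data, split into a training and a test set.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 80)
y = np.sin(x) + 0.3 * rng.standard_normal(80)
X_train, X_test, y_train, y_test = train_test_split(
    x.reshape(-1, 1), y, random_state=0)

# Training error shrinks with degree; test error eventually grows again.
for degree in (1, 3, 9, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          mean_squared_error(y_train, model.predict(X_train)),
          mean_squared_error(y_test, model.predict(X_test)))
```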

The concepts described in this post are key to all machine learning problems, well beyond the regression setting. Continue reading “The overfitting problem and the bias vs. variance dilemma”