Logistic regression using SKlearn

We have seen an introduction to logistic regression with a simple example: predicting a student's admission to university based on past exam results.
This was done in Python from scratch, defining the sigmoid function and the gradient descent ourselves, and we have also seen the same example solved with the statsmodels library.
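As a quick reminder, the from-scratch approach boiled down to two pieces: a sigmoid function and a gradient descent update. A minimal sketch (the variable names here are mine, for illustration, not necessarily those used in the earlier post):

```python
import numpy as np

def sigmoid(z):
    # logistic function: maps any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_step(theta, X, y, alpha):
    # one batch gradient descent step for logistic regression
    # X: (m, n) feature matrix, y: (m,) labels in {0, 1}, alpha: learning rate
    m = X.shape[0]
    h = sigmoid(X @ theta)           # predicted probabilities
    grad = X.T @ (h - y) / m         # gradient of the log-loss
    return theta - alpha * grad
```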

Now we are going to see how to solve a logistic regression problem using the popular scikit-learn library, specifically its LogisticRegression class.

The example this time is to predict survival on the Titanic (the ship that sank after hitting an iceberg).
It’s a basic learning competition on the ML platform Kaggle and a simple introduction to machine learning concepts, specifically binary classification (survived / did not survive).
Here we are looking into how to apply Logistic Regression to the Titanic dataset.
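The scikit-learn workflow follows the usual fit/predict pattern. A minimal sketch, with random toy features standing in for the real Titanic columns (which in the actual notebook come from the Kaggle CSV files):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# toy stand-in for the Titanic features (e.g. class, sex, age);
# in the real example these come from the Kaggle dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # binary target: survived / not

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)                      # learn the coefficients
accuracy = model.score(X_test, y_test)           # accuracy on held-out data
```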

You can follow along with the Python notebook on GitHub or the Python kernel on Kaggle. Continue reading “Logistic regression using SKlearn”


Azure Machine Learning Studio

Azure ML (Machine Learning) Studio is an interactive environment from Microsoft to build predictive analytics solutions.
You can upload your own data or use data from the Azure cloud, and via a drag-and-drop interface you can combine existing machine learning algorithms – or your own scripts, in several languages – to build and test a data science pipeline.
Eventually, the final model can be deployed as a web service, to be consumed by e.g. Excel or custom apps.

If you are already using the Azure solutions, it offers a valuable add-on for machine learning, especially if you need a quick way to analyse a dataset and evaluate a model.

This is what Gartner says about Azure ML Studio in its 2018 “Magic Quadrant for Data Science and Machine-Learning Platforms”:

Microsoft remains a Visionary.
Its position in this regard is attributable to low scores for market responsiveness and product viability, as Azure Machine Learning Studio’s cloud-only nature limits its usability for the many advanced analytic use cases that require an on-premises option.

Note: I have no affiliation with Microsoft, nor am I paid by them. I am just looking into the main tools available for machine learning.

We will see how to create and build a regression model based on the Autos dataset that we used earlier.

You can follow this experiment directly from Azure ML Studio. Continue reading “Azure Machine Learning Studio”

Recover audio using linear regression

In this example, we will use linear regression to recover – or ‘fill in’ – a completely deleted portion of an audio file!
For this, we use the FSDD (Free Spoken Digit Dataset), an audio dataset put together by Zohar Jackson:

cleaned up audio (no dead-space, roughly same length, same bitrate, same samples-per-second rate, same speaker, etc) samples ready for machine learning.
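The idea can be sketched as follows: treat each clip as one row, use the surviving samples as features and the deleted tail as a (multi-output) target, then fit a linear model on the intact clips. Below, synthetic sine waves stand in for the FSDD recordings, and the sizes and split point are illustrative, not the ones used in the notebook:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

n_clips, n_samples = 40, 200
keep = 150  # how many samples of each clip survive; the tail is "deleted"

# synthetic clips: sine waves of slightly different frequencies
t = np.linspace(0, 1, n_samples)
rng = np.random.default_rng(1)
clips = np.array([np.sin(2 * np.pi * (3 + f) * t)
                  for f in rng.uniform(0, 2, n_clips)])

X_train, y_train = clips[:-1, :keep], clips[:-1, keep:]  # intact clips
damaged = clips[-1, :keep]                               # clip with the tail deleted

# multi-output linear regression: surviving samples -> deleted samples
model = LinearRegression()
model.fit(X_train, y_train)
restored = model.predict(damaged.reshape(1, -1))[0]      # recovered tail
```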

You can follow along with the associated notebook on GitHub. Continue reading “Recover audio using linear regression”

Regularisation in neural networks

We have seen the concept of regularisation and how it is applied to linear regression; let’s now see another example, for logistic regression done with artificial neural networks.

The task is to recognise hand-written digits, a classic one-vs-all logistic regression problem.

The dataset contains 5000 training examples of handwritten digits and is a subset of the MNIST handwritten digit dataset.

Each training example is a 20 pixel by 20 pixel grayscale image of the digit. Each pixel is represented by a floating point number indicating the grayscale intensity at that location.

The 20 by 20 grid of pixels is unrolled into a 400-dimensional vector. Each of these training examples becomes a single row in our data matrix X. This gives us a 5000 by 400 matrix X where every row is a training example for a handwritten digit image.
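The unrolling described above is a simple reshape in NumPy (zeros stand in for the actual pixel intensities):

```python
import numpy as np

# placeholder for the 5000 digit images, each a 20x20 grid of
# grayscale intensities (the real values come from the dataset)
images = np.zeros((5000, 20, 20))

# unroll each 20x20 grid into a 400-dimensional row:
# every row of X is one training example
X = images.reshape(5000, -1)
```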

Let’s get more familiar with the dataset.
You can follow along on the associated notebook.
Continue reading “Regularisation in neural networks”


The basic idea of regularisation is to penalise or shrink the large coefficients of a regression model.
This can help with the bias / variance trade-off (shrinking the coefficient estimates can significantly reduce their variance and thus improve the prediction error) and can help with model selection by automatically removing irrelevant features (that is, by setting the corresponding coefficient estimates to zero).
Its main drawback is that this approach can be computationally demanding.

There are several ways to perform the shrinkage; the regularisation models that we will see are Ridge regression and the Lasso. Continue reading “Regularisation”
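The shrinkage effect can be seen on toy data where only the first feature is truly relevant (the alpha values below are illustrative, not tuned):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# toy data: only the first of five features drives the response
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # shrinks all coefficients towards zero
lasso = Lasso(alpha=0.1).fit(X, y)    # can set irrelevant coefficients exactly to zero
```

Comparing `ols.coef_`, `ridge.coef_` and `lasso.coef_` shows the two behaviours: Ridge shrinks every coefficient a little, while the Lasso zeroes out the irrelevant ones entirely.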


We have seen previously how learning the parameters of a prediction function on the same data would have a perfect score but would fail to predict anything useful on yet-unseen data. This situation is called overfitting. To avoid it, it is common practice to hold out part of the available data as a test set.

The test error is then the average error that results when predicting the response on a new observation, one that was not used in training the learning method.
In contrast, the training error is calculated by applying the learning method to the observations used in its training.
But the training error rate is often quite different from the test error rate, and in particular it can dramatically underestimate the test error.

The best solution would be to use a large designated test set, but one is often not available.

Here we see a class of methods that estimate the test error by holding out a subset of the training observations from the fitting process, and then applying the learning method to those held-out observations. Continue reading “Cross-validation”
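A sketch of this idea with scikit-learn's cross_val_score, on toy data (the model and the fold count below are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# toy binary classification data
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = (X[:, 0] > 0).astype(int)

# 5-fold cross-validation: each fold is held out once as the test set
# while the model is trained on the remaining four folds
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv)
estimated_test_accuracy = scores.mean()   # average over the five held-out folds
```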