Cross-validation

We have seen previously how learning the parameters of a prediction function and evaluating it on the same data would yield a perfect score but fail to predict anything useful on yet-unseen data. This situation is called overfitting. To avoid it, it is common practice to hold out part of the available data as a test set.

The test error is the average error that results when predicting the response for a new observation, one that was not used in training the learning method.
In contrast, the training error is calculated by applying the learning method to the very observations used in its training.
But the training error rate is often quite different from the test error rate, and in particular it can dramatically underestimate the generalization error.

The best solution would be a large designated test set, but one is often not available.

Here we see a class of methods that estimate the test error by holding out a subset of the training observations from the fitting process and then applying the learning method to those held-out observations.
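A minimal sketch of what this looks like in practice, assuming scikit-learn and its bundled diabetes dataset purely for illustration:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
model = LinearRegression()

# 5-fold cross-validation: each fold is held out once while the model
# is fitted on the remaining four folds, then scored on the held-out fold.
scores = cross_val_score(model, X, y, cv=5)  # default scoring: R^2 per fold
print(scores.mean(), scores.std())

Averaging the per-fold scores gives an estimate of how the model would perform on unseen data, which is exactly what the training error alone cannot tell us.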


The overfitting problem and the bias vs. variance dilemma

We have seen what linear regression is, how to build models and algorithms for estimating the parameters of such models, and how to measure the loss.

Now we see how to assess how well a given method should perform in predicting new data, and how to select among possible models to choose the best-performing one.

We will first explore the concept of training and test error, how they vary with model complexity, and how they can be used to form a valid assessment of predictive performance. This leads directly to the bias-variance tradeoff, which is fundamental to machine learning.
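As a rough illustration of that tradeoff (synthetic data and scikit-learn assumed; polynomial degree stands in for model complexity):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 3, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          mean_squared_error(y_train, model.predict(X_train)),  # training error
          mean_squared_error(y_test, model.predict(X_test)))    # test error

Training error keeps shrinking as the degree grows, while test error eventually rises again once the model starts fitting noise: that gap is the overfitting signal this post is about.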

The concepts described in this post are key to all machine learning problems, well beyond the regression setting.

An introduction to logistic regression

Variables can be described as either quantitative or qualitative.
Quantitative variables have a numerical value, e.g. a person’s income, or the price of a house.
Qualitative variables take values from one of several classes or categories, e.g. a person’s gender (male or female), the type of house purchased (villa, flat, penthouse, …), eye colour (brown, blue, green) or a cancer diagnosis.

Linear regression predicts a continuous variable, but sometimes we want to predict a categorical variable, i.e. a variable with a small number of possible discrete outcomes, usually unordered (there is no order among the outcomes).

This kind of problem is called classification.

Classification

Given a feature vector X and a qualitative response y taking values from a fixed set C, the classification task is to build a function f(X) that takes as input the feature vector X and predicts a value for y.
Often we are also (or even more) interested in estimating the probability that X belongs to each category in C.
For example, it is more valuable to have the probability that an insurance claim is fraudulent than a bare classification of fraudulent or not.

There are many possible classification techniques, or classifiers, available to predict a qualitative response.

We will now see one called logistic regression.
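As a quick preview (a sketch with synthetic data; scikit-learn assumed), note how a classifier can return both the hard class labels and the estimated class probabilities discussed above:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

print(clf.predict(X[:3]))        # hard class labels
print(clf.predict_proba(X[:3]))  # estimated probability of each class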

Note: this post is part of a series about Machine Learning with Python.

Google and Microsoft blend AI into core products

I recently watched parts of both the Google and Microsoft developer conferences (respectively I/O 2017 and Build 2017).
As expected, there was a big emphasis on Artificial Intelligence but, all in all, I liked Microsoft’s more, while Google’s felt too heterogeneous and without real meat (the new capabilities of Google Lens have been available, e.g., at Baidu for years).

A few things that attracted my curiosity:

Vision plus X is the killer app of AI

At Google I/O, Dr. Fei-Fei Li, the new Chief Scientist of AI/ML at Google Cloud, articulated the most convincing vision.

Machine Learning Yearning

This morning I got the first twelve draft chapters of Prof. Andrew Ng‘s new book, titled “Machine Learning Yearning – Technical strategy for AI engineers, in the era of deep learning”.

[Image: the book cover]

The book aims to help readers quickly become better at building AI systems by teaching the practical skills needed to organise a machine learning project.
The book assumes readers are already familiar with machine learning concepts and does not go into technical explanations of how they work.

These first chapters look great; I think this book will help close the gap between machine learning knowledge and proper execution.

My favourite chapter is the ninth, “Optimizing and satisficing metrics”, which suggests how to handle the problem of establishing a single-number evaluation metric when the metrics you care about are not compatible:

Suppose you care about both the accuracy and the running time of a learning algorithm.

It seems unnatural to derive a single metric by putting accuracy and running time into a single formula, such as:

Accuracy - 0.5*RunningTime

Here’s what you can do instead:
First, define what an “acceptable” running time is. Let’s say anything that runs within 100 ms is acceptable.
Then, maximize accuracy, subject to your classifier meeting the running time criterion.
Here, running time is a “satisficing metric”: your classifier just has to be “good enough” on this metric, in the sense that it should take at most 100 ms. Accuracy is the “optimizing metric.”
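A tiny sketch of that selection rule (the model names and numbers below are made up for illustration):

# Among classifiers meeting the 100 ms running-time constraint (satisficing),
# pick the most accurate one (optimizing).
models = [
    {"name": "A", "accuracy": 0.92, "running_time_ms": 80},
    {"name": "B", "accuracy": 0.95, "running_time_ms": 150},
    {"name": "C", "accuracy": 0.90, "running_time_ms": 60},
]

acceptable = [m for m in models if m["running_time_ms"] <= 100]
best = max(acceptable, key=lambda m: m["accuracy"])
print(best["name"])  # -> "A": B is more accurate but too slow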


P.S. I know, satisficing is not a common word and is marked wrong by spell checkers, but it really exists! I had to look it up myself, but here is the definition:
To satisfice means to choose or adopt the first option one comes across that fulfils all the requirements (as opposed to looking for the optimal one).