An introduction to logistic regression

Variables can be described as either quantitative or qualitative.
Quantitative variables have a numerical value, e.g. a person’s income, or the price of a house.
Qualitative variables take values from one of several classes or categories, e.g. a person’s gender (male or female), the type of house purchased (villa, flat, penthouse, …), the colour of the eyes (brown, blue, green) or a cancer diagnosis.

Linear regression predicts a continuous variable, but sometimes we want to predict a categorical variable, i.e. a variable with a small number of possible discrete outcomes, usually unordered (there is no natural order among them).

This kind of problem is called classification.

Classification

Given a feature vector X and a qualitative response y taking values from a fixed set C of categories, the classification task is to build a function f(X) that takes the feature vector X as input and predicts the corresponding value of y.
Often we are also (or even more) interested in estimating the probability that X belongs to each category in C.
For example, knowing the probability that an insurance claim is fraudulent is more valuable than a bare fraudulent/not-fraudulent label.
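To make the difference concrete, here is a minimal Python sketch that assumes a classifier has already produced a probability for each category in C. The function names `classify` and `flag_fraud` and the probabilities are hypothetical, chosen only for illustration. The hard label f(X) is simply the most probable class, while keeping the probability lets us pick a different decision threshold, e.g. sending a claim for review whenever the fraud probability reaches 0.2.

```python
def classify(probabilities):
    """Hard label: f(X) = the class c with the highest estimated P(c | X)."""
    return max(probabilities, key=probabilities.get)


def flag_fraud(p_fraud, threshold=0.5):
    """Turn an estimated fraud probability into a yes/no decision."""
    return p_fraud >= threshold


# Hypothetical estimates P(class | X) for one insurance claim (made-up numbers).
claim = {"fraudulent": 0.3, "legitimate": 0.7}

print(classify(claim))                       # legitimate
print(flag_fraud(claim["fraudulent"], 0.2))  # True: worth a manual review
```

With only the hard label "legitimate" we could not make the second, more cautious decision.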

There are many possible classification techniques, or classifiers, available to predict a qualitative response.

We will now look at one of them, called logistic regression.
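As a rough preview, logistic regression models the probability of the positive class as the logistic (sigmoid) function of a linear combination of the features. The sketch below fits a one-feature model with plain gradient descent on the log-loss, using only the Python standard library. The learning rate, number of epochs and the tiny dataset are arbitrary assumptions for illustration; a real project would more likely use a library implementation such as scikit-learn’s `LogisticRegression`.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1*x) by batch gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # derivative of the log-loss w.r.t. the linear score
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def predict_proba(b0, b1, x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(b0 + b1 * x)


def predict(b0, b1, x, threshold=0.5):
    """Hard class label obtained by thresholding the probability."""
    return int(predict_proba(b0, b1, x) >= threshold)


# Made-up data: one feature, binary label; the classes overlap, so the fit stays finite.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)

print(predict_proba(b0, b1, 1.0), predict_proba(b0, b1, 4.0))
print(predict(b0, b1, 1.0), predict(b0, b1, 4.0))  # 0 1
```

Note that the model returns a probability first and the class only as a second step, which matches the interest in probabilities described above.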

Note: this post is part of a series about Machine Learning with Python.

Google and Microsoft blend AI into core products

I recently watched parts of both Google’s and Microsoft’s developer conferences (respectively I/O 2017 and Build 2017).
As expected, there was a big emphasis on Artificial Intelligence but, overall, I preferred Microsoft’s, while Google’s felt too heterogeneous and lacking real substance (the new capabilities of Google Lens, for example, have been available at Baidu for years).

A few things that attracted my curiosity:

Vision plus X is the killer app of AI

At Google I/O, Dr. Fei-Fei Li, the new Chief Scientist of AI/ML at Google Cloud, articulated the most convincing vision:

Machines “think” differently but it’s not a problem (maybe)

Yet another article about the interpretability problem of many AI algorithms, this time in MIT Technology Review, May/June 2017 issue.

The issue is clear: many of the most successful recent AI technologies revolve around deep learning, i.e. complex artificial neural networks (with so many layers of so many neurons transforming so many variables) that behave like “black boxes” to us.
We can no longer comprehend the model: we don’t know how or why a specific input leads to its outcome.
Is it scary?

In the film Dekalog 1 by Krzysztof Kieślowski (the first of ten short films inspired by the Ten Commandments, the first one being “I am the Lord your God; you shall have no other gods before me”), Krzysztof lives alone with Paweł, his highly intelligent 12-year-old son, and introduces him to the world of personal computers.

Agile for managing a research data team

An interesting read: “Lessons learned managing a research data science team” by Kate Matsudaira, in ACMqueue magazine.

The author described how she managed a data science team in her role as VP engineering at a data mining startup.

When you have a team of people working on hard data science problems, the things that work in traditional software don’t always apply. When you are doing research and experiments, the work can be ambiguous, unpredictable, and the results can be hard to measure.

These are the changes that the team implemented in the process:

[Link] Algorithms literature

From the Social Media Collective, part of Microsoft Research, an interesting and comprehensive list of studies about algorithms as a social concern.

Our interest in assembling this list was to catalog the emergence of “algorithms” as objects of interest for disciplines beyond mathematics, computer science, and software engineering.

They also try to categorise the studies and include an intriguing timeline visualisation, which shows how much interest algorithms are currently sparking:

[Image: timeline visualisation]

Machine Learning Yearning

This morning I received the first 12 draft chapters of Prof. Andrew Ng’s new book, titled “Machine Learning Yearning – Technical strategy for AI engineers, in the era of deep learning”.

[Image: the book cover]

The book aims to help readers quickly become better at building AI systems by gaining the practical skills needed to organise a machine learning project.
The book assumes readers are already familiar with machine learning concepts and does not go into technical explanations of how they work.

These first chapters look great; I think this book will help close the gap between machine learning knowledge and its proper execution.

My favourite chapter is the ninth, “Optimizing and satisficing metrics”, which suggests how to establish a single-number evaluation metric when the metrics you care about cannot easily be combined:

Suppose you care about both the accuracy and the running time of a learning algorithm.

It seems unnatural to derive a single metric by putting accuracy and running time into a single formula, such as:

Accuracy - 0.5 * RunningTime

Here’s what you can do instead:
First, define what is an “acceptable” running time. Let’s say anything that runs in 100ms is acceptable.
Then, maximize accuracy, subject to your classifier meeting the running time criteria.
Here, running time is a “satisficing metric”— your classifier just has to be “good enough” on this metric, in the sense that it should take at most 100ms. Accuracy is the “optimizing metric.”
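The recipe above can be sketched in a few lines of Python: first filter out the classifiers that miss the running-time budget (the satisficing metric), then pick the most accurate of the rest (the optimizing metric). The `Candidate` type, the `select_model` function and all the numbers are hypothetical, not taken from the book.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float         # optimizing metric: higher is better
    running_time_ms: float  # satisficing metric: only has to be within budget


def select_model(candidates, max_running_time_ms=100.0):
    """Maximize accuracy among the candidates that meet the running-time budget."""
    acceptable = [c for c in candidates if c.running_time_ms <= max_running_time_ms]
    if not acceptable:
        return None  # no candidate is "good enough" on the satisficing metric
    return max(acceptable, key=lambda c: c.accuracy)


# Hypothetical classifiers, made up for illustration.
candidates = [
    Candidate("A", accuracy=0.88, running_time_ms=60),
    Candidate("B", accuracy=0.91, running_time_ms=98),
    Candidate("C", accuracy=0.94, running_time_ms=450),
]

print(select_model(candidates).name)  # B: C is more accurate but too slow
```

No arbitrary weight such as the 0.5 above is needed; changing the budget changes the winner instead (with a 500 ms budget, C would be selected).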


P.S. I know, satisficing is not a common word and spell checkers mark it as wrong, but it really exists! I had to look it up myself; here is the definition:
To satisfice means to choose or adopt the first option one comes across that fulfils all requirements (in contrast to looking for the optimal one).
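Read literally, this definition is a decision rule of its own, and it differs slightly from the book’s usage, where accuracy is still optimized and only running time is satisficed. A minimal Python sketch contrasting the two rules, with hypothetical running times listed in the order the options are encountered (`satisfice` and `optimize` are illustrative names):

```python
def satisfice(options, meets_requirements):
    """Return the first option that fulfils all requirements, or None."""
    for option in options:
        if meets_requirements(option):
            return option
    return None


def optimize(options, score):
    """Examine every option and return the one with the best score."""
    return max(options, key=score)


# Hypothetical running times in ms, in the order we come across them.
times = [250, 90, 40, 120]

print(satisfice(times, lambda t: t <= 100))  # 90: first one within 100 ms
print(optimize(times, lambda t: -t))         # 40: the fastest overall
```

The satisficer stops at the second option; the optimizer has to look at all four.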