Variables can be described as either quantitative or qualitative.
Quantitative variables have a numerical value, e.g. a person’s income, or the price of a house.
Qualitative variables take values from one of a number of different classes, or categories: e.g. a person’s gender (male or female), the type of house purchased (villa, flat, penthouse, …), the colour of the eyes (brown, blue, green) or a cancer diagnosis.
Linear regression predicts a continuous variable, but sometimes we want to predict a categorical variable, i.e. a variable with a small number of possible discrete outcomes, usually unordered (there is no natural order among the outcomes).
This kind of problem is called classification.
Given a feature vector X and a qualitative response y taking values from a fixed set C, the classification task is to build a function f(X) that takes the feature vector X as input and predicts a value for y.
Often we are also (or even more) interested in estimating the probability that X belongs to each category in C.
For example, it is more valuable to have the probability that an insurance claim is fraudulent than a bare yes/no classification.
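To make this concrete, here is a minimal sketch in Python of a classifier that returns both a hard class label and the estimated class probabilities. It assumes scikit-learn and NumPy are available; the tiny synthetic "fraud" data set and all variable names are illustrative, not taken from any real insurance data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two hypothetical features per claim (e.g. claim amount, claimant age),
# standardised; 200 synthetic claims.
X = rng.normal(size=(200, 2))
# Synthetic rule: a claim is labelled fraudulent (1) when a noisy linear
# score of its features is positive.
y = (X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression()
clf.fit(X, y)

x_new = np.array([[1.0, -1.0]])
label = clf.predict(x_new)        # hard class label: 0 or 1
proba = clf.predict_proba(x_new)  # estimated probability of each class
print(label, proba)
```

The key point is the difference between `predict`, which commits to one category, and `predict_proba`, which returns a probability for every category in C and sums to 1.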
There are many possible classification techniques, or classifiers, available to predict a qualitative response.
We will now look at one of them, called logistic regression.
Note: this post is part of a series about Machine Learning with Python.