Variables can be described as either quantitative or qualitative.

Quantitative variables have a numerical value, e.g. a person’s income, or the price of a house.

**Qualitative** variables have a values taken from one of different classes or categories. E.g., a person’s gender (male or female), the type of house purchased (villa, flat, penthouse, …) the colour of the eye (brown, blue, green) or a cancer diagnosis.

Linear regression predicts a continuous variable but sometime we want to predict a categorical variable, i.e. a variable with a small number of possible discrete outcomes, usually unordered (there is no order among the outcomes).

This kind of problems are called Classification.

# Classification

Given a feature vector X and a qualitative response y taking values from one fixed set, the classification task is to build a function f(X) that takes as input the feature vector X and predicts its value for y.

Often we are interested also (or even more) in estimating the probabilities that X belongs to each category in C.

For example, it is more valuable to have the probability that an insurance claim is fraudulent, than if a classification is fraudulent or not.

There are many possible classification techniques, or classifiers, available to predict a qualitative response.

We will se now one called **logistic regression.**

Note: this post is part of a series about Machine Learning with Python.

Continue reading “An introduction to logistic regression” →