Logistic regression with Python statsmodels

We have already seen an introduction to logistic regression, with a simple example of predicting a student's admission to university based on past exam results.
That was done in plain Python, using the sigmoid function and gradient descent.

We can now solve the same example using the statsmodels library, specifically its Logit model for logistic regression. The library contains an optimised and efficient algorithm to find the regression parameters.
You can follow along in the Python notebook on GitHub.

The initial part is exactly the same: read the training data and prepare the target variable.
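A minimal sketch of that step could look like the following; the file name and column names are placeholders, adjust them to match the data used in the notebook:

import pandas as pd

# read the training set (file and column names here are assumed, not taken from the notebook)
data = pd.read_csv("exam_scores.csv", names=["Exam1", "Exam2", "Admitted"])

# feature matrix: the two exam scores plus an explicit intercept column,
# since Logit does not add a constant on its own
X = data[["Exam1", "Exam2"]].copy()
X["intercept"] = 1.0

# target variable: 1 if the student was admitted, 0 otherwise
y = data["Admitted"]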
Then we import statsmodels and fit a Logit model:

import statsmodels.api as sm

model = sm.Logit(y, X)   # y: the binary Admitted target, X: the exam scores plus the intercept column

result = model.fit()
Optimization terminated successfully.
         Current function value: 0.203498
         Iterations 9
result.summary()
                         Logit Regression Results
==============================================================================
Dep. Variable:               Admitted   No. Observations:                  100
Model:                          Logit   Df Residuals:                       97
Method:                           MLE   Df Model:                            2
Date:                Tue, 18 Jul 2017   Pseudo R-squ.:                  0.6976
Time:                        15:06:33   Log-Likelihood:                -20.350
converged:                       True   LL-Null:                       -67.301
                                        LLR p-value:                 4.067e-21
==============================================================================
                 coef    std err          z      P>|z|     [95.0% Conf. Int.]
------------------------------------------------------------------------------
Exam1          0.2062      0.048      4.296      0.000         0.112     0.300
Exam2          0.2015      0.049      4.143      0.000         0.106     0.297
intercept    -25.1613      5.799     -4.339      0.000       -36.526   -13.796
==============================================================================

The summary gives a good overview of the model: the estimated coefficients, how precisely they are estimated, the overall quality of the fit, and several other statistical measures.
The result object also lets you isolate and inspect parts of the model output; for example, the coefficients are in the params attribute:

coefficients = result.params
coefficients
Exam1      0.206232
Exam2      0.201472
intercept -25.161334

As you can see, the model found the same coefficients as in the previous example.
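If you want to tie this back to the earlier implementation, you can plug these coefficients into the sigmoid by hand to get an admission probability; the exam scores below are made up purely for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical scores for a new student
exam1, exam2 = 55.0, 70.0

z = (coefficients["Exam1"] * exam1
     + coefficients["Exam2"] * exam2
     + coefficients["intercept"])
probability = sigmoid(z)   # estimated probability of admission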

The confidence interval gives you an idea of how robust the coefficients of the model are.

result.conf_int()
Exam1      0.112152    0.300311
Exam2      0.106168    0.296775
intercept -36.526287 -13.796380
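The fitted result can also score new applicants directly through its predict method; the only requirement is that the values are passed in the same column order as in the training matrix X. The scores below are again made up for illustration:

import numpy as np

# one hypothetical applicant: Exam1 score, Exam2 score, intercept
new_student = np.array([[55.0, 70.0, 1.0]])

result.predict(new_student)   # returns the predicted probability of admission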

Note: this post is part of a series about Machine Learning with Python.
