Logistic regression with Python statsmodels

We have seen an introduction to logistic regression with a simple example: predicting a student's admission to university based on past exam results.
That was done in plain Python, using the sigmoid function and gradient descent.

We can now see how to solve the same example using the statsmodels library, specifically its Logit class for logistic regression. The library contains an optimised and efficient algorithm to find the correct regression parameters.
You can follow along from the Python notebook on GitHub.

The initial part is exactly the same: read the training data, prepare the target variable.
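That preparation step could be sketched as follows; the sample rows here are made-up stand-ins for the real training file, and only the column names (Exam1, Exam2, Admitted) are taken from the regression output below:

```python
import pandas as pd

# A few invented rows standing in for the real training data.
data = pd.DataFrame({
    "Exam1":    [34.6, 78.0, 60.2, 79.0],
    "Exam2":    [78.0, 96.8, 86.3, 75.3],
    "Admitted": [0, 1, 1, 1],
})

# Features plus an explicit intercept column: Logit does not
# add an intercept automatically.
X = data[["Exam1", "Exam2"]].copy()
X["intercept"] = 1.0
y = data["Admitted"]
```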
Then, we’re going to import and use the statsmodels Logit function:

import statsmodels.api as sm

model = sm.Logit(y, X)

result = model.fit()
Optimization terminated successfully.
         Current function value: 0.203498
         Iterations 9

print(result.summary())
                           Logit Regression Results
==============================================================================
Dep. Variable:               Admitted   No. Observations:                  100
Model:                          Logit   Df Residuals:                       97
Method:                           MLE   Df Model:                            2
Date:                Tue, 18 Jul 2017   Pseudo R-squ.:                  0.6976
Time:                        15:06:33   Log-Likelihood:                -20.350
converged:                       True   LL-Null:                       -67.301
                                        LLR p-value:                 4.067e-21
==============================================================================
                coef    std err          z      P>|z|     [95.0% Conf. Int.]
------------------------------------------------------------------------------
Exam1         0.2062      0.048      4.296      0.000         0.112     0.300
Exam2         0.2015      0.049      4.143      0.000         0.106     0.297
intercept   -25.1613      5.799     -4.339      0.000       -36.526   -13.796
==============================================================================

You get a great overview of the model's coefficients, how well those coefficients fit the data, the overall fit quality, and several other statistical measures.
The result object also lets you isolate and inspect parts of the model output; for example, the coefficients are in the params attribute:

coefficients = result.params
Exam1      0.206232
Exam2      0.201472
intercept -25.161334

As you can see, the model found the same coefficients as in the previous example.

The confidence intervals give you an idea of how robust the model's coefficients are:

result.conf_int()
                   0          1
Exam1       0.112152   0.300311
Exam2       0.106168   0.296775
intercept -36.526287 -13.796380

Note: this post is part of a series about Machine Learning with Python.

15 thoughts on “Logistic regression with Python statsmodels”

  1. Solomon

This is great. But I have an issue with my result: the coefficients failed to converge after 35 iterations. How can I increase the number of iterations? Also, I’m working with complex survey design data; how do I include the sampling unit and sampling weight in the model?
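    On the iteration question: Logit’s fit() method accepts a maxiter argument, and its default optimiser stops after 35 iterations, which matches the number above. A minimal sketch on made-up data (the survey-design part of the question is not covered here):

    ```python
    import numpy as np
    import statsmodels.api as sm

    # Invented data, just to demonstrate the maxiter argument.
    rng = np.random.default_rng(1)
    X = sm.add_constant(rng.normal(size=(100, 2)))
    y = (X[:, 1] + rng.normal(size=100) > 0).astype(int)

    # The default stops after 35 iterations; raise the cap explicitly.
    result = sm.Logit(y, X).fit(maxiter=200, disp=0)
    ```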

  2. devender kumar

    I am not getting an intercept in the model. Please help.

    import statsmodels.formula.api as sm
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    model = sm.Logit(endog=y_train,exog= X_train)
    result = model.fit()

                            0         1
    Delay_bin        0.992853  1.068759
    LIMIT_BAL_bin    0.282436  0.447070
    Avg_Use_bin      0.151494  0.353306
    Tot_percpaid_bin 0.300069  0.490454
    Edu             -0.278094  0.220439
    Age_bin          0.169336  0.732283
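    A likely cause, assuming a setup like the one above: Logit does not add an intercept automatically, so unless X_train already contains a constant column there will be no intercept term. sm.add_constant can prepend one; a sketch with made-up data standing in for the commenter’s X_train/y_train:

    ```python
    import numpy as np
    import statsmodels.api as sm

    # Made-up training data for illustration.
    rng = np.random.default_rng(2)
    X_train = rng.normal(size=(150, 2))
    y_train = (X_train[:, 0] - 0.5 + rng.normal(size=150) > 0).astype(int)

    # add_constant prepends a column of 1s, giving the model an intercept.
    X_train_const = sm.add_constant(X_train)
    result = sm.Logit(endog=y_train, exog=X_train_const).fit(disp=0)
    # result.params now includes a leading entry for the constant.
    ```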

  3. Pingback: Classification metrics and Naive Bayes – Look back in respect

    1. mashimo

      In this case the final cost is minimised after n iterations (the cost being, in short, the difference between the predictions and the actual labels).
      I think statsmodels internally uses the scipy.optimize.minimize() function to minimise the cost function, and that method is generic, so the verbose log just says “function value”.
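      Concretely, that function value is the average negative log-likelihood (log loss): the 0.203498 reported above times the 100 observations gives 20.350, the Log-Likelihood from the summary up to sign. A minimal sketch of that cost:

      ```python
      import numpy as np

      def log_loss(y, p):
          """Average negative log-likelihood of labels y under probabilities p."""
          p = np.clip(p, 1e-12, 1 - 1e-12)   # guard against log(0)
          return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

      # Toy labels and predicted probabilities, purely for illustration.
      y = np.array([1, 0, 1, 1])
      p = np.array([0.9, 0.2, 0.8, 0.6])
      cost = log_loss(y, p)
      ```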

  4. Pingback: Multi-class logistic regression – Look back in respect

  5. Pingback: Logistic regression using SKlearn – Look back in respect

    1. mashimo

      Each student has a final admission result (1 = yes, 0 = no).
      Basically, y is a logical variable with only two values.

  6. Pingback: An introduction to logistic regression – Look back in respect
