We have seen an introduction to logistic regression, with a simple example of how to predict a student's admission to university based on past exam results.
That was done in plain Python, using the sigmoid function and gradient descent.
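As a quick reminder, that manual approach relied on the sigmoid function to map a linear combination of the exam scores into a probability between 0 and 1. A minimal sketch:

```python
import numpy as np

def sigmoid(z):
    """Map any real value into the (0, 1) interval."""
    return 1.0 / (1.0 + np.exp(-z))

# The sigmoid squashes large values towards 1 and small values towards 0;
# at z = 0 it is exactly 0.5
print(sigmoid(0))    # → 0.5
```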
We can now see how to solve the same example using the statsmodels library, specifically its Logit class for logistic regression. The library contains an optimised and efficient algorithm to find the correct regression parameters.
You can follow along from the Python notebook on GitHub.
The initial part is exactly the same: read the training data and prepare the target variable.
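For completeness, that preparation might look like the following sketch. The column names and the inline data here are illustrative stand-ins for the exam dataset, not the original values:

```python
import pandas as pd

# In the post the data is read from a CSV of past exam results; a tiny
# inline stand-in with assumed column names:
data = pd.DataFrame({
    "Exam1": [34.6, 78.0, 60.2, 79.0],
    "Exam2": [78.0, 96.6, 86.3, 75.3],
    "Admitted": [0, 1, 1, 1],
})

X = data[["Exam1", "Exam2"]].copy()
X["intercept"] = 1.0   # statsmodels' Logit does not add a constant automatically
y = data["Admitted"]
```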
Then, we’re going to import and use the statsmodels Logit function:
```python
import statsmodels.api as sm  # Logit lives in statsmodels.api, not formula.api

model = sm.Logit(y, X)
result = model.fit()
```
```
Optimization terminated successfully.
         Current function value: 0.203498
         Iterations 9
```
Calling `result.summary()` prints a detailed report; an excerpt:

```
Dep. Variable:    Admitted        No. Observations:    100
Date:             Tue, 18 Jul 2017
Pseudo R-squ.:    0.6976
                 coef    std err        z      P>|z|    [95.0% Conf. Int.]
```
You get a great overview of the model: the coefficients, how reliable each of them is (standard errors and p-values), the overall fit quality, and several other statistical measures.
The result object also lets you isolate and inspect parts of the model output; for example, the coefficients are in the `params` field:
```python
coefficients = result.params
coefficients
```

```
Exam1         0.206232
Exam2         0.201472
intercept   -25.161334
```
As you can see, the model found the same coefficients as in the previous example.
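With these coefficients you can compute an admission probability by hand, exactly as in the gradient-descent version: plug the scores into the linear model and apply the sigmoid. The example scores (45 and 85) are hypothetical:

```python
import numpy as np

# Coefficients reported above
b_exam1, b_exam2, b_intercept = 0.206232, 0.201472, -25.161334

def admission_probability(exam1, exam2):
    """Linear combination of the scores, passed through the sigmoid."""
    z = b_intercept + b_exam1 * exam1 + b_exam2 * exam2
    return 1.0 / (1.0 + np.exp(-z))

# A hypothetical student scoring 45 and 85 on the two exams
print(round(admission_probability(45, 85), 3))    # → 0.776
```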
The confidence intervals, available via `result.conf_int()`, give you an idea of how robust the coefficients of the model are:
```
Exam1        0.112152    0.300311
Exam2        0.106168    0.296775
intercept  -36.526287  -13.796380
```
Note: this post is part of a series about Machine Learning with Python.