Moneyball again: a multiple linear regression example

We have seen a linear regression example as told in the book / movie Moneyball: how a baseball team was able to leverage the power of data analysis to compete with richer teams.
We used the Gradient Descent algorithm to see how the number of wins in a season linearly depends on the number of runs scored and allowed.

We can extend that example, with the help of one of the most used Python library : sklearn; short for Scientific Kit LEARN, a Machine Learning dedicated tool, designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

The function LinearRegression fits a linear model to minimise the residual sum of squares between the observed responses in the dataset, and the responses predicted by the linear approximation. The method used is the Ordinary Least Squares.

moneyball is the dataset containing the necessary data, refer to the previous post for details.

Here is an extract of the Jupyter notebook, that is available on Github. Continue reading “Moneyball again: a multiple linear regression example”