Moneyball: a simple Regression example

Moneyball book cover, from Wikimedia

The book (and later a movie) Moneyball by Michael Lewis tells the story of how the USA baseball team Oakland Athletics in 2002 leveraged the power of data instead of relying on experts.
Better data and better analysis of the data lead to find and use market inefficiencies.

The team was one of the poorest in a period when only rich teams could afford the all-star players (the imbalance in total salaries being something like 4 to 1).
A new ownership in 1995 was improving the team’s wins but in 2001 the loss of 3 key players and budget cuts were bringing a new idea: take a quantitative approach and find undervalued players.

The traditional way to select players was through scouting but Oakland and his general manager Billy Bean (Brad Pitt in the movie…) selected the players based on their statistics without any prejudice. Specifically, his assistant – the Harvard graduate Paul DePodesta looked at the data to find which ones were the undervalued skills.

A huge repository for the USA baseball statistics (called Sabermetrics) is the Lahman’s Baseball Database. This database contains complete batting and pitching statistics from 1871 to present plus fielding statistics, standings, team stats, managerial records, post-season data and more.

Continue reading “Moneyball: a simple Regression example”