Fairness can perpetuate discrimination

In the last century, insurance companies did not have sophisticated algorithms like today's. The original idea of risk spreading and the principle of solidarity were based on the notion that sharing risk bound people together, encouraging a spirit of mutual aid and interdependence.

This started to change around the 1970s, when that vision gave way to so-called actuarial fairness: living in a poor or minority-dense neighbourhood would cost you more in insurance, denied loans and so on. Soon the insurers were accused of justifying discrimination, to which they replied that they were just doing their job, a purely technical job that involved no moral judgments. Effects on society were simply not their problem or business.

Sounds familiar? It is the same argument made by social network platforms today: they are technical platforms running algorithms and are not in the business of judging the content.

Civil rights activists lost their battles with the insurance industry because they insisted on arguing about the accuracy of certain statistics or the validity of certain classifications, rather than questioning whether actuarial fairness was a valid framework in the first place.

There are several obvious problems with this framework. If you believe that risk scores accurately predict the future outcomes of a certain group of people, then you are accepting as “fair” that a person is more likely to spend time in jail simply because they are black.

The other problem is that there are fewer arrests in rich neighbourhoods, not because residents commit fewer crimes but because there is less policing. One is more likely to be rearrested if one lives in an over-policed neighbourhood, and that creates a feedback loop: more arrests mean higher recidivism rates.

Over-policing and predictive policing may be “accurate” in the short term, but the long-term effects on communities have been shown to be negative, creating self-fulfilling prophecies.

Like the insurers, large tech firms and the computer science community also tend to frame “fairness” in a de-politicised way involving only mathematics and code.

The problem is that fairness cannot be reduced to a simple, self-contained mathematical definition: fairness is dynamic and social, not a statistical issue. It can never be fully achieved and must be constantly audited, adapted and debated in a democracy. By relying merely on historical data and current definitions of fairness, we lock in the accumulated unfairness of the past, and our algorithms and the products they support will always trail behind, reflecting past norms rather than future ideals and slowing social progress rather than supporting it.

This is a summary of an inspiring idea from Joi Ito, which you can read in full in his article on Wired.

MonteCarlo and Pi

This is a post to introduce a couple of probability concepts that are useful for machine learning. To make it more interesting, I am mixing in some MonteCarlo simulation ideas too!

To see examples in Python, we first need to introduce the concept of random numbers.

Stochastic vs deterministic numbers

The English word stochastic is an adjective describing something that was randomly determined. It originally came from Greek στόχος (stokhos), meaning ‘aim, guess’.

On the other hand, deterministic means that the outcome – given the same input – will always be the same. There is no unpredictability.
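
A tiny Python sketch (mine, not from the original post) makes the distinction concrete: the random module is only pseudo-random, so seeding it with the same value – giving it the same input – reproduces the same sequence.

```python
import random

# Same seed (same "input") produces the same sequence: deterministic.
random.seed(42)
first = [random.randint(1, 6) for _ in range(5)]

random.seed(42)
second = [random.randint(1, 6) for _ in range(5)]

print(first == second)  # True: the "random" outcome is reproducible
```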

Random number

Random number generation is a big area in itself; for our scope it is enough to say that randomness is the lack of pattern or predictability in events. A random sequence of events therefore has no order and does not follow an intelligible pattern.

Individual random events are by definition unpredictable, but in many cases the frequency of different outcomes over a large number of events is predictable.
And this is what interests us here: if I throw a six-faced die thousands of times, what percentage of the throws should I expect to show the face with six?
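
As a small preview (a sketch of mine, not the notebook code), we can simulate exactly that and watch the frequency approach the theoretical 1/6, about 16.7%:

```python
import random

# Throw a fair six-sided die many times and count the sixes;
# the observed frequency should approach 1/6.
throws = 100_000
sixes = sum(1 for _ in range(throws) if random.randint(1, 6) == 6)
print(f"Face six: {100 * sixes / throws:.2f}% of {throws} throws")
```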

Let’s see more practical examples with Python. As usual, they are also available in a notebook. Continue reading “MonteCarlo and Pi”

DevOps

We live and work in exciting times, when technology can play a key role in delivering real value and competitive advantage to any business. Technology now provides opportunities to deliver features to customers in new ways and at speeds not possible before. But organisations based on old IT and waterfall methods often find themselves struggling to keep up.

DevOps

DevOps is a new term that emerged from two related major trends.

The first trend – also called “agile infrastructure” or “agile operations” – originated from applying Agile and Lean principles to operations work.

The second trend came from a better understanding of the value of collaboration between development and operations teams and how important operations has become in our increasingly service-oriented world.

Continue reading “DevOps”

Agile retrospectives: a simple framework

Retrospectives are an important Agile ceremony (and in fact they are part of many other disciplines; just think of project post-mortem analyses) where the team gets together to look back on past events and situations, with the goal of achieving continuous improvement.

But they are also one of the most complex ceremonies to set up and facilitate, and many teams tend to skip them.

One of the most useful frameworks for a successful retrospective is the one described by Esther Derby and Diana Larsen in their book Agile Retrospectives.

Their focus is on short retrospectives, the ones occurring after a sprint, so typically after one to four weeks of work. While continuous builds, automated unit tests and frequent demos are all ways to focus attention on the product, retrospectives focus attention on how the team works and interacts.

The framework is organised into five stages:

  • Set the Stage
  • Gather Data
  • Generate Insights
  • Decide What to Do
  • Close the Retrospective

Continue reading “Agile retrospectives: a simple framework”

Multi-class logistic regression

We have seen several examples of binary logistic regression, where the outcomes we wanted to predict had two classes, such as a model predicting whether a student will be admitted to university (yes or no) based on previous exam results, or whether a random Titanic passenger will survive or not.

Binary classifications such as these are very common, but you can also encounter classification problems where the outcome has more than two classes: for example, whether tomorrow’s weather will be sunny, cloudy or rainy, or whether an incoming email should be tagged as work, family, friends or hobby.

We will now see a couple of approaches to handle such classification problems, with a practical example: classifying a sky object based on a set of observed variables.
The data comes from the Sloan Digital Sky Survey (Release 14).
In this example, the sky object to be classified can be one of three classes: Star, Galaxy or Quasar.
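
To give a flavour of one such approach, here is a minimal one-vs-rest sketch with scikit-learn. The random placeholder features and labels below are assumptions standing in for the actual SDSS variables used in the post:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the SDSS features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = rng.choice(["STAR", "GALAXY", "QSO"], size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One-vs-rest: one binary logistic regression per class, each separating
# that class from the rest; prediction picks the most confident classifier.
model = LogisticRegression(multi_class="ovr", max_iter=1000)
model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```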

The code is also available in a notebook on GitHub, and so is the data. Continue reading “Multi-class logistic regression”

Introduction to time series – Part II: an example

Exploring a milk production Time Series

Time series models are used in a wide range of applications, particularly for forecasting, which is the goal of this example, performed in four steps:

– Explore the characteristics of the time series data.
– Decompose the time series into trend, seasonal components, and remainder components.
– Apply time series models.
– Forecast the production for a 12-month period.

Part I of this mini-series explored the basic terms of time series analysis.

Load and clean the data

The dataset is the production amount of several dairy products in California, month by month, for 18 years.
Our goal: forecast the next year’s production for one of those products: milk.
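
As a rough sketch of the loading and decomposition steps, assuming a CSV file; the file name, column name and start date here are illustrative guesses, not necessarily the ones used in the notebook:

```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Hypothetical file and column names; adjust to the actual dataset.
df = pd.read_csv("cadairydata.csv")
milk = pd.Series(df["Milk.Prod"].values,
                 index=pd.date_range("1995-01", periods=len(df), freq="MS"))

# Decompose the monthly series into trend, seasonal and remainder parts.
decomposition = seasonal_decompose(milk, model="additive")
decomposition.plot()
plt.show()
```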

You can follow along with the associated notebook on GitHub. Continue reading “Introduction to time series – Part II: an example”