This is a post to introduce a couple of probability concepts that are useful for machine learning. To make it more interesting, I am mixing in it some MonteCarlo simulation ideas too!

To see examples in Python we need first to introduce the concept of random numbers.

# Stochastic vs deterministic numbers

The English word stochastic is an adjective describing something that was randomly determined. It originally came from Greek στόχος (stokhos), meaning ‘aim, guess’.

On the other side, deterministic means that the outcome – given the same input – will always be the same. There is no unpredictability.

## Random number

Randomly generated is a big area by itself, for our scope is enough to say that randomness is the lack of pattern or predictability in events. A random sequence of events therefore has no order and does not follow an intelligible combination.

Individual random events are by definition unpredictable, but in many cases the frequency of different outcomes over a large number of events is predictable.

And this is what is interesting for us: if I throw a die with six faces thousands of times, how many times in percent shall I expect to see the face number six?

Let’s see some practical examples with Python. As usual they are also available in a notebook.

We generate a (pseudo) random number in Python using the random library:

import random def genEven(): ''' Returns a random even number x, where 0 <= x < 100 ''' return random.randrange(0,100,2) genEven()

66

def stochasticNumber(): ''' Stochastically generates and returns a uniformly distributed even number between 9 and 21 ''' return random.randrange(10,21,2) stochasticNumber()

12

Again:

stochasticNumber()

14

Generally, in applications such as in security applications, hardware generators are generally preferred over software algorithm.

A pseudo-random algorithms, like the Python random library above, is called pseudo because is not really unpredictable: the sequence of random numbers generated depends on the initial seed: using the same number as seed will generate the same sequence.

This is very useful for debugging purpose but means also that one needs to be very careful when choosing the seed (for example, choosing atmospheric signals or other noises).

We can use this property to generate deterministic numbers. If the seed is fixed, the number generated “randomly” will be always the same.

def deterministicNumber(): ''' Deterministically generates and returns an even number between 9 and 21 ''' random.seed(0) # Fixed seed, always the same. return random.randrange(10,21,2) deterministicNumber()

16

And again:

deterministicNumber()

16

The same number !!

Before looking at what is the probability of an event, we define a function that simulates the roll of a six-faced die:

def rollDie(): """returns a random int between 1 and 6""" return random.randint(1,6) # an alternative: return random.choice([1,2,3,4,5,6]) rollDie()

4

# Discrete Probability

If E represents an event, then P(E) is the probability that E will occur.

A probability is a frequency expressed as a fraction of the sample size:

P(E) = number of outcomes favourable to E / total number of outcomes

It is a real value between 0 and 1.

Example: a die has six faces: the space of the possibilities is six.

If I want to calculate the probability of getting the event 5 when rolling the die one time, I need to consider that there is one face with the number *five* on it therefore only one possible outcome favourable:

P(5) = 1 / 6

This is the frequentist definition.

Probability can be seen as the *long-run relative frequency* with which an event occurred given many repeated trials.

Empirically, let’s say I throw this die 1000 times and the face number five comes 125 times.The frequency observed for face six is 125/1000 = 0.125

An alternative is Bayesianism which defines probability as a degree of belief that an event will occur. It depends on the own state of knowledge, therefore is more subjective than frequency probability.

By the way, the probability that an event does NOT occur is:

P(not A) = 1 – P(A)

Probability NOT to get a five in a die roll is therefore 1 – 1/6 = 5/6

# Monte Carlo simulations

One of the most powerful techniques in any data scientist’s tool belt is the Monte Carlo Simulation. It’s super flexible and extremely powerful since it can be applied to almost any situation if the problem can be stated probabilistically.

Monte Carlo methods are a class of methods that can be applied to computationally ‘difficult’ problems to arrive at near-enough accurate answers. The general premise is remarkably simple:

- Randomly sample input(s) to the problem
- For each sample, compute an output
- Aggregate the outputs to approximate the solution

The name refers to a famous casino in Monaco. It was coined in 1949 by one of the method’s pioneers, Stanislaw Ulam. Ulam’s uncle was reportedly a gambling man, and the connection between the element of ‘chance’ in gambling and in Monte Carlo methods must have been particularly apparent to Stanislaw.

## A Montecarlo simulation of a die roll

We run now a simulation of a die roll and see what are the frequencies for each face of the die:

def simMCdie(numTrials): print ("Running ", numTrials) counters = [0] * 6 # initialize the counters for each face for i in range(numTrials): roll = rollDie() counters[roll-1] += 1 return counters results = simMCdie(10000)

Running 10000 [1657, 1690, 1697, 1655, 1632, 1669]

import matplotlib.pyplot as plt plt.bar(['1','2','3','4','5','6'], results);

# Independent events

A and B are independent if knowing whether A occurred gives no information about whether B occurred.

Let’s define a function to model *n* rolls of a die, whereas each roll should be independent from the others.

def rollNdice(n): result = '' for i in range(n): result = result + str(rollDie()) return result rollNdice(5)

'12344'

These are independent events.

Now an interesting question would be:

What is the probability that both two independent events A and B occur?

**P(A and B) = P(A) * P(B)**

For example, the probability to get TWO consecutive six in a die roll is therefore:

1/6 * 1/6 = 1 / (6^2) = 1 / 36

Which is quite low.

This applies also for more than two independent events.

In general, the probabilities to occur *n* independent events is:

For a six-sided die, there are 6^5 possible sequences of length five.

The probability of getting five consecutive six is 1 / 6^5, pretty low number. 1 out of 7776 possibilities.

Let’s look at a simulation to check this.

def getTarget(goal): # Roll dice until we get the goal # goal: a string of N die results, for example 5 six: "66666" numRolls = len(goal) numTries = 0 while True: numTries += 1 result = rollNdice(numRolls) # if success then exit if result == goal: return numTries def runDiceMC(goal, numTrials): print ("Running ... trails: ", numTrials) total = 0 for i in range(numTrials): total += getTarget(goal) print ('Average number of tries =', total/float(numTrials)) runDiceMC('66666', 100)

Running ... trails: 100 Average number of tries = 6596.93

Remember that the theory says it will take on average 7776 tries.

## Pascal’s problem

A friend asked Pascal:

*would it be profitable, given 24 rolls of a pair of dice, to bet against their being at least one double six?*

In 17th century this was a hard problem.

Now we know it’s :

P(A=6 and B=6) = 1/6 * 1/6 = 1/36 (two independent events)

P(not double six) = 1 – 1/36 = 35/36

P(no double six in 24 rolls) = (35/36)^24

(35.0 / 36.0)**24

0.5085961238690966

it’s very close!

Again, we can run a simulation to check it:

def checkPascalMC(numTrials, roll, numRolls = 24, goal = 6): numSuccess = 0.0 for i in range(numTrials): for j in range(numRolls): die1 = roll() die2 = roll() if die1 == goal and die2 == goal: numSuccess += 1 break print ('Probability of losing =', 1.0 - numSuccess / numTrials) checkPascalMC(10000, rollDie)

Probability of losing = 0.5063

# Sampling table

The sampling table gives the number of possible samples of size *k* out of a population of size *n*, under various assumptions how the sample is collected.

One example:

one ball will be drawn at random from a box containing: 3 green balls, 5 red balls, and 7 yellow balls.

What is the probability that the ball will be green?

green = 3 red = 5 yellow = 7 balls = green+red+yellow pGreen = green / balls pGreen

0.2

The population has size *15* and has therefore 15 possible samples of size *1*; out of these 15 possible samples, only 3 of them will answer our question (ball is green).

We defined the variable pGreen as the probability of choosing a green ball from the box. What is the probability that the ball you draw from the box will NOT be green?

1 - pGreen

0.8

## Sampling without replacement – generalised

Instead of taking just one draw, consider taking two draws. You take the second draw without returning the first draw to the box. We call this **sampling without replacement**.

What is the probability that the first draw is green and that the second draw is not green?

# probability of choosing a green ball from the box on the first draw. pGreen1 = green / balls # probability of NOT choosing a green ball on the second draw without replacement. pGreen2 = (red + yellow) / (green -1 + red + yellow) # Calculate the probability that the first draw is green and the second draw is not green. pGreen1 * pGreen2

0.17142857142857143

## Sampling with replacement – generalised

Now repeat the experiment, but this time, after taking the first draw and recording the color, return it back to the box and shake the box. We call this **sampling with replacement**.

What is the probability that the first draw is green and that the second draw is not green?

# probability of choosing a green ball from the box on the first draw. # same as above: pGreen1 # probability of NOT choosing a green ball on the second draw with replacement pGreen2r = (red + yellow) / balls # Calculate the probability that the first draw is cyan and the second draw is not cyan. pGreen1 * pGreen2r

0.16000000000000003

### Sampling with replacement – be careful

Say you’ve drawn 5 balls from a box that has 3 green balls, 5 red balls, and 7 yellow balls – with replacement – and all have been yellow.

What is the probability that the next one is yellow?

# probability that a yellow ball is drawn from the box. pYellow = yellow / balls # probability of drawing a yellow ball on the sixth draw. pYellow

0.4666666666666667

Yes, doesn’t matter if all previous five draws were ALL yellow balls, the probability that the sixth ball is yellow **is the same** as for the first draw and all other draws. With replacement the population is always the same.

## A football match

Two teams, say the Manchester United (M.Utd.) and the AC Milan, are playing a seven game series. The Milan are a better team and have a 60% chance of winning each game.

What is the probability that the M.Utd. wins at least one game? Remember that they must win one of the first four games, or the series will be over!

Let´s assume they are independent events (in reality losing one match may impact the team’s morale for the next match):

p_milan_wins = 0.6 # probability that the Milan team will win the first four games of the series: p_milan_win4 = p_milan_wins**4 # probability that the M.Utd. wins at least one game in the first four games of the series. 1 - p_milan_win4

0.8704000000000001

Here is the Monte Carlo simulation to confirm our answer to M.Utd. winning a game:

import numpy as np def RealWinsOneMC(numTrials, nGames=4): numSuccess = 0 for i in range(numTrials): simulatedGames = np.random.choice(["lose","win"], size=nGames, replace=True, p=[0.6,0.4]) if 'win' in simulatedGames: numSuccess += 1 return numSuccess / numTrials print (RealWinsOneMC(1000))

0.876

## Winning a game – with MonteCarlo

The two teams are playing a seven game championship series. The first to win four games wins the series. The teams are equally good, so they each have a 50-50 chance of winning each game.

If Milan lose the first game, what is the probability that they win the series?

# Create a list that contains all possible outcomes for the remaining games. possibilities = [(g1,g2,g3,g4,g5,g6) for g1 in range(2) for g2 in range(2) for g3 in range(2) for g4 in range(2) for g5 in range(2) for g6 in range(2)] # Create a list that indicates whether each row in 'possibilities' # contains enough wins for the Cavs to win the series. sums = [sum(tup) for tup in possibilities] results = [s >= 4 for s in sums] # Calculate the proportion of 'results' in which the Cavs win the series. np.mean(results)

0.34375

Confirm the results of the previous question with a Monte Carlo simulation to estimate the probability of Milan winning the series.

def MilanWinsSeriesMC(numTrials, nGames=6): numSuccess = 0 for i in range(numTrials): simulatedGames = np.random.choice([0,1], size=nGames, replace=True) if sum(simulatedGames) >= 4: numSuccess += 1 return numSuccess / numTrials MilanWinsSeriesMC(100)

0.3

That’s all for now.

Finally – to show how useful probability and MonteCarlo can be for many real problems – we can see how to estimate the value of the infamous PI.

# Estimate the value of pi

Pi (or π) is a mathematical constant. It is perhaps most famous for its appearance in the formula for the area of a circle:

A = πr²

π is an example of an **irrational** number. Its exact value is impossible to represent as any fraction of two integers.

In fact, π is also an example of a **transcendental** number — there aren’t even any polynomial equations to which it is a solution.

Nonetheless obtaining an accurate value for π is not that difficult.

You can find a pretty good estimate of π using a Monte Carlo-inspired method:

Draw a 2m×2m square on a wall. Inside, draw a circle target of radius 1m.

Now, take a few steps back and throw darts randomly at the wall. Count each time the dart lands in the circle.

After a hundred throws, multiply what fraction of throws landed in the circle by the area of the square. There is your estimate for π.

The circle’s area is about 78% that of the square, so about that percentage of darts lands inside the circle.

The reason this works is very intuitive.When sampling random points from a square containing a circle, the probability of selecting points from within the circle is proportional to the circle’s area. With enough random samples, we can find a reliable estimate for this proportion, p.

Now, we know the area of the square is 2×2 = 4m², and we know the area of the circle is π×r². Since the radius r equals 1, the area of the circle is just π.

As we know the area of the square and have an estimate for the proportion of its area covered by the circle, we can estimate π. Simply multiply p×4.

The more random samples we throw, the better the estimate *p* will be. However, the gain in accuracy diminishes as we take more and more samples.

def throwNeedlesInCircle(numNeedles): ''' Throw randomly needles in a 2x2 square (area=4) that has a circle inside of radius 1 (area = PI) Count how many of those needles at the end landed inside the circle. Return this estimated proportion: Circle Area / Square Area ''' inCircle = 0 # number of needles inside the circle for needle in range(1, numNeedles + 1): x = random.random() y = random.random() if (x*x + y*y)**0.5 <= 1.0: inCircle += 1 return (inCircle/float(numNeedles)) piEstimation = throwNeedlesInCircle(100) * 4 piEstimation

3.16

More needles you throw more precise will be the PI value.

def getPiEstimate(numTrials, numNeedles): print(("{t} trials, each has {n} Needles.")\ .format(t= numTrials, n=numNeedles)) estimates = [] for t in range(numTrials): piGuess = 4*throwNeedlesInCircle(numNeedles) estimates.append(piGuess) stdDev = np.std(estimates) curEst = sum(estimates)/len(estimates) print ('PI Estimation = ' + str(curEst)) print ('Std. dev. = ' + str(round(stdDev, 5))) return (curEst, stdDev) getPiEstimate(20, 100);

20 trials, each has 100 Needles. PI Estimation = 3.106 Std. dev. = 0.18858

We can do better and go on – increasing the number of needles at each trial – until we get the wished precision.

def estimatePi(precision, numTrials, numNeedles = 1000): sDev = precision while sDev >= precision/2.0: curEst, sDev = getPiEstimate(numTrials, numNeedles) numNeedles *= 2 print("---") return curEst estimatePi(0.005, 100)

100 trials, each has 1000 Needles. PI Estimation = 3.13456 Std. dev. = 0.05118 --- 100 trials, each has 2000 Needles. PI Estimation = 3.1476800000000016 Std. dev. = 0.0388 --- 100 trials, each has 4000 Needles. PI Estimation = 3.1422299999999996 Std. dev. = 0.02884 --- 100 trials, each has 8000 Needles. PI Estimation = 3.140664999999999 Std. dev. = 0.01864 --- 100 trials, each has 16000 Needles. PI Estimation = 3.141509999999999 Std. dev. = 0.01187 --- 100 trials, each has 32000 Needles. PI Estimation = 3.14153125 Std. dev. = 0.00842 --- 100 trials, each has 64000 Needles. PI Estimation = 3.1414699999999995 Std. dev. = 0.00589 --- 100 trials, each has 128000 Needles. PI Estimation = 3.1423515625000005 Std. dev. = 0.00448 --- 100 trials, each has 256000 Needles. PI Estimation = 3.142143593750002 Std. dev. = 0.0034 --- 100 trials, each has 512000 Needles. PI Estimation = 3.1416040624999995 Std. dev. = 0.00247 --- 3.1416040624999995

Monte Carlo techniques are often invoked when we need to do repeated sampling to determine results. Often that means that the problem cannot be simply solved mathematically. For instance, when the effect of one step in the calculation changes the possibilities in the next steps.

In a probabilistic world, we use random variables to represent stochastic phenomena. Once we know the random variables, we can use the Monte Carlo method.

The Monte Carlo method is a fine way to find the variations of the process. In other words, the risk of the process. The risk of a supply chain to be understock or overstock. The risk of the costs of the project to be above budget. The risk of the work to exceed timeline. And so on.