# Deviation measures (the standard deviation and the variance)

We have seen that measures of central tendency such as the mean summarise a set of data with a single typical value.
In the same way, deviation measures such as the variance describe how spread out the data is around that value.
Let’s take two sets as an example: A = [1, 3, 29] and B = [10, 11, 12], both with mean = 11.
These are the deviation measures:

• Deviation from the mean is the absolute difference between a given data point and the mean. $\text{deviation}(i) = \left| x_{i} - \mu \right|$

For set A, the deviations are 10, 8 and 18 respectively.

• Variance is the mean squared deviation, i.e. the sum of the squared deviations from the mean, divided by the number of data points $n$: $\sigma^{2} = \frac{1}{n}\sum_{i=1}^{n} (x_{i} - \mu)^{2}$

• The variance is hard to interpret because its unit is the square of the data's unit. The standard deviation was introduced for this reason: it is simply the square root of the variance, so it is expressed in the same unit as the data. $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_{i} - \mu)^{2}}$

For set A it is about 12.75; such a high standard deviation shows that the set is quite dispersed (in this case because of the value 29).
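
To make the comparison between the two example sets concrete, here is the arithmetic written out with the formulas above. The subscripts $A$ and $B$ are only labels added here to tell the two sets apart.

$$\sigma_{A}^{2} = \frac{(1-11)^{2} + (3-11)^{2} + (29-11)^{2}}{3} = \frac{100 + 64 + 324}{3} = \frac{488}{3} \approx 162.67, \qquad \sigma_{A} = \sqrt{\frac{488}{3}} \approx 12.75$$

$$\sigma_{B}^{2} = \frac{(10-11)^{2} + (11-11)^{2} + (12-11)^{2}}{3} = \frac{1 + 0 + 1}{3} = \frac{2}{3} \approx 0.67, \qquad \sigma_{B} = \sqrt{\frac{2}{3}} \approx 0.82$$

Both sets have the same mean, but the standard deviation of B is far smaller because its values all sit close to 11.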

Let’s see how to calculate the standard deviation in Python, given a list of values:

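```python
def stdDev(X):
    """
    X: a list of values
    returns: float, the standard deviation of the input
    """
    tot = 0.0
    meanX = mean(X)  # mean() is defined earlier
    for x in X:
        # accumulate the squared deviation of each value from the mean
        tot += (x - meanX) ** 2
    # divide by the number of values (variance), then take the square root
    return (tot / len(X)) ** 0.5
```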


The ** operator raises a number to a power, so ** 0.5 computes the square root.
The function mean() was defined previously.
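
As a quick check, the function can be run on the two example sets A and B. The sketch below is self-contained, so it includes a stand-in `mean()` (an assumption: a plain arithmetic mean, which may differ in detail from the one defined earlier) and compares the result with `statistics.pstdev` from the Python standard library, which also divides by the number of values.

```python
from statistics import pstdev


def mean(X):
    """Stand-in for the mean() defined earlier (assumed: arithmetic mean)."""
    return sum(X) / len(X)


def stdDev(X):
    """Standard deviation of a list of values, dividing by len(X)."""
    tot = 0.0
    meanX = mean(X)
    for x in X:
        tot += (x - meanX) ** 2
    return (tot / len(X)) ** 0.5


A = [1, 3, 29]
B = [10, 11, 12]

print(round(stdDev(A), 2))  # 12.75
print(round(stdDev(B), 2))  # 0.82

# Cross-check against the standard library's population standard deviation
print(abs(stdDev(A) - pstdev(A)) < 1e-9)  # True
```

Note that `statistics.stdev` (without the leading `p`) divides by n - 1 instead of n, so it would give a slightly larger value than the formula used here.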