# Automatic documentation with Python

A nice feature of Python is documentation strings (or docstrings), which provide a convenient way of associating documentation with Python modules, functions, classes, and methods.

An object’s docstring is defined by including a string constant as the first statement in the object’s definition.

For example:

    def mean(dataPoints):
        """
        This function calculates the arithmetic average of given data
        """

To read a function’s description, use its `__doc__` attribute:

    >>> print(stats.mean.__doc__)
    This function calculates the arithmetic average of given data
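
As a self-contained check of the lookup above, the sketch below defines `mean` locally instead of importing it from `stats`. The function body is an assumption, since the text shows only the signature and the docstring. It also uses `inspect.getdoc` from the standard library, which returns the same text with indentation and surrounding blank lines removed.

```python
import inspect


def mean(dataPoints):
    """
    This function calculates the arithmetic average of given data
    """
    # Body assumed for illustration; the text above shows only the docstring.
    return sum(dataPoints) / float(len(dataPoints))


# The raw attribute holds the string as stored, including the
# leading and trailing newlines from the triple-quoted literal.
raw_doc = mean.__doc__

# inspect.getdoc() strips the indentation and the surrounding blank lines.
clean_doc = inspect.getdoc(mean)

print(repr(raw_doc))
print(clean_doc)
```

For display purposes `inspect.getdoc` is usually the more convenient of the two, because the result does not depend on how the docstring was indented in the source.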

Do this for every function inside a module (here stats is the name of the file/module), and describe each function’s arguments and return value as well.

You can also add a docstring at the very beginning of the file, to explain what the module contains.

This and all function descriptions can then be viewed with the built-in help function: help(moduleName)

    >>> import stats
    >>> help(stats)
    Help on module stats:

    NAME
        stats - Basic statistics module for data analysis and inference

    DESCRIPTION
        This module provides functions for calculating statistics of data, including
        averages, variance, and standard deviation.

    FUNCTIONS
        mean(dataPoints, precision=3)
            the arithmetic average of given data

            Arguments:
                dataPoints: a list of data points, int or float
                precision (optional): number of digits after the decimal point, default=3

            Returns:
                float, the mean of the input
                or StatsError if dataPoints is empty.

        median(dataPoints)
            the median of given data

            Arguments:
                dataPoints: a list of data points, int or float

            Returns:
                the middle number in the sorted list, a float or an int
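
For reference, here is a minimal sketch of what a `stats.py` file producing help output like this could look like. The docstrings follow the help text; the function bodies and the definition of `StatsError` are not shown in the original and are assumptions made for illustration.

```python
"""
Basic statistics module for data analysis and inference

This module provides functions for calculating statistics of data, including
averages, variance, and standard deviation.
"""


class StatsError(Exception):
    """Raised when a statistic cannot be computed (hypothetical definition)."""


def mean(dataPoints, precision=3):
    """
    the arithmetic average of given data

    Arguments:
        dataPoints: a list of data points, int or float
        precision (optional): number of digits after the decimal point, default=3

    Returns:
        float, the mean of the input
        or StatsError if dataPoints is empty.
    """
    if not dataPoints:
        raise StatsError("no data points passed")
    # float() keeps the division exact under Python 2 as well.
    return round(sum(dataPoints) / float(len(dataPoints)), precision)


def median(dataPoints):
    """
    the median of given data

    Arguments:
        dataPoints: a list of data points, int or float

    Returns:
        the middle number in the sorted list, a float or an int
    """
    if not dataPoints:
        raise StatsError("no data points passed")
    sortedPoints = sorted(dataPoints)
    mid = len(sortedPoints) // 2
    if len(sortedPoints) % 2 == 1:
        # Odd number of points: the middle element is the median.
        return sortedPoints[mid]
    # Even number of points: average the two middle values.
    return (sortedPoints[mid - 1] + sortedPoints[mid]) / 2.0
```

The string at the top of the file becomes the module docstring: its first line is what help shows under NAME, and the rest appears under DESCRIPTION.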

# Deviation measures (the standard deviation and the variance)

We have seen that measures of central tendency, like the mean, describe the typical value of a set of data.
In the same way, the variance describes the spread of a set of data.
Let’s take these two sets as an example: A = [1, 3, 29] and B = [10, 11, 12], both with mean = 11.
These are the deviation measures:

• The deviation from the mean is the absolute difference between a given data point and the mean: $\text{deviation}(i) = \left| x_{i} - \mu \right|$

For the set A, they are respectively 10, 8 and 18.

• The variance is the mean squared deviation, i.e. the sum of the squared deviations from the mean, divided by the number of data points: $\sigma^{2} = \frac{1}{n} \sum_{i=1}^{n} (x_{i} - \mu)^{2}$

• The variance is hard to interpret because its unit is the square of the data’s unit. For this reason the standard deviation has been introduced, which is simply the square root of the variance: $\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_{i} - \mu)^{2}}$

For set A the standard deviation is about 12.75; this high value shows that the set is quite dispersed (in this case due to the number 29).
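
As a check on these numbers, the arithmetic for both example sets can be written out in full, using the formulas above (dividing by $n = 3$). The values for set B are not given in the text and are computed here only for comparison:

$$
\begin{aligned}
\sigma_{A}^{2} &= \frac{(1-11)^{2} + (3-11)^{2} + (29-11)^{2}}{3} = \frac{100 + 64 + 324}{3} = \frac{488}{3} \approx 162.67, & \sigma_{A} &= \sqrt{\tfrac{488}{3}} \approx 12.75 \\
\sigma_{B}^{2} &= \frac{(10-11)^{2} + (11-11)^{2} + (12-11)^{2}}{3} = \frac{1 + 0 + 1}{3} = \frac{2}{3} \approx 0.67, & \sigma_{B} &= \sqrt{\tfrac{2}{3}} \approx 0.82
\end{aligned}
$$

Both sets have the same mean, yet the standard deviation of A is more than fifteen times that of B, which is what makes A the more dispersed set.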

Let’s see how to calculate the standard deviation in Python, given a list of values: