Automatic documentation with Python

A nice feature of Python is documentation strings (docstrings), which provide a convenient way of associating documentation with Python modules, functions, classes, and methods.

An object’s docstring is defined by including a string constant as the first statement in the object’s definition.

For example:

def mean(dataPoints):
    """This function calculates the arithmetic average of given data"""

To get a function’s description, read its __doc__ attribute:

>>> print(stats.mean.__doc__)
This function calculates the arithmetic average of given data

Do this for every function inside a module (here stats is the name of the file/module), and describe each function’s arguments and return value as well.

You can also add a docstring at the very beginning of the file, as its first statement, to explain what the module contains.
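As an illustration, a stats.py file documented this way might begin as follows. This is only a sketch: the module summary and the mean signature are taken from the help output shown in this post, while the function body, the "Arguments:" / "Returns:" layout and the definition of StatsError are assumptions.

```python
"""Basic statistics module for data analysis and inference

This module provides functions for calculating statistics of data, including
averages, variance, and standard deviation.
"""


class StatsError(ValueError):
    """Raised when a statistic cannot be computed (assumed definition)."""


def mean(dataPoints, precision=3):
    """
    the arithmetic average of given data

    Arguments:
        dataPoints: a list of data points, int or float
        precision (optional): digits precision after the comma, default=3

    Returns:
        float, the mean of the input
        or StatsError if dataPoints is empty.
    """
    if not dataPoints:
        raise StatsError("no data points passed")
    # Assumed implementation, not the author's original body
    return round(sum(dataPoints) / len(dataPoints), precision)


# The docstring travels with the function object
print(mean.__doc__)
```

The string at the top of the file becomes the module’s own docstring, and each function carries its description in the same way.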

The module description and all function descriptions can then be viewed with the help command, help(moduleName):

>>> import stats
>>> help(stats)
Help on module stats:

NAME
    stats - Basic statistics module for data analysis and inference

DESCRIPTION
    This module provides functions for calculating statistics of data, including
    averages, variance, and standard deviation.

FUNCTIONS
    mean(dataPoints, precision=3)
        the arithmetic average of given data

        Arguments:
            dataPoints: a list of data points, int or float
            precision (optional): digits precision after the comma, default=3

        Returns:
            float, the mean of the input
            or StatsError if dataPoints is empty.

    median(dataPoints)
        the median of given data

        Arguments:
            dataPoints: a list of data points, int or float

        Returns:
            the middle number in the sorted list, a float or an int
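
The same docstrings can also be read programmatically. The sketch below collects the first docstring line of every public function in a module with the inspect module. Because the stats file itself is not shown here, it uses the standard library’s statistics module as a stand-in, and the helper name summarize is purely illustrative.

```python
import inspect
import statistics  # standard-library stand-in for the post's own stats module


def summarize(module):
    """Return {function name: first line of its docstring} for a module."""
    summary = {}
    for name, func in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        doc = inspect.getdoc(func) or "(no docstring)"
        summary[name] = doc.splitlines()[0]
    return summary


for name, line in sorted(summarize(statistics).items()):
    print(f"{name}: {line}")
```

Replacing statistics with your own imported module gives a quick check of which functions still lack a description.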

deviation measures (the standard deviation and the variance)

We have seen that measures of central tendency such as the mean describe the typical value of a data set.
In the same way, the variance describes how spread out the data are.
Take two sets as an example: A = [1, 3, 29] and B = [10, 11, 12], both with mean 11.
These are the deviation measures:

  • Deviation from the mean is the absolute difference between a given data point and the mean.

    deviation(i) = \left | x_{i} - \mu \right |

    For the set A, they are respectively 10, 8 and 18.

  • Variance is the mean squared deviation, i.e. the sum of the squared deviations from the mean, divided by the number of data points:

    \sigma^{2} = \frac{1}{n} \sum_{i=1}^{n} (x_{i} - \mu)^{2}

  • The variance is hard to interpret because its unit is the square of the data’s unit. The standard deviation, which is simply the square root of the variance, was introduced for that reason.

    \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_{i} - \mu)^{2}}

    For set A it would be 12.75; the high standard deviation shows that the set is quite dispersed (in this case due to the number 29).
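
Worked through for both example sets, using the population formulas above (division by n), the numbers come out as follows:

```latex
\begin{aligned}
\mu_A &= \mu_B = 11 \\
\sigma_A^{2} &= \frac{(1-11)^{2} + (3-11)^{2} + (29-11)^{2}}{3}
              = \frac{100 + 64 + 324}{3} = \frac{488}{3} \approx 162.67 \\
\sigma_A &= \sqrt{\tfrac{488}{3}} \approx 12.75 \\
\sigma_B^{2} &= \frac{(10-11)^{2} + (11-11)^{2} + (12-11)^{2}}{3}
              = \frac{2}{3} \approx 0.67 \\
\sigma_B &= \sqrt{\tfrac{2}{3}} \approx 0.82
\end{aligned}
```

Both sets share the same mean, yet the standard deviation of A is roughly fifteen times that of B.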

Let’s see how to calculate the standard deviation in Python, given a list of values:
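
A minimal sketch of that calculation, following the population formulas above (division by n). The function names variance and std_dev are illustrative choices rather than the author’s own, and the result is cross-checked against statistics.pstdev from the standard library.

```python
import math
import statistics


def variance(data):
    """Population variance: mean of the squared deviations from the mean."""
    if not data:
        raise ValueError("variance requires at least one data point")
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


def std_dev(data):
    """Population standard deviation: square root of the variance."""
    return math.sqrt(variance(data))


A = [1, 3, 29]
B = [10, 11, 12]

print(round(std_dev(A), 2))  # 12.75
print(round(std_dev(B), 2))  # 0.82

# Cross-check against the standard library's population standard deviation
assert math.isclose(std_dev(A), statistics.pstdev(A))
```

For a sample rather than a whole population, the divisor would be n - 1 instead of n, which is what statistics.stdev computes.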
