# Plotting in Python – introduction

It is often said that a picture is worth a thousand words. This is especially true when it comes to data: charts make the properties of the data immediately apparent.

Let’s start with a simple chart. A bar graph is a chart with rectangular bars whose lengths are proportional to the values they represent. Bar graphs are often used to compare data that falls into categories (histograms – sometimes confused with bar graphs – are used for continuous data).

As an example, I will use the World Heritage sites, specifically the list of the 10 countries with the highest number of sites.
The list is short (just 10 items) and can be represented in Python by a dictionary, also known as an associative array (basically a collection of unique key + value pairs):

whcTop10 = {'Italy': 50, 'China': 47, 'Spain': 44, 'Germany': 39,
            'France': 39, 'Mexico': 32, 'India': 32, 'UK': 28,
            'Russia': 26, 'USA': 22}

As you can see, the key is a string (the name of the country) and the value is the number of sites in that country.
A Python dictionary is not sorted by value (and before Python 3.7 it did not even guarantee to keep the insertion order), so the items (countries on the x axis and sites on the y axis) first need to be sorted to get a better chart.
Note: the collections module of the standard library provides an OrderedDict data structure that could be used too, but it would be overkill for this simple example.

# the number of sites for the y axis, from highest to lowest
nSites = sorted(whcTop10.values(), reverse=True)
# the list of all top 10 lands, sorted by number of sites
lands = sorted(whcTop10.keys(), key=whcTop10.__getitem__, reverse=True)
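
The two calls above sort the values and the keys separately, which works because both end up in the same order. As a minimal alternative sketch, the pairs can also be sorted once and then split, so that each country stays attached to its count (the name `ranked` is introduced here only for illustration):

```python
whcTop10 = {'Italy': 50, 'China': 47, 'Spain': 44, 'Germany': 39,
            'France': 39, 'Mexico': 32, 'India': 32, 'UK': 28,
            'Russia': 26, 'USA': 22}

# sort the (country, sites) pairs once, by number of sites, highest first
ranked = sorted(whcTop10.items(), key=lambda item: item[1], reverse=True)

# split the sorted pairs back into two parallel sequences
lands, nSites = zip(*ranked)

print(lands[:3])    # the three countries with the most sites
print(nSites[:3])   # their number of sites
```

Note that zip returns tuples rather than lists; wrap them in list() if you need lists.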

To plot the chart, I will use the pyplot module, part of the Matplotlib library (which is included in many Python distributions):

import matplotlib.pyplot as pypl   # pypl: you can use any other name...
landAsX = range(10)           # the x locations for the bars
# plot the bar chart
pypl.bar(landAsX, nSites)     # draw the data as a bar chart
pypl.show()                   # display it

This is the chart produced with just a few lines of code:

Note that the bars are displayed separately (unlike in a histogram).
That’s nice, but the chart looks quite bare … let’s add more detail: labels for the bars (showing which country each one is), some more space between the bars and the axes, and a grid:

""" plot the bar chart """
pypl.bar(landAsX, nSites)
# set the country names as labels on the x axis (the rotation parameter is optional)
pypl.xticks(landAsX, lands, rotation=20)
# add some space between bars and axes
pypl.xlim([min(landAsX) - 0.3, max(landAsX) + 1]) # x axis
pypl.ylim([0, max(nSites) + 3])                   # y axis starts at 0
# let's add a grid on y-axis
pypl.grid(True, axis='y')
pypl.show()

Note that the labels are slightly slanted (parameter “rotation”) so that they are easier to read, and that the y axis starts at 0. Starting at a value above zero would truncate the bars, so their lengths would no longer reflect the full values.
The “grid” API has many more parameters, such as line type, width and colour, that you can experiment with.
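
As a minimal sketch of those grid parameters, the following draws a thin, dotted, grey grid on the y axis and keeps it behind the bars. The specific style values are arbitrary choices for illustration, and the Agg backend line is an assumption made only so that the snippet runs without a display:

```python
import matplotlib
matplotlib.use('Agg')   # assumption: draw off-screen; drop this line to get a window
import matplotlib.pyplot as pypl

nSites = [50, 47, 44, 39, 39, 32, 32, 28, 26, 22]
landAsX = range(len(nSites))

pypl.bar(landAsX, nSites)
pypl.ylim([0, max(nSites) + 3])
# a dotted, thin, grey grid on the y axis only
pypl.grid(True, axis='y', linestyle=':', linewidth=0.8, color='grey')
# keep the grid lines behind the bars instead of on top of them
pypl.gca().set_axisbelow(True)
# call pypl.show() or pypl.savefig(...) here to see the result
```
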
Here is the new outcome:

Now it is nicer, but still not very descriptive. You can add a chart title and labels for the axes with the following APIs. You can also add arbitrary lines and text to the chart (for example to show a target or the average):

# the mean number of sites across the top 10
nSitesMean = sum(nSites) / len(nSites)
# add a red dashed line and a label for the mean (nSitesMean)
pypl.hlines(nSitesMean, -0.3, 10, color='red', linestyles='dashed',
            label=r'$\mu (top10)$')
pypl.legend()    # display the label for the mean line in a legend
# define the plot labels
pypl.title('Distribution of the WHC sites by land - Top 10',
           fontsize=18)  # chart title
pypl.ylabel('Number of WHC sites')            # y axis title
pypl.show()
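
Since the fragments so far build on each other, it may help to see them assembled into one script. This is only a sketch: it lets Matplotlib choose the x limits instead of setting them by hand, the end points of the mean line (-0.5 and the number of bars minus 0.5) are my own choice, and the Agg backend line is an assumption made so that it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')   # assumption: draw off-screen; drop this line to get a window
import matplotlib.pyplot as pypl

whcTop10 = {'Italy': 50, 'China': 47, 'Spain': 44, 'Germany': 39,
            'France': 39, 'Mexico': 32, 'India': 32, 'UK': 28,
            'Russia': 26, 'USA': 22}

# data for the two axes, sorted from highest to lowest
nSites = sorted(whcTop10.values(), reverse=True)
lands = sorted(whcTop10, key=whcTop10.get, reverse=True)
landAsX = range(len(lands))
nSitesMean = sum(nSites) / len(nSites)

# bars, tick labels, y limits and grid
pypl.bar(landAsX, nSites)
pypl.xticks(landAsX, lands, rotation=20)
pypl.ylim([0, max(nSites) + 3])
pypl.grid(True, axis='y')

# dashed red line for the mean, shown in the legend
pypl.hlines(nSitesMean, -0.5, len(lands) - 0.5, color='red',
            linestyles='dashed', label=r'$\mu (top10)$')
pypl.legend()

# title and axis label
pypl.title('Distribution of the WHC sites by land - Top 10', fontsize=18)
pypl.ylabel('Number of WHC sites')
# call pypl.show() or pypl.savefig(...) here to see the result
```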

And here is the final result:

There are many more APIs that can be used to make even fancier charts. They are outside the scope of this small tutorial, but to give a couple of examples, you can annotate parts of the chart or change the colour of one of the bars:

# keep the container returned by bar() so that single bars can be changed
bars = pypl.bar(landAsX, nSites)
# change the colour of a single bar (the very first) and annotate it
bars[0].set_color('green')
pypl.annotate('Country with largest number of WHC sites', xy=(0.5, 50),
              xytext=(2, 49), arrowprops=dict(facecolor='green',
              shrink=0.2), color='green')
pypl.text(7, 36, r'$\mu=35.9$', color='red')     # text for the mean
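
Indexing with `bars[0]` works because bar() returns a container holding one rectangle per bar. As a minimal sketch of the same idea, the following highlights every bar above the mean rather than only the first one. The colours are arbitrary choices, and the Agg backend line is an assumption made so that it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')   # assumption: draw off-screen; drop this line to get a window
import matplotlib.pyplot as pypl

nSites = [50, 47, 44, 39, 39, 32, 32, 28, 26, 22]
landAsX = range(len(nSites))
nSitesMean = sum(nSites) / len(nSites)

# bar() returns a container with one rectangle per bar
bars = pypl.bar(landAsX, nSites, color='lightgrey')

# recolour only the bars that are higher than the mean
for bar, n in zip(bars, nSites):
    if n > nSitesMean:
        bar.set_color('green')
# call pypl.show() or pypl.savefig(...) here to see the result
```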

Here is the chart with annotations, colours, … somewhat overloaded:

Or you can strip the chart down to the very minimum while keeping it readable:

# add the numbers to the top of each bar
for pos, n in zip(landAsX, nSites):
    pypl.annotate(str(n), xy=(pos + 0.3, n + 1.1))
# remove the borders
ax = pypl.gca()
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
# turn off all ticks
ax.yaxis.set_ticks_position('none')
ax.xaxis.set_ticks_position('none')
# grid white
pypl.grid(axis='y', color='white', linestyle='-')
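
Newer Matplotlib versions can place the numbers for you. The following is a minimal sketch, assuming Matplotlib 3.4 or later (where `bar_label` is available). The `padding` value is an arbitrary choice, `tick_params(length=0)` is used here as a shorter way to hide the tick marks, and the Agg backend line is an assumption made so that it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')   # assumption: draw off-screen; drop this line to get a window
import matplotlib.pyplot as pypl

nSites = [50, 47, 44, 39, 39, 32, 32, 28, 26, 22]
landAsX = range(len(nSites))

bars = pypl.bar(landAsX, nSites)
# write each bar's value just above it (requires Matplotlib 3.4+)
labels = pypl.bar_label(bars, padding=2)

# remove the borders and the tick marks, as in the stripped-down chart
ax = pypl.gca()
for side in ('top', 'right', 'left'):
    ax.spines[side].set_visible(False)
ax.tick_params(length=0)
# call pypl.show() or pypl.savefig(...) here to see the result
```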

The chart can be rotated by using barh instead of bar (or by adding a parameter orientation=’horizontal’ to bar) and swapping x and y in the other calls:

""" plot the bar chart """
pypl.barh(landAsX, nSites)
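
To illustrate what swapping x and y means in practice, here is a minimal sketch of the horizontal version with labels and a grid. The name `landAsY` and the call to invert_yaxis (so that the largest bar is at the top) are my additions, and the Agg backend line is an assumption made so that it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')   # assumption: draw off-screen; drop this line to get a window
import matplotlib.pyplot as pypl

lands = ['Italy', 'China', 'Spain', 'Germany', 'France',
         'Mexico', 'India', 'UK', 'Russia', 'USA']
nSites = [50, 47, 44, 39, 39, 32, 32, 28, 26, 22]
landAsY = range(len(lands))            # the y locations of the bars

bars = pypl.barh(landAsY, nSites)
pypl.yticks(landAsY, lands)            # the country names now go on the y axis
pypl.xlabel('Number of WHC sites')     # and the values on the x axis
pypl.grid(True, axis='x')
pypl.gca().invert_yaxis()              # put the first (largest) bar at the top
# call pypl.show() or pypl.savefig(...) here to see the result
```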

Have fun with pyplot and the bar charts!

Useful web page: Plotting API summary