Plotting in Python – introduction

It is often said that a picture is worth a thousand words. This is especially true for data: a chart makes the properties of the data immediately apparent.

Let’s start with a simple chart. A bar graph is a chart with rectangular bars whose lengths are proportional to the values they represent. Bar graphs are often used to compare data that fits into categories (histograms, which are sometimes confused with bar graphs, are used for continuous data).

As an example I will use the World Heritage sites, specifically the list of the 10 countries with the highest number of sites.
The list is short (just 10 items) and can be represented in Python by a dictionary, also known as an associative array (basically a collection of unique key + value pairs):

whcTop10 = {'Italy': 50, 'China': 47, 'Spain': 44, 'Germany': 39, 
            'France': 39, 'Mexico': 32, 'India': 32, 'UK': 28, 
            'Russia': 26, 'USA': 22}

As you can see, the key is a string (the name of the country) and the value is the number of sites in that country.
A Python dictionary is not sorted by its values: since Python 3.7 it keeps the insertion order, and older versions guaranteed no particular order at all. The items (lands on the X axis and sites on the Y axis) therefore need to be sorted explicitly first, to get a better chart.
Note: the standard collections module offers an OrderedDict data structure that could be used too, but it would be overkill for this simple example.

# the frequencies for the y axis, from highest to lowest
nSites = sorted(whcTop10.values(), reverse=True)
# the list of all top 10 lands, sorted by number of sites
lands = sorted(whcTop10.keys(), key=whcTop10.__getitem__, reverse=True)
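
As a variation on the sorting above, the two lists can also be produced in one pass by sorting the (land, sites) pairs and then splitting them. This is only a sketch: the names pairs and ordered are mine, not part of the original code, and the OrderedDict at the end is just there to show the alternative mentioned in the note.

```python
from collections import OrderedDict

whcTop10 = {'Italy': 50, 'China': 47, 'Spain': 44, 'Germany': 39,
            'France': 39, 'Mexico': 32, 'India': 32, 'UK': 28,
            'Russia': 26, 'USA': 22}

# sort the (land, sites) pairs by number of sites, highest first
pairs = sorted(whcTop10.items(), key=lambda item: item[1], reverse=True)

# split the sorted pairs into the two lists used for plotting
lands = [land for land, n in pairs]
nSites = [n for land, n in pairs]

# optional: an OrderedDict built from the pairs keeps the sorted order
ordered = OrderedDict(pairs)

print(lands[:3])    # ['Italy', 'China', 'Spain']
print(nSites[:3])   # [50, 47, 44]
```

Sorting the pairs keeps each land attached to its own value, so the two lists cannot drift out of step.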

To plot the chart I will use the pyplot module, part of the Matplotlib library (which is included in many Python installations):

import matplotlib.pyplot as pypl   # pypl: any other alias works too
landAsX = range(10)          # the x locations for the bars
""" plot the bar chart """
pypl.bar(landAsX, nSites)    # format data as bar chart
pypl.show()                  # display it

This is the chart produced with just a few lines of code:

Step 1. simple chart

Note that the bars are drawn separately, with gaps between them (unlike in a histogram).
That’s nice, but the chart looks quite bare … let’s add more detail: labels for the bars (showing which land each one is), some more space between the bars and the axes, and a grid:

""" plot the bar chart """
   # align='edge': each x location is the left edge of its bar,
   # which is what the offsets used below assume
pypl.bar(landAsX, nSites, align='edge')
   # set lands names as labels on X axis (param. rotation is optional)
pypl.xticks(landAsX, lands, rotation=20)
   # add some space between bars and axes
pypl.xlim([min(landAsX) - 0.3, max(landAsX) + 1]) # x axis
pypl.ylim([0, max(nSites) + 3])                   # y axis starts at 0
   # let's add a grid on y-axis
pypl.grid(True, axis='y')

Note that the labels are slightly slanted (parameter “rotation”) so that they are easier to read, and that the y axis starts at 0: starting at a value above zero truncates the bars, so their lengths no longer reflect the full values.
The “grid” function has many more parameters, such as line style, width and colour, that you can experiment with.
Here is the new outcome:

Step 2.  Now with labels, grid and borders. Nicer!
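
Why the y axis should start at 0 can be checked with a little arithmetic. The sketch below compares how long the Italy bar is drawn relative to the USA bar for two different baselines; the helper visible_ratio and the baseline of 20 are made up for illustration only.

```python
def visible_ratio(a, b, baseline=0):
    """Ratio of the drawn bar lengths when the axis starts at `baseline`."""
    return (a - baseline) / float(b - baseline)

italy, usa = 50, 22   # number of sites, from the dictionary above

print(visible_ratio(italy, usa))                # axis starts at 0
print(visible_ratio(italy, usa, baseline=20))   # axis starts at 20
```

With the axis starting at 0 the Italy bar is a bit more than twice as long as the USA bar, which matches the data (50 versus 22). With the axis starting at 20 it would be drawn 15 times as long.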

Now it is nicer but still not very descriptive. You can add a chart title and labels for the axes with the following functions. You can also add arbitrary lines and text to the chart (for example to show a target or the average):

   # the mean number of sites, drawn below as a reference line
nSitesMean = sum(nSites) / float(len(nSites))
   # add a red dashed line and a label for the mean (nSitesMean)
pypl.hlines(nSitesMean, -0.3, 10, color='red', linestyles='dashed',
           label=r'$\mu (top10) $')
pypl.legend()    # display label for mean line into a legend
""" define the plot labels """
pypl.title('Distribution of the WHC sites by land - Top 10',
          fontsize=18)  # chart title
pypl.ylabel('Number of WHC sites')            # y axis title

And here is the final result:

Step 3 – With titles and legend
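
For reference, the snippets so far can be assembled into one script. This is a sketch under a few assumptions of mine: the mean is computed with sum and len, the Agg backend and savefig are used so that it runs without a display, and the output file name is hypothetical. When working interactively you would drop the matplotlib.use line and call pypl.show() instead.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")              # assumption: render off-screen, no window
import matplotlib.pyplot as pypl

whcTop10 = {'Italy': 50, 'China': 47, 'Spain': 44, 'Germany': 39,
            'France': 39, 'Mexico': 32, 'India': 32, 'UK': 28,
            'Russia': 26, 'USA': 22}

nSites = sorted(whcTop10.values(), reverse=True)
lands = sorted(whcTop10.keys(), key=whcTop10.__getitem__, reverse=True)
landAsX = list(range(len(lands)))               # the x locations for the bars
nSitesMean = sum(nSites) / float(len(nSites))   # mean for the reference line

pypl.figure()
# align='edge': each x location is the left edge of its bar
pypl.bar(landAsX, nSites, align='edge')
pypl.xticks(landAsX, lands, rotation=20)
pypl.xlim([min(landAsX) - 0.3, max(landAsX) + 1])
pypl.ylim([0, max(nSites) + 3])                 # y axis starts at 0
pypl.grid(True, axis='y')

# red dashed line for the mean, with a legend entry
pypl.hlines(nSitesMean, -0.3, 10, colors='red', linestyles='dashed',
            label=r'$\mu (top10) $')
pypl.legend()
pypl.title('Distribution of the WHC sites by land - Top 10', fontsize=18)
pypl.ylabel('Number of WHC sites')

# hypothetical output file, written to the system temp directory
outFile = os.path.join(tempfile.gettempdir(), 'whc_top10.png')
pypl.savefig(outFile)
```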

Many more functions are available to make even fancier charts. They are outside the scope of this small tutorial, but to give a couple of examples: you can annotate parts of the chart or change the colour of one of the bars:

    # change colour of a single bar (the very first) and annotate it
bars = pypl.bar(landAsX, nSites, align='edge')  # bar() returns the bars
bars[0].set_color('green')
pypl.annotate('Country with largest number of WHC sites', xy=(0.5,50),
              xytext=(2,49), arrowprops=dict(facecolor='green',
              shrink=0.2), color='green')
pypl.text(7, 36, r'$\mu=35.9$', color='red')     # text for mean

Here is the chart with annotations and colours, arguably a bit overloaded:

Version A – bit exaggerated

Or you can strip the chart down to the bare minimum while keeping it readable:

       # add the numbers to the top of each bar
for pos, n in zip(landAsX, nSites):
    pypl.annotate(str(n), xy=(pos + 0.3, n + 1.1))
       # remove the borders
ax = pypl.gca()
for spine in ax.spines.values():
    spine.set_visible(False)
      # turn off all ticks (the tick labels stay)
ax.tick_params(axis='both', length=0)
      # grid white
pypl.grid(axis='y', color='white', linestyle='-')

Version B – minimalistic

The chart can be rotated by using barh instead of bar (or by adding the parameter orientation='horizontal' to bar) and swapping x with y in the other calls:

""" plot the bar chart """
pypl.barh(landAsX, nSites)

Version C – horizontal
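
Swapping x and y for the horizontal version might look like the sketch below. The two lists are the sorted ones from earlier, written out so that the example is self-contained. The name landAsY and the call to invert_yaxis are my additions (the latter puts the land with the most sites at the top), and the Agg backend is only there so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")              # assumption: render off-screen, no window
import matplotlib.pyplot as pypl

lands = ['Italy', 'China', 'Spain', 'Germany', 'France',
         'Mexico', 'India', 'UK', 'Russia', 'USA']
nSites = [50, 47, 44, 39, 39, 32, 32, 28, 26, 22]
landAsY = list(range(len(lands)))   # the y locations for the bars

pypl.figure()
pypl.barh(landAsY, nSites)          # bars now grow along the x axis
pypl.yticks(landAsY, lands)         # yticks instead of xticks
pypl.xlim([0, max(nSites) + 3])     # the value axis still starts at 0
pypl.grid(True, axis='x')           # grid on the value axis
pypl.xlabel('Number of WHC sites')  # xlabel instead of ylabel
pypl.gca().invert_yaxis()           # first land at the top
```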

Have fun with pyplot and the bar charts!

A useful web page: Plotting API summary
