Mode

In previous posts I wrote about the measures of central tendency and how to calculate the mean and the median in Python.

Now let’s see how to calculate the mode (the most common number in a set of numbers).

The first thing to do is to calculate the number of occurrences for each number (how many times it’s present in the set).
This could be done using a second list, but a more elegant solution is to use a dictionary (also called an associative array): a collection of (key, value) pairs in which each key is unique.
In this case the keys are the numbers found in the set and the values are their occurrence counts.

This function returns the dictionary given a set of numbers:

def getOccurrences(dataPoints):
  listOfTerms = {} # dictionary of all terms found
  for x in dataPoints:
    if x in listOfTerms:
      listOfTerms[x] += 1 # the key was there: increment its value
    else:
      listOfTerms[x] = 1  # new key
 
  return listOfTerms

It’s quite simple; the line

for x in dataPoints:

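is Python's loop syntax: it visits each element of dataPoints in turn.
The square brackets [ ] access the dictionary value stored under the key written inside the brackets (a key, not a position), while the test "x in listOfTerms" checks whether that key is already present.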
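For reference, the standard library's collections.Counter performs the same counting loop for you. The sketch below wraps it in a hypothetical helper, get_occurrences_counter (not part of the original post), to show that the result is used with the same square-bracket lookup by key:

```python
from collections import Counter

# Hypothetical alternative to getOccurrences: Counter does the counting loop itself
def get_occurrences_counter(data_points):
    # Counter is a dict subclass mapping each value to its number of occurrences
    return Counter(data_points)

occ = get_occurrences_counter([10.3, 4.1, 12, 15.5, 20.2, 5.8, 15.5, 4.1])
print(occ[4.1])  # 2 -- square brackets look up the count stored under key 4.1
print(occ[12])   # 1
print(occ[99])   # 0 -- unlike a plain dict, a missing key counts as zero
```

Because Counter is still a dictionary, the rest of the functions in this post would work unchanged on its result.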

Now, a very “Pythonic” solution would be to use just the built-in max function with a key function, so that it returns the key with the highest value:

def wrongMode(dataPoints):
  dataAndOcc = getOccurrences(dataPoints)
  # compare the keys by their occurrence count and return the largest one
  return max(dataAndOcc.keys(), key=(lambda key: dataAndOcc[key]))

Unfortunately this solution returns ONLY one key: the first one it encounters with the maximum value.
If two numbers occur twice and all the others only once, the set has TWO modes, and the function is expected to return both of them.
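The problem is easy to reproduce. This sketch hand-writes the occurrence counts of the X data from the examples as a dictionary literal (an assumption made to keep it self-contained) and calls max the same way wrongMode does:

```python
# Occurrence counts where 4.1 and 15.5 both appear twice
data_and_occ = {10.3: 1, 4.1: 2, 12: 1, 15.5: 2, 20.2: 1, 5.8: 1}

# max() returns only ONE key: the first maximal one it meets while iterating
single = max(data_and_occ, key=data_and_occ.get)
print(single)  # 4.1 -- 15.5 is silently dropped
```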
This can be fixed by first finding the highest value (again with the max function):

maxOccurrence = max(dataAndOcc.values())

and then looping through the dictionary and extracting ALL the keys whose value matches that maximum:

return [k for k,v in dataAndOcc.items() if v == maxOccurrence]
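Put together on the same hand-written counts (again a dictionary literal, assumed for illustration), the two steps return both modes:

```python
data_and_occ = {10.3: 1, 4.1: 2, 12: 1, 15.5: 2, 20.2: 1, 5.8: 1}

# Step 1: the highest occurrence count
max_occurrence = max(data_and_occ.values())

# Step 2: every key whose count equals that maximum
modes = [k for k, v in data_and_occ.items() if v == max_occurrence]
print(max_occurrence, modes)  # 2 [4.1, 15.5]
```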

And here is the complete function, with a couple of examples / unit tests:

def mode(dataPoints):
  dataAndOcc = getOccurrences(dataPoints)

  # highest occurrence count (max() raises ValueError if dataPoints is empty)
  maxOccurrence = max(dataAndOcc.values())
  # every value that occurs that many times is a mode
  return [k for k, v in dataAndOcc.items() if v == maxOccurrence]

# Unit tests
X = [10.3, 4.1, 12, 15.5, 20.2, 5.8, 15.5, 4.1]
Y = (1, 1, 1)
print("mode X =", mode(X))  # two modes: [4.1, 15.5] (first-appearance order on Python 3.7+)
print("mode Y =", mode(Y))  # [1]
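On Python 3.8 or later, the standard library also offers statistics.multimode, which likewise returns every most-common value. This is a library alternative rather than the approach built in this post, shown as a cross-check on the same data:

```python
import statistics

X = [10.3, 4.1, 12, 15.5, 20.2, 5.8, 15.5, 4.1]
Y = (1, 1, 1)

# multimode returns all most-common values, in order of first appearance
print(statistics.multimode(X))   # [4.1, 15.5]
print(statistics.multimode(Y))   # [1]
print(statistics.multimode([]))  # [] -- no error on empty input, unlike max()
```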