Introduction to Python package pandas

Pandas is another key Python library for data science.

It contains high-level data structures and manipulation tools designed to make data analysis fast and easy in Python. Pandas is built on top of NumPy.
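Because pandas is built on NumPy, the values held by a pandas object are stored in a NumPy array. The minimal sketch below, using made-up numbers, pulls that array out of a Series and shows that NumPy functions accept the Series directly.

```python
import numpy as np
import pandas as pd

# A Series wraps a one-dimensional array of values plus an index
s = pd.Series([1.5, 2.5, 3.0])

values = s.to_numpy()   # the underlying data as a NumPy ndarray
print(type(values))     # <class 'numpy.ndarray'>
print(values.dtype)     # float64

# NumPy functions work on a Series and return a Series
roots = np.sqrt(s)
print(roots)
```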

Let’s see how it can help by reading and analysing a data set.

The Series and the DataFrame are the pandas foundation classes.
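As a rough illustration of the two classes, the sketch below builds a labelled Series by hand and then reads a tiny data set into a DataFrame. The CSV text and column names are invented sample data, and `io.StringIO` stands in for a real file on disk so the example runs on its own.

```python
import io
import pandas as pd

# Series: a one-dimensional array with labels (the index)
heights = pd.Series([172, 165, 180], index=["Ann", "Bob", "Cid"], name="height_cm")
print(heights["Bob"])  # 165

# DataFrame: a two-dimensional table whose columns are Series.
# Hypothetical sample data, read from memory instead of a file.
csv_text = """name,age,height_cm
Ann,34,172
Bob,28,165
Cid,45,180
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (3, 3): three rows, three columns
print(df["age"].mean())  # average age across the rows
print(df.describe())     # summary statistics for the numeric columns
```

With a real data set, `pd.read_csv` would be given a file path or URL instead of the in-memory buffer.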


Introduction to Python package NumPy

We have seen how to calculate several measures of central tendency (such as the mean, mode and median) in Python using native lists.
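For reference, here is a minimal sketch of those three measures computed on a plain Python list. It is an illustrative version, not necessarily the code from the earlier post, and the data values are made up.

```python
from collections import Counter

data = [2, 3, 3, 5, 7, 10]

def mean(values):
    # Sum of the values divided by how many there are
    return sum(values) / len(values)

def median(values):
    # Middle value of the sorted list; average of the two middle values if the length is even
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def mode(values):
    # Most frequent value; ties go to the value seen first
    return Counter(values).most_common(1)[0][0]

print(mean(data))    # 5.0
print(median(data))  # 4.0
print(mode(data))    # 3
```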

A faster and more memory-efficient alternative to lists is the array object, which gives me the opportunity to introduce one of the key Python packages for data science: NumPy.

What is NumPy?

NumPy, short for Numerical Python, is a module that provides high-performance (thanks to its implementation in C and Fortran) vector, matrix and higher-dimensional data structures for Python.
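The same array type covers vectors, matrices and higher-dimensional data; only the number of dimensions (`ndim`) and the `shape` change. A small sketch with arbitrary values:

```python
import numpy as np

vector = np.array([1, 2, 3])               # 1-D: three elements
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # 2-D: two rows, three columns
cube = np.zeros((2, 3, 4))                 # 3-D: 2 blocks of 3x4

for name, arr in [("vector", vector), ("matrix", matrix), ("cube", cube)]:
    print(name, arr.ndim, arr.shape)
# vector 1 (3,)
# matrix 2 (2, 3)
# cube 3 (2, 3, 4)
```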

The array object class is the foundation of NumPy. Arrays are much like Python lists, except that they have a fixed size at creation, are statically typed and are homogeneous (every element must be of the same type). The element type is therefore fixed when the array is created, and this improves performance.
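The sketch below shows these properties in practice: the element type (`dtype`) is chosen at creation, mixed inputs are converted to one common type, values assigned later are cast to that type, and "growing" an array produces a new array instead of resizing the original. The default integer width depends on the platform, so the code only checks that it is some integer type.

```python
import numpy as np

a = np.array([1, 2, 3])
print(a.dtype)       # a platform integer type, e.g. int64

# Mixing ints and floats: every element is converted to float64
b = np.array([1, 2.5, 3])
print(b.dtype)       # float64

# The dtype stays fixed: a float assigned into an int array is truncated
a[0] = 9.7
print(a)             # [9 2 3]

# The size stays fixed: np.append returns a new, larger array and leaves a unchanged
c = np.append(a, 4)
print(a.size, c.size)  # 3 4
```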

NumPy arrays are also a much more efficient way of storing and manipulating data than the built-in Python lists, and they make it easier to exchange data between different programs and systems (for example, between a Python program and a C++ program).
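To make the storage claim concrete, the sketch below compares a NumPy array of 1,000 64-bit integers with a list holding the same numbers, then round-trips the array's raw bytes. The list figure is only a rough CPython estimate (it counts each int object separately, even though CPython shares small ints), and the little-endian `"<i8"` layout is an assumption about what a receiving C++ program would expect.

```python
import sys
import numpy as np

n = 1_000
arr = np.arange(n, dtype=np.int64)
lst = list(range(n))

# The array keeps raw 8-byte integers in one contiguous buffer
print(arr.nbytes)  # 8000

# The list keeps pointers to separate Python int objects, each with its own overhead
# (rough estimate: list object plus every element object)
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)
print(list_bytes)

# The raw buffer can be handed to other code that knows the dtype and shape,
# e.g. a C++ program reading little-endian int64 values
raw = arr.astype("<i8").tobytes()
restored = np.frombuffer(raw, dtype="<i8")
print(np.array_equal(arr, restored))  # True
```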

There are several ways to create vector and matrix arrays, either from Python lists or from scratch.
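A few common ways to do both, sketched with arbitrary sizes and values (the seed passed to the random generator is just an example value):

```python
import numpy as np

# From Python lists
v = np.array([1, 2, 3])             # vector
m = np.array([[1, 2], [3, 4]])      # 2x2 matrix

# From scratch
zeros = np.zeros((2, 3))            # 2x3 matrix filled with 0.0
ones = np.ones(4)                   # vector of four 1.0 values
evens = np.arange(0, 10, 2)         # 0, 2, 4, 6, 8 (stop value excluded)
grid = np.linspace(0, 1, 5)         # 5 evenly spaced points: 0, 0.25, 0.5, 0.75, 1
ident = np.eye(3)                   # 3x3 identity matrix
rng = np.random.default_rng(seed=0)
noise = rng.random((2, 2))          # 2x2 uniform random values in [0, 1)
```

`np.arange` works like Python's `range` and excludes the stop value, while `np.linspace` takes a number of points and includes both endpoints by default.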