Monte Carlo and Pi

This is a post introducing a couple of probability concepts that are useful for machine learning. To make it more interesting, I am mixing in some Monte Carlo simulation ideas too!

To see examples in Python, we first need to introduce the concept of random numbers.

Stochastic vs deterministic numbers

The English word stochastic is an adjective describing something that was randomly determined. It originally came from Greek στόχος (stokhos), meaning ‘aim, guess’.

On the other hand, deterministic means that the outcome – given the same input – will always be the same. There is no unpredictability.
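A quick illustration of the difference in Python (a minimal sketch; the function and variable names are just for this example): a deterministic function always returns the same output for the same input, while a random draw does not – unless we fix the generator's seed.

```python
import random

def square(x):
    # Deterministic: the same input always yields the same output
    return x * x

assert square(4) == 16  # true every time we run this

# Stochastic: successive calls can yield different outcomes
rolls = [random.randint(1, 6) for _ in range(5)]

# Seeding the generator makes the "random" sequence reproducible
random.seed(42)
a = [random.randint(1, 6) for _ in range(5)]
random.seed(42)
b = [random.randint(1, 6) for _ in range(5)]
assert a == b  # same seed, same sequence
```

Note that a seeded pseudo-random generator is itself deterministic – the unpredictability only comes from not knowing (or not fixing) the seed.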

Random numbers

Randomness is a big area in itself; for our scope it is enough to say that randomness is the lack of pattern or predictability in events. A random sequence of events therefore has no order and does not follow an intelligible pattern.

Individual random events are by definition unpredictable, but in many cases the frequency of different outcomes over a large number of events is predictable.
And this is what is interesting for us: if I throw a six-faced die thousands of times, what percentage of the throws should I expect to show the face six?
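The die experiment above can be sketched as a small Monte Carlo simulation with the standard `random` module (the count of 100,000 throws is an arbitrary choice): with many throws, the observed frequency of a six should approach 1/6 ≈ 16.7%.

```python
import random

random.seed(1)  # fix the seed so the run is reproducible

# Simulate many throws of a fair six-faced die
n = 100_000
throws = [random.randint(1, 6) for _ in range(n)]

# Observed frequency of the face six
freq_six = throws.count(6) / n
print(f"Observed frequency of a six: {freq_six:.4f}")  # close to 1/6
```

This is the law of large numbers at work: individual throws are unpredictable, but the long-run frequency is not.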

Let’s see some practical examples with Python. As usual, they are also available in a notebook. Continue reading “Monte Carlo and Pi”


DevOps

We live and work in exciting times, when technology can play a key role in delivering real value and competitive advantage to any business. Technology now provides opportunities to deliver features to customers in new ways and at speeds not possible before. But organisations based on old IT and waterfall methods often find themselves struggling to keep up.

DevOps

DevOps is a new term that emerged from two related major trends.

The first trend – also called “agile infrastructure” or “agile operations” – originated from applying Agile and Lean principles to operations work.

The second trend came from a better understanding of the value of collaboration between development and operations teams and how important operations has become in our increasingly service-oriented world.

Continue reading “DevOps”

Agile retrospectives: a simple framework

Retrospectives are an important Agile ceremony (and in fact they are part of many other disciplines – just think of project post-mortem analyses) where the team gets together and looks back on past events and situations, with the goal of achieving continuous improvement.

But they are also one of the most complex ceremonies to set up and facilitate, and many teams tend to skip them.

One of the most useful frameworks for a successful retrospective is the one described by Esther Derby and Diana Larsen in their book Agile Retrospectives.

Their focus is on short retrospectives, the ones occurring after a sprint, so typically after one to four weeks of work. While continuous builds, automated unit tests and frequent demos are all ways to focus attention on the product, retrospectives focus attention on how the team works and interacts.

The framework is organised in 5 stages:

  • Set the Stage
  • Gather Data
  • Generate Insights
  • Decide What to Do
  • Close the Retrospective

Continue reading “Agile retrospectives: a simple framework”

Multi-class logistic regression

We have seen several examples of binary logistic regression, where the outcome we wanted to predict had two classes: for example, a model predicting whether a student will be admitted to university (yes or no) based on previous exam results, or whether a random Titanic passenger will survive or not.

Binary classifications such as these are very common, but you can also encounter classification problems where the outcome has more than two classes: for example, whether tomorrow’s weather will be sunny, cloudy or rainy, or whether an incoming email should be tagged as work, family, friends or hobby.

We now look at a couple of approaches to handle such classification problems, with a practical example: classifying a sky object based on a set of observed variables.
The data is from the Sloan Digital Sky Survey (Release 14).
In this example the sky object to be classified can be one of three classes: Star, Galaxy or Quasar.
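The two usual approaches – one-vs-rest and multinomial (softmax) regression – can be sketched with scikit-learn. Since the SDSS data lives in the notebook, a synthetic three-class dataset stands in for it here; the shapes and parameters are illustrative assumptions, not the post's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in for the sky-survey data: three classes
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One-vs-rest: one binary logistic classifier per class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X_train, y_train)

# Multinomial (softmax): a single model over all classes at once
softmax = LogisticRegression(max_iter=1000)
softmax.fit(X_train, y_train)

print("one-vs-rest accuracy:", ovr.score(X_test, y_test))
print("multinomial accuracy:", softmax.score(X_test, y_test))
```

On most problems the two approaches give similar accuracy; the multinomial model has the advantage of producing a single, properly normalised probability distribution over the classes.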

The code is also available in a notebook on GitHub, as is the data. Continue reading “Multi-class logistic regression”

Introduction to time series – Part II: an example

Exploring a milk production Time Series

Time series models are used in a wide range of applications, particularly for forecasting, which is the goal of this example, performed in four steps:

– Explore the characteristics of the time series data.
– Decompose the time series into trend, seasonal components, and remainder components.
– Apply time series models.
– Forecast the production for a 12-month period.

Part I of this mini-series explored the basic terms of time series analysis.

Load and clean the data

The dataset is the production amount of several dairy products in California, month by month, for 18 years.
Our goal: forecast the next year production for one of those products: milk.

You can follow along with the associated notebook in GitHub. Continue reading “Introduction to time series – Part II: an example”

Introduction to time series – Part I: the basics

A time series is a data set collected through time.

What makes it different from other datasets that we used for regular regression problems are two things:

  1. It is time dependent. So the basic assumption of a linear regression model that the observations are independent doesn’t hold in this case.
  2. Most time series have some form of trend – either an increasing or decreasing trend – or some kind of seasonality pattern, i.e. variations specific to a particular time frame.

Basically, this means that the present is correlated with the past.
A value at time T is correlated with the value at T minus 1, but it may also be correlated with the value at time T minus 2 – perhaps not quite as much as with T minus 1.
Even 20 time steps back, we could still know something about the value at time T, because the values are still correlated – how much depends on the kind of time series.
And this is obviously not true for normal random data.
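This can be made concrete with a small sketch: below, a simple autocorrelated series (an AR(1) process, where each value depends on the previous one – an assumption chosen just for this illustration) is compared with plain random noise at a few lags.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# An autocorrelated series (AR(1)): each value depends on the previous one
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()

# Purely random data for comparison
noise = rng.normal(size=n)

def autocorr(x, lag):
    # Correlation between the series and itself shifted by `lag` steps
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

for lag in (1, 2, 20):
    print(f"lag {lag:2d}: AR {autocorr(ar, lag):+.2f}  "
          f"noise {autocorr(noise, lag):+.2f}")
```

The AR series shows strong correlation at lag 1, a bit less at lag 2, and still a little at lag 20, while the noise series shows essentially none at any lag.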

Time series are everywhere, for example in:

  • Financial data (stocks, currency exchange rates, interest rates)
  • Marketing (click-through rates for web advertising)
  • Economics (sales and demand forecasts)
  • Natural phenomenon (water flow, temperature, precipitation, wind speed, animal species abundance, heart rate)
  • Demographics and population studies, and so on.

What might you want to do with time series?

  • Smoothing – extract an underlying signal (a trend) from the noise.
  • Modelling – explain how the time series arose, for intervention.
  • Forecasting – predict the values of the time series in the future.
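As a tiny illustration of the first of these tasks, smoothing: a pandas rolling mean can recover an underlying trend from a made-up noisy series (the trend slope and noise level here are arbitrary choices):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# A slow upward trend buried in noise
t = np.arange(200)
noisy = pd.Series(0.05 * t + rng.normal(0, 2, len(t)))

# Smoothing: a centred rolling mean extracts the underlying signal
smooth = noisy.rolling(window=20, center=True).mean()

print(smooth.dropna().head(3))
```

The smoothed series varies far less from point to point than the raw one, and the upward trend becomes visible.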

Here we first look at the specific characteristics of Time Series (TS), and in a second part we will see a concrete example of TS analysis (smoothing + modelling + forecasting).

You can follow along with the associated notebook in GitHub. Continue reading “Introduction to time series – Part I: the basics”