Data are considered the new secret sauce: they are everywhere and have been the cornerstone of the success of many high-tech companies, from Google to Facebook.
Yet we have always used data; there are examples dating back thousands of years to ancient times.
In recent centuries, data found more and more practical applications, thanks first to the emergence of statistics and later to Business Intelligence. The earliest known use of the term “Business Intelligence” is by Richard Millar Devens in 1865. Devens used the term to describe how a banker gained profit by receiving and acting upon information about his environment before his competitors did.
It was after World War II that the practice of using data-based systems to improve business decision-making, surely driven by advances in automatic computing and storage, started to take off and become widespread. Digital storage became more cost-effective than paper for storing data, and since then an unbelievable amount of data has been collected and organised in data warehouses, initially in structured formats. The term Big Data came into use, at first meaning simply a lot of data.
In a 2001 research report and related lectures, analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional, i.e. increasing
- volume (the amount of data reached levels that only specialised systems could handle)
- velocity (speed of data in and out, including the emergence of real-time data)
- variety (the range of data types and sources, often in unstructured formats)
Gartner, and now much of the industry, quickly picked up this “3Vs” model for describing Big Data; a decade later, it has become the generally accepted description of the three defining dimensions of big data.
What is especially critical here is that Big Data is not primarily about size but about all three aspects, not least the variety of the data.
Relational databases require ‘pristine data’: if the data is in the database, it is expected to be accurate, clean and 100% reliable. A huge amount of time, money and accountability goes into making sure the data is well prepared before it is loaded into the database.
Big Data tackles this problem from the opposite direction: the data may be poorly defined, much of it may be inaccurate, and much of it may in fact be missing.
Big Data therefore needs enough volume that bad or missing data become statistically insignificant, which holds as long as the errors are random rather than systematic.
The abundance of available data also meant a shift from Business Intelligence (inherently descriptive statistics), where data is used to measure things, detect trends and so on, to inductive statistics, which infers laws from large data sets to reveal patterns, relationships and dependencies, or to predict outcomes and behaviours.
In the world of data this new interdisciplinary field is called Data Science.
Data Science is all about extracting knowledge from data, either structured or unstructured, and incorporates many diverse skills such as mathematics, statistics, artificial intelligence, computer programming, visualisation, image analysis, and much more.
The term “data science” has existed for over thirty years and was used interchangeably with data analysis or data mining (i.e., the process of discovering patterns in large data sets, like when you mine a mountain of data and your goal is to find the nuggets of insight), but it gradually came to include more areas.
Don’t be fooled by the many academics and journalists who see no distinction between data science and statistics, or who even advocate that statistics be renamed data science and statisticians data scientists (as C. F. Jeff Wu did in 1997).
Data science is an independent discipline that stands on the shoulders of statistics but extends the field into new realms thanks to Big Data, computer science and distributed systems.
The challenge is especially acute for models and situations with large numbers of features, as is often the case with the data we see these days: high-dimensional data with many variables, possibly more than the number of observations. Modern sensors used in the Internet of Things can collect hundreds or thousands of features.
Things that were simple with fewer variables now become very challenging.
Indeed, they are challenging to the point of being an open area of research.
For example, ridge regression, the LASSO and other shrinkage methods are a very active area of research.
Although ridge regression was introduced in the 1970s, it was not widely used for many years. Only with the advent of fast computation in the last ten years has it become very popular, along with the LASSO.
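To see why ridge regression helps in the high-dimensional setting, consider what happens when there are more features than observations. In that case X^T X is singular, so ordinary least squares has no unique solution. Adding a penalty λ (here `lam`) on the squared coefficients makes the system solvable. What follows is a minimal sketch that assumes NumPy and synthetic data. The function name `ridge` and the simulated coefficients are illustrative and do not come from any particular library.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: solve (X^T X + lam * I) beta = X^T y."""
    p = X.shape[1]
    # For lam > 0, X^T X + lam * I is positive definite, hence invertible,
    # even when X^T X itself is singular (e.g. p > n).
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic data (made up for illustration): more features than observations.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -1.0, 0.5]   # only three features actually matter
y = X @ true_beta + 0.1 * rng.standard_normal(n)

# Ordinary least squares would need (X^T X)^{-1}, but its rank is at most n.
print(np.linalg.matrix_rank(X.T @ X))   # at most n = 20, far below p = 50

for lam in (0.1, 1.0, 10.0):
    beta = ridge(X, y, lam)
    # A larger penalty shrinks the whole coefficient vector towards zero.
    print(lam, np.linalg.norm(beta))
```

The penalty never sets coefficients exactly to zero; it only shrinks them. This is the main practical difference from the LASSO.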
The LASSO (least absolute shrinkage and selection operator) was introduced in a 1996 paper by Robert Tibshirani. At the time it did not get much attention, but in the last ten years or so it has become a very hot topic in both statistics and computer science.
One reason for its current popularity is computational: fitting it is a convex optimisation problem.
Early on, statisticians had only one approach to fitting this model.
Now it has suddenly become something anyone can solve on a laptop, even for large data sets.
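The convex-optimisation point can be made concrete with cyclic coordinate descent, one widely used way to fit the LASSO today. It updates one coefficient at a time with a soft-thresholding step, and each step is cheap. This is not necessarily the approach used in Tibshirani's 1996 paper. The sketch assumes NumPy and minimises (1/(2n))·||y − Xβ||² + λ·||β||₁. The names `soft_threshold` and `lasso_cd` and the synthetic data are hypothetical and chosen for illustration.

```python
import numpy as np

def soft_threshold(a, lam):
    # Proximal step for the L1 penalty: shrink a towards 0 by lam, clipping at exactly 0.
    return np.sign(a) * max(abs(a) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X @ beta||^2 + lam * ||beta||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # x_j . x_j / n for each column j
    residual = y.copy()                  # y - X @ beta, with beta starting at 0
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * beta[j]            # add feature j's contribution back (partial residual)
            rho = X[:, j] @ residual / n              # correlation of x_j with the partial residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * beta[j]            # subtract the updated contribution
    return beta

# Synthetic data (made up for illustration): only features 0 and 1 matter.
rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(n)

beta = lasso_cd(X, y, lam=0.1)
print(np.round(beta, 2))   # irrelevant features are typically set exactly to 0

# Just above max_j |x_j . y| / n, every coefficient stays at zero.
lam_big = 1.01 * np.max(np.abs(X.T @ y)) / n
print(np.count_nonzero(lasso_cd(X, y, lam=lam_big)))   # 0
```

The soft-thresholding step sets coefficients exactly to zero, which is the "selection" in the LASSO's name. Because the objective is convex, this simple loop converges to the global minimum.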
Some of these methods are old, brought to new life. Some are new. The technology and the kinds of data that we see bring new challenges every day.
Machine Learning is the sum of all of the above: big data, powerful distributed computers and statistical models, but with the learning done by the machines.
Machine Learning as a scientific endeavour grew out of the quest for artificial intelligence (AI). Arthur Samuel defined it in 1959 as the field that gives computers the ability to learn without being explicitly programmed. In 1997, Tom Mitchell gave a definition that, simplified, says a program learns if its performance on a task improves with experience (which is nothing more than data).
Already in the early days of AI as an academic discipline, researchers were interested in having machines learn from data, but machine learning, reorganised as a separate field, only started to flourish in the 1990s.
It shifted focus away from the symbolic approaches inherited from AI toward methods and models borrowed from statistics, and it benefited especially from the increasing availability of digitised information.
The wunderkind of machine learning is now deep learning (a powerful set of techniques for learning in neural networks, a beautiful biologically inspired programming paradigm), which is dramatically improving the state of the art in speech and visual recognition and in many other domains such as genomics.
Deep learning methods discover intricate structure in large data sets by composing simple non-linear modules that each transform the representation at one level into a representation at a higher, slightly more abstract level. With the composition of enough such transformations, very complex functions can be learned. The key aspect of deep learning is that these layers of features are not designed by human engineers: they are learned from data using a general-purpose learning procedure.
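The idea of composing simple non-linear modules, each transforming one representation into the next, can be sketched in a few lines. This is a minimal sketch that assumes NumPy. Each module is an affine map followed by a ReLU, and the layer sizes are arbitrary. The weights are random here, purely to show the composition. In real deep learning they would be learned from data by a general-purpose procedure such as gradient descent with backpropagation, which this sketch does not include. The names `layer` and `compose` are hypothetical.

```python
import numpy as np

def layer(W, b):
    """One simple non-linear module: an affine map followed by a ReLU."""
    return lambda h: np.maximum(0.0, W @ h + b)

def compose(*modules):
    """Chain modules so that each transforms the previous module's representation."""
    def f(x):
        for m in modules:
            x = m(x)
        return x
    return f

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 4]   # input dim 8, two hidden representations, output dim 4 (arbitrary)
# Scaled random weights stand in for learned ones.
modules = [layer(rng.standard_normal((m, n)) * np.sqrt(2.0 / n), np.zeros(m))
           for n, m in zip(sizes[:-1], sizes[1:])]
net = compose(*modules)

x = rng.standard_normal(8)
print(net(x).shape)   # (4,)
```

Stacking more modules gives more levels of representation. What makes this deep *learning* is that the weights in each module are fitted from data rather than designed by hand.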
This is powerful. The conventional option is to hand-design good feature extractors, which requires considerable engineering skill and domain expertise, and often these carefully crafted algorithms cannot harness the exponential growth in the data available. All of this can be avoided if good features can be learned automatically by a general-purpose learning procedure that scales to such large data sets. This is the key advantage of deep learning.
Machine Learning is now everywhere. It is in the system that decides which online ad you see. It translates texts automatically, recognises handwritten notes and tags people in pictures. It drives autonomous vehicles, recommends your next book or movie, predicts the price of next year’s Bordeaux wine and has defeated top human players at Go.
Soon it will also replace data scientists.