The author described how she managed a data science team in her role as VP engineering at a data mining startup.
When you have a team of people working on hard data science problems, the things that work in traditional software don’t always apply. When you are doing research and experiments, the work can be ambiguous, unpredictable, and the results can be hard to measure.
Got this morning the first draft 12 chapters of Prof. Andrew Ng‘s new book titled “Machine Learning Yearning – Technical strategy for AI engineers, in the era of deep learning”.
The book aims to help readers to quickly become better at building AI systems by gaining the practical skills needed when organising a machine learning project.
The book assumes readers are already familiar with machine learning concepts and does not go into technical explanations of how they work.
These first chapters look great, I think this book will help to close the gap between machine learning knowledge and proper execution.
My favourite chapter is the Ninth: Optimizing and satisficing metrics which suggests how to handle the problem of establishing a single-number evaluation metric when the interesting metrics are not compatible:
Suppose you care about both the accuracy and the running time of a learning algorithm.
It seems unnatural to derive a single metric by putting accuracy and running time into a single formula, such as:
Accuracy – 0.5*RunningTime
Here’s what you can do instead:
First, define what is an “acceptable” running time. Let’s say anything that runs in 100ms is acceptable.
Then, maximize accuracy, subject to your classifier meeting the running time criteria.
Here, running time is a “satisficing metric”— your classifier just has to be “good enough” on this metric, in the sense that it should take at most 100ms. Accuracy is the “optimizing metric.”
P.S. I know, satisficing is not a common word and is marked wrong by spelling checkers but it really exists ! I had to look it myself but here is the definition:
To satisfice means to choose or adopt the first option fulfilling all requirements that one comes across (in contrast to look for the optimal one).