How Data Scientists Fail
What can go wrong in a data science task
I am going to job interviews, again. This time, a frequent request is: “Tell us about a failed project”. Of course, I never fail as a data scientist. How could I? A data science task involves a combination of domain knowledge and data, neither of which is held or produced by me, and a question someone else wants answered. All I do as a data scientist is encode the domain knowledge as a model, update the model’s latent variables based on the data, and compute a quantitative answer to the question. There are ways to check the adequacy of the model, verify the convergence of inference, and express the uncertainty of the answer. Just doing all these steps by the book ensures that there is absolutely no way to fail. Consider the task of classifying hand-written digits: although different models may have different accuracy, there is no way to ‘fail’ as long as one does things as taught. Or is there?