## How Data Scientists Fail

### What can go wrong in a data science task

I am going to job interviews again. This time, a frequent request is: “Tell us about a failed project.” Of course, I never fail as a data scientist; how could I? A data science task involves a combination of domain knowledge and data, neither of which is held or produced by me, and a question someone else wants answered. All I do as a data scientist is encode the domain knowledge as a model, update the model’s latent variables based on the data, and compute a quantitative answer to the question. There are ways to ensure the adequacy of the model, check the convergence of inference, and express the uncertainty of the answer. Doing all these steps by the book ensures that there is absolutely no way to fail. Consider the task of classifying hand-written digits: although different models may achieve different accuracy, there is no way to ‘fail’ as long as one does things as taught. Or is there?
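The digit-classification claim can be made concrete with a minimal sketch of doing things “by the book”: choose a model, fit it to data, and report a quantified answer. This assumes scikit-learn is available; the choice of logistic regression is mine and purely illustrative.

```python
# A by-the-book digit classifier: model, inference, quantitative answer.
# Illustrative sketch; assumes scikit-learn. Logistic regression is an
# arbitrary model choice, not a recommendation.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000)  # encode assumptions as a model
model.fit(X_train, y_train)                # update parameters from the data

accuracy = model.score(X_test, y_test)     # quantitative answer
probabilities = model.predict_proba(X_test)  # per-class uncertainty
print(f"held-out accuracy: {accuracy:.2f}")
```

Every step here follows the textbook recipe, and the held-out accuracy comes out respectable, which is exactly the point of the rhetorical question above.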

## Double Speed Replay

### Why students watch lectures offline at 2x speed

Thanks to the plague, we teach over Zoom and have our lectures recorded. Many students do not attend in real time; instead, they replay the recordings at their convenience, and at 2x speed. It is easy to label the students as superficial, but double speed replay has a perfectly valid, though slightly embarrassing for us the teachers, justification. When I was trained in public speaking, I was taught this basic technique for preparing a time-framed lecture:

The ultimate Bayesian approach to learning from data is embodied by hierarchical models. In a hierarchical model, each observation or group of observations $y_i$ corresponding to a single item in the data set is conditioned on a parameter $\theta_i$, and all parameters are conditioned on a hyperparameter $\tau$:

$$
\begin{aligned}
\tau & \sim H \\
\theta_i & \sim D(\tau) \\
y_i & \sim F(\theta_i)
\end{aligned}
$$
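The generative process above can be sketched numerically. This is a hypothetical Gaussian instantiation, with $H$, $D$, and $F$ all taken to be normal distributions and the scales chosen arbitrarily for illustration:

```python
# Forward simulation of the hierarchical model: tau ~ H, theta_i ~ D(tau),
# y_i ~ F(theta_i). Gaussian choices for H, D, F are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_items = 5   # items in the data set
n_obs = 20    # observations per item

# tau ~ H: hyperparameter drawn from the top-level prior
tau = rng.normal(loc=0.0, scale=1.0)
# theta_i ~ D(tau): one parameter per item, centered on tau
theta = rng.normal(loc=tau, scale=0.5, size=n_items)
# y_i ~ F(theta_i): observations for each item, centered on its theta_i
y = rng.normal(loc=theta[:, None], scale=0.1, size=(n_items, n_obs))

# Each item's sample mean estimates its theta_i; pooling across items
# informs tau — the essence of hierarchical sharing of statistical strength.
print(y.mean(axis=1))  # per-item estimates, close to theta
print(y.mean())        # pooled estimate, informed by all items
```

Inference would run this process in reverse, conditioning on $y$ to recover $\theta_i$ and $\tau$; the simulation only shows the model's generative structure.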