John Lam's Blog


Day 2 of fastai class.

The problem with p-values

How do we determine whether a relationship between independent variables and dependent variables could have happened by chance?

One way of doing this is by simulation.
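A minimal sketch of what "by simulation" could look like is a permutation test: shuffle the dependent variable many times and count how often a shuffled dataset shows a correlation at least as strong as the real one. The data, the function names, and the choice of Pearson correlation are all my own assumptions for illustration, not something from the class.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_shuffles=1000, seed=0):
    """Fraction of shuffles whose |correlation| is at least the observed one."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(pearson(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # The +1 counts the observed arrangement itself, so the estimate is never 0.
    return (hits + 1) / (n_shuffles + 1)


# Made-up data: a weak linear relationship buried in noise.
data_rng = random.Random(42)
x = [data_rng.uniform(0, 30) for _ in range(50)]
y = [0.05 * v + data_rng.gauss(0, 1) for v in x]
print("estimated chance of a correlation this strong:", permutation_p_value(x, y))
```

If the shuffled datasets rarely match the real correlation, the relationship is unlikely to be a fluke of this particular sample.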

Another way to do this is by looking at the p-value, i.e., the probability of seeing a result at least as extreme as the one observed, assuming that the null hypothesis is true. See the Wikipedia article on p-values.

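Unfortunately, p-values are not very useful. The bottom line is that a p-value doesn't say anything about the practical importance of a result, because it depends on how much data you have as well as on the size of the effect.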
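To convince myself of this, here is a small sketch that holds a tiny effect fixed and only changes the sample size. It uses a two-sided z-test for a difference in means between two equal-sized groups, which is my own simplifying assumption; the numbers are made up.

```python
import math
from statistics import NormalDist


def two_sample_p_value(mean_diff, sd, n_per_group):
    """Two-sided z-test p-value for a difference in means.

    Assumes two groups of equal size and a known, shared standard deviation.
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = abs(mean_diff) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Same tiny effect (2% of a standard deviation) at two sample sizes.
tiny_effect = 0.02
p_small_sample = two_sample_p_value(tiny_effect, sd=1.0, n_per_group=100)
p_large_sample = two_sample_p_value(tiny_effect, sd=1.0, n_per_group=1_000_000)

print("n = 100 per group:      ", p_small_sample)
print("n = 1,000,000 per group:", p_large_sample)
```

The effect is equally unimportant in both cases, but with enough data it becomes "significant". The p-value is telling you about sample size, not about whether the result matters.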

See Frank Harrell's work.

p-values are a part of machine learning.

What we want from our models is a prediction of a practical result or outcome. In Jeremy's critique of the temperature-R relationship paper, he's

Another way to look at this set of problems is through the lens of the outcomes that you want from your model. He has a 4-step process. In the example, he's looking at the objective of maximizing the 5-year profits of a hypothetical insurance company. Next he looks at the levers, i.e., what you can control, which in the case of insurance is the price of a policy for an individual. Next he looks at the data that can be collected, i.e., the revenues and claims and the impact those have on profitability. He then ties the first three things (objective, levers, and data) together in a model which learns how the levers influence the objective. This is all discussed in this article that he wrote in 2012.

"Using data to produce actionable outcomes", i.e., don't just make a model to predict crashes, instead have the model optimize the profit.
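Here is a toy sketch of how objective, levers, data, and model could fit together for the insurance example. Everything in it is hypothetical: the claims cost, the logistic demand curve standing in for a model learned from data, and the price grid. It is only meant to show the shape of "optimize the profit" as opposed to "predict something".

```python
import math

# Data (hypothetical): what we expect to pay out in claims per customer per year.
EXPECTED_ANNUAL_CLAIMS = 600.0
YEARS = 5


def purchase_probability(price):
    """Model (hypothetical stand-in): chance a customer buys at this annual price.

    In practice this would be learned from revenue and claims data; here it is
    a made-up logistic curve that falls as the price rises.
    """
    return 1.0 / (1.0 + math.exp((price - 1000.0) / 150.0))


def expected_profit(price):
    """Objective: expected 5-year profit per prospective customer."""
    margin = price - EXPECTED_ANNUAL_CLAIMS
    return YEARS * purchase_probability(price) * margin


# Lever: the price we charge. Try a grid of prices and keep the best one.
candidate_prices = range(500, 2001, 25)
best_price = max(candidate_prices, key=expected_profit)

print("best annual price:", best_price)
print("expected 5-year profit per prospect:", round(expected_profit(best_price), 2))
```

The model is only a means to an end here: the output you act on is the price, not the prediction.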

Deploying the bear detector (black, grizzly, teddy) reminds me of Streamlit. It might be a really interesting exercise to build the bear model from the class and deploy it as a local Streamlit app (and perhaps think about what it would take to deploy it to Azure as well).
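A rough sketch of what that local app might look like, assuming the model was exported from the class notebook with `learn.export()` to a file named `export.pkl` sitting next to the script (the file name, the helper `top_prediction`, and the page layout are my assumptions). I have not run this against a real bear model, so treat it as a starting point.

```python
MODEL_PATH = "export.pkl"  # assumed location of the exported fastai learner


def top_prediction(vocab, probs):
    """Return the (label, probability) pair with the highest probability."""
    best_index = max(range(len(probs)), key=lambda i: probs[i])
    return vocab[best_index], probs[best_index]


def main():
    # Imported here so the helper above can be used without these packages.
    import streamlit as st
    from fastai.vision.all import PILImage, load_learner

    st.title("Bear detector")
    uploaded = st.file_uploader("Upload a bear photo", type=["jpg", "jpeg", "png"])
    if uploaded is None:
        return

    st.image(uploaded)
    learn = load_learner(MODEL_PATH)
    image = PILImage.create(uploaded.getvalue())  # build the image from raw bytes
    _, _, probs = learn.predict(image)

    labels = list(learn.dls.vocab)  # e.g. black, grizzly, teddy
    label, prob = top_prediction(labels, [float(p) for p in probs])
    st.write(f"Prediction: {label} ({prob:.1%})")


if __name__ == "__main__":
    main()
```

Saved as, say, `app.py`, this would be started with `streamlit run app.py`.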

A great tribute by Steven Sinofsky to the Apple Bicycle for the Mind. It would be great to create a modern poster for this someday.