Today, I'm starting to work on the 4th edition of the
fastai course, and I'll be note-taking on
this blog. At the start, Jeremy shows this lovely photo of the Mark I
Perceptron at Cornell circa 1961. It does a great job of showing the
complexity of the connections in a neural network:
Some housekeeping things that are pretty cool and that intersect with my day job
at Microsoft building tools for data scientists.
I'm using my ez tool to run the fastai notebooks locally on my Windows machine,
which has an RTX 2080 GPU. All I need to do is run a single command:
$ ez env go git@github.com:jflam/fastai -n .
You'll notice that I'm using my own fork of the fastai repo, which contains
only a single additional configuration file, ez.json, that ez uses to build
and run the image in VS Code.
This is the entire contents of the ez.json
file:
{
"requires_gpu": "true",
"base_container_image": "fastdotai/fastai:latest"
}
After running that command, this is what ez
created on my machine:
It's a fully functional running Docker container with GPU support enabled, and
VS Code is bound to it using the VS Code Remote Containers extension. There
are a fair number of manual steps that would otherwise have been needed to get
this running, and ez eliminates all of them - you get straight to the course
in a running local environment.
I'm also using GitHub Codespaces to view the notebooks for the book version of
the course. All you need to do is go to the
fastbook GitHub repo and press the .
key to open up Codespaces in the browser to view the contents of the notebook:
Of course, if you are able to, please support the authors by
purchasing
a copy of the book.
Jeremy introduces his pedagogy for the class at the start, based on the work
of David Perkins. I love this image:
Begin by "building state-of-the-art world-class models" as your Hello, World.
Jeremy and Sylvain wrote a paper on the fastai
library which is the layer of software on
top of PyTorch that is used in the course.
While following along in the first video of the course, I realized that
Jeremy is running some of the cells in the notebooks from the fastbook repo.
So I forked that repo, copied the ez.json file into it, and was quickly able
to reproduce his results using ez:
ez env go -g git@github.com:jflam/fastbook -n .
A quick note - the -n . parameter tells ez to run the repo locally. ez also
supports running on Azure VMs using the same command. See the docs in the ez
repo for more details on the Azure setup (it's only 2 additional commands!)
This is a screenshot from my computer this morning after running the first
model in the course. You can see that I'm using the Outline feature in VS Code
notebooks to see an outline of the different sections from the first chapter
of the book:
Terminology:
- classification model predicts one of a number of discrete possibilities,
e.g., "dog" or "cat"
- regression models predict a numeric (continuous?) quantity
valid_pct=0.2 in the code means that fastai holds back 20% of the data to
validate the accuracy of the model. This is also the default in case you
forget to set it.
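To ground that, here's roughly what the chapter 1 cats-vs-dogs example looks like with valid_pct set explicitly. This is a sketch from memory, so treat the exact call as approximate; is_cat and the PETS dataset come from the book:

```python
from fastai.vision.all import *

# Download the Oxford-IIIT Pet dataset used in chapter 1 of the book.
path = untar_data(URLs.PETS)/'images'

# In this dataset, cat breeds have filenames starting with an uppercase letter.
def is_cat(x): return x[0].isupper()

# valid_pct=0.2 holds back a random 20% of the images as the validation set;
# seed fixes the split so results are reproducible across runs.
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))
```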
a learner contains your data, your architecture, and also a metric to
evaluate it with:
learn = cnn_learner(dls, resnet34, metrics=error_rate)
An epoch is one pass through every item in the dataset
accuracy is another metric, defined as 1.0 - error_rate
error_rate and accuracy are not good loss functions, which is a bit
counter-intuitive since those are the values that humans care about. The loss
function is what's actually used to tune the parameters of the model across
epochs, and it needs to respond smoothly to small changes in the parameters,
which these metrics don't.
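This isn't from the course, just a toy illustration of the point: nudging the model's outputs slightly doesn't change accuracy at all, but it does change a smooth loss like cross-entropy, which is what gradient descent needs to follow:

```python
import torch
import torch.nn.functional as F

targets = torch.tensor([0, 1, 1])                              # true labels for 3 examples
logits_a = torch.tensor([[2.0, 1.0], [0.4, 0.6], [1.0, 3.0]])  # model outputs
logits_b = torch.tensor([[2.0, 1.0], [0.3, 0.7], [1.0, 3.0]])  # tiny nudge to one example

# Accuracy is identical for both (1.0): the argmax of each row didn't change...
acc_a = (logits_a.argmax(dim=1) == targets).float().mean()
acc_b = (logits_b.argmax(dim=1) == targets).float().mean()

# ...but cross-entropy moves smoothly, so it can tell the optimizer which
# direction is better even when the predicted classes stay the same.
loss_a = F.cross_entropy(logits_a, targets)
loss_b = F.cross_entropy(logits_b, targets)

print(acc_a.item(), acc_b.item())    # 1.0 1.0
print(loss_a.item(), loss_b.item())  # slightly different values
```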
fastai uses the validation set to determine the accuracy or error rate of a
model
Remember that a model is parameters + architecture
Training, validation, and test datasets - the test set is used for things like
Kaggle competitions, where actual performance is withheld from the modeler so
that you can avoid the overfitting problem as well
In a time series you can't really create a validation dataset by random
sampling, instead you need to chop off the end, since that's really the goal -
to predict the future, not make predictions at random points in the past. We
need to have different sampling algorithms based on the nature of the data and
the predictions desired.
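A minimal sketch of that kind of split with pandas (the column names and 80/20 cutoff are my own choices, not from the course):

```python
import pandas as pd

# Toy daily time series; the "future" is what we actually want to predict.
df = pd.DataFrame({"day": pd.date_range("2021-01-01", periods=100),
                   "sales": range(100)})

# Don't sample validation rows at random - that would let the model peek at
# the future. Instead, chop off the last 20% of the timeline as validation.
cutoff = int(len(df) * 0.8)
train_df = df.iloc[:cutoff]
valid_df = df.iloc[cutoff:]
```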
Jeremy discusses the distinction between loss functions and metrics. One way
to think about overfitting: your model keeps getting better at making
predictions on the training dataset, but starts getting worse on the
validation dataset. Jeremy cautions that this is different from changes in the
loss function, however, and he will get into the mathematics behind this later
when he discusses loss functions in more detail.
Definition: transfer learning is using a pretrained model for a task
different from the one it was originally trained for. The fine_tune() method
is called that because it is doing transfer learning. In the earlier examples
with resnet34, it was performing transfer learning against the pretrained
model to get the superior performance. Transfer learning lets you use less
data and less compute to accomplish your goals.
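Concretely, for the earlier example the whole thing is just two lines on top of the DataLoaders (the single epoch count here is my choice):

```python
# resnet34 ships with weights pretrained on ImageNet, downloaded automatically;
# fine_tune() adapts those weights to the new dataset rather than training
# from scratch, which is why it needs so little data and compute.
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)  # one epoch of fine-tuning on the new task
```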
Zeiler and Fergus published a paper in 2013 Visualizing and Understanding
Convolutional Networks. It showed how
different layers in the network recognize initially simple patterns and then
become more specialized in later layers. I think this is done by activations
of different filters against an image, so that you can see the parts of the
image that a filter gets activated against. This paper gives a good mental
model for thinking about how filters can be generalized and how transfer
learning can take advantage of filters in earlier layers.
Sound detection can work by turning sounds into pictures and using CNNs to
classify them:
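Not from the course, but a rough sketch of the idea: render an audio clip as a spectrogram image, which can then be fed to the same kind of image classifier used above (clip.wav is a hypothetical file; this assumes a mono WAV):

```python
import matplotlib.pyplot as plt
from scipy.io import wavfile

rate, samples = wavfile.read("clip.wav")   # hypothetical mono audio clip

# A spectrogram is just a picture: time on x, frequency on y, energy as color.
plt.specgram(samples, Fs=rate)
plt.axis("off")
plt.savefig("clip_spectrogram.png", bbox_inches="tight", pad_inches=0)
```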
Here's a really cool example of detecting fraudulent activity by looking at
traces of mouse movements and clicks and turning them into pictures (done by a
fastai student at Splunk; see the blog post that announced this result).
What happens when you fine-tune an existing model - does it perform worse at
detecting the things it could detect before fine-tuning? In the literature
this is called catastrophic forgetting or catastrophic interference. To
mitigate this problem you need to continue to provide data for the other
categories that you want detected during the fine-tuning (transfer learning)
stage.
When looking for pretrained models, you can search for "model zoo" or
"pretrained models".
He covers a number of different categories: vision, text, tabular, and
recommendation systems.
Recommendation systems == collaborative filtering
Recommendation != Prediction