
John Lam's Blog


I went to Best Buy this afternoon and bought an Oculus Quest 2 because I wanted to spend some time this holiday seeing how real Ben Thompson's take on the Metaverse was. I'm pretty amazed at what they were able to cram into a $299 device. To quote:

My personal experience with Workrooms didn’t involve any dancing or fitness; it was simply a conversation with the folks that built Workrooms. The sense of presence, though, was tangible. Voices came from the right place, thanks to Workrooms’ spatial audio, and hand gestures and viewing directions really made it feel like the three of us were in the same room. What was particularly compelling was the way that Workrooms’ virtual reality space seamlessly interfaced with the real world:

People joining a meeting without a headset appear on a TV as if they are video conferencing; it feels completely natural. Here is a photo from Facebook’s PR pack (which honestly, given the size of the group, seems less immersive; my demo had one person on video, and one person in the room with me):

I'm writing this back on my PC after spending about 30 minutes in a Horizon Workroom by myself. So here's a really quick take on what I liked and didn't like about the experience:

The immersive feeling is real. I really like how, with Oculus Remote Desktop installed on my PC, I could interact with and type on my PC.

The experience with the whiteboard was ... interesting. I get that this is $299 worth of consumer-level hardware, so I'm not expecting a whole lot. You can write on the whiteboard using one of the controllers flipped around so that you're using the bottom like a pen. This was ... OK, but clearly not as good as it could be with better hardware. Then again, I'm holding my arm out in space rather than bracing it against a real whiteboard, so I'm not sure how much better hardware can help in that regard.

The latency while using my keyboard and mouse was pretty jarring. I totally respect that I can actually get things done with this, but it is very much early-adopter territory. Oculus Remote Desktop also gets pretty confused by multi-monitor setups (Aero Snap on Windows doesn't do what I would expect), and I needed to take off my headset and manually move windows to my primary monitor to get it to work correctly. It is usable, though. I wonder if it's any better with USB tethering to my PC?

The resolution is limiting. I get that this will improve over time, but it likely needs to be a LOT better. I'm looking at two 27" 4K monitors right now, and that's my ideal experience. I'd imagine 25MP per eye would get us pretty close to the experience I have already, but that's about 7x more pixels than the 1832x1920 per eye I'm seeing right now - which is still quite amazing considering it's a $299 device!

Even though I wouldn't call the Quest heavy, it does have a noticeable heft on my head. I wonder how well this will hold up during a meeting (I have one scheduled for later this week with a friend to see how well it works).

I'd really like the device to do a better job of tracking where my hands are and letting me see them as I type on my keyboard. I hit keys like F10/F11 all the time, and I can't really touch-type those keys since they're quite a reach from the home position (try it yourself to see what I mean). I'd like to see my mouse too - I imagine this will get much better in the future.

But back to the latency - this is probably the biggest technical issue that I see with the hardware right now. But there's promise here. It feels very Christensen disruptive (if the collaborative room experience is as good as Ben claims).

I'll try it again later tonight. This does feel like the future, but we're definitely not there yet.


This is a fantastic piece from NYT Opinion that asks a simple question: in the 18 US states where Democrats have absolute power, do they live the values espoused by their party? It examines three key issues: affordable housing (California), progressive taxation (Washington), and education (Illinois). The results may surprise you - I didn't know that Washington has the most regressive state tax system in the nation.

We continue to focus on case counts now that Omicron is raging across the country. But is this the right metric for us to be looking at? We're saying "OMG, Omicron is setting new daily records." But given that Omicron is much less virulent, and the recent data from South Africa show a dramatic decoupling between deaths and cases compared to previous variants, why are we continuing to scare people with hyperbolic language like "global dominance"?

It's meaningless to compare case numbers from a more virulent but less transmissible variant to those of a less virulent but more transmissible one unless you're just trying to scare people. Perhaps it's time for a better metric - to a first approximation, hospitalizations seem like a much more reasonable thing to look at. Here is Bob Wachter's more measured look at hospitalizations (which, of course, is a lagging metric relative to cases):

Also, Bob has a hopeful take on Omicron. Hopefully Omicron continues to outcompete Delta and becomes the variant that becomes endemic in the population. Maybe, as Bob suggests, COVID becomes "just like a bad flu" by the Spring. We can only hope.


This is the best explainer of how airline frequent flyer programs really work. Airlines are effectively sovereign currency issuers, with the caveat that they control not only the issuance of the currency but also the only means by which it can be redeemed. While this is not entirely true - you can purchase goods and services by redeeming frequent flyer miles - you'd have to be crazy to do so given the terrible exchange rates offered by the credit card issuers.

This diagram does a great job of explaining how Wall Street values airlines and their frequent flyer programs. Effectively, operating an airline is a loss-leader for providing frequent flyer miles! Now, there's an arbitrary multiplier applied over EBITDA to come up with the valuation of the frequent flyer programs, but this is still a stunning figure:

The video also does a great job of explaining how different forms of arbitrage have been systematically eliminated by the airlines. For example, mileage arbitrage (i.e., finding a cheap flight on a circuitous route that would yield a large number of miles for a low cost) has been eliminated by pegging reward miles to the dollars spent rather than the miles flown. They do the same thing on the redemption side as well. I used to run an arbitrage on this many years ago, when I could fly anywhere at any time for 25K miles. I would charge customers for the flight at a discount to the current flight cost, e.g., if a flight cost $1200, I would sell it for $1000 and redeem 25K miles, for an unheard-of $0.04 per mile. Yes, I would get taxed on the income for that flight, but that was still an unheard-of redemption rate for miles.
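For what it's worth, the redemption arithmetic in that story is easy to check (a tiny sketch using the numbers from my example, not general rates):

```python
# Sanity-check on the redemption arithmetic from the story above
# (the specific numbers are from my example, not general rates).
flight_cost = 1200       # price of the flight in dollars
sale_price = 1000        # what I charged the customer
miles_redeemed = 25_000  # fixed award price at the time

cents_per_mile = sale_price / miles_redeemed * 100
customer_discount = flight_cost - sale_price
print(f"{cents_per_mile:.1f} cents per mile")   # 4.0 cents per mile
print(f"customer saves ${customer_discount}")   # customer saves $200
```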

One of the challenges of building a semantic search engine is splitting the input text into smaller chunks that are suitable for generating embeddings using Transformer models. This thread on the Huggingface forums does a good job of breaking the problem down into smaller pieces. The key insight from lewfun is using a sliding window algorithm over the text in the document. For those who don't know, there is a limit on the number of tokens (roughly, words) that can be fed to a model. By using a sliding window, you split the entire document into overlapping blocks and map each block back to the same original document. That way, you can use similarity-based models to generate a ranked list that maps back to the original document. This is how I will build the first version of my semantic search engine.
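A minimal sketch of the sliding-window idea (the window and stride sizes here are placeholder choices, and a real implementation would count model tokens rather than whitespace-separated words):

```python
def sliding_window_chunks(text, window=100, stride=50):
    """Split text into overlapping word-level chunks.

    Each chunk can be embedded on its own and mapped back to the source
    document; the overlap (window - stride) keeps sentences that straddle
    a boundary from losing their context.
    """
    words = text.split()
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append((start, " ".join(words[start:start + window])))
        if start + window >= len(words):
            break  # this window already reached the end of the document
    return chunks

# A 220-"word" stand-in document yields 4 overlapping chunks, each
# tagged with its start offset for mapping similarity hits back.
doc = ("word " * 220).strip()
chunks = sliding_window_chunks(doc, window=100, stride=50)
```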


I've been interested in using machine learning to extract text reliably from a web page for archival purposes (adding to my personal knowledgebase). So today, I'm collecting some links to prior art in this area for inspiration:

A Machine Learning Approach to Webpage Content Extraction. This paper uses support vector machines to train a model on some specific features of each text block:

  • number of words in this block and the quotient to its previous block
  • average sentence length in this block and the quotient to its previous block
  • text density in this block and the quotient to its previous block
  • link density in this block
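As a rough illustration (my own simplified take, not the paper's exact feature definitions), those block-level features might be computed like this:

```python
def block_features(words, link_words, lines, prev=None):
    """Toy version of the per-block features listed above (simplified
    definitions of my own, not the paper's exact ones).

    words: total words in the block; link_words: words inside anchor
    tags; lines: wrapped text lines at some fixed column width.
    """
    feats = {
        "num_words": words,
        "text_density": words / max(lines, 1),      # words per line
        "link_density": link_words / max(words, 1),
    }
    if prev is not None:
        # quotient to the previous block, as in the feature list above
        feats["num_words_quotient"] = words / max(prev["num_words"], 1)
    return feats

# A navigation bar is nearly all links; article text is nearly link-free.
nav = block_features(words=8, link_words=8, lines=1)
body = block_features(words=120, link_words=2, lines=6, prev=nav)
```

The intuition the classifier learns is visible even in this toy: content blocks have high text density and near-zero link density, while boilerplate blocks are the reverse.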

Readability.js. This is a Node library that contains the readability code used for Firefox's Reader View. In a web browser, you pass the document object from the browser DOM directly to the library. If used in a Node application, you'll need an external DOM library like jsdom. Either way, the code is simple:

var article = new Readability(document).parse();

or in the case of a node app:

var { Readability } = require('@mozilla/readability');
var { JSDOM } = require('jsdom');
var doc = new JSDOM("<body>Look at this cat: <img src='./cat.jpg'></body>", {
  url: ""
});
let reader = new Readability(doc.window.document);
let article = reader.parse();

Ideally, I should create an API that accepts a URI as a parameter and returns the parsed document to the caller. Invoking this from a Chrome extension would make it very straightforward to "clip" a web page into a personal knowledgebase.

Large language models like GPT-3 show real promise in this area as well. In this article, the authors use GPT-3 to answer questions based on text extracted from a document. Starting at 2:47 in the video below is a great demo of this working.


Day 3 of fastai. The book arrived today and I'm following along from the book directly. Reading the paper book alongside the Jupyter notebooks found in the GitHub book repo is a potent combination. Reminder that you can run the book repo in just a single step using ez and VS Code.

The Universal Approximation Theorem undergirds the theoretical basis for neural networks being able to compute arbitrary functions. The two parts of the approximation theorem look at the limits of a single layer with an arbitrary number of neurons ("arbitrary width") and the limits of a network with an arbitrary number of hidden layers ("arbitrary depth").
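A concrete toy for the "arbitrary width" direction: a single hidden layer of ReLU units can represent any piecewise-linear interpolant exactly, so with enough units it can approximate any continuous function on an interval. Here the weights are constructed by hand rather than learned, just to make the mechanism visible:

```python
def relu(x):
    return max(x, 0.0)

def one_layer_interpolant(f, knots):
    """Build a single-hidden-layer ReLU 'network' whose output is the
    piecewise-linear interpolant of f at the knots:
        g(x) = f(x0) + sum_i c_i * relu(x - x_i)
    Each hidden unit contributes one slope change c_i at knot x_i, so
    adding knots (i.e., width) drives the approximation error to zero.
    """
    slopes = [(f(b) - f(a)) / (b - a) for a, b in zip(knots, knots[1:])]
    coeffs = [slopes[0]] + [s1 - s0 for s0, s1 in zip(slopes, slopes[1:])]
    def g(x):
        return f(knots[0]) + sum(c * relu(x - k)
                                 for c, k in zip(coeffs, knots[:-1]))
    return g

f = lambda x: x * x
knots = [i / 10 for i in range(11)]  # 10 segments on [0, 1]
g = one_layer_interpolant(f, knots)
max_err = max(abs(f(i / 100) - g(i / 100)) for i in range(101))
# For f'' = 2 and h = 0.1, the classic interpolation bound
# h**2 / 8 * max|f''| predicts a max error of 0.0025.
```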


Day 2 of fastai class.

The problem with p-values

Ways of determining whether a relationship could have happened by chance:

  • Independent variables
  • Dependent variables

One way of doing this is by simulation.

Another way to do this is by looking at the p-value, i.e., the probability of an observed or more extreme result assuming that the null hypothesis is true. See the Wikipedia article on p-values.
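The simulation approach and the p-value definition connect directly: a permutation test estimates the p-value by simulating the null hypothesis. A sketch with made-up data:

```python
import random

def permutation_p_value(a, b, n_iter=10_000, seed=0):
    """Estimate a p-value by simulation: shuffle the group labels many
    times and count how often the shuffled difference in means is at
    least as extreme as the observed one. This is exactly the p-value
    definition, with the null hypothesis 'the labels don't matter'."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    return hits / n_iter

# Made-up data: clearly separated groups give a small p-value,
# identical groups give p = 1.0.
p_separated = permutation_p_value([10, 11, 12, 13], [1, 2, 3, 4])
p_identical = permutation_p_value([5, 5, 5, 5], [5, 5, 5, 5])
```

Note that even the small p-value here says nothing about whether the difference *matters* - which is exactly the criticism below.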

Unfortunately, p-values are not that useful - the bottom line is that they don't say anything about the importance of a result.

See Frank Harrell's work.

p-values are a part of machine learning.

The outcome we want from our models is that they predict a practical result or outcome. This is the thrust of Jeremy's critique of the temperature-R relationship paper.

Another way to look at this set of problems is through the lens of the outcomes that you want from your model. He has a 4-step process. In the example, he's looking at the objective of maximizing the 5-year profits of a hypothetical insurance company. Next he looks at the levers, i.e., what you can control, which in the case of insurance is the price of a policy for an individual. Next he looks at the data that can be collected, i.e., the revenues and claims and the impact those have on profitability. He then ties the first three things - objective, levers, and data - together in a model that learns how the levers influence the objective. This is all discussed in this article that he wrote in 2012.

"Using data to produce actionable outcomes", i.e., don't just make a model to predict crashes, instead have the model optimize the profit.

Deploying the bear detector (black, grizzly, teddy) reminds me of Streamlit. It might be a really interesting exercise to build the bear model from the class and deploy it as a local streamlit app (and perhaps think about what it would take to deploy it to Azure as well).

A great tribute by Steven Sinofsky to the Apple Bicycle for the Mind. It would be great to create a modern poster for this someday.


Today, I'm starting to work on the 4th edition of the fastai course, and I'll be note-taking on this blog. At the start Jeremy shows this lovely photo of the Mark 1 perceptron at Cornell circa 1961. It does a great job at showing the complexity of the connections in a neural network:

Some housekeeping things that are pretty cool and that intersect with my day job at Microsoft building tools for data scientists.

I'm using my ez to run the fastai notebooks locally on my Windows machine which has an RTX2080 GPU. All I need to do is run a single command:

$ ez env go git@github.com:jflam/fastai -n .

You'll notice that I'm using my own fork of the fastai repo, which contains only a single configuration file, ez.json, that ez uses to build and run the image in VS Code. This is the entire contents of the ez.json file:

{
    "requires_gpu": "true",
    "base_container_image": "fastdotai/fastai:latest"
}

After running that command, this is what ez created on my machine:

It's a fully functional running Docker container with GPU support enabled, and VS Code is bound to it using the VS Code Remote Containers extension. There's a fair number of manual steps that would otherwise be needed to get this running, and ez eliminates all of them - you get straight to the course in a running local environment.

I'm also using GitHub Codespaces to view the notebooks for the book version of the course. All you need to do is go to the fastbook GitHub repo and press the . key to open up Codespaces in the browser to view the contents of the notebook:

Of course, if you are able to, please support the authors by purchasing a copy of the book.

Jeremy introduces his pedagogy for the class at the start, based on the work of David Perkins. I love this image:

Begin by "building state-of-the-art world-class models" as your Hello, World.

Jeremy and Sylvain wrote a paper on the fastai library which is the layer of software on top of PyTorch that is used in the course.

While following along in the first video of the course, I realized that Jeremy is running some of the cells in the notebooks from the fastbook repo. So I forked and copied the ez.json file into that repo and was quickly able to reproduce his results using ez:

ez env go -g git@github.com:jflam/fastbook -n .

A quick note - the -n . parameter tells ez to run the repo locally. ez also supports running on Azure VMs using the same command. See the docs in the ez repo for more details on the Azure setup (it's only 2 additional commands!)

This is a screenshot from my computer this morning after running the first model in the course. You can see that I'm using the Outline feature in VS Code notebooks to see an outline of the different sections from the first chapter of the book:


  • classification model predicts one of a number of discrete possibilities, e.g., "dog" or "cat"
  • regression models predict a numeric (continuous?) quantity

valid_pct=0.2 in the code means that it holds back 20% of the data to validate the accuracy of the model. This is also the default, in case you forget to set it.

a learner contains your data, your architecture and also a metric to optimize for:

learn = cnn_learner(dls, resnet34, metrics=error_rate)

An epoch is looking at every item in the dataset once

accuracy is another function, which is defined as 1.0 - error_rate

error_rate and accuracy are not good loss functions, which is a bit counter-intuitive since those are the values that humans care about. The loss function is what is used to tune the parameters of the model across epochs, and these metrics turn out to be poor choices for that role.

fastai uses the validation set to determine the accuracy or error rate of a model

Remember that a model is parameters + architecture

Training, validation, test datasets - this is used for things like Kaggle

Actual performance on the test set is withheld so that you can avoid the overfitting problem there as well.

In a time series, you can't really create a validation dataset by random sampling; instead, you need to chop off the end, since that's really the goal - to predict the future, not make predictions at random points in the past. We need different sampling algorithms depending on the nature of the data and the predictions desired.
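That chronological split is simple to express (a minimal sketch of the idea; fastai's own splitters handle this for real datasets):

```python
def chronological_split(records, valid_pct=0.2):
    """Split time-ordered records by chopping off the end, so every
    validation point lies in the 'future' of every training point.
    records must already be sorted by time."""
    cut = int(len(records) * (1 - valid_pct))
    return records[:cut], records[cut:]

days = list(range(100))  # stand-in for 100 days of ordered observations
train, valid = chronological_split(days, valid_pct=0.2)
# Unlike a random 80/20 split, nothing in `valid` predates `train`.
```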

Discusses a case about loss functions vs. metrics. One way to think about overfitting is a case where your model keeps getting better at making predictions in the training dataset, but starts getting worse against the validation dataset. This can be an indication of overfitting. Jeremy cautions that this is different from changes in the loss function however, and he will get into the mathematics behind this later when he discusses loss functions in more detail.

Definition: transfer learning is using a pretrained model for a task different from the one it was originally trained for. The fine_tune() method is called that because it performs transfer learning. In the earlier examples with resnet34, transfer learning against the pretrained model is what produced the superior performance. It lets you use less data and less compute to accomplish your goals.

Zeiler and Fergus published a paper in 2013 Visualizing and Understanding Convolutional Networks. It showed how different layers in the network recognize initially simple patterns and then become more specialized in later layers. I think this is done by activations of different filters against an image, so that you can see the parts of the image that a filter gets activated against. This paper gives a good mental model for thinking about how filters can be generalized and how transfer learning can take advantage of filters in earlier layers.

Sound detection can work by turning sounds into pictures and using CNNs to classify them:

Here's a really cool example of detecting fraudulent activity by looking at traces of mouse movements and clicks and turning them into pictures (done by a fastai student at Splunk - see the blog post that announced this result):

What happens when you fine tune an existing model - does it perform worse on detecting things that it used to do before the fine tuning dataset? In the literature this is called catastrophic forgetting or catastrophic interference. To mitigate this problem you need to continue to provide data for other categories that you want detected during the fine tuning (transfer learning) stage.

When looking for a pretrained model, you can search for "model zoo" or "pretrained models".

He has a number of different categories: vision, text, tabular, recommendation systems.

Recommendation systems == collaborative filtering

Recommendation != Prediction


Tim O'Reilly, one of our elder statesmen of the web, has written a great analysis of Web3. It is well worth reading the post in its entirety, as he does a really good job of constructing arguments without being confrontational in his reasoning. He does this by asking questions without presuming what the answers are. This is probably the most balanced account of Web3 that I've read so far and well worth your time. #

I've been thinking a lot about the parallels between liquid chromatography and espresso making. I found this article that delves into both the chemistry and physics of espresso brewing. #

While looking around for a project for the holidays, I've started thinking about continuing to build my personal semantic search engine. The core idea is to make a tool that makes it easier to remember and recall things that are interesting to me. Part of that is searching my private data for things that are interesting. I've already made pretty good progress in August on this. The other part is building a browser extension that makes it easy to tag and take notes on things that I'm reading and add those things to the index that my search engine operates over. That feels like a good task for the holidays. #

I've also been interested in autonomous agents for helping to manage beer mode information. My gut tells me that these things likely won't help in the long run, but they are nevertheless interesting to me. There are two tools that I came across tonight:

  1. mailbrew which is a service for delivering news culled by agents that you configure into an email that shows up in your inbox. This is pretty interesting as a tool as it lets you aggregate different pieces of information into a personal newsletter. It's $5/month which is also pretty reasonable.
  2. huginn, which is named after the ravens Huginn and Muninn, who sat on Odin's shoulders and told him the news of the world. This is kind of like a DIY mailbrew where all the information sits on a server that you run yourself. It's a DAG of agents (all written in Ruby) that you can configure to do virtually anything. It also runs as a Docker container to save you the trouble of setup. This feels like a lot of work compared to mailbrew.


I found a way to split an MP3 into smaller files automatically using ffmpeg. This is also the first time that I've ever used ffmpeg before and it did a fantastic job on this task.

$ ffmpeg -i somefile.mp3 -f segment -segment_time 3 -c copy out%03d.mp3

source #

Perhaps the greatest productivity hack ever created is News Feed Eradicator. I use this for Twitter so that I still have the ability to read specific tweets, e.g., they were linked from somewhere else or I can look up a specific user. But the algorithmic feed is gone. It's lovely. #


There's long been an argument from crypto enthusiasts that we need crypto to fight the dastardly fees charged by Western Union and the like in the 3rd world. But in this post by Patrick McKenzie (aka patio11), More than you want to know about gift cards, it seems like there's a strong argument for using gift cards to work around the fees charged by Western Union:

In this regard it is not merely important that they look attractive in a birthday card but also that they’re available for cash everywhere, require no identification or ongoing banking relationship to purchase, do not charge a fee like e.g. Western Union, and can be conveyed over a text message or phone call. They're not worse cash, they're better Tide in the informal economy.


This morning on HN I found this course on Natural Language Processing for Semantic Search by a startup called pinecone. This is my current area of interest, which is why I created a simple wine semantic search engine a while ago to explore this area. Taking a look at a couple of chapters it definitely looks interesting and worth a longer look over the holidays. #

There's another post by someone who is trying to build a news site that is kind of like the original Yahoo aggregator, but with the twist of having sagas which let you follow a story as it progresses, e.g., salacious news like the Theranos trial which unfolds over a long period of time. It looks like it is curated by the poster though. I would love to combine the idea of sagas with some kind of AI filter that is trained on my interests to pull tweets and news articles into a personal feed for my own consumption. This way it is aligned with my interests vs. the interests of the aggregator. #

I listened to Professor Christensen on this podcast the year it came out (2004!) and it left an indelible impression on me. Sadly, it looks like IT Conversations no longer exists, but I found this archive of the page created by some awesome folks, and I also copied it here as part 1 and part 2 so that I can find it again - just in case. I highly recommend listening to this; the stories Christensen tells about his conversations with Andy Grove are wonderful and do a great job of driving home the concepts of his theory of disruption. RIP.



This is an interesting take on the metaverse that I haven't seen before:

The idea that (metaverse : digital) is like (singularity : AI) is certainly a possibility. As Shaan (with a healthy dose of unhelpful crypto speak) correctly says, we've been on a rapid trend towards a life in a virtual world, ever more detached from our physical world, thanks to ever-improving technology.

Where I disagree with Shaan's tweet is how long this has been going on. It's been going on for much longer than 20 years; since the creation of the printing press, we have been on a path to ever-increasing amounts of media/digital/online in our lives. We have been spending more of our time in front of some piece of technology and less time in the "real world". When you are reading a book, watching TV, playing a video game, or wearing a VR headset, you aren't in the "real world" - you can do those activities from anywhere; your physical environment doesn't matter. You're immersed in these experiences.

At what point does the value of the virtual world become greater to us than the value of our physical world? To some extent, the pandemic has started pushing our work to be more online work and it's not a huge leap to imagine that we are moving more towards a world where the experience of being in an online meeting in the metaverse is better than the experience of being in a real-life meeting.

Perhaps Ben Thompson is right - the metaverse-as-a-place will start in businesses who will buy this expensive technology for their employees, much like how the PC revolution started. It has the characteristics of disruption; it is worse on some dimension (e.g., the fidelity of the experience) that the mainstream cares about but better on some dimension that the early adopters care about (e.g., you don't need to live close to an office to go to work) - and it's on a steeper slope of improvement.

I know I'll be watching this area closely and learning. It's tempting to want to dismiss this because of the dystopian takes on this technology. But that's not an excuse to ignore it or try to block it. Technology is chaotic neutral and can't be un-invented. It's up to all of us to create a better experience for ourselves using it.