
The Hubris of Scale: A Lesson From Junkmail Detection

The first AI product that I worked on was email junkmail detection. While spam detection focuses on catching malicious email, junkmail is the adjacent problem of forgotten newsletters and marketing coupons. In some ways, this is a harder problem than spam detection: a spam message is spam for everyone, but some users may actually want that J.C. Penney's coupon.

Our approach was to train a model per user that would learn that user's preferences for what email they considered junk. That meant hundreds of millions of ML models. We built one set of complicated infrastructure to experiment with and train the base model, and another to fine-tune a model for each user. Fine-tuning drew on the user's natural behavior, such as deleting or reading messages, among other signals. Yes, this could be considered reinforcement learning. In sum, we were using RL to train hundreds of millions of models on loads of custom infrastructure and pipelines. Incredibly advanced stuff!
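To make the idea concrete, here is a toy sketch of a per-user model nudged by behavior signals. The signal names, weights, and threshold are all hypothetical illustrations, not the production system:

```python
from collections import defaultdict

# Hypothetical signal weights; the real system's signals and weights were
# far more numerous and heavily debated.
SIGNAL_WEIGHTS = {
    "deleted_unread": +1.0,       # treat as "this was junk"
    "opened_then_deleted": +0.3,  # the contested signal from those executive meetings
    "read_fully": -1.0,           # treat as "this was wanted"
}

class PerUserJunkModel:
    """One tiny model per user: sender-domain scores nudged by behavior."""

    def __init__(self):
        self.domain_score = defaultdict(float)

    def observe(self, sender_domain: str, signal: str) -> None:
        # Online update: each behavior signal shifts the domain's junk score.
        self.domain_score[sender_domain] += SIGNAL_WEIGHTS.get(signal, 0.0)

    def is_junk(self, sender_domain: str, threshold: float = 1.0) -> bool:
        return self.domain_score[sender_domain] >= threshold

model = PerUserJunkModel()
model.observe("deals.example.com", "deleted_unread")
model.observe("deals.example.com", "deleted_unread")
print(model.is_junk("deals.example.com"))  # True after two unread deletes
```

Multiply this by hundreds of millions of users and you get a sense of the infrastructure burden, before even asking whether the scores are any good.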

It sucked.

We tried for a long time to improve the results. We updated which signals were used, ran more experiments, and tweaked feature weights. There were executive meetings about whether a user opening and then deleting a message should count as a positive or negative signal. We burned a lot of compute and a lot more engineering time.

It turns out that the original hypothesis was wrong: no users wanted that J.C. Penney's coupon in their main inbox view. We ripped out the whole stack and filtered junkmail based on the domain of the sender and similar information. This was cheaper, faster, didn't require new infrastructure, was more predictable, and had better precision than our fanciest machine learning attempts. It took us years to show an improvement by using ML, and even then it was a small amount of ML on top of the basic heuristics.
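The heuristic that won fits in a few lines. This is a minimal sketch, assuming a hypothetical list of known bulk-mail domains and one header check; the real rules were richer:

```python
# Hypothetical domain list for illustration only.
KNOWN_BULK_DOMAINS = {"deals.example.com", "news.example.org"}

def sender_domain(address: str) -> str:
    """Extract the domain portion of an email address."""
    return address.rsplit("@", 1)[-1].lower()

def is_junkmail(from_address: str, has_unsubscribe_header: bool) -> bool:
    """Junk if the sender is a known bulk domain or the message carries a
    List-Unsubscribe header, which is typical of newsletters and coupons."""
    return sender_domain(from_address) in KNOWN_BULK_DOMAINS or has_unsubscribe_header

print(is_junkmail("offers@deals.example.com", False))  # True
print(is_junkmail("alice@example.com", False))         # False
```

No training pipeline, no per-user state, and the behavior is easy to explain when a message lands in the wrong place.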

Beyond Scale Alone

I do not think that the transformer architecture behind today's LLMs has hit its final scale wall yet. But I do know that it will! For every ML technique researchers have ever come up with, scale has eventually stopped working. The good news, nay, the great news, is that we've barely scratched the surface of applying this generation's models to the problems in our lives.

We do not need another generation of scale to make a significant improvement to gross world product. There are so many places that we can apply today’s AI in our work and personal lives to increase what we can do. I don’t even think we need inference-time compute to do it.

Abram Jackson
