Signal-to-Noise: Crafting Better Content with AI

If you’ve ever stared at an old TV test pattern, you already understand the difference between signal and noise in AI.

Large Language Models are well-named. They’re a model in the sense of being a simplified representation of an object. They are a model of all language itself. And of course they are large, with billions or even trillions of parameters. They are a large model of language!

Consider a model of a human, such as a crash test dummy. In some ways, a crash test dummy is better-than-human. For instance, it is much more precise about the amount of pressure applied to it. On the other hand, this model is faceless and personality-less. No one will accuse a crash test dummy of being too personalized!

Models are intended to be the average version of the object they represent. Language models are the average version of language, just as crash test dummies resemble average humans. What do I mean? I mean that unless you’re careful, the output you get from an LLM will be average. That output is like white noise, the static on an old analog television.

It doesn’t have to be that way.

You’ll get incredible results from AI when you provide sufficient signal. Do this through detailed prompts, lengthy context, and verifiable results.

The value of long prompts

This week, I implemented an entire feature of Microsoft 365 Copilot in one prompt. But it was a big prompt! It was about 6,000 characters, near my recommended length for agent instructions. It probably took me 20 minutes of focus to write.

In this long prompt, I included my knowledge of architecture. My prompt included the design document of an interface I would leverage (another several thousand characters). I described a testing strategy. Every bit of the prompt was crammed full of details from my knowledge and context.

In effect, I had synthesized massive amounts of information gleaned from meetings, documents, chats, emails, and hallway conversations into that prompt. I distilled everything I knew into a complete feature definition that no one else in the world could have done at that moment.

This was a useful prompt because of that context synthesis. A simple prompt could possibly have gotten partial success, but the result would never pass design review. In this case, the LLM could not output the average language because I had moved it so far in embedding space with the prompt.

But the “prompt” doesn’t necessarily need to come from you.

Signal from searching and executing

Another source of signal for the AI is what it can gather on its own. For the code feature that I implemented, the LLM applied reasoning and code execution. It iterated hundreds of times, testing out the code it had written. It gained new signal by seeing what the code did when it ran.

There’s a similar case in AI-assisted research. An AI that uses tools to search websites gathers a large amount of useful information. The AI isn’t actually expanding your prompt; it is summarizing the data it retrieves from the internet. And again, unless your research question was a detailed prompt, you’ll end up with a generic research report. Don’t be surprised if you get a lot of content that doesn’t answer your real question.

For an even stronger example, consider how AI has been solving breakthrough mathematics problems. Here, the combination of reasoning and executing (through the Lean proof assistant) helps AI make real breakthroughs. It’s too bad that not very much of life is like abstract mathematics problems. Perhaps one day we will learn to “execute” verifiable PowerPoint documents, but we aren’t there yet.

The folly of long posts from short prompts

Long prompts can have great results, but attempting the opposite is awful. Asking AI to write without detailed prompts, context, and verification has bad outcomes. You see this all over social media:

comments that slightly restate the original post
extremely generic takes
overuse of AI-isms

If you spend a lot of time online, you’ll get as annoyed by these as I do. But there’s a worse sin: AI-written long documents.

A short comment with little value on LinkedIn is easy to glance over. It’s not so easy to ignore a 30-page document from your coworker about a new strategy. Most of the document will be wrong, and correcting it will cost you more time than they spent writing it. They’ve given you the hard part.

I’m afraid the best thing to do when handed 30 pages of AI text is pretend you didn’t see it. Any substance in the document came from the prompt; the rest is the model’s generic filler. The prompt is the signal, and it doesn’t get better by adding noise.

More Signal; Less Noise

There’s a simple principle behind all these scenarios. Think about the Signal-to-Noise Ratio (SNR). For your task, where is the AI getting signal? Is there enough signal to produce the length of content you are asking for, or will the AI fill the rest with noise?

It’s easy to see this in the tasks where AI can be a great benefit vs. where it wastes time and tokens.

Naturally High Signal

Analyzing existing information
Writing boilerplate
Feedback and editing
Answering questions and explaining
Translating between formats

Only Your Prompt and Noise

Image generation
Writing strategy documents
Authoring fiction
Generating PowerPoint
Social media bots

Tasks that naturally have their own signal are always a great idea to assign to AI. If the signal is only based on your prompt, your prompt should be about as long as the intended result.
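
The rule of thumb above can be sketched as a rough back-of-the-envelope check. This is only an illustration, not a real measure of signal: the function names, the use of character counts as a proxy for signal, and the threshold of 1.0 (taken from "about as long as the intended result") are all assumptions made for this example.

```python
def signal_ratio(prompt: str, context: str = "", target_chars: int = 1) -> float:
    """Crude proxy: characters of supplied signal per character of requested output.

    Assumption: character count stands in for "signal". It says nothing about
    how informative the prompt or context actually is.
    """
    if target_chars <= 0:
        raise ValueError("target_chars must be positive")
    return (len(prompt) + len(context)) / target_chars


def likely_noisy(prompt: str, context: str = "", target_chars: int = 1,
                 threshold: float = 1.0) -> bool:
    """Flag requests where the supplied signal is shorter than the requested output.

    The default threshold of 1.0 is a hypothetical reading of the rule of thumb
    that the prompt should be about as long as the intended result.
    """
    return signal_ratio(prompt, context, target_chars) < threshold


if __name__ == "__main__":
    # A 6,000-character prompt asking for roughly 6,000 characters of output.
    print(likely_noisy("a" * 6000, target_chars=6000))
    # A one-line prompt asking for a long document (90,000 characters is an assumed size).
    print(likely_noisy("Write our new strategy.", target_chars=90_000))
```

Context the AI gathers on its own, such as search results or test output, would count toward the `context` argument here, which is why tasks with naturally high signal can get away with short prompts.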

Build your skills at providing signal to AI. This is how you’ll succeed in the future.

Author Profile

Abram Jackson
