How to “Smell” AI Images

AI generated image of a woman standing in front of a store named "Concepts". From https://www.realornotquiz.com/

I saw this Real or Not quiz online this morning, and commenters were bragging about their scores of 67% or 80%. Go ahead and take it now to see how you do. I found it very easy; I’ve spent enough time generating images and trying out new tools and techniques that I can almost “smell” when an image is AI-generated. I can’t promise that these tips will work forever, but I believe that they are more durable than most.

Partially Helpful Techniques

The tip you’ll hear most often, which has since become a meme, is to “count fingers.” This doesn’t work anymore, except for the laziest of image generations. The latest models rarely have anatomy issues, tools like ControlNet and inpainting let prompters fix problems like this, and anything you see online has been cherry-picked to avoid this issue.

Another technique you’ll hear about is looking for discontinuities behind occlusions. In a real photo, a tree remains coherent between two windowpanes, and the floor pattern continues behind the legs of a chair; AI images often break that continuity. AI still has this problem, although less and less each week. This method also requires close attention to detail. Similarly, text remains difficult for AI. This is getting fixed quickly as well, and anyone trying to deceive you will make sure there isn’t any text to get garbled. The image at the top of the article has ungarbled text, but it is in fact AI-generated.

Instead, here is my best, and I believe most durable, way to determine whether AI generated an image: look at the framing and subject of the picture.

Smelling AI

You can tell at a glance whether an image was AI-generated by what is included in the picture and its framing. Most AI images will have a single subject, centered in the image. Anything else in the image will be background, and of a single scene. That scene is generally something like a landscape or a street of buildings. It can sometimes be a crowd, but if so, even the crowd will be fairly uniform. You won’t find odd things in the background on their own. Although this woman has an unusual hand position, the image also has a masked person in the background. It’s a real photo:

Real photo of a woman yelling. From https://www.realornotquiz.com/ — It’s very difficult to prompt that the subject should wear a visor and someone in the background should not but instead have a mask.

The most common AI images have a single, beautiful person right in the center. They’ll have perfect skin, or skin that is uniform in another way, like being “perfectly” old. And they’ll be shiny! By default, image generation ends up with very reflective skin, as if the subject were in a photo studio with perfect umbrella lights. This man is either very sweaty or AI-generated (he is AI-generated):

AI-generated image of a man in front of a mural of a bear. From https://www.realornotquiz.com/ — This man is too shiny to be real.

For now, AI also has trouble with people facing away from the camera. A prompter must be extremely intentional to get a subject to face the other direction. This applies to all perspective shots; it’s rare for an AI image to have an unusual perspective. The people in this real image are all facing different directions:

Real photo of people waiting near a boat. From https://www.realornotquiz.com/ — The people are facing away from the camera and have different kinds of bags with them.

Also, do a quick check for anything that’s repeated. If there’s a picture on the wall, see if the picture is of the same type of thing as the primary subject. If the image is of an airplane, there may be more airplanes in the sky behind it. This AI picture of a meal has the same meal in the background:

AI-generated image from https://www.realornotquiz.com/. Close-up of burger and fries. — The same fries appear in the background also.

The key to smelling AI images is to think like a prompter. You can prompt a couple of objects and their positional relationship, a style, and a background. As models improve, AI may be able to put together a few more distinct objects to include, but it will be a long time before they look as complex as real-world photographs.
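To make the checklist above concrete, here is a rough sketch in Python. It assumes you already have labeled bounding boxes for the objects in an image, whether from an object detector or from jotting them down by hand. The names Detection and smell_flags, the box format, and the 0.15 tolerance are all made up for illustration; none of them come from the quiz or from any particular tool. It only encodes the framing heuristics from this post and says nothing about shine or perspective.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    """One labeled object in an image (hypothetical input format)."""
    label: str
    # Bounding box as fractions of image width and height, 0.0 to 1.0.
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def smell_flags(detections: list[Detection], center_tolerance: float = 0.15) -> list[str]:
    """Return which framing heuristics from this post the detections trigger."""
    flags: list[str] = []
    if not detections:
        return flags

    # Assumption: the largest object is the primary subject.
    subject = max(detections, key=lambda d: d.area)
    cx, cy = subject.center
    if abs(cx - 0.5) <= center_tolerance and abs(cy - 0.5) <= center_tolerance:
        flags.append("centered subject")

    # Only one kind of thing in the frame suggests a simple prompt.
    counts = Counter(d.label for d in detections)
    if len(counts) == 1:
        flags.append("single kind of object")

    # The subject's label shows up again elsewhere in the scene.
    if counts[subject.label] > 1:
        flags.append("repeated subject")

    return flags


# Roughly the burger image: a big centered burger, fries, and a second burger behind.
burger_scene = [
    Detection("burger", 0.25, 0.30, 0.75, 0.80),
    Detection("fries", 0.05, 0.10, 0.20, 0.30),
    Detection("burger", 0.80, 0.05, 0.95, 0.20),
]
print(smell_flags(burger_scene))
```

More flags means more reason to look closely; it is a prompt for suspicion, not a verdict. A real crowd photo can trip the repeated-subject check too, so treat the output the way you would treat a smell.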

All images in this post are taken from Microsoft’s Real or Not quiz at https://www.realornotquiz.com/.

Update 8/10/2024

The new Flux model from Black Forest Labs is the talk of the internet, as a big improvement over the Stable Diffusion series.

Words, fingers, and lighting have all gotten quite good, although words are not yet perfect if you zoom in (e.g. on the lanyard). I expect text to get better over time. However, notice that there is still a single subject in this image. Because of the way that prompting and diffusion work, it’s going to remain difficult to make one image of multiple subjects and actions. Regional Prompter and similar tools allow for separate prompts in different parts of the image, but they will have to improve quite a bit to produce realistic scenes.

It comes down to this: an image with a single (or repeated) subject should be suspect.
