Unraveling the Mechanism Behind Negative Prompt Usage

A negative prompt is an additional way to nudge Stable Diffusion into giving you what you want. Unlike inpainting, which requires drawing a mask, a negative prompt offers all the convenience of text input. In fact, some images can only be generated by using negative prompts.

In this article, I will go over a simple example of using a negative prompt and then explain how negative prompts work in Stable Diffusion.

This is the first part of a two-part series on negative prompts. See the second part, How to use negative prompts, for guidelines on building good negative prompts.

A simple example

Positive prompt only

Let’s try generating some images of men. That’s right, we are going into uncharted territory here… I am using Stable Diffusion v1.5 with the prompt

Portrait photo of a man.

Prompt: Portrait photo of a man.

OK, we got what we expected. No surprise. These men look a bit too serious, though. Let’s lighten them up by removing their mustaches with the prompt

Portrait photo of a man without mustache.

Images generated with positive prompt only.
Prompt: Portrait photo of a man without mustache.

We have a problem here: the mustaches are even more prominent! What’s going on? The likely culprit is that cross attention fails to associate “without” with “mustache”. Stable Diffusion effectively reads the prompt as “man” and “mustache”, which is why you see both.

Positive and negative prompts

So what can we do to generate men without mustaches? Is this something Stable Diffusion cannot do? The answer is to use a negative prompt. If we use the prompt

Portrait photo of a man.

together with the negative prompt

mustache

we can finally generate men without mustaches! You will get similar results with v2 models.

Images generated with negative prompt.
Prompt: Portrait photo of a man.
Negative prompt: mustache.

This example demonstrates a principle of using negative prompts:

If you see something you don’t want, put it in negative prompt.

How does negative prompt work?

Recall that in text-to-image conditioning, the prompt is converted to embedding vectors, which are in turn fed to the U-Net noise predictor. Well, that’s not the whole story. (Sorry, this has happened so many times…) There are actually two sets of embedding vectors: one for the positive prompt and the other for the negative prompt.

The positive and negative prompts are on an equal footing. Both are encoded as 77 tokens, and you can use either one with or without the other.
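To make the “77 tokens” point concrete, here is a toy sketch of how each prompt becomes a fixed-length sequence before encoding. SD v1 uses a CLIP text encoder with a 77-token context that includes start and end markers. The sketch assumes a hypothetical whitespace tokenizer and made-up `<start>`/`<end>`/`<pad>` markers; the real CLIP tokenizer uses byte-pair encoding, so actual token counts differ. Every prompt, including an empty one, comes out as exactly 77 slots.

```python
MAX_TOKENS = 77  # context length of the CLIP text encoder used by SD v1

def to_token_slots(prompt, max_tokens=MAX_TOKENS):
    # Hypothetical tokenizer: lowercase words split on whitespace.
    # The real tokenizer uses CLIP's byte-pair encoding, so counts differ.
    words = prompt.lower().replace(".", " ").split()
    body = words[: max_tokens - 2]  # truncate, leaving room for the two markers
    slots = ["<start>"] + body + ["<end>"]
    return slots + ["<pad>"] * (max_tokens - len(slots))  # pad to a fixed length

positive = to_token_slots("Portrait photo of a man.")
negative = to_token_slots("mustache")
empty = to_token_slots("")  # an empty prompt still fills all 77 slots

for name, slots in [("positive", positive), ("negative", negative), ("empty", empty)]:
    print(name, len(slots), slots[:7])
```

Because both sequences always have the same shape, the sampler can swap one for the other without any change to the noise predictor.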

Negative prompts are implemented in the samplers, the algorithms responsible for carrying out the reverse diffusion. To understand how a negative prompt works, we first need to understand how sampling works without one.

Sampling without negative prompt

In a sampling step in Stable Diffusion, the algorithm first denoises the image a little bit with conditional sampling, guided by the text prompt. The sampler then denoises the same image a little bit with unconditional sampling. This one is completely unguided, as if you had used no text prompt at all. Note that it would still diffuse toward a decent image, like a basketball or a wineglass below, but it could be anything. The diffusion step that’s actually taken is based on the difference between the conditional and unconditional samplings: the sampler moves along that difference, amplified by the classifier-free guidance (CFG) scale. This process is repeated for the number of sampling steps.

Sampling steps in Stable Diffusion WITHOUT negative prompt.
Without negative prompt, a diffusion step is a step towards the prompt and away from random images.
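The “difference” described above is usually written as classifier-free guidance. With ε_uncond and ε_cond as the noise predictions for the empty prompt and the text prompt, the sampler uses ε_uncond + s·(ε_cond − ε_uncond), where s is the CFG scale. Below is a minimal sketch in plain Python, with short lists standing in for latent-sized noise tensors; `guided_noise` is a hypothetical helper, not a function from any particular library, and real samplers feed the combined prediction into their own update rule.

```python
def guided_noise(eps_uncond, eps_cond, cfg_scale):
    """Combine two noise predictions with classifier-free guidance.

    eps_uncond: prediction with an empty prompt (unguided)
    eps_cond:   prediction with the text prompt
    cfg_scale:  how strongly to push along (cond - uncond)
    """
    return [u + cfg_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Toy 3-value "noise predictions" standing in for full latent tensors.
eps_uncond = [0.0, 1.0, -1.0]
eps_cond = [1.0, 1.0, 0.0]

print(guided_noise(eps_uncond, eps_cond, cfg_scale=1.0))  # → [1.0, 1.0, 0.0]
print(guided_noise(eps_uncond, eps_cond, cfg_scale=7.5))  # → [7.5, 1.0, 6.5]
```

At a scale of 1 the unconditional prediction cancels out and the result equals the conditional prediction; larger scales push the result further past it, away from the unconditional one.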

Sampling with negative prompt

A negative prompt is implemented by hijacking the unconditional sampling. Instead of an empty prompt, which generates random images, the negative prompt is used.

Sampling steps in Stable Diffusion WITH negative prompt.
When using negative prompt, a diffusion step is a step towards the positive prompt and away from the negative prompt.

Technically, the positive prompt steers the diffusion toward the images associated with it, while the negative prompt steers the diffusion away from the images associated with it. Note that the diffusion in Stable Diffusion happens in latent space, not image space. The figures above are drawn in image space for illustration purposes only. See this great write-up if you are interested in how it is implemented at the code level.
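With a negative prompt, the same guidance combination is used, but the negative prompt’s prediction takes the place of the empty prompt’s: ε_neg + s·(ε_pos − ε_neg). The toy sketch below assumes a hypothetical 2-D latent in which each prompt corresponds to a single point, and a stand-in “noise predictor” (`toy_predict_noise`, invented for illustration) that simply points from that point toward the current latent. None of this reflects a real U-Net; it only shows the direction of one guided step.

```python
def toy_predict_noise(latent, concept_point):
    # Hypothetical stand-in for the U-Net: the "noise" points from the concept
    # toward the current latent, so subtracting it moves toward the concept.
    return [x - c for x, c in zip(latent, concept_point)]

def denoise_step(latent, positive_point, negative_point, cfg_scale, step_size=0.1):
    # The negative prompt's prediction replaces the unconditional
    # (empty-prompt) prediction in classifier-free guidance.
    eps_neg = toy_predict_noise(latent, negative_point)
    eps_pos = toy_predict_noise(latent, positive_point)
    eps = [n + cfg_scale * (p - n) for n, p in zip(eps_neg, eps_pos)]
    return [x - step_size * e for x, e in zip(latent, eps)]

latent = [0.0, 0.0]
man = [1.0, 0.0]       # toy location for "Portrait photo of a man"
mustache = [0.0, 1.0]  # toy location for "mustache"

print(denoise_step(latent, man, mustache, cfg_scale=2.0))  # → [0.2, -0.1]
```

The step moves toward the man point (positive x) and away from the mustache point (negative y). At `cfg_scale=1.0` the negative prediction cancels out, and the step heads straight for the positive point.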

Sampling space

Let’s consider the following illustration of the sampling space. When we use the prompt “Portrait photo of a man”, Stable Diffusion samples from the whole latent space of men, with and without mustaches, so you should get images of both.

Space of all images of men.

When the negative prompt “mustache” is added, the “Men with mustache” space is excluded. Effectively, we are sampling images from the space of men without mustaches.
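The “excluded space” picture can be checked in an idealized 1-D toy. If noise predictions are read as scaled gradients of log density, the guided prediction ε_neg + s·(ε_pos − ε_neg) corresponds to sampling from p_pos^s · p_neg^(1−s), up to normalization. The sketch assumes, purely for illustration, that “men” is an even mix of two Gaussians (no mustache around −2, mustache around +2) and that “mustache” is the Gaussian around +2; it then measures how much of the guided density lies on the no-mustache side. Real latent spaces are high-dimensional, and this idealization ignores how the noise levels interact during sampling.

```python
import math

def normal_pdf(x, mean, std=1.0):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def p_positive(x):
    # Toy "Portrait photo of a man": half without mustache (around -2),
    # half with one (around +2).
    return 0.5 * normal_pdf(x, -2.0) + 0.5 * normal_pdf(x, 2.0)

def p_negative(x):
    # Toy "mustache": only the men-with-mustache region around +2.
    return normal_pdf(x, 2.0)

def log_guided_density(x, cfg_scale):
    # Idealized CFG target p_pos**s * p_neg**(1 - s), kept in log space.
    return cfg_scale * math.log(p_positive(x)) + (1.0 - cfg_scale) * math.log(p_negative(x))

def mass_without_mustache(cfg_scale, lo=-12.0, hi=12.0, n=4800):
    # Riemann sum on grid midpoints (symmetric around 0, never exactly 0).
    dx = (hi - lo) / n
    xs = [lo + (i + 0.5) * dx for i in range(n)]
    logs = [log_guided_density(x, cfg_scale) for x in xs]
    peak = max(logs)
    weights = [math.exp(v - peak) for v in logs]  # rescale to avoid overflow
    left = sum(w for x, w in zip(xs, weights) if x < 0.0)
    return left / sum(weights)

print(mass_without_mustache(1.0))  # about 0.5: at scale 1 the negative prompt cancels out
print(mass_without_mustache(2.0))  # very close to 1.0
```

In this toy, raising the scale above 1 moves nearly all of the probability mass out of the mustache region, which matches the picture of that space being excluded.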

Summary

I hope this article gives you a good overview of what negative prompt is and how it works.

A negative prompt removes objects or styles in a way that may not be possible by tinkering with the positive prompt alone. It works by hijacking the unconditional sampling in each sampling step, so that the diffusion steers away from what’s described in the negative prompt.

Head to the second part, How to use negative prompts, if you want to learn how to use them.
