Stable Diffusion 2.1 was released on December 7, 2022.
Those who have used 2.0 have been scratching their heads over how to make the most of it. While we see some good images here and there, most of us went back to v1.5 for our own work.
See the step-by-step guide for installing AUTOMATIC1111 on Windows.
The difficulty was caused in part by (1) a new language model trained from scratch, and (2) a training dataset heavily censored with an NSFW filter.
The second part would have been fine, but the filter was overly inclusive and removed a substantial amount of good-quality data. Version 2.1 promised to bring that data back.
This tutorial covers installing and using the 2.1 models in the AUTOMATIC1111 GUI, so that you can judge them for yourself.
2.1 model variants
There are two text-to-image models available:
- 2.1 base model: Default image size is 512×512 pixels
- 2.1 model: Default image size is 768×768 pixels
The 768 model is capable of generating larger images. You can set the image size to 768×768 without worrying about the infamous two-head issue.
This is especially useful for generating larger scenes with small figures. Faces come out a bit clearer than with the 512 model, which increases the chance of success in downstream upscaling and face restoration.
The downside of the 768 model is that it takes longer to generate images, and the larger images may limit your batch size, depending on how much VRAM your GPU has.
Install base software
We will go through how to use Stable Diffusion 2.1 in the AUTOMATIC1111 GUI.
This GUI can be installed quite easily on Windows; for other environments, follow the respective installation instructions. Ideally, you should have a dedicated GPU with at least 6 GB of VRAM.
If you already have this GUI, make sure it is up to date by running the following command in a terminal under its installation location (the stable-diffusion-webui folder):
git pull
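For example, starting from the folder where you cloned the GUI (the folder name assumes the default clone location), the full update sequence is:
cd stable-diffusion-webui
git pull
Then restart the GUI.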
Download the Stable Diffusion 2.1 models
2.1 base model (512-base)
1. Download the model file (v2-1_512-ema-pruned.ckpt):
https://huggingface.co/stabilityai/stable-diffusion-2-1-base/resolve/main/v2-1_512-ema-pruned.ckpt
2. Download the config file and rename it to v2-1_512-ema-pruned.yaml:
https://raw.githubusercontent.com/Stability-AI/stablediffusion/main/configs/stable-diffusion/v2-inference.yaml
Put both of them in the model directory:
stable-diffusion-webui/models/Stable-diffusion
2.1 model (768)
1. Download the model file (v2-1_768-ema-pruned.ckpt):
https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/v2-1_768-ema-pruned.ckpt
2. Download the config file and rename it to v2-1_768-ema-pruned.yaml:
https://raw.githubusercontent.com/Stability-AI/stablediffusion/main/configs/stable-diffusion/v2-inference-v.yaml
(Note the -v suffix: the 768 model is a v-prediction model, so it needs the v2-inference-v.yaml config rather than v2-inference.yaml.)
Put both of them in the model directory:
stable-diffusion-webui/models/Stable-diffusion
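If you prefer the command line, the downloads can be scripted. Below is a minimal sketch assuming wget is available, run from inside the model directory; the -O flag saves each config under its renamed filename, so no separate rename step is needed:
# 2.1 base (512) checkpoint and config
wget https://huggingface.co/stabilityai/stable-diffusion-2-1-base/resolve/main/v2-1_512-ema-pruned.ckpt
wget -O v2-1_512-ema-pruned.yaml https://raw.githubusercontent.com/Stability-AI/stablediffusion/main/configs/stable-diffusion/v2-inference.yaml
# 2.1 (768) checkpoint and config
wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/v2-1_768-ema-pruned.ckpt
wget -O v2-1_768-ema-pruned.yaml https://raw.githubusercontent.com/Stability-AI/stablediffusion/main/configs/stable-diffusion/v2-inference-v.yaml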
How to use 2.1 model
To use the 768 version of the Stable Diffusion 2.1 model, select v2-1_768-ema-pruned.ckpt in the Stable Diffusion checkpoint dropdown menu at the top left.

The model is designed to generate 768×768 images, so set the image width and/or height to 768 for the best results.
To use the base model, select v2-1_512-ema-pruned.ckpt instead.
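As an aside, if you start the GUI with the --api argument, the same settings can be submitted programmatically through AUTOMATIC1111's built-in web API. A minimal sketch with curl, assuming the default local address and port:
# POST a txt2img request with the 768 model selected in the GUI
curl -X POST http://127.0.0.1:7860/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a wide mountain landscape, detailed matte painting", "width": 768, "height": 768, "steps": 20}'
The response returns the generated image as a base64-encoded string.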
Troubleshooting
Here are a few things you can try if your install doesn’t work.
- Check whether your AUTOMATIC1111 GUI is outdated. In a terminal, run git pull under the stable-diffusion-webui directory and restart the GUI.
- Check that the yaml file downloaded correctly. Its content should be plain text, not HTML tags.
- Check that the yaml file is renamed correctly as described in the previous section.
- If 2.0 or 2.1 generates black images, start the GUI with the --no-half argument to run at full precision, or enable the --xformers optimization.
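On Windows, these startup arguments go on the COMMANDLINE_ARGS line of webui-user.bat. A sketch of just that line (the rest of the file stays as shipped):
rem In webui-user.bat:
set COMMANDLINE_ARGS=--no-half
rem Or, to try the memory-efficient attention optimization instead:
rem set COMMANDLINE_ARGS=--xformers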
Tips for using 2.1
I definitely think 2.1 is an improvement over 2.0. The images look better and require less effort in engineering the prompt.
So I am deleting my 2.0 models.
Here are some tips I found when using 2.1.
Tip 1: Write more
Similar to 2.0, the prompt needs to be very specific and detailed to get the image you want. Unlike with v1 models, a simple prompt usually won’t work well with 2.1.
Tip 2: Use negative prompt
Many have already found that the negative prompt is very important for v2 models. I suggest keeping a boilerplate negative prompt for portraits, where many things can go wrong. In fact, Stability AI used
cropped, lowres, poorly drawn face, out of frame, poorly drawn hands, blurry, bad art, blurred, text, watermark, disfigured, deformed, closed eyes
in a demo image in their press release.
Tip 3: Use correct image size
Finally, set the correct image size: at least one side should be 512 px for the 512-base model, or 768 px for the 768 model.
Have fun with 2.1!