AI image upscalers like ESRGAN are indispensable tools for improving the quality of images generated by Stable Diffusion. In fact, they are so commonly used that many Stable Diffusion GUIs have built-in support for them.
Here, we will learn about what image upscalers are, how they work, and how to use them.
Why do we need an image upscaler?
The default image size of Stable Diffusion v1 is 512×512 pixels, which is pretty low by today’s standards. Take the iPhone 12 as an example. Its camera produces 12 MP images, that is, 4,032 × 3,024 pixels. Its screen displays 2,532 × 1,170 pixels, so an unscaled Stable Diffusion image has to be enlarged to fill it and will look low quality.
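To put numbers on the gap, here is a small sketch of the arithmetic using the pixel counts quoted above. Treating “fill the screen” as covering it completely (cropping any overflow) is my assumption for the example.

```python
# Pixel counts taken from the paragraph above.
sd_w, sd_h = 512, 512            # Stable Diffusion v1 default output
cam_w, cam_h = 4032, 3024        # iPhone 12 camera
screen_w, screen_h = 2532, 1170  # iPhone 12 screen


def megapixels(width, height):
    """Return the pixel count in millions."""
    return width * height / 1_000_000


# Smallest uniform scale factor that makes the image cover the whole screen.
# Assumption: "fill" means cover the screen and crop whatever overflows.
cover_factor = max(screen_w / sd_w, screen_h / sd_h)

print(f"SD image: {megapixels(sd_w, sd_h):.2f} MP")
print(f"Camera:   {megapixels(cam_w, cam_h):.2f} MP")
print(f"Scale factor to cover the screen: {cover_factor:.2f}x")
```

The Stable Diffusion image comes out at about 0.26 MP, and it needs to be stretched almost 5x along its long side to cover the screen.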
To complicate the matter, a complex scene generated by Stable Diffusion is often not as sharp as it should be. It often struggles with fine details.
Why can’t we use a traditional upscaler?
You can, but the result won’t be as good.
Traditional resizing algorithms, such as nearest-neighbor and Lanczos interpolation, use only the pixel values of the image itself. They enlarge the canvas and fill in the new pixels with mathematical operations on the existing values. If the image is corrupted or distorted, these algorithms have no way to fill in the missing information accurately.
How does an AI upscaler work?
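To make this concrete, here is a minimal pure-Python sketch of nearest-neighbor upscaling on a tiny grayscale grid. It is an illustration only (real image libraries are far more efficient), and the function name `upscale_nearest` is made up for this example.

```python
def upscale_nearest(pixels, factor):
    """Upscale a 2D grid of pixel values by an integer factor.

    Each output pixel copies the source pixel it maps back to, so no value
    appears in the output that was not already in the input.
    """
    height, width = len(pixels), len(pixels[0])
    return [
        [pixels[y // factor][x // factor] for x in range(width * factor)]
        for y in range(height * factor)
    ]


# A 2x2 checkerboard: 0 is black, 255 is white.
tiny = [
    [0, 255],
    [255, 0],
]

big = upscale_nearest(tiny, 2)
for row in big:
    print(row)
```

The 4×4 result is just the same four values repeated in blocks. The image is bigger, but no new detail has been created.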
In contrast, AI upscalers are models trained with massive amounts of data.
Good-quality images are first artificially corrupted to emulate real-world degradation. The degraded images are then reduced to a smaller size. A neural network model is then trained to recover the original images.
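The data preparation described above can be sketched as follows. This is a toy version under my own assumptions: random noise stands in for real-world degradation, block averaging stands in for the downscaling step, and all function names are hypothetical. Actual training pipelines use more elaborate degradations.

```python
import random


def degrade(pixels, noise=10, rng=None):
    """Add bounded random noise to every pixel, clamped to 0..255."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the example is repeatable
    return [
        [min(255, max(0, v + rng.randint(-noise, noise))) for v in row]
        for row in pixels
    ]


def downscale(pixels, factor):
    """Shrink the grid by averaging each factor x factor block."""
    out_h, out_w = len(pixels) // factor, len(pixels[0]) // factor
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            block = [
                pixels[y * factor + dy][x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def make_training_pair(original, factor=4):
    """Return (model_input, target): a small degraded image and the original."""
    return downscale(degrade(original), factor), original


# An 8x8 gradient plays the role of a good-quality image.
original = [[(x + y) * 16 for x in range(8)] for y in range(8)]
low_res, target = make_training_pair(original, factor=4)
print(len(low_res), "x", len(low_res[0]), "input for an 8 x 8 target")
```

The network only ever sees `low_res` as input and is scored on how closely its output matches `target`.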
A massive amount of prior knowledge is embedded into the model, so it is capable of filling in the missing information. It’s similar to how humans don’t need to study a person’s face in great detail to remember it: we mainly pay attention to a few key features.
Below is a comparison of a traditional (Lanczos) and an AI (R-ESRGAN) upscaler. Because of the knowledge embedded in the AI upscaler, it can upscale the image and recover the details simultaneously.
How to use an AI upscaler for Stable Diffusion?
We will go through how to use an AI upscaler in the AUTOMATIC1111 GUI for Stable Diffusion.
See my Quick Start Guide for setting up AUTOMATIC1111 GUI.
Go to the Extras tab (I know the name is confusing), and select Single Image.
Upload the image you want to upscale to the source canvas.
Set the Resize factor. Many AI upscalers upscale 4x by default, so 4 is a fine choice. Set it to a lower value, like 2, if you don’t want the image to be that big.
If your image is 512×512 pixels, resizing 2x gives 1024×1024 pixels, and 4x gives 2048×2048 pixels.
Select R-ESRGAN 4x+, an AI upscaler that works for most images.
Press Generate to start upscaling.
When it is done, the upscaled image will appear in the output window on the right. Right-click on the image to save.
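If you would rather script these steps than click through them, AUTOMATIC1111 exposes an HTTP API when it is started with the `--api` flag. The sketch below reflects my understanding of the `/sdapi/v1/extra-single-image` endpoint. Treat the field names, the meaning of `resize_mode`, and the default address as assumptions to check against the API docs of your own install (usually served at `/docs`).

```python
import base64
import json
import urllib.request


def build_upscale_payload(image_bytes, factor=4, upscaler="R-ESRGAN 4x+"):
    """Build the JSON body for a single-image upscale request."""
    return {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "resize_mode": 0,            # assumed: 0 means "scale by a factor"
        "upscaling_resize": factor,  # the Resize factor from the GUI
        "upscaler_1": upscaler,      # name as shown in the upscaler dropdown
    }


def upscale(image_bytes, base_url="http://127.0.0.1:7860", **options):
    """POST the image to a running web UI and return the upscaled image bytes."""
    payload = build_upscale_payload(image_bytes, **options)
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/extra-single-image",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # Assumed response shape: the upscaled image comes back base64-encoded.
    return base64.b64decode(result["image"])


# Example usage (needs a running web UI; the file names are placeholders):
# with open("input.png", "rb") as f:
#     upscaled = upscale(f.read(), factor=2)
# with open("output.png", "wb") as f:
#     f.write(upscaled)
```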
AI upscaler options
I will go through a few notable options.
The Latent Diffusion Super Resolution (LDSR) upscaler was initially released along with Stable Diffusion 1.4. It is a latent diffusion model trained to perform upscaling tasks.
Although it delivers superior quality, it is extremely slow. I don’t recommend it.
Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN) is an upscaling network that won the 2018 Perceptual Image Restoration and Manipulation challenge. It is an enhancement of the earlier SRGAN model.
It tends to retain fine details and produce crisp and sharp images.
The Real-ESRGAN (R-ESRGAN) is an enhancement to ESRGAN and can restore a variety of real-world images. It models various degrees of distortion from the camera lens and digital compression.
Compared to ESRGAN, it tends to produce smoother images.
R-ESRGAN performs best with realistic photo images.
This post has a good comparison if you want to check out other options.
R-ESRGAN is a good choice for photographs or realistic paintings. Anime images require upscalers specifically trained on anime.
Visit the Upscaler model database to download other upscalers.
Installing a new upscaler
To install a new upscaler in AUTOMATIC1111 GUI, download a model from the upscaler model database and put it in the folder
Restart the GUI. Your upscaler should now be available for selection. Below is what you should see after installing the Universal Upscaler v2.
The following models are good general-purpose upscalers.
- Universal Upscaler v2
- NMKD Siax
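If you install upscalers often, the copy step can be scripted. This is a minimal sketch: `install_upscaler` is a made-up helper, and the destination folder is passed in as an argument, so point it at the upscaler folder of your own installation.

```python
import shutil
from pathlib import Path


def install_upscaler(model_file, upscaler_dir):
    """Copy a downloaded upscaler model into the GUI's upscaler folder."""
    model_file, upscaler_dir = Path(model_file), Path(upscaler_dir)
    if not model_file.is_file():
        raise FileNotFoundError(f"Model file not found: {model_file}")
    # Refuse to create the folder: a missing folder usually means a wrong path.
    if not upscaler_dir.is_dir():
        raise NotADirectoryError(f"Upscaler folder not found: {upscaler_dir}")
    destination = upscaler_dir / model_file.name
    shutil.copy2(model_file, destination)
    return destination
```

Remember to restart the GUI afterwards so the new model shows up in the dropdown.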
Example of upscaled images
Below is an example of a complex scene upscaled using R-ESRGAN. Enlarge the images and switch between them, on both computer and cell phone screens, to see the difference.
Enhancing details with SD upscale
Using an upscaler alone is not ideal. If you have Stable Diffusion at hand, why not add it to your upscaler workflow?
SD Upscale is a script that comes with AUTOMATIC1111. It upscales the image with an upscaler and then runs an image-to-image pass to enhance details.
Step 1. Navigate to the img2img page.
Step 2. Upload an image to the img2img canvas.
(Alternatively, use the Send to img2img button to send the image to the img2img canvas.)
Step 3. In the Script dropdown menu at the bottom, select SD Upscale.
Step 4. Set Scale factor to 4 to scale to 4x the original size.
Step 5. Set denoising strength to between 0.1 and 0.3. The higher it is, the more the image will change. (You should experiment with this)
Step 6. Set the number of sampling steps to 100. Higher steps improve details. (You should experiment with this)
Step 7. You can use the original prompt and the negative prompt. If you don’t have one, use “highly detailed” as the prompt.
Step 8. Press Generate.
Below is a comparison showing the effect of the additional image-to-image pass in the SD Upscale script.
- Left: Universal Upscaler v2 to 4x.
- Right: SD Upscale with Universal Upscaler v2 to 4x, prompt “highly detailed”, denoising strength 0.3 and 100 sampling steps.
The SD Upscale script helps to improve details and reduce upscaling artifacts.
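To make the second stage less of a black box, here is a sketch of the detail pass as a plain image-to-image request for the AUTOMATIC1111 API (`/sdapi/v1/img2img`, available when the GUI is started with `--api`). This is not the SD Upscale script itself. It assumes the image has already been enlarged by an upscaler, it omits the tiling the real script does to handle large images, and the function name and field choices are my assumptions to verify against your install.

```python
import base64


def build_detail_pass_payload(upscaled_image_bytes, prompt="highly detailed",
                              negative_prompt="", denoising_strength=0.3,
                              steps=100, width=2048, height=2048):
    """Build an img2img request body that refines an already-upscaled image."""
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    return {
        "init_images": [base64.b64encode(upscaled_image_bytes).decode("ascii")],
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        # Low values keep the result close to the input; higher values
        # change the image more.
        "denoising_strength": denoising_strength,
        "steps": steps,
        "width": width,
        "height": height,
    }


# Placeholder bytes stand in for a real upscaled PNG.
payload = build_detail_pass_payload(b"<upscaled png bytes>")
print(payload["prompt"], payload["denoising_strength"], payload["steps"])
```

The defaults mirror the settings used in the comparison above: prompt “highly detailed”, denoising strength 0.3, and 100 sampling steps on a 512×512 image upscaled 4x.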
Hires. fix on the txt2img page
You can optionally upscale every image generated on the txt2img page. To do so, simply check the Hires. fix checkbox.
Additional options will appear under the checkbox. The options are similar to those using the SD Upscale script.
Personally, I don’t use Hires. fix much because it slows down image generation. Instead of upscaling all images, I would rather upscale only the ones I am going to keep.
Once you see a good image, you can send it to img2img for SD upscaling.
Learn about a new upscaling method for Stable Diffusion: ControlNet Tile Upscale.