Stable Diffusion: Harnessing the Power of Embeddings for AI

Embeddings, also called textual inversions, are an alternative way to control the style of your images in Stable Diffusion. We will review what embeddings are, where to find them, and how to use them.

What is embedding?

An embedding is the result of textual inversion, a method to define new keywords in a model without modifying it. The method has gained attention because it is capable of injecting new styles or objects into a model with as few as 3 to 5 sample images.

How does textual inversion work?

The amazing thing about textual inversion is NOT the ability to add new styles or objects, since other fine-tuning methods can do that as well or better. It is the fact that it can do so without changing the model.

The diagram from the original research article reproduced below illustrates how it works.

How embedding works: a new embedding is found for the new token S* through textual inversion.

First, you define a new keyword that is not in the model for the new object or style. That new keyword gets tokenized (that is, represented by a number) just like any other keyword in the prompt.

Each token is then converted to a unique embedding vector to be used by the model for image generation.

Textual inversion finds the embedding vector of the new keyword that best represents the new style or object, without changing any part of the model. You can think of it as finding a way within the language model to describe the new concept.
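To make the idea concrete, here is a toy sketch in Python. It is not the actual Stable Diffusion training code: a small fixed matrix stands in for the frozen model, a short list of numbers stands in for the sample images, and gradient descent adjusts only the embedding vector of the new token. All names and values are made up for illustration.

```python
# Toy illustration of textual inversion: the "model" is frozen,
# and only the embedding vector of the new token is trained.
# The matrix, the target and the learning rate are all hypothetical.

FROZEN_WEIGHTS = [  # stands in for the frozen model; never updated
    [1.0, 0.0, 0.5],
    [0.0, 2.0, -1.0],
]
TARGET = [1.5, 1.0]  # stands in for what the sample images look like


def model(embedding):
    """Frozen model: a fixed linear map from embedding to output."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in FROZEN_WEIGHTS]


def loss(embedding):
    """Squared error between the model output and the target."""
    return sum((o - t) ** 2 for o, t in zip(model(embedding), TARGET))


def train_embedding(steps=500, lr=0.05):
    """Run gradient descent on the embedding only."""
    embedding = [0.0, 0.0, 0.0]  # initial vector for the new token S*
    for _ in range(steps):
        errors = [o - t for o, t in zip(model(embedding), TARGET)]
        # d(loss)/d(embedding[j]) = 2 * sum_i errors[i] * W[i][j]
        grads = [
            2 * sum(err * row[j] for err, row in zip(errors, FROZEN_WEIGHTS))
            for j in range(len(embedding))
        ]
        embedding = [e - lr * g for e, g in zip(embedding, grads)]
    return embedding


learned = train_embedding()
print("learned embedding:", [round(v, 3) for v in learned])
print("final loss:", loss(learned))
```

The point to notice is that FROZEN_WEIGHTS is never written to. The only thing that changes during training is the vector for the new keyword.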

Examples of embeddings

Embeddings can be used for new objects. Below is an example of injecting a toy cat. Note that the new concept (toy cat) can be combined with other concepts that already exist in the model (boat, backpack, etc.).

Example of embedding an object.

Embeddings can also capture a new style. The example below shows embedding a new style and transferring it to different contexts.

Example of embedding a style.

Where to find embeddings

Hugging Face hosts the Stable Diffusion Concept Library, a repository of a large number of custom embeddings.

Stable Diffusion concepts library.

Civitai is another great site where you can browse models, including embeddings. Filter by Textual Inversion to view embeddings only.

How to use embeddings

Web interface

Stable Diffusion Conceptualizer is a great way to try out embeddings without downloading them.

First identify the embedding you want to test in the Concept Library. Let’s say you want to use this Marc Allante style. Next, identify the token needed to trigger this style. You can find it in the file token_identifier.txt, which is <Marc_Allante>.

Putting it in the prompt

<Marc_Allante> a dog

gives you the unique Marc Allante style.

The downside of the web interface is that you cannot use the embedding with a different model or change any parameters.

AUTOMATIC1111

Using embeddings in AUTOMATIC1111 is easy.

First, download an embedding file from the Concept Library. It is the file named learned_embeds.bin. Make sure you do not right-click and save on the screen below, because that will save the webpage the link points to. Instead, click the file name and then click the download button on the next page.

The embedding file.

Next, rename the file to the keyword you want to use for this embedding. It has to be something that does not already exist in the model. marc_allante.bin is a good choice.

Put it in the embeddings folder in the GUI’s working directory:

stable-diffusion-webui/embeddings
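If you install embeddings often, the rename-and-move step can be scripted. The following is a minimal sketch using only the Python standard library. The function name install_embedding and the example paths are my own assumptions, not part of AUTOMATIC1111.

```python
import shutil
from pathlib import Path


def install_embedding(downloaded_file, webui_dir, keyword):
    """Copy a downloaded embedding into <webui_dir>/embeddings as <keyword><ext>.

    The file extension (.bin or .pt) is kept; only the base name changes.
    """
    source = Path(downloaded_file)
    target_dir = Path(webui_dir) / "embeddings"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{keyword}{source.suffix}"
    if target.exists():
        # Refuse to overwrite an embedding that is already installed.
        raise FileExistsError(f"{target} already exists; pick another keyword")
    shutil.copyfile(source, target)
    return target


# Example call (hypothetical paths; adjust to your own setup):
# install_embedding("Downloads/learned_embeds.bin",
#                   "stable-diffusion-webui", "marc_allante")
```

The original download is copied rather than moved, so you can install the same file again under a different keyword if the first one turns out to clash with something.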

Restart the GUI. In the startup terminal output, you should see a message like:

Loaded a total of 1 textual inversion embeddings.
Embeddings: marc_allante

Use the filename, without the extension, as a keyword in the prompt to apply the embedding.

For example, the following prompt works in AUTOMATIC1111.

(marc_allante:1.2) a dog

We get the image with the expected style.

Shortcut to use embeddings in AUTOMATIC1111

An embedding will not work if its keyword is even one letter off. Also, you cannot use v1 embeddings with v2 models, or vice versa, because they use two different language models.

Have you ever wondered how to make sure you are actually using an embedding? It can be difficult to tell because its effect can sometimes be subtle.

There’s a little trick in AUTOMATIC1111 to ensure that. There’s a button between the trash and the copy buttons that looks like a little iPod (sorry if it was from a time before you were born…).

Click it and you will see all the embeddings that are available. They are all under the Textual Inversion tab.

Clicking any of them will insert that into the prompt. This function is especially useful to eliminate the tedious work of making sure you’ve entered the embedding magic word correctly.
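If you prefer typing prompts by hand, a small script can catch the one-letter-off mistake before you generate anything. This is only a sketch: check_embedding_keywords is a hypothetical helper of mine, the list of names is whatever you have in your embeddings folder, and the 0.85 similarity cutoff is an arbitrary choice.

```python
import difflib
import re


def check_embedding_keywords(prompt, embedding_names):
    """Return (used, near_misses) for a prompt.

    used: embedding names that appear in the prompt exactly.
    near_misses: {word_in_prompt: closest_embedding_name} for likely typos.
    """
    # Split the prompt into word-like pieces, dropping brackets and weights.
    words = re.findall(r"[A-Za-z0-9_\-]+", prompt)
    used = [w for w in words if w in embedding_names]
    near_misses = {}
    for w in words:
        if w in embedding_names:
            continue
        close = difflib.get_close_matches(w, embedding_names, n=1, cutoff=0.85)
        if close:
            near_misses[w] = close[0]
    return used, near_misses


names = ["marc_allante", "wlop_style"]
print(check_embedding_keywords("(marc_allante:1.2) a dog", names))
print(check_embedding_keywords("(marc_alante:1.2) a dog", names))
```

The first prompt reports marc_allante as used. The second one has a missing letter, so nothing is reported as used and the misspelled word is flagged together with the name it most likely should have been.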

Note on using embeddings in AUTOMATIC1111

If you look closely at the prompt, you will notice that I increased the strength of the trigger keyword marc_allante. I found that adjusting the keyword strength is necessary. This may have something to do with the way AUTOMATIC1111 loads the embedding.

You may have to play with the keyword strength to get the effect you want. Below is an example of varying the strength while keeping the seed and everything else the same.

Adjust keyword strength to get the effect you want.

To further complicate the matter, the strength needed could be different for different seed values.
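When hunting for the right strength, it helps to generate the prompt variants systematically instead of editing the number by hand. Here is a small sketch that builds one prompt per strength value using the (keyword:strength) syntax shown earlier. The helper names and the list of strengths are my own choices.

```python
def weighted(keyword, strength):
    """Format a keyword with the (keyword:strength) emphasis syntax."""
    return f"({keyword}:{strength:g})"


def strength_sweep(keyword, rest_of_prompt, strengths):
    """Build one prompt per strength value, leaving the rest unchanged."""
    return [f"{weighted(keyword, s)} {rest_of_prompt}" for s in strengths]


for prompt in strength_sweep("marc_allante", "a dog", [0.8, 1.0, 1.2, 1.4]):
    print(prompt)
```

Run each resulting prompt with the same seed and settings so that the keyword strength is the only thing that differs between the images.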

Some embeddings I like

There are more embeddings available than I can try. Here are a few that I like.

wlop_style

If you have played with the Stable Diffusion base models, you will find it impossible to generate wlop’s style no matter how hard you try. An embedding together with a custom model can finally do this.

The wlop_style embedding can render a nice illustration style in the manner of the artist wlop. It should be used with SirVeggie’s wlop-any custom model. (See this guide for installing custom models.)

Direct download link – wlop_style embedding

Direct download link – wlop-any model

If you try it out, you may find it doesn’t work at all. What you need to do is adjust the keyword strengths.

A working prompt for AUTOMATIC1111 is

(wlop_style:0.6) (m_wlop:1.4) woman wearing dress, perfect face, beautiful detailed eyes, long hair, birds

Negative prompt:

closed eyes, disfigured, deformed

wlop_style is the keyword for the embedding, and m_wlop is the keyword for the model.

Don’t get frustrated if you don’t get the style. Try changing the strengths of the two keywords. Some objects may simply not work with the embedding. Try objects that are common in wlop’s artworks.

Kuvshinov

Kuvshinov is a Russian illustrator. You can use the kuvshinov embedding with Stable Diffusion v1.4.

Direct download link

Prompt:

(_kuvshinov:1), a woman with beautiful detailed eyes, highlight hair

Negative prompt:

disfigured, deformed

(Note that I have renamed the embedding file to _kuvshinov.bin.)

Difference between embedding, dreambooth and hypernetwork

There are three popular methods to fine-tune Stable Diffusion models: textual inversion (embedding), dreambooth and hypernetwork.

An embedding defines a new keyword to describe a new concept without changing the model. The embedding vectors are stored in .bin or .pt files. These files are very small, usually less than 100 kB.

Dreambooth injects a new concept by fine-tuning the whole model. The file size is typical of Stable Diffusion, around 2 – 4 GB. The file extension is the same as other models, ckpt.

A hypernetwork is an additional network attached to the denoising UNet of the Stable Diffusion model. The purpose is to fine-tune the model’s behavior without changing the model’s own weights. The file size is typically about 100 MB.
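A back-of-the-envelope calculation shows why embedding files are so much smaller than the other two. The sketch below assumes 32-bit floats and a 768-dimensional token embedding, which is the width of the text encoder used by Stable Diffusion v1 (the v2 text encoder uses 1024). The vector counts are illustrative, and real files carry a little extra overhead on top of the raw numbers.

```python
BYTES_PER_FLOAT32 = 4


def embedding_payload_bytes(num_vectors, dim=768):
    """Raw size of the learned vectors, ignoring file-format overhead."""
    return num_vectors * dim * BYTES_PER_FLOAT32


for n in (1, 8, 16):
    kb = embedding_payload_bytes(n) / 1000
    print(f"{n:>2} vector(s): about {kb:.1f} kB")
```

Even with 16 vectors, the payload stays well under 100 kB. The different vector widths are also one concrete reason a v1 embedding cannot simply be dropped into a v2 model.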

Pros and Cons of using embedding

One of the advantages of using embeddings is their small size. With a file size of 100 kB or less, it is easy to keep many of them in your local storage. Because embeddings are just new keywords, several can be used together in the same image.

The drawback of using an embedding is that it’s sometimes not clear which model it is supposed to be used with. If the trainer didn’t say, you can start with v1.4 or v1.5. You may also want to include a VAE to see if that makes any difference. For anime styles, it is not uncommon for trainers to use anime models like Anything v3.

In general, I found using embeddings a bit more difficult than using custom models. I had trouble reproducing the demo styles of many embeddings I downloaded. It’s true that I may get there if I keep tweaking the keyword strengths, but in reality people move on after a few tries.
