Stable Diffusion: The Ultimate Video-to-Video Tool!

Transforming videos into animation has never been easier with Stable Diffusion AI. You will find step-by-step guides for 5 video-to-video techniques in this article. Best of all, you can run them for FREE on your local machine!

  • ControlNet-M2M script
  • ControlNet img2img
  • Mov2mov extension
  • SD-CN Animation extension
  • Temporal Kit

They all use a similar approach of transforming each video frame individually with the help of ControlNet.

At the end of the article, I will survey other video-to-video methods for Stable Diffusion.

Now you will see a sample video from each method. Below is the original video.

(Download this video here if you wish to use it in the tutorial.)

Below is an example of using method 1: ControlNet-M2M script.

Below is an example of using method 2: ControlNet img2img.

Below is an example of using method 3: mov2mov extension.

Below is an example of using method 4: SD-CN Animation extension.

Here’s an example of Method 5: Temporal Kit.

Software

You will need AUTOMATIC1111 Stable Diffusion GUI. You can use this GUI on Windows, Mac, or Google Colab.

You will need to install the ControlNet extension. Follow the installation instructions in the ControlNet article.

Method 1: ControlNet m2m script

This video-to-video method is simpler to use but produces more flickering.

We will use the following video.

Use the following button to download the video if you wish to follow along with the same video.

Step 1: Update A1111 settings

Before using the controlnet m2m script in AUTOMATIC1111, you must go to Settings > ControlNet. Select the following two options.

  • Do not append detectmap to output: Yes
  • Allow other script to control this extension: Yes

The first option disables saving the control image to the image output folder, so you can grab the frame images more easily.

The second setting lets the controlnet m2m script feed the video frames to the ControlNet extension.

Click Apply Settings. Reload the Web-UI page.

Step 2: Upload the video to ControlNet-M2M

In AUTOMATIC1111 Web-UI, navigate to the txt2img page.

In the Script dropdown menu, select the ControlNet m2m script.

Expand the ControlNet-M2M section.

Upload the mp4 video to the ControlNet-0 tab.

Step 3: Enter ControlNet setting

Expand the ControlNet section. Enter the following settings:

Enable: Yes

Pixel Perfect: Yes

Control Type: Lineart

Preprocessor: lineart realistic

Model: control_xxxx_lineart

Control weight: 0.6

For your own videos, you will want to experiment with different control types and preprocessors.

Step 4: Enter txt2img settings

Select a model you wish to use in the Stable Diffusion checkpoint at the top of the page. I will use deliberate v2.

Come up with a prompt and a negative prompt. I will use the following:

photo of Sci fi cute girl, pink hair, photorealistic, in the style of franciszek starowieyski, white porcelain sci fi, mecha, 32k uhd, machine aesthetics, dark white and azure, hans zatzka, silver and pink, science fiction city,shiny pink hair, half body, oil painting, white background

deformed, disfigured, ugly

Enter the following generation parameters:

Sampling method: Euler a

Sampling steps: 20

Width: 768

Height: 512

CFG Scale: 7

Seed: 100

The seed value needs to be fixed to reduce flickering. Changing the seed will change the background and the look of the character.
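Why a fixed seed helps can be seen in a toy sketch. This is not how AUTOMATIC1111 draws its noise (it uses PyTorch for that); initial_noise is a hypothetical stand-in built on Python's random module. It only shows that the same seed reproduces the same starting noise for every frame, while a different seed does not.

```python
import random


def initial_noise(seed, size=8):
    """Toy stand-in for the starting noise of one frame (not the real SD code)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


# Two frames generated with the same seed start from identical noise.
frame_a = initial_noise(100)
frame_b = initial_noise(100)
print(frame_a == frame_b)

# A different seed gives a different starting point, hence a different look.
frame_c = initial_noise(101)
print(frame_a == frame_c)
```

With identical starting noise, the differences between frames come mostly from the video content itself rather than from the random draw.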

Click Generate.

Step 5: Make an animated GIF or mp4 video

The script converts the image with ControlNet frame-by-frame. You will find a series of png files in the txt2img output folder.

You have two options: (1) Combine the PNG files into an animated GIF, and (2) make an mp4 video.

Animated GIF

Use the EZGIF page to convert the png files to an animated GIF.

In the GIF option, set the Delay time to 5 (in 1/100 seconds) for 20 frames per second.
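The delay value follows directly from the frame rate, since GIF delays are given in hundredths of a second. Here is a minimal sketch of the arithmetic; the function names are made up for illustration.

```python
def gif_delay_cs(fps):
    """Delay between GIF frames in 1/100 seconds for a target frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return round(100 / fps)


def effective_fps(delay_cs):
    """Frame rate you actually get from a whole-number delay."""
    return 100 / delay_cs


print(gif_delay_cs(20))  # the 20 fps case used in this method
print(gif_delay_cs(10))  # a 10 fps video
print(effective_fps(gif_delay_cs(30)))  # 30 fps cannot be matched exactly
```

Frame rates that do not divide 100 evenly get rounded, so the GIF may play slightly faster or slower than the source.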

Here’s the final animated GIF.


MP4 video

Use the following command to convert the png files to an MP4 video. (You will need to have ffmpeg installed on your local PC)

ffmpeg -framerate 20 -pattern_type glob -i '*.png' -c:v libx264 -pix_fmt yuv420p out.mp4

A reader pointed out that the above command didn’t work on Windows and suggested the command below instead.

ffmpeg -framerate 20 -pattern_type sequence -start_number 00000 -i "%05d-100.png" -c:v libx264 -pix_fmt yuv420p out.mp4
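If you are unsure which pattern matches your files, the following sketch derives it from the file names. It assumes every frame name starts with a zero-padded counter followed by a constant suffix (such as 00000-100.png when the seed is 100). The helper names sequence_args and encode_command are hypothetical; the ffmpeg flags are the same ones used in the commands above.

```python
import re


def sequence_args(filenames):
    """Derive an ffmpeg image-sequence pattern and start number from frame names."""
    first = sorted(filenames)[0]
    match = re.match(r"(\d+)(.*)", first)
    if match is None:
        raise ValueError(f"no leading frame counter in {first!r}")
    digits, rest = match.groups()
    # Double any literal % so ffmpeg does not treat it as a format code.
    pattern = f"%0{len(digits)}d" + rest.replace("%", "%%")
    return pattern, int(digits)


def encode_command(filenames, fps=20, output="out.mp4"):
    """Build the ffmpeg argument list without running it."""
    pattern, start = sequence_args(filenames)
    return [
        "ffmpeg", "-framerate", str(fps), "-start_number", str(start),
        "-i", pattern, "-c:v", "libx264", "-pix_fmt", "yuv420p", output,
    ]


print(encode_command(["00001-100.png", "00000-100.png", "00002-100.png"]))
```

Passing the list to subprocess.run from inside the folder of PNG files avoids shell quoting differences between Windows and other platforms.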

Notes for ControlNet m2m script

Unfortunately, as of the time of writing, multiple ControlNets do NOT work with the m2m script. As we will see later, using multiple ControlNets is a useful technique for reducing flickering. Hopefully, future updates will fix that.

Here’s another video transformed with ControlNet Line art realistic.

Experiment with different ControlNets to get different results. Here’s a video with the Tile resample.

Method 2: ControlNet img2img

This video-to-video method converts a video to a series of images and then uses Stable Diffusion img2img with ControlNet to transform each frame.

Use the following button to download the video if you wish to follow along with the same video.

Step 1: Convert the mp4 video to png files

You can use the ezgif site to convert the mp4 video to png image files.

Upload the mp4 video file.

Use the following settings.

  • Size: Original
  • Frame rate: 10 fps

Click Convert to PNG!

Click Download frames as ZIP.

Extract the ZIP file to a folder of png image files.

Rename the folder name as video. Now you should have a folder called video containing the png files.

Alternatively, below are the commands if you prefer to use the command line to convert the mp4 video to png files.

mkdir video
ffmpeg -i girl_dance.mp4 -r 10 video/%05d.png
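The same two commands can be wrapped in a small Python helper if you prefer scripting. This is only a sketch: extract_frames is a hypothetical name, and it assumes ffmpeg is installed and on your PATH. The run flag lets you inspect the command without executing it.

```python
import subprocess
from pathlib import Path


def extract_frames(video, out_dir="video", fps=10, run=True):
    """Create out_dir and split the video into numbered PNG files with ffmpeg."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cmd = ["ffmpeg", "-i", str(video), "-r", str(fps), str(out / "%05d.png")]
    if run:
        # Raises CalledProcessError if ffmpeg reports a failure.
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: create the folder and show the command only.
print(extract_frames("girl_dance.mp4", run=False))
```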

Step 2: Enter Img2img settings

You will need image-to-image and ControlNet settings to apply to each frame.

Now open AUTOMATIC1111.

We need some special settings for ControlNet. Go to the Settings page.

Click Show all pages on the left panel.

Search for the text “Quicksettings list” (Ctrl+F on Windows, Cmd+F on Mac). Add “initial_noise_multiplier” and “img2img_color_correction” to the list.

It should look like this:

Go to the top of the page. Click Apply settings and then Reload UI.

You should see two new settings: Noise multiplier for img2img and Apply color correction… on top of the page.

Set Noise multiplier for img2img to 0.5. This scaling factor is applied to the random latent tensor for img2img. Lowering it reduces flickering.
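To get a feel for what this scaling factor does, here is a toy illustration. It is an assumption-laden stand-in, not the actual img2img code: latent_noise is a hypothetical function that uses Python's random module in place of a latent tensor. It shows that a multiplier of 0.5 halves the spread of the random values that get mixed into each frame.

```python
import random
import statistics


def latent_noise(seed, multiplier=1.0, size=1000):
    """Toy stand-in for the random latent values, scaled by the multiplier."""
    rng = random.Random(seed)
    return [multiplier * rng.gauss(0.0, 1.0) for _ in range(size)]


full = latent_noise(0, multiplier=1.0)
half = latent_noise(0, multiplier=0.5)

# The scaled noise has half the standard deviation of the unscaled noise.
ratio = statistics.pstdev(half) / statistics.pstdev(full)
print(ratio)
```

Less random variation per frame means less frame-to-frame difference that comes purely from noise.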

Check the Apply color correction… setting. This option matches the colors of the output to the original content, which helps color consistency across frames.

Now one more setting… Go to Settings > ControlNet. Select the following option.

  • Do not append detectmap to output: Yes

Click Apply Settings. Reload the Web-UI.

Go to the img2img page.

In the Stable Diffusion checkpoint at the top of the page, select a model you wish to use. I will use deliberate v2.

Come up with a prompt and a negative prompt. I will use:

photo of Sci fi cute girl, pink hair, photorealistic, in the style of franciszek starowieyski, white porcelain sci fi, mecha, 32k uhd, machine aesthetics, dark white and azure, hans zatzka, silver and pink, science fiction city,shiny pink hair, half body, oil painting

deformed, disfigured, ugly

Upload one of the frames (i.e. the png files) to the img2img canvas.

Resize mode: Just resize

Sampling method: DPM++ 2M Karras

Sampling Steps: 20

Width: 908 (This is set to maintain the aspect ratio of the video)

Height: 512 (The shorter dimension is fixed to 512)

CFG scale: 20 (Experiment with this. The higher the value, the more closely the result follows the prompt.)

Denoising strength: 0.4 (Experiment with this. The higher the value, the more the frame changes, but also the more flickering you get.)

Seed: -1 (random)
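The width and height above come from scaling the video so that its shorter side is 512 while keeping the aspect ratio. Here is a minimal sketch of that calculation for your own videos. The helper name fit_short_side, the 1920×1080 example source, and the rounding to a multiple of 8 are assumptions made here; the 908 used in this tutorial was evidently picked without that rounding.

```python
def fit_short_side(src_width, src_height, short_side=512, multiple=8):
    """Scale a frame size so the shorter side equals short_side, keeping the ratio."""
    scale = short_side / min(src_width, src_height)

    def snap(value):
        # Round the scaled dimension to the nearest multiple (an assumption).
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(src_width), snap(src_height)


print(fit_short_side(1920, 1080))  # landscape source
print(fit_short_side(1080, 1920))  # portrait source
```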

Step 3: Enter ControlNet settings

Now go to the ControlNet section…

Upload the same frame to the image canvas.

Enable: Yes

Pixel Perfect: Yes

Allow Preview: Yes

Control Type: Lineart

Preprocessor: Lineart Realistic

Model: control_xxxx_lineart

(Experiment with the control type, preprocessor, and model. Many of them will work just as well. The goal is to see details like the eyes, mouth, and hairstyle outlined in the preview.)

Control Weight: 0.6 (Lower it when you see color artifacts)

Step 4: Choose a seed

Press Generate to test the setting.

Select the image on the left panel once you are happy with the effect.

To fix the seed, click the recycle icon next to the Seed value box. You should see the value change from -1 to a positive number.

Step 5: Batch img2img with ControlNet

Now with all the hard work, you have generated one frame… The goal is to apply the same setting to ALL frames. Luckily, you can do that with batch processing.

First, remove the reference image in the ControlNet section. This step is important. Otherwise, you will be using this reference image for all frames! Click the cross icon on the top right to remove the image.

Confirm you see the reference image removed, like the screenshot below.

Keep the rest of the ControlNet settings untouched.

Now switch to the Batch tab on the Img2img page.

Enter the paths of the following:

  • Input directory: The folder containing the PNG files of your video.
  • Output directory: A new folder for your processed PNG files.

In Windows, the input directory is the folder location of the PNG files in File Explorer.

If you use Google Colab, copy the PNG files to your Google Drive and specify a path. To find the path, right-click the folder in the file explorer on the left.
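Before starting a long batch run, it can save time to check the two folders first. The sketch below is optional and not part of AUTOMATIC1111; prepare_batch_dirs is a hypothetical helper that confirms the input folder contains PNG frames and creates the output folder if it is missing.

```python
from pathlib import Path


def prepare_batch_dirs(input_dir, output_dir):
    """Return the sorted PNG frame names and make sure the output folder exists."""
    source = Path(input_dir)
    frames = sorted(p.name for p in source.glob("*.png"))
    if not frames:
        raise FileNotFoundError(f"no PNG frames found in {source}")
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    return frames
```

For example, calling prepare_batch_dirs("video", "video_out") returns the frame names in processing order, or raises an error if the input path is wrong.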

Click Generate to start the generation process.

Step 6: Convert the output PNG files to video or animated gif

Animated GIF

Use the EZGIF page to convert the png files to an animated GIF.

In the GIF option, set the Delay time to 10 (in 1/100 seconds) for 10 frames per second.

Here’s the video-to-video result (Denoising strength 0.5):

Increasing the denoising strength to 0.7 changes the video more but also increases flickering.

MP4 video

Use the following command to convert the png files to an MP4 video.

ffmpeg -framerate 10 -pattern_type glob -i '*.png' -c:v libx264 -pix_fmt yuv420p out.mp4

Note on ControlNet img2img

This is probably the most laborious out of all video-to-video methods. The reason you want to use it is simple: To gain total control of the process.

You will see quite a few YouTubers advocating this method. I recommend Enigmatic_e’s videos to learn more about this method (and, generally, video-making with Stable Diffusion). This video from Corridor Crew walks you through a laborious method that produces high-quality Stable Diffusion videos.

Method 3: Mov2mov extension

The Mov2mov extension automates many of the manual steps of video-to-video tasks.

Use the following button to download the video if you wish to follow along with the same video.

Step 1: Install Mov2mov extension

In AUTOMATIC1111 Web-UI, navigate to the Extensions page.

Select the Install from URL tab.

In the URL for extension’s git repository field, enter

https://github.com/Scholar01/sd-webui-mov2mov

Click Install.

Completely close and restart the Web-UI.

Step 2: Enter mov2mov settings

You should see a new page called mov2mov.

Select a Stable Diffusion checkpoint in the dropdown menu at the page top. I used Deliberate v2.

Enter the prompt and the negative prompt.

photo of Sci fi cute girl, pink hair, photorealistic, in the style of franciszek starowieyski, white porcelain sci fi, mecha, 32k uhd, machine aesthetics, dark white and azure, hans zatzka, silver and pink, science fiction city,shiny pink hair, half body, oil painting, white background

deformed, disfigured, ugly

Upload the video by dropping it to the video canvas.

Resize mode: Crop and resize.

Set the width to 768 and the height to 512 for a landscape movie. (Adjust accordingly for your own video.)

The extension has a nice slider for noise multiplier. Keep it at 0 to reduce flickering.

Adjust the CFG scale to control how much the prompt should be followed. (7 in this video)

Adjust the denoising strength to control how much the video should be changed. (0.75 in this video)

The Max frame is the total number of frames to be generated. Set to a low number e.g. 10 for initial testing. Set to -1 to generate a full-length video.

The seed determines the seed value of the FIRST frame. All frames will use the same seed value even if you set the seed to -1 (random).

Step 3: Enter ControlNet settings

Enter the following settings for ControlNet.

Enable: Yes

Pixel Perfect: Yes

Control Type: Lineart

Preprocessor: lineart_realistic

Model: control_xxxx_lineart

Control weight: 0.6

Important: Don’t upload a reference image. Mov2mov will use the current frame for the reference image.

Step 4: Generate the video

Click Generate to start generating the video.

It will take a while… When it is done, your new video will appear on the right.

Click Save to save the video.

Go to the output/mov2mov-videos folder to find the video if it doesn’t show up.

Try a different Video Mode if there is an error.

If the video generation fails, make the video yourself from the image series. They are in the folder output/mov2mov-images. Follow this step to convert the images to a video.

Here’s the final video from Mov2mov.

Note for mov2mov

For some reason, deterministic samplers (e.g. Euler, LMS, DPM++2M Karras…) do NOT work well with this extension. Otherwise, it would be a good way to reduce flickering.

Method 4: SD-CN-Animation

SD-CN-Animation is an AUTOMATIC1111 extension that provides a convenient way to perform video-to-video tasks using Stable Diffusion.

SD-CN-Animation uses an optical flow model (RAFT) to make the animation smoother. The model tracks the movements of the pixels and creates a mask for generating the next frame.

Note that this extension does not work for all videos. For example, it produces poor results with the video used in the previous 3 methods. Presumably, it is because of its dark background.

So I switched to another video for this walkthrough.

Download this video here if you want to use it to follow this tutorial.

Step 1: Installing the extension

In AUTOMATIC1111 Web-UI, navigate to the Extensions page.

Select the Install from URL tab.

In the URL for extension’s git repository field, enter

https://github.com/volotat/SD-CN-Animation

Click Install.

Completely close and restart the Web-UI.

Step 2: Enter SD-CN-Animation parameters

In AUTOMATIC1111 Web-UI, navigate to the SD-CN-Animation page.

Make sure “Apply color correction to img2img results to match original colors.” is NOT selected. (You may have enabled this option when testing an earlier method.) This color correction affects the RAFT model and produces poor results.

Upload the mp4 video file to the Input video section.

Set the width to 512. Set the height to 512. (Adjust accordingly for your video.)

Set the Prompt to

photo of Sci fi cute girl, pink hair, photorealistic, in the style of franciszek starowieyski, white porcelain sci fi, mecha, 32k uhd, machine aesthetics, dark white and azure, hans zatzka, silver and pink, science fiction city,shiny pink hair, half body, oil painting, white background

Set the Negative Prompt to

deformed, disfigured, ugly

Set the sampling method to DPM++2M Karras.

Step 3: Enter ControlNet Settings

We will use 2 ControlNets. If you don’t see multiple ControlNet tabs, go to Settings > ControlNet to enable them.

For ControlNet Unit 0:

  • Enable: Yes
  • Pixel Perfect: Yes
  • Control Type: Line Art
  • Preprocessor: Line art realistic
  • Model: control_xxxx_lineart
  • Control weight: 0.6

DON’T upload an image.

Leave all other settings as the default.

For ControlNet Unit 1:

  • Enable: Yes
  • Pixel Perfect: Yes
  • Control Type: Tile
  • Preprocessor: Tile resample
  • Model: control_xxxx_tile
  • Control weight: 0.5

DON’T upload an image.

Leave all other settings as the default.

Step 4: Generate the video

Click Generate to start processing.

Once it is done, right-click on the video, and you will find an option to save it.

Here’s what I got.

Notes for SD-CN-Animation

I like the fact that this extension is quite polished. Things work without error. (My expectation in software engineering is low when using A1111…)

Make sure you have unchecked “Apply Color correction…” for img2img. Otherwise, you won’t get the best results.

All samplers work in this extension. Make sure to pick a deterministic sampler to reduce flickering. (See comments on flickering below)

Method 5: Temporal Kit

Temporal Kit implements several methods for video-to-video conversion. I’m only going to tell you the best one here.

The basic idea is to pick keyframes across the video (e.g. 16), stylize them with image-to-image, and use them as references to paint adjacent frames.

This method was pioneered by EbSynth, a computer program for painting videos. It was created before Stable Diffusion, but img2img capability in Stable Diffusion has given it a new life.

However, the result will be poor if you do image-to-image on individual frames. The reason is that the resulting images lack coherence.

The trick is to transform ALL keyframes at once by stitching them together in one giant sheet. Like this:

We used to do it manually. But with Temporal Kit, you don’t have to.
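To make the stitching idea concrete, here is a sketch of where each keyframe lands on the sheet. The row-by-row ordering and the function name tile_boxes are assumptions made for illustration, not Temporal Kit's actual code. The boxes use the (left, top, right, bottom) convention that image libraries such as Pillow use for cropping and pasting.

```python
def tile_boxes(side, tile_width, tile_height):
    """Return (left, top, right, bottom) boxes for a side x side sheet, row by row."""
    boxes = []
    for index in range(side * side):
        row, col = divmod(index, side)
        left, top = col * tile_width, row * tile_height
        boxes.append((left, top, left + tile_width, top + tile_height))
    return boxes


boxes = tile_boxes(4, 512, 512)
print(len(boxes))   # number of keyframes on the sheet
print(boxes[0])     # first keyframe, top-left corner
print(boxes[-1])    # last keyframe, bottom-right corner
```

The same boxes work in both directions: paste keyframes into the sheet before img2img, then crop them back out afterwards.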

Step 1: Install Temporal Kit extension

In AUTOMATIC1111 Web-UI, navigate to the Extensions page.

Select the Install from URL tab.

In the URL for extension’s git repository field, enter

https://github.com/CiaraStrawberry/TemporalKit

Click Install.

Completely close and restart the Web-UI.

Step 2: Install FFmpeg

Visit FFmpeg’s download page and download the FFmpeg program for your platform.

It should be a zip file. After unzipping, you should see a file called ffmpeg or ffmpeg.exe. This is the FFmpeg program you need!

But to let Temporal Kit use it, you need to put it in the PATH so that it can be accessed anywhere, by everyone.

If you know what PATH means, put it in one of the directories in the PATH.

Read on if not…

Windows

Press the Windows key. Type environment and click the item “Edit environment variables for your account”.

Select PATH, and then Edit.

Add a new entry by clicking New and then type

%USERPROFILE%\bin

After adding, you should see the new entry of the above path.

Click OK to save and exit.

Open File Explorer. In the address bar, type

%USERPROFILE%

And press Enter. You should now be in your home folder.

Create a new folder called bin.

Test going there by putting the following in the address bar and pressing Enter.

%USERPROFILE%\bin

You should be in your newly created folder bin.

Now put ffmpeg.exe in this folder, and you are all set. Now the file is in your PATH.

To test, open a command prompt by pressing the Windows key and typing cmd. Press Enter.

In the command prompt, type

ffmpeg

and press Enter.

You should see ffmpeg’s help page.

Mac or Linux

Open the Terminal App.

Create a new folder bin in your home directory.

mkdir ~/bin

Put the ffmpeg file in the new directory. You can use Finder.

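Edit .zprofile in your home directory. (This file is for zsh, the default shell on recent versions of macOS. If your shell is bash, as on many Linux systems, edit .bashrc or .profile instead.)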

Add the following line

export PATH=~/bin:$PATH

Start a new Terminal and type

ffmpeg

You should see the help page of ffmpeg displayed. This verifies FFmpeg is in your path.
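You can also check the PATH from Python, which is roughly what a Python-based extension has to rely on when it looks for FFmpeg. This is a small sketch using the standard library's shutil.which; find_tool is just a hypothetical wrapper name.

```python
import shutil


def find_tool(name="ffmpeg"):
    """Return the full path of an executable found on PATH, or None if missing."""
    return shutil.which(name)


location = find_tool()
if location is None:
    print("ffmpeg is NOT on your PATH yet")
else:
    print("ffmpeg found at", location)
```

If it reports that ffmpeg is missing, recheck the folder you added to PATH and open a new terminal before trying again.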

Step 3: Enter Pre-processing parameters

In AUTOMATIC1111, go to the Temporal Kit page.

Go to the Pre-Processing tab.

Upload your video to the Input video canvas. (Download this video if you want to use the same video to follow the tutorial.)

The next step is to generate one giant sheet of keyframes. This sheet will go through img2img so that all keyframes will have the same style.

Set:

  • Side: 4 (This sets a 4×4 image grid)
  • Height resolution: 2048 (Since each image is 512, 512×4 = 2048)
  • frames per keyframe: 4 (How many frames each keyframe is responsible for)
  • fps: 30
  • EbSynth mode: Yes
  • Target Folder: Enter the path of the folder you wish to save this project to. E.g. G:\temporalkit\test1

Click Run on the right panel. You should see a sheet of 4×4 keyframes generated.

Make sure these keyframes cover the whole video for your own video. Adjust those parameters if not.

If you encounter an out-of-memory issue in the next img2img step, reduce the side or resolution parameters.
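The arithmetic behind these parameters can be sketched as follows. This is my reading of the settings, not Temporal Kit's code: it assumes one sheet holds side × side keyframes and that each keyframe covers "frames per keyframe" frames of the video. The function name sheet_plan is hypothetical.

```python
def sheet_plan(side=4, height_resolution=2048, frames_per_keyframe=4, fps=30):
    """Estimate what one keyframe sheet covers (assumed reading of the settings)."""
    keyframes = side * side
    frames_covered = keyframes * frames_per_keyframe
    return {
        "keyframes": keyframes,
        "tile_height": height_resolution // side,
        "frames_covered": frames_covered,
        "seconds_covered": frames_covered / fps,
    }


print(sheet_plan())
# Halving the side shrinks the sheet that img2img must process at once.
print(sheet_plan(side=2, height_resolution=1024))
```

Under these assumptions, a longer video needs a larger frames per keyframe value, a larger side, or more than one sheet.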

Click Save Settings if you are happy with the result.

Click Send to img2img.

Step 4: Perform Img2img on keyframes

Go to the Img2img page. The giant sheet of keyframes should already be in the image canvas.

Switch to the Batch tab.

Input directory: The name of your target directory with input appended. E.g. G:\temporalkit\test1\input

Output directory: Similarly, but with output appended. E.g. G:\temporalkit\test1\output

The image size should be set correctly and automatically (2048×2048).

Enter a prompt. I used

photo of Sci fi cute girl, pink hair, photorealistic, in the style of franciszek starowieyski, white porcelain sci fi, mecha, 32k uhd, machine aesthetics, dark white and azure, hans zatzka, silver and pink, science fiction city,shiny pink hair, half body, oil painting, white background

And a negative prompt:

deformed, disfigured, ugly

Sampling method: DPM++2M Karras

Sampling steps: 20

CFG scale: 7

Denoising strength: 0.5 (adjust accordingly)

In the ControlNet (Unit 0) section, set:

  • Enable: Yes
  • Pixel Perfect: Yes
  • Control Type: Tile
  • Preprocessor: tile_resample
  • Model: control_xxxx_tile

Press Generate. After it is done, you will find the image in the batch output folder.

Make sure to open the image and inspect the details at full size. They should look sharp and have a consistent style.

Step 5: Prepare EbSynth data

Now we need to generate data to put into EbSynth.

Go to the Temporal-Kit page and switch to the Ebsynth-Process tab.

Input Folder: Put in the same target folder path you put in the Pre-Processing page. E.g. G:\temporalkit\test1

Click read last_settings. If your input folder is correct, the video and the settings will be populated.

Click prepare ebsynth. After it is done, you should see the keys folder populated with your stylized keyframes, and the frames folder populated with your images.

Step 6: Process with EbSynth

Now open the EbSynth program.

Open the File Explorer and navigate to the project folder. You should see folders like the ones shown below. We need the keys folder and the frames folder for EbSynth.

Drag the keys folder from the File Explorer and drop it to the Keyframes field in EbSynth.

Drag the frames folder from the File Explorer and drop it to the frames field in EbSynth.

After these two steps, EbSynth should be populated with the correct settings and a bunch of Synth buttons. There is one row for each keyframe. Each keyframe acts as a reference and stylizes a certain number of frames.

Click Run All and wait for them to complete.

When it is done, you should see a series of out_##### directories generated in the target project folder.

Step 7: Make the final video

Now go back to AUTOMATIC1111. You should still be on the Temporal Kit page and Ebsynth-Process tab.

Click recombine ebsynth and you are done!

Look how smooth the video is. With some tweaking, you can probably make it better!

Variations

Multiple ControlNets

Experiment with Multiple ControlNet to further fix small details and reduce flickering.

For example, you can add a second ControlNet to use reference only to fix the character’s look. I used a frame from a video generated before.

Here’s the video with lineart and reference-only ControlNets.

Other settings

Experiment with denoising strength for a trade-off between the amount of change and flickering.

Sometimes, the input video is too hard to process well with ControlNet. Try another one with a bigger, slower-moving subject.

Deflickering

Multiple ControlNet

Using multiple ControlNet to fix features in the video can significantly reduce flickering.

For example, this is with ONE ControlNet, Line art realistic.

Lineart realistic (weight: 0.6)

What if we add one more ControlNet, the Tile Resample?

Lineart realistic (weight: 0.6). Tile Resample (weight: 0.5).

Not only does it flicker less, but it also helps preserve the color of the original video.

A similar degree of deflickering can be achieved by adding the Canny ControlNet.

Lineart realistic (weight: 0.6). Canny (weight: 0.5)

The tradeoff is that it takes longer to process a video. But I think it's worth it!

Post-processing

Videos made using Stable Diffusion ControlNet still have some degree of flickering. Here are some things you can do to reduce the flickering.

DaVinci Resolve has a deflickering plugin you can easily apply to the Stable Diffusion video. Unfortunately, it is only available in the paid version (Studio).

If you are not prepared to shell out for that and are tech-savvy, use this deflickering model to process your videos.

Deterministic samplers

Use a deterministic sampler to reduce flickering.

Below is using Euler a, a stochastic sampler. (Produced with SD-CN-animation.)

The video below uses the same settings except that it uses DPM++2M Karras, a deterministic sampler.

Note her face and hair flicker less.

Some examples of deterministic samplers are

  • Euler
  • LMS
  • Heun
  • DPM++2M
  • DPM++2M Karras

See the sampler article for an overview.
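If you switch samplers often, a small lookup can remind you which kind you picked. The sketch below only knows the samplers listed above. Treating names with a standalone "a" (ancestral) or containing "SDE" as stochastic is a heuristic assumed here, and anything else is reported as unknown rather than guessed. The function name classify_sampler is hypothetical.

```python
# Samplers listed above as deterministic, stored lowercase without spaces.
LISTED_DETERMINISTIC = {"euler", "lms", "heun", "dpm++2m", "dpm++2mkarras"}


def classify_sampler(name):
    """Classify a sampler name as deterministic, stochastic, or unknown."""
    tokens = name.lower().split()
    # Heuristic (assumption): ancestral ("a") and SDE samplers add fresh noise each step.
    if "a" in tokens or any("sde" in token for token in tokens):
        return "stochastic"
    if "".join(tokens) in LISTED_DETERMINISTIC:
        return "deterministic"
    return "unknown"


for sampler in ("Euler a", "Euler", "DPM++2M Karras", "DPM++ 2M Karras"):
    print(sampler, "->", classify_sampler(sampler))
```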

Other video-to-video Options

EbSynth

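EbSynth is used to paint over a video, either manually or with AI image generators such as Stable Diffusion.

With Stable Diffusion, the keyframes are first stitched into one sheet (4 keyframes in this example) and transformed together with img2img. You will then dice the image back to 4 individual images and use them as keyframes in EbSynth.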

The reason to go through this process is to improve the consistency across the keyframes. If the keyframes were transformed with img2img individually, they would normally have too much variation.

The EbSynth method is best executed with the Temporal Kit.

Deforum

Deforum has a video-to-video function with ControlNet. (Writing about this soon.)

Stable WarpFusion

Stable WarpFusion is a paid Colab Notebook that produces amazing video transformations with custom algorithms.
