[ComfyUI] AnimateDiff + ControlNet Keyframe + Prompt Travel (English ver.)


While Prompt Travel is effective for creating animations, it can be challenging to control precisely. To address this, I've gathered information on operating ControlNet KeyFrames.




Prompt Travel Overview

Prompt Travel has gained popularity, especially with the rise of AnimateDiff. Here is a brief introduction to what it is. When animating with AnimateDiff, Prompt Travel lets us write prompts in a specific format that assigns different prompts to different frames.

It primarily consists of three parts:

  1. Head Prompt
  2. Frames Prompt
  3. Tail Prompt

When writing it, it looks something like this:

(best quality:1.2), ultra highres, 8k, vibrant, adult girl,
"0": "(close mouth:1.4), looking aside",
"12": "smile:1.3, looking at viewer",
"24": "smile:1.3, looking aside"
sitting, coffeeshop

The head prompt is,

(best quality:1.2), ultra highres, 8k, vibrant, adult girl,

The frames prompt is shown next. Note that there is no comma (,) after its last entry.

"0": "(close mouth:1.4), looking aside",
"12": "smile:1.3, looking at viewer",
"24": "smile:1.3, looking aside"

And the tail prompt is,

sitting, coffeeshop

When using AnimateDiff, it combines the Head Prompt with each frame's prompt and finally adds the Tail Prompt to create your specified frame's complete prompt. In other words, each frame's prompt will look like this,

Nth Frame Prompt = Head Prompt + Nth Frame's Prompt + Tail Prompt

This should help you understand the Prompt Travel writing style.
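
As a rough illustration of the formula above, here is a minimal Python sketch. It is not the code of AnimateDiff or of any ComfyUI node; the helper name `build_frame_prompts` is hypothetical, and the sketch assumes the frames block can be parsed as the body of a JSON object. Whatever the scheduler does between keyframes is not modeled here.

```python
import json

def build_frame_prompts(head, frames_block, tail):
    """Combine head + per-frame prompt + tail for every keyframe."""
    # Assumption: the frames block is the body of a JSON object without braces.
    # A trailing comma after the last entry would make json.loads fail.
    frames = json.loads("{" + frames_block + "}")
    full = {}
    for index, frame_prompt in frames.items():
        parts = [head.strip().strip(","), frame_prompt.strip(), tail.strip().strip(",")]
        full[int(index)] = ", ".join(p for p in parts if p)
    return full

head = "(best quality:1.2), ultra highres, 8k, vibrant, adult girl,"
frames_block = '''
"0": "(close mouth:1.4), looking aside",
"12": "smile:1.3, looking at viewer",
"24": "smile:1.3, looking aside"
'''
tail = "sitting, coffeeshop"

prompts = build_frame_prompts(head, frames_block, tail)
for index in sorted(prompts):
    print(index, "->", prompts[index])
```

Under this assumption, a stray comma after the last frame entry makes the parse fail, which is one way to see why the comma rule matters.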


ComfyUI & Prompt Travel

To use Prompt Travel in ComfyUI, it is recommended to install the following plugin:

It provides a convenient feature called Batch Prompt Schedule.

Batch Prompt Schedule

If you solely use Prompt Travel for creation, the visuals are essentially generated freely by the model based on your prompts. Afterward, you rely on the capabilities of the AnimateDiff model to connect the produced images.

Prompt Travel Simple Workflow

Of course, such a connecting method may result in some unnatural or jittery transitions.

Prompt Travel Failed

So, to avoid this situation, we need to intervene with ControlNet to try to achieve the desired results.


CLI Enthusiast's Paradise

This project allows for generating good results via the command line,

AnimateDiff prompt travel

If you are an engineer or not averse to the command line and modifying JSON files, you can give it a try. Since it takes care of many things for you, you don't need to spend effort adjusting ComfyUI. Additionally, you can quickly get test results. However, it has a relatively high VRAM requirement, so if it doesn't run, you may need to reduce the usage of ControlNets in its configuration file.

This tool offers a wide range of features, and in some aspects, it currently surpasses what can be done in ComfyUI. Trying to replicate it in ComfyUI would probably result in over a hundred nodes.


ControlNet & KeyFrames

Someone in the bird nest group mentioned controlling keyframes, so I looked for ways to do it. Coincidentally, there is a related article on Civitai ("C Station"),

Animatediff Workflow: Openpose Keyframing in ComfyUI

After a quick look, I summarized some key points. First, the placement of ControlNet remains the same. However, we use this tool to control keyframes,

ComfyUI-Advanced-ControlNet

ControlNet Latent keyframe Interpolation

We will use the following two tools,

  • Timestep Keyframe passes the keyframe settings on to ControlNet and returns TIMESTEP_KEYFRAME.
    • latent_keyframe Input latent keyframe, the only input we currently need.
  • Latent Keyframe Interpolation interpolates keyframes in latent space and returns them to Timestep Keyframe.
    • prev_latent_keyframe Previous latent keyframe.
    • batch_index_from Starting batch index.
    • batch_index_to_excl Ending batch index (exclusive).
    • strength_from Starting strength.
    • strength_to Ending strength.
    • interpolation Interpolation function to use from start to end, with four options: linear, ease-in, ease-out, ease-in-out.
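
To make these parameters more concrete, here is a minimal Python sketch of how a strength could be spread over a range of batch indices. It is only an illustration, not the code of ComfyUI-Advanced-ControlNet: the function name `interpolate_strengths` is hypothetical, and the easing formulas are assumptions chosen to match the four option names.

```python
import math

def interpolate_strengths(batch_index_from, batch_index_to_excl,
                          strength_from, strength_to, interpolation="linear"):
    """Return {batch_index: strength} for the half-open range [from, to_excl)."""
    steps = batch_index_to_excl - batch_index_from
    if steps <= 0:
        return {}
    # Easing curves on t in [0, 1]; these formulas are assumptions.
    curves = {
        "linear": lambda t: t,
        "ease-in": lambda t: t ** 2,
        "ease-out": lambda t: 1 - (1 - t) ** 2,
        "ease-in-out": lambda t: (1 - math.cos(math.pi * t)) / 2,
    }
    ease = curves[interpolation]
    result = {}
    for step in range(steps):
        t = step / (steps - 1) if steps > 1 else 0.0
        result[batch_index_from + step] = (
            strength_from + (strength_to - strength_from) * ease(t)
        )
    return result

for index, strength in interpolate_strengths(0, 5, 1.0, 0.2, "linear").items():
    print(index, round(strength, 3))
```

Because batch_index_to_excl is exclusive, a range of 0 to 5 covers batch indices 0 through 4.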

After understanding the basic inputs and outputs, let's revisit this configuration. You will notice that I performed Latent Keyframe Interpolation twice and chained the two nodes, so that the first feeds into the second. This approach is also taken from the C Station article. Its main purpose is to avoid abrupt changes during keyframe processing by letting you adjust the entering and exiting interpolations separately.

2-pass Interpolation

The part inside the red box represents the interpolation operation done for "entering" this keyframe region, while the yellow box indicates the interpolation operation to be performed when "leaving" this keyframe region.

So we have three numbers, Start, Middle, and End, which define two ranges:

  • Start ~ Middle represents the frames during which the red interpolation operation takes place.
  • Middle ~ End represents the frames during which the yellow interpolation operation takes place.

In this way, a ControlNet can be controlled for this keyframe.
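
The entering and leaving passes can be sketched in Python as two chained ramps. This is only an illustration under stated assumptions: the helper names `ramp` and `two_pass`, the linear ramps, the frame numbers 0, 4, and 8, and the strengths 0.0 and 1.0 are all hypothetical, and the sketch assumes the second pass simply adds its keyframes to those from the first.

```python
def ramp(index_from, index_to_excl, strength_from, strength_to):
    """Linear strengths for batch indices in [index_from, index_to_excl)."""
    steps = index_to_excl - index_from
    if steps <= 0:
        return {}
    span = max(steps - 1, 1)
    return {
        index_from + i: strength_from + (strength_to - strength_from) * i / span
        for i in range(steps)
    }

def two_pass(start, middle, end, peak=1.0, floor=0.0):
    """Chain an 'entering' ramp (start..middle) with a 'leaving' ramp (middle..end)."""
    schedule = ramp(start, middle, floor, peak)      # entering: strength rises
    schedule.update(ramp(middle, end, peak, floor))  # leaving: strength falls
    return schedule

schedule = two_pass(start=0, middle=4, end=8)
for index in sorted(schedule):
    print(index, round(schedule[index], 2))
```

Moving Middle closer to Start or End changes how quickly the ControlNet fades in or out around the keyframe.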

ControlNet with timestep_keyframe

The ControlNet above represents the following:

  1. Inject the OpenPose from frames 0 ~ 5 into my Prompt Travel.
  2. The strength of this keyframe undergoes an ease-out interpolation.
  3. The strength decreases from 1.0 to 0.2 and then ends.
  4. The subsequent frames are left for Prompt Travel to continue its operation.

This way, we can control part of the keyframes so that they follow the ControlNet's guidance. In the example above, that guidance is the OpenPose skeleton. Within these frames, Prompt Travel generates results that follow the specified OpenPose.

Keyframes with OpenPose

So, what if we have many keyframes to control?

It would look like this:

5 ControlNet's Timestep_Keyframe

The workflow above uses two ControlNets for each keyframe, one of which is bypassed (ByPass), shown in purple. In other words, there are 10 ControlNets in total, and all 10 exist solely to control 5 keyframe sections.

Feeling the VRAM shiver?

Enabling all of them would require 15GB VRAM.

Here's an example output video for reference,

Prompt Travel with ControlNet


Conclusion

This is a simple introduction, and the actual operation depends on your creativity. My entire workflow is available here for those interested in exploring it.

Prompt_Travel_5Keyframes_10CN_5pass

https://github.com/hinablue/comfyUI-workflows/blob/main/Prompt_Travel_5Keyframes_10CN_5pass.png

Tips

If you want to use my workflow, pay attention to a few things,

  1. If the video magnification factor is not 2, saving as video/h264-mp4 may fail with errors such as "width not divisible by 2".
  2. The higher the multiplier inside RIFE VFI, the slower the video plays.
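
The second tip comes down to simple arithmetic, sketched below in Python. The sketch assumes that frame interpolation produces roughly the source frame count times the multiplier and that the output frame rate stays unchanged; the function name `playback_seconds` and the numbers 32 frames and 8 fps are illustrative only.

```python
def playback_seconds(source_frames, multiplier, frame_rate):
    """Approximate clip length after frame interpolation at a fixed frame rate."""
    # Assumption: interpolation yields about source_frames * multiplier frames.
    return source_frames * multiplier / frame_rate

for multiplier in (1, 2, 4):
    seconds = playback_seconds(source_frames=32, multiplier=multiplier, frame_rate=8)
    print(multiplier, seconds)
```

Under these assumptions, raising the output frame rate by the same factor as the multiplier would keep the playback speed unchanged.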

If you find that the video cannot be saved because its size is not divisible by 2, you can modify the file custom_nodes/ComfyUI-AnimateDiff-Evolved/video_formats/h264-mp4.json so that it reads as follows,

{
    "main_pass":
    [
        "-n", "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        "-crf", "19",
        "-vf", "\"pad=ceil(iw/2)*2:ceil(ih/2)*2\""
    ],
    "extension": "mp4"
}

The important part is the added "-vf", "\"pad=ceil(iw/2)*2:ceil(ih/2)*2\"" entry. It tells ffmpeg to add padding to your video (the pad filter fills with black by default) so that its width and height are both divisible by 2, allowing the video to be saved normally.
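
To see what the pad expression computes, here is a small Python sketch of the same arithmetic. The function name `padded_size` is hypothetical; it only mirrors ceil(iw/2)*2 and ceil(ih/2)*2 and does not call ffmpeg.

```python
import math

def padded_size(width, height):
    """Round each dimension up to the nearest even number, as the pad expression does."""
    return math.ceil(width / 2) * 2, math.ceil(height / 2) * 2

print(padded_size(512, 768))   # already even, so unchanged
print(padded_size(769, 1153))  # odd sizes grow by one pixel each
```

Even dimensions pass through untouched, so the filter only affects videos that would otherwise fail to encode.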

Hina Chen
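A patient with paranoia and obsessive-compulsive tendencies; not quite beyond cure, it's just that I have already found my good doctor.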
Taipei