
This ComfyUI tutorial workflow demonstrates how to generate controllable videos with Wan2.2 Fun Control (14B), guided by pose, depth, and edge inputs. At its core, the Wan22FunControlToVideo node fuses your text prompt from CLIPTextEncode with one or more control streams derived from a reference video. The model stack loads via UNETLoader and VAELoader, while CLIPLoader and CLIPTextEncode provide multilingual text conditioning. Sampling runs through ModelSamplingSD3 with KSamplerAdvanced to balance quality, speed, and control strength. Finally, VAEDecode reconstructs frames and SaveVideo exports the result.
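The graph described above can be sketched in ComfyUI's API (JSON) format. This is a minimal, hypothetical fragment: the node IDs, input field names, and checkpoint filenames are illustrative assumptions, not taken from a real export — use "Save (API Format)" in ComfyUI to get the exact field names for your install. A small helper checks that every node-to-node link points at a node that actually exists in the graph.

```python
# Hypothetical sketch of the core Wan2.2 Fun Control graph in ComfyUI's
# API (JSON) format. IDs, input names, and filenames are assumptions.
workflow = {
    "1": {"class_type": "UNETLoader",
          "inputs": {"unet_name": "wan2.2_fun_control_14B.safetensors"}},
    "2": {"class_type": "CLIPLoader",
          "inputs": {"clip_name": "umt5_xxl.safetensors", "type": "wan"}},
    "3": {"class_type": "VAELoader",
          "inputs": {"vae_name": "wan_vae.safetensors"}},
    "4": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["2", 0],  # [source_node_id, output_slot]
                     "text": "a dancer on a rooftop at dusk"}},
    "5": {"class_type": "ModelSamplingSD3",
          "inputs": {"model": ["1", 0], "shift": 8.0}},
    "6": {"class_type": "Wan22FunControlToVideo",
          "inputs": {"positive": ["4", 0], "vae": ["3", 0],
                     "width": 768, "height": 768, "length": 81}},
}

def dangling_links(graph):
    """Return (node_id, input_name) pairs whose [source_id, slot] link
    references a node id that is missing from the graph."""
    bad = []
    for nid, node in graph.items():
        for name, value in node["inputs"].items():
            if isinstance(value, list) and value[0] not in graph:
                bad.append((nid, name))
    return bad

print(dangling_links(workflow))  # → []
```

In a full export, KSamplerAdvanced, VAEDecode, and SaveVideo would continue the chain in the same link style; the validation helper is useful whenever you edit API-format JSON by hand.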
You can preprocess control videos directly in the graph: LoadVideo and GetVideoComponents ingest your footage, Canny extracts edges, and optional custom nodes (comfyui_controlnet_aux and ComfyUI-DepthAnythingV2) add pose and depth maps. CreateVideo sets resolution, frame count, and FPS to match Wan2.2's training regime (multi-resolution 512/768/1024, 81 frames at 16 FPS). The Start_image group lets you anchor the first frame with LoadImage for a consistent subject and look. An optional LoraLoaderModelOnly applies the Wan2.2 Lightning LoRA for faster renders at the cost of reduced motion dynamics. The result is a flexible, reproducible video-to-video pipeline with reliable structure preservation and style control.
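When matching a source clip to that training regime, it helps to compute the CreateVideo settings up front. The helper below is a hypothetical utility (not a ComfyUI node): it snaps the long side to the nearest 512/768/1024 bucket, rounds both sides to a multiple of 16 (a common latent-size constraint, assumed here), and caps the clip at 81 frames for 16 FPS playback.

```python
# Hypothetical helper for choosing CreateVideo settings that match
# Wan2.2's training regime (512/768/1024 buckets, 81 frames, 16 FPS).
def control_video_settings(src_w, src_h, src_frames):
    buckets = (512, 768, 1024)
    long_side = max(src_w, src_h)
    # Snap the long side to the nearest resolution bucket.
    target = min(buckets, key=lambda b: abs(b - long_side))
    scale = target / long_side
    # Round each side to a multiple of 16 (assumed latent constraint).
    width = int(round(src_w * scale / 16)) * 16
    height = int(round(src_h * scale / 16)) * 16
    # 81 frames at 16 FPS is about 5 seconds; trim longer sources.
    return {"width": width, "height": height,
            "fps": 16, "frame_count": min(src_frames, 81)}

print(control_video_settings(1920, 1080, 300))
# → {'width': 1024, 'height': 576, 'fps': 16, 'frame_count': 81}
```

For a 1080p source, the long side snaps to the 1024 bucket and the clip is trimmed to 81 frames; feeding these values into CreateVideo keeps the control stream aligned with what the model saw during training.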