This ComfyUI workflow turns a single image and a voice recording into a lip-synced talking video using the LTX-2.3 model. You load a portrait with LoadImage and provide speech via LoadAudio or capture it live with RecordAudio. Both streams feed the LTX-2.3 generator node (98ee9e5b-467b-40aa-a534-36033f27d0b4), which synthesizes a sequence of frames where the subject speaks in time with the audio. The resulting frames are encoded to an MP4 using SaveVideo.
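If you export this graph in ComfyUI's API format, the load and save steps above can also be set from a script. The following is a minimal sketch, not part of the workflow itself. It assumes the exported JSON is a dictionary of node IDs mapping to `class_type` and `inputs`, and that `LoadImage`, `LoadAudio`, and `SaveVideo` take their usual `image`, `audio`, and `filename_prefix` inputs. The helper names, file names, and server address are illustrative assumptions.

```python
import copy
import json
import urllib.request


def set_inputs(workflow, image_name, audio_name, prefix="video/talking"):
    """Return a copy of an API-format workflow with new loader and saver inputs.

    image_name and audio_name are assumed to be files already present in
    ComfyUI's input directory.
    """
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        class_type = node.get("class_type")
        inputs = node.setdefault("inputs", {})
        if class_type == "LoadImage":
            inputs["image"] = image_name
        elif class_type == "LoadAudio":
            inputs["audio"] = audio_name
        elif class_type == "SaveVideo":
            inputs["filename_prefix"] = prefix
    return patched


def build_request_body(workflow):
    """Wrap the workflow in the JSON body expected by the /prompt endpoint."""
    return json.dumps({"prompt": workflow}).encode("utf-8")


def queue_workflow(workflow, server="http://127.0.0.1:8188"):
    """Queue the workflow on a running ComfyUI server.

    The address is an assumption (a local server on the default port).
    """
    request = urllib.request.Request(
        server + "/prompt",
        data=build_request_body(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

A typical use would be to load the exported file with `json.load`, call `set_inputs(workflow, "portrait.png", "speech.wav")`, and pass the result to `queue_workflow`. The LTX-2.3 generator node is left untouched here because its input names depend on the build you have installed.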

Under the hood, LTX-2.3 conditions on the visual identity from your reference image and the temporal features of the provided audio to drive mouth shapes and subtle facial motions over time. The node typically exposes settings like output resolution and FPS, and many builds also include a seed for reproducibility and an optional text prompt to guide motion or style. The MarkdownNote in the graph documents quick tips and links to the official Lightricks model repositories so you can download the required weights.
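Because the video follows the audio, the FPS setting and the clip length together determine how many frames have to be generated. The sketch below shows only that arithmetic. The function name is hypothetical, and any extra frame-count constraint your LTX-2.3 build may impose is not modeled.

```python
import math


def frames_needed(audio_seconds, fps):
    """Number of frames required to cover the whole audio clip at a given FPS."""
    if audio_seconds < 0 or fps <= 0:
        raise ValueError("audio_seconds must be >= 0 and fps must be > 0")
    # Round up so the last partial frame interval is still covered.
    return math.ceil(audio_seconds * fps)


# A 4-second recording at 25 FPS needs 100 frames.
print(frames_needed(4, 25))
```

Doubling either the FPS or the audio length doubles the frame count, which is worth keeping in mind when choosing settings for longer recordings.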