How to lip sync onto ANY Image using AI 🔥 Comment “AI” to try this on @heygen_official #HeyGen #AIStorytelling #AIAvatars #GenerativeAI
Why @rourke's HeyGen AI Lip-Sync Video Went Viral — and the Formula Behind It
This Instagram Reel by @rourke is a masterclass in demonstrating AI capabilities while maintaining a highly personal, creator-driven aesthetic. The video seamlessly blends high-quality, cinematic AI-generated portraits—ranging from a snowy mountain explorer to an astronaut and a futuristic spaceship passenger—with a practical, over-the-shoulder style software tutorial. The core visual hook relies on a dynamic three-panel split screen that immediately establishes the "mind-blowing" quality of the AI lip-sync technology. By having the creator converse with different AI-generated versions of himself, the video transforms a standard software walkthrough into an engaging narrative about the future of storytelling. The transition from these cinematic hooks to a clear, picture-in-picture (PiP) screen recording of the HeyGen interface grounds the magic in reality, showing viewers exactly how achievable this is. The strategic use of an auto-DM call-to-action ("Comment 'AI'") acts as rocket fuel for the algorithm, resulting in a massive comment-to-like ratio that guarantees extended reach.
2. What You're Seeing: Visual & Audio Breakdown
The video is structured in two distinct halves: the "Hook & Proof" phase and the "Execution" phase. The creator maintains a consistent casual aesthetic (baseball cap, relaxed clothing) even when transformed into different characters. The lighting in the AI scenes is highly cinematic—notice the dramatic rim lighting on the silhouette and the cool, motivated blue light in the spaceship scene. The editing is snappy, with cuts landing perfectly on conversational beats between the creator and his AI clones. The audio mix is clean, with the AI voices matching the creator's natural cadence but adapted slightly to fit the visual context (e.g., sounding slightly older or muffled in a space suit).
Shot-by-Shot Breakdown
| Time Range | Visual Content | Shot Language | Lighting & Color Tone | Viewer Intent |
|---|---|---|---|---|
| 00:00 - 00:04 | 3-panel split screen: creator in snow, in a grassy field, and holding a vintage camera. Bold "AI Lip Sync" text appears. | Medium close-ups, static framing within panels. | Bright, natural daylight. High contrast in the snow scene, warm greens in the grass scene. | Immediate visual hook; establishes the high quality and variety of the AI output. |
| 00:04 - 00:08 | Creator as an astronaut in space, Earth in the background. | Close-up, slight slow push-in. | High-contrast space lighting, stark white highlights on the suit, deep black background. | Demonstrate extreme contextual flexibility of the tool. |
| 00:08 - 00:12 | Older version of the creator looking down, then smiling at the camera. | High angle shifting to eye-level medium shot. | Studio lighting, soft overhead key, neutral grey background. | Showcase emotional range and age progression capabilities. |
| 00:12 - 00:16 | Creator looking out a circular spaceship window. HeyGen logo subtly integrated. | Medium shot, framing within a frame (the window). | Cool, cinematic blue tones, motivated light from the window. | Build narrative intrigue and transition to the sponsor/tool reveal. |
| 00:16 - 00:30 | Screen recording of HeyGen UI with creator in a PiP box at the bottom. | UI capture + static medium close-up for PiP. | Warm, practical room lighting for the PiP; standard dark mode UI colors. | Provide actionable, step-by-step proof that the tool is easy to use. |
| 00:30 - 00:35 | Final generated video of a silhouette lip-syncing against an orange sun, with PiP creator wrapping up. | Profile silhouette, static. | Vibrant, saturated orange background, deep black silhouette. | Deliver the final payoff and issue the engagement CTA. |
3. Why It Went Viral: The Mechanics of Engagement
Topic Selection & Psychological Triggers
This video taps directly into the current zeitgeist of generative AI, specifically addressing a major pain point for creators: scaling content production without losing personal identity. The topic appeals to a broad spectrum of indie creators, marketers, and tech enthusiasts. Psychologically, it leverages the "curiosity gap" right from the start. By showing three distinct, high-quality environments simultaneously, the viewer's brain is forced to process how this was achieved. The conversational format—where the creator talks to his own AI avatars—adds a layer of novelty and mild surrealism that keeps eyes glued to the screen. It moves beyond a dry software tutorial into a mini-narrative about the "unlock for storytelling."
The Power of the "Clone" Interaction
The interaction between the real creator and his AI counterparts is a brilliant retention tactic. When the astronaut asks, "So what you're telling me is I can lip-sync onto any AI-generated image of me?" and the older avatar replies, "That's exactly right, Rourke," it creates a dynamic dialogue that is much more engaging than a standard voiceover. This playful interaction proves the tool's effectiveness in real-time while injecting personality, making the tech feel accessible rather than intimidating.
Platform Signals & Algorithmic Push
From an Instagram algorithm perspective, this Reel is perfectly engineered. The first three seconds feature rapid visual changes (the split screen) and bold text, preventing the user from scrolling past. The pacing is relentless; there is no dead air. However, the biggest algorithmic driver is the CTA: "Comment 'AI' to try this." This strategy shifts the viewer from passive consumption to active participation. With over 1,450 comments, the platform receives a massive signal that this content is highly engaging, prompting it to push the video to a wider lookalike audience on the Explore page.
5 Testable Viral Hypotheses
- The Multi-Panel Hook: Evidence: The 0-3 second mark uses a 3-way split screen. Mechanism: High visual density overwhelms the urge to scroll, forcing the viewer to pause and process the image. Replication: Start your next tutorial with a split-screen showing the "before," "during," and "after" simultaneously.
- Conversational Proof: Evidence: The creator's avatars talk to each other to explain the value prop. Mechanism: Dialogue is inherently more engaging than monologue; it demonstrates the product's capability (lip-sync) while delivering the marketing message. Replication: Use the tool you are promoting to deliver the hook of the video itself.
- The "Over-the-Shoulder" Transition: Evidence: At 00:16, the cinematic visuals drop away to reveal a standard screen recording. Mechanism: This grounds the "magic" in reality, proving it's not just VFX trickery but a usable software interface, building trust. Replication: Always follow up high-end results with the raw, unpolished UI steps to achieve them.
- Frictionless Engagement Bait: Evidence: "Type AI in the comments and I'll send you a link." Mechanism: Auto-DM tools (like ManyChat) turn comments into a lead generation engine while simultaneously boosting the post's algorithmic ranking. Replication: Replace "Link in bio" with a specific keyword comment CTA in your next three posts.
- Consistent Persona Anchoring: Evidence: Despite changing environments and ages, the creator always wears a similar cap and maintains his facial hair style. Mechanism: Visual consistency helps the viewer track the identity of the subject through rapid cuts and AI transformations. Replication: When using AI avatars, keep one signature accessory or color palette constant across all generations.
4. How to Recreate: Step-by-Step Tutorial
Step 1: Define Your Visual Anchors
Before generating images, decide on 1-2 visual anchors that will make your AI avatars recognizable as "you." In this video, it's the creator's beard and the baseball cap. Take a clear, well-lit reference photo of yourself to use as an image prompt in tools like Midjourney.
Step 2: Generate Your Avatar Scenes
Use an AI image generator (Midjourney v6 is recommended for photorealism) to create your base images. Use prompts that specify cinematic lighting and distinct environments. Example: "Cinematic portrait of a 30-year-old man with a short beard wearing a baseball cap, inside a futuristic spaceship looking out a circular window, cool blue lighting, photorealistic --ar 9:16 --cref [URL to your photo]".
Step 3: Script the "Clone" Dialogue
Write a script where different versions of your avatar converse. Keep it punchy. Assign specific lines to specific generated images to create a logical flow (e.g., Avatar A asks a question, Avatar B answers).
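A simple way to keep this organized is to store the script as structured data, so each spoken line stays tied to the generated image (and, later, the audio file) it belongs to. Here is a minimal sketch in Python; the avatar names and filenames are placeholders, not assets from the video:

```python
# Each entry ties one spoken line to the generated image that will
# deliver it in the photo-to-video tool.
script = [
    {"avatar": "astronaut", "image": "astronaut.png",
     "line": "So what you're telling me is I can lip-sync onto any AI-generated image of me?"},
    {"avatar": "older_self", "image": "older_self.png",
     "line": "That's exactly right, Rourke."},
]

# Quick sanity check: every beat has an image to animate.
for beat in script:
    assert beat["image"], f"missing image for {beat['avatar']}"

# Print a shot list in speaking order.
for i, beat in enumerate(script, 1):
    print(f"{i}. [{beat['avatar']} / {beat['image']}] {beat['line']}")
```

Keeping the script in this shape also makes Step 4 easier: you can record and name one audio file per entry and never lose track of which clip belongs to which image.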
Step 4: Record Your Audio Assets
Record the voiceover for each avatar. You can use your natural voice, or slightly alter your pitch/tone for different characters. Export these as separate, high-quality WAV or MP3 files.
Step 5: Animate in HeyGen
Log into HeyGen (or a similar AI video platform). Navigate to the "Photo to Video" section. Upload your first Midjourney image and the corresponding audio clip.
Step 6: Apply Custom Motion Prompts
To avoid the "stiff AI" look, utilize custom motion settings. As shown in the video, type commands like "Hand gestures" or "Subtle head nods" into the advanced settings to give the avatar natural micro-movements.
Step 7: Record the UI Walkthrough (A-Roll)
Set up your camera or webcam to record yourself explaining the process. Use screen recording software (like OBS or QuickTime) to capture your screen while you navigate the HeyGen interface. Wear a consistent outfit (like the brown hoodie) for this segment.
Step 8: Edit and Assemble
Bring all assets into your NLE (Premiere Pro, CapCut, etc.). Build the 3-panel split screen for the hook. Cut the AI avatar clips together so the dialogue flows naturally. Overlay your A-roll camera feed as a Picture-in-Picture over the screen recording.
Step 9: Add Dynamic Subtitles
Use auto-captioning tools to add bold, easy-to-read subtitles. Highlight key words (like "Lip Sync" or "Storytelling") with different colors or pop-in animations to retain attention.
Step 10: Implement the Auto-DM CTA
Set up an automation using a tool like ManyChat. Create a flow that triggers when someone comments your chosen keyword (e.g., "AI"), automatically sending them a direct message with your affiliate link or tutorial guide.
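ManyChat configures this with a point-and-click flow builder, but the underlying behavior is just keyword matching on incoming comments. Here is a minimal sketch of that logic in Python; the function names, reply text, and link are illustrative placeholders, not ManyChat's API:

```python
import re

KEYWORD = "AI"
# Placeholder reply; swap in your actual guide or affiliate link.
REPLY = "Here's the step-by-step guide: https://example.com/guide"

def should_trigger(comment_text: str, keyword: str = KEYWORD) -> bool:
    """True if the comment contains the trigger keyword as a standalone
    word, case-insensitively ('ai', 'AI!', and 'love this, AI' all match)."""
    words = re.findall(r"[A-Za-z0-9]+", comment_text.lower())
    return keyword.lower() in words

def build_dm(comment_text: str):
    """Return the DM to send, or None if the comment shouldn't trigger one."""
    return REPLY if should_trigger(comment_text) else None

print(build_dm("AI"))          # triggers the DM
print(build_dm("ai please!"))  # triggers the DM
print(build_dm("amazing"))     # no trigger: 'ai' inside a word doesn't count
```

Matching whole words rather than substrings matters: a naive `"ai" in comment` check would fire on comments like "amazing", spamming DMs to people who never opted in.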
5. Growth Playbook: Hooks, Captions & Tags
3 Ready-to-Use Opening Hooks
- "I just found the ultimate cheat code for faceless channels, and it’s terrifyingly good."
- "Stop recording talking head videos until you’ve seen what this AI can do."
- "How I cloned myself into 5 different cinematic scenes in under 10 minutes."
4 Caption Templates
- The Direct Value Drop: "This AI tool is a massive unlock for storytelling. 🤯 You can now lip-sync your voice onto ANY generated image with perfect accuracy. Want the exact workflow I used? Comment 'TUTORIAL' below and I’ll DM you the step-by-step guide."
- The Time-Saver Angle: "Spent 5 hours filming yesterday? I made this entire video in 15 minutes without leaving my desk. ⏱️ AI avatars are getting too realistic. Drop a '🤖' in the comments if you want the link to try it for free."
- The Creator Secret: "The biggest creators aren't telling you about this workflow. By combining Midjourney with this lip-sync tech, you can create Hollywood-level scenes on an indie budget. Comment 'LINK' and I'll send you my exact prompt formula."
- The Question & Hook: "Is this the end of traditional filming? 🎥 When you can turn a single photo into a fully expressive video, the game changes completely. What would you create with this? Let me know below, or comment 'AI' to test the tool yourself."
Hashtag Strategy
- Broad (Algorithm Categorization): #ArtificialIntelligence #ContentCreation #VideoEditing #CreatorEconomy (Tells the platform the general industry of your video).
- Mid-Tier (Target Audience): #AIToolsForCreators #VideoMarketingTips #FacelessChannel #HeyGenTutorial (Targets creators actively looking for workflow improvements).
- Niche Long-Tail (Search Intent): #AILipSync #PhotoToVideoAI #MidjourneyToVideo #AIStorytelling (Captures users searching for specific technical solutions shown in the video).
6. Frequently Asked Questions
What tools make it look the most similar to this video?
Use Midjourney v6 for generating the base images with a character reference (--cref), and HeyGen for the photo-to-video lip-syncing.
What are the 3 most important words in the image prompt?
"Cinematic," "Photorealistic," and specific lighting cues like "Motivated light" or "Rim lighting."
Why does my generated face look inconsistent across scenes?
You need to use a consistent base image as a character reference in your image generator, and maintain key visual anchors like a specific hat or glasses.
How can I avoid making the lip-sync look like stiff AI?
Utilize the "Custom motion prompt" feature in HeyGen (e.g., typing "hand gestures" or "subtle nods") to add secondary motion beyond just the lips.
Is it easier to go viral on Instagram or TikTok with this type of content?
Currently, Instagram Reels heavily rewards the "Comment [Keyword]" auto-DM strategy used here, making it highly effective for engagement farming on that platform.
How should I properly disclose AI use for this type of content?
Use platform-specific AI disclosure labels (like Instagram's "Made with AI" tag) and clearly state it in the first line of your caption to maintain audience trust.