0:00 / 0:00

Comment “AI” for the 🍌 Pro Prompt👇 Finally! I’ve been trying to lock in my perfect AI twin for months! Thank me later 🙋‍♀️ Yes - of course you can just use a selfie as a reference for your image generator but the output isn’t consistent. It’s better to lock in an AI version of you that actually looks like you. That’s what I’ve been able to do with Nano Banana Pro and this optimized prompt. Just saying - new things are possible with Google’s Nano Banana Pro. #aitwin #nanobananapro #nanobananaproprompts

Gunhild Johanne Reumert

@shedoesai · Digital creator

INSTAGRAM · 2025-12-12Source

205likes

610comments

Remix This

Recreate with Runway Gen 4.5

Make your own AI viral video

Prompt

WORKFLOW
A) MISE EN PLACE
1) Segment the video into scenes/shots:
- [00:00–00:05] Single continuous shot (A composite split-screen showing two distinct scenes simultaneously).

2) Extract visual evidence:
- Keyframes: 0s, 2s, 4s.
- Left Panel: Caucasian woman, early 30s, blonde hair in a messy ponytail, wearing a mustard-yellow zip-up bomber jacket over a black top. Sitting outdoors at a cafe, daylight, string lights in the blurred background. She is laughing.
- Right Panel: Same woman, identical hair and wardrobe. Sitting indoors at a bar, warm directional lighting, amber bokeh in the background. She is holding a pint glass of beer and taking a sip.
- Overlays: White sans-serif text at the top and bottom.

3) Extract speech evidence:
- No speech. Audio is likely a trending BGM track.

4) Create an "invariants list" (LOCK THESE):
- visuals: The split-screen layout (left/right). The exact appearance of the woman (facial features, blonde ponytail, mustard jacket, black shirt). The static camera framing (MCU) on both sides. The text overlays.
- speech: N/A.

5) Create a "variables list" (TWEAK THESE):
- visuals: The micro-expressions of the laugh on the left. The liquid movement inside the beer glass on the right. The subtle background motion (patrons, bokeh shimmer).

B) SHOTLIST
- shot_id: 1
- timecode_start: 00:00
- timecode_end: 00:05
- duration: 5s
- framing: Split-screen. Both sides are Medium Close-Up (MCU), eye-level camera.
- lens: 50mm equivalent feel, shallow depth of field, creamy bokeh on both sides.
- camera movement: Static on both sides.
- subject: Left: Laughing naturally, slight shoulder movement. Right: Bringing a beer glass to her lips, taking a sip, maintaining eye contact.
- environment: Left: Outdoor cafe, daytime. Right: Indoor bar, evening.
- lighting: Left: Soft, overcast natural daylight. Right: Warm, moody practical lights, directional key light on the face.
- color grade: Warm overall tint, high contrast between the cool/neutral left and the amber/orange right.
- motion cues: Left: Subtle hair movement in the breeze. Right: Liquid dynamics in the glass.
- SPEECH / AUDIO:
- speech_present: false

C) STYLE BIBLE
- visual_style: Cinematic UGC / High-end lifestyle B-roll.
- camera_signature: Locked-off tripod feel, shallow depth of field to isolate the subject.
- lighting_signature: Motivated lighting (natural outdoors vs. practical indoors).
- grade_signature: Warm, filmic, rich skin tones, vibrant mustard yellow.
- texture_signature: Photorealistic, sharp subject with soft, pleasing background blur.
- pacing_signature: Slow, deliberate motion suitable for looping.

D) PROMPT SYNTHESIS

MASTER PROMPT
GLOBAL LOCK: A vertical 9:16 split-screen video divided exactly down the middle. On both sides, the exact same subject is featured: a 30-year-old Caucasian woman with blonde hair pulled back into a messy ponytail, wearing a distinctive mustard-yellow zip-up bomber jacket over a black t-shirt. The camera is static on both sides, framed as a Medium Close-Up (MCU) with a shallow depth of field. The top of the video features bold white sans-serif text: "STEP 5: ANIMATE YOUR VIDEOS AS B-ROLL OR TALKING HEAD VIDEOS". The bottom features text: "Animate using Google Veo 3.1 for perfect lip sync or Kling 2.6 Pro for smooth cinematic clips."

[00:00–00:05] The video plays as a continuous 5-second loop.
ON THE LEFT SIDE: The woman is sitting at an outdoor cafe table during the day. The lighting is soft, natural daylight. The background is blurred, showing outdoor seating and string lights. She is looking directly at the camera, smiling broadly and laughing naturally, with subtle, realistic head and shoulder movements.
ON THE RIGHT SIDE: The woman is sitting at an indoor bar. The lighting is warm, moody, and directional, casting a soft glow on her face. The background features rich, amber bokeh from pendant lights. She is holding a clear pint glass filled with beer. She slowly brings the glass to her mouth, takes a sip, and lowers it slightly, maintaining steady eye contact with the camera throughout the motion. The liquid in the glass moves realistically. Both sides play simultaneously in a photorealistic, cinematic style.

NEGATIVE PROMPT
morphing, warping, inconsistent facial features, changing clothes, different person on left and right, bad anatomy, extra fingers, distorted glass, floating objects, unnatural lighting, plastic skin texture, jittery motion, flickering text, spelling errors in text overlays.

SPEECH PACK
No speech present in the reference video.

Why shedoesai's Nano Banana Pro AI Twin Consistency Went Viral — and the Formula Behind It

This 5-second Instagram Reel is a masterclass in demonstrating AI character consistency. It features a side-by-side split-screen of a hyper-realistic AI-generated woman—sporting a signature mustard-yellow bomber jacket and blonde ponytail—seamlessly maintaining her identity across two entirely different lighting scenarios and actions (laughing outdoors vs. sipping a beer indoors). By explicitly calling out trending tools like "Google Veo 3.1" and "Kling 2.6 Pro" in the text overlays, and framing it as "STEP 5" of a larger tutorial, it perfectly targets the pain points of indie creators trying to build cinematic, consistent AI twins for B-roll.

2. What You're Seeing

The video is a static, vertical composition divided into three main horizontal zones: a top text header, a central split-screen video comparison, and a bottom text footer. The aesthetic is highly cinematic, mimicking high-end UGC (User Generated Content) or lifestyle commercial B-roll.

Visual & Compositional Details

The Subject: A Caucasian woman in her early 30s, blonde hair pulled back into a messy ponytail, wearing a distinct mustard-yellow zip-up bomber jacket over a black top. Her facial features and wardrobe remain perfectly consistent across both clips.
Left Panel (The Outdoor Shot): The subject is sitting at an outdoor cafe table during the day. The lighting is natural and slightly overcast. She is looking directly at the camera, laughing naturally with subtle head movements. The background features blurred patrons and string lights, creating a shallow depth of field.
Right Panel (The Indoor Shot): The subject is at an indoor bar. The lighting is warm, moody, and directional, highlighting the side of her face. She holds a glass of beer, brings it to her lips, and takes a sip while maintaining eye contact with the lens. The background features beautiful, warm bokeh from pendant lights.
Text Overlays: Bold, white, sans-serif typography. The top reads: "STEP 5: ANIMATE YOUR VIDEOS AS B-ROLL OR TALKING HEAD VIDEOS". The bottom reads: "Animate using Google Veo 3.1 for perfect lip sync or Kling 2.6 Pro for smooth cinematic clips."
Color & Texture: The overall grade leans warm, emphasizing the yellow of the jacket and the amber of the beer. The texture is photorealistic with a slight filmic softness, avoiding the overly sharp "plastic" look common in early AI generations.

Shot-by-Shot Breakdown

Time Range	Visual Content	Shot Language	Lighting & Color Tone	Viewer Intent
00:00 - 00:05	Split-screen: Left side laughing outdoors; Right side sipping beer indoors. Top and bottom text overlays remain static.	Medium Close-Up (MCU) on both sides. Static camera mimicking a tripod setup. 50mm focal length feel with shallow depth of field.	Left: Natural daylight, neutral tones. Right: Warm indoor practical lights, amber/orange tones. High contrast between the two environments.	Hook the viewer immediately with the visual proof of consistency. The contrasting lighting proves the AI model's robustness.

3. Why It Went Viral (Breakdown of the Viral Mechanism)

Topic Selection & Audience Psychology

This video hits the bullseye for a massive, current pain point in the creator economy: AI character consistency. Every small creator wants to use AI to generate B-roll or faceless content, but keeping the same "actor" across different scenes has historically been a nightmare. By showing a side-by-side comparison of the exact same character in vastly different lighting conditions, the creator provides immediate, undeniable visual proof that the problem has been solved. This taps into the psychological trigger of "seeing is believing."

Furthermore, the topic leverages the "tool drop" trend. By specifically naming "Google Veo 3.1" and "Kling 2.6 Pro," the video transitions from mere inspiration to actionable utility. Creators are constantly hunting for the "secret sauce" or the exact prompt/tool combination. Naming the specific versions of the tools triggers an immediate "save for later" response from viewers who want to test those exact platforms.

Platform Signals & The Loop Effect

From an algorithmic perspective, this video is engineered for high retention. Clocking in at just 5 seconds, it is incredibly short. However, the screen is dense with information: the viewer has to read the top text, read the bottom text, look at the left video, and then look at the right video. By the time a user processes all this information, the video has likely looped 2 or 3 times. This artificially inflates the Average View Duration (AVD) and completion rate, signaling to the Instagram algorithm that the content is highly engaging, prompting it to push the Reel to a broader audience.

5 Testable Viral Hypotheses

The "Incomplete Series" Hook: Evidence: The text starts with "STEP 5". Mechanism: This implies the existence of Steps 1 through 4. Viewers who find this valuable will immediately click through to the creator's profile to find the rest of the tutorial, driving profile visits and potential follows. Replication: Frame your standalone tips as part of a larger, numbered sequence to encourage profile exploration.
The Visual Proof Split-Screen: Evidence: The side-by-side layout of the same character in different lighting. Mechanism: It reduces the "explanation cost." Instead of talking about how consistent the AI is, it just shows it. This builds instant authority. Replication: Use split-screens to show "Before/After" or "Prompt A vs. Prompt B" to provide immediate visual evidence of your claim.
The Specific Tool Drop: Evidence: Mentioning "Veo 3.1" and "Kling 2.6 Pro". Mechanism: Generic advice gets scrolled past; specific, version-level tool recommendations get saved. It makes the content feel like insider knowledge. Replication: Always name the exact software, version, or specific prompt parameter you used in your on-screen text.
The Micro-Action Focus: Evidence: The right clip shows the complex action of drinking from a glass. Mechanism: AI historically struggles with object interaction (like hands holding glasses). Showing a successful generation of this specific action proves the creator's high skill level, prompting viewers to want their exact prompt. Replication: Showcase AI generations that overcome known limitations (e.g., eating, drinking, complex hand movements) to establish expertise.
The Auto-DM CTA: Evidence: The caption says "Comment 'AI' for the Pro Prompt". Mechanism: This forces engagement. High comment velocity in the first hour of posting is a massive positive signal to the algorithm. Replication: Use tools like ManyChat to deliver value (prompts, guides) only after the user comments a specific keyword.

4. How to Recreate (Replication Tutorial: From 0 to 1)

You don't need to generate a split-screen directly in the AI tool. The secret is generating two separate, highly consistent clips and combining them in an editor. Here is the step-by-step workflow.

Step-by-Step Checklist

Define Your Character Sheet (The Anchor): Before touching video, you need a consistent base image. Use Midjourney or Flux to generate your character. Prompt example: "A 30-year-old Caucasian woman, blonde hair in a messy ponytail, wearing a mustard-yellow zip-up bomber jacket over a black t-shirt, neutral expression, white background."
Generate Scene References: Using the same character prompt and a tool like Midjourney's Character Reference (`--cref`), generate two static images of your character in the desired environments. Image A: "Sitting at an outdoor cafe, laughing." Image B: "Sitting at a moody indoor bar, holding a pint of beer."
Choose Your Video Generator: As the video suggests, use Kling AI (great for cinematic motion) or Google Veo (if you have access, great for realism). We will use the Image-to-Video feature.
Animate Clip A (The Laugh): Upload Image A to Kling. Prompt: "Cinematic shot, shallow depth of field. The woman looks at the camera and laughs naturally, subtle shoulder movement, outdoor cafe background with slight breeze." Set motion brush on her face if needed.
Animate Clip B (The Drink): Upload Image B to Kling. Prompt: "Cinematic shot, warm indoor lighting. The woman brings the glass of beer to her mouth and takes a sip, maintaining eye contact with the camera. Smooth, realistic motion."
Assemble in CapCut: Open CapCut and create a 9:16 vertical project. Import both generated videos. Use the "Mask" or "Crop" tool to place Clip A on the left half and Clip B on the right half.
Add the Text Overlays: Add a black background layer behind the videos if necessary to make the text pop. Add bold, white, sans-serif text at the top ("STEP X: [Your Hook]") and bottom (mentioning the specific tools used).
Set Up the Automation: Before posting, set up a ManyChat flow. Create a trigger for a specific keyword (e.g., "TWIN") that automatically DMs the user your exact Midjourney and Kling prompts.

5. Growth Playbook (Distribution & Scaling Strategy)

3 Ready-to-Use Opening Hooks

"Stop struggling with inconsistent AI characters. Here is the exact workflow I use."
"How I created my perfect AI twin in 5 minutes (using these 2 tools)."
"Everyone is using AI B-roll, but 99% of it looks fake. Here is the secret to cinematic consistency."

4 Caption Templates

The Tutorial Tease: "Finally locked in my perfect AI twin! 👯‍♀️ Yes, you can use a selfie, but the output is never this consistent across different lighting. I used a specific workflow to get this result. Comment 'TWIN' and I’ll DM you the exact step-by-step guide and prompts! 👇"
The Tool Comparison: "Testing the limits of AI consistency. 🤯 Left: Outdoor daylight. Right: Moody bar lighting. The jacket and face stay 100% locked. I used [Tool A] for the base and [Tool B] for the motion. Want the prompt? Comment 'PROMPT' below!"
The Series Continuation: "Welcome to Step 5 of building a faceless AI channel. Today we are tackling the hardest part: Character Consistency for B-roll. 🎬 If you missed Steps 1-4, check my profile. Comment 'GUIDE' to get the full PDF sent to your inbox."
The Behind-the-Scenes: "It took me months of tweaking to get this mustard jacket to stop morphing into a sweater. 😅 The secret is in the base image generation. Drop a 💛 in the comments and I'll send you my exact Midjourney character sheet prompt."

Hashtag Strategy

Don't spam 30 tags. Use a targeted 3-tier approach to hit different algorithmic buckets.

Broad (Reach): #AIVideo, #CreatorEconomy, #VideoEditing (These tell the algorithm the general category of your content).
Mid-Tier (Target Audience): #FacelessChannel, #AITwin, #ContentCreationTips (These target creators actively looking for workflows and solutions).
Niche/Long-Tail (Search Intent): #KlingAITutorial, #ConsistentAICharacter, #AIFilmmaking (These capture high-intent users searching for specific technical solutions).

6. FAQ (Common Troubleshooting)

How do I keep the clothing (like the mustard jacket) from changing style?

You must use an image-to-video workflow; generate a highly consistent base image in Midjourney first, and use that as the starting frame for your video generator.

Why does the beer glass warp when she drinks?

AI struggles with object permanence during interaction; keep the motion prompt simple ("takes a small sip") and generate multiple takes, picking the one with the least artifacting.

What are the 3 most important words in the prompt?

For this specific look, the crucial words are "Cinematic," "Consistent," and the specific clothing identifier (e.g., "mustard-yellow bomber jacket").

How did they get the split-screen to sync perfectly?

They didn't generate it as a split-screen; they generated two separate 5-second clips and placed them side-by-side in a video editor like CapCut.

Is it easier to go viral on Instagram or TikTok with this type of content?

Currently, Instagram Reels heavily rewards this "short loop + text heavy + ManyChat CTA" format, making it ideal for tutorial-style AI content.

How should I properly disclose AI use for this type of content?

Use platform-specific AI labels (like Instagram's "Made with AI" tag) and clearly state in your caption or on-screen text that the character is an AI generation.