Phil Franco | AI Tools Expert 💰

@ai.withphil

INSTAGRAM · 2026-02-18

225 likes

144 comments

Prompt

GLOBAL LOCK:
Subject is a woman in her mid-20s with long, wavy black hair, pale skin tone, and a neutral, intense expression. She wears a black leather biker jacket over a sheer black top and black pants. The environment is a Parisian street at night with the Eiffel Tower in the background. The lighting is low-key, moody blue hour/night tones with wet pavement reflections. The camera language is cinematic, high-fidelity, with smooth tracking and wide-angle perspectives. No speech is present; the focus is on motion and VFX.

[00:00–00:05]
The woman is positioned in the center of a wide Parisian street, initially walking toward the camera. She then performs a sharp turn and begins running away from the camera toward the Eiffel Tower. The camera follows her from a low angle, capturing the movement of her leather jacket and the reflections on the ground. The lighting is cool blue with warm streetlights.

[00:05–00:12]
As the woman approaches the Eiffel Tower, a massive swarm of black birds (crows) erupts from the structure, filling the sky. Dark, smoky, tentacle-like shadows emerge from the edges of the frame and the ground, swirling toward the tower. The woman begins to levitate off the ground, her hair and jacket fluttering as if caught in an upward draft.

[00:12–00:20]
The woman is now suspended high in the air, positioned directly in front of the top section of the Eiffel Tower. Glowing red circular energy rings pulse outward from her body. The dark tentacles continue to swirl around her in the sky. The contrast between the deep blue night sky and the vibrant neon red energy is extreme. The camera slowly zooms out to show the scale.

[00:20–00:30]
A wide cinematic shot of the Eiffel Tower under a dark, stormy sky. Multiple red laser beams shoot out from the top of the tower and from the surrounding ground, crisscrossing the frame. The swarm of black birds continues to circle the tower frantically. The atmosphere is apocalyptic and epic. The camera maintains a steady, wide-angle view of the spectacle.

NEGATIVE PROMPT:
Visual artifacts, distorted anatomy, flickering lights, inconsistent clothing textures, blurry face, cartoonish rendering, low resolution, text, logos, watermark, sudden jumps in motion, robotic movement, messy hair physics, dull colors, lack of reflections.

SPEECH PACK:
(No speech present in this video. The audio consists of cinematic sound effects: wind whooshes, bird screeches, and low-frequency energy hums.)
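
The prompt above follows a fixed layout: a GLOBAL LOCK block, a run of timestamped segments, then a NEGATIVE PROMPT. Below is a minimal Python sketch of assembling that layout and checking that the segments cover the timeline without gaps or overlaps. The names `Segment`, `fmt`, and `build_prompt` are hypothetical helpers made up for illustration; they are not part of Seedance, Kling, or any other tool's API, and the output is plain text you would paste into the tool yourself.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int  # seconds from the start of the clip
    end: int    # seconds at which this segment hands over to the next
    text: str   # description of what happens in this window


def fmt(seconds: int) -> str:
    """Format seconds as MM:SS, matching the [00:05–00:12] style."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def build_prompt(global_lock: str, segments: list[Segment], negative: str) -> str:
    """Assemble the prompt text, rejecting gaps or overlaps between segments."""
    expected_start = 0
    for seg in segments:
        if seg.start != expected_start or seg.end <= seg.start:
            raise ValueError(
                f"segment {fmt(seg.start)}-{fmt(seg.end)} breaks the timeline"
            )
        expected_start = seg.end
    parts = [f"GLOBAL LOCK:\n{global_lock}"]
    parts += [f"[{fmt(s.start)}–{fmt(s.end)}]\n{s.text}" for s in segments]
    parts.append(f"NEGATIVE PROMPT:\n{negative}")
    return "\n\n".join(parts)
```

Calling `build_prompt` with the four segments above (0–5, 5–12, 12–20, 20–30) would reproduce the same block order; a segment that starts late or early raises a `ValueError` instead of silently producing a prompt with a hole in the timeline.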

How ai.withphil Made This Seedance vs. Kling AI Video Model Comparison, and How to Recreate It

This case study analyzes a high-performing AI video comparison featuring a "Model War" between Seedance 2.0 and Kling 3.0. The video uses a single reference image of a woman in a black leather jacket in front of the Eiffel Tower at night to demonstrate the creative divergence and technical capabilities of different generative engines. With a moody, cinematic aesthetic characterized by deep blues, stark blacks, and vibrant red VFX, the content transitions from a simple walk cycle into a high-octane, superhero-style spectacle. This "A/B test" format is a goldmine for indie creators looking to establish authority in the AI niche while leveraging the visual appeal of iconic landmarks and cinematic lighting.

What You’re Seeing

The video is structured as a vertical stack of three panels. The top panel displays the "Source Image," while the middle and bottom panels show the video outputs from Seedance and Kling, respectively. The subject is a woman with long, wavy dark hair, wearing a classic black leather biker jacket over a sheer black top. The setting is a Parisian street at dusk, with the Eiffel Tower serving as the primary background element.

As the video progresses, the two models diverge significantly: Seedance leans into a dark fantasy/action genre, featuring levitation, red energy rings, and shadowy tentacles. Kling maintains a more cinematic, grounded realism initially, focusing on movement and bird swarms, before eventually introducing red laser effects. The color palette is dominated by "Blue Hour" tones, with the red energy effects providing a high-contrast visual "pop" that stops the scroll.

Shot-by-Shot Analysis

Time Range	Visual Content	Shot Language	Lighting & Tone	Viewer Intent
00:00–00:05	Woman walking towards camera; Seedance version turns and runs.	Medium Shot (MS), eye-level.	Cool blue dusk, streetlights.	Establish the "Control" vs "Variable" comparison.
00:05–00:12	Black birds swarm the tower; dark smoke/tentacles appear in Seedance.	Wide Shot (WS), low angle.	Moody, increasing shadows.	Escalate the visual stakes; introduce "supernatural" elements.
00:12–00:20	Woman levitates at the top of the tower; red energy rings pulse.	Extreme Wide Shot (EWS), VFX-heavy.	High contrast: Dark blue vs. Neon Red.	The "Wow" moment; demonstrating model creativity.
00:20–00:30	Eiffel Tower under "attack" by red lasers and bird swarms.	Cinematic Wide, fast-paced cuts.	Apocalyptic, dramatic.	Final payoff; reinforce the "epic" scale of AI generation.

Why It Went Viral: The Comparison Hook

The primary driver of this video's success is the "Model Comparison" framework. By pitting two popular AI tools against each other from the exact same starting point, the creator taps into "tech review" psychology. Viewers are naturally inclined to pick a favorite, which drives comment-section engagement as they debate which model performed better. The choice of the Eiffel Tower is a strategic recognition hook: it is a globally recognized symbol of beauty and romance, which makes the subsequent "dark/apocalyptic" subversion even more jarring and memorable.

From a platform perspective, the vertical stack layout is optimized for mobile viewing, allowing the user to track three different visual streams simultaneously. This increases "visual density," often leading to multiple rewatches as viewers try to catch the details in each panel. The lack of dialogue makes the content globally accessible, removing language barriers and increasing shareability across international AI communities.

5 Testable Viral Hypotheses

The "Fair Test" Hypothesis: Showing the source image and prompt at the top creates a sense of transparency, making the viewer trust the comparison and engage with the results.
The "Genre Subversion" Hypothesis: Starting with a romantic Paris setting and ending with a dark superhero spectacle creates a narrative "arc" that keeps viewers watching until the end.
The "Visual Density" Hypothesis: Stacking three panels gives a viewer more chances to see something they like, which may increase overall retention.
The "Red vs. Blue" Color Theory: The strong contrast between the cool blue background and the hot red VFX may act as an attention trigger.
The "Tool Tribalism" Hypothesis: Fans of specific AI tools (Seedance vs. Kling here, or others such as Luma and Runway) will defend their "team" in the comments, boosting the video's reach in the algorithm.

How to Recreate: From 0 to 1

Select a High-Contrast Reference Image: Use a tool like Midjourney to create a cinematic portrait. Ensure the background is an iconic, easily recognizable location.
Define Your "Action" Prompt: Keep the prompt consistent across tools. Example: "Cinematic shot, woman in leather jacket runs toward the Eiffel Tower, black birds swarm the sky, red energy pulses from the tower, dark atmosphere."
Generate in Model A (e.g., Kling): Upload your reference image and use the "Image-to-Video" feature. Set motion intensity to high.
Generate in Model B (e.g., Luma/Runway/Seedance): Use the same image and prompt. Note the differences in how the model interprets "energy" or "movement."
Apply VFX Overlays (Optional): If the models don't provide enough "pop," use CapCut or After Effects to add red laser or energy ring overlays to enhance the comparison.
Create the Vertical Stack: In your video editor, set a 9:16 canvas. Place the source image at the top (33% height), Model A in the middle, and Model B at the bottom.
Sync the Sound: Use "Whoosh" and "Impact" SFX that match the visual beats of the VFX (e.g., when the red rings pulse).
Add Clear Labels: Use bold, sans-serif fonts to label each panel so the viewer immediately understands what they are looking at.
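
The vertical-stack step can be sketched as a small layout calculation. The Python below splits a 9:16 canvas into three equal-height panels and returns each panel's position and size. The 1080×1920 canvas is an assumed (common) 9:16 resolution, and `stack_layout` is a hypothetical helper written for illustration; the resulting numbers are what you would enter as position and size in your video editor.

```python
def stack_layout(
    canvas_w: int, canvas_h: int, panels: int = 3
) -> list[tuple[int, int, int, int]]:
    """Return (x, y, width, height) for equal-height panels stacked top to bottom."""
    base = canvas_h // panels
    rects = []
    y = 0
    for i in range(panels):
        # The last panel absorbs any leftover pixels from integer division.
        h = canvas_h - y if i == panels - 1 else base
        rects.append((0, y, canvas_w, h))
        y += h
    return rects


labels = ["Source Image", "Model A", "Model B"]
for label, rect in zip(labels, stack_layout(1080, 1920)):
    print(label, rect)
```

On a 1080×1920 canvas each panel comes out at 1080×640, which is wider than it is tall, so vertical source clips would need to be cropped or scaled to fill their panel.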

Growth Playbook & Distribution

3 Opening Hook Lines

"Kling 3.0 vs. Seedance 2.0: Which one actually wins?"
"I gave two AI models the same prompt. The results are terrifying."
"Stop using [Model X] until you see what [Model Y] can do."

Caption Template

The Comparison Hook: One image. One prompt. Two completely different worlds. 🤯
Value Point: I tested Seedance 2.0 and Kling 3.0 to see which handles complex VFX and character consistency better.
Engagement Question: Are you Team Seedance (Dark Fantasy) or Team Kling (Cinematic Realism)? Let me know in the comments! 👇
CTA: Follow for more AI model wars and prompt breakdowns. #AIVideo #KlingAI #Seedance

Hashtag Strategy

Broad: #AI #ArtificialIntelligence #DigitalArt #VFX
Mid-tier: #AIVideo #GenerativeAI #KlingAI #RunwayGen3
Niche: #AIComparison #PromptEngineering #IndieCreator #Seedance

Frequently Asked Questions

What tools make it look the most similar?

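Consistency depends less on the tool than on the inputs: use a strong reference image (Image-to-Video) and, within a single model, the same "Seed" number. Seeds are not comparable across different models.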

What are the 3 most important words in the prompt?

"Cinematic," "High-contrast," and specific motion verbs like "levitating" or "swarming."

Why does the generated face look inconsistent?

This usually happens when the "Character Reference" strength is set too low or the motion is too violent.

How can I avoid making it look like AI?

Add a slight film grain overlay and use realistic sound effects to ground the visuals.

Is it easier to go viral on Instagram or TikTok with this?

Instagram Reels currently favors high-aesthetic cinematic AI content, while TikTok prefers "How-to" breakdowns.