How sferro21 Made This AI Avatar Scene Swaps Tutorial with Flows, Vegas, Kling, and Eleven SFX — and How to Recreate It
This Reel is a creator-education format built around one simple promise: start from one portrait reference and branch it into many different AI-generated scenes. Simone Ferretti uses a classic social teaching structure. The bottom half of the frame holds his talking-head reaction shot with a desk microphone and warm backlight. The top half cycles through a dark creative interface and multiple generated scenarios, including a camera-holding portrait, a lion close-up, a bed scene, nightlife imagery, and a tuner-car example.
What makes this post strong is the density of proof. Instead of showing one output and stopping, the reel keeps stacking examples so viewers understand the workflow has range. The creator is present the whole time, which turns the clip from a random AI montage into a guided recommendation.
What happens in the first 0-3 seconds
The video opens with an immediately impressive result: a sharp cinematic image of a man holding a camera with flash. That result sits in the upper section of the frame while the creator reacts dramatically below. The hook is visual, not theoretical. Before any full explanation lands, the viewer already knows the tool can produce polished character imagery from a reference face.
Shot-by-shot breakdown
00:00 to 00:06 introduces the split-screen format and the first standout result.
00:06 to 00:12 exposes more of the dark workflow interface, where the original portrait and several generated scenario variations appear together.
00:12 to 00:20 adds a lion shot and a bed scene, signaling that the system can move the same reference identity into drastically different contexts.
00:20 to 00:32 expands the gallery with more glamour, nightlife, and poolside outputs.
00:32 to 00:45 shows the interface labels more clearly, including tabs such as Flows and Vegas, while reinforcing the reference-image-first workflow.
00:45 to 00:55 pushes into more examples, including a gym-style close-up, group or event imagery, and a cinematic car scene.
00:55 to 00:59 finishes by showing the downstream stack, including Kling 3.0 Startframe 1080p 4s and Eleven SFX, which implies the creator is moving from still generation into animation and sound design.
Why this video works as a growth page
This reel succeeds because every five to ten seconds it resets curiosity with a new proof point. The creator never leaves the screen, so trust is continuous. At the same time, the upper panel keeps introducing more ambitious examples, which gives the workflow a feeling of depth. The audience is not being asked to imagine what the tool can do. They are being shown output after output with a consistent explanatory frame around it.
Visual style breakdown
The creator panel is intentionally stable: dark background, warm orange backlight, soft face illumination, white knit sweater, black microphone. That consistency creates a reliable base layer. The upper panel is the experimentation zone. Here the reel jumps between sharply lit portraits, animals, interiors, nightlife scenes, event frames, and cars. Because the UI itself stays dark and minimal, the visual variety never becomes messy. The contrast between stable presenter and changing outputs is the main design system of the post.
Prompt reconstruction notes
The prompt logic begins with a clean reference portrait. From there, each branch changes the setting, subject action, and cinematic mood while trying to preserve the identity source. Some branches likely use full facial preservation, while others lean more into stylistic reinterpretation. The important production rule is to lock the face and then vary only the environment, wardrobe, action, and camera scenario one branch at a time.
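The lock-the-face, vary-one-axis rule can be made concrete as a small data structure. This is a hypothetical sketch only: the field names, the reference filename, and the prompt wording are illustrative, and none of this corresponds to the actual interface shown in the reel.

```python
# Hypothetical branch-prompt structure: one locked identity base,
# one scenario override per branch. Not a real tool API.

REFERENCE = "portrait_reference.png"  # clean, neutrally lit portrait (assumed filename)

# The base is shared by every branch so the face stays consistent.
BASE = {
    "identity_image": REFERENCE,
    "identity_strength": "high",       # full facial preservation
    "style": "cinematic, sharp lighting",
}

# Each branch changes only setting, action, and wardrobe.
BRANCHES = [
    {"setting": "studio", "action": "holding a camera with flash", "wardrobe": "dark jacket"},
    {"setting": "savanna close-up with a lion", "action": "calm stare", "wardrobe": "safari shirt"},
    {"setting": "bedroom interior", "action": "waking up", "wardrobe": "white tee"},
    {"setting": "nightclub", "action": "dancing under neon", "wardrobe": "glamour outfit"},
    {"setting": "street at night", "action": "leaning on a tuner car", "wardrobe": "streetwear"},
]

def build_prompt(branch):
    """Merge the locked base with a single branch's scenario overrides."""
    merged = {**BASE, **branch}
    return (f"{merged['style']}; subject from {merged['identity_image']} "
            f"({merged['identity_strength']} identity lock), "
            f"{merged['action']} in a {merged['setting']}, wearing {merged['wardrobe']}")

prompts = [build_prompt(b) for b in BRANCHES]
for p in prompts:
    print(p)
```

The point of the structure is that identity fields live only in the base dictionary, so a branch cannot accidentally drift the face while it swaps everything else.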
How to rebuild this workflow
First, capture or upload a clean reference portrait with good symmetry and neutral lighting. Second, create a workflow canvas where that image can branch into multiple prompt variations. Third, generate several strongly differentiated scenes: animal-adjacent surrealism, interior storytelling, nightlife glamour, event or group context, and a vehicle setup. Fourth, keep a talking-head recording active in a lower panel so the audience can track the narrative. Fifth, when still-image results are ready, push the strongest one into a tool like Kling 3.0 Startframe for motion and then add sound design with an effects layer such as Eleven SFX.
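The five steps above can also be sketched as a simple pipeline. Every function here is a placeholder for a manual step in the respective tool (workflow canvas, Kling 3.0 Startframe, Eleven SFX); none of these calls are real APIs.

```python
# Placeholder pipeline mirroring the five rebuild steps.
# Each function marks where a manual tool step would happen.

def load_reference(path):
    """Step 1: a clean, symmetric, neutrally lit portrait."""
    return {"reference": path}

def branch_scenes(ref, scenarios):
    """Steps 2-3: one strongly differentiated prompt per scenario."""
    return [{**ref, "scenario": s} for s in scenarios]

def pick_strongest(stills):
    """Step 5a: manual curation; here we just take the first result."""
    return stills[0]

def animate_and_score(still):
    """Step 5b: hand off to motion and sound tools (manual in practice)."""
    return {**still, "motion": "Kling 3.0 Startframe 1080p 4s", "sfx": "Eleven SFX"}

scenarios = ["animal-adjacent surrealism", "interior storytelling",
             "nightlife glamour", "event or group context", "vehicle setup"]
stills = branch_scenes(load_reference("portrait.png"), scenarios)
final = animate_and_score(pick_strongest(stills))
print(final)
```

Step 4, the talking-head recording, sits outside this pipeline entirely: it runs in parallel as the lower-panel narration layer rather than as a processing stage.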
Replaceable variables
You can swap the male portrait for a female portrait, creator selfie, fashion headshot, or character render. You can change the example branches to sports, travel, fitness, fantasy, or product-ad scenarios. You can also swap the talking-head performance style from excited reaction to calm step-by-step explanation. The crucial thing is not the specific scenes. It is the reference-to-many-scenes structure.
Common failure cases
The first failure is identity collapse, where the person changes too much from scene to scene and the workflow loses credibility. The second is interface clutter, where too many thumbnails or tiny labels make the upper panel unreadable on mobile. The third is poor creator framing in the lower half; if the speaker is too dark, too small, or badly cropped, trust drops. Another common failure is mixing too many unrelated styles at once. This reel stays coherent because every scene still feels like a cinematic AI output demonstration, even when the content changes drastically.
FAQ
What is the core teaching point of this video?
It teaches that one portrait reference can seed many different AI scenes when the workflow is structured well and the examples are presented clearly.
Why keep the creator visible for almost the whole reel?
Because the talking head adds trust, pacing, and explanation while the upper panel handles visual proof.
Why mention Kling 3.0 and Eleven SFX at the end?
Those tools imply the workflow does not stop at still images. It extends into motion and sound, which makes the tutorial feel more complete and more actionable.