AI Voice Meme Maker
In voice-led memes, the TTS is often the joke delivery system, not just background audio. This page gathers Alici examples and prompts for AI voice meme workflows where script pacing, recognizable voice style, and visual pairing all work together to make the clip land.
GLOBAL LOCK: preserve a creator-led talking-head tutorial format mixed with vertical phone screen recordings. Keep one young male creator in a backward black cap and dark hoodie speaking directly to camera in a studio setup with a microphone. Intercut iPhone-style screen captures showing ChatGPT/OpenAI image workflow steps, uploaded object photos, prompt entry, and AI video generation screens. Maintain a practical “make from your phone” educational reel structure. No random B-roll, no unrelated tools, no logo overlays beyond app UI already present in the source. Create a 37.8-second social-first AI tutorial reel showing how to turn ordinary phone photos into animated AI character videos. Begin with a hook using a simple hand-held object photo and bold on-screen teaching posture from the creator. Then show phone interfaces: photo selection, ChatGPT or image-tool screens, prompt entry, image transformation results, switching to an AI video tool, uploading the generated image, entering a motion prompt, and generating the final animated output. Use repeated face-cam segments where the creator explains the steps and emphasizes that the workflow can be done from a phone. Include the specific examples visible in the source: tiny object/food photos held in a hand, ChatGPT app icon and mobile interface, typed prompts that turn objects into cute expressive characters, a generated pear-like baby character image, a switch to another AI generation interface, upload and prompt steps for video, and a final generated moving result shown on-screen. Preserve the educational pacing and creator-marketing vibe. SHOT SEGMENTS: [00:00-00:06] Hook with object photos in hand and creator talking-head intro about making AI content from your phone. [00:06-00:14] Mobile screens show ChatGPT / image workflow setup, app screens, and prompt entry. [00:14-00:22] Creator explains the key steps while on-screen phone UI shows prompt refinement and generated object-to-character image outputs. 
[00:22-00:30] The tutorial switches to an AI video tool, showing upload, prompt, and generation steps from the phone. [00:30-00:37.8] Final result displays the generated animated character clip, while the creator closes with a call to try the workflow. ENVIRONMENT: creator desk/studio face-cam plus crisp mobile screen recordings. CAMERA: direct-to-camera presenter shots alternating with full-screen phone UI captures. LIGHTING: clean creator-studio lighting on face-cam; bright legible phone UI on inserts. MOTION: tutorial pacing, finger taps on phone UI, creator emphasis gestures, no cinematic narrative scenes. NEGATIVE PROMPT: generic AI ad montage, unrelated tools, desktop-only workflow, no phone UI, missing creator face-cam, subtitles replacing the actual visible UI, blurry screens, watermark, logo overlays. SPEECH PACK: creator-to-camera tutorial speech implied, but do not transcribe captions here.
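Before sending a timecoded prompt like the one above to a video model, it can help to check that the shot segments are contiguous and end at the stated runtime. The following is a minimal sketch, not part of the original prompt: the function names `parse_segments` and `find_gaps` are hypothetical, the sample string abbreviates the segment descriptions, and the regex assumes the `[MM:SS-MM:SS]` form used here (it does not handle the three-part `[HH:MM:SS-HH:MM:SS]` form that appears in other prompts on this page).

```python
import re

# Matches "[MM:SS-MM:SS]" with optional fractional seconds and either a
# hyphen or an en dash between the two timecodes.
SEGMENT_RE = re.compile(
    r"\[(\d{2}):(\d{2}(?:\.\d+)?)\s*[-–]\s*(\d{2}):(\d{2}(?:\.\d+)?)\]"
)

def parse_segments(prompt):
    """Return a list of (start_seconds, end_seconds) tuples in prompt order."""
    segments = []
    for match in SEGMENT_RE.finditer(prompt):
        start = int(match.group(1)) * 60 + float(match.group(2))
        end = int(match.group(3)) * 60 + float(match.group(4))
        segments.append((start, end))
    return segments

def find_gaps(segments, tolerance=1e-6):
    """Return (previous_end, next_start) pairs where segments do not meet."""
    gaps = []
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        if abs(next_start - prev_end) > tolerance:
            gaps.append((prev_end, next_start))
    return gaps

# Abbreviated stand-in for the SHOT SEGMENTS portion of the prompt.
prompt = (
    "[00:00-00:06] Hook. [00:06-00:14] Mobile screens. "
    "[00:14-00:22] Creator explains. [00:22-00:30] AI video tool. "
    "[00:30-00:37.8] Final result."
)

segments = parse_segments(prompt)
print(segments)
print("gaps:", find_gaps(segments))
print("runtime:", segments[-1][1])
```

A non-empty result from `find_gaps` is not necessarily an error, since some prompts leave deliberate one-second gaps between shots, but it is worth a second look before generating.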
Create a short-form creator tutorial video about how to make cinematic AI clips from simple ideas. The piece should feel like an Instagram Reel or TikTok posted by an AI filmmaking educator, combining direct-to-camera instruction with polished cinematic sample shots and interface cutaways. Use a confident creator host in a dark studio or moody workspace, speaking naturally to camera while explaining a repeatable workflow for generating cinematic AI videos. The pacing should be fast, sharp, and social-first, with frequent visual resets to keep attention high. Open with a strong hook where the creator talks directly to camera and promises to show viewers how to make cinematic AI clips that feel dramatic, polished, and scroll-stopping. Then cut into multiple example shots that look like finished outputs: moody action moments, dramatic close-ups, atmospheric character scenes, and premium-looking cinematic frames. Intercut those examples with prompt panels, tool UI, timeline views, or settings screens so the workflow feels grounded in real AI video creation rather than abstract inspiration. The host should stay visually consistent across talking segments: same person, same wardrobe, same lighting setup, same direct creator-teacher tone. Their performance should feel natural and creator-native, not overly scripted. They should gesture casually, point toward on-screen examples, and deliver the lesson with energetic clarity, like someone used to teaching AI video tricks on social media. The visual design should alternate between two clear modes. Mode one is the tutorial studio setup: dark background, controlled lighting, crisp face detail, shallow depth of field, subtle color accents, and a premium creator-desk atmosphere. Mode two is the cinematic demo footage: dramatic compositions, intentional movement, filmic contrast, moody lighting, and stronger environmental storytelling. Keep cutting between those modes so the audience always sees both the result and the process. 
Keep the entire piece optimized for vertical video. For talking-head sections, use close-ups and medium close-ups with subtle push-ins or light handheld energy. For the cinematic examples, vary the framing with wides, dramatic close-ups, push-ins, tracking shots, and controlled motion that sells the idea of “cinematic” without becoming chaotic. Everything should feel curated and premium. Lighting is important. The host footage should use flattering key light with soft falloff and a clean but moody creator-studio look. The cinematic sample shots should lean harder into contrast, rim light, atmosphere, practicals, and dramatic highlight control. The overall grade should feel modern, contrasty, and polished, with rich blacks, sharp visual separation, and subtle filmic texture. Include insert shots of prompts, settings, or example workflow screens to reinforce the educational angle. These moments can show how ideas become prompts, how cinematic references are structured, or how the creator chooses scenes and visual style. The UI should feel real and useful, not decorative. The edit should stay fast and social-first: hook, creator explanation, cinematic example, interface proof, another teaching beat, then more examples. Use cuts, punch-ins, overlays, and visual comparison moments so the viewer always feels momentum. The final result should feel like a practical creator tutorial that teaches viewers how to make cinematic AI clips while also showcasing enough premium output to inspire them to try the workflow themselves.
GLOBAL LOCK: vertical 9:16 creator tutorial reel about realistic AI lip sync and talking-head prompting, social-native educational format, fast caption-led pacing, real-looking example clips in multiple environments, sharp white and yellow text overlays, practical creator voice, multiple example subjects but one consistent teaching thesis: specify lip movement, body motion, gesture behavior, and camera stability in the prompt. Must include tool comparison callouts for VEO 3.1, KLING 2.6, and HEYGEN, and must end with a keyword-comment CTA for the prompt structure. [00:00:00-00:00:04] Open on a blonde young woman inside a bright red British-style phone booth, yellow tank top, phone handset to her ear, speaking toward camera with expressive hand movement. Use bold center captions that build the idea of “everyone’s gatekeeping how to do perfect AI lip sync.” Keep the red booth dominant in frame as the high-contrast visual hook. [00:00:04-00:00:07] Cut to a curly blond young man in a close talking-head shot with shallow depth of field. He speaks directly to lens, then the frame shifts to a transit-pole hand close-up showing grip and subtle gesture control. Use captions to introduce the first prompt principle: movement and behavior need to be specified, not left vague. [00:00:07-00:00:11] Transition through prompt-structure overlays on blue and dark panels. Show dense prompt snippets with sections for mouth movement, body movement, gesture behavior, eye contact, and realism settings. Insert a futuristic train nose shot as a visual bridge while captions stress that precise motion rules are crucial. [00:00:11-00:00:17] Move into transit examples. Show a subway platform rushing by, then medium crops of the male speaker’s torso and hands to illustrate natural hand gestures. Cut to a blonde woman seated on a subway train in a yellow top, talking with restrained but realistic body motion. 
The captions should explain that natural hand gestures and believable body behavior must be described in the prompt. [00:00:17-00:00:22] Start the model comparison section. Display a woman with long braids and glasses in a plain indoor room while large captions introduce VEO 3.1. Then cut to a woman with long straight hair and glasses in a subway-like station setting for KLING 2.6. Use repeated talking-head clips to compare how each model handles short spoken segments. [00:00:22-00:00:29] Continue the comparison: VEO 3.1 is positioned as strong for longer clips; KLING 2.6 is labeled as accurate for short speech and holding together on brief shots; HEYGEN appears in a clean indoor talking-head example as the strongest option for subtler avatar-style delivery. Keep the subjects facing camera, mouth motion readable, and captions explicit about the use case of each tool. [00:00:29-00:00:35] Return to the red phone booth environment and crop tighter on the woman’s torso, handset, and booth details. Overlay the final CTA in staged caption chunks telling viewers to comment "LIPS" and the creator will send the prompt structure. Hold the ending long enough for screenshot and keyword memorability. CAMERA: quick social cuts between static or gently handheld talking-head shots, occasional close crops on hands and props, brief full-frame prompt cards, no cinematic camera choreography. LIGHTING: bright frontal booth lighting in the red phone booth, neutral soft lighting for indoor talking heads, cool transit lighting in station and subway clips, readable high-contrast graphics on blue and dark text screens. GRADE: crisp social contrast, saturated red booth tones, natural skin tones, clear text overlays, slightly stylized but still realistic creator aesthetic. MOTION: lip movement must look synced and natural, hand gestures subtle and human, body movement controlled rather than stiff, transit backgrounds add motion energy without distracting from faces. 
SPEECH PACK: upbeat creator-educator narration explaining how to prompt for better AI lip sync. Key points: people are gatekeeping this, prompt structure matters, specify mouth and body behavior, and choose tools based on clip length and realism needs. Phone-mic style direct audio, dry mix, punchy cadence. NEGATIVE PROMPT: music-video montage, exaggerated dance movement, frozen hands, puppet-like body motion, sloppy lip sync, heavy cinematic blur, fantasy environment, unreadable prompt cards, tool logos without explanation, chaotic transitions, off-topic b-roll, overacted gestures, multiple overlapping captions.
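Prompts in this style are built from labeled sections (GLOBAL LOCK, CAMERA, LIGHTING, GRADE, MOTION, SPEECH PACK, NEGATIVE PROMPT), so splitting one into a dictionary makes it easier to tweak a single section and reassemble the rest. This is a rough sketch under stated assumptions: `split_sections` is a hypothetical helper, the sample text is a shortened paraphrase of the prompt above, and the pattern only recognizes labels followed by a colon, so prompts that write a label without one would need a different pattern.

```python
import re

# Section labels as they appear in the prompt above, each followed by ":".
LABELS = [
    "GLOBAL LOCK", "CAMERA", "LIGHTING", "GRADE",
    "MOTION", "SPEECH PACK", "NEGATIVE PROMPT",
]
LABEL_RE = re.compile(r"\b(" + "|".join(LABELS) + r"):")

def split_sections(prompt):
    """Map each labeled section to its body text, preserving prompt order."""
    # re.split with one capturing group yields:
    # [text_before_first_label, label1, body1, label2, body2, ...]
    parts = LABEL_RE.split(prompt)
    sections = {}
    for label, body in zip(parts[1::2], parts[2::2]):
        sections[label] = body.strip()
    return sections

# Shortened stand-in for the full prompt.
sample = (
    "GLOBAL LOCK: vertical 9:16 creator tutorial reel. "
    "CAMERA: quick social cuts. "
    "LIGHTING: neutral soft lighting. "
    "NEGATIVE PROMPT: sloppy lip sync, frozen hands."
)

sections = split_sections(sample)
for label, body in sections.items():
    print(f"{label} -> {body}")
```

Timecoded shot descriptions that sit between labels end up inside the body of the preceding label, which is usually GLOBAL LOCK in these prompts.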
WORKFLOW A) MISE EN PLACE 1) Segment the video into scenes/shots: - [00:00–00:05] Single continuous shot (A composite split-screen showing two distinct scenes simultaneously). 2) Extract visual evidence: - Keyframes: 0s, 2s, 4s. - Left Panel: Caucasian woman, early 30s, blonde hair in a messy ponytail, wearing a mustard-yellow zip-up bomber jacket over a black top. Sitting outdoors at a cafe, daylight, string lights in the blurred background. She is laughing. - Right Panel: Same woman, identical hair and wardrobe. Sitting indoors at a bar, warm directional lighting, amber bokeh in the background. She is holding a pint glass of beer and taking a sip. - Overlays: White sans-serif text at the top and bottom. 3) Extract speech evidence: - No speech. Audio is likely a trending BGM track. 4) Create an "invariants list" (LOCK THESE): - visuals: The split-screen layout (left/right). The exact appearance of the woman (facial features, blonde ponytail, mustard jacket, black shirt). The static camera framing (MCU) on both sides. The text overlays. - speech: N/A. 5) Create a "variables list" (TWEAK THESE): - visuals: The micro-expressions of the laugh on the left. The liquid movement inside the beer glass on the right. The subtle background motion (patrons, bokeh shimmer). B) SHOTLIST - shot_id: 1 - timecode_start: 00:00 - timecode_end: 00:05 - duration: 5s - framing: Split-screen. Both sides are Medium Close-Up (MCU), eye-level camera. - lens: 50mm equivalent feel, shallow depth of field, creamy bokeh on both sides. - camera movement: Static on both sides. - subject: Left: Laughing naturally, slight shoulder movement. Right: Bringing a beer glass to her lips, taking a sip, maintaining eye contact. - environment: Left: Outdoor cafe, daytime. Right: Indoor bar, evening. - lighting: Left: Soft, overcast natural daylight. Right: Warm, moody practical lights, directional key light on the face. 
- color grade: Warm overall tint, high contrast between the cool/neutral left and the amber/orange right. - motion cues: Left: Subtle hair movement in the breeze. Right: Liquid dynamics in the glass. - SPEECH / AUDIO: - speech_present: false C) STYLE BIBLE - visual_style: Cinematic UGC / High-end lifestyle B-roll. - camera_signature: Locked-off tripod feel, shallow depth of field to isolate the subject. - lighting_signature: Motivated lighting (natural outdoors vs. practical indoors). - grade_signature: Warm, filmic, rich skin tones, vibrant mustard yellow. - texture_signature: Photorealistic, sharp subject with soft, pleasing background blur. - pacing_signature: Slow, deliberate motion suitable for looping. D) PROMPT SYNTHESIS MASTER PROMPT GLOBAL LOCK: A vertical 9:16 split-screen video divided exactly down the middle. On both sides, the exact same subject is featured: a 30-year-old Caucasian woman with blonde hair pulled back into a messy ponytail, wearing a distinctive mustard-yellow zip-up bomber jacket over a black t-shirt. The camera is static on both sides, framed as a Medium Close-Up (MCU) with a shallow depth of field. The top of the video features bold white sans-serif text: "STEP 5: ANIMATE YOUR VIDEOS AS B-ROLL OR TALKING HEAD VIDEOS". The bottom features text: "Animate using Google Veo 3.1 for perfect lip sync or Kling 2.6 Pro for smooth cinematic clips." [00:00–00:05] The video plays as a continuous 5-second loop. ON THE LEFT SIDE: The woman is sitting at an outdoor cafe table during the day. The lighting is soft, natural daylight. The background is blurred, showing outdoor seating and string lights. She is looking directly at the camera, smiling broadly and laughing naturally, with subtle, realistic head and shoulder movements. ON THE RIGHT SIDE: The woman is sitting at an indoor bar. The lighting is warm, moody, and directional, casting a soft glow on her face. The background features rich, amber bokeh from pendant lights. 
She is holding a clear pint glass filled with beer. She slowly brings the glass to her mouth, takes a sip, and lowers it slightly, maintaining steady eye contact with the camera throughout the motion. The liquid in the glass moves realistically. Both sides play simultaneously in a photorealistic, cinematic style. NEGATIVE PROMPT morphing, warping, inconsistent facial features, changing clothes, different person on left and right, bad anatomy, extra fingers, distorted glass, floating objects, unnatural lighting, plastic skin texture, jittery motion, flickering text, spelling errors in text overlays. SPEECH PACK No speech present in the reference video.
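The SHOTLIST in section B is effectively a record with fixed fields, so it can be kept as structured data and the duration derived from the timecodes instead of typed by hand. The sketch below is one possible representation, not the workflow's own tooling: the `Shot` class and `to_seconds` helper are hypothetical names, and only a subset of the shotlist fields is modeled.

```python
from dataclasses import dataclass

def to_seconds(timecode):
    """Convert an "MM:SS" timecode string to seconds."""
    minutes, seconds = timecode.split(":")
    return int(minutes) * 60 + float(seconds)

@dataclass
class Shot:
    # Field names follow the shotlist keys; lens, lighting, and the other
    # descriptive fields are omitted here for brevity.
    shot_id: int
    timecode_start: str
    timecode_end: str
    framing: str
    camera_movement: str
    speech_present: bool

    @property
    def duration(self):
        """Duration in seconds, derived from the two timecodes."""
        return to_seconds(self.timecode_end) - to_seconds(self.timecode_start)

# Values taken from shot 1 of the shotlist above.
shot = Shot(
    shot_id=1,
    timecode_start="00:00",
    timecode_end="00:05",
    framing="Split-screen, MCU on both sides, eye-level camera",
    camera_movement="Static on both sides",
    speech_present=False,
)

print(f"shot {shot.shot_id}: {shot.duration}s, speech={shot.speech_present}")
```

Deriving `duration` this way keeps it consistent with the timecodes if a shot is later trimmed or extended.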
GLOBAL LOCK: Subject is a consistent male figure based on the likeness of David Tennant but modified per shot. Apparent ethnicity: Caucasian. Skin tone: Pale green (in Shrek shots) or fair (in human shots). Age range: 40s. Hair: Bald (Shrek) or vibrant red (South Park shot). Environment: High-fidelity photorealistic textures, cinematic lighting, 8k resolution. Speech: High emotional intensity, synchronized lip movements. [00:00–00:04] Subject: David Tennant as Shrek, pale green skin, trumpet-shaped ogre ears, bald head, slight smile. Wardrobe: Tattered beige linen tunic, worn brown leather vest. Environment: Seated inside a primitive wooden outhouse made of rough-hewn planks and rope. Background is a misty, moss-covered swamp with cattails. Camera: Starts as a wide shot, slowly dollying in to a medium close-up. Lighting: Soft, diffused natural light filtered through forest canopy. Speech: "Kling 3.0 is insane!" Warm, energetic tone. High lip-sync strictness. [00:05–00:11] Subject: Caucasian man with vibrant red hair and a thick, bushy red horseshoe mustache. Intense, angry facial expression, furrowed brows. Wardrobe: Green and black plaid flannel shirt. Environment: Standing behind a dark wood podium. Background is a plain grey wall with a US flag and a blue state seal. Camera: Medium shot, static. Lighting: Bright, flat institutional lighting. Speech: "They took our jobs!" Screaming with high emotional intensity. Lips must sync perfectly to the shouting. Cut lands on the end of the sentence. [00:12–00:16] Subject: Young Caucasian man, short dark hair, look of panic. Wardrobe: Blue denim jacket over a brown sweater. Environment: A wheat field at night. A large red combine harvester with bright glowing headlights is directly behind him. Camera: Low-angle tracking shot, moving backward as the subject runs toward the camera. Lighting: Harsh, high-contrast lighting from the harvester's headlights casting long shadows. Speech: "The emotions! 
Incredible real now, like too real! AHHHH!" Panicked, breathless delivery. [00:17–00:26] Subject: Two men. One is tall and thin with a feathered tropical headpiece and a patterned shirt. The other is stocky, resembling a live-action version of a Shrek-like human character in a brown vest. Environment: A wooden dance floor on a tropical beach at sunset. Palm trees and other party guests in the background. Camera: Handheld feel, medium-wide shot, circling the dancers. Lighting: Warm, golden hour sunset glow. Motion: Fluid, rhythmic dancing (salsa/swing style). Speech: "Try Kling 3.0 on Higgsfield... I like to move it!" Upbeat, promotional tone transitioning into song. High energy. NEGATIVE PROMPT: Visual: extra fingers, melting faces, flickering backgrounds, distorted limbs, blurry textures, low resolution, watermark (except 'Kling 3.0'), cartoonish style, floating objects. Speech: robotic voice, monotone delivery, out-of-sync lips, muffled audio, background hiss, unnatural pauses, slurred consonants. SPEECH PACK: [00:00-00:04] Transcript: "Kling 3.0 is insane!" TAKE_A: (Energetic, wide-eyed) "Kling three point zero... is IN-SANE!" TAKE_B: (Friendly, inviting) "Kling 3.0 is just... insane." TAKE_C: (Awestruck) "Kling 3.0... it's actually insane." [00:05-00:11] Transcript: "Multi-shot storyboards, up to 15 seconds. Start to end frame control. Multi-language audio and camera movements that look like a Hollywood rig. They really took our jobs!" TAKE_A: (Angry, rapid fire) "Multi-shot storyboards! Up to fifteen seconds! They really took our JOBS!" [00:12-00:16] Transcript: "The emotions! Incredible real now, like too real! AHHHH!" TAKE_A: (Terrified, screaming) "The emotions! It's too real! AHHHH!" [00:17-00:26] Transcript: "Try Kling 3.0 on Higgsfield with 70% discount and unlimited generations. Comment vibe and I'll send you the link. I like to move it!" TAKE_A: (Salesy, rhythmic) "Try Kling 3.0 on Higgsfield... comment VIBE... I like to move it!"
GLOBAL LOCK: The video features a white male creator in his mid-30s with medium-length, wavy brown hair and a groomed beard, wearing a clean white t-shirt. He is positioned in a bright home office with a professional black condenser microphone on a boom arm in the foreground. The video uses a split-screen or multi-panel layout to compare "Source Video" (the creator) with "AI Generated Results" (various celebrities and characters). The AI characters must perfectly mirror the creator's head tilt, facial expressions, lip-sync, and hand gestures. The lighting is soft, natural window light from the side. The color grade is clean and realistic. [00:00–00:03] The screen is split into three vertical panels. Top panel: The creator waves both hands excitedly and points to his right. Middle panel: Sabrina Carpenter in a pink feathered dress mimics the exact hand wave and pointing. Bottom panel: Billie Eilish in a black outfit and sunglasses mimics the same gestures. High-fidelity lip-sync as they all say "Hear me out." [00:03–00:07] The layout shifts. Top panel: Creator continues talking with expansive hand gestures. Middle panel: Taylor Swift in a red dress mimics the gestures. Bottom panel: Kim Kardashian in a black tank top mimics the gestures. The transitions between characters are sharp cuts. [00:07–00:10] Split screen: Creator (top) vs. Queen Elizabeth II (bottom). The creator looks to his left and then back to the camera with a skeptical expression. The Queen, wearing a crown and sash, mirrors the look perfectly. [00:10–00:13] Split screen: Creator (top) vs. Edna Mode from The Incredibles (bottom). The creator scratches the top of his head with his right hand. Edna Mode, with her signature bob and glasses, scratches her head in perfect sync. [00:13–00:20] A screen recording of a software interface (Enhancor). A cursor selects the "Wan2.2" model from a dropdown menu. The UI shows a "Source Video" of the creator and a "Character Image" of a woman. 
The cursor toggles "Pro Mode" on and adjusts resolution to 720p. [00:20–00:23] Split screen: Creator (top) vs. a woman with long brown hair in a floral dress (bottom). They are both in the same room. The creator raises his hands in a "stop" gesture; the woman mirrors him perfectly. [00:23–00:27] The UI returns, showing the "Photo Animate" tab being selected. A different reference photo of the same woman is used. The cursor clicks "Generate Video." [00:27–00:35] Final comparison. Split screen: Creator (top) vs. the woman (bottom). The creator looks around the room and then smiles at the camera while touching his hair. The woman mirrors the hair-touching and the smile, but her background is now a different indoor setting matching her reference photo. The text "AI" appears centered on the screen. NEGATIVE PROMPT: Visual: flickering faces, distorted limbs, extra fingers, blurry textures, face-swapping artifacts, unnatural skin smoothing, background warping, robotic movements, low resolution, watermarks. Speech: robotic voice, mismatched lip-sync, muffled audio, background noise, unnatural pauses, clipping audio. SPEECH PACK: [00:00–00:07] Transcript: "Hear me out, all of your favorite movies and animations are going to be completely acted out by someone else in the next two years." TAKE_A: Energetic, fast-paced, direct-to-camera. TAKE_B: Mysterious, slightly slower, emphasizing "completely." TAKE_C: Casual, conversational, like a friend sharing a secret. [00:07–00:13] Transcript: "So I'm going to teach you everything you need to know about this in the next 20 seconds so that you can do this for yourself and stay ahead of the curve." TAKE_A: Authoritative, instructional, rhythmic. TAKE_B: Helpful, warm, encouraging. TAKE_C: Urgent, fast-talking to fit the "20 seconds" claim. [00:13–00:35] Transcript: "So right now you have two options with this new AI video model called Wan 2.2. The first option is Character Swap... The second option is Photo Animate... 
This is absolutely mind-blowing. Comment AI for the link." TAKE_A: Professional narrator style, clear enunciation. TAKE_B: Enthusiastic, high energy on "mind-blowing." TAKE_C: Calm, tech-reviewer tone, clear CTA at the end.
GLOBAL LOCK: Cinematic photorealistic style, 4k resolution, high dynamic range. Consistent subject identities: Donald Trump (fair skin, blonde-orange hair, navy suit, red tie), Kim Kardashian (tan skin, dark hair, black evening dress, pearl necklace), Kanye West (black male, beard, military uniform). Environment transitions from post-apocalyptic ruins to a dark concrete bunker to a baroque palace. Lighting is dramatic and motivated. Speech is clear, celebrity-accurate cadence, high-quality mic signature. [00:00–00:08] Subject: A metallic Terminator-style robot skeleton with glowing red eyes, wearing a messy blue wig and a black t-shirt with a Palestinian flag. Environment: Post-apocalyptic city ruins, destroyed cars, collapsed highway bridge, massive orange explosions in the background. Action: The robot walks toward the camera with a heavy, mechanical gait, carrying a futuristic plasma rifle. Camera: Low-angle tracking shot, moving backward as the robot advances. Lighting: High contrast, warm orange glow from explosions clashing with cool blue ambient light. Speech: Narrator (female, British accent, serious tone): "The year is 2027. AI has gone woke. Skynet has sent Terminator back in time to make Baby Hitler trans." Sync: Cut on "2027", "woke", and "trans". [00:08–00:16] Subject: Donald Trump in a dark bunker, pointing aggressively. Kim Kardashian stands opposite him in a black strapless dress and pearls. Environment: Underground military bunker, concrete walls, hanging industrial lightbulbs, maps on the table. Action: Trump speaks with characteristic hand gestures and facial expressions. Kim K stands still, looking serious. Camera: Medium shot, over-the-shoulder from Kim K to Trump. Lighting: Harsh top-down lighting from a single bulb, creating deep shadows. Speech: Trump: "You're the only one plastic enough to survive time travel." Sync: High lip-sync strictness for Trump. [00:16–00:18] Subject: Close-up of Kim Kardashian's face. 
Action: She blinks slowly, looking determined. Camera: Extreme close-up, shallow depth of field. Lighting: Soft cinematic key light on her face. [00:18–00:21] Subject: Kim Kardashian in her black dress. Environment: A dark, cobblestone street in an old European village at night. Action: A massive blue electrical portal opens behind her with sparks and lightning. Camera: Wide shot, static. Lighting: Bright blue emissive light from the portal illuminating the street. [00:21–00:26] Subject: Kim Kardashian running toward the camera, holding a baby dressed in a tiny suit. Environment: A grand, opulent palace hallway with gold trim, chandeliers, and red curtains. Action: Fast-paced running, high heels clicking on marble. A robot chases in the far background. Camera: Low-angle tracking shot, moving fast. Lighting: Bright, warm, high-key palace lighting. [00:26–00:29] Subject: Close-up of Donald Trump in the bunker. Action: He looks directly into the camera with a serious, concerned expression. Camera: Tight close-up. Speech: Trump: "How will we know if she succeeds?" Sync: High lip-sync strictness. [00:29–00:33] Subject: Kanye West in a tan military uniform with a black beret and Nazi-style armbands, saluting. Environment: A wide city square with large red banners featuring swastika-like symbols. Rows of soldiers with bright blue hair stand at attention. Action: Kanye stands still in a salute. The blue-haired soldiers are perfectly synchronized. Camera: Wide symmetrical shot, static. Lighting: Flat, overcast daylight, vintage film grain texture. Speech: Audio of Kanye West's song "Black Skinhead" (instrumental/distorted). NEGATIVE PROMPT: Cartoonish, low resolution, blurry faces, inconsistent celebrity likeness, robotic movement, flickering lights, text watermarks, messy hair, distorted limbs, bad lip-sync, muffled audio, background noise. SPEECH PACK: [00:00-00:08] Narrator: "The year is 2027. AI has gone woke. 
Skynet has sent Terminator back in time to make Baby Hitler trans." TAKE_A: Dramatic, cinematic trailer voice. TAKE_B: Cold, robotic, monotone. TAKE_C: Whispered, suspenseful. [00:09-00:13] Trump: "You're the only one plastic enough to survive time travel." TAKE_A: Assertive, pointing for emphasis. TAKE_B: Fast-paced, urgent. TAKE_C: Sarcastic, slight smirk. [00:27-00:29] Trump: "How will we know if she succeeds?" TAKE_A: Deep concern, slow delivery. TAKE_B: Whispered, looking away. TAKE_C: Loud, demanding an answer.
GLOBAL LOCK: Two consistent subjects: 1) Donald Trump, elderly Caucasian male, blonde coiffed hair, navy blue suit, white shirt, solid red tie, American flag pin. 2) Kim Jong Un, middle-aged Asian male, black pompadour hair, black-rimmed glasses, dark black high-collared Mao suit. Environments: 1) Oval Office style with American flag and gold curtains. 2) North Korean office with dark wood and papers. Lighting: Cinematic, high-contrast office lighting. Speech: Trump has a breathy, rhythmic New York cadence; Kim has a slightly higher-pitched, eager tone in the parody sections. [00:00–00:06] Split-screen diagonal cut. Top-left: Kim Jong Un in his office, holding a black smartphone to his right ear, looking serious and listening. Bottom-right: Donald Trump in the Oval Office, holding a smartphone to his right ear, speaking with an assertive expression. Static camera, medium close-up for both. [00:06–00:11] Trump lowers his phone, looks down at the screen, and taps with his thumb as if sending a message. He looks back at the camera and gestures with his left hand while speaking. Kim Jong Un is shown in a separate shot, looking at his phone screen with a curious expression. [00:11–00:17] Over-the-shoulder shot of Kim Jong Un's hand holding a smartphone. The screen shows a loading animation with a purple and blue gradient background. A notification appears: "Dorothy seeks a friend or more." Kim's thumb taps the "Episode 1" button. [00:17–00:23] Close-up of the smartphone screen. A role-play app interface is visible. A character named "Dorothy" (young woman in a blue checkered dress) is shown. A chat bubble from Kim says "Hi, I'm Kim Jong Un, Supreme Leader." The app responds with a text bubble. Realistic typing animation on the screen keyboard. [00:23–00:27] Medium shot of Kim Jong Un sitting on a wooden park bench outdoors. He is holding his phone, looking at the screen, and laughing heartily. Background shows green trees and a classical building with a gold statue. 
Bright, natural daylight. [00:27–00:30] Medium shot of Kim Jong Un sitting in a tan leather armchair indoors. Next to him is a large white and blue rocket model. He is looking at his phone and smiling widely. Warm interior lighting. [00:30–00:33] Close-up of Kim Jong Un in a study, leaning over a desk with an open book, but he is focused on his phone, tapping the screen and giggling. Soft, focused desk lighting. [00:33–00:37] Low-angle shot of Kim Jong Un lying in a bed with white sheets, wearing his Mao suit. He is holding the phone above his face, smiling, and making a peace sign with his free hand. The room is dimly lit with warm bedside light. Fade to black with white text overlay: "Go try the role-play yourself." NEGATIVE PROMPT: blurred faces, inconsistent clothing, extra fingers, morphing phone, distorted American flag, robotic mouth movements, flickering background, low resolution, watermark, text artifacts on skin, unnatural eye movements. SPEECH PACK: [00:00-00:01] Trump: "Yes, Kim." [00:01-00:02] Kim: "Donald, can I join the war?" [00:02-00:05] Trump: "Are you crazy? Of course not. I'm already messed up with Iran, just forget it." [00:05-00:06] Kim: "So, what should I do then?" [00:06-00:11] Trump: "I'll send you a link. Go have some fun there. Bye." TAKE_A (Trump): Breathier, more dismissive. TAKE_B (Trump): Faster pace, more annoyed. TAKE_C (Trump): Slower, more paternal/advising. TAKE_A (Kim): High-pitched, questioning. TAKE_B (Kim): Disappointed tone. TAKE_C (Kim): Eager and curious.
GLOBAL LOCK: horizontal-to-vertical cropped cinematic AI promo reel, hyperreal astronaut-capsule visual motif used as the recurring hero asset, one blond curly-haired white male astronaut in a white EVA suit seated inside a spacecraft cabin beside a second astronaut, muted teal-and-cream cinematic grade, soft filmic contrast, shallow depth of field, premium startup-promo edit style, alternating between talking-point text cards on warm beige backgrounds, floating UI/product mockups, and dark feature boards showing automations and model stacks. Voiceover is implied by subtitle-led pacing rather than visible speaker-to-camera footage, with confident founder-demo cadence and high-end product-marketing clarity. [00:00-00:05] Open inside a spacecraft cabin on a close cinematic shot of a blond male astronaut in a white suit, lit by soft practicals and teal cabin reflections. Subtitle-led narration states that time was spent building an AI system that creates the most realistic assets. Keep the camera intimate and the environment premium, like a film still rather than generic sci-fi art. [00:00-00:05] Intercut quick flashes of a second astronaut in the same capsule and text-led beats on minimalist beige title cards, reinforcing the idea that the workflow can be explained in under a minute. The pacing should feel deliberate and persuasive, not frantic. [00:05-00:12] Cut to spacecraft window and workstation angles, then to the astronaut working at a side panel, while subtitles explain the problem with traditional AI pipelines: too much manual work spread across multiple steps. Preserve the same cinematic asset identity so the audience understands this one astronaut scene is the hero example being discussed. [00:12-00:20] Introduce clean product and automation visuals on dark boards, showing clusters of image generations and labeled tool ecosystems. 
Subtitle-led narration explains that instead of doing everything at once, the system focuses on one asset and improves the process through automations. Show brand or tool references like Midjourney, Nano Banana, TapNow, and related creative tools as part of a stacked workflow. [00:20-00:28] Display the astronaut asset embedded inside UI mockups and variation cards. The same scene appears in multiple frames, implying automated iteration, refinement, and derivative outputs from a single source asset. The edit should make the system feel modular: one cinematic input, many downstream outputs. [00:28-00:36] Transition to feature-board sequences and gallery walls of generated outputs, then briefly show a challenge or contest card with a prize headline. The visuals should communicate that the workflow is not just for one image but for scalable campaign, content, or challenge production across a broader creative system. [00:36-00:43] Return to the astronaut close-up, now clearer and more emotionally direct, with the capsule background softened behind the visor. Subtitle-led narration shifts into CTA mode, telling viewers to comment a keyword to receive the workflow. The premium cinematic scene remains the proof asset for the entire pitch. NEGATIVE PROMPT: cheap sci-fi costume, broken astronaut helmet reflections, warped faces, inconsistent blond hair, generic stock footage look, unreadable UI, cluttered dashboards, oversaturated colors, harsh shadows, flicker, low-detail spacecraft cabin, robotic timing, noisy typography, watermark, temporal jitter. SPEECH PACK: - Hook: I spent the last couple of days building an AI system that creates the most realistic assets. - Beat 1: Traditional AI takes time because you’re doing too many manual steps yourself. - Beat 2: Instead of doing everything at once, this workflow focuses on one asset and uses automations to improve it. - Beat 3: It works across tools like Midjourney, Nano Banana, TapNow, and more. 
- CTA: Comment TAP and I’ll send you the workflow to try yourself.
GOAL
Maximize visual + motion + speech (audio) similarity to the reference video, prioritizing:
1) Subject consistency (identity / wardrobe / props)
2) Environment consistency (location, set dressing, weather)
3) Camera & lens language (focal length, movement, framing)
4) Lighting & color grade (contrast, saturation, highlight rolloff)
5) Motion choreography (subject motion + camera motion + scene dynamics)
6) Temporal feel (fps vibe, shutter feel, motion blur, speed ramps)
7) Speech signature (script/semantics, speaker turns, cadence/prosody, emotion arc, mic/room sound, loudness & intelligibility, sync to cuts/lips)
WORKFLOW
A) MISE EN PLACE
2) Segment the video into scenes/shots:
- [00:00 - 00:08]: Single continuous shot.
3) Extract visual evidence:
- Keyframes show a blonde woman in a gold top holding a beer in a bar.
4) Extract speech evidence:
- Speaker A (On-camera female).
- Transcript (Inferred based on context, as no audio provided): "So, you've got your images. Now, step four is where we bring her to life. We are going to animate your AI influencer so she can actually talk to your audience."
5) Invariants list:
- Visuals: Blonde hair, red lipstick, gold strapless top, black leggings, holding a pint of beer, warm pub background, directional warm lighting.
- Speech: Enthusiastic, tutorial-style delivery, female voice.
6) Variables list:
- Visuals: Exact mouth shapes, subtle left-hand movements, slight shifting of the shadow on her chest.
B) SHOTLIST
- shot_id: 1, timecode_start: 00:00, timecode_end: 00:08, duration: 8s
- framing: Starts Medium Shot (MS), pushes in to Close Up (CU). Eye level.
- lens: ~50mm, shallow depth of field, warm bokeh in background.
- camera movement: Continuous, slow, smooth push-in (zoom/dolly forward).
- subject: 20s female, blonde straight hair, red lips. Smiling, talking animatedly, gesturing slightly with left hand. Right hand holds beer steady.
- environment: Indoor pub/bar. Wooden counter foreground. Blurred patrons and shelves background.
- lighting: Strong, warm directional light from front-right. Creates deep shadow of the beer glass on her left chest/shoulder.
- color grade: Warm, amber tones, high contrast, rich saturation on the gold top and red lips.
- motion cues: Subtle movement of hair, dynamic facial expressions, shadow shifting slightly with breath/movement.
- SPEECH / AUDIO
- speech_present: true
- speakers: [A] (on-camera)
- transcript_segments: [{start_tc: 00:00, end_tc: 00:08, speaker_id: A, text: "So, you've got your images. Now, step four is where we bring her to life. We are going to animate your AI influencer so she can actually talk to your audience.", emotion: enthusiastic}]
- delivery_direction: Young female, energetic, warm, tutorial-style pacing, crisp articulation.
- mic_room_signature: Close mic, slight room reverb to match the pub setting.
- sync_requirements: High lip-sync strictness required as she is facing the camera directly.
C) STYLE BIBLE
- visual_style: Photoreal UGC advertorial.
- camera_signature: Smooth gimbal/tripod push-in, cinematic shallow focus.
- lighting_signature: Motivated practical lighting, high-key warm, strong directional shadows.
- grade_signature: Amber/gold palette, high contrast.
- texture_signature: Clean skin detail, metallic reflection on the top, liquid texture in the glass.
- pacing_signature: Single continuous take, slow build in intimacy via zoom.
- SPEECH STYLE BIBLE
- speech_style: UGC direct-to-camera tutorial.
- speaker_profile: Energetic, confident, engaging.
- pronunciation_profile: Clear, standard American accent.
- mic_mix_profile: Clean VO with subtle environmental room tone.
D) PROMPT SYNTHESIS
1. MASTER PROMPT
GLOBAL LOCK: Photorealistic UGC style. A beautiful young woman in her 20s, long straight blonde hair, bright red lipstick, wearing a metallic gold strapless tube top and black athletic leggings. She is holding a pint glass of light beer in her right hand. The setting is a cozy, dimly lit pub or bar with a polished wooden bar counter in the foreground. The background is out of focus with warm bokeh, showing blurred figures of other patrons and shelves of bottles. Lighting is warm and directional from the front-right, casting distinct, hard shadows—specifically the shadow of the beer glass and her arm across her chest. The color grade is rich in amber and gold tones with high contrast. Single female speaker, energetic tutorial delivery, close-mic with slight room tone.
[00:00–00:02] The camera starts in a medium shot, framing the woman from the waist up. She is smiling broadly, looking directly into the lens. Her right hand holds the beer glass steady near her chest. The strong directional light casts a clear shadow of the glass onto her gold top. The camera begins a very slow, smooth push-in. She takes a breath to begin speaking.
[00:02–00:05] The camera continues its slow push-in, transitioning to a medium-close up. The woman begins speaking animatedly. Her mouth opens wide with clear enunciation. Her facial expressions are enthusiastic. Her free left hand comes up slightly into frame, making a subtle, closed-fist gesture to emphasize her words. The shadow on her chest shifts slightly as she moves. The background remains a warm, soft blur.
[00:05–00:08] The camera push-in slows as it reaches a close-up on her face and upper chest. She continues speaking directly to the viewer, maintaining strong eye contact and a bright smile. The metallic texture of her top and the red of her lipstick pop against the darker, warm background. The shot ends with her holding a confident, engaging expression.
2. NEGATIVE PROMPT
Visual: morphing, extra fingers, bad anatomy, distorted glass, spilling liquid, flickering lighting, text/logos on clothing, temporal jitter, background coming into focus, loss of shadow consistency.
Speech: robotic cadence, unnatural emphasis, slurred words, harsh sibilance, plosives, clipping, lip-sync mismatch, audio dropping out.
3. SPEECH PACK
Segment 1 [00:00-00:08]
TAKE_A: "So, you've got your images. [pause] Now, step four is where we bring her to life. We are going to animate your AI influencer so she can actually talk to your audience." (Energetic, punchy emphasis on "four" and "animate")
TAKE_B: "Alright, you have the pictures. [pause] Step four is the fun part—bringing her to life. Let's animate your AI influencer so she can speak directly to your fans." (Slightly more conversational, warm tone)
TAKE_C: "Images are done. [pause] Now for step four: bringing her to life. We'll animate your AI influencer, giving her a voice to connect with your audience." (Slightly faster pace, highly enthusiastic)
MASTER PROMPT Create a vertical 9:16 creator tutorial reel about AI UGC actors for product marketing videos. The format should feature a male host in a small lower-frame talking-head window reacting to and explaining examples above him: short selfie-style clips of different people, a software interface for creating actors, a library of avatar options, and a finished product-demo example where a woman holds up a perfume bottle. The overall feel should be educational, creator-friendly, and built around the promise that branded talking-head content can be generated or iterated quickly with AI actors. GLOBAL LOCK - Format: 9:16 educational reaction/tutorial reel with a persistent host cutout at the bottom. - Host anchor: adult male with beard, cap, microphone, seated and speaking directly to camera in a commentary format. - Demo anchor: upper frame cycles through creator selfie clips, software screens, actor libraries, and generated UGC-style product ads. - Core topic: building or selecting AI actors to create marketing videos, especially spokesperson-style content for products. - Example anchor: a fragrance or beauty-product demo where a female actor holds up a perfume bottle and talks in a polished UGC style. TIMELINE 0.0s - 10.0s Open with the host reacting to a sequence of polished selfie videos from different people. The clips should look like ordinary creator content at first, setting up the reveal that these are examples relevant to AI-generated actor workflows. 10.0s - 22.0s Transition into software explanation. Show an interface with sections for actors, a search area, and a prominent create-actor card. 
Include a visible library of different human presenters so the audience understands that the system offers multiple spokesperson options rather than one generic avatar. 22.0s - 34.0s Go deeper into actor setup. Show screens labeled around defining or selecting an actor, vertical preview cards, and iteration choices. The visual emphasis should be on turning a chosen face or persona into reusable ad-ready performance footage. 34.0s - 44.0s Close on a concrete output example: a polished female presenter holding a perfume bottle in bright soft light, speaking like a beauty or lifestyle creator. Keep the host visible at the bottom so the reel ends as a clear explanation of how AI actors can produce UGC-style branded content at scale. NEGATIVE PROMPT No sci-fi humanoid robots, no uncanny 3D avatars, no coding-terminal focus, no heavy meme editing, no dark moody tech aesthetic, no unrelated gaming footage, no horror effects, no cinematic drama. Keep it practical, creator-oriented, and software-demo clear. SHOT PROMPTS - Vertical tutorial shot with a male host in a lower-frame reaction box and a large selfie-style creator clip above him. - Software-demo screen showing an actor library with create-actor options, thumbnails of different presenters, and clean UI. - AI actor workflow screen displaying define-your-actor and select-your-actor steps for generating ad-ready spokesperson videos. - Product-marketing example featuring a polished female presenter holding a perfume bottle in a bright indoor setting, UGC ad style. - Closing host commentary frame tying together the workflow from sample clips to actor selection to finished branded output. SPEECH PACK - Spoken delivery should explain AI UGC actor creation, who it is useful for, and how it can turn products into talking-head marketing videos. - Audio should prioritize clear host narration over a light background music bed.
Vertical comedic office mockumentary about being an “AI artist,” set in a bright open-plan creative workplace. A bearded man in casual office clothes walks through the hallway carrying work materials, then sits at a desk and deadpans to camera that making films with AI is extremely simple, as if all you have to do is press a big red button. The reel cuts between him speaking confidently in interview-style framing, bold oversized on-screen text calling him “THE AI ARTIST,” shots of thick paper briefs and office tasks, an elderly colleague handing over documents, and absurd visual metaphors where everyday chores or output volume become part of the joke. The tone should be satirical and self-aware, poking fun at the idea that AI filmmaking is effortless while also showcasing the studio environment and creative process. Clean commercial lighting, office comedy pacing, direct-to-camera delivery, punchy captions, and workplace absurdity rather than dramatic storytelling.
GLOBAL LOCK: A vertical 9:16 creator tutorial reel teaching how to make first-person time-travel vlogs with AI. The lower half of the video holds a young male creator speaking directly to camera in a dark studio with red side lighting, black hoodie or jacket, and a backward cap. The upper half alternates between social-proof examples, smartphone search screens, browser pages, prompt-writing documents, and final generated historical selfie videos. The core output style is a realistic vlog shot where a modern creator appears to be filming himself inside major historical moments such as Viking England, the Wild West, or D-Day. The entire reel should feel practical and system-driven, built for viewers who want repeatable viral history content. [00:00-00:12] Open on two successful example clips above the speaker: one where a young woman appears to selfie-vlog among Vikings in England in 865 AD, and another where she appears in a Wild West town in 1880. Both examples should look like genuine first-person historical vlogs with modern camera behavior but era-correct surroundings. View counts or social-proof markers should be visible to show that this content format already works. [00:12-00:28] Move into the workflow entry step through a smartphone UI. Show a phone search screen with “Time Travel” typed in, then a Google-like result page for “Higgsfield AI.” The creator below explains the process in clear terms, making the tutorial feel accessible. The emphasis is on how surprisingly simple the setup is once the right tools are known. [00:28-00:46] Show prompt-building and script-generation stages. Display a prompt document or text page labeled for text-to-video prompts, with entries for historical scenarios like landing craft before a beach assault or other era-specific vlog scripts. The interface should feel like a practical creator workflow rather than a polished marketing demo. 
The point is that the output begins with scripting the right first-person historical situation. [00:46-01:01] End on a dramatic finished example where the creator appears to be selfie-vlogging during a World War II beach landing, with smoke, soldiers, landing craft, and battlefield chaos behind him. Overlay a small thumbnail or packaging element suggesting how the final video can be turned into a clickable social or YouTube asset. The result should feel both absurd and convincing: modern vlog behavior dropped into a massive historical event. NEGATIVE PROMPT: static history painting look, third-person documentary framing, no selfie perspective, bland phone UI, generic prompts, inconsistent main character face, casual modern backgrounds, low-detail crowds, weak historical setting, no social-proof packaging. SHOT PROMPTS: Viking time-travel selfie vlog; Wild West selfie vlog; phone search Time Travel; Higgsfield AI search result; ChatGPT prompt document; text-to-video historical script; D-Day beach selfie vlog; viral history series tutorial. SPEECH PACK: One male speaker only. Tone is practical and energetic, emphasizing simplicity, virality, and repeatability. Stress “time travel vlogs,” “Higgsfield AI,” “ChatGPT prompts,” and the historical selfie angle.
GLOBAL LOCK: Subject is a Caucasian male in his early 30s, dark wavy hair, well-groomed medium-length beard, expressive brown eyes. He maintains a consistent facial structure across all shots. The visual style is a mix of high-end editorial photography and UGC tutorial footage. Lighting is cinematic with soft key lights and motivated rim lighting. Color grade is professional with deep blacks and vibrant but natural skin tones. Speech is clear, energetic, and instructional, delivered with a warm, authoritative tone. [00:00–00:01] Subject: MCU of the man wearing a dark suit, white dress shirt, black tie, and a white baseball cap with a green brim. Action: Talking directly to the camera. A vertical white rectangular mask moves across his face, revealing a slightly different version of the same scene. Camera: Static MCU, eye-level. Lighting: Soft studio lighting, neutral background. Speech: "This is how you can create..." [00:01–00:04] Subject: Rapid montage of AI-generated images. 1. Man in a dark suit and sunglasses driving a green car at night, "AI MAG" text overlay. 2. Man in a checkered blazer and paisley tie in front of a brick wall. 3. Man in a white short-sleeve shirt with multiple pens in his pocket, standing in a white studio. Action: Static editorial poses. Camera: Various (MS, MCU). Lighting: Cinematic, high contrast, nighttime car lighting, studio softbox. Grade: Magazine editorial style. [00:05–00:08] Subject: A 3x4 grid of 12 different AI portraits of the same man in various outfits (boxing gloves, red car, street style, suit). Action: Static images. Overlay: Large bold text "UNLIMITED GENERATIONS" in orange and blue. Camera: Flat grid layout. Lighting: Varied per image. [00:09–00:14] Environment: Screen recording of the Higgsfield.ai website interface. A cursor moves to click "Image" then "Soul ID Character". Action: UI navigation. Speech: "On Higgsfield.ai, go to image and select Soul ID Character..." 
[00:15–00:20] Subject: Picture-in-picture of the man talking (wearing a tan cap and beige shirt) over a screen recording of the "Make Your Own Character" page. Action: Explaining the process while gesturing. Speech: "...where you can actually create your own custom character of yourself by uploading a bunch of photos." [00:21–00:24] Subject: Montage of AI images with text prompts. 1. Man in a suit drinking from a glass (trippy lens effect). 2. Man in a tan suit with a "Micky Mouse Bag" in a city street. 3. Man in a white tank top and jeans in front of a "Tokyo Red Car". Action: Posing. Camera: Full body and MS. Lighting: Bright daylight, stylized urban lighting. [00:25–00:34] Environment: Screen recording of the "Lipsync Studio" interface. Subject's PIP continues. Action: Selecting "Video", then "Lipsync Studio", uploading an image of himself at the beach, and dragging an audio file named "voiceover.wav". Speech: "Now you can go to video at the top of the page and select the Lipsync Studio where you can upload your photo and audio..." [00:35–00:38] Subject: CU of the man at a tropical beach. He is shirtless, wearing black swimming goggles on his head. Action: He is lip-syncing perfectly to the audio, smiling slightly. Environment: Bright blue ocean water with small waves in the background. Camera: CU, static. Lighting: Bright, direct sunlight with natural shadows. Speech: "...and it will combine those two together with the best lip-sync models." NEGATIVE PROMPT: Visual: robotic movement, distorted facial features, inconsistent beard growth, blurry textures, flickering background, extra fingers, warped UI elements, low resolution, watermarks. Speech: robotic monotone, lip-sync delay, muffled audio, background hiss, unnatural pauses, slurred consonants, popping sounds. SPEECH PACK: [00:00-00:08] Transcript: "This is how you can create 25 magazine-ready images of yourself using AI and then you can even lip-sync on top of them with this brand new feature." 
TAKE_A: (Energetic, fast-paced) "This is how you can create TWENTY-FIVE magazine-ready images of yourself using AI... and then you can even LIP-SYNC on top of them with this brand new feature!" [00:09-00:20] Transcript: "On Higgsfield.ai, go to image and select Soul ID Character where you can actually create your own custom character of yourself by uploading a bunch of photos." TAKE_A: (Instructional, clear) "On Higgsfield dot A-I, go to image and select Soul I-D Character... where you can actually create your own custom character of yourself... by uploading a bunch of photos." [00:25-00:38] Transcript: "Now you can go to video at the top of the page and select the Lipsync Studio where you can upload your photo and audio and it will combine those two together with the best lip-sync models." TAKE_A: (Helpful, concluding) "Now you can go to video at the top of the page and select the Lipsync Studio... where you can upload your photo and audio... and it will combine those two together with the best lip-sync models."
GLOBAL LOCK: A vertical 9:16 social video featuring one white European-looking man in his late 20s to early 30s with fair neutral skin, blue eyes, dark brown side-swept hair, athletic build, clean-shaven face, and fitted black t-shirt, always presented as the same creator across every shot. Keep his identity, facial proportions, hairstyle, shirt, black watch, and confident tutorial energy locked. The visual world alternates between a warm tungsten bedroom-office with textured walls, shelf decor, practical lamp glow, and shallow depth of field, and clean dark UI demo layouts with rounded white software panels floating above black backgrounds. Camera language is creator-economy cinematic UGC: medium close-ups, chest-up framings, slight handheld energy, occasional push-ins, and crisp eye-level talking-head setups. Lighting stays motivated and contrasty, with orange practical light on one side and cooler fill on the opposite side during the “after” setup. Grade is rich, warm, polished, slightly contrasty, with soft highlight rolloff and subtle skin texture. One male speaker only, on-camera and off-camera from the same person, speaking energetic tutorial English with quick cadence, punchy emphasis, clean studio-style voice, close microphone presence in the second half, and tight lip sync whenever his mouth is visible. [00:00-00:03] In a dim, warm, low-budget looking setup, frame the creator seated against a textured gray wall, lit by a harsh orange practical glow. He gestures with both hands while looking directly into the lens. Large bold white words appear one beat at a time over his chest, matching the spoken hook: “you go from this”. Keep the frame slightly cramped, the background plain, and the mood intentionally mediocre to set up contrast. [00:03-00:07] Smash cut to a cleaner, brighter version of the same creator in a polished bedroom-office. 
Use a centered medium shot with soft warm lamp light behind him, shelf decor and trailing green plant on camera left, and a vertical tube light on camera right. Continue the chest-level kinetic subtitles: “to this, or this”. Keep his black t-shirt and posture consistent while the room looks instantly more premium. [00:07-00:10] Show another talking-head angle in the polished room, then briefly cut to a behind-the-scenes view with a large softbox, chair, phone, and wall, revealing the practical filming setup. Bold subtitle words continue timing with speech: “AI”, “a key”, “45 degrees”, “of you”, “pocket”, “background”. Preserve the tutorial rhythm and creator hand gestures. [00:10-00:15] Cut to smartphone recording interface views and the creator framed vertically on a phone screen. Emphasize that a screenshot of the clip is taken. Show the recording button, the portrait frame, and a quick screen capture moment. Transition into dark UI layouts branded around ElevenLabs, keeping the creator visible in a picture-in-picture talking-head box at the bottom. [00:15-00:24] On a dark background, display rounded white software panels with image upload areas and labels such as “Image refs” and “Nano Banana Pro.” The creator appears in a small lower talking-head box speaking directly into a large black microphone. He points upward and times his gestures to each UI step. Show his portrait reference being uploaded, then the generated clean headshot-style reference. Keep the mic close, the room tone dry, and the delivery crisp. [00:24-00:33] Reveal multiple generated stills of the same creator in slightly different rooms and lighting conditions, including warm interiors and cool-blue accented backgrounds. Show a blue “Download” button on one version. Then move into a “Kling 2.6 Motion Control” interface with two slots labeled for a character image and a motion video. The creator keeps explaining while pointing up toward the interface, maintaining fast tutorial cadence. 
[00:33-00:39] Fill the motion-control interface step by step: first add the clean portrait as the character reference, then add the motion source video, then display the prompt text instructing the tool to transfer the motion of the first attached video into the attached image perfectly. Show the cursor moving to the upload arrow. Keep the software card large, centered, and readable, with the presenter anchored below. [00:39-00:46] Cut back to vertical result examples of the same creator composited into new backgrounds while preserving body motion and framing. Show one scene with a dark studio doorway and plants, another with a warm shelf-lit interior. The creator continues speaking into the mic from the lower frame, emphasizing that the method preserves motion while swapping environment. [00:46-00:52] Switch to the ElevenLabs Creative Platform UI. Show the creator clip inside the workspace, then navigate into audio features. Surface labels like “Sound effects” and “Studio quality voice,” plus a dropdown list of available voices. Keep the UI white and minimal, floating on a black canvas, while the creator explains how to finish the polish. [00:52-00:57] Display a detailed equipment/setup page with headings like camera and lens suggestions, price examples, and notes about depth of field and aperture. Then cut back to a dark layout where the motion-transfer prompt card is visible alongside stacked vertical examples of the creator in different backgrounds. The creator maintains an urgent, confident CTA tone. [00:57-00:59] End on a strong conversion frame: oversized yellow and white text reads Comment “Setup” while the creator points upward with one finger from the bottom talking-head box. Keep the black background clean, the examples stacked above, and the CTA unmistakable, optimized for saves and comments. 
NEGATIVE PROMPT: do not change the presenter’s face, hairstyle, age range, build, shirt color, or watch between shots; avoid extra fingers, warped arms, asymmetrical eyes, rubbery skin, unstable jawline, drifting hairline, or mismatched ear shape; avoid random wardrobe swaps, logo changes, or added accessories; no flicker, temporal jitter, morphing backgrounds, UI text corruption, duplicated limbs, or inconsistent room geometry; no muddy compression, over-sharpening, clipped highlights, strange shadow directions, or cartoon skin smoothing; do not let the microphone appear in shots where it should be absent; avoid robotic speech, flat cadence, clipped plosives, harsh sibilance, room echo, bad lip sync, or subtitles that lag the spoken emphasis.
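A timeline prompt like the one above is only as reliable as its timecodes, so it can help to check the bracketed ranges before pasting the prompt into a generator. The following Python sketch is one way to do that under stated assumptions: it expects ranges written as `[MM:SS-MM:SS]`, the names `to_seconds` and `check_timeline` are hypothetical, and it only reports problems rather than fixing them.

```python
import re

# Any bracketed token built from digits, colons and a dash, e.g. "[00:03-00:07]".
# Both a plain hyphen and an en dash are accepted as the separator.
BRACKET = re.compile(r"\[([\d:]+)\s*[-–]\s*([\d:]+)\]")
MMSS = re.compile(r"(\d{2}):(\d{2})")


def to_seconds(stamp: str):
    """Convert MM:SS to seconds, or return None if the stamp is malformed."""
    match = MMSS.fullmatch(stamp)
    if match is None:
        return None
    return int(match.group(1)) * 60 + int(match.group(2))


def check_timeline(prompt: str) -> list[str]:
    """Return human-readable problems found in a prompt's time ranges."""
    problems = []
    previous_end = None
    for match in BRACKET.finditer(prompt):
        label = match.group(0)
        start, end = to_seconds(match.group(1)), to_seconds(match.group(2))
        if start is None or end is None:
            problems.append(f"{label}: not in MM:SS-MM:SS form")
            continue
        if end <= start:
            problems.append(f"{label}: end is not after start")
        if previous_end is not None and start < previous_end:
            problems.append(f"{label}: overlaps the previous range")
        previous_end = end
    return problems


if __name__ == "__main__":
    # Hypothetical three-beat timeline; a clean one yields an empty list.
    print(check_timeline("[00:00-00:03] hook [00:03-00:07] reveal [00:07-00:10] setup"))
```

Speech packs usually restart at 00:00, so a check like this is best run on the visual timeline and the speech pack separately; otherwise the restart would be reported as an overlap.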
INVARIANTS TO LOCK - Vertical 9:16 split-comparison Reel. - Same young adult white male creator in every shot: light skin, slim build, side-swept brown hair, clean-shaven, expressive face. - Neutral studio setup with soft gray background, clean frontal lighting, medium framing from chest to head. - Video alternates between “Original:” and “AI:” versions of the same gesture performance. - The AI versions keep the exact body movement and timing, but swap wardrobe, accessories, and visual effects. - Tone is demo-first, highly legible, fast, and social-native. SHOTLIST 1. [00:00-00:02] AI label over a dark tactical outfit, then a red-and-blue spider-inspired superhero suit, then a brown aviator jacket with patches and sunglasses. Matching “Original:” frames underneath show the presenter in a plain black shirt doing the same finger snap gesture. 2. [00:02-00:05] The comparison continues with the aviator look in a warmer room setting with vertical blinds and a plant, still mirroring the original hand choreography. 3. [00:05-00:07] Fire effects appear behind and around the AI version while the original remains clean and unstyled below. 4. [00:07-00:09] Large subtitle CTA appears over the AI version: comment “AI” for guide. Final frames push the fiery transformation while the original keeps the same open-handed pose. STYLE BIBLE Visual style: creator demo of motion-consistent character transformation. Camera signature: locked tripod, eye-level medium shot, no camera movement. Lighting signature: soft even front light on the original clip; AI variants maintain similar face lighting while changing wardrobe and environment mood. Grade signature: clean studio neutrals in the original; richer contrast and warmer highlights in the AI versions. Speech style: brief solo creator commentary or silent caption-driven demo; if voice is present, it should sound casual, impressed, and direct. 
MASTER PROMPT GLOBAL LOCK: Create a vertical 9:16 Instagram Reel that compares an original studio performance against AI-transformed outputs. Use the same young adult white male creator with light skin, slim build, side-swept brown hair, and clean-shaven face throughout. Keep the original clip on a soft gray studio background with the creator in a plain fitted black shirt, medium framing, frontal lighting, and simple hand gestures. Every AI version must preserve identical timing, pose, eye line, and hand motion, while changing outfit, accessories, background mood, and effects. Use bold yellow labels “AI:” and “Original:” so the comparison is instantly readable. [00:00-00:02] Show the creator snapping or flicking his fingers in sync across paired comparison frames. In the AI version, first dress him in a dark armored tactical costume, then switch to a red-and-blue spider-inspired superhero suit, then to a brown aviator jacket with sewn patches and black sunglasses. In the original version, keep the same gesture in a plain black shirt against a gray backdrop. [00:02-00:05] Continue the gesture-matched comparison. The AI variant now settles into the aviator look in a warmer cinematic room with vertical blinds and a leafy plant, preserving exact mouth shape and hand timing from the original clip. The original remains unchanged below, emphasizing how the motion has been transferred rather than reanimated from scratch. [00:05-00:07] Add stylized flames behind the AI character and subtle orange light wrapping around the jacket sleeves. Keep the original clip clean and neutral for contrast. Maintain sharp alignment between both performances so viewers can read the transformation as one-to-one motion mapping. [00:07-00:09] End with the most dramatic fiery aviator transformation while overlaying a clear CTA: comment “AI” for guide. The original clip still mirrors the same open-handed pose. Finish on a high-energy, creator-demo beat. 
NEGATIVE PROMPT Do not drift the face identity, hairstyle, body proportions, or gesture timing between original and AI versions. Avoid extra fingers, broken sunglasses, distorted jacket patches, muddy flames, inconsistent eye direction, unreadable labels, flickering backgrounds, or cartoonish facial deformation. Do not let the AI transformation lose the exact one-to-one motion match with the original clip. SPEECH PACK [00:00-00:04] Speaker A, direct-to-camera, meaning: this is how the same motion can be restyled with AI. Delivery: short, confident, creator-demo cadence. TAKE_A: “Same motion, completely different character styling.” TAKE_B: “This is the exact same performance, just transformed with AI.” TAKE_C: “Watch how the motion stays locked while the look changes.” [00:04-00:09] Speaker A or on-screen text, meaning: these tools save creators time and a guide is available by comment. Delivery: casual CTA. TAKE_A: “Comment AI if you want the full guide.” TAKE_B: “If you want the workflow, comment AI below.” TAKE_C: “Comment AI and I will send the guide.”
Ai Voice Meme Maker
AI voice meme clips work because the voice itself becomes part of the punchline. The wrong voice can flatten a joke, while the right one can make a simple script feel instantly recognizable, absurd, or much funnier than it reads on paper. That is why good examples in this category are really about voice, script, and visual pairing as one system.
This page is useful when you want more than just a funny line on screen. Look for prompts and examples where the chosen voice style, the pacing of the line, and the supporting visuals all point in the same direction. The best results usually understand that meme narration is about timing as much as text.
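Because timing matters as much as text, one rough way to vet a line before generating audio is to compare its word count with the time slot it has to fill. The sketch below is a minimal illustration in Python and is not part of any tool mentioned on this page: the function names `parse_slot` and `words_per_second` are hypothetical, and the range of roughly 2 to 4 words per second is an assumed rule of thumb, not a measured threshold.

```python
import re

# Matches slots such as "[00:00-00:08]"; accepts a hyphen or an en dash.
SLOT = re.compile(r"\[(\d{2}):(\d{2})\s*[-–]\s*(\d{2}):(\d{2})\]")


def parse_slot(slot: str) -> tuple[int, int]:
    """Return (start, end) in seconds for a slot written as [MM:SS-MM:SS]."""
    match = SLOT.fullmatch(slot.strip())
    if match is None:
        raise ValueError(f"unrecognized time slot: {slot!r}")
    m1, s1, m2, s2 = (int(group) for group in match.groups())
    return m1 * 60 + s1, m2 * 60 + s2


def words_per_second(slot: str, line: str) -> float:
    """Estimate how densely a spoken line fills its time slot."""
    start, end = parse_slot(slot)
    if end <= start:
        raise ValueError(f"slot has no duration: {slot!r}")
    return len(line.split()) / (end - start)


if __name__ == "__main__":
    # Example line taken from a speech pack on this page.
    rate = words_per_second(
        "[00:00-00:08]",
        "So, you've got your images. Now, step four is where we bring her to life. "
        "We are going to animate your AI influencer so she can actually talk to your audience.",
    )
    # Assumed rule of thumb: about 2 to 4 words per second reads as natural.
    verdict = "ok" if 2.0 <= rate <= 4.0 else "check pacing"
    print(f"{rate:.2f} words/sec -> {verdict}")
```

A line that lands well above the assumed range is a candidate for trimming or a longer slot, while one far below it may need a pause marker or a slower take so the silence reads as deliberate.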
What matters most in an AI voice meme? Voice choice, line pacing, and how well the visual supports the spoken joke usually matter most.
Do I need a famous-sounding voice to make it work? Not always. A generic but well-matched tone can still land if the script and edit timing are strong.
Should the visuals be complex? Usually no. Many voice memes work best when the visual gives context quickly and lets the narration carry the joke.