How to Generate AI Images

This page is for complete beginners who want a clear first path into AI image generation. It explains how to choose a tool, write a first prompt, read the output, and improve it step by step, without assuming prior experience. The goal is to move users from curiosity to a usable first image with practical guidance that works across tools.

Video

MASTER PROMPT
GLOBAL LOCK: Vertical creator tutorial reel about a viral 360 product effect. A male creator speaks to camera in a dim warm room while the video alternates between product photo examples, UI screens, and premium-looking product outputs. Use clean isolated product images, dark interface panels, and clear step-by-step pacing.

[00:00-00:04] Open on a premium product example and the host explaining that the 360 product effect is going viral.
[00:04-00:09] Show multiple products photographed from different angles to explain the input requirement.
[00:09-00:15] Move into Advertising Studio and the 360 preset workflow.
[00:15-00:20] Show the polished output that feels like a studio product shoot.
[00:20-00:24] End with a CTA asking viewers to comment "studio" for the full guide.

NEGATIVE PROMPT
Avoid broken product geometry, bad angle stitching, muddy lighting, unreadable UI, label drift, and jittery rotation.

SPEECH PACK
Open by saying the 360 product effect is going viral. Explain the workflow: take angle photos, open Advertising Studio, choose the 360 preset, upload, and generate. Close by asking viewers to comment "studio" for the step-by-step guide.
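The prompt above follows a layout that recurs on this page: a GLOBAL LOCK, timestamped beats, a NEGATIVE PROMPT, and a SPEECH PACK. A minimal Python sketch of assembling that layout from structured data is shown here. The names `Beat`, `fmt`, and `build_prompt` are hypothetical helpers for illustration, not part of any tool's API.

```python
from dataclasses import dataclass


@dataclass
class Beat:
    start: int  # seconds from the start of the video
    end: int    # seconds from the start of the video
    description: str


def fmt(seconds: int) -> str:
    """Format a second count as MM:SS."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def build_prompt(global_lock: str, beats: list, negative: str, speech: str) -> str:
    """Join the four sections into one plain-text prompt (assumed layout)."""
    lines = ["MASTER PROMPT", f"GLOBAL LOCK: {global_lock}", ""]
    for beat in beats:
        lines.append(f"[{fmt(beat.start)}-{fmt(beat.end)}] {beat.description}")
    lines += ["", "NEGATIVE PROMPT", negative, "", "SPEECH PACK", speech]
    return "\n".join(lines)


if __name__ == "__main__":
    beats = [
        Beat(0, 4, "Open on a premium product example."),
        Beat(4, 9, "Show products photographed from different angles."),
    ]
    print(build_prompt(
        "Vertical creator tutorial reel about a 360 product effect.",
        beats,
        "Avoid broken product geometry and jittery rotation.",
        "Open by saying the 360 product effect is going viral.",
    ))
```

Keeping the beats as data makes it easier to reorder or retime them before pasting the final text into a generator.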
Video
Create a vertical 9:16 minimal premium design-poster visual for an AI creative workflow, featuring a bright yellow tennis ball floating just above an outstretched human hand against a clean blue sky. The hand should rise from the lower portion of the frame wearing a white wristband, with the ball suspended in crisp sunlight so it feels like a polished 3D object hovering in space. Bold yellow Lovart text repeats in the upper left, while repeated Design text appears in the lower right like confident editorial poster typography. The overall result should feel like a high-end animated 3D poster concept for designers: simple, modern, vector-friendly, and easy to manipulate as a motion design asset. No clutter, no subtitles, no extra objects, no cartoon style.
Video
Kallaway
GLOBAL LOCK:
Subject Identity: A Caucasian male in his mid-30s, short dark hair, clean-shaven, wearing a black baseball cap with a white logo and a black hoodie.
Environment: A home studio/office with dark shelves filled with books and tech gear, warm orange practical lights, and a soft blue/purple neon glow in the background.
Camera: Medium close-up, static, eye-level, shallow depth of field.
Lighting: Warm key light from the side, soft fill, cool rim lighting.
Color Grade: Modern digital look, slightly warm skin tones, deep blacks.
Speech Style: Energetic, direct-to-camera, fast-paced "tutorial" cadence, crisp articulation, medium room reverb.

[00:00–00:04]
Subject: The male subject is integrated into two cinematic scenes. 1) Titanic: He is Jack, wet hair, clinging to a wooden door in dark blue water, looking at Rose. 2) Avatar: He is a Na'vi with blue skin and bioluminescent dots, holding a bow in a dark jungle.
Action: Subtle facial movements, blinking, intense eye contact.
Camera: Close-ups, mimicking original movie cinematography.
Lighting: 1) Cold, moonlight blue. 2) Bioluminescent green and blue.
Speech: "I just figured out how to make fully cinematic scenes with me as the main character."
Lip-sync: High strictness on the bottom tutorial frame.

[00:05–00:07]
Subject: Three static photos of the male subject in a bright kitchen, smiling, wearing a black t-shirt.
Environment: Modern white kitchen, bright natural light.
Action: Photos slide across the screen.
Speech: "Look at this. All we did was upload these basic photos of me in my kitchen."

[00:08–00:17]
Subject: Rapid montage of the subject in iconic scenes: Taxi Driver (looking in mirror), Pulp Fiction (suit and tie), The Godfather (dark office).
Action: Character-specific gestures (hand to forehead, laughing, sitting in a car).
Camera: Fast cuts, matching the original film's framing.
Lighting: Varies by scene (moody noir, 70s grit, warm shadows).
Speech: "And we were able to remake all these iconic scenes extremely easily in Artlist. Perfect physics, perfect lighting."

[00:18–00:30]
Subject: Back to the tutorial setup in the studio.
Action: Gesturing with hands, pointing upwards, making an "OK" sign.
Speech: "And most importantly, there's absolutely no doubt it's me in the scenes. The character consistency is absolutely crazy. If you want to do this for yourself..."

[00:31–00:51]
Subject: Tutorial setup continues on the bottom; top shows screen recording of Artlist UI.
Action: Mouse clicking through "Nano Banana Pro" and "Kling 2.6 Pro" interfaces.
Speech: "First, take a screenshot or generate the base scene you want to be in... then use Nano Banana Pro to generate any new angles... drop in that base photo of you..."

[00:52–01:13]
Subject: New AI-generated scenes: A basketball player in a locker room, a man in a vintage car, a Kung Fu student in a yellow robe.
Action: High motion: drinking from a bottle, driving fast, performing a flying kick over lily pads.
Camera: Dynamic tracking shots, speed ramps.
Lighting: High contrast, cinematic.
Speech: "Watch these back, it absolutely blows my mind how character accurate these models have gotten... you're getting cutting edge access to both image and video models."

[01:14–01:18]
Subject: Final tutorial shot in the studio.
Action: Pointing at the camera, smiling.
Speech: "If you want to try this whole workflow out for yourself, start creating on Artlist."

NEGATIVE PROMPT:
Visual: Face warping, flickering facial features, inconsistent hat logo, blurry textures, unnatural limb movement, AI "hallucinations" in the background, robotic skin texture, mismatched lighting between face and body.
Speech: Robotic monotone, slurred words, background hiss, popping "P" sounds, lip-sync lag, unnatural pauses, metallic voice quality.

SPEECH PACK:
[00:00–00:04] "I just figured out how to make fully cinematic scenes with me as the main character."
TAKE_A: (Excited, fast) "I just figured out how to make fully cinematic scenes with me as the main character!"
TAKE_B: (Confident, steady) "I just figured out how to make fully cinematic scenes... with *me* as the main character."
TAKE_C: (Casual, conversational) "So, I just figured out how to make these cinematic scenes with me as the lead."

[00:05–00:07] "Look at this. All we did was upload these basic photos of me in my kitchen."
TAKE_A: (Pointing, amazed) "Look at this! All we did was upload these basic photos of me in my kitchen."

[00:18–00:20] "The character consistency is absolutely crazy."
TAKE_A: (Emphasizing 'absolutely') "The character consistency is *absolutely* crazy."
Video
by.shlabu

INVARIANTS TO LOCK
- Vertical 9:16 AI car edit tutorial with premium automotive mood.
- Main subject is a dark Porsche-style sports car in a snowy forest road, occasionally contrasted with a pink version as an alternate frame state.
- Layout repeatedly shows image cards, before/after frame pairs, and explanatory blocks for first-frame / last-frame controlled motion.
- Environment is wintery, foggy, and minimal: snow-covered road, bare trees, overcast sky, cold atmosphere.
- Key detail inserts include wheel close-up, glowing rear light, rear driving shot, trunk/engine detail, and side profile motion.
- Tone is controlled, precise, and tutorial-driven rather than random cinematic montage.

SHOTLIST
1. [00:00-00:06] Open on snowy-road car stills with framed card layout, showing a dark sports car from side and front views plus a pink alternate example.
2. [00:06-00:11] Introduce frame-pair logic with text panels beneath the car images, signaling how the motion edit is built from structured image states.
3. [00:11-00:16] Close-ups of the wheel and rear light appear, still packaged inside design-card panels to show exact motion targets.
4. [00:16-00:20] Rear driving shot on the snowy road with glowing taillights, plus more start/end frame comparison cards.
5. [00:20-00:23] Final details include the hood or rear-compartment reveal and a side profile accelerating away through the winter setting.

STYLE BIBLE
Visual style: premium automotive AI breakdown, clean editorial layout mixed with cold cinematic car imagery.
Camera signature: mostly static image-card presentation with occasional implied motion through frame sequencing; car shots use low automotive angles and centered road symmetry.
Lighting signature: flat winter daylight, soft fog, red taillight accents, dark glossy paint reflecting the white environment.
Grade signature: cool desaturated winter tones with crisp red highlights and minimalistic gray-white UI cards.
Speech style: text-led or voice-led tutorial, emphasizing control, structure, and repeatability.

MASTER PROMPT
GLOBAL LOCK: Create a vertical tutorial Reel that teaches controlled AI automotive motion using first-frame and last-frame image logic. Keep the subject as a dark high-performance Porsche-style sports car staged on a snow-covered forest road under overcast winter light. Present the content inside an editorial mobile-friendly layout with image cards, small explanatory text blocks, and paired visual states that show how motion is planned. Include detail shots of the wheel, rear lights, and bodywork, plus occasional alternate color variation such as a pink version of the same car to demonstrate source-frame flexibility. The entire edit should feel premium, cold, and highly controlled.

[00:00-00:05] Show side and front views of the sports car on a snowy road, framed inside rounded editorial cards with small captions below. Introduce the concept that this exact motion can be recreated with AI.

[00:05-00:09] Display paired start-frame and end-frame logic for the car, including a pink alternate front view and a dark version, to show how the motion direction is being designed rather than guessed.

[00:09-00:14] Cut into detail inserts: a wheel close-up and a glowing red rear light module, each inside clean layout blocks with explanatory text beneath.

[00:14-00:19] Show the rear of the car driving away through the snowy forest road, taillights glowing through mist, while more paired frame cards reinforce the first-frame/last-frame method.

[00:19-00:23] End with additional car detail shots such as hood or engine-area reveal and a side profile pulling away, emphasizing seamless controlled motion instead of randomness.

NEGATIVE PROMPT
Do not make the car generic or deform its proportions between frames. Avoid muddy snow textures, unrealistic wheel geometry, unreadable text cards, inconsistent lighting across frame pairs, or motion that feels chaotic and unplanned. Preserve the cold premium automotive mood and the tutorial-card structure.

SPEECH PACK
[00:00-00:10] Speaker A or text-led. Meaning: you can recreate this exact AI car edit using controlled first and last frames. Delivery: precise and instructional.
TAKE_A: “This is how you can create this exact edit using AI.”
TAKE_B: “You start by generating your base images, then you control the motion with frame planning.”
TAKE_C: “The key is not randomness, it is start-frame and end-frame control.”

[00:10-00:23] Speaker A or text-led. Meaning: use Nano Banana for base images, Kling 2.5 for controlled motion, then stitch and retime in the edit. Delivery: methodical and confident.
TAKE_A: “Generate the base images, organize the first and last frame in Kling, then stitch and retime the outputs.”
TAKE_B: “Use a structured prompt with Kling 2.5 to generate clean motion between the frame states.”
TAKE_C: “Comment KLING if you want the full breakdown and prompt structure.”
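The first-frame/last-frame planning described above can be sketched as a small data step. This is a sketch under one assumption that the source does not state: each clip's last frame is reused as the next clip's first frame, so the stitched clips line up. The function name and file names are hypothetical.

```python
def frame_pairs(keyframes: list) -> list:
    """Pair each keyframe with the next one as (first_frame, last_frame)."""
    return list(zip(keyframes, keyframes[1:]))


if __name__ == "__main__":
    # Hypothetical base images, e.g. exported from an image generator.
    keyframes = ["side_view.png", "front_view.png", "rear_driving.png"]
    for first, last in frame_pairs(keyframes):
        print(f"clip: first frame = {first}, last frame = {last}")
```

Three keyframes yield two clips; each pair would then be given to the video model as its start and end state before stitching and retiming in the edit.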
Video
Kallaway
GLOBAL LOCK: The video features a male creator in his 30s, white, with short dark hair, wearing a black baseball cap with a small animal logo and a black "REPRESENT" hoodie. He is in a dark room with soft blue and orange accent lighting. The AI-generated visuals must maintain a high-fashion editorial aesthetic, featuring diverse models (Caucasian, East Asian, Black) in high-end clothing (puffer jackets, suits, dresses). The lighting for AI shots is cinematic, ranging from high-contrast studio light to warm golden hour. The camera language is a mix of static talking-head shots and slow-motion, cinematic pans for the fashion visuals. Speech is direct-to-camera, energetic, and authoritative.

[00:00–00:03]
Subject: Male creator talking to camera.
Visuals: A fast-paced montage of high-fashion models in black suits and dresses against a mirrored, geometric background. Bold white text "THE BEST AI MARKETING TOOL OF 2025" overlays the center.
Camera: Medium shot of the creator; fast cuts for the montage.
Lighting: Warm key light on creator; high-contrast editorial light for models.
Speech: "This might be the best AI marketing tool I’ve tried this year."

[00:03–00:07]
Subject: A blue puffer jacket.
Visuals: A UI window shows a prompt: "360 degree shot of a blue puffer jacket." The jacket then appears and spins 360 degrees against a soft blue studio background.
Camera: Close-up on the jacket, rotating smoothly.
Lighting: Soft, even commercial lighting.
Speech: "It’s called Verv. You can take any physical product and instantly generate aesthetic images and videos with it."

[00:07–00:14]
Subject: Diverse fashion models.
Visuals: Rapid cuts: 1) A woman in the blue puffer jacket on a city rooftop at sunset. 2) An East Asian man in the same jacket in a loft. 3) A man in a red blazer in a gallery. 4) A woman in a blue dress on a beach.
Camera: Mix of medium shots and close-ups; cinematic slow motion.
Lighting: Sunset golden hour, indoor industrial, and bright beach daylight.
Speech: "But these are not the standard AI visuals. You can tell immediately these have taste and a visual art direction you typically don't see from an AI tool."

[00:14–00:26]
Subject: Tool UI and various models.
Visuals: Scrolling through a grid of "Style Filters" like "Cinematic Noir," "Earthy Natural," and "Scandinavian Minimal." Each filter shows a preview image of a model.
Camera: Screen recording of the app UI.
Lighting: Dark mode UI.
Speech: "Now the secret sauce with Verv is that you’re able to pre-select from a bunch of different aesthetic style filters. And these were designed by legit Hollywood artists and directors."

[00:26–00:34]
Subject: "Western Revival" style.
Visuals: A woman in a long dress walking through a grassy field; a close-up of a leather boot stepping through wildflowers with sun flare.
Camera: Low angle, tracking shots, slow motion.
Lighting: Warm, hazy sunset lighting with prominent lens flares.
Speech: "You get access to things like Cinematic Noir, Scandinavian Minimal, and Nostalgic Cool. My personal favorite is Western Revival, which absolutely slaps for clothing and leather goods."

[00:34–00:41]
Subject: UI walkthrough.
Visuals: A user uploads a photo of a brown leather bag, selects "Western Revival," adds a prompt "A man carrying his golf bag down a scenic golf course," and hits "Create."
Camera: Close-up on the mobile UI.
Lighting: Bright, clean interface.
Speech: "After you pick a style, all you have to do is upload a product image, add an optional prompt, and hit generate."

[00:41–00:52]
Subject: Website redesign.
Visuals: A "Before" shot of a minimalist clothing website with a flat product photo of a dress. An "After" shot shows the same website but with a high-end editorial photo of a model wearing the dress in a stylish setting.
Camera: Split-screen comparison and scrolling web view.
Lighting: Clean, bright web aesthetic.
Speech: "And out of the box, these results are pretty amazing. The best use cases for Verv are for social media ad creative and website redesigns. We took a website without imagery and used Verv to completely level it up in just a couple minutes."

[00:52–01:04]
Subject: Male creator closing.
Visuals: Creator talking to camera, gesturing with his hands. Large text "Verv.fm" appears at the bottom.
Camera: Medium shot.
Lighting: Consistent warm studio lighting.
Speech: "And look how much of a difference that makes. It's pretty amazing. Because the design styles are pre-built, it makes it super easy to generate a ton of images and videos that all match a consistent visual aesthetic. Huge props to the team for building this. If you want to try it out, it’s Verv.fm."

NEGATIVE PROMPT: Robotic speech, monotone delivery, plastic skin texture, distorted limbs, flickering backgrounds, blurry product details, inconsistent clothing colors, low-resolution UI, harsh digital noise, mismatched lip-sync, floating objects.

SPEECH PACK:
[00:00-00:03]
TAKE_A: "This might be the best AI marketing tool I’ve tried this year." (Energetic, high pitch on "best")
TAKE_B: "This is easily the most impressive AI marketing tool of the year." (Confident, steady pace)
TAKE_C: "I think I found the best AI marketing tool of 2025." (Future-focused, emphasis on "2025")

[00:14-00:20]
TAKE_A: "Now the secret sauce with Verv is that you’re able to pre-select from a bunch of different aesthetic style filters." (Fast-paced, emphasis on "secret sauce")
TAKE_B: "The magic happens when you choose from these pre-built aesthetic style filters." (Warm, explanatory)
TAKE_C: "Verv’s secret is these curated style filters that give you instant art direction." (Punchy, direct)

[00:58-01:04]
TAKE_A: "If you want to try it out, it’s Verv.fm." (Clear, direct CTA)
TAKE_B: "Go check it out for yourself at Verv.fm." (Inviting)
TAKE_C: "Head over to Verv.fm to start building your brand visuals." (Action-oriented)
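Timestamped beats like the ones above are easy to mistype. As a rough sketch (the helper names `parse_beats` and `find_gaps` are illustrative, not part of any tool), the following Python parses `[MM:SS–MM:SS]` markers, with either a hyphen or an en dash, and reports where one beat's end does not match the next beat's start.

```python
import re

# Matches [MM:SS-MM:SS] with a hyphen or an en dash between the two times.
STAMP = re.compile(r"\[(\d{2}):(\d{2})\s*[-–]\s*(\d{2}):(\d{2})\]")


def parse_beats(text: str) -> list:
    """Return (start_seconds, end_seconds) for every timestamp in text."""
    beats = []
    for match in STAMP.finditer(text):
        m1, s1, m2, s2 = map(int, match.groups())
        beats.append((m1 * 60 + s1, m2 * 60 + s2))
    return beats


def find_gaps(beats: list) -> list:
    """Return (previous_end, next_start) wherever consecutive beats differ."""
    gaps = []
    for (_, prev_end), (next_start, _) in zip(beats, beats[1:]):
        if prev_end != next_start:
            gaps.append((prev_end, next_start))
    return gaps


if __name__ == "__main__":
    headers = "[00:00–00:03] [00:03–00:07] [00:07–00:14] [00:14–00:26]"
    print(find_gaps(parse_beats(headers)))
```

Run it over the beat headers only, since a SPEECH PACK usually covers selected ranges rather than the whole timeline. Prompts that number beats inclusively, such as [00:00–00:04] followed by [00:05–00:07], will be reported as gaps; that is a difference in convention rather than an error.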
Video
Kallaway
GLOBAL LOCK: One single male creator remains consistent across the full video: a light-skinned man in his late 20s to early 30s with a slim build, wearing a black baseball cap and black hoodie, speaking directly to camera from a dark creator studio with subtle blue and warm accent lighting. The video is a vertical 9:16 tutorial about an “ultimate AI cheat code” for recreating image styles using visual analysis, reference images, style reference codes, prompt breakdowns, and image-generation workflows. On-screen visuals include cinematic image grids, red and black graphic compositions, moodboard-like galleries, prompt boxes, style reference code text, ChatGPT or AI assistant windows, and image-generator interfaces. The editing style alternates between talking-head explanation and crisp screen recordings, with bold subtitle emphasis and rapid creator-education pacing. Speech is single-speaker, clear, energetic, and instructional, with high lip-sync importance whenever the creator is on screen.

[00:00-00:06] Open with a strong hook calling this the ultimate AI cheat code. Flash multiple stylized image examples on screen, including cinematic portraits, surreal visuals, and polished art-directed compositions. The creator speaks directly to camera in a medium close-up, hands raised to stress the promise.

[00:06-00:14] Show how the method starts from any image or visual example. Alternate between the creator and moodboard grids of different aesthetics, including pink sunset scenes, red graphic posters, and cinematic portraits. The creator explains that the system can analyze style rather than just copy random prompts.

[00:14-00:22] Move into the reference and analysis stage. Display image-library interfaces, style examples, and tools that inspect visual characteristics. The creator explains that visual style is hidden inside references, not just in obvious prompt text. Screen recordings should be crisp and legible.

[00:22-00:31] Introduce style reference codes and code-like descriptors. Show a clean screen with “Style Reference Codes” or similar text, followed by example outputs generated from these references. The creator describes how the code or extracted pattern can be applied to other images to keep a consistent visual language.

[00:31-00:40] Bring in AI assistant windows or chat interfaces where the creator asks for word-based breakdowns of the visual style. Display prompt boxes, short analytical responses, and extracted descriptors that summarize lighting, palette, mood, composition, and texture. He explains that words plus references create stronger reproduction.

[00:40-00:49] Show comparison grids and more style examples across different subjects. The creator explains how you can take one visual system and reuse it on other scenes, people, or concepts. The interfaces display image sets, generated outputs, and moodboard transitions to demonstrate consistency.

[00:49-00:55] End on the creator in close-up with a concise final takeaway that the easiest way to recreate strong visuals is to combine references, extracted words, and style codes rather than guessing prompts from scratch. Finish with confident tutorial energy and a direct promise of better outputs.

NEGATIVE PROMPT: multiple presenters, podcast microphones, bright casual room, unrelated stock footage, blurry UI, no image grids, no reference code text, no AI assistant windows, generic filler b-roll, identity drift, unsynced lips, cartoon overlays, or slow low-energy pacing.

SPEECH PACK: Single male tutorial speaker only. Fast creator-educator cadence, crisp articulation, close-mic dry sound, emphasis on terms like style, references, words, codes, and images, high lip-sync importance in all talking-head segments, no second voice.
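The takeaway of the script above, combining extracted words with a style code and reusing them on new subjects, can be sketched as a small record. The five descriptor fields mirror the categories named in the script (lighting, palette, mood, composition, texture). The class name, the `style_code` field, and the output wording are assumptions for illustration; the real syntax for a style reference code depends on the generator being used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleProfile:
    lighting: str
    palette: str
    mood: str
    composition: str
    texture: str
    style_code: str = ""  # placeholder; actual format is tool-specific

    def apply(self, subject: str) -> str:
        """Attach the same style descriptors to any new subject."""
        parts = [
            subject,
            f"lighting: {self.lighting}",
            f"palette: {self.palette}",
            f"mood: {self.mood}",
            f"composition: {self.composition}",
            f"texture: {self.texture}",
        ]
        if self.style_code:
            parts.append(f"style reference: {self.style_code}")
        return ", ".join(parts)


if __name__ == "__main__":
    poster = StyleProfile(
        lighting="hard side light",
        palette="red and black",
        mood="tense",
        composition="centered subject",
        texture="film grain",
    )
    # The same profile is reused across different subjects.
    print(poster.apply("a chess player"))
    print(poster.apply("a city street at night"))
```

Storing the style once and applying it to many subjects is what keeps the visual language consistent from image to image.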
Video
GLOBAL LOCK:
Subject: A Caucasian woman in her late 20s, blonde hair tied in a neat ponytail, wearing a leopard-print (cheetah pattern) blouse.
Environment: A cozy home studio/office background with dark grey walls, wooden bookshelves filled with books, green indoor plants, and soft dual-tone lighting (warm orange light from one side, cool blue light from the other).
Camera: MCU (Medium Close-Up) framing, eye-level, 35mm lens feel with shallow depth of field.
Style: Professional UGC creator aesthetic, high-quality video, crisp audio.
Speech: Direct-to-camera delivery, energetic and authoritative tone.

[00:00–00:05]
Visual: Rapid montage of extreme macro close-ups (ECU). First, a human eye with visible iris patterns and eyelashes. Second, an ear with a gold hoop earring showing skin texture. Third, a wrist with a simple black line tattoo showing skin pores and fine hairs.
Action: Static macro shots.
Lighting: Bright, natural daylight feel for the macros.
Text Overlay: "most AI" -> "look fake" -> "because" -> "is trained".
Speech: "Most AI images look fake for one reason. Because AI is trained to remove flaws."

[00:05–00:11]
Visual: The woman (Subject) in the MCU studio setting, gesturing with her hands. Floating icons of AI tools (ChatGPT, Freepik, Ideogram, Nano Banana) appear around her.
Action: Subject talks directly to the camera, moving hands to emphasize points.
Lighting: Studio setup (Orange/Blue).
Text Overlay: "need" -> "AI tools" -> "to prompt".
Speech: "But we don't need better AI tools. We just need to prompt the model to create images that actually look real."

[00:11–00:21]
Visual: Transition to a black screen with white text titled "Master Prompt". The text scrolls or highlights specific sections. Then, a split screen showing the woman talking in a small window and the prompt text in a larger window.
Action: Subject continues talking while the prompt text is displayed.
Lighting: Studio setup for the talking head.
Text Overlay: "to create" -> "that actually" -> "look real".
Speech: "The key to realistic AI images is using a prompt with a specific structure. This prompt should force skin detail, including visible pores, uneven tone, and natural imperfections."

[00:21–00:30]
Visual: Montage of AI-generated faces with high realism. A man's face with stubble and pores, a woman's face with freckles and slight redness. Then, a screen recording of the Freepik interface showing a gallery of realistic portraits.
Action: Fast cuts between the portraits and the UI.
Lighting: Varied, matching the generated images.
Text Overlay: "most people start" -> "make" -> "image".
Speech: "Most people start their prompt with 'make a realistic image of'. I start by telling the model how the camera behaves."

[00:30–00:42]
Visual: Screen recording of a prompt being typed into a text box. Keywords like "iPhone 14 Pro", "handheld framing", and "imperfect composition" are highlighted in yellow.
Action: Scrolling through the prompt text.
Lighting: Digital UI.
Text Overlay: "model that" -> "camera behaves" -> "casual hand" -> "imperfect composition".
Speech: "Casual handheld framing, slightly imperfect composition, and a smartphone camera perspective. This alone already breaks the AI look."

[00:42–00:52]
Visual: The woman back in the MCU studio setting. She gestures toward floating app icons for "Enhancor" and "Higgsfield". A screen recording shows a "Skin Enhancer" tool being used on a photo of a woman with goggles.
Action: Subject explains the final step.
Lighting: Studio setup.
Text Overlay: "But Most People Stop There" -> "Final Step" -> "Most Creators Are Gatekeeping".
Speech: "But most people stop there. I use a final step that most creators are gatekeeping. I run each image through a final skin enhancement step using Enhancor or Higgsfield."

[00:52–01:00]
Visual: The woman in MCU, pointing down toward a text box that says "Comment GUIDE". A final zoom-out effect or a slight blur transition.
Action: Subject smiles and points.
Lighting: Studio setup.
Text Overlay: "Prompt Structure" -> "Workflow" -> "Comment GUIDE".
Speech: "If you want my exact prompt structure and the full workflow, just comment GUIDE and I'll send it over."

NEGATIVE PROMPT:
Smooth skin, plastic texture, perfect symmetry, airbrushed look, 6 fingers, distorted eyes, watermark, logo, blurry background (unless specified), robotic voice, lip-sync lag, harsh sibilance, flickering lights, low resolution.

SPEECH PACK:
[00:00-00:05] "Most AI images look fake for one reason. Because AI is trained to remove flaws."
[00:05-00:11] "But we don't need better AI tools. We just need to prompt the model to create images that actually look real."
[00:11-00:21] "The key to realistic AI images is using a prompt with a specific structure. This prompt should force skin detail, including visible pores, uneven tone, and natural imperfections."
[00:21-00:30] "Most people start their prompt with 'make a realistic image of'. I start by telling the model how the camera behaves."
[00:30-00:42] "Casual handheld framing, slightly imperfect composition, and a smartphone camera perspective. This alone already breaks the AI look."
[00:42-00:52] "But most people stop there. I use a final step that most creators are gatekeeping. I run each image through a final skin enhancement step."
[00:52-01:00] "If you want my exact prompt structure and the full workflow, just comment GUIDE and I'll send it over."
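The prompt structure described in the script above (camera behavior first, then the subject, then forced skin detail) can be sketched as a small builder. The function name, its arguments, and the exact wording are illustrative assumptions, not the creator's actual master prompt, which the script does not reveal.

```python
def realistic_prompt(subject: str, device: str = "iPhone 14 Pro") -> str:
    """Build a prompt that leads with camera behavior, then subject, then skin detail."""
    camera = (
        f"Shot on {device}, casual handheld framing, "
        "slightly imperfect composition, smartphone camera perspective."
    )
    skin = "Visible pores, uneven skin tone, natural imperfections."
    subject_sentence = subject.strip().rstrip(".") + "."
    return " ".join([camera, subject_sentence, skin])


if __name__ == "__main__":
    print(realistic_prompt("A woman drinking coffee at a kitchen table"))
```

The ordering is the point of the sketch: the camera description comes before the subject instead of opening with "make a realistic image of".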
Video
GLOBAL LOCK: A high-definition screen recording of a web browser. The interface is the Freepik website in dark mode. The cursor is a standard white arrow. The subject identity is a consistent AI-generated character: a blonde woman with a friendly, professional appearance, light skin tone, and casual-chic wardrobe. The environment is the Freepik AI Image Generator workspace. The lighting is the digital glow of the UI. The color grade is clean, high-contrast, and modern. The speech is a warm, enthusiastic female voiceover, recorded with a close-mic, dry studio signature.

[00:00–00:02]
The browser is on the Freepik homepage. The cursor moves smoothly toward the "AI Suite" menu item in the top navigation bar.
Speech: "This is Nano Banana Pro."
Lip-sync: N/A (Screen recording)

[00:02–00:05]
The cursor clicks "AI Suite" and then selects "AI Image Generator." The page transitions quickly to the generator workspace.
Speech: "I spent the last two days testing it."

[00:05–00:08]
The user clicks the model selection dropdown. The list scrolls down to reveal "Google Nano Banana Pro." The cursor selects it.
Speech: "It is mind-blowing."

[00:08–00:10]
The user clicks the "Character" tab. A grid of faces appears. The cursor selects the first character, a blonde woman labeled "@johanne."
Speech: "Look at how it handles character consistency."

[00:10–00:13]
The cursor clicks into the prompt box. Text appears rapidly as if pasted: "@johanne - Hyper-realistic studio podcast scene featuring the man sitting across from a bearded neuroscientist in a dim, moody podcast studio..." The user then clicks the "9:16" aspect ratio icon.
Speech: "You just drop in your prompt, pick your ratio..."

[00:13–00:15]
The "Generate" button is clicked. After a brief loading animation, a 2x2 grid of four cinematic, high-quality images appears, showing the character in a professional podcast setting with warm, moody lighting.
Speech: "...and the results are professional grade. Comment 'AI' to try it."

NEGATIVE PROMPT: Visual artifacts, blurry UI text, shaky camera, external glare on screen, messy browser tabs, slow loading times, robotic voiceover, harsh sibilance, background noise, inconsistent character features, low-resolution AI results.

SPEECH PACK:
[00:00–00:05]
TAKE_A: "This is Nano Banana Pro. I spent the last two days testing it." (Enthusiastic, fast-paced)
TAKE_B: "Check out Nano Banana Pro. I've been playing with this for two days straight." (Casual, conversational)
TAKE_C: "You need to see Nano Banana Pro. Two days of testing and I'm hooked." (Authoritative, punchy)

[00:05–00:15]
TAKE_A: "It is mind-blowing. The character consistency is perfect. Just paste your prompt and hit generate. Comment AI for the link." (Clear, instructional)
TAKE_B: "It's honestly mind-blowing. Look at that consistency! Set your ratio, hit generate, and boom. Comment AI to get access." (Excited, high energy)
TAKE_C: "Mind-blowing results. It keeps the character perfectly. One click and you're done. Comment AI and I'll send it over." (Direct, CTA-focused)
Video
GLOBAL LOCK:
Subject is a Caucasian male, mid-30s, with a well-groomed dark beard and mustache. In the cinematic sequence, he is wearing a full suit of polished silver medieval knight armor with intricate engravings. He wears a dark green baseball cap backwards under his helmet or as a stylistic choice. The environment is a dramatic, smoky battlefield with an overcast, moody sky and orange flames/explosions in the background. The color grade is cinematic, desaturated with high contrast and warm highlight roll-off from the fires. Camera movement is dynamic, following the subject.

[00:00–00:05]
Split-screen view. Bottom: Creator talking to camera in a white/black striped hoodie and "VANS" cap. Top: A dark digital interface showing a node-based workflow with lines connecting "Creation," "Text," and "Image Generator" boxes. The creator points down toward the microphone.

[00:05–00:10]
Top screen: A full-body photo of the male subject in a white t-shirt and striped pants against a white wall. The background of the photo then turns into a bright, solid green screen.

[00:10–00:15]
Top screen: Individual 3D-rendered silver armor pieces (gauntlet, chest plate, greaves) float around the subject on the green screen, then snap onto his body, replacing his clothes.

[00:15–00:25]
Top screen: A majestic white horse asset appears against the green screen, and the subject, now in full knight armor, is composited onto it so that he is seated in the saddle. The background is still a green screen.

[00:25–00:45]
Top screen: A cinematic wide shot of the knight on the white horse galloping through a war-torn field. Thick grey smoke billows behind him. He holds a large red and green flag with a "GenHQ" logo that waves violently in the wind. Explosions of orange fire erupt in the background. The camera tracks the horse's movement with a slight handheld shake.

[00:45–00:51]
The cinematic knight sequence continues. Large white text "Comment 'AI'" is centered on the screen. The creator in the bottom frame continues to speak and gesture enthusiastically. The horse slows to a trot as the flag continues to wave.

NEGATIVE PROMPT:
Visual: robotic movement, distorted face, inconsistent armor textures, blurry horse legs, floating objects, cartoonish colors, low resolution, flickering lighting, extra limbs, text/logos other than specified.
Speech: robotic tone, muffled audio, background noise, lip-sync mismatch, stuttering, flat delivery.

SPEECH PACK:
[00:00–00:05]
"This new method of creating AI generated content gives us so much control over the output."
TAKE_A: (Enthusiastic, fast-paced) "This new method of creating AI generated content gives us so much control over the output!"
TAKE_B: (Authoritative, measured) "This new method... of creating AI generated content... gives us so much control over the output."
TAKE_C: (Casual, friendly) "Check this out—this new AI method gives you total control over what you're making."

[00:45–00:51]
"So if you want to try this out for yourself, type AI in the comments and I'll send you the link."
TAKE_A: (Direct, urgent) "So if you want to try this out for yourself, type AI in the comments and I'll send you the link!"
TAKE_B: (Warm, inviting) "Want the link? Just type AI in the comments and I'll send it right over."
TAKE_C: (Punchy, instructional) "Type AI below and I'll DM you the link to try this yourself."
Video
GLOBAL LOCK: The video consists of a series of "impossible POV" shots where the camera is placed inside objects. The visual style is consistently cinematic, photorealistic, and high-detail. Lighting is motivated by the environment, often warm and soft. The camera uses macro or wide-angle lenses depending on the internal space. Textures like skin, metal, and liquid are hyper-detailed.

[00:00–00:02]
Subject: A young Caucasian girl with light brown hair, wearing a dark blue hoodie.
Environment: Viewed from inside an open human mouth. The camera is placed on the tongue.
Action: The girl leans forward toward the camera as if to kiss it or look closely.
Framing: Extreme macro. The upper and lower rows of teeth and pink gums frame the top and bottom of the image.
Lighting: Soft, natural light coming from behind the girl, creating a slight rim light on her hair.
Motion: Subtle movement of the girl's head and the camera's slight handheld shake.

[00:03–00:06]
Subject: A mailman in a blue uniform and gloves.
Environment: Viewed from inside a dark metal mailbox looking out onto a city street. A brown UPS truck is parked in the background.
Action: The mailman opens the door and slides a stack of white envelopes into the mailbox toward the camera.
Framing: Wide-angle POV. The dark interior of the mailbox frames the street scene.
Lighting: Bright, overcast daylight outside; the interior of the box is in deep shadow.
Motion: Fast motion of the mail being inserted.

[00:07–00:10]
Subject: A person's fingers (macro skin texture).
Environment: Viewed from inside the eye of a large sewing needle.
Action: A thick, blue-colored thread is being pushed through the eye of the needle toward the camera.
Framing: Microscopic macro. The scratched, silver metallic edges of the needle eye dominate the frame.
Lighting: Harsh, direct studio lighting highlighting the metallic texture and skin pores.
Motion: Slow, deliberate threading motion.

[00:11–00:15]
Subject: A man's eye and forehead.
Environment: Viewed from inside an antique brass clock mechanism.
Action: A man stares through a circular opening in the clock face, his eye moving as he inspects the gears.
Framing: Close-up. Large, out-of-focus brass gears and springs frame the circular opening.
Lighting: Warm, golden light reflecting off the brass components.
Motion: Rotating gears in the foreground; the man's eye blinks and shifts.

[00:16–00:19]
Subject: Carbonated dark liquid (cola).
Environment: Viewed from the bottom of a metallic soda can, looking upward toward the opening.
Action: Dark liquid rushes into the can, creating violent streams and a mass of dense, fizzy bubbles that explode toward the lens.
Framing: Dynamic POV. The circular opening of the can is at the top of the frame.
Lighting: Backlit through the can opening, creating high-contrast highlights on the bubbles.
Motion: Fast, turbulent fluid dynamics.

[00:20–00:23]
Subject: A coastal landscape with a lighthouse.
Environment: Viewed from deep within the cranial cavity of a weathered, sun-bleached skull resting on a beach.
Action: Static landscape shot.
Framing: The two eye sockets and nasal cavity of the skull frame the ocean and lighthouse in the distance. Cobwebs are visible inside the skull.
Lighting: Natural, diffused daylight.
Motion: Subtle waves in the background; slight camera drift.

[00:24–00:28]
Subject: The internal anatomy of a flower.
Environment: Viewed from the center of a blooming tulip or lily.
Action: Looking outward from the base of the pistil.
Framing: Large, yellow stamens with pollen grains tower like pillars around the frame; soft pink/orange petals form the "walls."
Lighting: Bright, ethereal sunlight filtering through the translucent petals.
Motion: Pollen particles floating in the air; gentle swaying of the petals.

[00:29–00:33]
Subject: A girl blowing out a candle.
Environment: A birthday cake with colorful blue and yellow frosting.
Action: The camera is placed low in the frosting. A girl leans down into the frame and blows toward a single lit candle.
Framing: Low-angle macro. Swirls of frosting frame the bottom and sides.
Lighting: Warm, flickering candlelight; soft bokeh of party lights in the background.
Motion: The flame flickering and then being extinguished; smoke rising.

[00:34–00:38]
Subject: Text on black background.
Action: "COMMENT 'ARCADS' FOR THE PROMPTS" appears in bold white and yellow font.

NEGATIVE PROMPT: blurry, low resolution, distorted anatomy, extra fingers, cartoonish, 2D, flat lighting, watermark, text (except for the intended overlays), shaky camera, glitchy transitions, unrealistic physics.

SPEECH PACK:
[00:00–00:33] No speech, only background music.
[00:34–00:38] Text-to-speech or silent CTA.
Transcript: "Comment 'ARCADS' for the prompts."
TAKE_A: (Energetic) Comment ARCADS for the prompts!
TAKE_B: (Direct) Just comment ARCADS and I'll send you the prompts.
TAKE_C: (Casual) Want these? Comment ARCADS.
Video
Create a vertical 9:16 premium AI model promo visual featuring an ultra-realistic close-up portrait of a young woman facing directly into camera against a dark teal background. She has fair skin, dark hair pulled back, subtle natural makeup, and translucent amber-orange eyeglasses catching a precise highlight across the frame. The lighting should be soft but dramatic, sculpting the face with studio precision and emphasizing realistic skin texture, calm eyes, and balanced symmetry. In the composition, glowing yellow ImagineArt 1.0 text appears in the upper right, while Most Realistic AI Model is set large at the bottom like bold creator-marketing typography. The overall feeling should be a polished product ad announcing a highly realistic character-generation model for creators and brands. No clutter, no subtitles, no cartoon styling.
Video
GLOBAL LOCK: 
Subject is a young woman with long, wavy dark brown hair, fair skin with warm undertones. She wears a white ribbed turtleneck sweater and a delicate gold necklace. The environment is a professional studio with a soft, out-of-focus purple and pink gradient background. Lighting is soft three-point studio lighting with a subtle purple rim light on the subject's hair. Camera is a high-quality 4k sensor, 35mm lens feel, shallow depth of field. Speech is direct-to-camera, energetic, clear, and authoritative.

[00:00–00:01]
Split screen composition. Top half: A glossy 3D app icon featuring a stylized white face with glowing neon visor and the text "UNCENSORED" in a red banner. Bottom half: The subject speaking directly to the camera, smiling slightly. Camera is static, MCU.
Speech: "If you go to this"

[00:01–00:03]
Full screen graphic overlay. A 2x3 grid of popular AI tool logos (Runway, Sora, Midjourney, etc.) on black rounded-square backgrounds. The logos appear with a slight pop-in animation.
Speech: "website you get unlimited video"

[00:03–00:04]
The grid of logos changes to a new set of icons including the OpenAI logo and others. Text overlay "generation," appears in yellow.
Speech: "and image generation,"

[00:04–00:07]
Screen recording of a mobile UI. A dark-themed list of AI models scrolls vertically. Models include "Gemini 3 Uncensored," "Model T 2.0 Extended," and "Claude Opus 4.6." Some are marked "CENSORED" in grey, others "UNCENSORED" in blue. Text overlay "AI tools Completely Free all in One place" appears in bold white and yellow.
Speech: "and you can use all premium AI tools completely free all in one place."

[00:08–00:09]
Close-up of the UI. A finger (or cursor) selects "Nano Banana Pro" from a dropdown menu. A text input box says "Describe the image you want to generate in detail."
Speech: "Simply choose your AI model, write"

[00:09–00:10]
The word "your" is typed into the prompt box.
Speech: "your prompt"

[00:10–00:11]
Cinematic AI-generated image: A close-up portrait of a beautiful woman with wind-swept brown hair, golden hour lighting, extremely detailed skin texture, and expressive green eyes.
Speech: "and within just one minute"

[00:11–00:12]
Cinematic AI-generated image: A woman in a yellow vintage outfit and hat, surrounded by yellow flowers, soft cinematic lighting, 35mm film aesthetic.
Speech: "it will create high"

[00:12–00:13]
Cinematic AI-generated video: A woman in a navy tracksuit running happily on a beach with a brown dog jumping beside her. Overcast sky, realistic waves, handheld camera movement.
Speech: "quality images and videos"

[00:14–00:15]
UI demonstration: A cursor clicks a green "Download" icon on a dark interface.
Speech: "that you can customize and download."

[00:16–00:18]
Return to the subject in the studio. MCU, static. She gestures with her hands while speaking. Text overlays "comment Tool" and "send it" appear.
Speech: "Want the link? Comment 'Tool' and I'll send it to you."

NEGATIVE PROMPT:
Visual: blurry face, distorted logos, low resolution, messy background, harsh shadows, unnatural skin texture, flickering overlays.
Speech: robotic voice, monotone delivery, background noise, muffled audio, lip-sync mismatch, stuttering, long silences.

SPEECH PACK:
[00:00-00:01] "If you go to this"
TAKE_A: (Rising intonation, high energy) "If you go to this..."
TAKE_B: (Direct, pointing gesture) "If you go to THIS..."
TAKE_C: (Whisper-like, secretive) "If you go to this..."

[00:01-00:07] "website you get unlimited video and image generation, and you can use all premium AI tools completely free all in one place."
TAKE_A: (Fast-paced, emphasizing "unlimited" and "free")
TAKE_B: (Rhythmic, pausing after "generation")
TAKE_C: (Excited, high pitch on "all in one place")

[00:08-00:15] "Simply choose your AI model, write your prompt and within just one minute it will create high quality images and videos that you can customize and download."
TAKE_A: (Instructional, calm but steady)
TAKE_B: (Fast, emphasizing "one minute")
TAKE_C: (Awe-struck tone during "high quality")

[00:16-00:18] "Want the link? Comment 'Tool' and I'll send it to you."
TAKE_A: (Friendly, inviting, direct eye contact)
TAKE_B: (Urgent, pointing at the camera)
TAKE_C: (Casual, smiling)
Video
GLOBAL LOCK: Vertical 9:16 UGC tutorial reel with a persistent two-layer presentation style: the upper 60 to 70 percent of the frame shows demonstrations, screenshots, typed prompts, and generated image results; the lower portion shows the same male creator speaking directly to camera in a rounded-corner selfie window for most of the video. The creator is a white male in his late 20s to mid 30s, medium-length wavy dark brown hair, short beard and mustache, expressive eyebrows, average build, casual creator aesthetic. Keep his delivery energetic, friendly, and persuasive. Wardrobe changes are intentional by section: white tee and cream Vans cap at the opening studio desk, blue polo and backward cap for the main explainer section, yellow suit jacket and black top hat for the final gag CTA. Upper-frame design alternates between a white studio opening, black presentation slides branded "Google Nano Banana" with a banana emoji, product-demo image canvases, and dark Freepik interface screens on a soft orange-blue gradient background. The reel should feel like an AI creator tutorial ad: quick but readable, clean text overlays, obvious prompt boxes, high contrast UI, fast social pacing, light jump cuts, and consistent bottom talking-head commentary. Speech style is single-speaker direct-to-camera tutorial English with crisp articulation, upbeat cadence, short persuasive sentences, and creator-economy CTA energy. Audio should sound like a close phone or lav mic in a quiet room, lightly compressed, dry, intelligible, and synced to the speaker window.

[00:00-00:04.50] Open on a bright white studio setup. The upper frame shows the colorful Google wordmark above the title "Nano Banana" with a banana emoji. Centered below it, the creator sits behind a white table in a cream Vans cap and light shirt, leaning toward a turquoise striped cup-shaped microphone or tumbler. Softbox lights are visible on both sides, making the setup feel like a casual creator studio. In the lower portion of frame, a separate rounded-corner selfie video of the same man begins speaking directly to camera. He introduces the tool with immediate enthusiasm. Lips are fully visible in the lower video; lip-sync strictness high for the first spoken hook.

[00:04.50-00:10.00] Cut to a black presentation layout branded "Google Nano Banana" at the top. The upper demo area shows a bright outdoor image of the creator on a Grand Canyon style cliff-edge walkway, arms stretched, backpack on, huge sky and canyon behind him. A prompt box appears under the image and begins typing "Make it into a youtube thumbnail". The lower selfie speaker remains on screen in the blue polo and backward cap, gesturing with one hand while explaining the edit. The tone is excited, helpful, and a little amazed. Keep the typed prompt animation readable and central.

[00:10.00-00:14.50] The same canyon image updates into a louder thumbnail treatment with giant curved yellow "GRAND CANYON" text behind the creator’s head. Emphasize the before-and-after value clearly: same base photo, more clickable YouTube-style packaging. The lower speaker continues talking in sync with hand gestures. Audio remains a crisp tutorial voice, no music overpowering the speech.

[00:14.50-00:20.50] Transition to a luxury product-edit example. In the upper frame, a prompt card reads "Replace the bottle" with a small reference thumbnail, then the output becomes a glossy Dior Sauvage-style perfume bottle on swirling golden light trails over a dark brown-black studio background. Maintain premium ad aesthetics, reflective glass, centered bottle, and luminous streaks. The lower talking-head explains the edit use case, likely referencing product replacement or image transformation. Speech stays fast, punchy, and creator-friendly.

[00:20.50-00:24.00] Briefly show another generated image example in the upper area, including a polished portrait-style output that demonstrates broader image editing capability beyond product swaps. Keep the cut quick and social-first, serving as visual proof rather than a full tutorial pause. The bottom speaker window continues uninterrupted, preserving continuity.

[00:24.00-00:31.50] Move into the software walkthrough. The upper frame now shows the Freepik dark UI over a soft gradient backdrop, starting with an AI Suite menu containing categories like image tools, video tools, audio tools, and design tools. Then zoom into the model panel where "Google Nano Banana" is selected, with image reference slots, style/composition/effects/character/object controls, and a beta disclaimer about aspect ratio. The creator in the lower window counts features with his fingers while describing how to access the workflow. Keep the UI readable enough for social tutorial viewing, but still fast-paced.

[00:31.50-00:36.50] Continue the interface demo with more dark UI panels, prompt fields, thumbnails, and settings sections scrolling or cutting through the workflow. The creator keeps speaking in direct, practical language, as if walking viewers through where to click and how to upload references. Camera on the lower speaker remains static, head-and-shoulders, neutral indoor room with door and wall behind him.

[00:36.50-00:43.00] End with a comedic CTA transformation. The upper frame shows a prompt reading "Give him a sign to hold" while the creator appears dressed like a theatrical ringmaster or showman in a yellow jacket and tall black top hat on a sunlit balcony. He holds a handmade cardboard sign that reads "Comment AI and I'll send you the link!" The lower talking-head still speaks beneath, landing the call to action. The final beat should feel playful, persuasive, and optimized for comments. Lip-sync remains visible in the lower window; key sync accents should land on the CTA words "comment AI" and "send you the link".

NEGATIVE PROMPT: extra fingers, warped hands during gesturing, drifting facial hair, inconsistent eye color, duplicated selfie windows, unreadable UI, misspelled "Google Nano Banana", broken prompt boxes, random logos, muddy text, incorrect YouTube thumbnail lettering, deformed perfume bottle glass, floating product shadows, overexposed softboxes, messy background clutter, cinematic bokeh that hides the tutorial content, abrupt framing jumps, desynced speech, robotic cadence, slurred consonants, harsh sibilance, echoey room tone, loud background music, clipping, pumping compression, lip-sync mismatch, subtitle blocks covering the demo.

SHOT PROMPTS:
SHOT_1 [00:00-00:04.50]: White studio opener, Google Nano Banana title, creator at desk with Vans cap and turquoise cup, bottom selfie explainer starts.
SHOT_2 [00:04.50-00:10.00]: Black branded demo screen, Grand Canyon reference photo, typed prompt box for YouTube thumbnail conversion, bottom speaker explains.
SHOT_3 [00:10.00-00:14.50]: Thumbnail result reveal with giant GRAND CANYON text, same split-screen layout, energetic creator commentary.
SHOT_4 [00:14.50-00:20.50]: Product-edit demo, perfume bottle replacement prompt, luxury golden-light result, bottom speaker continues.
SHOT_5 [00:20.50-00:24.00]: Quick alternate polished image result proving editing range.
SHOT_6 [00:24.00-00:31.50]: Freepik AI Suite walkthrough, dark UI menus, Google Nano Banana model selected, image reference slots and controls visible.
SHOT_7 [00:31.50-00:36.50]: More UI steps, prompt/settings panels, creator explains workflow and uploads.
SHOT_8 [00:36.50-00:43.00]: Final joke CTA, top hat outfit, cardboard sign asking viewers to comment AI for the link, bottom talking-head closes the pitch.

SPEECH PACK:
Timecoded transcript (best-effort, inferred from visible overlays and tutorial cadence):

[00:00-00:04.50]
TAKE_A: "Please use this if you have not already. It is a game changer."
TAKE_B: "If you are not using this yet, you need to. It is a total game changer."
TAKE_C: "This tool is a game changer, and you should absolutely be using it already."
Prosody: fast hook, confident, slightly urgent, friendly creator tone.

[00:04.50-00:10.00]
TAKE_A: "You can take an image like this and ask Nano Banana to turn it into something more clickable."
TAKE_B: "Watch this. I can upload a photo and prompt Nano Banana to make it into a YouTube thumbnail."
TAKE_C: "Here is a simple example. Drop in an image and tell it to make a YouTube-ready thumbnail."
Prosody: explanatory, upbeat, demonstration-first.

[00:10.00-00:14.50]
TAKE_A: "It keeps the subject but gives you a much stronger thumbnail treatment."
TAKE_B: "Same image, better packaging. That is why this is so useful for creators."
TAKE_C: "This is the kind of upgrade that makes basic content feel publish-ready."
Prosody: impressed, selling practical value.

[00:14.50-00:20.50]
TAKE_A: "You can also do product swaps, like replacing the bottle and turning it into a premium ad."
TAKE_B: "It is not just thumbnails. You can replace products and restyle the entire scene."
TAKE_C: "This works for product creatives too. Swap the object and it rebuilds the shot around it."
Prosody: persuasive, slightly faster, feature-stack delivery.

[00:20.50-00:24.00]
TAKE_A: "And it is not limited to one type of image either."
TAKE_B: "You can use the same workflow across different visual styles."
TAKE_C: "That flexibility is what makes the tool stand out."
Prosody: transitional, concise.

[00:24.00-00:31.50]
TAKE_A: "Inside Freepik, open the AI Suite, choose Google Nano Banana, and upload your image references."
TAKE_B: "If you want to try it, go into AI Suite, pick the Nano Banana model, then add your reference image here."
TAKE_C: "This is where it lives in Freepik. Select the model, drop your images in, and start prompting."
Prosody: instructional, practical, clear enunciation.

[00:31.50-00:36.50]
TAKE_A: "Then you can use the style, composition, effects, character, and object controls to shape the result."
TAKE_B: "From here you fine-tune the edit with the controls and prompt box."
TAKE_C: "Once the image is in, the rest is just directing the model with these tools."
Prosody: matter-of-fact, tutorial rhythm.

[00:36.50-00:43.00]
TAKE_A: "Want to try it? Comment AI and I will send you the link with unlimited generations on Freepik."
TAKE_B: "If you want access, comment AI and I will send you the link."
TAKE_C: "Comment AI for the link and I will send it over."
Prosody: bright CTA, direct ask, strong emphasis on "comment AI".
Video
GLOBAL LOCK: A vertical cinematic-teaching reel, approximately 47 seconds, designed as a visually rich prompt-and-framing tutorial for better AI-generated film stills. The video alternates between sample portrait or scene imagery and bold centered on-screen text that critiques low-quality AI aesthetics and then replaces them with concrete visual principles. The piece opens with a polished but generic blonde beauty portrait on a black background labeled as “low quality AI,” then pivots into stronger cinematic examples: moody urban night scenes under arches, distant silhouettes in fog, soft practical lighting, handheld-style portraits, and warm sunset close-ups of a short-haired woman. The overall color world leans teal-green shadows, warm amber highlights, subtle grain, and low-key cinematic contrast.

The structure is educational, not narrative. Text captions carry the teaching flow: first rejecting weak AI image habits, then introducing simple filmmaking rules such as better frames, one dominant camera perspective, warm sunset key light from one side, natural texture, contrast, and the idea that the work should visually prove itself. The imagery should feel like proof-of-concept boards or moving mood references rather than continuous story scenes. Most shots are carefully composed single moments: a woman framed in shallow light, two people under an urban arch, a hand-held close-up with soft night lighting, and other filmic fragments that demonstrate intentional cinematography.

The tone should feel confident, minimalist, and opinionated, like a creator explaining how to stop making generic AI portraits and start making cinematic images with stronger visual grammar. Visual priorities: centered all-caps instructional text, black separators or negative space, elegant comparison between generic beauty render and moodier cinematic frames, teal-and-amber grading, shallow depth of field, strong directional light, tasteful grain, and compact tutorial pacing. Avoid busy graphics, loud meme styling, or heavy voice-dependent explanation. The point is that the lesson is readable through image-plus-caption alone.
Video
GLOBAL LOCK:
The video features a split-screen layout. The bottom 30% contains a consistent male creator: Caucasian, mid-30s, brown beard, wearing a tan "Vans" trucker hat and a black quilted vest over a white t-shirt. He is in a home office/studio setting with soft indoor lighting. The top 70% features AI-generated cinematic footage. The AI footage must maintain high subject consistency, specifically a character resembling Leonardo DiCaprio in "The Wolf of Wall Street" (short brown hair, blue pinstripe suit, red polka dot tie). The environment is a luxury office with wood paneling. Lighting is cinematic, warm, and professional.

[00:00–00:03]
Subject: A man resembling Leonardo DiCaprio in a blue pinstripe suit and red polka dot tie.
Action: He holds a crisp one-dollar bill horizontally with both hands, looking directly into the camera with a slight, confident smile.
Camera: Medium close-up, static.
Lighting: Warm, high-key office lighting, soft shadows.
Speech: Creator says "It has never been easier to create multiple camera angles..."
Sync: Creator's lips visible in the bottom frame, high sync.

[00:03–00:07]
Visual: A 3x3 grid appears showing the same man from 9 different angles (overhead, profile, low angle, etc.). The shot then transitions to a Nike windbreaker jacket (black, red, white) floating in a surreal dark environment filled with glowing blue and purple crystals.
Action: The jacket rotates slowly.
Camera: Close-up on the jacket texture and Nike logo.
Lighting: Dramatic, neon-blue and purple rim lighting.
Speech: "...with consistency from a single reference image."

[00:08–00:13]
Subject: Three characters: a man (DiCaprio-lookalike), a blonde woman (Margot Robbie-lookalike in a black dress), and a muscular man with a goatee (Jon Bernthal-lookalike, shirtless with a gold chain).
Action: They stand together in a modern room with wooden doors and bookshelves. They look toward the camera.
Camera: Medium wide shot, slight handheld jitter for realism.
Lighting: Naturalistic indoor light from the side.
Speech: "So in today's video, I'm going to show you the best method..."

[00:14–00:20]
Visual: Screen recording of the Higgsfield "Shots" app interface. A cursor selects an image of a woman in a black dress and clicks a yellow "Generate" button.
Action: The UI transitions to show a grid of 9 generated black-and-white images of the woman.
Camera: Screen capture.
Speech: "Let's dive in. To get started, you can upload your image into Shots..."

[00:21–00:28]
Subject: A beautiful woman with dark hair in a flowing black dress.
Action: A montage of artistic shots: her looking at the camera, her back to the camera with hair blowing, her dancing with fabric flowing around her.
Camera: Various angles (CU, MCU, Profile), slow motion.
Lighting: High-contrast black and white, dramatic shadows, bright white background.
Text Overlay: "Comment AI" in bold white letters.
Speech: "So if you want to try this out for yourself, type AI in the comments and I'll send you a link."

NEGATIVE PROMPT:
Visual: Distorted faces, extra fingers, flickering background, blurry textures, inconsistent clothing colors, morphing objects, robotic movement, low resolution, watermark.
Speech: Robotic tone, muffled audio, background noise, lip-sync delay, stuttering, unnatural pauses.

SPEECH PACK:
[00:00-00:07]
Transcript: "It has never been easier to create multiple camera angles with consistency from a single reference image."
TAKE_A: (Enthusiastic, fast-paced) "It's NEVER been easier to create multiple camera angles... with total consistency... from just ONE image."
TAKE_B: (Educational, steady) "It has never been easier to create multiple camera angles with consistency... starting from a single reference image."

[00:21-00:28]
Transcript: "So if you want to try this out for yourself, type AI in the comments and I'll send you a link."
TAKE_A: (Direct, CTA-focused) "Want to try this? Type AI in the comments and I'll DM you the link right now."
TAKE_B: (Friendly, helpful) "If you want to try this out for yourself, just comment AI below and I'll send that link over."
Video
A creator-style educational video in vertical format featuring a woman speaking directly to the camera outdoors while holding a small handheld microphone. She stands in front of an industrial blue-gray wall and explains how to improve AI image or video generation by writing better prompts for character identity and consistency. As she talks, visual overlays appear around her, including example faces, UI screenshots, prompt text blocks, icon graphics, and sample outputs that illustrate her points. The camera remains steady in a medium shot while she gestures with one hand, points upward for emphasis, and delivers concise teaching segments with captioned key phrases. The mood is instructional, creator-native, confident, and optimized for social learning content.

How to Generate AI Images

Learning how to generate AI images usually starts with three simple decisions: what you want to make, which tool can make it, and how specific your first prompt should be. Beginners often get better results when they keep the first attempt focused on one subject, one style, and one clear purpose instead of trying to describe everything at once.

A practical first workflow looks like this. Choose a tool that matches your goal, write a prompt that names the subject and the visual style, then review the result and change one thing at a time. If the image is close but not right, adjust the background, lighting, composition, or style details instead of rewriting everything. The goal is to learn how the tool responds so each new attempt becomes easier to control.
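
The workflow above can be sketched in a few lines of Python. This is only an illustration and is not tied to any particular image tool: the PromptParts class, the change_one and changed_fields helpers, and the example wording are all hypothetical names invented here. The sketch keeps a prompt as separate parts, changes exactly one part per attempt, and reports which part changed.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class PromptParts:
    """Hypothetical container for the parts of a beginner prompt."""
    subject: str
    style: str
    background: str
    lighting: str

    def to_prompt(self) -> str:
        # Join the parts into one comma-separated prompt string.
        return ", ".join([self.subject, self.style, self.background, self.lighting])


def change_one(parts: PromptParts, **change: str) -> PromptParts:
    """Return a copy of `parts` with exactly one field changed."""
    if len(change) != 1:
        raise ValueError("change exactly one part per attempt")
    return replace(parts, **change)


def changed_fields(old: PromptParts, new: PromptParts) -> list[str]:
    """List the names of the fields that differ between two attempts."""
    return [f.name for f in fields(old) if getattr(old, f.name) != getattr(new, f.name)]


# Example wording is made up for illustration.
first = PromptParts(
    subject="a red bicycle leaning against a brick wall",
    style="watercolor illustration",
    background="quiet street",
    lighting="soft morning light",
)
second = change_one(first, lighting="warm sunset light")

print(first.to_prompt())
print(second.to_prompt())
print(changed_fields(first, second))
```

Pasting each printed prompt into the tool of your choice and comparing the two images shows what the lighting change alone did, which is the point of adjusting one thing at a time.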

FAQ

What do I need to start generating AI images?

You need a tool, a simple idea, and a prompt that describes the subject, style, and mood you want to see.

How should I write my first prompt?

Keep it specific but simple: name the subject, the style, and the kind of background or lighting you want.

What should I change if the image is wrong?

Adjust one part at a time, such as the style, background, or lighting, so you can see what actually changed the result.

How do I get better results over time?

Save prompts that work, compare outputs, and keep refining the parts that make the biggest difference in the image.
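
One possible way to save prompts that work is a small local log. The sketch below is an assumption, not a feature of any image tool: the file name, the 1 to 5 rating scale, and the helpers save_attempt, load_attempts, and best_attempts are hypothetical. It stores each attempt in a JSON file and lists the highest-rated prompts first.

```python
import json
import tempfile
from pathlib import Path


def load_attempts(log_path: Path) -> list[dict]:
    """Read all saved attempts, or return an empty list if no log exists yet."""
    if not log_path.exists():
        return []
    return json.loads(log_path.read_text(encoding="utf-8"))


def save_attempt(log_path: Path, prompt: str, rating: int, note: str = "") -> None:
    """Append one attempt (prompt, your own 1-5 rating, optional note) to the log."""
    attempts = load_attempts(log_path)
    attempts.append({"prompt": prompt, "rating": rating, "note": note})
    log_path.write_text(json.dumps(attempts, indent=2), encoding="utf-8")


def best_attempts(log_path: Path, min_rating: int = 4) -> list[dict]:
    """Return attempts rated at or above min_rating, highest rating first."""
    kept = [a for a in load_attempts(log_path) if a["rating"] >= min_rating]
    return sorted(kept, key=lambda a: a["rating"], reverse=True)


# Demo in a temporary folder so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "prompt_log.json"
    save_attempt(log, "a red bicycle, watercolor illustration, soft morning light", 3, "too pale")
    save_attempt(log, "a red bicycle, watercolor illustration, warm sunset light", 5, "keep this lighting")
    for attempt in best_attempts(log):
        print(attempt["rating"], attempt["prompt"])
```

The note field is where you record which change made the difference, so the log doubles as a record of what to keep refining.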

How to Generate AI Images: Beginner Step-by-Step Guide | Alici.AI