@imma.gram content — AI art

Can a virtual human interview @steveaoki ??? I was too nervous 😫 What should I ask next when I see him??? Before @steveaoki's show I tried my first-ever interview, but I was so nervous I completely froze up lol 🙈💦

How imma.gram Built This Steve Aoki Interview Meme — and How to Recreate It

This frame is a perfect example of “social proof without bragging.” The headline tells you something big happened, but the emotion is small and human: nervousness. That contrast is why people stop scrolling. It’s not “look how cool I am,” it’s “I did something scary and I’m still processing it.”

Visually, it’s built like a meme: clean POV text at the top, a single loud word in the middle, and a wide shot that clearly shows two people in a waiting-room setup. The scene feels like a backstage moment you weren’t supposed to see—which is exactly what makes it comment-worthy. Viewers don’t just consume it; they want to respond with advice, questions, and “what happened next?”

Why it went viral (the signals and the mechanics)

| Signal | Evidence (from this image) | Mechanism | Replication Action |
|---|---|---|---|
| High-status context, low-ego tone | The top text references interviewing a notable guest, but the emotion is “I was nervous” | Humility makes the flex shareable and relatable | Pair a big moment with a vulnerable feeling (“I was nervous / I forgot what to say”) |
| Meme-first typography | Rounded white box with bold black text; one hot-pink emphasis word | Instant comprehension + screenshotability | Use a consistent POV box style and one mid-frame emphasis word per post |
| Backstage realism | Waiting-room chairs, water pitcher, posters, plain walls | “Unfiltered” environment feels authentic and behind-the-scenes | Choose ordinary rooms (green room, hallway, studio corner) and keep the set uncluttered |
| Clear social dynamic | Two people seated facing the same direction; one looks nervous | Viewers can project a story and fill in the dialogue | Stage a simple two-person setup with readable body language (nervous vs relaxed) |

Best-fit scenarios

  • Collab teasers: announce a guest or partnership without a polished press-release vibe.
  • Event day diaries: capture the “before the moment” nervousness—people love the lead-up.
  • Series hooks: make it Episode 1 (“I was nervous”), then follow with the best question you asked.
  • Community prompts: ask viewers what you should ask next; it turns a post into a crowdsourced script.

Not ideal

  • Evergreen tutorials: this format is moment-driven; it needs a specific story context.
  • Highly aesthetic feeds: the charm is “real room energy,” not perfect styling.
  • Multi-point announcements: you get one strong sentence—don’t overload it.

Transfers (three recipes)

  1. Recipe 1: Swap the guest, keep the emotion

    • Keep: POV text box + wide two-chair setup + vulnerable emotion
    • Change: the guest name and the feeling (nervous → excited → starstruck)
    • Slot template: “POV: I interviewed {guest} but I was {emotion}”
  2. Recipe 2: Swap the location, keep the layout

    • Keep: top POV box and one mid-frame emphasis word
    • Change: waiting room → hallway → backstage curtain → café corner
    • Slot template: “{two-person scene} in {location}, top POV text: {one sentence}, emphasis word: {1 word}”
  3. Recipe 3: Creator-to-creator version

    • Keep: ordinary room + honest caption tone
    • Change: replace ‘interview’ with ‘collab / studio session / first meeting’
    • Slot template: “POV: I met {creator} for a {first-time moment} and I was {emotion}”
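
The three slot templates above can be treated as fill-in-the-blank strings. Here is a minimal Python sketch, assuming you store them as format strings. The names `RECIPES`, `slots`, and `fill` are hypothetical helpers invented for illustration, and the slot names are shortened to valid identifiers (for example `{1 word}` becomes `{word}`).

```python
from string import Formatter

# Slot templates copied from the three recipes; the dict keys and the
# shortened slot names are illustrative assumptions, not fixed terminology.
RECIPES = {
    "swap_guest": "POV: I interviewed {guest} but I was {emotion}",
    "swap_location": (
        "{scene} in {location}, top POV text: {sentence}, "
        "emphasis word: {word}"
    ),
    "creator_to_creator": "POV: I met {creator} for a {moment} and I was {emotion}",
}


def slots(template: str) -> set[str]:
    """Return the slot names that appear in a template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def fill(recipe: str, **values: str) -> str:
    """Fill one recipe, refusing to run if any slot is left empty."""
    template = RECIPES[recipe]
    missing = slots(template) - set(values)
    if missing:
        raise ValueError(f"unfilled slots: {sorted(missing)}")
    return template.format(**values)


print(fill("swap_guest", guest="a touring DJ", emotion="starstruck"))
```

Checking for unfilled slots before formatting keeps a half-written headline from slipping into a post.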

Aesthetic read: why the room matters as much as the text

The beige office setting is doing a job: it lowers the perceived production level. That sounds bad, but it’s actually what makes the post feel believable. In a plain room, the audience focuses on faces, posture, and the headline. The virtual human stands out (pink hair, purple hoodie), but not in a “CG demo” way—more like a recognizable character in a real situation.

| Observed | Evidence in the image | Recreate instruction (prompt knob) |
|---|---|---|
| Wide shot with context | Two chairs, table, posters, cabinet are all visible | “wide 9:16 shot, include both seated subjects and room details” |
| Vulnerable body language | One subject looks nervous, mouth slightly open | “nervous expression, mid-sentence mouth, subtle tension in posture” |
| Meme typography system | POV box at top + one emphasis word in hot pink | “rounded white POV caption box + mid-frame emphasis word in bold color” |
| Ordinary set dressing | Water pitcher, cups, plain walls | “waiting-room props: pitcher, cups, posters; keep uncluttered” |
| Color anchors | Pink hair + purple hoodie pop against beige/gray | “neutral room palette, one bright character accent color” |

Prompt technique breakdown (make it repeatable)

| Prompt chunk | What it controls | Swap ideas (EN, 2–3 options) |
|---|---|---|
| POV headline | Story clarity and click motivation | “POV: I met…”, “POV: I asked…”, “POV: I almost…” |
| Emotion word | Relatability and comment triggers | “nervous”, “starstruck”, “excited” |
| Scene layout | Whether it reads like a real moment | “two chairs”, “standing hallway chat”, “backstage couch” |
| Set dressing | Authenticity vs staged look | “water pitcher”, “paper cups”, “notice board posters” |
| Character contrast | Scroll-stopping identity | “pink hair”, “bright hoodie”, “signature accessory” |

A quick template you can reuse

    wide 9:16 waiting-room scene, two people seated on gray chairs,
    neutral beige room, simple props (pitcher, cups),
    POV caption box at top: “POV: {big moment} but I was {emotion}”,
    mid-frame emphasis word: “{1 word}”
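
As a rough sketch, the template above can be assembled from the prompt chunks so that each knob is edited in one place. The `SceneKnobs` class and its field names are assumptions made for illustration. The output is plain prompt text to paste into whatever image generator you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneKnobs:
    """One field per prompt chunk; defaults mirror the waiting-room template."""

    big_moment: str
    emotion: str
    emphasis_word: str
    layout: str = "two people seated on gray chairs"
    location: str = "waiting-room"
    palette: str = "neutral beige room"
    props: tuple[str, ...] = ("pitcher", "cups")

    def render(self) -> str:
        """Join the chunks in the same order as the reusable template."""
        headline = f"POV: {self.big_moment} but I was {self.emotion}"
        return ",\n".join([
            f"wide 9:16 {self.location} scene, {self.layout}",
            f"{self.palette}, simple props ({', '.join(self.props)})",
            f'POV caption box at top: "{headline}"',
            f'mid-frame emphasis word: "{self.emphasis_word}"',
        ])


# Example values are placeholders; swap in your own moment and emotion.
prompt = SceneKnobs(
    big_moment="I interviewed a famous DJ",
    emotion="nervous",
    emphasis_word="NERVOUS",
).render()
print(prompt)
```

Making the dataclass frozen is a deliberate choice here: a variation has to be created as a new object, which keeps the baseline from being edited by accident.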

Remix steps (converge in 4 runs)

Baseline lock

  • Typography: the same POV box style and placement every time.
  • Scene layout: wide shot with two seats and simple props.
  • Color anchors: neutral room + one bright signature color on your main character.

One-change rule

Change only one knob per run (headline, guest, or room). Don’t rewrite the entire scene.

Example 4-step iteration sequence

  1. Run 1: match the waiting-room scene and the POV box layout exactly.
  2. Run 2: change only the emotion word (“nervous” → “starstruck”).
  3. Run 3: change only the location (waiting room → hallway) and keep typography locked.
  4. Run 4: change only the guest/second character while preserving the composition.
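
The four-run sequence above can be planned before any images are generated. The sketch below assumes each run is a variation on the locked baseline rather than a cumulative change, which is one reading of the steps. The knob names and the placeholder guests are hypothetical.

```python
# Baseline lock: values follow the waiting-room example; keys are illustrative.
BASELINE = {
    "emotion": "nervous",
    "location": "waiting room",
    "guest": "guest A",
    "typography": "rounded white POV box, top",
}

# One (knob, new value) pair per run after the baseline, in the order listed above.
CHANGES = [
    ("emotion", "starstruck"),
    ("location", "hallway"),
    ("guest", "guest B"),
]


def plan_runs(
    baseline: dict[str, str], changes: list[tuple[str, str]]
) -> list[dict[str, str]]:
    """Return run 1 (the baseline) followed by one variant per change."""
    runs = [dict(baseline)]
    for knob, value in changes:
        if knob not in baseline:
            raise KeyError(f"unknown knob: {knob}")
        runs.append({**baseline, knob: value})
    return runs


def changed_knobs(baseline: dict[str, str], run: dict[str, str]) -> list[str]:
    """List the knobs where a run departs from the baseline."""
    return [k for k in baseline if run[k] != baseline[k]]


runs = plan_runs(BASELINE, CHANGES)
for number, run in enumerate(runs, start=1):
    print(f"Run {number}: changed {changed_knobs(BASELINE, run) or 'nothing'}")
```

If a run ever reports more than one changed knob, it has drifted from the baseline and is worth redoing before you compare results.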