How to use Midjourney for writing

writing · beginner · 7 min read · 6/4/2026

I Spent 40 Hours Using Midjourney to Write — Here’s What Actually Works

I’m a technical writer who hit a wall. My client wanted 15 blog posts about “enterprise cloud migration,” and every time I sat down to write, my brain turned to oatmeal. I’d stare at a blinking cursor, type three sentences, delete them, repeat. The deadline was looming, and I was desperate.

That’s when I tried something stupid: using Midjourney — an AI image generator — to write. Not to illustrate, but to generate text itself. I know, it sounds like using a hammer to screw in a lightbulb. But after 40 hours of testing, I found three workflows that actually produce usable prose. Here’s the exact process, with every mistake I made along the way.

Why Midjourney for Writing? (And Why You Should Be Skeptical)

Midjourney generates images, not prose. It doesn’t "understand" grammar or narrative structure. But here’s the trick: like other text-to-image tools, it breaks your prompt into tokens, much as a language model does, and renders an image from them. In effect, the image is a visual interpretation of your words. If you prompt it carefully, that interpretation can suggest scenes, emotions, and even the seeds of dialogue.

The real power? Midjourney excels at visual storytelling. It can generate specific, sensory details that most text-based AI tools miss. I tested ChatGPT, Claude, and Midjourney on the same prompt: "Describe a coffee shop at 6 AM." ChatGPT gave me a generic paragraph. Claude gave me a poetic one. Midjourney gave me an image of a barista with a cracked ceramic mug, a single fly trapped in a windowpane, and a neon sign flickering "ESPRESSO" with the 'O' dead. That image told a story.

What You’ll Need (And What Will Break)

  • Midjourney subscription ($10–$60/month). The $30 plan is the sweet spot for writing.
  • A text-based AI tool (ChatGPT, Claude, or even a free one like Perplexity). You’ll use this to refine Midjourney’s output.
  • A notebook or text file. You’ll be copying a lot of raw descriptions.
  • Patience. The first 10 attempts will fail.

What breaks: Midjourney cannot write coherent paragraphs. It cannot follow plot logic. It cannot generate dialogue that makes sense. If you ask it for a "blog post about SEO," you’ll get an image of a spider web made of keywords. Useless.

Workflow 1: The "Scene Extraction" Method (Fastest, Most Reliable)

This is for when you need specific, sensory details for a scene. I used this to write a travel article about a Moroccan market.

Step 1: Craft a visual prompt

Don’t write "market in Morocco." Write:

a narrow alley in Marrakech souk, golden hour light, hanging lanterns, a merchant arranging saffron piles, dust motes floating, a cat watching from a windowsill, shallow depth of field, photorealistic --ar 16:9 --v 6
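The prompt above follows a pattern: a subject, a handful of concrete details, a few style cues, then parameters at the very end. If you keep prompts in a script or notes file, that pattern can be sketched in a few lines of Python. The function name `build_prompt` and its arguments are my own hypothetical names, not part of Midjourney. The result is just a string to paste into Midjourney.

```python
def build_prompt(subject, details, style, aspect_ratio="16:9", version=None):
    """Join the descriptive parts with commas, then append parameters at the end."""
    parts = [subject, *details, *style]
    text = ", ".join(p.strip() for p in parts if p.strip())
    params = [f"--ar {aspect_ratio}"]
    if version is not None:
        params.append(f"--v {version}")
    return f"{text} {' '.join(params)}"


prompt = build_prompt(
    subject="a narrow alley in Marrakech souk",
    details=[
        "golden hour light",
        "hanging lanterns",
        "a merchant arranging saffron piles",
        "dust motes floating",
        "a cat watching from a windowsill",
    ],
    style=["shallow depth of field", "photorealistic"],
    version="6",
)
print(prompt)
```

Keeping details in a list makes it easy to swap one out (the cat for a bicycle, say) and regenerate without retyping the whole prompt.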

Step 2: Generate 4 images

Midjourney gives you 4 variations. Pick the one that feels most "alive."

Step 3: Describe what you see — literally

Open a text file. Write down everything you see in the image. Don’t interpret. Just list:

  • "The light is orange and hits the left side of the alley."
  • "The merchant has a blue robe with gold embroidery."
  • "There’s a crack in the wall shaped like Africa."
  • "The saffron is piled in a pyramid on a wooden crate."
  • "The cat is orange, sitting on a stack of rugs."

Step 4: Turn the list into prose

Now take that list and write a paragraph. Don’t edit yet. Just connect the dots:

The alley caught the last of the golden hour. A merchant in a blue robe embroidered with gold thread arranged saffron into perfect pyramids on a splintered crate. The light hit a crack in the wall that looked like Africa. Above, an orange cat sat on a stack of rugs, watching the dust motes swirl.

Why this works: You’re not imagining — you’re describing what exists. The image forces you to notice details you’d skip in your head. The cat. The crack. The pyramid of saffron. Those details make writing feel real.

Real flaw: This only works for static scenes. If you need action, dialogue, or plot, this fails. I tried to extract a "car chase" scene and got four blurry images of tires.

Workflow 2: The "Mood Board" Method (For Tone and Atmosphere)

This is for when you know the feeling you want but can’t find the words. I used this to write the opening of a horror story.

Step 1: Generate 20+ images of the same mood

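Prompt variations (Midjourney doesn’t carry context from one prompt to the next, so each variation repeats the full base prompt with one added detail):

abandoned hospital hallway, flickering fluorescent lights, peeling blue paint, a wheelchair at the end, cold atmosphere, cinematic lighting, horror --ar 16:9
abandoned hospital hallway, flickering fluorescent lights, peeling blue paint, a wheelchair at the end, cold atmosphere, cinematic lighting, horror, a single red balloon floating near the ceiling --ar 16:9
abandoned hospital hallway, flickering fluorescent lights, peeling blue paint, a wheelchair at the end, cold atmosphere, cinematic lighting, horror, a puddle of water reflecting the lights --ar 16:9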

Generate 4–5 images for each variation, and keep adding variations until you have 20 or more images. You’re building a visual library.
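Each prompt has to stand on its own, so a variation is really the full base prompt plus one extra detail, with parameters such as `--ar` kept at the end. Here is a rough Python sketch of that bookkeeping. `add_detail`, `BASE`, and `EXTRAS` are hypothetical names of mine, and the script only prints strings for you to paste in.

```python
def add_detail(base_prompt, extra_detail):
    """Insert an extra detail before the first ' --' parameter (or at the end if none)."""
    description, sep, params = base_prompt.partition(" --")
    varied = f"{description.rstrip(', ')}, {extra_detail}"
    return f"{varied}{sep}{params}"


BASE = (
    "abandoned hospital hallway, flickering fluorescent lights, peeling blue paint, "
    "a wheelchair at the end, cold atmosphere, cinematic lighting, horror --ar 16:9"
)
EXTRAS = [
    "a single red balloon floating near the ceiling",
    "a puddle of water reflecting the lights",
]

# The unmodified base prompt comes first, followed by one prompt per extra detail.
variations = [BASE] + [add_detail(BASE, extra) for extra in EXTRAS]
for v in variations:
    print(v)
```

Adding a line to `EXTRAS` gives you another variation, which is the quickest way to grow the library past 20 images.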

Step 2: Extract the emotional vocabulary

Look at the images and write down emotions they trigger. Not descriptions — feelings:

  • "Unease from the flickering lights"
  • "Loneliness from the empty wheelchair"
  • "Dread from the red balloon"
  • "Disorientation from the peeling paint"
  • "Coldness from the blue tint"

Step 3: Write a paragraph using only those emotions

The lights hummed in a rhythm that felt wrong — too fast, like a panicked heartbeat. The blue paint peeled in long strips, each one a question mark. At the end of the hall, a wheelchair faced the wall, as if waiting for someone who would never come. A red balloon drifted near the ceiling, swaying with no wind.

Why this works: Midjourney is terrible at plot, but great at mood. It captured lighting, color, and composition better than any text-based tool I tried. By extracting the emotional signals from the images, you bypass your own writer’s block.

Real flaw: This method produces dense, overwritten prose. Every sentence feels like a metaphor. Plan to cut about 40% of the words; for the horror opening, I deleted closer to half to make it readable.

Workflow 3: The "Reverse Engineer" Method (For Dialogue and Character)

This is the hardest method but the most rewarding. It requires interpreting what Midjourney generates, then extrapolating.

Step 1: Generate a character portrait

Prompt:

a woman in her 40s sitting at a diner counter, tired eyes, a coffee cup half-empty, a cigarette burning in an ashtray, neon sign outside says "OPEN", 1950s style, film noir lighting, portrait --ar 2:3

Step 2: Ask yourself questions about the image

Don’t describe — interrogate:

  • Why is she tired? (She worked a double shift. Her son is sick. She’s waiting for someone who didn’t show.)
  • Why is the coffee half-empty? (She’s been there for hours. She’s nursing it. She doesn’t want to leave.)
  • What’s the cigarette doing? (Smoke curls upward. She hasn’t smoked it in minutes. She’s distracted.)
  • What does the neon sign mean to her? (It’s the only light. It’s a promise. It’s a lie.)

Step 3: Write a monologue from her perspective

The coffee was cold. I’d been stirring it for twenty minutes, watching the spoon make circles in the dark. The cigarette had burned down to the filter without me noticing. Outside, the OPEN sign flickered, the 'E' buzzing like a trapped fly. I thought about leaving. I thought about staying. I did neither.

Why this works: You’re not imagining a character from scratch — you’re discovering them through the image. The visual constraints force you to be specific. The coffee isn’t just coffee; it’s cold coffee. The sign isn’t just a sign; it’s buzzing.

Real flaw: This only works for single characters in static scenes. I tried to generate two characters having an argument and got two people standing awkwardly, not looking at each other. In my tests, Midjourney could not show convincing interaction.

What I Learned After 40 Hours (The Hard Truth)

  1. Midjourney is not a writer. It’s a visual prompt generator for writers. You still write the actual words. It just gives you the raw material.

  2. The best results come from the worst images. The blurry, weird, distorted images — the ones with extra fingers or melting faces — often produce the most interesting descriptions. Embrace the uncanny.

  3. You must use a text-based AI as a middleman. Midjourney outputs images, not text, so I paste my own list of details from the image into ChatGPT and say: "Turn this list of details into a paragraph with a melancholic tone. Remove any references to the image itself." This saves hours.

  4. The cost is real. $30/month for Midjourney plus $20 for ChatGPT is $50/month. That’s a lot for a writing tool. But if you’re stuck, it’s cheaper than a writing coach.

  5. It’s addictive. I spent 3 hours generating images of "a library after an earthquake" when I should have been writing. Set a timer.

Your Next Step (Don’t Read, Do)

Open Midjourney. Type this exact prompt:

a desk with a typewriter, a half-eaten sandwich, a coffee mug with lipstick stain, a window showing rain, a note pinned to a corkboard, soft lamp light, nostalgic, warm colors --ar 16:9

Generate the image. Now write a 200-word scene based on only what you see. No backstory. No plot. Just describe the image as if you’re a detective documenting a crime scene.

Then cut 50 words. That’s your first usable paragraph.
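If you want to check the numbers rather than eyeball them, a whitespace-based word count is enough. This is a minimal Python sketch. `word_count` and `words_over` are hypothetical helper names, and splitting on whitespace will differ slightly from what a word processor reports.

```python
def word_count(text):
    """Count words by splitting on whitespace."""
    return len(text.split())


def words_over(text, target):
    """Return how many words must still be cut to reach the target length."""
    return max(0, word_count(text) - target)


# Stand-in for a 200-word draft; replace with your own scene.
draft = " ".join(["word"] * 200)
print(word_count(draft))       # 200
print(words_over(draft, 150))  # 50 words left to cut
```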

The rest of the article? You’ll figure it out. You have the image now.
