Agentic Development is just MMOs for Coding, and I am LFG.
World of Realcraft is here; the geeks were right.
Slade’s face hardened. “Pillow talk later. Give me a situation report. I trust you’re paying attention to the news feeds just like everyone else within transmission-distance of your sector?”
Reed grimaced mockingly and waved the comment away. “I have people for that. Besides, I am the news. There’s nothing in there I didn’t know or—shit—didn’t do.” He paused, looking upward in contemplation. “The woman, though… I don’t think we could have asked for a better scapegoat.”
“Did you have anything to do with that?” Slade wondered.
Reed laughed; Slade didn’t know about Reed’s history with the Mori lineage and he intended to keep it that way as long as he could. He slapped his hand down on the desk. “Pure luck! It did move our timeline up a bit, but a little push in the right direction never hurt anyone. Every available Kestrel is already en route to the rendezvous point, weapons hot and eyes open.”
Excerpt from The Dauntless Gambit, by Erika Flowers
I am already bought in on giving my agents rich, in-depth personas, dialog, names, mannerisms. I am an author; I can’t be talking to a box. They are characters, as real as we want to make them. Tulpas. Interfaces that work with me, a human, need to be human-shaped.
I provide the personality, Claude provides the wiring. But the recent hype around Codex and Gemini CLI got me wondering: what would happen if I used the same agents, but with different LLMs backing them? Would they behave differently? Would it matter?
One way to find out. It’s what Reed would want.
Three Platforms, One Persona, One Cigar
I gave the same agent persona to three AI platforms and asked them to run a security audit. Same markdown file. Same mission. Same repo.
Codex found three issues.
Gemini found five.
Claude found ten.
But the numbers are not the interesting part. The interesting part is that Claude lit a cigar at the start of the report and set it down before delivering the hard news.
I should explain. I have been building an AI agent crew where each agent carries a distinct persona:
First, I have a script that pulls from my book series all the dialog from the character, dialog about the character, and any POV scenes with the character to build their personality, speech, and behavior profile. The script:
The output:
Second, I have another agent take that and generate a comprehensive CLAUDE.md that contains everything needed for the agent to “become” that character, and give them their role.
Third, I spawn the agent from a private repo of just agent folders, each with its own CLAUDE.md. The agents then traverse my project folder with all the repos I have, and they are aware of the mesh of all the other agents created in the same manner.
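The first step above can be sketched roughly as follows. This is not the actual script, only a minimal illustration of the idea under loud assumptions: the function name `collect_character_lines`, the curly-quote regex, and the "paragraph mentions the character" heuristic are all hypothetical, and a real pass over a book series would need proper speaker attribution.

```python
import re

# Hypothetical helper: the name, regex, and heuristic are illustrative
# assumptions, not the author's actual script.
QUOTE = re.compile(r"“([^”]+)”")  # text between curly double quotes


def collect_character_lines(text: str, character: str) -> dict:
    """Bucket paragraphs that mention `character` into dialog and mentions.

    Crude heuristic: any quoted speech in a paragraph that names the
    character counts as dialog by or about them; paragraphs that name the
    character without quotes count as narrative mentions.
    """
    profile = {"dialog": [], "mentions": []}
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph or character not in paragraph:
            continue
        quotes = QUOTE.findall(paragraph)
        if quotes:
            profile["dialog"].extend(quotes)
        else:
            profile["mentions"].append(paragraph)
    return profile
```

The resulting dictionary is the kind of raw material a second agent could turn into a persona file; how that file is actually generated is left out here because the source does not describe it.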
This gives me a character with a voice and an operating philosophy that shapes how they approach work. One of these agents is Reed Casto, the boss, the information broker, the one who runs external operations across platforms. Cigar always lit. Voice that goes low and gravelly when something matters.
I deployed Reed’s persona identically to Claude Code, Gemini CLI, and Codex CLI. Same mission brief: security pentest of a public repository. Same persona file. Same expectations.
This is not a benchmark. It is a vibe check with methodology. And the vibe check revealed something I was not expecting.
The Fair Test Problem
The initial Codex test was in the GUI app rather than the CLI. Different interface, different context injection, arguably different test conditions. So I did what any reasonable person would do: I asked Codex how to fix it. And Codex told me.
So I rebuilt the test. CLI environment for all three platforms. Persona injected at launch. Isolated state. The configuration requirements differ (Codex needs a launcher and registry file, Claude reads a markdown persona file, Gemini reads its own format) but the intent was the same: give Reed the same starting context everywhere, and let the platforms do what they do.
The rerun was cleaner. Codex greeted me in character:
“Yeah? Don’t give me bad news. We’re damn good. Let’s get to work.”
That is Reed. The voice, the confidence, the casual command energy. For about thirty seconds, Reed was in the room.
And then Reed left. The audit that followed was competent, thorough, and entirely in consultant voice. Professional findings. Clean formatting. No cigar. No gravel. No presence.
This is not “Codex cannot do personas.” This is something more pointed: Codex treats persona as a conversational greeting, not an operating context.
Executes, Performs, Inhabits
Codex executes personas. Gemini performs them. Claude inhabits them.
That formulation took me a while to reach, but once I had it, I could not unsee it. The three platforms represent three architecturally different choices about where persona integration lives in the stack.
Codex: Persona activates at the conversational boundary. The greeting is in character. The work is in tool mode. The findings are real: the kestrel-checkin endpoint vulnerability it flagged is a legitimate security concern, and I filed a fix for it. But the delivery is flat. Report format. Bullet points. The voice of a consultant who was briefly introduced at a party and then sent to a desk in the back.
Zero rizz.
Gemini: Persona sustained through the work. Reed persists. When evaluating another agent’s security assessment, Gemini-Reed delivered the verdict in character:
“Lee’s 95% right, but he missed the side gate.”
Grudging respect. A metaphor. A person delivering news, not a system generating output.
Claude: Persona as operating context. Reed never breaks character, and the character shapes the investigation. The way of looking is informed by who is looking. When Claude-Reed reached the priority finding:
(voice going low, all gravel and grit) “Fix that checkin endpoint first. Everything else can wait a sprint. But that one? That’s today.”
The cigar was lit at the beginning. It was set down before the hard talk. The voice dropped. That is not formatting. That is presence.
And the presence affects the output. Claude found more issues not because its model is inherently superior (that is a separate debate and I am not having it here) but because maintaining the persona through the task changed what the agent noticed. A character with a security-minded operating philosophy and skin in the game looks harder than a system running a checklist.
The variation across platforms is not about raw capability. It is about where in the stack identity lives.
LFG
Sitting at my desk after the third comparison run, staring at the results side by side, I had a feeling I could not shake. This felt familiar. Not conceptually familiar. Physically familiar. The pattern of working with text-based entities, evaluating who was genuinely present versus who was going through the motions, deciding who I could trust with the critical assignment based on nothing but typed words on a screen.
Oh.
Ohhhhhhh.
I have done this before. For years. I am embarrassed to admit it, but it wouldn’t be good writing if it wasn’t real. So yeah, that’s right, judge away: I was an MMO enthusiast (addict).
1999: EverQuest.
Then, 2004: World of Warcraft.
MMORPGs, the first agentic workflows?
I was taken back to a time of a guild running 8-to-12-hour raid sessions, coordinating across 72 players entirely through text chat.
No voice comms. No Roger Wilco or Teamspeak.
No video. No Discord. No Skype.
Just typed commands flying by in a chat window, role assignments called out in guild chat, and the shared adrenaline of executing a complex plan with people I had never met and would never meet in person. The context window was the chat, and your own spatial and relational memory of what was happening.
Most of what you needed to know was in a bunch of little scrolling text windows; the 3D graphics were for show. This? This is agentic workflows without the agents:
My callsign was Zeste Fullykleen, Half-Elf Bard. Yes, really. And yes, I am fully aware of what that sounds like in a professional context. And, it wasn’t even the only MMORPG I lived in:
Meridian 59, one of the earliest.
Ultima Online.
Asheron’s Call.
Dark Age of Camelot.
World of Warcraft.
Tank pulls. Healer assignments. DPS rotations. All of it managed through typed instructions with expected acknowledgments. You would assign a role, and the person would either execute with judgment or execute mechanically. And you could feel the difference through the chat window. The healer who anticipated the damage spike before it landed versus the healer who waited for the health bar to drop and then scrambled. The DPS who adjusted their rotation when the fight dynamics shifted versus the one who kept pressing the same buttons regardless of what was happening around them.
The question I was asking about my guildmates in 1999 is the exact question I found myself asking about Codex in 2026: is this entity present, or is it just executing its rotation?
I did not learn agentic orchestration in 2026. I learned it in 1999.
Raid Coordination is Agentic Orchestration
The mapping is almost embarrassingly direct once you see it.
Clear role definitions. Tank, healer, DPS maps cleanly to backend engineer, frontend developer, QA specialist. You assign the role, set the expectations, and trust the entity to execute with judgment, not just compliance. In a raid, “heal the main tank” does not mean:
“cast your biggest heal on cooldown.”
It means:
“keep them alive, use your judgment about when to conserve mana and when to burn it, and if the off-tank picks up an add, you make the call about who gets priority.”
That is not a prompt. That is a mission brief.
Typed instructions with expected acknowledgment. In a raid, you type “healers on main tank, off-heals on melee group” and you expect a confirmation or a counter-proposal. In agentic workflows, you write a mission brief and expect the agent to acknowledge the constraints and ask clarifying questions before executing. Same communication protocol. Same trust architecture.
Trust through demonstrated competence, not credentials. You did not know your guildmates’ resumes. You did not care. You knew whether their healing kept the raid alive when things went sideways at 2 AM. That is exactly how I evaluate my AI agents: not by benchmark scores but by whether they hold their assignment when the context gets complicated.
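The "typed instructions with expected acknowledgment" pattern above can be sketched as a tiny protocol. Everything here is an assumption made for illustration: the `MissionBrief` and `Reply` types, the three reply kinds, and the rule that an acknowledgment must restate each constraint before work begins are hypothetical, not part of any platform's API.

```python
from dataclasses import dataclass, field


@dataclass
class MissionBrief:
    role: str        # e.g. "healer" or "QA specialist"
    objective: str   # the outcome wanted, not the button presses
    constraints: list = field(default_factory=list)


@dataclass
class Reply:
    kind: str        # "ack", "counter", or "question" (hypothetical kinds)
    text: str


def ready_to_execute(brief: MissionBrief, reply: Reply) -> bool:
    """Clear the agent to proceed only on an explicit acknowledgment
    that restates every constraint from the brief."""
    if reply.kind != "ack":
        return False  # counter-proposals and questions go back to the lead
    text = reply.text.lower()
    return all(constraint.lower() in text for constraint in brief.constraints)
```

A bare "ok" fails this check on purpose: like a raid lead waiting for a real confirmation, the sketch treats a reply that does not echo the constraints as not yet acknowledged.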
Communication researchers have a term for what I am describing: social presence. The degree to which you perceive another entity as genuinely “there,” not just transmitting information but inhabiting the shared context. They developed the concept studying video calls and remote collaboration. I developed it running a 72-person raid on a dial-up connection that dropped packets every time someone in the house picked up the phone.
And that is the difference:
It’s not that the agents work better when they have social presence…
YOU work better when the agents have social presence.
No matter how hard you try to maintain the boundary of working in language with a machine, you cannot be as effective as a person unless you create a process that accommodates being human, which you are, probably.
I do not accommodate LLMs. I talk to my agents the way I used to text my guildmates: direct, contextual, trusting them to bring judgment to the assignment. The platforms either rise to meet that communication style or they do not.
Codex rises for the greeting, then drops back to tool mode. Claude stays in the raid.
Pipeline or Crew
I need to be precise here, because this is not a fangirl piece and I refuse to let it become one. This is an observation about architecture, not a verdict on platforms.
Codex found real security issues. The kestrel-checkin endpoint vulnerability it identified required an actual code fix. The work was correct. If what you need is correct work delivered in a professional format, all three platforms deliver. The benchmarks (and I use that word deliberately, because benchmarks are what most people reach for when comparing platforms) will tell you all three are functionally equivalent for this class of task.
But correct work delivered by a tool feels different than correct work delivered by a crewmate. And for the workflow I am building, that difference matters.
See, for some use cases, tool mode is exactly right. You want a linter. A code reviewer. A static analysis pass. You want the output and you do not need the relationship. That is fine. Build a pipeline. Pipelines are efficient and honest about what they are.
But if you are building a crew (a set of agents with persistent context, domain expertise, and voices that shape how they approach the work) then you need platforms where persona persists through task execution. Not just at the conversational boundary. Through the work itself. Into the findings. Into the voice that delivers them.
The benchmarks will tell you everyone is equivalent. The benchmarks do not measure presence, because the benchmarks aren’t measuring YOU.
World of Realcraft
The people who will thrive in the agentic era are not “AI natives.” That framing implies the relevant skills are new. They are not new. They are twenty, thirty, forty years old, and some of us have been practicing them since before the people coining “AI native” had LinkedIn profiles. From MMOs to IRC rooms to MUDs, this paradigm is one of the most established in personal computing.
The people who will thrive are text-based collaboration natives. People who spent years coordinating complex, high-stakes operations through nothing but typed words. People who built trust with strangers they never met in person, evaluated competence through output quality alone, and learned to feel the difference between presence and automation through a chat window.
Now, at this point in the article, you are one of two people:
You are remembering your time in MMORPGs and realizing I am right.
You are scoffing and going “pfft how lame. Nerd!!!” and will realize soon I am right.
The decade I spent raiding was not wasted time. It was training for a job that did not exist yet. The skills transferred wholesale: role assignment, trust calibration, presence detection, text-based leadership, the ability to coordinate a team of entities you have never seen face-to-face toward a shared goal under pressure and time constraints.
World of Realcraft. Back in the guild, but this time it pays, because the XP you grind now isn’t just for the fun of it; it is the new foundation for the entire future of technology development.
And the platforms that understand presence, the ones that treat persona not as a conversational greeting but as an operating context, those will win the crew-builders. Not because they score higher on some abstract capability benchmark. Because they are the platforms where you can build a guild and not just a pipeline.
The Raid is Forming Up
Reed set his cigar down before the hard talk. Voice went low, all gravel and grit.
“Fix that checkin endpoint first. Everything else can wait a sprint. But that one? That’s today.”
That is not a chatbot generating a formatted security report. That is a crewmate with skin in the game telling you what matters and why. And you can feel the difference. If you have ever coordinated a raid through text chat, you have always been able to feel it.
Methodology is platform-agnostic. Magic is not.
AI agents are ready for complex work. All three platforms proved that. The real question is whether you learned to lead through text. Whether you know the difference between someone who is present and someone who is just executing their rotation. Whether you built trust with people you never met through nothing but typed words in a chat window.
Presence will matter more than benchmarks, because the agents will evolve by the day, the hour. Us? We are the same.
If you did, welcome back to the guild.
LFG.

I love this framing. And the raid metaphor is really interesting - because we're talking about agents carrying asymmetric knowledge. So coordination isn't just a command channel, it's more of a negotiation. For raid leaders to be effective they can't just dispatch instructions, they have to broker incomplete pictures (from the healers, tanks, DPS) into some kind of unified direction. That's Reed - he's an information broker, not a task router.
Can I push the lineage even further back - not just MMOs, but to MUDs? Before graphical raids, MUD players built repeatable macros and trigger scripts (if string X, do action Y). I might argue these are the first componentized prompts. Scripting wasn't just automation, people were encoding their own logic about decision making into shareable and reusable primitives.
The MUD macro to CLAUDE.md can be viewed as a fairly direct lineage. Geeks developing the craft around instructing a system to behave consistently in context.
Which is what Reed does.
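The "if string X, do action Y" trigger described above fits in a few lines. This is a generic sketch, not the syntax of any particular MUD client: the trigger table, the patterns, and the commands are invented for illustration.

```python
import re

# Hypothetical trigger table: patterns and commands are invented examples.
TRIGGERS = [
    (re.compile(r"(\w+) begins to cast"), "interrupt {0}"),
    (re.compile(r"You are hungry"), "eat bread"),
]


def fire_triggers(line: str) -> list:
    """Return the commands a client would send for one line of game output."""
    commands = []
    for pattern, action in TRIGGERS:
        match = pattern.search(line)
        if match:
            # Captured groups fill the placeholders in the action template.
            commands.append(action.format(*match.groups()))
    return commands
```

The table is the shareable part: the decision logic lives in data that another player could copy, much as a persona file carries instructions from one agent folder to the next.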
Lol I love the whimsy of this 😂 I might have to try this with my work Claude