From War Rooms to Dojos: How AI Is Reinventing the Testing Session
Remember the testing war room?
You've been there. Everyone piles into a conference room — or a Zoom with 14 cameras — armed with a spreadsheet of test cases, a staging environment that may or may not be working, and a Jira board that's about to get very busy. A product manager reads test cases aloud. Engineers toggle feature flags. QA clicks through flows and pastes screenshots into Slack. Someone inevitably says "can you repro that?" A bug gets filed. The cycle repeats. Three hours later, everyone is exhausted and you've covered maybe 60% of what you planned.
This worked. For a long time, it was the best we had.
But AI has changed what's possible — and the teams that figure this out first are going to ship software in a fundamentally different way.
What the Old Process Actually Cost
Before we talk about the new world, let's be honest about what the old one demanded.
Getting a staging environment fully ready — the right data, the right accounts, all dependencies behaving — could take hours or days. One broken seed script and the whole session was delayed. Then someone had to know which feature flags to flip, for which users, in which environments. Miss one and you're testing the wrong thing entirely.
Then there was the coordination tax. Pulling engineers, product, design, and QA into the same room across time zones is expensive: every hour of the session costs six to ten people-hours. The test cases themselves were only as good as whoever wrote them, usually drafted from gut feel and memory. And execution was manual, which meant people got tired, missed edge cases, and skipped the boring-but-important checks.
And when bugs were found? The report was only as good as the person writing it. Repro steps got lost. Context got lost. Half the bugs came back as "can't reproduce."
The output was a snapshot in time. Run it once at the end of the sprint and hope you caught the important stuff.
The New Model: The Engineering Dojo
Here's the shift in thinking: instead of a testing event, what if testing were a continuous practice?
We call this a Dojo, borrowing from martial arts, where mastery comes from deliberate, repeated practice rather than a single performance. Engineers run dojos regularly, not just before release. Compared with a war room, a dojo is faster and more thorough, and bugs get fixed in the same session in which they're found.
AI makes this possible. Here's how it works in practice.
Step 1: AI Writes the Test Cases
The hardest part of testing isn't running tests — it's knowing what to test.
With a tool like Claude (via Claude Code for engineers, or Claude Cowork for less technical teammates), you can point at a feature, a Jira ticket, or a description of what changed and ask: What should we be testing here? What are the edge cases? What could go wrong?
Point it at the code and the ticket, and the AI can read both and trace the user flows they touch. It will often surface test cases that a tired human at 4pm on a Friday would miss: the empty state, the expired token, the user with no payment method on file, the browser that doesn't support the latest API.
This isn't magic. It's pattern matching at scale, informed by millions of codebases and bug reports. The output isn't perfect, but it's a dramatically better starting point than a blank spreadsheet — and it takes seconds, not hours.
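To get that starting point into the test suite rather than a spreadsheet, you can ask the assistant for structured output and turn each suggestion into a skipped test stub. The sketch below assumes you prompted for a JSON array in a `SuggestedCase` shape. That shape, the function names, and the sample reply are hypothetical and are not part of any Claude API. `test.fixme` is Playwright's marker for a test that needs work: it is listed in reports but not run, so the stubs document gaps without failing the build.

```typescript
// Hypothetical reply shape we asked the assistant for (an assumption, not a Claude API).
interface SuggestedCase {
  title: string;
  kind: 'happy-path' | 'edge-case' | 'error';
  steps: string[];
  expected: string;
}

const KINDS = ['happy-path', 'edge-case', 'error'];

// Validate the reply before trusting it: model output can be malformed.
function parseSuggestions(json: string): SuggestedCase[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) throw new Error('expected a JSON array of test cases');
  return data.map((c, i) => {
    const ok =
      typeof c?.title === 'string' &&
      KINDS.includes(c?.kind) &&
      Array.isArray(c?.steps) &&
      c.steps.every((s: unknown) => typeof s === 'string') &&
      typeof c?.expected === 'string';
    if (!ok) throw new Error(`suggestion ${i} does not match SuggestedCase`);
    return c as SuggestedCase;
  });
}

// Emit one Playwright test.fixme stub per suggestion. Each stub stays skipped
// until an engineer writes the body, so gaps are visible without failing CI.
function toSpecStubs(cases: SuggestedCase[]): string {
  const oneLine = (s: string) => s.replace(/\s+/g, ' ').trim();
  const out = ["import { test } from '@playwright/test';", ''];
  for (const c of cases) {
    // JSON.stringify yields a safely quoted JavaScript string literal.
    out.push(`test.fixme(${JSON.stringify(`[${c.kind}] ${oneLine(c.title)}`)}, async () => {`);
    c.steps.forEach((step, n) => out.push(`  // ${n + 1}. ${oneLine(step)}`));
    out.push(`  // Expect: ${oneLine(c.expected)}`, '});', '');
  }
  return out.join('\n');
}

// Example reply for a hypothetical checkout change.
const reply = `[
  {"title": "Checkout with no payment method on file", "kind": "edge-case",
   "steps": ["Sign in as a user with no saved card", "Open the cart", "Click Checkout"],
   "expected": "A prompt to add a payment method, not a crash"}
]`;

console.log(toSpecStubs(parseSuggestions(reply)));
```

A human still reviews the generated stubs and deletes any that aren't worth keeping. The point is a checklist that lives next to the code.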
Step 2: End-to-End Tests Run Automatically — and Keep Running
Once you have test cases, the old world required a human to execute them. The new world lets AI do it.
End-to-end (E2E) testing tools like Playwright or Cypress simulate a real user clicking through a real browser: filling out forms, navigating between pages, checking that the right things happen. AI can write these tests, run them, and report back. Once the tests are committed, your CI pipeline runs them every single time code changes, not just once.
This is regression testing on autopilot.
What used to require a war room can now run on every pull request (and nightly, if you like), before any human looks at the change. By the time the engineer is ready for code review, they already know whether existing flows still work.
The dojo isn't a meeting. It's a pipeline.
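For readers who haven't seen one, here is roughly what an AI-drafted Playwright test looks like. It is a sketch against a hypothetical sign-in page. The `/login` and `/dashboard` routes, the field labels, the button text, the error message, and the test credentials are all assumptions about your app, and `baseURL` is expected to come from `playwright.config.ts`. The locator and assertion calls themselves are standard Playwright Test APIs.

```typescript
// tests/sign-in.spec.ts: routes, labels, and copy below are hypothetical.
import { test, expect } from '@playwright/test';

test('user can sign in and reach the dashboard', async ({ page }) => {
  await page.goto('/login'); // resolved against baseURL from the config

  // Act like a user: find fields by their visible labels, then type and click.
  await page.getByLabel('Email').fill('qa.user@example.com');
  await page.getByLabel('Password').fill('a-valid-test-password');
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Check that the right things happen: the URL changes and the page renders.
  await expect(page).toHaveURL(/\/dashboard/);
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});

test('wrong password shows an error instead of signing in', async ({ page }) => {
  await page.goto('/login');
  await page.getByLabel('Email').fill('qa.user@example.com');
  await page.getByLabel('Password').fill('definitely-wrong');
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Edge case: the user stays on /login and sees an alert (message text is assumed).
  await expect(page).toHaveURL(/\/login/);
  await expect(page.getByRole('alert')).toContainText('Invalid email or password');
});
```

Locally, `npx playwright test` runs the file. In CI, the same command runs on every pull request, and any failed `expect` fails the job.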
Step 2b: Playwright Gives You Visual Proof — Not Just Pass/Fail
Here's something most people outside engineering don't know: Playwright doesn't just tell you whether a test passed or failed. It can show you exactly what happened.
Playwright has built-in support for:
- Screenshots: a pixel-perfect image captured at any point during the test, or automatically on failure
- Video recordings: a full screen recording of the browser session from start to finish
- Traces: a step-by-step timeline you can scrub through, showing every action, the network requests and console output around it, and DOM snapshots before and after each step
This changes the conversation between engineers and the rest of the team.
Instead of a product manager asking "what does it look like when that edge case hits?" — there's a video. Instead of design asking "is the component rendering correctly on mobile?" — there's a screenshot from a simulated iPhone viewport. Instead of QA writing a paragraph trying to describe a flicker they noticed — they attach the trace file and anyone can replay it frame by frame.
These artifacts get generated automatically every time the tests run. They can be attached directly to Jira tickets (especially powerful when combined with the MCP integration described below), shared in Slack, or linked in pull request comments.
For product and design, this is a game changer. You no longer need to be in the room when the test runs. You don't need to ask an engineer to reproduce something. The evidence is right there — timestamped, reproducible, and shareable — every single time.
And when AI writes the fix and opens a pull request, those same visual artifacts can confirm the fix actually worked. Before and after. Side by side.
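Turning these artifacts on takes a few lines in `playwright.config.ts`. A minimal sketch follows. The option names and values are real Playwright settings, while the `./tests` directory, the `BASE_URL` variable, and the project names are placeholders. The second project reuses Playwright's built-in iPhone 13 device descriptor, which is where the simulated iPhone screenshots come from.

```typescript
// playwright.config.ts: paths, env var, and project names are placeholders.
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  // Retry once on CI; that retry is what triggers the 'on-first-retry' trace.
  retries: process.env.CI ? 1 : 0,
  use: {
    baseURL: process.env.BASE_URL ?? 'http://localhost:3000',
    screenshot: 'only-on-failure', // PNG of the page when a test fails
    video: 'retain-on-failure',    // recorded for every test, kept only for failures
    trace: 'on-first-retry',       // trace.zip for the step-by-step viewer
  },
  projects: [
    { name: 'desktop-chrome', use: { ...devices['Desktop Chrome'] } },
    // Emulates the iPhone 13 viewport, user agent, and touch input (runs in WebKit).
    { name: 'iphone', use: { ...devices['iPhone 13'] } },
  ],
  // The HTML report embeds each test's screenshots, videos, and traces.
  reporter: [['html', { open: 'never' }]],
});
```

Setting `screenshot`, `video`, or `trace` to `'on'` keeps artifacts for passing runs too, at the cost of storage. A trace opens in the viewer with `npx playwright show-trace path/to/trace.zip`, and capturing a screenshot at any specific moment takes one line inside a test: `await page.screenshot({ path: 'checkout.png' })`.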
Step 3: AI Files the Bug — With Full Context
When a test fails, someone needs to document it. In the old world, that meant a developer or QA engineer manually writing up what happened — often from memory, often missing key details.
With an MCP (Model Context Protocol) integration — essentially a way for AI to talk directly to tools like Jira — the entire bug-filing process can be automated. When a test fails, the AI:
- Captures exactly what happened (the error, the stack trace, the steps that led there)
- Checks Jira to see if this bug already exists
- Creates a new ticket with full reproduction steps, environment details, a priority estimate, and a link to the original feature ticket
- Attaches the Playwright screenshot or video so anyone can see what broke without needing to reproduce it
No copy-pasting. No "steps to reproduce: see screenshot." A complete, well-structured bug report — with visual evidence — filed the moment the failure happens.
For non-engineers, this is where tools like Claude Cowork shine — you don't need to be in a terminal or know how to read a stack trace. The AI surfaces what broke, explains it in plain language, and handles the Jira paperwork automatically.
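For the engineers wiring this up, the piece worth getting right is deduplication, because a filer that opens a fresh ticket on every rerun quickly turns into noise. The sketch below assumes the MCP server (or any HTTP client) handles authentication and sends the requests. It only builds them. The `TestFailure` shape, the fingerprint-as-label convention, and the `SHOP` project key are hypothetical. The issue body follows the field layout of Jira's REST API v2 create-issue endpoint (`POST /rest/api/2/issue`), and the duplicate check is an ordinary JQL search.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical failure record assembled from the test runner's output.
interface TestFailure {
  testTitle: string;
  specFile: string;
  errorMessage: string;
  steps: string[];        // recorded actions, in order
  environment: string;    // e.g. "staging, WebKit (iPhone 13)"
  artifactUrls: string[]; // uploaded screenshot, video, and trace links
  featureTicket?: string; // ticket the feature was built under, if known
}

// Same spec + test + error => same fingerprint, even when timings or IDs differ.
function fingerprint(f: TestFailure): string {
  const normalized = f.errorMessage.replace(/0x[0-9a-f]+|\d+/gi, 'N').trim();
  return createHash('sha256')
    .update(`${f.specFile}|${f.testTitle}|${normalized}`)
    .digest('hex')
    .slice(0, 12);
}

// JQL that finds an unresolved issue already carrying this fingerprint label.
function duplicateQuery(project: string, f: TestFailure): string {
  return `project = ${project} AND labels = "e2e-${fingerprint(f)}" AND statusCategory != Done`;
}

// Create-issue body. The priority is the model's estimate, passed in by the caller.
// Priority and labels must be on the project's create screen or Jira rejects them.
function buildIssue(project: string, f: TestFailure, priority: 'Highest' | 'High' | 'Medium' | 'Low') {
  const description = [
    `Failing test: ${f.testTitle} (${f.specFile})`,
    `Environment: ${f.environment}`,
    `Feature ticket: ${f.featureTicket ?? 'unknown'}`,
    '',
    'Steps to reproduce:',
    ...f.steps.map((s, i) => `${i + 1}. ${s}`),
    '',
    `Error: ${f.errorMessage}`,
    '',
    'Artifacts:',
    ...f.artifactUrls.map((u) => `- ${u}`),
  ].join('\n');

  return {
    fields: {
      project: { key: project },
      issuetype: { name: 'Bug' },
      summary: `[E2E] ${f.testTitle}`,
      priority: { name: priority },
      labels: ['e2e-failure', `e2e-${fingerprint(f)}`],
      description,
    },
  };
}

// Example failure with made-up values.
const failure: TestFailure = {
  testTitle: 'checkout without a saved card',
  specFile: 'tests/checkout.spec.ts',
  errorMessage: 'Timed out 5000ms waiting for getByRole("button", { name: "Pay" })',
  steps: ['Sign in as a user with no saved card', 'Open the cart', 'Click Checkout'],
  environment: 'staging, Chromium',
  artifactUrls: ['https://ci.example.com/runs/123/video.webm'],
  featureTicket: 'SHOP-42',
};

console.log(duplicateQuery('SHOP', failure));
console.log(JSON.stringify(buildIssue('SHOP', failure, 'High'), null, 2));
```

If the JQL search returns an open issue, the agent can comment on it with the new run's artifacts instead of filing again. Stripping numbers before hashing is a deliberate trade-off. Timeouts and IDs no longer split one bug into many tickets, but two failures that differ only by a number may occasionally be merged.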
Step 4: AI Fixes the Bug
This is the part that still surprises people.
Once a bug is filed and the failure is understood, an AI coding assistant can attempt to fix it. Not always — some bugs require deep human judgment, architectural decisions, or domain expertise the AI doesn't have. But a surprising number are straightforward: a null check that's missing, a condition that handles the wrong case, a type mismatch that only shows up with production data.
For those bugs, the AI can:
- Read the failing test
- Understand what the expected behavior should be
- Write a fix
- Run the tests again to confirm it worked
- Open a pull request for human review
The human is still in the loop — reviewing and approving — but they're reviewing a solution, not starting from scratch. The feedback loop that used to take days (find bug → file → assign → fix → test → close) can now happen in minutes.
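To make "straightforward" concrete, here is the kind of missing null check an assistant can usually fix on its own. Everything in it is hypothetical: the `User` type, the label functions, and the premise that test fixtures always include a saved card while brand-new production accounts don't. The lines at the bottom stand in for the failing test. They reproduce the crash first, then confirm the fix.

```typescript
// Hypothetical domain type: new accounts in production have no paymentMethods yet.
interface User {
  name: string;
  paymentMethods?: { brand: string; last4: string }[];
}

// Before: fine with fixtures that always include a card,
// but throws a TypeError for a user with no payment method on file.
function checkoutLabelBefore(user: User): string {
  const card = user.paymentMethods![0];
  return `Pay with ${card.brand} ending in ${card.last4}`;
}

// After: the kind of small, reviewable fix an assistant can propose.
// Optional chaining covers a missing list; the guard also covers an empty one.
function checkoutLabel(user: User): string {
  const card = user.paymentMethods?.[0];
  if (!card) return 'Add a payment method';
  return `Pay with ${card.brand} ending in ${card.last4}`;
}

// Stand-in for the failing test: reproduce first, then confirm the fix.
const newUser: User = { name: 'New Customer' };
try {
  checkoutLabelBefore(newUser);
} catch (e) {
  console.log('before:', (e as Error).name); // prints "before: TypeError"
}
console.log('after:', checkoutLabel(newUser)); // prints "after: Add a payment method"
```

What lands in review is a small diff plus the regression test. That is much quicker to judge than a bug report that still needs investigating.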
Why "Dojo" Is the Right Mental Model
The dojo framing matters because it changes the culture, not just the tooling.
In a dojo:
- Practice is continuous, not event-driven — you don't wait for the big release to test
- Repetition builds mastery; every new test makes the suite more complete, and every caught bug becomes a regression test that guards against its return
- You learn from failure — a failing test isn't a crisis, it's signal. Good signal, arriving early.
- Everyone participates — testing isn't someone else's job
Test-driven development (TDD) already asks engineers to write tests before writing code. The AI dojo extends that philosophy: AI helps write the tests, runs them constantly, files failures automatically, and suggests fixes. TDD becomes not just a discipline but an accelerated loop that compounds over time.
Before and After
Before:
Sprint ends Friday → Monday: prep staging, enable flags → Tuesday: 3-hour war room with 8 people → Wednesday: bugs filed and triaged → Thursday: fixes written → Friday: manual retest → ship the following week (maybe).
After:
E2E tests run on every PR, automatically → Playwright captures screenshots and video of every run → Dojo session: 1 engineer, 30 minutes, once a week → AI generates new test cases for anything that changed → failures auto-filed to Jira with visual evidence attached → AI proposes fixes for straightforward bugs → human reviews and merges → ship this week.
The war room still has a place for complex, high-stakes releases. But for the day-to-day work of shipping software? The dojo wins.
Where to Start
You don't have to rebuild everything at once. Here's a simple ramp:
- Start with E2E tests for your happiest path — the single most critical user flow. Get that running automatically on every pull request.
- Turn on Playwright's video and screenshot recording — one config change, and every test run produces visual artifacts your whole team can use.
- Add AI-assisted test generation — before your next release, ask your AI coding assistant to suggest what else should be covered based on what changed.
- Connect Jira — set up an MCP integration so test failures are automatically captured and triaged, with screenshots attached, without anyone having to manually file anything.
- Run a dojo: schedule 30 minutes once a week, where one engineer reviews test results, fills gaps, and drives fixes. No war room required.
- Let AI attempt fixes — for straightforward failures, give the AI a shot before assigning to a human. You might be surprised how often it lands.
Each step compounds. The test suite gets more complete. The feedback loop gets shorter. The team ships faster without getting bigger.
The Human Element Doesn't Disappear
None of this removes the need for engineers, product managers, or QA expertise. What it removes is the coordination overhead and the repetitive execution work — the parts that consume time without requiring judgment.
Engineers get to focus on what actually needs them: deciding what matters to test, reviewing AI-proposed fixes, catching the subtle things no automated system would think to check, and making calls about what's a bug versus expected behavior.
Product and design get visibility they've never had before — not a summary of what was tested, but actual recordings of the product in motion, captured automatically, available without scheduling a meeting.
The war room will always have a place for the big moments. But the dojo is where the real work happens now — and it runs whether you're in a meeting or not.