Testing without Testers in the Agentic Coding Age
Agentic coding automates unit and integration testing, but product-layer validation still requires outside-in testing.
Agentic Coding Changes Testing — But It Doesn’t Eliminate It
In January 2025, I wrote about a troubling trend: software teams assembling without anyone in a testing role. The argument from engineering leadership was simple: developers could own quality end-to-end. The reality, as teams discovered, was messier. Bugs in production. Exploratory testing skipped. The right side of Marick's testing matrix was left largely untouched.
That post resonated. What I didn't anticipate was how quickly the conversation would evolve, because now we have agentic coding.
Tools like Claude Code are changing what it means to write software. Developers don't just write code anymore; they direct AI agents that write, refactor, test, and iterate at machine speed. An engineer can describe a feature at a high level and let an AI agent implement it, including its unit tests, in minutes. The obvious question is: Does this finally make testing without testers viable?
My answer, after sitting with the question for a while, is yes, but only if you're honest about what agentic tools cover and what they don't.
What Agentic Coding Actually Changes
Let's be precise about what agentic coding tools do well. Claude Code and similar AI coding agents excel at tasks in the lower half of Marick's Agile Testing Matrix, the developer-facing quadrants. They generate unit tests from function signatures. They write integration tests from API contracts. They refactor code and run tests in a loop until they pass. They are tireless, fast, and consistent.
The latest GenAI models are genuinely transformative. For years, the gap in testing-without-testers teams was time and coverage at the unit and integration level. Developers knew they should write more tests; they just didn't have enough hours. Agentic coding removes that excuse. Claude Code can generate a comprehensive unit test suite alongside feature code in the same session and with the same context.
But the story doesn't stop at unit tests. Claude Code's Auto-Accept mode creates autonomous loops in which the agent writes code, runs the full test suite, and iterates until everything passes, without human intervention. And through its multi-agent capability, a lead agent can coordinate specialized sub-agents working on different parts of a codebase simultaneously. What used to take a sprint can now happen in hours.
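Stripped of everything model-specific, the shape of that autonomous loop is simple. Here is a minimal sketch; the `generate_patch` and `run_tests` callables are placeholders I'm inventing for illustration, not Claude Code's actual interface:

```python
def agentic_loop(generate_patch, run_tests, max_iterations=10):
    """Sketch of an auto-accept loop: propose a patch, run the suite,
    feed failures back as context, repeat until green or out of budget.

    generate_patch(feedback) -- stands in for the model call
    run_tests() -> (passed: bool, output: str) -- e.g. shell out to `pytest -q`
    """
    feedback = None
    for attempt in range(1, max_iterations + 1):
        generate_patch(feedback)       # agent edits the working tree
        passed, output = run_tests()   # full suite, every iteration
        if passed:
            return attempt             # suite is green; stop iterating
        feedback = output              # failing output becomes the next prompt
    raise RuntimeError(f"suite still failing after {max_iterations} attempts")
```

The important property is that the human is outside the loop entirely: the failing test output, not a developer, is what steers each revision.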
Agents Across the Matrix — Including Quadrants 3 and 4
Here's where the picture gets more interesting than most people expect: agentic tools are no longer limited to the left side of Marick's matrix.
Anthropic recently released Claude Code Security, which addresses Quadrant 4, the technology-facing tests that critique the product. Rather than using traditional static analysis that matches code against known vulnerability patterns, Claude Code Security reasons through a codebase the way a human security researcher would: tracing how data moves through the application, understanding how components interact, and identifying context-dependent flaws that rule-based tools miss entirely. In early testing with Claude Opus 4.6, the tool discovered over 500 high-severity vulnerabilities in production open-source projects — bugs that had survived decades of expert review (source: https://www.anthropic.com/news/claude-code-security).
This capability represents a meaningful shift for teams that have historically treated security testing as a separate, periodic exercise. When an agent can scan your codebase for vulnerabilities as part of the same development loop that writes and tests features, security moves from a checkpoint to a continuous activity.
Performance testing is also becoming more accessible through the agentic model. Claude Code can be directed to write load testing scripts, instrument code for performance profiling, and analyze results, all within the same workflow. What previously required specialized tooling and a dedicated engineer is increasingly being orchestrated within a CI/CD pipeline.
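To give a flavor of what "directing the agent to write a load test" produces, here is a minimal sketch of such a script. In a real workflow `call` would be an HTTP request against a staging endpoint; any zero-argument callable works, and the percentile math is deliberately simplistic:

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def load_test(call, requests=200, concurrency=20):
    """Fire `requests` invocations of `call` across `concurrency` worker
    threads and report latency percentiles in milliseconds."""
    def timed(_):
        start = time.perf_counter()
        call()                                       # the operation under load
        return (time.perf_counter() - start) * 1000.0
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, range(requests)))
    return {
        "p50_ms": statistics.median(latencies),
        "p95_ms": latencies[int(len(latencies) * 0.95) - 1],
        "max_ms": latencies[-1],
    }
```

A script like this is trivial for an agent to generate and wire into CI; the judgment call that remains human is deciding which endpoints matter and what latency budget counts as a failure.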
The Gap That Remains
And yet, the gap I identified in the original post is still real. It has just moved.
Agentic coding generates tests from code and documentation. It tests what the developer intended to build. Quadrant 3 of Marick's matrix, business-facing tests that critique the product, requires understanding what the product should do from the outside. That means UI-level testing across real browser states. It means testing the workflows your users actually follow, not the ones your engineers assumed they would follow. That kind of validation is essential, and it is exactly what code-generating agents do not provide.
There's also a second-order effect worth naming: AI-generated code tends to be syntactically correct and logically coherent, but it can still encode wrong assumptions. If a developer describes a feature to Claude Code inaccurately, the agent implements the inaccurate spec faithfully and generates tests that pass against the wrong implementation. The tests are green. The feature is broken. This issue is not a flaw in agentic coding; it's a fundamental limitation of inside-out testing. You need outside-in validation, too.
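A toy example makes the failure mode concrete. The function, prices, and spec below are invented for illustration: the user's real requirement is free shipping for orders of $50 or more, but the developer told the agent "free shipping over $50":

```python
def shipping_cost(order_total):
    """Implements the spec as described: free shipping OVER $50.
    The strict ">" faithfully encodes the developer's inaccurate wording."""
    return 0.0 if order_total > 50.00 else 5.99

# Agent-generated unit tests, derived from the implementation: all green.
assert shipping_cost(50.01) == 0.0
assert shipping_cost(49.99) == 5.99

# But a $50.00 cart gets charged shipping -- the inside-out tests
# never asked the question the user cares about.
assert shipping_cost(50.00) == 5.99   # green suite, broken product
```

Nothing in the write-test-iterate loop can catch this, because every artifact in the loop descends from the same misstated requirement. Only a check derived from the outside, from what the product should do, exposes it.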
And as code ships faster and more of it is AI-generated, the surface area for this kind of product-layer failure grows. Agentic coding accelerates development velocity. That's a good thing. But velocity without product-layer coverage means you discover broken workflows in production faster.
A New Division of Labor
Agentic coding creates an opportunity to rethink the entire quality ownership model. Here's how I see the division settling:
Agentic coding tools own the developer quadrants. Unit tests, integration tests, code-level security scanning, and performance instrumentation are increasingly solved problems for teams willing to use these tools deliberately.
Autonomous testing platforms own the product layer. UI-level testing, end-to-end workflows, and regression against the running application are tested continuously without requiring source code access or manual script maintenance.
Human quality experts shift to judgment work. Their job becomes defining acceptance criteria, assessing product behavior against user expectations, and interpreting what failing tests mean for the business, rather than writing and maintaining test scripts.
This isn't the death of quality engineering; it's its maturation. The people who thrive in this model are those who deeply understand product behavior and can clearly communicate expected outcomes, whether to a developer, an AI coding agent, or an autonomous testing platform. The manual test script writer is no longer the primary demand. The person who can define what "correct" looks like at the product level absolutely is.
Why This Validates the Original Concern
When I wrote "Testing without Testers" in January 2025, my concern wasn't that developers can't write tests; it was that they don't test comprehensively. They test what they build, not how it behaves as a product. Agile already pushed teams toward the left side of Marick's matrix. Agentic coding accelerates that drift dramatically.
The good news is that agentic tools are beginning to reach into all four quadrants. Security testing is increasingly automated. Performance testing is more accessible. The lower quadrants are largely handled. That's real progress.
But the right side of the matrix (functional, exploratory, UI-level product testing) still requires deliberate investment. Teams that adopt Claude Code and stop there will ship faster and hit production with more confidence at the code level. They will still get bug reports about broken workflows and UI states that no agent has tested from the outside in.
The Path Forward
If you're leading engineering at a company that's adopting agentic coding (and at this point, most of you are), here's the practical implication: your developer-facing testing is probably in better shape than it has ever been. Your security scanning is improving. That's genuinely good news. Now ask yourself honestly: what is covering your product-layer testing?
If the answer is "developers click around after a PR merges" or "we have some Selenium tests that half the team has stopped maintaining," you have a gap. And that gap will become more visible, not less, as agentic coding accelerates your shipping cadence.
Testing without testers is more viable than it has ever been. But only if you close the loop on both sides of the matrix. The tools to do it exist. The question is whether you're using them.
Testaify is an AI-first autonomous testing platform that covers the product layer — UI-level, end-to-end functional testing that runs continuously without access to source code. As agentic coding tools handle more of the code-facing quadrants, Testaify handles the other side of Marick's matrix. If your team is shipping faster with AI and wondering what's catching product-layer bugs, start your pilot at testaify.com.
About the Author
Testaify founder and COO Rafael E. Santos is a Stevie Award winner whose decades-long career includes strategic technology and product leadership roles. Rafael's goal for Testaify is to deliver comprehensive testing through Testaify's AI-first platform, which will change testing forever. Before Testaify, Rafael held executive positions at organizations like Ultimate Software and Trimble eBuilder.
Take the Next Step
Testaify is in managed roll-out. Request more information to see when you can bring Testaify into your testing process.