Post by Rafael E. Santos · Jun 2, 2025 · 6 min read

How to Choose Software Testing Metrics

Not all software testing metrics are created equal. Context matters. The Cynefin Framework and Brian Marick's Agile Testing Quadrants can help you decide.


Part 1: Context, Guidance, and Preparation 

In a recent post, I talked about software testing metrics. I shared an example in which DeepSeek generated a list of over 40 metrics but left out one of the most important ones. That got me thinking about how you pick what to measure. This blog post shares my thoughts on the subject.

Context Matters

The first thing you need to keep in mind is the context. Not long ago, The Pragmatic Engineer newsletter profiled a small company that developed a popular video game. The company did minimal testing: no unit tests and no code reviews. Some software engineering absolutists would immediately criticize this development team.

It makes sense that they did not invest significantly in software testing. First, they were building a game. No one will die if the game stops working. My son might disagree, but boredom does not cause death. Second, it was a one-time release. While some video games become franchises that must keep releasing new versions, most indie games are one-time releases.

The context of your application significantly impacts the quality standards you must meet.

Guidance in Decision-Making: The Cynefin Framework (pronounced kuh-nev-in)

The Cynefin Framework helps you make better product decisions.

SOURCE: Agility11. (2020, January 7). Understanding Complexity to Make Better Decisions: Cynefin Demystified. Retrieved from https://www.agility11.com/blog/2020/1/7/understanding-complexity-to-make-better-decisions-cynefin-demystified

While Cynefin is a decision-making framework, the first thing it does is help you identify your context: the domain you are in right now. How do you do that? Liz Keogh has a heuristic that helps you get to the answer. It depends on how you respond to the question: “Who has done this before?” Here are the potential answers:

Complex domain

1. Nobody has ever done it before.

2. Someone outside the organization (probably a competitor) has done it before.

Complicated domain

3. Someone in the company has done it before.

4. Someone in the team has done it before.

Obvious domain

5. We all know how to do it.

Answer 5 puts you in the obvious domain, which is precisely the context for this indie game development team. The small team knew the domain well and had developed many games before, applying what they had learned in computer science.
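If it helps to see the heuristic spelled out, here is a minimal Python sketch that encodes the answers above as a lookup table. The names and structure are mine, purely for illustration:

```python
# A minimal sketch of Liz Keogh's heuristic as a lookup table.
# The answer numbers match the list above; the encoding is illustrative.
CYNEFIN_DOMAIN_BY_ANSWER = {
    1: "complex",       # Nobody has ever done it before.
    2: "complex",       # Someone outside the organization has done it.
    3: "complicated",   # Someone in the company has done it.
    4: "complicated",   # Someone in the team has done it.
    5: "obvious",       # We all know how to do it.
}

def cynefin_domain(answer: int) -> str:
    """Return the Cynefin domain suggested by the heuristic answer."""
    return CYNEFIN_DOMAIN_BY_ANSWER[answer]

# The indie game team answered 5 ("We all know how to do it"),
# which places them in the obvious domain.
assert cynefin_domain(5) == "obvious"
```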

Preparation is Key: Defining Your Testing Strategy

My background is in B2B applications. In that context, we cannot use the same quality standards as the game developers. Unlike that gaming team, most B2B development teams live in the complicated domain and occasionally cross into the complex domain. In that context, the first thing you need to do is determine your testing strategy. How broad should that software testing strategy be? I like to start with Brian Marick’s agile testing matrix.

The Marick Agile Testing Matrix's four quadrants help you take a well-rounded approach to testing.

The matrix has four quadrants:

Quadrant 1: Technology-facing tests that support programming.

Quadrant 2: Business-facing tests that support programming.

Quadrant 3: Business-facing tests that critique the product.

Quadrant 4: Technology-facing tests that critique the product.

For most B2B applications, consider embracing good practices in each of the quadrants. Starting with quadrant 1: most B2B applications have large codebases, often millions of lines of code, so maintaining unit tests across such a codebase is a good idea.
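To give a sense of scale for quadrant 1, a single unit test can be as small as the sketch below. The calculate_late_fee function is a hypothetical example, not from any real product:

```python
# A minimal quadrant 1 (technology-facing, supports programming) unit test.
# `calculate_late_fee` is a hypothetical domain function for illustration.
import pytest

def calculate_late_fee(days_overdue: int, daily_rate: float = 1.50) -> float:
    """Return the late fee owed; no fee if the item is not overdue."""
    if days_overdue <= 0:
        return 0.0
    return days_overdue * daily_rate

def test_no_fee_when_not_overdue():
    assert calculate_late_fee(0) == 0.0

def test_fee_accumulates_per_day():
    assert calculate_late_fee(3) == pytest.approx(4.50)
```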

For quadrant 2, I usually recommend BDD (Behavior-Driven Development). It allows us to define the essential use cases for each feature in a language understood by different groups involved in the software development process. BDD also provides guidance about which unit tests we need.
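Here is a sketch of what one BDD scenario and its glue code might look like using the behave library for Python. The feature wording and step names are hypothetical:

```python
# steps/matter_steps.py -- hypothetical behave step definitions.
# In a real project, the scenario below lives in a .feature file:
#
#   Feature: Create a legal matter
#     Scenario: A user creates a new matter
#       Given a signed-in user
#       When the user creates a matter named "Smith v. Jones"
#       Then the matter list contains "Smith v. Jones"

from behave import given, when, then

@given("a signed-in user")
def step_signed_in_user(context):
    # Stand-in for real authentication/session setup.
    context.matters = []

@when('the user creates a matter named "{name}"')
def step_create_matter(context, name):
    # Stand-in for driving the application under test.
    context.matters.append(name)

@then('the matter list contains "{name}"')
def step_matter_listed(context, name):
    assert name in context.matters
```

The scenario text doubles as documentation that product, UX, and engineering can all read, which is the point of quadrant 2.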

For quadrant 3, you do need to answer several questions. For example, what do you plan to do about usability and accessibility testing? Are you required to cover more than just use case testing from a functional perspective? The answers to these and other related questions will determine what you need to measure.

For quadrant 4, you need to determine what to cover for security and performance. Depending on your unique context, you might also have to add some compliance testing.
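On the performance side of quadrant 4, even a minimal smoke test is a start. Here is a sketch using a pytest-style test and the requests library; the URL and response-time budget are hypothetical:

```python
# A minimal quadrant 4 performance smoke test: fail if a key page
# exceeds an agreed response-time budget. URL and budget are hypothetical.
import time

import requests

PAGE_URL = "https://app.example.com/matters"  # hypothetical endpoint
BUDGET_SECONDS = 2.0                          # hypothetical budget

def test_matters_page_within_budget():
    start = time.perf_counter()
    response = requests.get(PAGE_URL, timeout=10)
    elapsed = time.perf_counter() - start
    assert response.status_code == 200
    assert elapsed < BUDGET_SECONDS, f"Page took {elapsed:.2f}s"
```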

Now we’ve discussed how to determine your context and develop a testing strategy based on that context. I mentioned before how much I like the DRE metric. However, in the example we are about to discuss, we decided not to use such a metric. Why? Let’s find out.


Real-World Example of How We Chose a Critical Software Testing Metric

After I left Ultimate Software, my next job was at a small company with just over 30 employees. At Ultimate Software, my product development team had been roughly twelve times as large. My new team fluctuated between six and nine people, including me, and that covered engineering, product, and UX.

Both companies operated in B2B domains. Ultimate Software was in HR, Payroll, Benefits, etc. My new job was at a legal software company. Given the nature of the business, most of the work fell within the complicated domain of the Cynefin framework. In some cases, specific capabilities were only available in one of the legacy desktop applications that dominated the market at the time; since only someone outside the organization had done that work before, those efforts landed in the complex domain.

So we had a complicated domain and a single product development team. At the same time, we had a solid foundation: the product was built using a Test-Driven Development (TDD) approach and included numerous unit tests. It also had a continuous delivery pipeline that allowed us to test new capabilities with a select group of customers.

Like many early agile products, this one over-relied on unit tests. We had automated unit tests but no automated integration or system tests. Even from the perspective of the agile testing pyramid (see below), we had only the bottom layer and nothing else.

The Agile Testing Pyramid has unit tests as the foundation.

If you have read my previous blog posts, you will recognize the pattern this testing strategy followed. It was a TDD-only testing strategy or, in terms of Brian Marick’s agile testing matrix, a quadrant 1-only strategy.

New Testing Strategy

At a minimum, we had to improve our testing strategy. We could already see the issues: our customer support team was larger than our product development team because we had no integration or system testing. Our customers were doing much of our testing for us.

We started by implementing a simple requirements template. The template included UI mockups and BDD (Behavior-Driven Development) scenarios. This simple document provided the intention and acceptance criteria for every new feature. At the same time, it provided us with a way to implement quadrant 2 of Brian Marick’s agile testing matrix.

It also provided the first set of automated system tests to round out our regression suite, covering part of quadrant 3. For the rest of that quadrant, we began with crowdsourced testing to identify issues early.

Eventually, we taught people from support how to test and added one QA person to the team, allowing us to conduct exploratory testing internally instead of using a crowdsourcing vendor.

For quadrant 4, we had to figure out how to do it cheaply, as we had no budget for more tools. We used Google Analytics to understand the response times our users were experiencing. While imperfect, since it captures only a sample, it let us dedicate time to optimization whenever a page consistently appeared among our five slowest. After I left, the team upgraded to a better tool.
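The analysis itself can stay simple. Given page-timing samples exported from an analytics tool, a sketch like the one below surfaces the slowest pages. The sample data and its layout are hypothetical:

```python
# A minimal sketch: rank pages by median load time from timing samples.
# The (page, seconds) sample structure is hypothetical.
from collections import defaultdict
from statistics import median

samples = [
    ("/matters", 1.9), ("/matters", 2.4), ("/billing", 4.8),
    ("/billing", 5.1), ("/documents", 3.2), ("/calendar", 0.9),
]

timings = defaultdict(list)
for page, seconds in samples:
    timings[page].append(seconds)

slowest = sorted(timings, key=lambda p: median(timings[p]), reverse=True)[:5]
for page in slowest:
    print(f"{page}: median {median(timings[page]):.1f}s")
```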

Finally, for security testing, we started with a once-a-year bug fest using the crowdsourcing vendor. Several security testers attacked our test instance and reported findings, which were added to the backlog and prioritized by severity.

With a more comprehensive testing strategy in place, we can now examine software testing metrics.

Software Testing Metrics

As I mentioned earlier, the team had a well-established unit testing practice. As such, they tracked their code coverage. While the usefulness of the code coverage metric is limited, there was no reason to remove it or change the team's approach at that time.
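As a refresher, the metric itself is simple: the share of code your tests actually execute. A sketch with hypothetical numbers:

```python
# Code coverage in its simplest form: executed lines / total lines.
# Numbers are hypothetical; real tools such as coverage.py track this
# per statement and per branch.
executed_lines = 41_250
total_lines = 55_000

coverage = executed_lines / total_lines
print(f"Line coverage: {coverage:.0%}")  # 75%
```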

At Ultimate Software, I embraced the use of DRE (Defect Removal Efficiency). However, this team’s delivery cadence was shorter than that of most Ultimate Software teams. We needed something simpler and faster.

Our biggest challenge was the defects still coming through customer support, so escaped defects seemed like a helpful metric to track. However, I wanted a narrower metric, and this is when I started using the term Customer-Found Defects (CFDs). We already had a comprehensive data set in our customer support system, so we could track our progress easily and quickly.
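To see the contrast between the two metrics, here is a minimal sketch that computes DRE alongside a plain CFD count. All figures are hypothetical:

```python
# Defect Removal Efficiency (DRE) versus a simple Customer-Found
# Defects (CFD) count. All figures are hypothetical.

def dre(internal_defects: int, customer_found_defects: int) -> float:
    """DRE = defects removed before release / all defects found."""
    total = internal_defects + customer_found_defects
    return internal_defects / total if total else 1.0

internal = 45          # found by the team before release (hypothetical)
customer_found = 15    # reported through customer support (hypothetical)

print(f"DRE: {dre(internal, customer_found):.0%}")   # 75%
print(f"CFDs this release: {customer_found}")        # the narrower metric
```

A CFD count needs only the support system's data, which is why it was the faster fit for a team with a short delivery cadence.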

Conclusion

At other jobs, I built more complex testing strategies and developed a more comprehensive set of metrics. Here, I had to become more nimble and focus on a narrow goal. We did not track test automation coverage, DRE, or MTTD in this case. Instead, we focused narrowly on our immediate problem and how to solve it, which led us to Customer-Found Defects.

Which software testing metrics do you need to track? It depends on your context.

About the Author

Testaify founder and COO Rafael E. Santos is a Stevie Award winner whose decades-long career includes strategic technology and product leadership roles. Rafael's vision for Testaify is to deliver continuous, comprehensive testing through Testaify's AI-first platform, which will change testing forever. Before Testaify, Rafael held executive positions at organizations like Ultimate Software and Trimble eBuilder.

Take the Next Step

Join the waitlist to be among the first to know when you can bring Testaify into your testing process.