Why is Usability Testing so Hard?

Usability testing is so hard. But why? And how can AI come to the rescue? Find out as we analyze the complex requirements of usability testing.

TABLE OF CONTENTS

Part 1 - Why is Usability Testing so Hard?
Part 2 - Can AI Help with Usability Testing?
Part 3 - Usability testing matters to product quality.

We recently introduced the concept of Continuous Comprehensive Testing (CCT), and we still need to discuss in depth what that means. This series of blog posts will provide a deeper understanding of CCT.

In our introductory CCT blog post, we said the following:

Our goal with Testaify is to provide a Continuous Comprehensive Testing (CCT) platform. The Testaify platform will enable you to evaluate the following aspects:

Functional

Usability

Performance

Accessibility

Security

While we cannot offer all these perspectives with the first release, we want you to know where we want to go as we reach for the CCT star.

Part 1 - Why is Usability Testing So Hard?

What is Usability Testing?

If you read our blog post about The Ever-Rising Quality Standard, you know user experience is an essential quality attribute. According to Wikipedia, “Usability can be described as the capacity of a system to provide a condition for its users to perform the tasks safely, effectively, and efficiently while enjoying the experience.” This definition rightly makes Usability sound very subjective.

As such, usability is the most challenging quality attribute to test. In the book Handbook of Usability Testing, Jeffrey Rubin and Dana Chisnell use Bailey’s Human Performance Model to explain the scope.

Figure: Bailey’s Human Performance Model. Usability tests consider the human, the context, and the activity when assessing an action.

As you can see from the figure above, there are three components to consider in any type of human performance situation.

The Human
The Context
The Activity

Product teams have primarily focused on the activity component since the early days of software development. But that’s not enough. Most teams still lack the expertise to handle work that covers all three components.

Why is Usability Testing so tricky?

The goals of usability testing are hard to verify. In the same book, the authors state:

The overall goal of usability testing is to identify and rectify usability deficiencies. The intent is to ensure the creation of products that: are easy to learn and to use; are satisfying to use; and provide utility and functionality that are highly valued by the target population.

How do you test that a product is easy to learn? How easy is it to learn the product? How do you know a product is satisfying to use? By definition, testing occurs in an artificial environment. But in the case of usability testing, this is even more pronounced.

For example, in functional testing, I need to test the following business rule: “City tax applies to all transactions that are $900 or more.” I can create boundary value tests, each with a clear expected result. I can easily replicate the environment of most users in the case of web-based apps. Yes, I might need to spend time creating the proper setup in the application, but in the end, each of my tests will have a clear Yes or No answer. It passes or fails. There is no ambiguity around the result.
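As a minimal sketch of what those boundary value tests might look like in Python: the function name `city_tax_applies` and the one-cent step around the threshold are my assumptions for illustration, not code from any real system.

```python
from decimal import Decimal

# Threshold taken from the business rule quoted above.
CITY_TAX_THRESHOLD = Decimal("900.00")


def city_tax_applies(amount: Decimal) -> bool:
    """Return True when the transaction is $900 or more (hypothetical helper)."""
    return amount >= CITY_TAX_THRESHOLD


# Boundary value cases: just below, exactly at, and just above the threshold.
BOUNDARY_CASES = [
    (Decimal("899.99"), False),
    (Decimal("900.00"), True),
    (Decimal("900.01"), True),
]


def run_boundary_tests() -> list:
    """Return one pass/fail verdict per boundary case."""
    return [city_tax_applies(amount) == expected for amount, expected in BOUNDARY_CASES]


if __name__ == "__main__":
    for (amount, expected), passed in zip(BOUNDARY_CASES, run_boundary_tests()):
        print(f"amount={amount} expected={expected} -> {'PASS' if passed else 'FAIL'}")
```

Every case has exactly one expected answer, which is what makes the result unambiguous.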

In usability testing, that is a lot harder because replicating the human component is more complex, and in some cases the context is hard to reproduce without conducting a field test. Think of mobile apps used by construction workers at a construction site with weak or no cell phone signal. Usability testing is usually a task-based study conducted with a sample population of users and regularly accompanied by a survey.

A rigorous usability test requires the following:

The team formulates a hypothesis.
The team assigns a randomly chosen group of participants to experimental conditions.
The team implements tight controls.
The team uses control groups.

I have never seen a team meet all of these conditions. I have never seen a usability test that used control groups. Plus, very few teams have enough participants to select them randomly. Finally, most teams do not have tight controls or know how to conduct a usability test with a well-defined hypothesis.

The best-case scenario is a usability lab where you can see 20+ participants. You can record what they are doing and observe each completing the tasks. The most common scenario in organizations with a team that conducts usability testing is an online test with 5 to 10 participants. The problem is that most usability testing performed today can only provide a clear negative signal.

For example, if you have a task and four of five participants cannot complete the task, then you have a strong signal that this design has a problem. Still, you might need to dig deeper to identify the issue if two participants stopped at different steps, incapable of completing the task. But if four of five participants complete the task, does that mean the design is ready for release? What about if all five participants completed the task? Is that good enough? The devil is in the details.
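One way to see why five participants cannot give a strong positive signal is to put a confidence interval around the completion rate. The sketch below uses the Wilson score interval, a standard choice for small samples. The 95% level (z = 1.96) and the function name are my assumptions, and it treats the participants as a random sample of your users, which is rarely true in practice.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a completion rate (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    for completed in (4, 5):
        low, high = wilson_interval(completed, 5)
        print(f"{completed}/5 completed: plausible rate {low:.0%} to {high:.0%}")
```

With four of five completions, the plausible completion rate runs from roughly 38% to 96%. Even with five of five, the lower end is only about 57%. That range is too wide to call a design ready for release.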

Teams tend to survey the participants to capture sentiment about the application and tasks. If all participants complete the tasks, but the sentiment from the survey is negative, then the design might still have a problem. How do you know? The answer is you do not know.

Is there a better way of conducting usability testing?

There may be a more effective way. Many teams have decided to iterate quickly and test ideas in production. In other words, many teams shift-right their usability testing. Thanks to CI/CD pipelines, cloud computing, usage monitoring applications, and feature flagging tools, you can efficiently address requirements like control groups and having a statistically significant population. Plus, you handle the context and human components in the Human Performance Model.

Modern cloud applications can partition their production environment to support multiple variations of a specific feature at the same time. In such a scenario, you can meet most of the criteria for usability testing. This type of configuration allows teams to get more definitive answers about what works and what does not.
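To make the idea of running several variations at once concrete, here is a minimal sketch of how users could be split between a new design and the existing one. It is not the API of any feature flagging product. The function name, the experiment key, and the 50/50 default are all assumptions for illustration.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically place a user in 'treatment' or 'control' (hypothetical helper)."""
    if not 0.0 <= treatment_share <= 1.0:
        raise ValueError("treatment_share must be between 0 and 1")
    # Hash the experiment name together with the user id so that each
    # experiment gets its own independent split of the user base.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_share else "control"


if __name__ == "__main__":
    for user in ("user-1", "user-2", "user-3"):
        print(user, assign_variant(user, "new-checkout-design"))
```

Because the assignment depends only on the user and the experiment name, a user sees the same variation on every visit, and the users who keep the old design form the control group.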

However, this approach has a problem: You are testing with your production users. Can you identify usability issues sooner? Is there a way AI can help you improve your usability and give you an early warning without all the hassle of having a usability lab? There is, but to learn about it, you must check our next blog post in this series.

Part 2 - Can AI Help with Usability Testing?

Can AI help with usability testing? Yes, it can!

The last blog post discussed usability testing and why it is so hard. We tried to preempt usability issues for a long time, but the best we got with the existing approaches was weak signals. We shifted right and became better at deploying and testing in production. Can we add something to help us avoid usability issues? Yes, we can. Can AI help us with that? Yes, it can.

The Objective of Usability Testing

While many books talk about methods, techniques, and processes, few focus on the main objective of usability testing. Our previous blog post quoted the Handbook of Usability Testing: “The overall goal of usability testing is to identify and rectify usability deficiencies.” Like all testing, our goal is to identify potential issues or problems.

Testing can find issues, but it cannot rectify them. The book mentions “rectify usability deficiencies” because it assumes you are conducting usability testing before releasing your product to production. Testing can identify, illuminate, and inform us of potential problems, but it cannot fix them.

If we go back to the fundamentals, usability testing should focus on identifying usability issues.

Shift-Right Usability Testing

The shift-right approach does allow us to monitor a specific group of users and compare them with a control group. It will enable us to test a hypothesis. Most cloud-based systems implement instrumentation to capture the activity within their products. Many cloud applications use tools like Google Analytics, Amplitude, or Pendo, and together with tools like Split and LaunchDarkly, you can genuinely design usability tests.

The feature flagging tools allow you to turn on a new design for a select group of customers. The app monitoring tools will let you see how these customers interact with the new design. You have a control group because the previous design is still in production. You can get additional information to understand the findings by using existing survey features or adding a survey tool like SurveyMonkey or Qualtrics.
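Once you have completion counts for the control group and the new design, a two-proportion z-test is one common way to check whether the difference is more than noise. The sketch below is a textbook version of that test. The function name and the counts in the example are made up for illustration, and the normal approximation it relies on needs reasonably large groups.

```python
import math


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int) -> tuple:
    """Return (z, two-sided p-value) for the difference in completion rates."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one participant")
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both designs perform the same.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # every user (or no user) completed the task in both groups
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: control (old design) vs. treatment (new design).
    z, p = two_proportion_z_test(400, 1000, 460, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value says the new design probably changed the completion rate. It does not say why, which is where the survey data comes in.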

Is this an improvement for usability testing? Yes. Is this something many teams do? No. According to this Essential UX Statistics blog post, only 55% of companies currently conduct UX testing, and many of those companies have yet to take a shift-right approach to usability testing. That means they are mostly getting informal feedback and capturing weak signals.

Usability Testing with AI

One of the most significant challenges with usability testing, even the shift-right approach, is data processing and analysis. This problem is one where AI can help considerably. AI/ML systems are excellent pattern-matching solutions, and they can review the data from a tool like Amplitude and come up with findings for the specific group. These vendors will likely work on adding AI/ML capabilities in the future. At least, we hope they do.

An AI capability for data crunching and analysis will significantly improve usability testing for all the teams using a shift-right approach. Can it help earlier in the process? Yes, and that is where Testaify comes in.

Usability Testing - Testaify to the rescue!

One essential aspect of the Testaify platform is our discovery engine. The discovery engine builds a model of your application from a user perspective. This model is recreated every time you run a test session. As such, the model changes over time.

The Testaify model will enable our users to learn about potential usability problems. We will know if a path that used to take three steps now takes five. We will know if specific steps are slower or faster. We will provide you with early warning signals that can become the basis of your next hypothesis to test using a shift-right usability testing approach.
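To illustrate the "three steps now takes five" signal, here is a minimal sketch that compares two snapshots of an application model. It is not Testaify's implementation. It assumes, purely for illustration, that a model can be reduced to a dictionary mapping each screen to the screens reachable from it in one step, and the screen names are invented.

```python
from collections import deque
from typing import Dict, List, Optional

Model = Dict[str, List[str]]  # screen -> screens reachable in one step (assumed shape)


def shortest_path_length(model: Model, start: str, goal: str) -> Optional[int]:
    """Breadth-first search: fewest steps from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        screen, steps = queue.popleft()
        for nxt in model.get(screen, []):
            if nxt == goal:
                return steps + 1
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, steps + 1))
    return None


def path_regression(previous: Model, current: Model, start: str, goal: str) -> Optional[str]:
    """Return a warning when the path got longer or disappeared, else None."""
    before = shortest_path_length(previous, start, goal)
    after = shortest_path_length(current, start, goal)
    if before is None:
        return None  # nothing to compare against
    if after is None:
        return f"{start} -> {goal}: no longer reachable"
    if after > before:
        return f"{start} -> {goal}: {before} steps before, {after} steps now"
    return None


if __name__ == "__main__":
    previous = {"home": ["cart"], "cart": ["payment"], "payment": ["confirmation"]}
    current = {
        "home": ["cart"],
        "cart": ["shipping"],
        "shipping": ["address_check"],
        "address_check": ["payment"],
        "payment": ["confirmation"],
    }
    print(path_regression(previous, current, "home", "confirmation"))
```

A warning like this is not a verdict. It is a candidate hypothesis to test with real users.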

At the same time, data integration can provide us with a mechanism to learn from the results captured in the last experiment. The Testaify usability engine will become more intelligent with every test result. This virtuous cycle will improve Testaify’s usability engine prediction capabilities, taking your product to a higher quality standard with every iteration.

Final Thought - AI-generated Personas Usability Testing

The feature I am most excited about in the usability product is the ability to use AI-generated personas. Today, Big Tech uses the data they captured about us to sell us stuff. Testaify will use the data captured by usage monitoring tools to generate AI personas that simulate our users' behaviors.

Imagine using ten AI worker bees to conduct a usability test for a specific user persona like a realtor. Testaify can test the new design by generating ten unique AI worker bees to simulate ten realtors so you can test your new realtor app. Testaify creates, executes, and analyzes the usability testing simulation in minutes.

AI worker bees discover your app, warn you about potential issues, and become AI-generated personas for usability testing. The Future is always fantastic! I hope you join us as we build it.

Part 3 - Usability testing matters to product quality.

Usability testing is an iterative process that helps create user-centric products. Understanding user behavior and preferences enables you to make informed design decisions and deliver exceptional user experiences.

Usability testing can help you:

Identify user pain points and areas for improvement.
Gather data-backed insights to guide design decisions.
Continuously test and refine your product to delight users as their needs evolve.
Validate assumptions and understand users’ context of use.
Develop better products faster.

Usability testing reveals issues internal teams might overlook because they know the product well. This type of user research is especially important for new products or products that update frequently.

The most common types of usability testing choices

Qualitative vs. Quantitative

- Qualitative: Focuses on understanding user experiences, thoughts, and feelings.
- Quantitative: Collects numerical data like success rates and completion times.

Moderated vs. Unmoderated

- Moderated: A facilitator guides participants through the test, as in traditional usability testing. This is an expensive way to gain valuable feedback.
- Unmoderated: Participants complete tasks independently without a facilitator.

Remote vs. In-person

- Remote: Participants complete the test online, allowing participants and researchers to be in different locations.
- In-person: The test is conducted face-to-face, with a researcher present to observe the participants.

The traditional approach for usability testing was in person and moderated. It included qualitative and quantitative methods. Usually, interviews were conducted for qualitative information gathering, and software was used to capture the actions of the participants to collect all the quantitative data.

Today, most usability testing is done remotely and is usually unmoderated. Thanks to cloud technologies like feature flagging and product analytics tools, you can capture quantitative data at scale, and you can capture qualitative data with embedded surveys.

Usability Testing is Better with AI

We’ve established why usability testing is critical: usability testing methods help you evaluate how real people interact with your software product. Autonomous testing platforms like Testaify are on the road to simulating user behavior, delivering comprehensive insights development teams can use to make meaningful improvements to the usability of their products. Imagine a world where you can conduct usability testing with AI to test your web app against simulated users representing the many personas using your product to ensure your product meets user needs effectively.

In this new world, AI will simulate users as they interact with your web app to identify potentially confusing areas of your app or pain points in the user journey. This process will generate findings your development team can use to uncover issues that might not be evident to designers or developers already familiar with the app's design and logic. Objective, AI-generated usability tests and findings will provide a clearer picture of how intuitive and user-friendly the product is.

By simulating user interactions, autonomous testing efforts can help you estimate whether visitors will understand how your site works, whether they can complete critical actions, and whether they will encounter usability issues or software defects. AI is what makes testing at this speed possible.

There are many benefits of autonomous usability testing.

Reduces development costs: AI user simulations can help you identify issues earlier to save time and money on post-launch fixes. Because AI is also responsible for web app discovery, you know that the AI will fully exercise your software product as it conducts usability tests with simulated users.
Helps you better tailor products to users: AI can anticipate user needs through simulated interactions. Findings from test sessions will help your development team design products that meet users' expectations.
Enhances user satisfaction: As you work through findings and validate issues, your development team can improve the user experience. Happy users will keep churn rates low.
Combats cognitive bias: AI simulations provide objective feedback that avoids assumptions due to your development team’s familiarity with the web app’s logic flow, architecture, and design.
Increases accessibility: AI-led usability tests will cross over into areas that could affect accessibility, helping you design for diverse user needs and build inclusive products.

When Should You Conduct Usability Testing?

Usability testing should be an ongoing process throughout the product lifecycle. We believe autonomous usability testing is a critical part of continuous comprehensive testing, which is why usability testing is an essential phase of the Testaify product roadmap. Autonomous usability testing can continuously simulate user interactions as your web app grows and evolves through each release cycle. It saves time and money, allowing you to continuously test and refine your product based on usability feedback.

About the Author

Rafael E. Santos is Testaify's founder and COO. He's committed to a vision for Testaify: delivering Continuous Comprehensive Testing through Testaify's AI-first testing platform, which will change testing forever. Rafael is a Stevie Award winner whose decades-long career includes strategic technology and product leadership roles. Before Testaify, he held executive positions at organizations like Ultimate Software and Trimble eBuilder.

Take the Next Step

Testaify is in managed roll-out. Request more information to see when you can bring Testaify into your testing process.
