Why is Usability Testing so Hard?
AI to the Rescue - Part 8
This blog post is the eighth in a long series. We recently introduced the concept of Continuous Comprehensive Testing (CCT), but we have not yet discussed in depth what it means. This series of blog posts will provide that deeper understanding of CCT.
In our introductory CCT blog post, we said the following:
Our goal with Testaify is to provide a Continuous Comprehensive Testing (CCT) platform. The Testaify platform will enable you to evaluate the following aspects:
While we cannot offer all these perspectives with the first release, we want you to know where we want to go as we reach for the CCT star.
What is Usability Testing?
If you read our blog post about The Ever-Rising Quality Standard, you know user experience is an essential quality attribute. According to Wikipedia, “Usability can be described as the capacity of a system to provide a condition for its users to perform the tasks safely, effectively, and efficiently while enjoying the experience.” This definition rightly makes Usability sound very subjective.
As such, usability is the most challenging quality attribute to test. In the book Handbook of Usability Testing, Jeffrey Rubin and Dana Chisnell use Bailey’s Human Performance Model to explain the scope.
As you can see from the figure above, there are three components to consider in any type of human performance situation.
- The Human
- The Context
- The Activity
Product teams have primarily focused on the activity component since the early days of software development. But that's not enough. Most teams lack the expertise to handle work that spans all three components.
Why is Usability Testing so tricky?
Usability testing is tricky because its goals are hard to measure. In the same book, the authors state:
The overall goal of usability testing is to identify and rectify usability deficiencies. The intent is to ensure the creation of products that: are easy to learn and to use; are satisfying to use; and provide utility and functionality that are highly valued by the target population.
How do you test that a product is easy to learn? How do you measure how easy it is to learn? How do you know a product is satisfying to use? All testing occurs, by definition, in an artificial environment, but in usability testing that artificiality is even more pronounced.
For example, in functional testing, I need to test the following business rule: “City tax applies to all transactions that are $900 or more.” I can create boundary value tests, each with a clear expected result. I can easily replicate the environment of most users in the case of web-based apps. Yes, I might need to spend time creating the proper setup in the application, but in the end, each of my tests will have a clear Yes or No answer. It passes or fails. There is no ambiguity around the result.
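The contrast with usability testing is easiest to see in code. Below is a minimal sketch of boundary value tests for the city-tax rule above; `city_tax_applies` is a hypothetical function standing in for the application logic under test, not part of any real system.

```python
def city_tax_applies(amount: float) -> bool:
    """Hypothetical business rule: city tax applies at $900 or more."""
    return amount >= 900.00

def test_boundary_values() -> None:
    # Each case probes the boundary and has an unambiguous expected result.
    cases = [
        (899.99, False),  # just below the boundary: no tax
        (900.00, True),   # exactly at the boundary: tax applies
        (900.01, True),   # just above the boundary: tax applies
    ]
    for amount, expected in cases:
        assert city_tax_applies(amount) == expected, amount

test_boundary_values()
```

Every assertion here passes or fails deterministically; there is no equivalent set of assertions for "the product is satisfying to use."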
In usability testing, that is much harder: replicating the human component is complex, and in some cases the context is hard to reproduce. Think of a mobile app used by construction workers at a site with a weak or nonexistent cell signal. Usability testing is usually a task-based study conducted with a sample population of users, often accompanied by a survey. Sometimes the context cannot be reproduced without conducting a field test.
To create a usability test, you need the following:
- The team formulates a hypothesis.
- The team assigns a randomly chosen group of participants to experimental conditions.
- The team implements tight controls.
- The team uses control groups.
I have never seen a team meet all of these conditions. I have never seen a usability test that used control groups. Very few teams have a participant pool large enough for random assignment. And most teams neither implement tight controls nor know how to conduct a usability test around a well-defined hypothesis.
The best-case scenario is a usability lab where you can observe 20+ participants, recording what they do and watching each one complete the tasks. More commonly, organizations that conduct usability testing at all run an online test with 5 to 10 participants. The problem is that most usability testing performed today can only provide a clear negative signal.
For example, if you have a task and four of five participants cannot complete the task, then you have a strong signal that this design has a problem. Still, you might need to dig deeper to identify the issue if two participants stopped at different steps, incapable of completing the task. But if four of five participants complete the task, does that mean the design is ready for release? What about if all five participants completed the task? Is that good enough? The devil is in the details.
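A quick confidence-interval calculation shows why five participants can only give a negative signal. The sketch below uses the standard Wilson score interval (my choice for illustration, not a method from the book) to put bounds on the true task-completion rate:

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """95% Wilson score interval for an observed proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return max(0.0, center - half), min(1.0, center + half)

# 4 of 5 participants completed the task:
lo, hi = wilson_interval(4, 5)
# The interval spans roughly 0.38 to 0.96 -- far too wide to conclude
# anything positive about the design. Even 5 of 5 completions leaves a
# lower bound below 0.6.
```

With samples this small, a string of failures is strong evidence of a problem, but a string of successes proves very little.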
Teams tend to survey the participants to capture sentiment about the application and tasks. If all participants complete the tasks but the survey sentiment is negative, the design may still have problems. Where exactly? The honest answer is that you do not know.
Is there a better way of conducting usability testing?
There may be a more effective way. Many teams have decided to iterate quickly and test ideas in production. In other words, they shift-right their usability testing. Thanks to CI/CD pipelines, cloud computing, usage-monitoring applications, and feature-flagging tools, you can efficiently satisfy requirements like control groups and a statistically significant population. And because real users participate in their real environments, you cover the human and context components of the Human Performance Model as well.
Modern cloud applications can partition their production environment to serve multiple variations of a specific feature at the same time. In such a scenario, you can meet most of the criteria for a valid usability test, and teams get more definitive answers about what works and what does not.
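At the core of this setup is deterministic bucketing: each user is consistently assigned to a control or variant group. The sketch below shows one common hash-based approach; the function name and experiment identifiers are illustrative, and real feature-flagging tools package the same idea as a service.

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variant_share: float = 0.5) -> str:
    """Deterministically assign a user to 'control' or 'variant'.

    Hashing (experiment, user_id) gives a stable, roughly uniform
    bucket in [0, 1], so the same user always sees the same variation.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return "variant" if bucket < variant_share else "control"

# The same user lands in the same group on every session, keeping
# measurements consistent across the life of the experiment.
group = assign_variant("user-42", "new-checkout-flow")
```

Because assignment is derived from a hash rather than stored state, any service in the pipeline can compute a user's group independently.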
However, this approach has a problem: you are testing on your production users. Can you identify usability issues sooner? Is there a way AI can give you an early warning and help you improve usability without the hassle of a usability lab? There is, but to learn about it, you will have to check the next blog post in this series.
Are you interested in the other blogs in this series?
- The Heartbreaking Truth About Functional Testing (AI to the Rescue - Part 1)
- Have you said, “AI won’t help me as much as I thought?” (AI to the Rescue - Part 2)
- What is Performance Testing? (AI to the Rescue - Part 3)
- How to Conduct Performance Testing (AI to the Rescue - Part 4)
- Performance Testing: What about Scalability, Stability, and Reliability? (AI to the Rescue - Part 5)
- Back in the ‘SSR [The PSSR, that is] (AI to the Rescue - Part 6)
- How AI Performance Testing Works (AI to the Rescue - Part 7)
About the Author
Testaify founder and COO Rafael E. Santos is a Stevie Award winner whose decades-long career includes strategic technology and product leadership roles. Rafael's goal for Testaify is to deliver Continuous Comprehensive Testing through Testaify's AI-first platform, which will change testing forever. Before Testaify, Rafael held executive positions at organizations like Ultimate Software and Trimble eBuilder.
Take the Next Step
Join the waitlist to be among the first to know when you can bring Testaify Functional for Web into your testing process.