What is Performance Testing?
AI to the Rescue! - Part 3
This blog post is the third in a long series. We recently introduced the concept of Continuous Comprehensive Testing (CCT), but we have yet to discuss in depth what it means. This series of blog posts will provide that deeper understanding of CCT.
In our introductory CCT blog post, we said the following:
Our goal with Testaify is to provide a Continuous Comprehensive Testing (CCT) platform. The Testaify platform will enable you to evaluate the following aspects:
While we cannot offer all these perspectives with the first release, we want you to know where we want to go as we reach for the CCT star.
It is time to talk about Performance.
Like functional testing, performance testing is complicated and involves many variables. While the industry agrees on specific aspects of performance testing, the devil is always in the details.
Wikipedia defines performance testing as “a testing practice performed to determine how a system performs in terms of responsiveness and stability under a particular workload. It can also serve to investigate, measure, validate or verify other quality attributes of the system, such as scalability, reliability, and resource usage.” The definition seems familiar because almost every vendor or thought leader who writes about performance testing copies it from Wikipedia.
Performance testing is an umbrella term. While there is some disagreement about the types of performance testing, most will include the following:
- Load Testing – testing the system under specific expected load conditions. Usually, that means “peak load.” Yes, I used double quotes. You will find out why.
- Stress Testing – customarily used to identify the upper limits of the system's capacity.
- Spike Testing – as the name suggests, this test increases the system's load significantly to determine how the system will cope with those sudden changes.
- Endurance Testing (sometimes called Soak) – usually done to determine if the system can sustain the continuous expected load.
- Scalability Testing – aims to determine the point at which the system can no longer scale to handle additional load.
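One way to see how these test types differ is in the load profile each one drives over time. A minimal sketch below makes that concrete; the function names and all the numbers are illustrative examples, not recommendations:

```python
# Illustrative load profiles for the test types above.
# Each function maps elapsed minutes -> number of simulated users.
# All figures are made-up examples, not recommendations.

def load_profile(minute: int) -> int:
    """Load test: steady expected load (a presumed 'peak' of 500 users)."""
    return 500

def stress_profile(minute: int) -> int:
    """Stress test: keep adding users until the system breaks
    (+100 users every 10 minutes)."""
    return 100 * (minute // 10 + 1)

def spike_profile(minute: int) -> int:
    """Spike test: normal load with a sudden 10x burst between
    minutes 30 and 35."""
    return 5000 if 30 <= minute < 35 else 500

def endurance_profile(minute: int) -> int:
    """Endurance (soak) test: expected load held for many hours --
    the clock, not the user count, is what stresses the system."""
    return 500
```

The shape of the curve, not the tooling, is what distinguishes these tests from one another.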
If you check multiple sources, you will see other types like volume, unit, breakpoint, or internet testing. While some of those names are more appropriate than others, most of them do not matter. What matters in performance testing are the quality attributes you are trying to measure.
Do you do performance testing with PSR?
At Ultimate Software, my friend Jason Holzer developed the acronym PSSR. Most people kept pronouncing it with one “S,” so it became PSR. The acronym stands for Performance, Scalability, Stability, and Reliability (PSSR). He coined it because what we care about is answering these questions:
- Performance – Can the system provide an acceptable response time with no errors and efficient use of resources?
- Scalability – At what load does the system stop having an acceptable response time with no errors?
- Stability – Can the system provide acceptable response time with no errors over a significant period without intervention?
- Reliability – How reliable is the system after months of use without intervention?
These four letters match the four different tests we ran. Each one served as a gate. Each one provided an answer to one of our questions.
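As a sketch of what such a gate might check, consider the Performance question: acceptable response time with no errors. The thresholds and function names below are hypothetical, not Testaify's or Ultimate Software's actual criteria:

```python
# Hypothetical pass/fail gate for one performance run: p95 response time
# under a limit (here 1.0 s) and zero errors. Thresholds are illustrative.
from statistics import quantiles

def gate_passes(response_times_s, error_count, p95_limit_s=1.0):
    """Return True if the run meets the response-time and error criteria."""
    if error_count > 0:
        return False
    p95 = quantiles(response_times_s, n=100)[94]  # 95th percentile
    return p95 <= p95_limit_s
```

A run with mostly fast responses and no errors passes (`gate_passes([0.2] * 95 + [0.9] * 5, error_count=0)` is `True`), while a single error or a slow tail fails the gate. Using a percentile rather than an average keeps a few very slow requests from hiding behind many fast ones.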
Now, we are getting into the details. Notice I did not mention “peak load.” I always find it hilarious when companies claim to know the performance requirements of their product. I have never seen a product team provide that information in my career. How do you know what the “peak load” is? Instead, we find the actual load the system can support by defining response-time and reliability criteria that the system must meet.
Let’s keep getting deeper into the details. The first key question is: What is an acceptable response time? The second question is: How do you test for it?
Most people start answering the first question from the same place: a 1968 study conducted as part of human-computer interaction research. Its results have become the following advice:
The basic advice regarding response times has been about the same for thirty years [Miller 1968; Card et al. 1991]:
- 0.1 second is about the limit for having the user feel that the system is reacting instantaneously, meaning that no special feedback is necessary except to display the result.
- 1.0 second is about the limit for the user's flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 but less than 1.0 second, but the user does lose the feeling of operating directly on the data.
- 10 seconds is about the limit for keeping the user's attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.
Here is the link if you want to read about it: https://www.nngroup.com/articles/response-times-3-important-limits
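The three limits above translate directly into a simple classification of any measured response time. A sketch, with the thresholds taken from the Miller/Nielsen advice (the function and label names are mine):

```python
def perceived_responsiveness(seconds: float) -> str:
    """Classify a response time against the classic Miller/Nielsen limits."""
    if seconds <= 0.1:
        return "instantaneous"      # no special feedback necessary
    if seconds <= 1.0:
        return "noticeable delay"   # flow of thought stays uninterrupted
    if seconds <= 10.0:
        return "attention at risk"  # show feedback; user focus is strained
    return "attention lost"         # user will switch to other tasks
```

For example, `perceived_responsiveness(0.5)` returns `"noticeable delay"`: the user sees the lag but keeps their train of thought.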
Some have suggested that the third limit has changed over time as computers have become omnipresent in our lives. In the early days of the internet, the 8-second rule became famous. Today, some organizations use an upper limit of 5 seconds instead. Others ignore the upper limit and focus only on the first two lower limits.
When we started, we used the Miller results. We kept the 10-second upper limit. Eventually, I replaced it with a 5-second limit. You can test with a single user and check that response times stay within the upper limit. Still, if your system can only handle one user within that limit, it will fail in the marketplace.
That means you must define a load test and, more importantly, determine how many concurrent users you can support. But as the saying goes, the devil is still in the details. What are concurrent users?
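In practice, finding that number means stepping the load up until the response-time criterion fails. A hypothetical sketch, where `measure_p95` stands in for an actual test run at a given user count and the step, ceiling, and 5-second limit are illustrative:

```python
def max_supported_users(measure_p95, limit_s=5.0, step=50, ceiling=10_000):
    """Increase concurrent users in steps until the p95 response time
    exceeds the limit; return the last user count that still met it
    (0 if even the first step fails)."""
    supported = 0
    for users in range(step, ceiling + 1, step):
        if measure_p95(users) > limit_s:
            break  # criterion violated; previous step was the maximum
        supported = users
    return supported

# Toy model: p95 grows linearly with load (real systems are messier).
print(max_supported_users(lambda u: u / 100))  # prints 500 with the 5.0 s limit
```

Real tools refine this with binary search or finer steps near the breaking point, but the principle is the same: the supported load is discovered from the response-time criterion, not assumed up front.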
We will define simultaneous, concurrent, and active users in an upcoming post. We will also talk about how Testaify sees performance testing. Spoiler alert: It heavily depends on our work experience at Ultimate Software.
Special note for those who enjoy our content: Please feel free to link to any of our blog posts if you want to refer to any in your materials.
Are you interested in the other blogs in this series?
- The Heartbreaking Truth About Functional Testing (AI to the Rescue - Part 1)
- Have you said, “AI won’t help me as much as I thought?” (AI to the Rescue - Part 2)
About the Author
Testaify founder and COO Rafael E. Santos is a Stevie Award winner whose decades-long career includes strategic technology and product leadership roles. Rafael's goal for Testaify is to deliver Continuous Comprehensive Testing through Testaify's AI-first platform, which will change testing forever. Before Testaify, Rafael held executive positions at organizations like Ultimate Software and Trimble eBuilder.
Take the Next Step
Join the waitlist to be among the first to know when you can bring Testaify Functional for Web into your testing process.