Load testing isn’t new, but it has traditionally been the domain of performance specialists. Now that DevOps is driving organizations to “shift left” performance testing, functional testers are racing to get up to speed. Fortunately, load testing isn’t as complex as it may seem—in fact, you’ve probably already been doing it for years.
Watch as Tricentis Flood’s Tim Koopmans, Director of Load Testing, gives a crash course in the evolution of load testing, including how it entered the realm of the functional tester, the key concepts you need to know today, and what to expect in the future.
Here’s the full transcript:
Good day. My name is Tim, and I will be talking to you today about the evolution of load testing: what testers need to know. Actually I’m going to change the title because I think load testing is something that everyone should know and Marketing isn’t here, so we can change the title.
You may have noticed by my accent that I am Australian. Australia is a long, long way away from everything. You may be wondering what a guy from Australia actually knows about performance. I would say performance is near and dear to my heart because of this physical distance: 15,757 kilometers, 23 hours on the plane, or thereabouts. If you think about performance in the context of an HTTP packet that needs to travel at the speed of light, in theory, the quickest that packet is going to travel between those two countries is 105 milliseconds, round trip.
And, you know, that still probably doesn’t mean that much to you. If you think about Australian internet, it kind of looks like this. There’s a lot of 26 pair cable, copper cable lying in pits, which means it gets wet. On a rainy day back at home, my complicated exchange up the road is capable of circa 320 milliseconds round-trip time. That may not sound like a lot because we’re talking about a third of a second, but just say I was in Australia and I needed to look at the ÖBB train timetable. The front page has 55 requests and would take 20 seconds to load. That is a long time in internet time just to look at the front page, and I haven’t even got to the timetable yet.
And, you could say “who cares?”, because the people running this train timetable online don’t really care about customers in Australia. But you can’t get away with that anymore because, the reality is, everyone’s services and electronic goods are distributed online. You can’t get away with a 20 second response time in the modern age. It’s just not acceptable. This problem could be easily solved by getting rid of the cat images on the internet. Or you could put them on a CDN, for example, and it would go a lot quicker.
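The arithmetic behind these figures can be checked in a few lines. This is a minimal sketch, assuming the speed of light in a vacuum and assuming the 55 requests are fetched strictly one after another (real browsers fetch in parallel and also spend time transferring data, so this is only a rough estimate). The function names are illustrative.

```python
# Back-of-the-envelope latency figures from the talk.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # in a vacuum; signals in fibre or copper are slower


def round_trip_ms(distance_km: float) -> float:
    """Theoretical minimum round-trip time at the speed of light, in milliseconds."""
    return 2 * distance_km / SPEED_OF_LIGHT_KM_PER_S * 1000


def sequential_load_s(requests: int, rtt_ms: float) -> float:
    """Latency cost in seconds if each request waits one full round trip in turn."""
    return requests * rtt_ms / 1000


best_case = round_trip_ms(15_757)        # about 105 ms
rainy_day = sequential_load_s(55, 320)   # 17.6 s of pure round-trip latency
print(f"{best_case:.0f} ms, {rainy_day:.1f} s")
```

Under these assumptions, round-trip latency alone accounts for most of the 20 seconds quoted for the front page.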
So, performance is near and dear to my heart, and today I want to talk to you about where load testing has a place in terms of performance.
First, load testing as a definition: load testing is simply putting demand on a system and then measuring its performance.
Pretty straightforward, right? So, who here in the room (if you could just put your hands up) has done either vocationally or professionally, some form of load testing in their life? Oh, that’s pretty good, more than I thought. Okay and keep your hands up if you’ve done it continuously. Okay, alright.
I would say everyone in this room is familiar with the concepts of load testing, and it started pretty early on in life – say, from conception. For the first 40 weeks as you were carried around in your mother’s womb, you put all sorts of demand on her system. And that was a pretty big load test. And the success of that load test is measured by your sheer presence here today.
So, you know that load testing has continued on into middle age. For some of us, you might witness some form of performance degradation over time, whether it’s a bit of extra weight, or maybe you’re a bit slow at chasing the kids. And ultimately, everyone in this room is part of the extended stress-to-break scenario where at the end of our lives, the observer, which isn’t going to be us, is going to witness which of our components was most likely to fail in this scenario. So, intuitively we all understand how to do load testing, and what kind of activities load testing should encompass.
In the software testing context it comes down to this: why do we need to do load testing? Once again this is pretty intuitive. If I was to book some tickets to the opera in Vienna, and I got this 503 screen, I would know it was broken. This is the worst-case performance scenario, where the system is just no longer available and I can’t get to the information that I need. We know intuitively when things are broken in terms of software testing. Likewise, if I was going to the ticketing counter at the airport and it was suffering a major network failure like it did recently with all the airline ticketing systems, once again, this is just real-life performance. It’s real-life load testing. You’re experiencing queuing theory in practice.
Then you get to your hotel room and you want to watch “House of Cards”, but you can’t because the WiFi is buffering or there’s not enough capacity on the mobile network. Once again, you have a pretty good feel for when performance sucks, right?
These days, performance is everyone’s responsibility.
We know how to identify poor performance in software. And it really impacts us all throughout the DevOps life cycle. It starts as early as this guy over on the left who’s on the Marketing team, adding a new Ad-Tracker to the website which introduces some additional latency. It extends through to the girl who is working on the Dev Team, and the new version of the project they’re working on doesn’t perform as well as the last one. It certainly impacts the person that is waking up at 3 a.m. from a production monitor or alert, and has to go in to work and down servers. Of course, it also affects any consumers of your system: people that are using your data, or your electronic goods, or your services. So, performance has, whether we like it or not, become everyone’s responsibility in this life cycle.
So, when do you need to do load testing? Hands up if you think load testing should shift left. Hmm, not as many, okay. What’s that, like 10%? And hands up if you think it should shift right. Like, no one? One, okay. Hands up if you have no idea what I’m talking about when I say shift left or shift right.
I’m not going to try and explain it as a buzzword, except that you’re going to see it all throughout this conference and the expectation is that you know what it means. But you could say that “shifting left” is doing something earlier in the lifecycle, before things become a bigger problem. It involves a lot more feedback and tighter integration loops. Likewise, load testing also has a place in production, which is “shifting right”, because that may be the only place where you’ve got an environment of that size.
Let’s look at an example.
Over on the right of the DevOps cycle, we will do load testing in production because we might be responding to, say, a queue depth that has gone to 50,000 deep, which is bad in performance terms. And we might need to simulate that same scenario in production because that’s the only environment that we’ve got to do it at that scale.
On the other hand, we may be doing it very early in the cycle (which is shifting left), when we start to think about how we’re going to mitigate that risk to production, and we start talking about what the code is going to look like before we’ve written one line of code. That is a form of load testing as well: some form of static analysis or just load testing on the back of an envelope. It can flush out a lot of non-functional requirements and get you talking about the capacity that you’re going to need.
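One way to do this kind of back-of-the-envelope capacity estimate is Little's Law, which relates concurrency, throughput and time per user cycle. The talk does not name a specific method, so this is a minimal sketch under that assumption; the target numbers are hypothetical.

```python
import math


def required_concurrency(throughput_per_s: float, response_time_s: float,
                         think_time_s: float = 0.0) -> int:
    """Little's Law: users in the system = throughput x time each user spends per cycle."""
    return math.ceil(throughput_per_s * (response_time_s + think_time_s))


# Hypothetical requirement: 200 transactions per second, 0.5 s responses,
# and users who pause 9.5 s between actions.
users_needed = required_concurrency(200, 0.5, 9.5)
print(users_needed)  # 2000
```

An estimate like this, made before any code exists, gives an early figure for how many concurrent users a later load test would need to simulate.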
Well then we’re back over on the right again, where load testing is a part of our continuous integration and continuous deployment pipeline, so that we can trigger load tests on a specific build that goes to production. Or maybe it’s back over here on the left again, where we’re linking performance to a particular source code repository so we can identify performance regressions over time.
What I’ve hopefully just demonstrated is that load testing is fractal in nature. It doesn’t just need to sit in one direction or shift in one direction. You really should be thinking about load testing continuously throughout that whole pipeline.
I’ve spoken about why you need to do load testing and where it should live, but you probably want to walk away today with a feeling for what you need to be doing in load testing. There’s really only three things that you need to do. You might be forgiven for thinking that load testing is pretty complex, but it doesn’t need to be, and certainly the goal of Flood and also integration with Tricentis is to make load testing easier and more accessible for more people.
The three things that you really need to do to succeed at load testing:
Obviously you need to be able to create load test scripts, and today we heard Sandeep talk about saying no to the scripts. The reality is however, today we are still producing load test scripts. I’ll take you through a little bit of the evolution there.
You also need to be able to execute those load test cases or scripts, and preferably, you need to be able to do that at scale, so you can’t just get away with just running them on one laptop, for example.
Lastly, you need to be able to analyze those load tests in real time because there’s no point in running a load test and then coming back later and having to look at the results, aside from a historical viewpoint.
So back in the day, ten or fifteen years ago, it was perfectly acceptable to just bang on a URL, which we call “URL bashing”, and just hit it over and over again, and call it your load test. You could get away with that ten or fifteen years ago, because websites were a lot simpler. You might get some early results like knowing that you’ve run out of file handles on your operating system, for example. But these days you’re not going to get far with this approach of just hitting one URL. Web servers, unsurprisingly, do a good job of this. They’re designed to serve you up responses as quickly as possible from memory, so just hitting it over and over like this is of little value.
If you look at a more complicated or typical web application today (for example, I’ve got the SAP Fiori demo here), even a simple action gets complicated. You might say, “I just want to navigate to this page and then click on the Inbox”. Scroll down and click on this Inbox. That actually generates something like 120 requests, over 120 HTTP requests just for that one business action.
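For illustration, URL bashing amounts to little more than the loop below. This is a minimal sketch rather than any particular tool: the `fetch` callable is a stand-in so the example runs without a network, and the URL in the comment is hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def bash_url(fetch: Callable[[], object], hits: int, workers: int = 4) -> List[float]:
    """Call `fetch` `hits` times across `workers` threads; return each latency in seconds."""
    def timed_hit(_: int) -> float:
        start = time.perf_counter()
        fetch()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(timed_hit, range(hits)))


# Stand-in for a real request such as
# urllib.request.urlopen("http://localhost:8000/").read()  (hypothetical URL)
def fake_fetch() -> None:
    time.sleep(0.001)


latencies = bash_url(fake_fetch, hits=20)
print(f"{len(latencies)} hits, slowest {max(latencies) * 1000:.1f} ms")
```

The sketch hits the same resource repeatedly and nothing else, which is exactly why it says little about a modern application where one business action fans out into many different requests.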
This is the point that Sandeep is going after in his presentation, when he talks about saying “no to scripts”. In order to script at the protocol level, not only do you have to think about the user actions that you want to simulate, but you also have to start thinking about how the browser is interacting with the server. And to do that, you have to go into a kind of Matrix mode and start thinking about “What should I be doing in this particular case? And what are all the other functions that the browser is doing for me?”. Cookie management, authentications, security, dynamic script policy and execution – you’re going to have to build all of that stuff into your workload model in order to be able to successfully generate a realistic load.
Now some people and commercial tool vendors will tell you that you can just record and play back this traffic. So why don’t you just get your browser, stick a proxy in the middle, record all the traffic going through, and say you’re done? Unfortunately, this doesn’t work. If you just took one user and recorded that traffic, and took the same user and recorded the same traffic again, and you looked at all the differences in the request payloads, in this particular example with the SAP Fiori demo there’s like 72 changes that you now need to account for. And so that’s 72 changes that weigh you down and keep you awake at night, and the next time you release application code, you have to do it all over again because it breaks. It’s going to get harder and harder to actually keep on top of this at the protocol level.
A customer told me that load testing is a pain to script after every code drop. There are lots of changes, and the best way forward was to rescript after every release, which kind of sucks. But, on the other hand, if you are talking about Tosca here, browser testing is not a pain. So the question is, why aren’t we doing more stuff with the browsers? Conceptually, we could trade in complexity for concurrency, and we could start doing stuff like this. Simple steps that describe what we’re trying to do in the browser. Visit the page and click on a link. This is pretty familiar to anyone that’s had to do automation. And it’s definitely on the path to what Sandeep was talking about in terms of codeless design. We want to take this further in 2018 and borrow or leverage all that functionality that’s built into Tosca so that we don’t have to keep rewriting these scripts.
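The record-twice-and-compare exercise can be sketched as a simple comparison of two captured payloads. The field names and values below are hypothetical and far smaller than a real SAP Fiori recording; the point is only to show how the values that change between runs are found.

```python
from typing import Dict, List


def changed_fields(first: Dict[str, str], second: Dict[str, str]) -> List[str]:
    """Names of parameters whose values differ between two recordings."""
    keys = sorted(set(first) | set(second))
    return [k for k in keys if first.get(k) != second.get(k)]


# Two hypothetical recordings of the same user performing the same action
run_1 = {"user": "demo", "action": "inbox", "csrf_token": "a1f3", "session_id": "9001"}
run_2 = {"user": "demo", "action": "inbox", "csrf_token": "7c2e", "session_id": "9002"}

dynamic = changed_fields(run_1, run_2)
print(dynamic)  # ['csrf_token', 'session_id']
```

Every field in that list is a value a protocol-level script has to capture from a server response and substitute at run time, and the list can change with each release.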
If you asked me in 2011 to launch 50,000 browsers with Selenium, I’d say one or two browsers per node. I would’ve felt sorry for the penguins on the polar ice caps because we would’ve blasted another big hole in the ozone layer with all the servers that we needed to start. We’d need like 25,000 or 50,000 servers. But the difference today is that in 2017 we can leverage cloud technology to make it a lot easier to launch the infrastructure and still care about the penguins. So, the real advantage that we have today is that there are projects like Google Chrome and the headless Chromium project, which make this automation a lot easier, and the concurrency is a lot better. So when we’re comparing Tricentis Flood to our competitors, our competitors are still messing about with, let’s say, 2 to 5 browsers per node, and we’re playing around with up to 50 browsers per node. Suddenly generating a load of 50,000 browsers is a lot more achievable with, say, 1000 load generators.
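The server counts quoted here come down to a single division: target browsers divided by the number of browsers each load generator can run. A minimal sketch with the figures from the talk (the function name is illustrative):

```python
def load_generators_needed(browsers: int, browsers_per_generator: int) -> int:
    """Smallest number of load generators that can host the target browser count."""
    return -(-browsers // browsers_per_generator)  # integer ceiling division


for density in (1, 2, 5, 50):
    print(density, load_generators_needed(50_000, density))
# 1 50000
# 2 25000
# 5 10000
# 50 1000
```

Going from 1 or 2 browsers per generator to 50 is what turns 25,000 to 50,000 servers into roughly 1,000.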
That’s a pretty long track on just load test creation.
Ultimately, at the end of the day, it becomes the tester’s responsibility to choose the right tool for the job.
It doesn’t always make sense to test at the protocol level, and it doesn’t always make sense to test at the browser level, and in some ways, you’re really just trading complexity and concurrency between the two. At the protocol level you’re going to get much higher bang for buck in terms of concurrency per node, but at the browser level you have a whole lot less complexity and script maintenance to carry between releases. So it’s really about using a bit of both, and we see a lot of customers doing that as well. A kind of hybrid approach where they target an HTTP API, and they’ll also supplement with browser load.
I’m not going to talk too much about execution. I just want to say, or brag, that at Flood we consider ourselves the masters of this space. We’ve been doing cloud-based execution for over five years now. We’ve learned all the lessons; we’ve run up the giant bills with Amazon when we forgot to switch off infrastructure because our code didn’t work and likewise, we were on pretty good talking terms with Amazon. But anyway, we’re pretty good at doing cloud-based load testing these days. And the real benefit of doing cloud-based load testing is that you get the economy of scale. So, launching 1000 nodes is trivial, and you don’t have the expensive lead time in terms of provisioning test infrastructure.
You can get to the point in your test where you say, “I’ve just used 500 nodes, I now need another 500 nodes because we want to push the system a bit further.” There’s no expensive licensing attached to this, and it just lets you proceed on. So, it’s really just about paying for what you use, and that is really popular with our customers.
I want to quickly talk about analysis and what you should take away, what you should look for. This guy, Edward Tufte, is a famous data scientist. He said that “Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time, with the least ink in the smallest space”. Love this guy.
Here’s a good example of what he’s talking about [Map of Vienna’s Underground Metro System]. Complex information, directional, describes the intersection of stations and how they intersect with different train lines, which gives a rough idea of distance. And not only that, really good use of white space and nice primary colors. And I can look at this train up here, or I can go to Tokyo, Melbourne, or London, and I’ll pretty much be familiar with this complex route system no matter where I go. This is a really good example of displaying complex information. Here’s another really good example from our product Flood, where we’re displaying complex information. We’re talking about three things: concurrency, which is the orange line, is the number of active users doing something; the green line is transaction rate, or how fast those users are doing something; and the blue line is of course response time, or how long it’s taking for those transactions to complete.
Why do we look at those lines in a load test? Well, ultimately, we’re trying to find bottlenecks. There can be other objectives, but this is the most common thing. I’ll tell you a bit of the story. Actually if I’ve got the time, I’ll make you all load test experts in the next two slides, I promise. So, in this slide, I’ve got orange stepping up: concurrency, applying increasing concurrent load until it gets to the top and just holds a steady state. I see that the green line is also tracking behind it, which makes sense, right? The more users I have on the system, the more throughput I expect those users to generate, in terms of doing more transactions per second. And if you look at the blue line, that’s the real tell here saying the system is pretty good, because despite the increase in concurrent users, and despite the increase in throughput, the system under test can respond in the same amount of time. It’s nice and flat. Did everyone get that?
Alright, here’s the test then. Is this a bottleneck? [Graph of all lines increasing] Hands up. Give you a bit more time. Concurrency is the orange line, the green line is throughput, and the blue line is response time … Is this a bottleneck?
This is a classic bottleneck because the green line is increasing over time with the concurrent load, but what we see at the top is that it starts to flatten off. And it’s flattening off because the server can no longer serve any requests any faster despite an increased number of users arriving at the server. And the real tell, once again, is that the response time, which is the blue line, at this point a third of the way along the chart, say around here, is starting to kick up and degrade over time. So this is a bottleneck.
What about this one? Similar to the first chart. Bottleneck? Hands up if you think it’s a bottleneck. This is another type of bottleneck. We have a nice sine wave oscillation between the blue line and the green line which indicates, in this particular case, a little pool is filling up and then releasing; filling up, then releasing; and it’s giving this oscillating behavior. So this is definitely a bottleneck.
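The same reading of the three lines can be written down as a rule: a step is suspicious when concurrency goes up, throughput stays roughly flat, and response time climbs. This is a minimal sketch of that rule, not how Flood analyzes results; the 5% tolerance and the sample data are assumptions chosen for illustration.

```python
from typing import List, Optional


def find_bottleneck(users: List[int], throughput: List[float],
                    response_ms: List[float], tolerance: float = 0.05) -> Optional[int]:
    """Index of the first step where load rose, throughput flattened and
    response time rose; None if no such step exists."""
    for i in range(1, len(users)):
        more_users = users[i] > users[i - 1]
        flat_throughput = throughput[i] <= throughput[i - 1] * (1 + tolerance)
        slower = response_ms[i] > response_ms[i - 1] * (1 + tolerance)
        if more_users and flat_throughput and slower:
            return i
    return None


# Hypothetical samples, one per load step
step_users = [100, 200, 300, 400, 500]       # orange line: concurrency
step_tps = [50, 100, 148, 150, 151]          # green line: transactions per second
step_response = [200, 205, 210, 320, 480]    # blue line: response time in ms

print(find_bottleneck(step_users, step_tps, step_response))  # 3
```

With a healthy system, where throughput keeps rising with users and response time stays flat, the function returns None.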
This is what we’re trying to do with Flood. We’re trying to make load testing easy, so we’re going to be focusing on features that help you with load test creation, distributed execution at scale, and also analysis features so you can pick out more bottlenecks.
Load testing is really everyone’s responsibility, and that’s the main thing that I want you to leave today with: the feeling that you are responsible for load testing and, realistically, that the more load testing you do, the less time you’re going to spend tracking down production defects.
So what that really means is captured in another quote from Edward Tufte. He said, “I have stared long enough at the glowing flat rectangles of computer screens.” He said, you know, let’s go for a walk in the park, plant a plant, walk the dog, go to the opera, or read a book or watch Netflix.