Lego Load Testing

Exploratory Testing

Continuous Testing Live: Why Load Testing = Exploratory Testing

“It seems crazy to me, not to be aware of performance-related information from the outset and continuously. If you’re not aware of that sort of stuff you’re going to get surprises at the end.”

On this week’s episode of Continuous Testing Live, Tim Koopmans and Michael Bolton sit down to discuss the exploratory nature of load testing, and why a lifetime of lessons learned from first-hand observations is so valuable for testers of all specialities.

You can now subscribe to Continuous Testing Live, Tricentis’ new podcast, on iTunes, Google Play, or SoundCloud!

Tim: In my test consulting career I had some key examples of where chasing arbitrary requirements could lead you to ultimate product failure. I was involved in a project which was taking a mainframe system and splitting it out to a distributed system. It would replicate databases amongst different states in Australia. The requirement for the replication interval started off at six hours and wound its way down to thirty minutes. At the time, this was around 2004, we were chasing pretty ambitious targets for how fast we could replicate a given amount of data around Australia. We must have spent six months trying to tune it, at that time it was Microsoft SQL Server, to get it to replicate as fast as we could with the network latency you have in Australia. Near the end of the project, we were having a lot of trouble trying to hit thirty minutes.

The project manager said to me, “Tim, I think you should go and visit the site to get a better understanding. Have a look at the business and the way it operates.” The business was a picking and packing sort of operation, so it’s in a warehouse and it has a bunch of operators and they’re scanning stuff on a conveyor belt. And within five minutes of watching what happens, I saw parcels coming along a conveyor belt that had just been scanned, and the corresponding system, which was mainframe-based, would spit out a sticker that the packer could put on the box before it fell off the conveyor belt—and my whole world crumbled. I realized we had a fatal flaw in the system. This had to be real-time. Thirty minutes was nowhere near quick enough. It had to be the length of a conveyor belt, and when I went back and told the project manager, basically the project crumbled because the design was so wrong. It’s interesting that you can waste so much time and effort on performance testing, load testing, and still just miss the obvious stuff.

Talk to the business, talk to customers, talk to real users of the system that you’re testing because often they have pretty valuable insight on how things should perform.

Michael: Once again, we get this lesson from sociology. Harry Collins, of whom we are very fond, very early in his career was following guys around who were trying to make a laser. He’s a sociologist of science, so he was watching the progress of this thing, which, in theory, ought to have been possible to make. And the scientists banged their heads trying to create a “Transversely Excited Atmospheric Carbon Dioxide Laser.” Theory said it was supposed to be possible to make one of these things, but they couldn’t do it. They just kept banging their heads against the problem.

And, eventually, one guy cracks it. He manages to build one of these things and get it up and working, and so he publishes the plans for it. So, there are detailed instructions on how to construct the machine, what has to be connected to what, the components you have to use. All these details are in there and still, nobody else could build it, unless and until he went to their lab and hung out with the people there, or somebody from their lab went out and hung out with him, and they could make these kinds of observations. It turns out that the length of a particular cable had to be six inches plus or minus some factor, and everybody else was using different lengths and arranging the stuff in different ways, and it turns out there was a really key timing element. At the speed of light, these things can matter a lot.

Or, maybe it had to do with the capacitance of the wire or something like that, I don’t know the specific detail. But, yeah, the presence, the capacity to observe, and the tacit knowledge, I think this is where he started getting onto his big themes of tacit knowledge and expertise—so much depends on us being in places, being situated in a physical space, but also in communities. And, also in collegial relationships with people, face-to-face relationships with people. That’s where discoveries tend to happen. If you set a performance target for something and you don’t know what the actual job is, or the parameters of what’s actually happening, when your model is not informed by continuous interaction with actual reality, mistakes like that are going to happen.

Tim: Yeah, my advice and my lesson learned on that was always to talk to the business, talk to customers, talk to real users of the system that you’re testing because often they have pretty valuable insight on how things should perform.

Michael: For sure, and if at all possible, watch the work being done. When you were describing this conveyor belt system, one of the most interesting things I ever did in my whole career as a consultant was to get taken to the back of the distribution center where people were in the front writing software for this thing. They said, “Alright, let’s go to the back, and let’s look at some of the problems that we have to deal with.” It was a distribution center for convenience stores. First of all, the thing is impossibly massive, a huge place, and the other thing was that the people who worked in the distribution center, inside it, sort of monitoring the robots and the automated systems, some of them, every now and again, had to do a shift in the freezer, because, of course, a convenience store sells a lot of frozen goods.

So much for all those swell ideas for touchscreens and stuff like that, because these people have gloves on. And then the other thing was the way it worked was a conveyor belt, on which bins made their way all around the place, and these robotic arms would knock boxes off the shelf and into the bin. And the way they had to assess whether the right stuff was in the bin was they would weigh the bin. But, of course, there’s so much stuff moving through this, they had seconds, not even seconds, milliseconds to make a measurement of the bin and its contents and do a calculation. “Well, we had four boxes of Mars Bars, and we had eighteen cans of Coke…” I’m using completely arbitrary examples, “…and three packets of Bic razors. Does it weigh the right amount?” And that was their way of checking to see that the right stuff was in the bin.
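A minimal sketch of that kind of weight check, with made-up item weights and an arbitrary tolerance; none of these numbers come from the actual system:

```python
# Hypothetical weight-based content check; item weights and tolerance are illustrative only.
EXPECTED_WEIGHTS_G = {"mars_bar_box": 510.0, "coke_can": 355.0, "bic_razor_pack": 45.0}

def bin_weight_ok(contents: dict, measured_g: float, tolerance_g: float = 20.0) -> bool:
    """True if the measured bin weight matches the picked items within tolerance."""
    expected = sum(EXPECTED_WEIGHTS_G[item] * qty for item, qty in contents.items())
    return abs(measured_g - expected) <= tolerance_g

# Four boxes of Mars Bars, eighteen cans of Coke, three packs of razors:
print(bin_weight_ok({"mars_bar_box": 4, "coke_can": 18, "bic_razor_pack": 3}, measured_g=8563.0))  # True
```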

Tim: In fact, I think Lego does something like that. Have you ever wondered how they get all the right pieces? It’s all automated, and I think they use weight at a very microscopic level to find whether or not it’s got the right contents in the pack, so a kid isn’t disappointed when a head is missing from their Lego man.

Michael: I went to another place one time, they were making software that identified how much stuff, liquid stuff, was in a tank. One of those tank farms that you see when you drive around, with the big industrial chemicals, and oil, and stuff like that. The way they would measure it was with a light emitter. They’d bounce the light off the surface of the stuff they were measuring, and they measured how long that took. So, it’s a speed-of-light kind of thing, things are happening that quickly. So, one of the bugs they ran into is “Some things are foamy.” They weren’t getting an accurate measurement because there was a layer of foam at the top, and when you’ve got a tank that big, that measurement makes an awfully big difference.
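A back-of-the-envelope sketch of that time-of-flight idea, with illustrative numbers rather than anything from the actual product:

```python
# Time-of-flight level measurement; numbers are illustrative only.
C = 299_792_458.0  # speed of light, m/s

def distance_to_surface_m(round_trip_seconds: float) -> float:
    """Distance from the emitter to the liquid surface, from the round-trip time."""
    return C * round_trip_seconds / 2

# A 20 m gap above the liquid is a round trip of roughly 133 nanoseconds;
# half a metre of foam on the surface shifts the reading by only ~3 ns.
print(distance_to_surface_m(133e-9))  # ~19.9 m
```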

It’s not “Pass/Fail.” To me, performance testing is more like a risk management activity. You can never get to the end of a test effort and say, “We’re done,” because you only know as much as you know, and what you’ve observed during testing.

Tim: When people ask me how to be successful at load testing, or how to guarantee performance, I always tell them that’s a massive trap to get yourself into. And I think that attitude maybe comes from people arriving at performance testing with an automation background, heavily into seeing “Red/Green” kinds of results. I always say to just forget that. It’s not “Pass/Fail.” To me, performance testing is more like a risk management activity. You can never get to the end of a test effort and say, “We’re done,” because you only know as much as you know, and what you’ve observed during testing.

What you can provide is a risk assessment. You can say, “I’ve identified these types of risks to production based on impact to users, or the likelihood of occurring. We’ve tested some mitigation around that.” Maybe it’s a control, maybe we’re out to reduce the risk, maybe we’re out to remove it altogether, which is always a good story.

But sometimes the risk is still there because it’s defined by other things. Maybe you didn’t have production infrastructure to test on in the first place, so you’re only guessing what the performance is going to be like. So, I try and encourage that attitude of moving away from a hard gate that you’ve got to pass through for performance. One of the things I liked to suggest, but which never got any play, was decoupling performance testing from release cycles. It used to be, when I was doing consulting, that testing would be tightly coupled with a release management phase.

We don’t want to find out about this stuff late. Performance testing starts at the same time testing does, and testing, the way we look at it, starts at the same time development does.

There’d be a bunch of system testing, integration testing, UAT, and then you’d do performance testing at the end. When you do that, you find risks and then nobody has any time to action the things that you found anyway. So, I used to say, “Why don’t we just decouple it from release, and just test as we have candidates and environments ready to go?” Of course, that never really got play in the environments I was working in, but it makes me think maybe that’s more to do with a continuous form of load testing, in that exploratory vein.

Michael: Well, testing is about learning, fundamentally. I brought this story up in the class that went on this week. A colleague of mine, a friend of mine, I don’t get to talk to him often enough, a fellow named Pete Howton, told me this wonderful thing the last time I saw him. Now, I don’t think he bills himself as a performance tester, but whenever he goes into a new contract, a new situation, he talks to programmers and tries to get them to provide him with APIs with fairly low-level functionality. And, as a tester not focused on performance, he then sets up a little script that calls his function or calls his API and runs it a thousand times, or some improbably large number of times, providing it with either consistent or varying data, and then he logs it.

He keeps doing that. It becomes part of his general process; he keeps it running. And he says whenever he sees a big change in the speed, for some value, five, ten, twenty percent, and it can either be a speed-up or a slow-down, it doesn’t matter, but when you see a big swing in how quickly this function can go through a thousand, ten thousand iterations, he says about half the time there’s a bug there. Always.
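A rough sketch of the kind of timing loop described there; `call_api()`, the iteration count, and the ten-percent threshold are stand-ins, not details from Pete’s actual setup:

```python
# Hypothetical sketch: repeatedly time a low-level call and flag big swings.
import time

def call_api():
    # Placeholder for the low-level function or API call under observation.
    sum(i * i for i in range(10_000))

def timed_run(iterations: int = 1_000) -> float:
    start = time.perf_counter()
    for _ in range(iterations):
        call_api()
    return time.perf_counter() - start

baseline = timed_run()
for run in range(10):  # in practice this would keep running as part of the general process
    elapsed = timed_run()
    change = (elapsed - baseline) / baseline
    if abs(change) > 0.10:  # a big swing in either direction is worth investigating
        print(f"Run {run}: {elapsed:.2f}s vs baseline {baseline:.2f}s ({change:+.0%}) - worth a look")
```

Either direction matters: a surprising speed-up is as worth investigating as a slow-down.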

So, testing is about learning, testing is about awareness. And, I’m not a performance testing specialist by any means, but the people I respect in that field say something that seems really super important to me. We don’t want to find out about this stuff late. Performance testing starts at the same time testing does, and testing, the way we look at it, starts at the same time development does. On a car trip, you don’t say, “We’re going to drive for five hundred miles, and then we’re going to see how fast we went. We’re going to drive for eight hours, and then we’re going to see if we’re there or not. We’ll spend the last hour of our journey looking out the window because we have the driving phase and then we have a looking-out-the-window phase.” Well, in a car you can quickly imagine how separating the driving phase and the looking-out-the-window phase is going to work out. So, similarly, there’s a driving phase and a looking-at-the-dashboard phase.

Tim: Right.

Michael: No, you’re looking at the dashboard, you’re glancing at the dashboard continuously, because the dashboard is going to tell you about stuff that you want to monitor. You’re not staring at it, you’re not fixated on it, but if you see the temperature creeping up, or you’re seeing the same engine RPM but the car is slowing down, you can suspect that it’s a transmission problem. It seems crazy to me, not to be aware of performance-related information from the outset and continuously. If you’re not aware of that sort of stuff you’re going to get surprises at the end.

Tim: I like that analogy. Imagine that car was full of developers and testers, a family of developers and testers, and maybe an ops guy sitting in the back as well. One of the things I like to say these days is, “Performance is everyone’s responsibility,” but as soon as you mention the word “everyone,” that can mean no one is looking at it. But I like to describe it this way: I think performance is intuitive. Everyone, especially in the web-based world, recognizes performance that sucks. If you can’t watch Netflix because it’s buffering, you kind of intuitively know that’s a performance issue. If you’re trying to buy some tickets online and the site’s not available, you know that’s something to do with performance.

Back to your car analogy, maybe there is a person with the primary responsibility who’s meant to look at the dashboard, or has the closest view of the dashboard. But the wife sitting in the right-hand seat or the left-hand seat—depending on which country you’re in—can also tell you about a light on the dashboard, or that you completely missed the on-ramp to the highway, or to just slow down. So, I think everyone has a part to play in it, and that’s one of the things that I try to do in thinking about platforms and tooling: make load testing more accessible.

That’s an overarching goal of Flood: to get rid of that image that it’s just this little niche group of people’s responsibility. Everyone can have a go. You can log into the UAT environment, and if it’s taking sixty seconds in UAT, it’s probably not going to run that much quicker in production. One user could tell you that story. Or, you can use browser tools to look at frontend optimization. People can educate themselves on the tools and things available to discover information about performance.
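Even a one-user check like that can be scripted. A minimal sketch, assuming a placeholder UAT URL and the third-party requests library:

```python
# One user can tell you a lot: time a single request against a test environment.
# The URL is a placeholder, not a real endpoint.
import time
import requests  # pip install requests

url = "https://uat.example.com/login"
start = time.perf_counter()
response = requests.get(url, timeout=120)
elapsed = time.perf_counter() - start

print(f"{url} -> HTTP {response.status_code} in {elapsed:.1f}s")
if elapsed > 60:
    print("Sixty seconds in UAT probably won't get much quicker in production.")
```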

Michael: They certainly can. And, I think the point you’re making is tremendously important when you’re talking about, “When it’s everybody’s responsibility it’s nobody’s responsibility.” Certain Agilists will tell you, “Oh, we want to get rid of roles.” Well, okay, let’s see how that plays out. Let’s not have a “cleaning up the kitchen” role. Cleaning up the kitchen is everybody’s responsibility. So, you get to see what it’s like when cleaning up the kitchen is everybody’s responsibility: there’s an old pile of dirty dishes sitting in the sink.

Tim: Then there’s a nasty sign.

Michael: Yes, then there’s a sign above it that says cleaning up the kitchen is everybody’s responsibility. In economics, that’s the tragedy of the commons. If I clean up my stuff, that’s great, but if somebody else isn’t cleaning up their stuff, why should I clean up my stuff? I think it is important to have the specialty of testing, and specialties within testing as well.

In any group, specialties ease the problem of the mindset shift from one set of focuses to another, because, to some degree, that shift is expensive. It can also be really valuable, but it’s effortful and it’s time-consuming, and our projects have a level of complexity such that nobody could possibly hope to be good at every aspect of them. So, for us, that role, somebody whose responsibility and commitment it is to say, “Hey, let’s not forget about this,” I think that’s really, tremendously important. I don’t make myself out to be an expert in performance testing, because I have a network, I have a community. We were chatting yesterday at lunch about friends that we have in common. When I have an inkling that I might need to know something important, I can go to those people. I have a community of people who are specialists at it, who are highly skilled at it, who can help me. When somebody is asking for my help, I can say, “Cut out the middleman, go talk to some of these people.”

If we don’t have specialties, if we don’t have roles, it seems to me we also lose the sense of commitment to study something deeply. I think that the idea of rejecting roles is a dangerous one.

Tim: I think you need people with that experience base, and along with that experience comes other things, like intuition, or knowing how to reduce a problem to make it easier to work with because they’ve done that before. Sometimes that backfires because you have completely the wrong hunch. But once again, if it’s exploratory in nature and you’ve got pretty tight observation, decision, action-type loops, you can recover from that quickly. I mean, I do that all the time when I’m trying to help customers.

I have a unique position because we’ve helped thousands of customers. I’ve walked into customers with problems, with Flood, with no background, no test plan, just purely what they’re facing at that point in time, so I have to rely on a lot of intuitive guessing. “Oh, they’re testing a CDN, maybe it might be related to this problem.” I can start giving them ideas to go and test, to take it further. So, it’s good to have those skills, and I guess there will always be people that are interested in performance topics, who want to go deep on a particular specialty, because it’s linked to personality, too.

Michael: Absolutely. Temperament, education. Let’s take advantage of the fact that many of us are fascinated by different things, and relish that, rather than trying to reject it, or trying to say we should all be interested in the same stuff. What a boring world that would be. We see the same thing in the overall testing world: “There should be one standard universal language for testing.” Okay, well, “Majority rule, let’s make it Hindi.” I mean, there’s a certain kind of appeal in being able to communicate with each other, but I think we have different languages and we have different specialties, and we can take advantage of that. We don’t have to make it a fortress and we don’t have to make it a prison. As James Bach says, we can make it a villa, where we have the responsibility of taking care of the place and keeping it tidy, and we also have the responsibility to welcome people in.

Tim: Yes, and that’s back to my original point of making things more accessible. So, I think tooling has a part to play there. Sometimes the complexity of the tooling, or how valuable it is, can dictate whether anyone can get started. I would say these days there are more and more ways to get started with load testing. Dare I say it, open source has helped that a lot. It’s just a matter of continuing with that interest and developing it into something that you can actually use on a job, in a testing role, wherever you want to use it. Maybe you’re just trying to, I don’t know, perform better in a game. Who knows? I think load testing has a practical application that’s pretty wide.

Michael: I see things in your map that help to remind me that there are, of course, subcategories of performance-related testing we can do. And, in particular, stress techniques are fascinating to me. I like the way Cem Kaner described this in one of his black box software testing classes. When we ramp up the stress on something, we don’t know where it’s going to break. We don’t know where it’s going to expose a failure. We know it will eventually. Something is going to fall over. What’s hilarious to me, is, every now and then, I see these testers who say something like, “I’m using JMeter and my product is falling over at eight hundred transactions a second. How can I tune JMeter to make sure my product handles a thousand transactions a second?” I’m almost gobsmacked. “Dude, you found a bug.”
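A rough sketch of that ramp-up idea: step up the load until the error rate spikes, and treat the breaking point as information rather than a tooling problem. The target URL, step sizes, and five-percent threshold are placeholders, not anything from JMeter or a real test plan:

```python
# Hypothetical stress ramp: increase concurrency until the system starts failing.
import concurrent.futures
import requests  # pip install requests

TARGET = "https://staging.example.com/checkout"  # placeholder endpoint

def hit(_):
    """One request; count server errors and timeouts as failures."""
    try:
        return requests.get(TARGET, timeout=10).status_code < 500
    except requests.RequestException:
        return False

for concurrency in (50, 100, 200, 400, 800, 1600):
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(hit, range(concurrency * 10)))
    error_rate = 1 - sum(results) / len(results)
    print(f"{concurrency} concurrent users: {error_rate:.1%} errors")
    if error_rate > 0.05:
        print("Found where it falls over. That's a discovery about the product, not the tool.")
        break
```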

Tim: Stress-to-break testing is really interesting because that’s also often a stated goal or an objective of the test effort. But I always try to question stress-to-break and go back to asking, “What’s the workload model?” And, within that workload model, “What’s the actual likelihood of you hitting that?” Because you can make a lot of people worried about the results of a stress-to-break scenario. People are going to say, “Oh, my God, all the application servers are going to fall over.” But how likely is that to happen in production? So you always need to balance what you’re trying to do with different types of testing.

It’s a similar thing where, today, I can break the database server. I can tune that, add more capacity, and tomorrow I’ll break some caching layer or something that backs on to the SAN. I can’t remember who said it, but it’s along the lines of “You never fix bottlenecks in a system, you just move them somewhere else.” That’s what I get bitten by all the time: where do you stop? How far do you go before you know you’re done with a test effort?

Michael: Well, the first thing is, I like that quote, “You never fix bottlenecks, you only move them.” At the same time, you never break the software either, you only find out where it’s broken, where it breaks, or where it falls over. You’re not making it break, you’re discovering where it will break under these kinds of conditions.

The answer to your question, it seems to me, is, once again, that’s not exactly your business. Too often, as testers, we say, “When I try these things the product falls over, therefore we’ve got a bad product.” No. “We tried these things, and it falls over at this point. Dear product managers, are you okay with that?” If only we shifted our focus away from a pass/fail, a green/red, a true/false. If only we did that and said, “Here’s a fact about the program. It’s up to you, dear product manager, to determine whether you’re okay with that.” Then we could relax quite a bit, and we wouldn’t feel pressured to try to alter or distort the test in some way so that it now passes. I just want to know what the product is going to do and tell people about it, and then they get to decide, “Actually, that target was probably too heavy, we can get away with less.” Now that we know where the risks are, where the vulnerabilities are… as you suggested earlier, the product is never going to hit that.

Now at the same time, as a tester, it’s also my responsibility, I think—and as an Australian, you should appreciate this—to point out the possibility of a black swan. We’ve got a little blog site, a news site, that can expect ten thousand transactions a day, or some silly number like that. And then all of a sudden it turns out that we’ve had some marvelous insight on the blog, and now CNN wants to know about it, and it’s a big story on CNN, and now we’re getting orders of magnitude more interaction than we anticipated. Well, it’s the tester’s job to remind people of that possibility, to be heard, and once you’re heard, sit back and let the project managers manage the project.

Tim: And if you don’t do that, that’s how you end up on PerfBytes’ “News of the Damned.” Like a Black Friday sale, or a very popular concert: if you know there’s a good chance that you’re not going to be able to forecast or predict the demand, you need to test that. And not only test whether you can hit whatever arbitrary goals you’re trying to hit, but what do you do when capacity is reached? What are the actions then? What happens if the region goes down and you start talking failover? How do you actually recover, say you do crash? How do you bring the services back online under load? There are really interesting sorts of performance scenarios that I don’t often see in the traditional test plan. I think they probably fit more into the realm of someone who’s got experience in load testing and knows how to take a different path and look at these different scenarios.

Michael: Backups are happening reliably every single day. Has anyone tested the restore? The number of times we get into that kind of trouble. I need to come up with an aphorism for that, something along the lines of “You’re an idiot if you don’t back up, and you’re an even bigger idiot if you back up and you don’t test your capacity to restore.”

Tim: I see a lot of organizations call these “Game-Day Scenarios,” or “Release the Kraken.” It’s a really good part of load testing to get into as an organization.

Michael: Well, it’s fun, too, to sit around the table. We had a game that we used to play, something along the lines of “and then…” There’s a little thing, it turns out that we drop some packets, every ten-thousandth packet gets lost or dropped or something, and then what happens? Well, it turns out that there’s a single bad transaction somewhere in the database. And then what happens? Well, we go in to try to fix that transaction. And then what happens? Well, we find out that we can’t shut down stuff long enough to fix this transaction. And those sorts of scenarios. There are lots of stories like that.

My favorite is the Royal Bank in Canada in 2004. It was not really testing in production, it was testing that didn’t happen at all. They made a small change to the way payments were handled, and about three hours after it had been in production, they realized there was a bug. They said, “Oh no, no problem, we can just roll it back.” Well, they could roll it back really quickly, but they had to restore the prior state of the system, and that’s the thing that they hadn’t really thought about. They had to stop pre-authorized payments for two weeks while that was going on.

So, it was a one-line change, a three-hour problem window, or three hours between deploying it and discovering it, which doesn’t seem like very much. But by the time people aren’t getting paid, the bank’s customers aren’t getting paid and are missing their rent checks and all that stuff, the cascade of failures like that is massive.

Tim: Yeah, I mean, the post-mortem reports which come out of those are more frequently made public these days, I’ve noticed. A lot of big sites will be quite transparent about what happened. It’s really interesting to learn from as a load tester or a performance tester. And likewise, if you’re in load testing, it’s also a good idea to just sit in with the ops team, or go do production support, or whatever it’s called in the organization. Because the real benders are the ones that come from real life and not from a canned scenario.

Michael: Another Weinberg aphorism is, “It’s really important to learn from your mistakes. It’s arguably much more important to learn from somebody else’s mistakes.”

Tim: People are pretty forgiving. I’ve found that if you’re just honest about your mistakes as a load tester—don’t try to cover anything up; share what you’ve learned from it. Because let’s face it, no one can predict the future. You just need to be completely straight down the line, open about the information you’ve found, and you will succeed.