How do you implement continuous testing in a highly complex and regulated environment like the medical device industry? Manish Mathuria from Infostretch shares their experience implementing test automation for Varian, one of the world’s leading oncology medical device companies.
Here’s the full transcript:
Manish Mathuria: This is a unique case study in many respects, and I hope to cover how this was unique and how we achieved what Varian wanted to achieve through a rather complicated implementation.
That’s my picture (when I used to look better), and I have my email address just in case anybody wants to get in touch with me.
So a little bit about Infostretch, we do DevOps and quality engineering, particularly in the digital environment, so Varian in one way is a good case study when they were trying to implement a lot of their core business processes in systems like ServiceMax and Salesforce.
We’re over 900 people worldwide, headquartered in Silicon Valley, let’s see … and I guess a lot of good stuff.
And this is about Varian, I hope you know them. They are the world’s largest cancer and oncology medical device company and software company, and they make machines to treat patients for cancer. So they do chemotherapy treatments using their equipment and stuff like that. Pretty mission-critical stuff – this equipment needs to be serviced in time or people lose lives, and what we tested had direct implication on that. They also get regulated on some of these business processes, and that’s why it was very critical.
So I’ll be walking through the testing landscape and challenges, including what kind of approaches we used. I’ll also walk you through solution architecture. This was all entirely done on Tosca, and I’ll touch upon some key points on how we leveraged Tosca and how we technically implemented the solution, and what value we were able to derive from that. So let’s get on it.
Speaking from the system’s perspective, this is what Varian has implemented. This plethora of systems implement end-to-end processes across various business functions, particularly sales, service, and certain regulatory aspects of how they track and monitor their business processes. On the next slide I’ll walk through some of these systems, but this is pretty interesting in the way that, by using certain single sign-on solutions, they actually created one application. So it’s like one container of an application which basically pulls in the right app over the web, depending on where you are in the business process. If you are trying to deploy a service engineer to the field, this would span all the way from doing something in Salesforce, then going into ServiceMax, then releasing some equipment in SAP, going back to ServiceMax, and completing that business process. I’ll talk about that business process in a bit.
It was unique in quite a few different ways, in the sense that just keeping the session across these multiple systems was not just difficult for us to test, but it was difficult for people to implement, and we could see that in the way it was implemented. Often when things moved from one system to the other system, things broke down. It was important for us to test that things moved, I mean that the session moved seamlessly from one system across to the other system.
They use this Dell Boomi integration technology because they had to integrate day-to-day business processes from behind their firewall to the cloud, and often there was a lot of complexity in that integration. It had to bring data back and forth between the cloud and behind the firewall in SAP and other systems.
These are some of the business processes. Those in the blue are the ones we actually tested. So as you can see, the goal of this “unity system”, as they call it, is to cater to sales, marketing, a lot of service management, a lot of management of how they interact with their community and the customers, and certain regulatory aspects. I’m not an expert on the regulatory aspects, but that was certainly very critical to what we tested.
Like I said, a lot of these processes are kind of integral ones in the sense that they start at one process and they end at another process – that’s how complex it is. Please stop me if you have any questions at any point.
Audience Member: What kind of platform did you test on this? Was it more like an embedded system, or simply applications on the hardware ending, or is it only applications…
Manish Mathuria: This is all business systems, so we tested workflows that spanned through the systems that I showed you earlier, so business flows of service and sales that go across SAP, ServiceMax, regional service provisioning and service management, cloud software, right? And Salesforce, which is obviously a sales CRM in the cloud. These are all business systems. We are not on the medical device side; this is all on the business process end. But there were still pretty significant regulatory requirements in these processes as well.
Audience Member: I just wanted to see if it was computer systems only, because…
Manish Mathuria: Sure, and a good question.
Some of the testing challenges we had were definitely that each of these systems is a mammoth implementation in itself, and when you actually integrate processes across them, it becomes even more complicated because each of these systems has an independent release cycle. Obviously the companies who built them release them on their own individual release cycles, and all of this needs to be stitched together and put into production in a coordinated way. That by itself adds a lot of complexity.
Varian, through a lot of transformation had, by the time we started with them, started following an agile life cycle, which means they had an ambition to release these workflows every two weeks to production, and they were actually doing it. That’s why I guess fundamentally testing this through automation became even more paramount for them – because they realized that what they wanted to do couldn’t be done manually. As you will see there are over 400 or so critical business processes that they were implementing through this, and there is no way in the world you are going to be able to test this manually every two weeks.
So definitely, that agility brings a lot of challenges, and there’s a lot of customization. These systems, when it comes to this regulated world of what Varian was trying to do, particularly these medical devices or medical machinery that they are servicing in the field, have a lot of unique business cases, and none of these things are out of the box, from ServiceMax or SAP or anything like that. There is a lot of customization, and even the process of building that customization and deploying it to the cloud was pretty unique and challenging.
Like I said, all of these combined necessitate the importance of doing automation and testing in a time-sensitive way.
I already talked about complex and long test scenarios, I’ll have an example on the next slide. There was data conditioning and data dependent transactions, and each of these transactions had very unique data-related problems, so when you create a work order for a machine to be serviced in the field, depending on what equipment you are servicing, the work order can be extremely different from one to another. More often these data conditions were not something we could save in the system and keep forever, not just because these systems were getting updated, but because these were very unique situations. So we had to take that into account.
The release process by itself was very complicated because the way it happens in cloud is this: you do all the development, all the customization on the code that you are doing. You do it on premise of course because your developers are sitting on premise, and then you push it to the cloud to be tested in multiple environments. So we actually created a pretty complex CI system for them, because it is one thing to build and deploy this software on premise, but when it comes to these multiple cloud environments and you’re promoting software through different stages in the cloud environment and testing it, it is not a trivial CI exercise. It has very unique challenges. So we actually built a continuous integration system on Jenkins to build and deploy into Salesforce and to ServiceMax and so on and so forth.
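Editor's sketch: the stage-by-stage promotion described above can be pictured as a gated loop, where a build only moves to the next cloud environment once it has deployed and passed its tests in the current one. This is a minimal illustration and not the Jenkins setup from the talk. The `deploy` and `run_tests` callables and the stage names are hypothetical stand-ins for whatever jobs did the real work.

```python
from typing import Callable, List, Tuple


def promote(stages: List[str],
            deploy: Callable[[str], bool],
            run_tests: Callable[[str], bool]) -> Tuple[List[str], str]:
    """Deploy to each stage in order and stop at the first failure.

    Returns (stages that passed, name of the failing stage or "" if none).
    """
    passed: List[str] = []
    for stage in stages:
        # A stage counts as passed only if both deployment and tests succeed.
        if not deploy(stage) or not run_tests(stage):
            return passed, stage
        passed.append(stage)
    return passed, ""


if __name__ == "__main__":
    # Hypothetical environment names, for illustration only.
    stages = ["dev-sandbox", "qa-sandbox", "uat-sandbox"]
    ok, failed = promote(stages,
                         deploy=lambda s: True,
                         run_tests=lambda s: s != "uat-sandbox")
    print("passed:", ok, "failed at:", failed or "none")
```

The point of the gate is that a failure in one environment stops promotion, so later stages never receive a build that has not been verified earlier.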
So obviously the regression testing of software needed to be automated; it was time-sensitive, and it also had really unique regulatory requirements, along with some of these other challenges that I already spoke about, so I’ll move on from here.
This is an example of a Use Case. This actually is a Use Case of creating a service order and deploying a technician to the field, maybe to a hospital, to actually service a particular piece of medical equipment. As you can see, it actually spans multiple systems. Not only does it span multiple systems, it has data that moves from one system to the other system, and they use Boomi for that, and all these integration pieces are done in a batch mode. They are done on a time cycle, so we couldn’t have our test case start in one system then wait until the data had moved, so we had to figure out how to resolve that particular problem.
As you can see, this is a very long scenario. Done manually, it would take about 25 minutes with all the data conditioning and data preparation, just to execute it. So it’s a long scenario, and it has a lot of these manual parts, or the data migration parts that need to be catered to, and there are a lot of validation points. Each of these screens, like creating a work order, easily has 30 to 35 mandatory fields that need to be entered, and there are several other non-mandatory fields. So we didn’t know it when we were going into the project, but one thing we realized is that this was way more complicated than we imagined it to be.
Audience Member: Did you deal with information breach aspects, like encrypted data into the cloud, that kind of thing? Was it a Tosca kind of tool for that?
Manish Mathuria: Yeah, so I think all the security aspects and encryption related aspects were ingrained in the systems that they had built. Our objective here was to do functional testing using Tosca. Our objective was to automate functional tests that they had developed, so the security aspects, encryption aspects, I’m sure they were very important to them, but they were all implemented as part of the implementation, we didn’t have to do much with it.
They gave us test cases, sort of like half-baked test cases, that we actually completed, and then we automated them. That was our deal: here are the test cases, make sure that the test cases work first, and then automate them, and then the other part was the CI thing that I talked about. That was our scope.
We did a parallel agile lifecycle while implementing this thing. We did a sprint zero, which we used to decide what was going to be the core architecture of this system, which is fundamentally: how do we deal with data, how do we deal with reuse-related problems, because the wrong thing would have been to actually record all of these test cases as if we were seeing it for the first time. There was a lot of reuse, a lot of components in each of these test cases that could be done in a modular way. And I’ll talk about how we did that in a moment. But in the sprint zero we did all of that, so by the time we were done with sprint zero in about a month, we were ready with what the test architecture was going to be, a proof of concept of how this would all come together, et cetera.
From sprint one onwards, right away we followed through creating those test cases. Like I said, the test cases were kind of half-baked to begin with, so we completed the test cases, got them approved through the business, and then our automation team came one sprint behind, and they automated the backlog of test cases that the subject matter expert, our test expert, was creating.
So that’s the process we followed. Every 15 days we were delivering a big chunk of automated test cases to them, I mean developed then automated, with test automation following test development one sprint behind.
Here is the broad test architecture. As you can probably see here, we definitely leveraged reuse in terms of creating these test steps. So first of all we created test modules; in Tosca terminology these test modules are the model-based concept that actually allows us to capture the screen. So if I’m on the work order screen, for example, I would create a module for that, and I would reuse it across all my different test steps which are likely to use that module.
So we created these test modules, sort of in isolation. We considered that as an asset, and then we created these test steps, of which “create a work order” would be an example. And we combined all these test modules and test steps in our test library, and the whole idea was that we would be able to spin off these test cases using these test steps. The purpose there obviously is that if anything changes, say in the screen or in a particular step like creating a work order, the change is very local to that particular step. It is not that we are having to change all the different test cases.
That kind of brought in a lot of reuse, and all these test steps were parametrized to begin with so that we could push the data to them, and then use the test templates and the test case design approach. We basically created test templates and we drove all the test cases through the data, and executed that through CI, and generated this DokuSnapper document that we were able to give to the customer and say, “okay, look at what this test case executed, validate that, and sign it off.” So that was our process.
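Editor's sketch: the library-plus-template idea described above can be illustrated outside Tosca in a few lines of Python. This is not how Tosca represents modules or templates. The step functions, the `CaseTemplate` class, and the data fields are all hypothetical, chosen only to show reusable, parametrized steps being combined into a template that is then driven by rows of test data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A step takes one row of test data and appends what it did to a log.
Step = Callable[[Dict[str, str], List[str]], None]


def login(data: Dict[str, str], log: List[str]) -> None:
    log.append(f"login as {data['user']}")


def create_work_order(data: Dict[str, str], log: List[str]) -> None:
    log.append(f"create work order for {data['equipment']}")


def dispatch_technician(data: Dict[str, str], log: List[str]) -> None:
    log.append(f"dispatch technician to {data['site']}")


@dataclass
class CaseTemplate:
    """A named sequence of reusable steps, independent of any test data."""
    name: str
    steps: List[Step]

    def instantiate(self, rows: List[Dict[str, str]]) -> List[List[str]]:
        """Run the same steps once per data row and return one log per row."""
        results = []
        for row in rows:
            log: List[str] = []
            for step in self.steps:
                step(row, log)
            results.append(log)
        return results


if __name__ == "__main__":
    template = CaseTemplate("service order",
                            [login, create_work_order, dispatch_technician])
    rows = [
        {"user": "planner", "equipment": "unit-A", "site": "Hospital 1"},
        {"user": "planner", "equipment": "unit-B", "site": "Hospital 2"},
    ]
    for log in template.instantiate(rows):
        print(log)
```

If the work order screen changes, only `create_work_order` needs to be touched; every test case generated from the template picks up the fix.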
Audience Member: I have a question please. Was it very straightforward to be able to use that library, or did you have to rework that library in order to create new test cases?
Manish Mathuria: Yeah. So I’ll touch upon some of the issues that we ran into and how we addressed them. So obviously there were problems, challenges that we had to resolve with respect to process, of course there’s a whole lot of technical challenges that we addressed, and issues related to test data and environment. I’ll briefly touch upon a few of these, I’ll move a little bit faster because I realize I have a little bit less time.
The process-related problems were fundamentally around the test cases. There was a whole life cycle of test cases that needed to be sanitized and approved by the business. So in a typical ideal organization, that by itself is a cumbersome and bureaucratic process that we had to go through, nothing related to automation or Tosca. That just by itself was complicated, but the process that we followed for creating a backlog of test cases helped us a lot, because we were always able to keep the backlog full by doing that early on.
There were frequent system updates; that was expected and we knew that, and as we were actually automating the test cases, things were changing. Again, this reuse thing that I talked about, where test modules and test libraries were all reusable components, helped us quite a bit, because any time something changed, the change was very localized to one particular thing. But things did change quite a bit.
And capturing the snapshot of an executed test case was important because we didn’t want to create a situation where the test case ran with no problems and then eventually somebody came back to us and said that this thing doesn’t work. So we used DokuSnapper to capture that and give it to the customer and say this is proof that we did it, you can approve it based on that.
So quickly, here are some of the areas where we used Tosca well, I probably won’t be able to go through all of it. One thing we really loved about it was it’s a one-stop test tool, so instead of having a test automation tool for ALM and another for APIs, and for version management or configuration management you have a different tool … this is really a one-stop shop. So we could do everything all the way through from configuration management, requirements management, test data management, test repository, API testing, service virtualization, everything we could use one tool for. And our team really appreciated that.
I think my second point covers that – we used all of these concepts, but of course the bulk of this was GUI testing. We also had a lot of API tests because all of these batch processes that I talked about were started through an API, and again we used Tosca to do that. I already talked about reuse and maintainability.
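Editor's sketch: one common way to handle a batch integration inside an automated test is to start the batch job explicitly through its API and then poll the target system until the data shows up or a timeout expires. The talk confirms only that the batch processes were started through an API; the polling loop, the function name `sync_then_wait`, and all of its parameters are assumptions made for illustration.

```python
import time
from typing import Callable


def sync_then_wait(trigger_batch: Callable[[], None],
                   data_arrived: Callable[[], bool],
                   timeout_s: float = 60.0,
                   poll_interval_s: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> bool:
    """Start the batch job, then poll until the data arrives or time runs out.

    trigger_batch and data_arrived are hypothetical hooks: the first would
    call the integration's API, the second would check the target system.
    Returns True if the data arrived before the deadline, False otherwise.
    """
    trigger_batch()
    deadline = clock() + timeout_s
    while True:
        if data_arrived():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


if __name__ == "__main__":
    polls = {"count": 0}

    def fake_arrived() -> bool:
        # Pretend the record shows up on the third check.
        polls["count"] += 1
        return polls["count"] >= 3

    ok = sync_then_wait(lambda: print("batch triggered"), fake_arrived,
                        poll_interval_s=0.01)
    print("data arrived:", ok)
```

Passing `sleep` and `clock` in as parameters keeps the helper testable without real waiting.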
In terms of challenges, we had a lot of challenges where certain objects wouldn’t be recognized. To give you an example, we would have a calendar object done in one of these tools, ServiceMax or Salesforce, but no matter what we did, we couldn’t recognize it through Tosca. So we actually had to extend Tosca by writing a DLL, and I won’t go into too much detail, but we were able to basically enhance Tosca such that it was able to recognize that particular thing as one object, and that came in quite handy.
We used test data and parametrization to the fullest. By the time we were done, we had in excess of 300 test cases, and we identified risk associated with each of these test cases, so when we created a regression suite through CI, we were able to shortlist a number of test cases that we could run as part of that particular execution.
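Editor's sketch: risk-based shortlisting can be as simple as ranking scenarios by risk and filling a fixed execution-time budget from the top. The talk does not say how risk was scored or how the shortlist was computed, so the `Scenario` fields, the integer risk scale, and the greedy selection below are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scenario:
    name: str
    risk: int      # hypothetical scale: higher means more critical
    minutes: int   # expected execution time


def select_regression(scenarios: List[Scenario],
                      budget_minutes: int) -> List[Scenario]:
    """Pick scenarios in descending risk order until the time budget is used.

    Ties on risk are broken by shorter runtime first. A scenario that does
    not fit in the remaining budget is skipped, and later ones are still tried.
    """
    chosen: List[Scenario] = []
    used = 0
    for s in sorted(scenarios, key=lambda s: (-s.risk, s.minutes)):
        if used + s.minutes <= budget_minutes:
            chosen.append(s)
            used += s.minutes
    return chosen


if __name__ == "__main__":
    suite = [
        Scenario("create work order", risk=5, minutes=25),
        Scenario("update contact", risk=1, minutes=10),
        Scenario("release equipment", risk=5, minutes=20),
        Scenario("close work order", risk=3, minutes=10),
    ]
    for s in select_regression(suite, budget_minutes=50):
        print(s.name, s.risk, s.minutes)
```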
I already talked about continuous integration and test run logs, so … I have a few screenshots about API testing. This just shows that we actually made all those requests to Boomi using the API tests in the system.
This is the object recognition test module thing. I think one of the really cool things that helped us was that as the system changes, and for example, a particular object is not recognized, the tool tells us that these are the six objects that are kind of out of sync. They call them unmapped controls, and you can rescan them quickly and build that repository really quickly before a test even breaks. Otherwise, in a normal test process, a test breaks and then you need to investigate why the test broke, and you recognize that, oh, because this object repository has changed, that’s why I have to go and fix that. But here we could do that all proactively.
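Editor's sketch: conceptually, spotting out-of-sync controls before a test runs amounts to comparing the controls a stored module expects with the controls a fresh scan of the screen finds. The snippet below shows that comparison as plain set differences. It is not Tosca's mechanism, and the function name and control identifiers are hypothetical.

```python
from typing import List, Set, Tuple


def find_out_of_sync(stored_controls: Set[str],
                     scanned_controls: Set[str]) -> Tuple[List[str], List[str]]:
    """Compare a stored module's controls against a fresh screen scan.

    Returns (controls the module expects but the scan no longer finds,
             controls the scan finds that the module does not know about).
    """
    missing = sorted(stored_controls - scanned_controls)
    unknown = sorted(scanned_controls - stored_controls)
    return missing, unknown


if __name__ == "__main__":
    stored = {"txtAccount", "ddlPriority", "btnSave", "calScheduled"}
    scanned = {"txtAccount", "ddlPriority", "btnSubmit", "calScheduled"}
    missing, unknown = find_out_of_sync(stored, scanned)
    print("needs remapping:", missing)
    print("new on screen:", unknown)
```

Running a check like this ahead of the regression run turns a broken-test investigation into a short remapping task.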
I already talked about the test library, so all of these things were actually created as a test that was reusable.
This is parametrization again, I think nothing fancy about it, except that each one of our test cases was parametrized to the extent that we could really consider all aspects of test data separate from the test cases.
Here is an example of that calendar control that we could not recognize. This was identified as a div before we did customization, and once we dropped our customization into Tosca, it was able to recognize that particular thing as one component.
I think I spoke quite a bit about data and integration challenges. Fundamentally, by extracting data from test cases and leveraging APIs, we were able to address both of these issues.
So closing in, like I said there were two aspects of this effort. One was building this whole CI solution, so we did that using Jenkins, and using Tosca we automated about 320 scenarios. It was very typical for a scenario to execute for anywhere from 10 minutes to 20-25 minutes; on average, it is about 19 minutes for one test execution. We were executing the whole regression officially two times a month, but unofficially it was run almost every time there was a system update.
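Editor's sketch: the figures above give a rough sense of what one full regression run costs in machine time. Taking "about 320 scenarios" and "about 19 minutes" at face value and assuming the scenarios run one after another, the total is 320 × 19 = 6,080 minutes, a little over 101 hours. The talk does not say whether runs were parallelized, so the `workers` parameter below is purely an assumption to show how the estimate would change.

```python
import math


def suite_minutes(scenarios: int, avg_minutes: int, workers: int = 1) -> int:
    """Estimate wall-clock minutes for a full run.

    Assumes every scenario takes the average time and that scenarios are
    spread evenly over the given number of parallel workers.
    """
    return math.ceil(scenarios / workers) * avg_minutes


if __name__ == "__main__":
    total = suite_minutes(320, 19)
    hours, minutes = divmod(total, 60)
    print(f"sequential: {total} min ({hours} h {minutes} min)")

    total4 = suite_minutes(320, 19, workers=4)
    hours4, minutes4 = divmod(total4, 60)
    print(f"4 workers (assumed): {total4} min ({hours4} h {minutes4} min)")
```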
We had about a 35% reduction in just the human resources cost, not to mention the effort that it takes to do the repeated investigation and repeated preparation one has to do to execute these test cases.
So this is the overall outcome. The whole process took less than six months. I think we have all these tests now running repeatedly, and right now they have one person who is doing continuous maintenance of these test assets.