The rise of chaos engineering is attributed to Netflix’s move to cloud-based infrastructure on AWS in 2010. To protect their customers’ experience, Netflix engineers began running chaos experiments to confirm they could keep delivering high-quality streaming even when Amazon’s servers suffered outages.
Chaos engineering is designed to answer specific questions about the resiliency and functionality of systems, such as:
- What happens when a system receives more traffic than it can handle, or becomes unavailable to users?
- What types of cascading errors occur when a single point of failure crashes an application?
- What happens when the network misbehaves, for example through high latency, packet loss, or partitions?
- What happens when a specific service can’t be accessed or when specific applications go down?
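As a rough illustration of the last question, the Python sketch below injects failures into a simulated downstream service and checks a simple steady-state hypothesis: users should still get a page even when the dependency is unreachable. Everything here is hypothetical and for illustration only, including `FlakyDependency`, `get_homepage`, `run_experiment`, and the "popular titles" fallback. Real chaos experiments target running infrastructure with dedicated tooling rather than an in-process simulation.

```python
import random


class FlakyDependency:
    """Hypothetical downstream service that fails on purpose (fault injection)."""

    def __init__(self, failure_rate: float, rng: random.Random):
        self.failure_rate = failure_rate  # probability that a call fails
        self.rng = rng

    def fetch_recommendations(self, user_id: str) -> list[str]:
        # Inject a failure with the configured probability.
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected failure: dependency unavailable")
        return [f"title-{user_id}-1", f"title-{user_id}-2"]


def get_homepage(user_id: str, dependency: FlakyDependency) -> list[str]:
    """Caller under test: falls back to generic content if the dependency fails."""
    try:
        return dependency.fetch_recommendations(user_id)
    except ConnectionError:
        return ["popular-1", "popular-2"]  # hypothetical degraded-mode content


def run_experiment(failure_rate: float, requests: int = 1000, seed: int = 42):
    """Return (availability, fallback_count) for a given injected failure rate."""
    rng = random.Random(seed)  # fixed seed so the experiment is reproducible
    dependency = FlakyDependency(failure_rate, rng)
    served = 0
    fallbacks = 0
    for i in range(requests):
        page = get_homepage(str(i), dependency)
        if page:
            served += 1
        if page and page[0].startswith("popular"):
            fallbacks += 1
    return served / requests, fallbacks


if __name__ == "__main__":
    # Steady-state hypothesis: availability stays at 1.0 at every failure rate.
    for rate in (0.0, 0.3, 1.0):
        availability, fallbacks = run_experiment(rate)
        print(f"failure_rate={rate}: availability={availability}, fallbacks={fallbacks}")
```

If availability dropped below 1.0 at some failure rate, the experiment would have exposed a weakness, such as a missing fallback, before a real outage did.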
Chaos testing lets IT teams observe in real time how systems respond to a variety of stresses. It can reveal bugs and weaknesses that other testing methodologies often miss. Chaos experiments also better prepare IT teams to handle real-world failures, reducing response times when problems occur in production.