Ignite 2021: Azure’s Chaos Studio goes public


Back at the beginning of 2021, Azure CTO Mark Russinovich’s regular Ignite run-through of Azure’s architecture gave us a first look at Chaos Studio, the platform’s fault injection tool. Building on the Chaos Monkey concept introduced by Netflix, the growing discipline of chaos engineering focuses on helping developers understand what happens to cloud-scale applications when parts of them fail.

Now, with 2021’s second Ignite opening its digital doors, Microsoft is unveiling the first public preview of Chaos Studio as part of its push to deliver better and more resilient cloud applications. I had the opportunity to talk to Mark Russinovich in advance of the preview’s launch about Azure’s approach to chaos engineering and how he sees developers taking advantage of these technologies.

Adding chaos to Azure

Chaos engineering in Azure isn’t new. As Russinovich says, “We’ve been doing chaos engineering and Azure since pretty close to the start. It’s been a lot of homegrown chaos.” But as the service has grown, what began as tooling unique to specific teams has had to become something that works for everyone building on and in Azure. He says, “Over the last few years, we’ve realized, ‘hey, we should consolidate these efforts in chaos engineering into a common tool, a common framework service that we can apply across our services.’”

That common tool was the basis for Chaos Studio, and although it began life as an internal tool, Russinovich points out that it was always intended to become customer-facing. What customers need might not be what Microsoft needs, but the lessons they learn could help make Azure better for all its users, inside and outside Redmond. “We think, besides customers having the benefits of a service that’s operating for them, we can grow an ecosystem to have on top of this with customers. The extensibility they bring produces fault injections that we can then leverage across the ecosystem and even internally,” he says. 

[Image: Azure Chaos Studio overview. Credit: IDG]

Introducing Chaos Studio

Chaos Studio is a tool that lets developers and testers script fault injections into running systems, starting with failing virtual machines and then offering more detailed, lower-level faults, including CPU and memory stress. Faults are either agent-based, which require a Chaos Studio agent as part of a VM build (for both Windows and Linux), or service-direct. Once the agent and any prerequisites are installed, you can use Chaos Studio to choose the type of test to run and how to run it. For example, if you’re stress testing the CPU, you first define how long you want to apply CPU pressure and how much pressure you want to add.
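To make those two settings concrete, here is a minimal Python sketch of how a CPU stress fault might be described before it is handed to a tool like Chaos Studio. It is not the Chaos Studio API or experiment schema: the class name `CpuPressureFault`, its fields, the `"cpu-pressure"` label, and the VM name are all hypothetical, chosen only to show the duration and pressure parameters the paragraph above describes.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class CpuPressureFault:
    """Hypothetical description of an agent-based CPU stress fault."""
    duration_minutes: int   # how long to apply CPU pressure
    pressure_percent: int   # how much pressure to add (1-100)
    target_vm: str          # placeholder name of a VM that runs the agent

    def validate(self) -> None:
        # Reject settings that would make the test meaningless.
        if self.duration_minutes <= 0:
            raise ValueError("duration must be positive")
        if not 1 <= self.pressure_percent <= 100:
            raise ValueError("pressure must be between 1 and 100 percent")

    def to_json(self) -> str:
        # Serialize only after the settings have been checked.
        self.validate()
        return json.dumps({"fault": "cpu-pressure", **asdict(self)}, indent=2)


# Example values are illustrative, not recommendations.
fault = CpuPressureFault(duration_minutes=10, pressure_percent=95, target_vm="example-vm")
print(fault.to_json())
```

Validating the settings before serializing them keeps a mistyped duration or pressure level from ever reaching a running system.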

[Image: Azure Chaos Studio experiment designer detail. Credit: IDG]

When you’re running a stress test like this, you’ll need tools like Azure Monitor alongside Chaos Studio to give you visibility into what’s happening to your systems. The same is true for service-direct faults. These are used to affect Azure resources, like Cosmos DB, once you’ve linked a service to your Chaos Studio instance. Here you can set up a test to see how your application responds to, say, a cross-region failover of a key service.
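The link-then-inject order for service-direct faults can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Chaos Studio’s actual interface: `ChaosWorkspace`, `link`, `plan_failover`, the `"region-failover"` label, and the resource and region names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ChaosWorkspace:
    """Hypothetical stand-in for a Chaos Studio instance and its linked resources."""
    linked_resources: Set[str] = field(default_factory=set)

    def link(self, resource_id: str) -> None:
        # Record that a resource has been linked to this instance.
        self.linked_resources.add(resource_id)

    def plan_failover(self, resource_id: str, to_region: str) -> Dict[str, str]:
        # A service-direct fault needs no agent, but the resource must be linked first.
        if resource_id not in self.linked_resources:
            raise LookupError(f"{resource_id} is not linked to this workspace")
        return {"fault": "region-failover", "resource": resource_id, "to_region": to_region}


workspace = ChaosWorkspace()
workspace.link("example-cosmos-account")
plan = workspace.plan_failover("example-cosmos-account", to_region="West US")
print(plan)
```

Refusing to plan a fault against an unlinked resource mirrors the precondition described above: a service has to be linked before a failover test can target it.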

Copyright © 2021 IDG Communications, Inc.


