ChaosCarnival Day 2 Keynote

About the speaker

Uma Mukkara

Maintainer of the LitmusChaos project, CEO at ChaosNative

Uma Mukkara is the CEO of ChaosNative and the maintainer of the LitmusChaos project. He is an entrepreneur and technology visionary who earlier cofounded CloudByte and MayaData. His team built LitmusChaos over the last three years to make Chaos Engineering easier for the modern DevOps ecosystem. His focus now is on building chaos solutions at scale for enterprises and making chaos a first-hand tool for cloud-native developers. He has a master's degree from the Illinois Institute of Technology, Chicago.

About the talk

A large share of the challenges in Kubernetes production environments comes down to running stateful applications reliably. Each stateful application has many microservices, and a mesh of such applications creates tight dependencies that must hold for the stateful services to run successfully. Cloud Native Chaos Engineering is a modern approach to achieving predictable resilience in Kubernetes production at scale. Uma Mukkara, maintainer of the LitmusChaos project, discusses best practices to roll out Cloud Native Chaos Engineering and achieve the desired SLOs for the Kubernetes platform and applications.

Transcript

The Day 2 Keynote at Chaos Carnival started off at a great pace as host Abhishek Mishra introduced the CEO of ChaosNative and maintainer of LitmusChaos, Mr. Uma Mukkara, to deliver a session on reliability in cloud-native systems.

Uma starts off by addressing the question, "What about Resiliency in Cloud Native then?" This is where ChaosNative comes into the picture, providing reliability to cloud-native applications.

He describes how ChaosNative contributes to resilience in operations through Chaos Engineering. The cloud-native journey starts with containerization and CI/CD and scales microservices into GitOps, which eventually results in faster application delivery.

TRADITIONAL CHAOS ENGINEERING

Chaos Engineering is a means of making system infrastructure resistant to expensive downtime, outages, and system failures. It is suggested to keep the scope of proactive chaos testing in the production environment itself. He went on to say, "The recent use of Chaos Engineering is pretty standard, and is currently limited to experts and enthusiasts, for larger deployments, and is done only if the system infrastructure has suffered some damage due to an outage." Hence, as of now, there are very few cases of self-driven or self-motivated chaos testing, but slowly and steadily, organizations are becoming more aware of its importance. So, how is chaos engineering typically performed?

  • Gamedays

  • Rarely integrated into CI or CD system

  • Only the SREs handle the process

  • Custom measurement process to see whether reliability has increased

  • Chaos Observability is a custom-built stack in each enterprise

  • Manual Planning and execution

He further mentions, "Chaos Engineering, in summary, is not a new field, it is a well-understood one, but it is not yet ready for the modern-day DevOps."

CLOUD-NATIVE AND CHAOS - CROSSING THE CHASM

Talking about Kubernetes as an example, Mukkara says, "Kubernetes has been very well adopted by various entities. It is understood to have crossed the chasm and is on the right side of the Mainstream Market."

"When it comes to Chaos Engineering, I believe it is still on the left side of the chasm in the early market stage. There have been a lot of innovations in the past eighteen months and it has also been announced as one of the top trending technologies this year by CNCF. I personally think Chaos Engineering will go on to cross the chasm towards the mainstream market in the next couple of years," he observes.

Why should Chaos Engineering be different for Cloud Native?

  • Cloud Native is more dynamic because of containerization, applications being broken down into multiple microservices, and advancements in CI/CD technology

  • DevOps has evolved and is on the right side of the chasm: infrastructure is declarative, and developers are in control of creating and provisioning infrastructure

Five main principles of Cloud Native Chaos Engineering:

  1. Open Source

  2. Community Collaborated

  3. Open API and lifecycle management

  4. GitOps

  5. Open Observability

Mukkara further adds, "LitmusChaos too is focusing on integrating the five main principles, wherein the first three have already been achieved, and the last two are work in progress."

LitmusChaos - Complete Toolset for Cloud-Native Chaos Engineering

Components of Litmus:

  1. Helm Chart

  2. Public ChaosHub

  3. Private ChaosHub

Litmus can be installed directly from the Litmus Portal. The experiments are pulled in from the Public ChaosHub and then included in the Private ChaosHub. The next step is to run chaos workflows on any cloud or on-prem Kubernetes system, or on a non-Kubernetes framework. They can also be run on VMs or bare metal.

To achieve scalability, LitmusChaos uses Argo Workflows to drive chaos injection, but instead of displaying Argo workflow stages, it displays the Litmus experiments. This allows the user to sequence experiments, run them in parallel, and consolidate status and results, which enables GitOps for Chaos Engineering.
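
The idea of sequencing experiments, running some of them in parallel, and consolidating the results can be illustrated with a small Python sketch. This is not Litmus or Argo code: `run_workflow`, `consolidated_verdict`, and the experiment names are hypothetical, and each experiment is assumed to be a plain callable that returns True when it passes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical experiment type: a callable that returns True on "Pass".
Experiment = Callable[[], bool]


def run_workflow(stages: List[Dict[str, Experiment]]) -> Dict[str, str]:
    """Run stages in order; experiments inside one stage run in parallel."""
    results: Dict[str, str] = {}
    for stage in stages:
        with ThreadPoolExecutor(max_workers=max(1, len(stage))) as pool:
            futures = {name: pool.submit(exp) for name, exp in stage.items()}
            # Wait for every experiment in this stage before moving on.
            for name, future in futures.items():
                results[name] = "Pass" if future.result() else "Fail"
    return results


def consolidated_verdict(results: Dict[str, str]) -> str:
    """The workflow passes only if every experiment passed."""
    return "Pass" if all(v == "Pass" for v in results.values()) else "Fail"


# Placeholder experiments; real ones would inject faults into a cluster.
demo_stages = [
    {"pod-delete": lambda: True},
    {"node-drain": lambda: True, "network-latency": lambda: False},
]
demo_results = run_workflow(demo_stages)
demo_verdict = consolidated_verdict(demo_results)
```

Here the first stage runs one experiment on its own, the second stage runs two side by side, and a single failing experiment is enough to fail the consolidated verdict.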

Uma goes on to describe the list of 40-odd experiments that the Litmus team has run so far:

  1. Pod Chaos

  2. Node Chaos

  3. Network Chaos

  4. Stress Chaos

  5. Cloud Services

  6. Applications Chaos

CHAOS INTERLEAVED OBSERVABILITY

"Observability advancement is a new thing, and probes are a way to define your steady-state hypothesis. We provide you with experiments and you can declare the steady state," he adds.

LitmusChaos also provides chaos analytics for workflows and experiments in chaos-interleaved dashboards.
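
As a rough illustration of how a probe can express a steady-state hypothesis, the sketch below checks a set of probes before and after a fault is injected. It is a minimal Python model under stated assumptions, not the Litmus probe API: the `Probe` class, `run_experiment`, and the in-memory `state` dictionary are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Probe:
    """Hypothetical probe: a named check that must hold in steady state."""
    name: str
    check: Callable[[], bool]


def run_experiment(inject_fault: Callable[[], None], probes: List[Probe]) -> str:
    """Check probes before and after the fault; return "Pass" or "Fail"."""
    # Stop early if the system is not in steady state to begin with.
    if not all(p.check() for p in probes):
        return "Fail"
    inject_fault()
    # The hypothesis holds only if every probe still succeeds after chaos.
    return "Pass" if all(p.check() for p in probes) else "Fail"


# Toy system state standing in for a real deployment (assumption).
state = {"ready_replicas": 3}


def kill_one_pod() -> None:
    state["ready_replicas"] -= 1


probes = [Probe("min-replicas", lambda: state["ready_replicas"] >= 2)]
first_verdict = run_experiment(kill_one_pod, probes)   # 3 -> 2 replicas
second_verdict = run_experiment(kill_one_pod, probes)  # 2 -> 1 replicas
```

The first run leaves two replicas and the hypothesis still holds; the second run drops below the declared minimum, so the same probe now fails the experiment.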

For chaos in CI pipelines, a GitLab remote template called the LitmusChaos CI Library is used for integrating with GitHub Actions, Spinnaker is supported through a preconfigured job plug-in, and Keptn through a Litmus control plane plug-in.

CNCF PROJECTS AND LITMUS

Speaking of collaborative integrations, Uma says, "Litmus has great integrations with other CNCF projects like OpenEBS, CRI-O, Keptn, Argo, and containerd, and plans for better integrations with Crossplane, Flux, Vitess, and Open Policy Agent are on the roadmap."

CLOSING OF THE KEYNOTE

Uma's candid takeaway as a Litmus user is, "As everything is open-sourced, the main plan is to continue using Litmus and to provide help with enterprise support, managed services, and best practices implementation."

"Chaos experiments will become a free commodity in the near future, Chaos will be predominantly practiced through GitOps, Chaos as a Service (CHaaS) offerings will emerge, and along with SREs, developers too will soon start practicing chaos testing," Mukkara observes.
