The Day 2 keynote at Chaos Carnival started at a brisk pace as host Abhishek Mishra introduced Mr. Uma Mukkara, CEO of ChaosNative and maintainer of LitmusChaos, to speak on reliability in cloud-native systems.
Uma starts off by addressing the question "What about Resiliency in Cloud Native then?" This is where ChaosNative comes into the picture, providing reliability to cloud-native applications.
He describes how ChaosNative contributes to operational resilience through Chaos Engineering, tracing the cloud-native journey from containerization and CI/CD, through scaling microservices, to GitOps, which ultimately results in faster application delivery.
TRADITIONAL CHAOS ENGINEERING
Chaos Engineering is a means of hardening system infrastructure against expensive downtime, outages, and system failures. The suggestion is to keep proactive chaos testing within the scope of production itself. He went on to say, "The recent use of Chaos Engineering is pretty standard, and is currently limited to experts and enthusiasts, for larger deployments and is done only if the system infrastructure has suffered some damage due to an outage." Hence, as of now, there are very few cases of self-driven or self-motivated chaos testing, but slowly and steadily, organizations are becoming more aware of its importance. So, how is chaos engineering typically performed?
Rarely integrated into CI or CD system
Only the SREs handle the process
Custom measurement process to gauge any increase in reliability
Chaos observability is a custom-built stack in each enterprise
Manual Planning and execution
He further mentions, "Chaos Engineering, in summary, is not a new field, it is a well-understood one, but it is not yet ready for the modern-day DevOps."
CLOUD-NATIVE AND CHAOS - CROSSING THE CHASM
Taking Kubernetes as an example, Mukkara says, "Kubernetes has been very well adopted by various entities. It is understood to have crossed the chasm and is on the right side of the Mainstream Market."
"When it comes to Chaos Engineering, I believe it is still on the left side of the chasm in the early market stage. There have been a lot of innovations in the past eighteen months and it has also been announced as one of the top trending technologies this year by CNCF. I personally think Chaos Engineering will go on to cross the chasm towards the mainstream market in the next couple of years," he observes.
Why should Chaos Engineering be different for Cloud Native?
Cloud native is more dynamic: containerization breaks applications down into multiple microservices, and CI/CD technology has advanced
DevOps has evolved and is on the right side of the chasm: infrastructure is declarative, and developers are in control of creating and provisioning it
Five main principles of Cloud Native Chaos Engineering:
Open API and lifecycle management
Mukkara further adds, "LitmusChaos too is focusing on integrating the five main principles, wherein the first three have already been achieved, and the last two are work in progress."
LitmusChaos - Complete Toolset for Cloud-Native Chaos Engineering
Components of Litmus:
Litmus can be installed directly from the Litmus Portal. Experiments are pulled in from the public ChaosHub and then included in a private ChaosHub. The next step is to run chaos workflows on any cloud or on-prem Kubernetes system, as well as on non-Kubernetes platforms. It can also be run on VMs and bare metal.
For scalability, LitmusChaos uses Argo Workflows to orchestrate chaos injection. Instead of displaying Argo workflow stages, it displays the Litmus experiments, letting the user sequence experiments, run them in parallel, and consolidate status and results, which in turn enables GitOps for Chaos Engineering.
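The idea of sequencing experiments, running some in parallel, and consolidating their results can be illustrated with a minimal sketch in plain Python. This is not the Litmus or Argo API: `run_workflow`, `workflow_verdict`, and the placeholder experiments below are hypothetical names, and each "experiment" is assumed to be a simple function that returns pass or fail.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Assumption: an "experiment" is any callable returning True (pass) or False (fail).
Experiment = Callable[[], bool]


def run_workflow(steps: List[Dict[str, Experiment]]) -> Dict[str, str]:
    """Run steps one after another; experiments within a step run in parallel."""
    results: Dict[str, str] = {}
    for step in steps:
        # One worker per experiment in the step (at least one worker).
        with ThreadPoolExecutor(max_workers=max(1, len(step))) as pool:
            futures = {name: pool.submit(fn) for name, fn in step.items()}
            for name, future in futures.items():
                results[name] = "Pass" if future.result() else "Fail"
    return results


def workflow_verdict(results: Dict[str, str]) -> str:
    """Consolidate per-experiment results into a single workflow verdict."""
    return "Pass" if all(v == "Pass" for v in results.values()) else "Fail"


# Hypothetical placeholder experiments; real ones would inject faults.
def pod_delete() -> bool:
    return True


def network_latency() -> bool:
    return True


def node_drain() -> bool:
    return False


example_steps = [
    {"pod-delete": pod_delete, "network-latency": network_latency},  # parallel
    {"node-drain": node_drain},  # runs after the first step completes
]
example_results = run_workflow(example_steps)
print(example_results, workflow_verdict(example_results))
```

Because the whole workflow is expressed as data (a list of steps), a definition like `example_steps` could in principle be kept in a Git repository and reviewed like any other change, which is the spirit of the GitOps approach described above.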
Uma goes on to describe the list of 40-odd experiments that the Litmus team has run so far.
CHAOS INTERLEAVED OBSERVABILITY
"Observability advancement is a new thing, and probes are a way to define your steady-state hypothesis. We provide you with experiments and you can declare the steady state," he adds.
LitmusChaos also provides chaos analytics for workflows and experiments through chaos-interleaved dashboards.
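To make the notion of a probe as a declared steady-state hypothesis concrete, here is a minimal sketch in plain Python. It does not use the Litmus probe API; `Probe`, `run_experiment`, and the simulated `state` are hypothetical, and the sketch assumes probes are checked once before and once after the fault is injected.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Probe:
    name: str
    check: Callable[[], bool]  # returns True while the system is in steady state


def run_experiment(inject_fault: Callable[[], None], probes: List[Probe]) -> dict:
    """Verify steady state, inject the fault, then verify steady state again."""
    before = {p.name: p.check() for p in probes}
    if not all(before.values()):
        # The hypothesis does not hold even before chaos, so do not inject.
        return {"verdict": "Aborted", "before": before, "after": {}}
    inject_fault()
    after = {p.name: p.check() for p in probes}
    verdict = "Pass" if all(after.values()) else "Fail"
    return {"verdict": verdict, "before": before, "after": after}


# Simulated system under test (assumption: a service with three healthy replicas).
state = {"healthy_replicas": 3}


def kill_one_replica() -> None:
    state["healthy_replicas"] -= 1


# Declared steady state: at least two replicas stay healthy.
probes = [Probe("min-two-replicas", lambda: state["healthy_replicas"] >= 2)]
outcome = run_experiment(kill_one_replica, probes)
print(outcome["verdict"])
```

The experiment supplies the fault, while the user supplies only the steady-state condition, which mirrors the division of labor described in the quote.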
For chaos in CI pipelines, Litmus offers the LitmusChaos CI Library, with a GitLab remote template and integration with GitHub Actions; Spinnaker is supported through a preconfigured job plug-in, and Keptn through a Litmus control-plane plug-in.
CNCF PROJECTS AND LITMUS
Speaking of collaborative integrations, Uma says, "Litmus has strong integrations with other CNCF projects like OpenEBS, CRI-O, Keptn, Argo, and containerd, and plans for better integrations with Crossplane, Flux, Vitess, and Open Policy Agent are on the roadmap."
CLOSING OF THE KEYNOTE
Uma's candid takeaway as a Litmus user is, "As everything is open-sourced, the main plan is to continue using Litmus and to provide help with enterprise support, managed services, and best practices implementation."
"Chaos experiments will become a free commodity in the near future, Chaos will be predominantly practiced through GitOps, Chaos as a Service (CHaaS) offerings will emerge, and along with SREs, developers too will soon start practicing chaos testing," Mukkara observes.