EVENT-DRIVEN CHAOS INJECTION
"They are not only passionate about development but also curious about breaking things with the practice of Chaos Engineering." This is how moderator Ajesh Baby introduced Raj Das Babu and Soumya Ghosh Dastidar, who delivered a talk on Event-Driven Chaos Injection on day 1 of Chaos Carnival 2021.
Raj kicks off the session by defining the practice: "Chaos Engineering is how confident we are in making something in production and making it resilient in the future."
He goes on to explain some basics of working with Chaos.
SysAdmins, SREs, DevOps Engineers, etc. are the ones who practice it.
Recent practitioners of Chaos testing include Google, Slack, GitHub, etc.
Some of the tools to perform Chaos injection are Chaos Monkey, LitmusChaos, and Gremlin.
INTRODUCTION TO EVENT-DRIVEN CHAOS INJECTION:
There are several ways of injecting chaos into a system or application: manual injection; scheduled injection, where chaos is scheduled for a particular time; random injection, where chaos is injected at any point in time to check reliability; injection in the CI/CD pipeline, where chaos is introduced into the system at the CI stage; and event-driven injection. Both speakers go on to address event-driven chaos injection in depth in their talk.
So, what exactly is Event-Driven Chaos Injection?
Addressing this very question, Raj explains, "Event-Driven Chaos Injection is a chaos injection method where chaos is triggered based on a particular event. Events can vary from changes in configuration to changes in replicas, etc. A policy can be initiated, which will then manage the events and detect a fault immediately when the configuration changes."
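To make the idea concrete, here is a minimal sketch of how a policy might decide whether an observed change should trigger chaos. It is not taken from the talk or from LitmusChaos: the `ResourceState` class, the event names, and both helper functions are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceState:
    """Hypothetical snapshot of the fields a policy might watch."""
    image: str
    replicas: int


def detect_events(old: ResourceState, new: ResourceState) -> list[str]:
    """Compare two snapshots and name the changes that occurred."""
    events = []
    if old.image != new.image:
        events.append("config-change")
    if old.replicas != new.replicas:
        events.append("replica-change")
    return events


def should_inject_chaos(events: list[str], policy: set[str]) -> bool:
    """Trigger chaos only when an observed event is covered by the policy."""
    return any(event in policy for event in events)


# Illustrative policy: react to configuration changes only.
policy = {"config-change"}
before = ResourceState(image="app:v1", replicas=3)
after = ResourceState(image="app:v2", replicas=3)

events = detect_events(before, after)
print(events, should_inject_chaos(events, policy))
```

With this policy, scaling the replicas alone would not trigger chaos, while changing the image would. A real policy would likely watch many more fields.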
Raj then introduces the Litmus Portal: "The web-based Litmus Portal is a part of the LitmusChaos project. It allows you to connect your remote Kubernetes cluster and inject chaos in that cluster, and it is currently in beta."
The Litmus Portal is divided into two major clusters: Red and Blue. The red cluster consists of the Litmus Portal's own components, while the blue cluster's components are responsible for connecting to the red one.
ARCHITECTURE OF EVENT-DRIVEN CE:
Components of Red Cluster (Litmus Portal):
Web UI: Web components that allow the external user to interact with the dashboards.
Auth Server: Server responsible for managing all the authentication-related activities.
MongoDB: Used for persisting the data.
GraphQL Server: The server used to manage the API requests, which also serves as a connection point between the red and blue clusters.
GitHub Repository: An external component used for GitOps purposes.
Components of Blue Cluster:
Subscriber: First component that interacts with the GraphQL server by accepting its requests.
Litmus Operator: Required for LitmusChaos to inject chaos.
Argo Server & Workflow Controller: Used for managing Argo workflows while creating chaos. An Argo workflow defines a set of experiments and a chaos engine; once applied, it injects chaos sequentially.
Event Tracker: Tracks resource changes in the Kubernetes cluster, namely changes to Deployments, StatefulSets, and DaemonSets. Tracked resources are annotated with a GitOps annotation, which takes a boolean value, and a workflow annotation, which takes a string value. You have to enable GitOps for a resource to be tracked by the Event Tracker.
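The Event Tracker's annotation check can be sketched roughly as below. This is an illustration, not the project's implementation: it assumes the tracked kinds are Deployments, StatefulSets, and DaemonSets, and the annotation key names are assumptions that should be checked against the LitmusChaos documentation. Kubernetes annotation values are always strings, so the "boolean" is compared as the string `"true"`.

```python
from typing import Optional

# Assumed set of resource kinds the tracker watches.
TRACKED_KINDS = {"Deployment", "StatefulSet", "DaemonSet"}

# Assumed annotation keys; verify against the LitmusChaos docs.
GITOPS_KEY = "litmuschaos.io/gitops"
WORKFLOW_KEY = "litmuschaos.io/workflow"


def workflow_for(resource: dict) -> Optional[str]:
    """Return the workflow ID to run for a resource, or None if untracked."""
    if resource.get("kind") not in TRACKED_KINDS:
        return None
    annotations = resource.get("metadata", {}).get("annotations", {})
    # The GitOps flag must be explicitly enabled for the resource.
    if annotations.get(GITOPS_KEY) != "true":
        return None
    # An empty or missing workflow ID means there is nothing to trigger.
    return annotations.get(WORKFLOW_KEY) or None


# Hypothetical manifest fragment, parsed into a dict.
deployment = {
    "kind": "Deployment",
    "metadata": {
        "name": "web-app",
        "annotations": {GITOPS_KEY: "true", WORKFLOW_KEY: "wf-123"},
    },
}
print(workflow_for(deployment))
```

Returning `None` for anything that is not explicitly opted in keeps the tracker from injecting chaos into resources nobody annotated.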
"For example, we have a web application named App V1. We need to upgrade App V1 to App V2, which is termed an event and will be tracked by the Event Tracker. This request will be sent to the GraphQL server with a particular workflow ID. The workflow is then sent to the Subscriber and the workflow is applied. Now you can check the resiliency of the web application using a PodDelete experiment on every configuration change, hence fewer chances of failure," Raj explains.
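The flow in Raj's example can be simulated in-process as follows. This is a toy model under stated assumptions, not the Litmus Portal's code: the three classes stand in for the real components, calls between them replace the actual network traffic, and the workflow ID `wf-123` and workflow name are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Subscriber:
    """Stand-in for the blue-cluster Subscriber: records applied workflows."""
    applied: list = field(default_factory=list)

    def apply(self, workflow: dict) -> None:
        self.applied.append(workflow["name"])


@dataclass
class GraphQLServer:
    """Stand-in for the red-cluster server: maps workflow IDs to workflows."""
    workflows: dict
    subscriber: Subscriber

    def on_event(self, workflow_id: str) -> bool:
        workflow = self.workflows.get(workflow_id)
        if workflow is None:
            return False  # unknown ID: nothing to run
        self.subscriber.apply(workflow)
        return True


@dataclass
class EventTracker:
    """Stand-in for the Event Tracker: reports image changes to the server."""
    server: GraphQLServer

    def observe(self, old_image: str, new_image: str, workflow_id: str) -> bool:
        if old_image == new_image:
            return False  # no change, so no event
        return self.server.on_event(workflow_id)


subscriber = Subscriber()
server = GraphQLServer(
    workflows={"wf-123": {"name": "pod-delete-check", "experiments": ["pod-delete"]}},
    subscriber=subscriber,
)
tracker = EventTracker(server)

# Upgrading App V1 to App V2 is the event that triggers the workflow.
tracker.observe("app:v1", "app:v2", "wf-123")
print(subscriber.applied)
```

Re-deploying the same image produces no event, so the Subscriber applies nothing in that case.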