A powerful framework for network chaos experiments

About the speaker

Andreas Krivas

Engineering Manager, Container Solutions

Andreas is addicted to solving complex problems and dedicated to helping the world gain confidence in today's rapidly shifting technological landscape. After trying electrical, mechanical, and software engineering, he started as a Cloud Native Engineer, and now, from the position of Engineering Manager at Container Solutions, Andreas helps clients improve their business by creating tailor-made solutions that enable them to leverage the cloud.

About the talk

Some say that large microservices architectures have an impact on performance because of their higher network complexity. You take the blue pill, the story ends. You take the red pill, you stay in Chaos Wonderland and I show you how to uncover hidden issues via fine-grained network chaos. Follow me. In a microservices architecture, the individual services tend to become simpler, but the overall system complexity increases. This can affect performance, security, and stability, and generally creates more operational overhead. In this talk, we will dive into a solution that can execute declarative, fine-grained network chaos experiments which test the resilience of one or more microservices. We will show how open-source tools can be combined with core Linux utilities like Traffic Control to create a very powerful framework for Kubernetes that can affect workloads both inside and outside the cluster.

Transcript

For an enlightening session on day 2 of Chaos Carnival 2021, we had Andreas Krivas, Engineering Manager from Container Solutions, speak about "A powerful framework for network chaos experiments".

Andreas begins by stating the high-level objective of the technical demo and explaining the architecture of the target application: Hipster Shop, a demo e-commerce website built on a microservices architecture.

As part of the demonstration, a network chaos experiment is performed on Hipster Shop: a two-second latency is injected between the Frontend and CheckoutService microservices for one minute, using the pod-network-latency experiment from LitmusChaos.

The Frontend microservice is the user-facing web application through which users access the application's features, such as browsing and purchasing the products available in the catalog. CheckoutService is the microservice responsible for handling payments for the products being bought.

The demo setup consists of Hipster Shop deployed on a GKE cluster running on GCP, with LitmusChaos installed on the same cluster. Andreas then gives a brief walkthrough of the ChaosEngine manifest for the pod-network-latency experiment. He starts with the appinfo section, which specifies the target application's namespace, label, and resource type. In this case these are hipster-shop, app=frontend, and deployment respectively.

He also explains the experiment tunables. The DESTINATION_HOSTS tunable restricts the chaos to specific hosts, which in this case is just the CheckoutService. The NETWORK_LATENCY tunable specifies the latency to inject, here 2000 ms, i.e. 2 s. The TOTAL_CHAOS_DURATION tunable specifies how long the chaos lasts, here 60 s, i.e. 1 minute.
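The manifest described above can be sketched roughly as follows. This is a minimal illustration in Python that builds the ChaosEngine as a dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The namespace, label, resource type, and tunable values come from the talk. The engine name `frontend-network-latency`, the service account name `pod-network-latency-sa`, and the destination host value `checkoutservice` are assumptions made for illustration and may not match the manifest used in the demo.

```python
import json


def build_chaos_engine(destination_hosts=None):
    """Return a ChaosEngine manifest (as a dict) for pod-network-latency.

    destination_hosts: optional list of host names to restrict the chaos to.
    When omitted, the DESTINATION_HOSTS tunable is left out entirely.
    """
    env = [
        # Chaos lasts 60 seconds (value from the talk).
        {"name": "TOTAL_CHAOS_DURATION", "value": "60"},
        # Injected latency in milliseconds (value from the talk).
        {"name": "NETWORK_LATENCY", "value": "2000"},
    ]
    if destination_hosts:
        env.append(
            {"name": "DESTINATION_HOSTS", "value": ",".join(destination_hosts)}
        )

    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {
            "name": "frontend-network-latency",  # assumed name
            "namespace": "hipster-shop",
        },
        "spec": {
            "engineState": "active",
            "appinfo": {
                "appns": "hipster-shop",
                "applabel": "app=frontend",
                "appkind": "deployment",
            },
            "chaosServiceAccount": "pod-network-latency-sa",  # assumed name
            "experiments": [
                {
                    "name": "pod-network-latency",
                    "spec": {"components": {"env": env}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # "checkoutservice" is an assumed host name for the CheckoutService.
    print(json.dumps(build_chaos_engine(["checkoutservice"]), indent=2))
```

Calling `build_chaos_engine()` with no arguments produces the same manifest without the DESTINATION_HOSTS entry, so the experiment is no longer limited to a single destination.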

Andreas then heads over to the terminal, where the application's microservices can be seen running in their respective Kubernetes pods. Once the pods are verified, the ChaosEngine manifest is applied. As soon as it is applied, the experiment pod is created and the experiment begins to execute its steps, which can be followed in the experiment logs. While the chaos injection is in progress, Andreas goes to the website and checks out a product. The checkout takes 2.11 seconds, slightly more than the 2 seconds of injected latency.

It was also observed that the other microservices were unaffected and were working properly, without any significant network latency. Once the chaos duration was over, the checkout service was accessed once more and this time there was no significant network latency.
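The manual check above (time a request during chaos, then again afterwards) can also be expressed in code. The following is a minimal sketch, not something shown in the talk: `timed` and `looks_delayed` are hypothetical helper names, and the 0.5 s tolerance is an arbitrary assumption. A real check would pass a function that performs the checkout request against the shop instead of the stand-in used here.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def looks_delayed(elapsed_s, injected_s, tolerance_s=0.5):
    """Return True if elapsed_s is consistent with the injected latency.

    The request should take at least injected_s, plus a small amount of
    normal processing time bounded by tolerance_s (an assumed threshold).
    """
    return injected_s <= elapsed_s <= injected_s + tolerance_s


if __name__ == "__main__":
    # Stand-in for a real checkout request; it just sleeps briefly.
    _, elapsed = timed(time.sleep, 0.05)
    print(f"request took {elapsed:.2f}s")
    # The 2.11 s checkout from the demo matches a 2 s injected latency.
    print(looks_delayed(2.11, 2.0))
```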

Andreas then makes a few changes to the ChaosEngine manifest, such as commenting out the DESTINATION_HOSTS tunable, which effectively adds the network latency to every network request made from the Frontend microservice. Once the manifest is applied and the chaos injection is in progress, the website becomes unreachable because the Frontend pod has gone into a Not Ready state.

As soon as the chaos duration was over, the Frontend pod went back to the ready state and the website was reachable once again.
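The difference between the two runs, latency towards a single destination versus latency on all outgoing traffic, can be illustrated with the Linux Traffic Control utility mentioned in the talk abstract. The sketch below only builds `tc` command lines using the standard `netem` and `prio` queueing disciplines; it does not run them. It is an illustration of the general technique, not necessarily the exact commands LitmusChaos executes. The function name `build_tc_commands`, the interface `eth0`, and the IP address `10.0.0.5` are assumptions for the example.

```python
def build_tc_commands(interface, delay_ms, destination_ips=None):
    """Build tc commands that delay egress traffic on an interface.

    Without destination_ips, a single netem qdisc delays all traffic.
    With destination_ips, a prio qdisc is installed and only packets
    matched by a u32 filter are steered into the delayed band (1:3).
    """
    netem = ["netem", "delay", f"{delay_ms}ms"]

    if not destination_ips:
        # Untargeted: every outgoing packet is delayed.
        return [["tc", "qdisc", "add", "dev", interface, "root", *netem]]

    commands = [
        # Root prio qdisc creates the classes 1:1, 1:2 and 1:3.
        ["tc", "qdisc", "add", "dev", interface, "root", "handle", "1:", "prio"],
        # Attach the delay to the third band only.
        ["tc", "qdisc", "add", "dev", interface, "parent", "1:3", *netem],
    ]
    for ip in destination_ips:
        # Send packets addressed to this IP into the delayed band.
        commands.append([
            "tc", "filter", "add", "dev", interface, "protocol", "ip",
            "parent", "1:0", "prio", "3", "u32",
            "match", "ip", "dst", ip, "flowid", "1:3",
        ])
    return commands


if __name__ == "__main__":
    # 10.0.0.5 is a made-up address standing in for the CheckoutService.
    for cmd in build_tc_commands("eth0", 2000, ["10.0.0.5"]):
        print(" ".join(cmd))
    for cmd in build_tc_commands("eth0", 2000):
        print(" ".join(cmd))
```

Running such commands requires root privileges (or the NET_ADMIN capability) inside the target pod's network namespace, which is why a chaos tool normally applies and removes them on the user's behalf.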
