Chaos Engineering is fun!


About the speaker

Eugenio Marzo

DevOps Engineer


Eugenio works as a DevOps Engineer for Sourcesense, a system integrator based in Italy. He loves coding, walking, and street food.

About the talk

Hi everyone, I would like to share with you Kubeinvaders, a gamified chaos engineering tool for Kubernetes. It is like Space Invaders, but the aliens are pods or k8s nodes.

During the talk I will show you how to use its special features for doing chaos engineering with fun!


Chaos Engineering is Fun!

Eugenio Marzo delivered an insightful session, 'Chaos Engineering is Fun!', on day 1 of Chaos Carnival 2021. He explained how to use some of the tool's special features for doing chaos engineering and, of course, having fun!

"I would love to share with you Kubeinvaders, a gamified chaos engineering tool for Kubernetes. It is like Space Invaders, but the aliens are pods or k8s nodes," he says.

Eugenio begins the session by defining Chaos Engineering as "the discipline of experimenting on a system in order to build confidence in the system's capability to withstand turbulent conditions in production."

He goes on to give a simple example of killing all pods in a namespace of a Kubernetes cluster running on OpenShift. "This example is composed of a few lines of Bash, and the result is that in our project, all pods are killed." Of his second example, he explains, "Here, the pods are killed randomly, hence it is deemed more complex than using a chaos engineering tool with a few lines of Bash."
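
The Bash scripts from the talk are not reproduced here. As a rough illustration only, the random variant could be sketched in Python as follows. The pod names and the `demo` namespace are hypothetical, and the sketch only builds the `kubectl delete pod` command line instead of running it, so it is safe to try without a cluster.

```python
import random
import shlex


def pick_victim(pod_names, rng=None):
    """Return one pod name chosen uniformly at random."""
    pods = list(pod_names)
    if not pods:
        raise ValueError("no pods to choose from")
    rng = rng or random.Random()
    return rng.choice(pods)


def delete_command(pod_name, namespace):
    """Build the kubectl command line that would delete a single pod."""
    return ["kubectl", "delete", "pod", pod_name, "--namespace", namespace]


# Hypothetical pod names; in practice they would come from the cluster.
pods = ["web-1", "web-2", "db-0"]
victim = pick_victim(pods)
print(shlex.join(delete_command(victim, "demo")))
```

Passing a seeded `random.Random` as `rng` makes the choice reproducible, which can help when an experiment needs to be replayed.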

"As in the previous two examples, you saw the process of conducting chaos can be a bit boring, I have made a new tool in the past year. In this new version of Kubeinvaders, we have Kubernetes worker nodes, this way, it's much better to do chaos engineering and also because the developers can practice it too, as opposed to just the DevOps Engineers," he adds.

Kubeinvaders is a game with a simple interface in which the aliens are pods or nodes, and other information about the cluster is visible during play.

So what exactly does Kubeinvaders do?

  1. Kills the pods.

  2. Performs chaos experiments against Kubernetes worker nodes. The experiments are customizable, which is very useful because you can prepare your own chaos engineering experiments to test how resilient your cluster is.

  3. Checks pods against a variety of best practices, with a focus on production readiness and security. A StackRox tool, an open-source Kubernetes linter, is used to see whether your pods are configured correctly or need to be corrected from a security or high-availability standpoint. For example, if you do not define a readiness probe in your deployment, that is reported as a problem.

  4. Shows logs of pods.

  5. Exposes Prometheus metrics.

Use Cases:

  1. Test how resilient Kubernetes clusters are against unexpected pod deletion and random chaos experiments against worker nodes.

  2. Collect chaos metrics and compare them with others, such as service response time. The Kubeinvaders metrics are most useful when set against the metrics of other services. For example, combined with Blackbox Exporter metrics, they can show that a service stopped working because all of its pods were killed.

  3. Encourage developers to conduct chaos experiments even during the development phase.
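
The second use case can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own code: it parses two snapshots in the Prometheus text format and checks whether a probe went down while the deleted-pod counter grew. `probe_success` is the Blackbox Exporter's probe result metric; `example_deleted_pods_total` is a hypothetical name standing in for the Kubeinvaders counter, and the sample values are invented.

```python
def parse_samples(text):
    """Map each series (metric name plus labels) to its float value.

    Simplified: skips comments and blank lines and assumes no trailing
    timestamps on the sample lines.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def outage_explained_by_chaos(before, after):
    """True if the probe went from up to down while pods were being deleted."""
    went_down = before["probe_success"] == 1.0 and after["probe_success"] == 0.0
    pods_deleted = (
        after["example_deleted_pods_total"] > before["example_deleted_pods_total"]
    )
    return went_down and pods_deleted


# Invented snapshots taken before and after a chaos session.
BEFORE = "# TYPE probe_success gauge\nexample_deleted_pods_total 0\nprobe_success 1\n"
AFTER = "example_deleted_pods_total 12\nprobe_success 0\n"

print(outage_explained_by_chaos(parse_samples(BEFORE), parse_samples(AFTER)))
```

In a real setup the comparison would more likely be done in a dashboard or query over the scraped data; the point here is only that the chaos counter gives context to the availability metric.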

Eugenio describes pods as "relatively ephemeral and disposable entities." When a pod gets created, either directly by you or indirectly by a controller, it is scheduled to run on a node in your cluster. The pod remains on that node until its process terminates, the pod object is deleted, the pod is evicted for lack of resources, or the node fails.

"In this period, some people use this project for fun and demos, so there is a community that uses this project, contributes, and makes some pull requests on our GitHub repository," he says.

Eugenio then walks through the Kubeinvaders interface, starting with the connection status to Kubernetes, which is important. On the left side, you can find the URL of the Kubernetes cluster, the current namespace, and the number of running pods. With key A, you can start the automatic pilot, which launches multiple instances of Kubeinvaders and so helps you run aggressive chaos tests against your cluster.

One new feature added to the game: if you move the spaceship over an alien, you can see the logs of that pod, and with key R you can refresh the log, which is useful for developers running experiments in that particular namespace. Key K checks pods against a variety of best practices. Key W shows the worker nodes of your cluster; for larger clusters, you will see only a subset of the nodes. Nodes can be killed by shooting at them. You can also start the default chaos experiment against a node, which launches a stress-ng process that drives up the load average on the node.

"My preferred feature is to use N, with which we can jump between the namespaces when you store Kubeinvaders as a subset of the namespace that you want to stress and keep the pod in a particular namespace while jumping," Eugenio says.


Sometimes, you have to tune the position of the aliens to make the chaos engineering tests better. The following are the parameters:


"To make sense of the tool, I have added a custom Prometheus exporter, and this is the configuration for scraping these metrics, which is very simple. So first, you can have a job name like KubeInvaders, and these are some important targets. In this case, I will reach directly to the Kubernetes service to scrape these metrics. If you have many instances of KubeInvaders in different namespaces, you can add more objects to the array of targets," he explained.


  1. Total number of chaos jobs executed per node

  2. Total number of chaos jobs executed against all worker nodes

  3. Total number of deleted pods

  4. Total number of deleted pods per namespace
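
The scrape configuration shown in the talk is not reproduced here. A minimal sketch of what such a job might look like is generated below with a small Python helper. `scrape_configs`, `job_name`, `static_configs`, and `targets` are standard Prometheus configuration keys; the service host names and port `8080` are assumptions made for illustration.

```python
import json


def render_scrape_config(job_name, targets):
    """Render a Prometheus scrape_configs fragment as YAML text."""
    lines = [
        "scrape_configs:",
        f"  - job_name: {json.dumps(job_name)}",
        "    static_configs:",
        "      - targets:",
    ]
    # json.dumps yields a double-quoted string, which is also a valid YAML scalar.
    lines += [f"          - {json.dumps(target)}" for target in targets]
    return "\n".join(lines) + "\n"


# Hypothetical service addresses: one Kubeinvaders instance per namespace.
print(render_scrape_config("kubeinvaders", [
    "kubeinvaders.namespace-a.svc.cluster.local:8080",
    "kubeinvaders.namespace-b.svc.cluster.local:8080",
]))
```

Adding another instance in a different namespace then amounts to appending one more entry to the `targets` list.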



Eugenio concludes his talk with an overview of the roadmap for KubeInvaders. It includes more chaos experiments, additional Prometheus metrics, and a complete rewrite of the game in JavaScript, which will remove the current dependency on the game engine and allow the project to be completely open source.
