Create Chaos in Databases


About the speaker

Vadim Tkachenko


Co-Founder & CTO, Percona


Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. Vadim leads Percona Labs, which focuses on technology research and performance evaluations of Percona's and third-party products. His expertise in LAMP performance and multi-threaded programming helps optimize MySQL and InnoDB internals to take full advantage of modern hardware. He also co-authored the book High Performance MySQL: Optimization, Backups, and Replication, 3rd Edition.

About the talk

A database is a critical piece of infrastructure. By applying some chaos, we can improve the resiliency and reliability of the system that handles our data. In this talk, I will discuss how we evaluate the resiliency and reliability of our Percona XtraDB Cluster Operator (MySQL) and Percona Server for MongoDB Operator. We run MySQL and MongoDB in Kubernetes, providing automatic deployment and management of database nodes, and we need to make sure our data and management layers can sustain different kinds of failures. I will discuss what kinds of failures we look for, how we apply them, and how the databases handle them.


Create Chaos in Databases

For yet another insightful session on day 2 of Chaos Carnival 2021, we had with us Vadim Tkachenko, CTO and Co-Founder of Percona, speaking about "Create Chaos in Databases".


Percona has been a leader in developing open-source databases since 2006 and was founded on the belief that you should have access to scalable, secure, efficient, and resilient features from your database software, without vendor lock-in.

Percona helps eliminate slow applications and systems and reduces unnecessary database costs by supplementing your in-house teams, ensuring you have the expertise to manage the ever-changing technology and tools your unique data environment demands.

Database infrastructures are more complex than ever. Percona builds open-source databases and software to help you reduce the complexity, costs, and effort of managing your ever-growing database environment, whether on-premises or in the cloud. We help companies deploy, manage, and optimize databases to meet the ever-expanding needs of their customers.

Percona is a single vendor for the world's most popular open-source databases, such as MySQL, PostgreSQL, and MariaDB, on the most popular platforms, including Kubernetes, so they can run anywhere, anytime.


In Kubernetes, Percona has two main offerings: the Percona XtraDB Cluster Operator and the Percona Server for MongoDB Operator.

  • Percona XtraDB Cluster Operator

It is an open-source product that consists of multiple MySQL servers connected through group communication. With this setup, we provide a high-availability solution: the cluster runs on multiple nodes, and when one node goes down, the remaining nodes continue to serve the application. A bigger goal is to guarantee consistency. In the context of a fully transactional SQL workload, whichever node your current transaction runs on, we guarantee that the data will be consistent, which is achieved by synchronous communication between the nodes. A few components are critical for the cluster to work properly. One is storage, because database performance depends on storage performance. The network is another, since the cluster relies on connectivity for communication and message delivery.


  1. You want to run your databases the same way you run your application infrastructure: databases and infrastructure as code. Many companies are adopting Kubernetes at scale.
  2. Kubernetes (and its many variants) is becoming the industry standard for managing these environments.
  3. The portability provided by containers and Kubernetes enables companies to avoid vendor lock-in and assists in managing at scale.


  1. The Percona Kubernetes Operators help you run databases using Kubernetes primitives. They are designed for critical databases such as Percona XtraDB Cluster, which is based on MySQL, and Percona Server for MongoDB.
  2. We provide Level 5 ("Auto Pilot") lifecycle management.
  3. Automated provisioning and configuration management.
  4. Seamless upgrades for minor and major versions.
  5. Fully automated backups and failure-recovery capabilities.
  6. Integrated metrics, logging, and analysis let you see what is going on inside your databases.
  7. Horizontal or vertical scaling and configuration tuning.
  8. All managed through a Kubernetes operator.
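To make the declarative, operator-driven deployment above concrete, here is a condensed sketch of a PerconaXtraDBCluster custom resource. The field layout follows the operator's example manifest, but the name, image tag, and sizes here are illustrative assumptions; consult the operator's bundled cr.yaml for the full, current schema.

```yaml
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: cluster1            # illustrative cluster name
spec:
  pxc:
    size: 3                 # three MySQL nodes, enough to keep a quorum
    image: percona/percona-xtradb-cluster:8.0   # assumed image tag
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 6Gi    # per-node persistent storage
  haproxy:
    enabled: true           # HAProxy in front of the cluster
    size: 3
```

Applying a manifest like this with `kubectl apply` is what turns "databases as code": the operator reconciles the declared state, provisioning pods, storage, and the proxy layer automatically.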


  1. Fully automated deployments
  2. High availability with no single point of failure
  3. Fully automated self-healing, automatically recovering from the failure of a single database member
  4. Requires an odd number of nodes (three or more)
  5. Automatic handling of pod failures; that is, the cluster sustains failures of (n-1)/2 pods. For example, in a 3-node cluster, 1 pod can fail; in a 5-node cluster, 2 pods can fail; and so on.
  6. Recovery from total network outages
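The (n-1)/2 fault-tolerance rule from the list above can be written as a small helper; the function name is illustrative, not part of any Percona API:

```python
def tolerable_pod_failures(cluster_size: int) -> int:
    """Return how many simultaneous pod failures a cluster of the given
    size can sustain while keeping a majority quorum: (n - 1) // 2."""
    if cluster_size < 1:
        raise ValueError("cluster size must be positive")
    return (cluster_size - 1) // 2

# A 3-node cluster survives 1 pod failure, a 5-node cluster survives 2.
for n in (3, 5, 7):
    print(f"{n}-node cluster tolerates {tolerable_pod_failures(n)} failure(s)")
```

This is also why the cluster requires an odd number of nodes: going from 3 to 4 nodes adds cost without increasing the number of tolerable failures.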


"Yes, Kubernetes is pretty ready for databases. During this year, the network plug-ins for Kubernetes improved significantly, and there has been great progress in the storage and snapshot capabilities available through the operators' APIs. Even after all this, I would say it depends on your environment and database usage. If you have a small number of dedicated high-end database servers, a dedicated DBA team, and you handle multiple applications and workloads, then my answer would be no," Vadim explains.

He continues, "However, if you have multiple database servers, 30-40+ database instances created on demand and disposable for test and dev usage, no dedicated DBA team, and you rely a lot on automation with the operator, my answer would be yes."


  1. Multiple database instances with unpredictable application workloads, where unexpected bugs and queries might occur
  2. Massive-scale deployments
  3. Network plug-ins software
  4. Storage plug-ins software
  5. The software comes with bugs

"Kubernetes makes testing easier by automating the database deployment. But when it comes to automating failure scenarios, I found two interesting tools: Chaos Mesh, with its YAML file definitions, and LitmusChaos, which we use as a chaos testing tool for Percona XtraDB Cluster."
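As an illustration of the YAML-based definitions Vadim mentions, a minimal Chaos Mesh pod-kill experiment could look like the following sketch. The resource name, namespace, and label selector are assumptions; match them to whatever labels your database pods actually carry.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: kill-one-db-pod     # illustrative name
  namespace: chaos-testing  # assumed namespace
spec:
  action: pod-kill          # terminate the pod to simulate a node crash
  mode: one                 # pick a single matching pod at random
  selector:
    labelSelectors:
      app.kubernetes.io/name: percona-xtradb-cluster  # assumed label
```

Once applied, the operator's self-healing should bring the killed member back and rejoin it to the cluster, which is exactly the behavior the chaos test is meant to verify.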

In the demo, Vadim goes on to test pod disruption to simulate node crashes and node loss, and network disruption to simulate partial link loss and the loss of all node links.
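The network-disruption side of the demo can likewise be expressed as a Chaos Mesh experiment. This is a sketch of a partial-link-loss scenario; as before, the name, namespace, labels, and percentages are illustrative assumptions:

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: db-partial-link-loss  # illustrative name
  namespace: chaos-testing    # assumed namespace
spec:
  action: loss                # drop a fraction of packets on the link
  mode: all                   # apply to every matching pod
  selector:
    labelSelectors:
      app.kubernetes.io/name: percona-xtradb-cluster  # assumed label
  loss:
    loss: "30"                # drop ~30% of packets
    correlation: "25"
  duration: "60s"             # run the fault for one minute
```

Swapping `action: loss` for `action: partition` (with a `target` selector) is one way to model the total-link-loss case, after which the cluster's quorum handling and recovery can be observed.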


  1. Easy to define a failure scenario
  2. Failure-testing automation
  3. Coverage for multiple failures


Vadim suggests that users try chaos testing themselves, as chaos is not only for operator developers but also for SREs and QA engineers. When selecting a tool, it is very important to test how your application will handle different database failures, and with this Vadim concludes his talk.
