As a chaos solution engineer, you will also be empowered to explore and identify chaos engineering use cases in diverse areas such as service mesh environments and cloud-native security. The role also includes generating concise solution documentation around these chaos experiments, which can act as a ready-reckoner for SREs and DevOps engineers.
In this role, you'll get to:
- Create and maintain fault injection scenarios (crystallized into LitmusChaos experiments and workflows) for popular cloud-native application workloads.
- Cover workloads that include databases (Percona MySQL, MongoDB, DataStax Cassandra, etc.), message queues (Strimzi/Confluent Kafka), and storage providers (OpenEBS, Longhorn).
- Analyze the cloud-native use cases involving these applications, including their lifecycle management, points of failure, resilience checkpoints, and steady-state hypotheses.
- Produce experiments that power the catalog available at hub.litmuschaos.io.
- Present the findings from these experiments at meetups or conferences. This is a plus and is appreciated, though not mandatory.
- Liaise with (specific) app communities pertaining to app categories in CNCF to identify, implement, publish, document, and maintain chaos use cases.
We will expect you to have:
- B.E/B.Tech/MCA (or anyone with the skills listed here).
- 5+ years of industry experience.
- Familiarity with, and hands-on experience using, distributed systems.
- Knowledge of stateful applications in the CNCF landscape.
- Experience as an SRE or DevOps engineer actively involved in testing and maintaining deployment (staging/production) environments.
- Ability to code in Golang (preferred) or Python/Ansible.
We offer:
- Competitive compensation
- Competitive benefits package, including medical insurance
- Remote work
- Flexible work timings
Team ChaosNative originally created the open source project LitmusChaos to drive innovation in cloud-native chaos engineering. Litmus is now a CNCF project with a large community of users and contributors. With significant enterprise adoption of Litmus, ChaosNative provides commercial support to its worldwide customer base. ChaosNative develops solutions and other services in the area of chaos engineering and cloud-native reliability.
Values and Culture
We are making every effort to make chaos experiments ubiquitous and to make them resilient in their own right. We empower ourselves to listen to the resilience needs of enterprises across a variety of industries and to push that feedback directly into the tools and products we build and the services we deliver.