HaloDoc is the most popular all-around healthcare application in Indonesia. We are a rapidly growing startup founded in 2016, and our mission is to simplify healthcare and bring quality care to people across Indonesia. We partner with more than 3,500 pharmacies in over 100 cities to bring medicine to people’s doorsteps. Recently, we launched a premium appointment service that partners with more than 500 hospitals, allowing patients to book a doctor’s appointment inside our application.
Even a quick reading of this profile hints at the mission-critical nature of our service.
The platform is composed of several microservices hosted across hybrid infrastructure, mainly on a managed Kubernetes cloud, with an intricately designed communication framework. We also use AWS cloud services such as RDS, Lambda and S3, and rely on a substantial set of open source tools, especially from the Cloud Native Computing Foundation landscape, to support the core services.
As the architect and manager of site reliability engineering (SRE) at HaloDoc, I am responsible above all for keeping these services running smoothly. In this post, I’d like to provide a quick snapshot of why and how we use chaos engineering as one of the means to maintain resilience.
Check out the full blog post at The New Stack here: