Resilience and Reliability for SODA with Chaos

ChaosWheel

About the speakers

Sanil Kumar

TOC, AWG Lead & Chief Architect,

Huawei

Sanil has over 20 years of industry experience in Linux, open source, the ARM ecosystem, cloud, and emerging technologies (like edge computing, blockchain, and distributed computing). He holds multiple patents, has published papers, and has presented sessions and keynotes at international conferences. He was a member of the Linux Foundation Yocto Project Advisory Board, collaborates with CCICI and blockchain groups, and organizes the CNCF and SODA Foundation meetup groups. He received the Zinnov Senior Technical Role Model award in 2020.

Kiran Mova

TOC & AWG,

MayaData

Kiran Mova is a passionate technologist with 19 years of extensive experience working for product companies like Cisco, Lucent, and Novell. Kiran has led efforts in performance engineering and in simplifying products in terms of usability, operation, and deployment complexity. He has also introduced multi-tenancy and enabled SaaS services in domains like IP/optical networks, storage, and access management.

Ashit Kumar

Lead Architect,

SODA Foundation

Ashit Kumar is a Senior System Architect at Huawei with over 13 years of experience. He completed his postgraduate studies at the Computer Science Department of Pune University (PUCSD). He currently works on the Linux Foundation project OpenSDS. Before joining Huawei, Ashit worked at Veritas Technologies, where he developed products in multiple domains such as SRM, DRaaS, and replication.

About the talk

An introduction to SODA and how chaos engineering can support resilience and reliability for SODA projects.

Transcript

For yet another insightful session on day 2 of Chaos Carnival 2021, we had with us Ashit Kumar, Kiran Mova, and Sanil Kumar from SODA Foundation, to speak about "Resilience and Reliability for SODA with Chaos".

Sanil kickstarts the session by saying, "Any cloud-native technologies you see today are completely data-driven, and anything related to data becomes critical. SODA helps you in providing a data framework and providing unified solutions."

Data explosion is the new normal. According to Google, every month brings significant growth and new predictions about the data explosion. New data is being generated ever more easily and quickly, whereas its value is slow and hard to derive and define, so the overall value extracted from data remains very low. Finally, data is closely tied to efficiency.

Scale is also unpredictable. For example, we start with a small set of infrastructure, and then, as the services grow, we need different types of infrastructure such as edge, cloud, and enterprise.

There are many solutions to these hindrances, which is itself a problem. The Data & AI Landscape 2020 maps the technologies and software available for various use cases. Some of them are open source and others are commercial, each built to address demand for specific features. That's where SODA comes into the picture, providing streamlined solutions and data frameworks that match consumer needs.

CONNECTING DATA: SODA ONE DATA STORE

There are many software platforms for your workloads, for example, Kubernetes and VMware. Data has to be connected to your storage, which can be on-prem, open source, cloud, edge, etc. This data has to be stored and processed, which means connecting each platform to its storage. This is where SODA comes to the rescue, providing a unified interface across clouds and seamless southbound data connectivity. In simple words, you can connect any platform to any storage at any time through SODA's open data framework.
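
To make the "any platform to any storage" idea concrete, here is a minimal Python sketch of a unified northbound interface that routes requests to pluggable southbound drivers. The class and method names (StorageDriver, UnifiedDataAPI, create_volume) are hypothetical illustrations, not SODA's actual API, and the in-memory driver merely stands in for a real on-prem, cloud, or edge backend.

```python
from abc import ABC, abstractmethod
from typing import Dict


class StorageDriver(ABC):
    """Hypothetical southbound driver contract (not SODA's real API)."""

    @abstractmethod
    def create_volume(self, name: str, size_gb: int) -> str:
        """Create a volume and return a backend-qualified identifier."""


class InMemoryDriver(StorageDriver):
    """Toy backend standing in for an on-prem, cloud, or edge storage system."""

    def __init__(self, backend: str) -> None:
        self.backend = backend
        self.volumes: Dict[str, int] = {}

    def create_volume(self, name: str, size_gb: int) -> str:
        self.volumes[name] = size_gb
        return f"{self.backend}/{name}"


class UnifiedDataAPI:
    """Single northbound entry point that dispatches to registered drivers."""

    def __init__(self) -> None:
        self._drivers: Dict[str, StorageDriver] = {}

    def register(self, backend: str, driver: StorageDriver) -> None:
        self._drivers[backend] = driver

    def create_volume(self, backend: str, name: str, size_gb: int) -> str:
        if backend not in self._drivers:
            raise KeyError(f"no driver registered for backend {backend!r}")
        # The caller never talks to the storage system directly.
        return self._drivers[backend].create_volume(name, size_gb)


api = UnifiedDataAPI()
api.register("on-prem", InMemoryDriver("on-prem"))
api.register("cloud", InMemoryDriver("cloud"))
print(api.create_volume("cloud", "pg-data", 10))  # cloud/pg-data
```

The point of the design is that adding a new storage type means registering one more driver, while platforms keep calling the same northbound interface.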

ONE DATASTORE: CHALLENGES

Sanil explains some of the challenges in implementing one data store.

  1. Unity across vendors, platforms, and domains:

Unifying vendors across these specifications is not easy, as each software application serves different purposes.

  2. Unsettled types of storage:

Different types of storage, such as file, data, block, research, etc., are not streamlined across legacy solutions. Connectivity not only to the storage but also across edge, cloud, and core is an issue.

  3. Unprecedented data mobility and scale

  4. Distributed heterogeneous

There are many use cases for cloud solutions, such as Kubernetes, connected cars, and data lifecycle management. SODA has been able to design a common use case that spans edge, cloud, and even core. The end-user use case covers the edge data center, data mobility, and the enterprise data center with multi-cloud storage.

SODA OPEN DATA FRAMEWORK

SODA ODF consists of multiple projects that together provide a complete data framework. The Unified Data Management API sits on the Global Controller (metadata, policy, etc.) and covers data types such as file, block, and object. The Storage Backend Manager manages storage across edge, core, and cloud. Through the northbound plugins, platforms connect to the unified API for file and block storage, so SODA provides storage ranging from object and file to block. Delfin collects metadata from heterogeneous storage backends. For multi-cloud, SODA builds on object APIs to provide a complete data framework from the object side. From there, you can manage lifecycle, policies, and other parameters.

SODA PROJECT LANDSCAPE

Here Ashit explains the different projects SODA Foundation uses for reliability and resiliency, which fall predominantly into the following categories. They help in building holistic solutions that can be used in varied scenarios and environments.

  1. ODF Projects

  2. SODA Incubator Projects

  3. Member Projects

  4. Ecosystem Projects

SODA DEPLOYMENT VIEW

With multiple projects come multiple services. The SODA Installer installs the different projects seamlessly with a single click, and new projects can integrate into the installer framework. Because the projects are microservices-based and run as multiple services, they require monitoring across projects. The unified API interface must not become a single point of failure, so it needs to be resilient and highly available. And since SODA introduces a new stack between northbound platforms and southbound storage, we need to ensure that it adds little overhead and does not compromise performance.
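
One common way to keep a unified API from being a single point of failure is to run several replicas and let callers fail over between them. The Python sketch below assumes that pattern; it is not taken from SODA, and the replicas are modelled as plain callables (the hypothetical down and healthy functions) rather than real HTTP endpoints.

```python
from typing import Callable, List, Sequence


class AllReplicasDown(Exception):
    """Raised when every replica of the (hypothetical) unified API fails."""


def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Return the first successful response, trying replicas in order.

    Each replica is modelled as a zero-argument callable; in a real deployment
    it would wrap a request to one API server instance.
    """
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except ConnectionError as exc:  # treat connection failures as "replica down"
            errors.append(exc)
    raise AllReplicasDown(f"all {len(errors)} replicas failed: {errors}")


def down() -> str:
    raise ConnectionError("replica-a unreachable")


def healthy() -> str:
    return "volume list from replica-b"


print(call_with_failover([down, healthy]))  # volume list from replica-b
```

Trying replicas in a fixed order keeps the sketch simple; a production client would more likely add timeouts, retries with backoff, or a load balancer in front of the replicas.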

COMPLEX QUALIFICATION

To qualify parameters like overall quality, performance benchmarking, reliability testing, high-availability testing, and resilience across projects, we faced several challenges: multiple scenarios and manual tests, fault injection, resource control, the need for a large cluster of real hardware, and test scenarios that are hard to simulate.

Apart from these, we identified a few more challenges whose solutions are key to achieving reliability and resilience: a single point of failure in the unified interface, consistency and availability issues in the global metadata, and the high cost of downtime in a data framework that serves multiple services.

The whole process starts by identifying the steady-state conditions; then a fault is introduced into the system. If the system regains its steady-state conditions, it is resilient; if it fails to regain them, you have found a weakness in the system. The goal here is to run controlled chaos, not random chaos. We usually try to run our experiments at all stages: development, CI pipelines, long-running tests, staging, pre-production, and production.
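
The experiment loop described above (verify the steady state, inject a fault, check whether the steady state holds or is regained) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the API of LitmusChaos, Chaos Mesh, or any other tool: run_experiment, ToyService, and the timeout and polling values are hypothetical, and the toy service simply "self-heals" after a fixed number of health checks.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExperimentResult:
    steady_before: bool                # did the hypothesis hold before the fault?
    recovered: bool                    # was steady state held or regained in time?
    recovery_seconds: Optional[float]  # time until steady state was observed


def run_experiment(steady_state: Callable[[], bool],
                   inject_fault: Callable[[], None],
                   remove_fault: Callable[[], None],
                   timeout_s: float = 30.0,
                   poll_s: float = 1.0) -> ExperimentResult:
    # 1. Controlled start: never inject a fault into an already unhealthy system.
    if not steady_state():
        return ExperimentResult(False, False, None)
    inject_fault()
    start = time.monotonic()
    try:
        # 2. Poll the steady-state check until it holds or time runs out.
        while time.monotonic() - start < timeout_s:
            if steady_state():
                return ExperimentResult(True, True, time.monotonic() - start)
            time.sleep(poll_s)
        # 3. Steady state was not regained: a weakness has been found.
        return ExperimentResult(True, False, None)
    finally:
        # 4. Always roll the fault back to limit the blast radius.
        remove_fault()


class ToyService:
    """Simulated 3-replica service that self-heals after `heal_after` checks."""

    def __init__(self, heal_after: int) -> None:
        self.replicas = 3
        self.heal_after = heal_after
        self._checks_while_degraded = 0

    def kill_replica(self) -> None:
        self.replicas -= 1
        self._checks_while_degraded = 0

    def healthy(self) -> bool:
        if self.replicas < 3:
            self._checks_while_degraded += 1
            if self._checks_while_degraded > self.heal_after:
                self.replicas = 3  # pretend a controller restores the replica
        return self.replicas == 3


svc = ToyService(heal_after=2)
result = run_experiment(svc.healthy, svc.kill_replica, lambda: None,
                        timeout_s=1.0, poll_s=0.01)
print(result.steady_before, result.recovered)  # True True
```

Checking the steady state before injecting anything is what keeps the chaos controlled: if the system is already degraded, the experiment aborts instead of making things worse.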

In SODA, our predefined steady states are data consistency; data availability, meaning that reads and writes succeed with data integrity; latency and responsiveness; and uptime of services such as the control plane and metadata management.
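
The steady states listed above could be expressed as simple probes that a chaos experiment evaluates before and after a fault. The following Python sketch assumes a key-value style read/write interface and uses hypothetical thresholds (95th-percentile read latency under 50 ms, at least 99% successful health checks); neither the probe names nor the numbers come from SODA.

```python
import hashlib
import statistics
import time
from typing import Callable, Dict

# Hypothetical thresholds, for illustration only.
P95_LATENCY_BUDGET_MS = 50.0
MIN_AVAILABILITY = 0.99


def check_integrity(write: Callable[[str, bytes], None],
                    read: Callable[[str], bytes]) -> bool:
    """Data integrity: write a probe object, read it back, compare SHA-256 digests."""
    payload = b"steady-state-probe"
    expected = hashlib.sha256(payload).hexdigest()
    write("probe-key", payload)
    return hashlib.sha256(read("probe-key")).hexdigest() == expected


def p95_latency_ms(read: Callable[[str], bytes], key: str, samples: int = 20) -> float:
    """Latency: time repeated reads and return the 95th percentile in milliseconds."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        read(key)
        latencies.append((time.perf_counter() - start) * 1000)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(latencies, n=20)[-1]


def availability(health_check: Callable[[], bool], attempts: int = 10) -> float:
    """Uptime: fraction of health checks that succeed."""
    return sum(1 for _ in range(attempts) if health_check()) / attempts


def evaluate_steady_state(probes: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every probe and report which steady-state conditions hold."""
    return {name: probe() for name, probe in probes.items()}


# Demo against an in-memory dict standing in for a storage backend.
store: Dict[str, bytes] = {"probe-key": b"seed"}
results = evaluate_steady_state({
    "data_integrity": lambda: check_integrity(store.__setitem__, store.__getitem__),
    "read_latency": lambda: p95_latency_ms(store.__getitem__, "probe-key") < P95_LATENCY_BUDGET_MS,
    "control_plane_uptime": lambda: availability(lambda: True) >= MIN_AVAILABILITY,
})
print(results)
```

Probes like these could serve as the steady_state check in an experiment loop, so a fault is judged by the same conditions the team already treats as "healthy".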

Kiran concludes the talk by saying, "SODA exploring chaos projects through CNCF and finding Chaos Mesh, LitmusChaos, TiKV, and OpenEBS has helped in making our use cases much more resilient and reliable."