Not Maintaining Data Quality Today Would Mean Garbage In, Disasters Out — Acceldata CEO and Founder

(US & Canada) Rohit Choudhary, CEO and Founder of Acceldata, speaks with Mark Johnson, Chief Growth Officer at CoStrategix, in a video interview about his professional journey, data observability and the need for it, the relationship between AI and data observability, how data observability contributes to ensuring high-quality data for AI training and deployment, and model decay and how to prevent it.

Shedding light on his 17-year professional journey, Choudhary splits his career into two parts. In the first half, he worked as an application engineer. In the second half, he shifted to data engineering, working with startups that handled massive volumes of data, and that experience inspired him to start Acceldata.

When asked to define data observability, Choudhary says that it is not a new paradigm. To explain further, he shares a familiar analogy of using X-rays or MRIs when a bone is injured; an X-ray detects a fracture. Similarly, as systems, IT deployments, applications, and data products have grown more complex, it has become much harder to diagnose issues without deep probes that identify what is going wrong.

Adding on, Choudhary states that the concept of monitoring has always existed. However, with the increasing complexity, people now want to understand the state of a system without instrumenting their code or being explicit about what they are looking for. Rather, people expect to be automatically informed about a system’s status and alerted if additional attention is needed to ensure business objectives are met.

Delving further, Choudhary states that he finds the paradigm of observability interesting. Reflecting on the past, he recalls how large-scale IT application deployments happened 20 years ago. For example, he says that booking a flight required navigating an IT application, initially via a website and later through mobile apps, making user experience monitoring essential.

In contrast, enterprises today are increasingly data-driven and rely heavily on the data they collect to make decisions, says Choudhary. A decade ago, he adds, a single application stored all of its data in a relational database for weekly reporting.

Today, data is scattered across various sources including relational databases, third-party data stores, cloud environments, on-premise systems, and hybrid models, says Choudhary. This shift has made data management much more complex, as all of these sources need to be harmonized in one place.

However, in the world of AI, both structured and unstructured data need to be of high quality. Choudhary states that not maintaining data quality in the AI age would lead to garbage in, disasters out.

Highlighting the relationship between AI and data observability in enterprise settings, he says that given the role of both structured and unstructured data in enterprises, data observability will become more critical.

Additionally, Choudhary says that structured data represents the quantitative side of the business, including factual information about how the world operates. However, AI also requires the unstructured business context, such as documents from wikis, emails, design documents, and business requirement documents (BRDs). He stresses that this unstructured data adds context to the factual information on which business models are built.

The paradigms for structured and unstructured data are very similar, says Choudhary. As a result, data observability will become increasingly important, driving significant adoption over the next three to five years.

Explaining how observability creates accelerators, he states that while manual observability may have worked five or ten years ago, it is not feasible today. In the current scenario, organizations deal with hundreds of models, thousands of reports, and sometimes tens of thousands of ETL pipelines.

To manage such a large, complex environment, more than manual oversight is required, says Choudhary. Therefore, he believes that observability will be at the core of how these intricate data and AI systems operate in the future.

Speaking of unique platform capabilities, Choudhary says that Acceldata’s biggest differentiator is the ability to synthesize signals across areas such as compute infrastructure, data pipelines, costs, user behavior, and metadata. This capability enables the company not only to identify when something goes wrong but also to provide the top five reasons behind it and recommend corrective steps to users.

Also, Choudhary affirms that the company proactively highlights vulnerabilities that need attention before they cause issues. He adds that while data management has been a continuous challenge, Acceldata is providing the tools that allow customers to manage their data properly and efficiently, addressing a long-standing need.

Sharing a customer success story, Choudhary mentions deploying Acceldata’s solution at one of the biggest telecom companies in the U.S. He notes that in just 72 hours, Acceldata’s AI and machine learning-based observability solution identified that 35% of the data stored on their massive clusters wasn’t being used. The company then freed up that 35% of capacity, allowing them to reallocate the compute resources for generating new insights and running additional machine learning models.

Next, Choudhary emphasizes how data observability contributes to ensuring high-quality data for AI model training and deployment. He states that the traditional data platform is rapidly evolving into a combined data and AI platform, and the same teams managing data and analytics will oversee the AI side. In addition to the existing machine learning models, the new models will expand the scope of AI's role in organizations.

Additionally, since these models will be used for continuous, real-time insight generation rather than static reports, they will need frequent updates to adapt to new architectures, says Choudhary.

Moreover, in the growing multimodal world, if organizations are not monitoring the quality of all the data fed into these models, it will become challenging to ensure high-quality insights. Ultimately, by validating the data at the source, organizations can ensure that the models produce accurate, high-quality outcomes.
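
To make the idea of validating data at the source concrete, the following is a minimal sketch of batch-level quality checks an organization might run before data reaches a model. The pandas-based approach, column names, and thresholds are illustrative assumptions, not a description of Acceldata's product.

```python
# Minimal sketch of source-level data quality checks, assuming incoming records
# arrive as a pandas DataFrame; column names and thresholds are hypothetical.
import pandas as pd


def validate_batch(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable quality issues found in the batch."""
    issues = []

    # Completeness: flag columns whose null rate exceeds a tolerance.
    null_rates = df.isna().mean()
    for column, rate in null_rates.items():
        if rate > 0.05:  # assumed 5% tolerance
            issues.append(f"{column}: {rate:.1%} nulls exceeds 5% tolerance")

    # Validity: flag out-of-range values for a known numeric field.
    if "purchase_amount" in df.columns:
        bad = (df["purchase_amount"] < 0).sum()
        if bad:
            issues.append(f"purchase_amount: {bad} negative values")

    # Volume: flag suspiciously small batches.
    if len(df) < 1000:  # assumed expected minimum batch size
        issues.append(f"batch has only {len(df)} rows, expected >= 1000")

    return issues


if __name__ == "__main__":
    batch = pd.DataFrame({
        "purchase_amount": [12.5, -3.0, 40.0, None],
        "store_id": ["A1", "A1", "B2", "B2"],
    })
    for issue in validate_batch(batch):
        print("QUALITY ALERT:", issue)
```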

When asked how Acceldata offsets model decay over time, Choudhary first explains model decay with an example. He says that in ZIP code 95129, a model was created with the assumption that an equal number of men and women would visit a store.

However, if the foot traffic in that area shifts to 30% men and 70% women, the original assumption becomes outdated. As the facts change, the model and its weights need to be updated accordingly, which points to the need for a proactive approach to prevent model decay.
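
Choudhary's foot-traffic example can be expressed as a simple drift check. The sketch below uses a chi-squared test from SciPy to compare the model's baked-in 50/50 assumption against recently observed visitor counts; the counts and significance threshold are illustrative assumptions.

```python
# Minimal sketch of detecting the demographic drift in Choudhary's example:
# the model assumed a 50/50 split of men and women, but recent foot traffic
# shows roughly 30/70. Counts and the alpha threshold are illustrative.
from scipy.stats import chisquare


def check_proportion_drift(observed_counts, assumed_proportions, alpha=0.01):
    """Flag drift when observed category counts deviate from assumed proportions."""
    total = sum(observed_counts)
    expected_counts = [p * total for p in assumed_proportions]
    stat, p_value = chisquare(f_obs=observed_counts, f_exp=expected_counts)
    return p_value < alpha, p_value


if __name__ == "__main__":
    # Baseline assumption baked into the model: equal numbers of men and women.
    assumed = [0.5, 0.5]
    # Recent observations: 300 men and 700 women out of 1,000 visits.
    observed = [300, 700]
    drifted, p = check_proportion_drift(observed, assumed)
    if drifted:
        print(f"Input drift detected (p={p:.2e}); the model's weights are stale.")
```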

Through the example, Choudhary asserts that model decay and poor outcomes typically occur because there is a failure to recognize these shifts in the data.

Furthermore, the time of day could also influence the type of shoppers entering the store. This suggests the need to monitor the data and its schema to account for changing facts, ensuring the model's predictions remain accurate.

In conclusion, Choudhary states that another important aspect, especially in the generative AI world, is verifying whether inferences are accurate or if they are drifting more than before. He says that if the ground truths remain the same but the model's predictions have changed, this needs to be addressed quickly, as something may have gone wrong in the training process.
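
One way to operationalize this kind of inference check, sketched below under assumed interfaces, is to periodically re-score a fixed "golden" dataset whose ground-truth labels have not changed and alert when the model's agreement with that ground truth drops below a stored baseline. The model interface, baseline accuracy, and tolerance here are hypothetical.

```python
# Minimal sketch of an inference-drift check: if ground truth is stable but the
# model's predictions on that ground truth have shifted, raise an alert.
# The predict callable, baseline, and tolerance are assumptions for illustration.
from typing import Callable, Sequence


def inference_drift_check(
    predict: Callable[[Sequence], Sequence],
    golden_inputs: Sequence,
    golden_labels: Sequence,
    baseline_accuracy: float,
    tolerance: float = 0.02,
) -> bool:
    """Return True if accuracy on the golden set fell below baseline - tolerance."""
    predictions = predict(golden_inputs)
    correct = sum(1 for p, y in zip(predictions, golden_labels) if p == y)
    accuracy = correct / len(golden_labels)
    drifted = accuracy < baseline_accuracy - tolerance
    if drifted:
        print(f"Inference drift: accuracy {accuracy:.3f} vs baseline {baseline_accuracy:.3f}")
    return drifted


if __name__ == "__main__":
    # Toy stand-in model that now mislabels half of the unchanged golden set.
    def toy_model(xs):
        return ["buy" if x % 2 == 0 else "skip" for x in xs]

    inputs = list(range(10))
    labels = ["buy"] * 10  # ground truth has not changed
    inference_drift_check(toy_model, inputs, labels, baseline_accuracy=0.95)
```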

CDO Magazine appreciates Rohit Choudhary for sharing his insights with our global community.
