Kyle Kirwan, Co-founder and CEO of Bigeye, speaks with Derek Strauss, Gavroshe Chairman and CDO Magazine Editorial Board Member, in a video interview about the persistent challenge of data lineage, data observability, and the need for a dedicated function within organizations to tackle the data lineage challenge.
Bigeye is a data observability platform that helps data engineering and science teams ensure their data is accurate and reliable.
Mastering data lineage has been a continuous challenge, says Kirwan. While building Bigeye, the initial focus was on monitoring aspects of data observability, which included gathering metrics from data sources to detect anomalies within data pipelines.
Further, it was assumed that customers would have some lineage solution, which sometimes turned out to be true to a degree, says Kirwan. However, it was found that most organizations do not have a complete lineage map, although they may have limited lineage capability from a data catalog or within a data warehouse.
Consequently, a massive challenge arises in an enterprise situation where transactional systems such as Salesforce and Workday, a big data lake, and several data warehouses exist together. This creates a significantly complex pipeline, each piece of which belongs to a different vendor.
Lineage becomes extremely challenging in such environments, says Kirwan. He mentions that some large regulated organizations, especially in financial services, invest in building custom in-house lineage solutions.
According to Kirwan, building such in-house solutions takes up a huge part of the data platform budget, which makes lineage a critical problem for large organizations, especially regulated ones. Further, there are no commercially available solutions yet that can cover the entire pipeline, he asserts.
While some modern solutions advertise lineage, they work only with modern data sources and fall apart when it comes to SaaS scripts or IBM Db2 databases.
When asked about his thoughts on data observability, Kirwan states that it is a recent development that has roots in data quality. He maintains that there are numerous frameworks and dimensions to measure data quality, but the burning question is whether the data can be trusted to be fit for a particular use case.
For instance, data used for a predictive model to forecast prices and net sales is different from data used for financial reporting. Even if the data source is the same, the requirements are different, says Kirwan.
Under the data quality bucket, the challenge was ultimately to meet those requirements. In contrast, data observability is a wider definition of the problem, one that demands understanding everything that goes on in the data pipeline and not just quality at the end of it.
Ultimately, observability tries to answer the question of whether the organization knows where the data comes from, where it goes, and what its state is at all times, which in turn ensures quality.
Furthermore, observability helps organizations understand how data is being used, including its performance and cost characteristics. It is therefore a general approach to understanding the inside of the data architecture, one that enables asking questions about it.
Kirwan states that in the long run, organizations need a 360-degree view of their data and a dedicated group to tackle that challenge. From having pipelines in place to data contracts, from data lifecycle management to incident management, there should be a group that has the complete picture.
Kirwan believes that having that capability is not a technology problem but rather a discipline and process problem. However, most organizations are generally less mature in managing data infrastructure than they are in managing traditional software or conventional infrastructure.
In conclusion, he states that practices like DevOps and site reliability engineering (SRE) are well-established and proven in those areas, with a mature ecosystem built around them. Kirwan affirms that in the future, data organizations will have a model focused on uptime, performance, and scalability, mirroring what SRE and DevOps have done for traditional software.
CDO Magazine appreciates Kyle Kirwan for sharing his invaluable data insights with our global community.