Building a Data Fabric Architecture to Deliver on Your Data Strategy

Inderpal Bhandari, Global Chief Data Officer | IBM

For businesses to compete at the highest level, they need to make the most of their enterprise data: honing raw information into insights that can supercharge efficiency, reduce risk and increase revenue growth. But to truly maximize the transformative power of data, it must be easy to access and explore, not just by data scientists, but by any employee looking to make intelligent, informed decisions.

Despite making data a top priority, many organizations still struggle to get their data in order, especially in complex scenarios like hybrid multicloud environments. The problem worsens as the volume of enterprise data continues to expand (by an estimated 42.2 percent annually) and as the number of sources and repositories grows. According to a study by BetterCloud, large enterprises today use an average of 150 different SaaS applications, each with a separate database, so much of their data winds up scattered across silos.

In my last article, I shared the importance of establishing a data strategy and the six-step framework we recommend for designing one tailored to your specific business. To make your data strategy a reality, chief data officers need to build a data fabric architecture that eliminates technological complexity, all while managing the strict demands of compliance, security, governance and regulation. At IBM, I’ve seen first-hand the value a data fabric architecture provides when it comes to simplifying data access. A data fabric can, for example, leverage AI to continuously learn patterns in how data is transformed and use them to automate data pipelines, which makes data easier to find and automatically enforces governance and compliance. It significantly improves productivity, accelerates time-to-value for a business, and simplifies compliance reporting.

What is a Data Fabric?

Put simply, a data fabric is a new way for organizations to simplify not only access to their data, but also how that data is preserved. It helps integrate all of an organization’s data pipelines and cloud environments.

Your data may be housed in various places (data warehouses, databases and data lakes) and stored according to specific aspects of your business: HR data in one place, customer service in another, and accounting in yet another. A data fabric relies on data virtualization to harness metadata, enabling multiple sources of data to be easily centralized without replacing existing technology. It allows data to be ingested, integrated and shared across an enterprise in a governed manner, regardless of location, whether on premises or in multiple public cloud environments.
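
To make the idea concrete, here is a minimal sketch (in Python, with invented source and dataset names) of metadata-driven virtualization: a catalog records only where each dataset lives, and queries are routed to the owning source at read time rather than copying records into a central store. It illustrates the pattern, not any particular product’s implementation.

```python
from dataclasses import dataclass, field

# Minimal sketch of metadata-driven virtualization: the catalog knows
# *where* each dataset lives; queries are routed to the source at read
# time instead of copying records into a central store.

@dataclass
class Source:
    name: str                                     # e.g. "hr_warehouse"
    datasets: dict = field(default_factory=dict)  # dataset name -> rows

class VirtualLayer:
    def __init__(self):
        self._catalog = {}          # dataset name -> owning Source

    def register(self, source: Source):
        """Ingest only metadata: which datasets the source exposes."""
        for dataset in source.datasets:
            self._catalog[dataset] = source

    def query(self, dataset: str, predicate):
        """Resolve the dataset's location and filter rows at the source."""
        source = self._catalog[dataset]
        return [row for row in source.datasets[dataset] if predicate(row)]

# Two siloed sources, each still living in its own system (names invented).
hr = Source("hr_warehouse", {"employees": [{"id": 1, "dept": "finance"}]})
crm = Source("crm_lake", {"customers": [{"id": 7, "region": "EMEA"}]})

fabric = VirtualLayer()
fabric.register(hr)
fabric.register(crm)

# One access point, no replication of the underlying data.
print(fabric.query("customers", lambda row: row["region"] == "EMEA"))
```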

What a Data Fabric Does Differently

Traditionally, consolidating all of this data would require tremendous amounts of file movement or copying. A data fabric, however, uses a virtualization layer to aggregate access to all data sources, so you can use the data while minimizing its movement. Moreover, it includes data integration tools that can help clean and move data into a central repository when needed, for example, when latency requirements demand more rapid access.

A data fabric also makes it easier to manage the lifecycle of data. This includes enforcement of access: ensuring the right people have access to the right data, and nothing more. A data fabric automates the process of enforcing privacy and regulatory policies, so sensitive aspects of datasets are automatically redacted for certain users. As a Chief Data Officer, I see firsthand the importance of automated governance and privacy, as well as the challenges that our constantly evolving regulatory environment presents. That’s why the metadata-based automation a data fabric enables is so powerful. It helps you stay abreast of regulations like GDPR and CCPA, as well as industry-specific ones like HIPAA and FCRA.
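
As a rough illustration of how metadata-based enforcement can work, the hypothetical sketch below tags columns with sensitivity labels and masks values for roles that aren’t cleared to see them. The tags, roles and fields are invented for the example, not any specific product’s policy language.

```python
# Hypothetical metadata-driven redaction: column tags plus a per-role
# policy decide what each user may see. Tags, roles, and fields are
# illustrative only.

COLUMN_TAGS = {
    "name": "pii",
    "ssn": "pii",
    "diagnosis": "phi",
    "region": "public",
}

# Which sensitivity tags each role is cleared to read.
ROLE_CLEARANCE = {
    "clinician": {"public", "pii", "phi"},
    "analyst": {"public"},
}

def redact(record: dict, role: str) -> dict:
    """Return the record with fields masked unless the role is cleared."""
    cleared = ROLE_CLEARANCE.get(role, set())
    return {
        col: (val if COLUMN_TAGS.get(col, "public") in cleared else "REDACTED")
        for col, val in record.items()
    }

patient = {"name": "J. Doe", "ssn": "000-00-0000",
           "diagnosis": "A12", "region": "EMEA"}
print(redact(patient, "analyst"))
# {'name': 'REDACTED', 'ssn': 'REDACTED',
#  'diagnosis': 'REDACTED', 'region': 'EMEA'}
```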

How a Data Fabric Is Used

After the data has been connected and governance policies have been applied, a data fabric makes data available to users through an easily accessible catalog. It also provides a foundation for building and running more trustworthy AI and machine learning (ML) applications, as its data integration and automated processing provide more seamless access to the information these tools need.

Within a data fabric, data virtualization tools connect to different data sources, integrate only the metadata required and create a virtual data layer that lets users query source data in real time. There are countless use cases demonstrating how this is beneficial. For example, a university used multicloud data integration through a data fabric to consolidate, organize and analyze its data and build predictive models. A health insurance company built a model from insurance claims data and used a data fabric to integrate and deploy insights into its existing clinical services application to identify patients at high risk for a deadly disease. In both cases, the institutions were able to act swiftly on robust sets of data while ensuring sensitive aspects, like student and patient information, were respected.
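
For a rough analogy of a virtual data layer, the sketch below uses SQLite’s ATTACH to stand in for connectors to two separate systems, so a single SQL interface can join them in place with no copy step. A production virtualization engine would instead push the query down to the actual sources; the table names and data here are invented.

```python
import sqlite3

# Rough analogy for a virtual data layer: SQLite's ATTACH stands in for
# connectors to two separate systems; one SQL interface queries both in
# place, with no copy step.

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS hr")    # the "HR warehouse"
conn.execute("ATTACH DATABASE ':memory:' AS crm")   # the "customer lake"

conn.execute("CREATE TABLE hr.employees (id INTEGER, name TEXT, region TEXT)")
conn.execute("CREATE TABLE crm.accounts (owner_id INTEGER, account TEXT)")
conn.execute("INSERT INTO hr.employees VALUES (1, 'Ana', 'EMEA')")
conn.execute("INSERT INTO crm.accounts VALUES (1, 'Acme Corp')")

# One query spanning both "sources", resolved at read time.
rows = conn.execute("""
    SELECT e.name, e.region, a.account
    FROM hr.employees AS e
    JOIN crm.accounts AS a ON a.owner_id = e.id
""").fetchall()
print(rows)   # [('Ana', 'EMEA', 'Acme Corp')]
```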

When I took on the role of IBM’s Global Chief Data Officer in 2016, I saw an opportunity for data to play a much bigger role throughout our entire organization. While preparing for this data transformation, we also had to prepare for GDPR going into effect in 2018. With the complexity we were facing, a data fabric architecture helped us build an augmented knowledge graph detailing where our data resides, what it’s about and who has access to it. This gave us the solid data foundation we needed to start building automation and AI into our major processes, from supply chain to procurement to quote-to-cash.
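
At its simplest, a knowledge graph like the one described can be modeled as subject-predicate-object triples over datasets, locations, topics and people. The toy sketch below (with invented dataset and team names) shows how such triples answer the questions of where a dataset resides, what it’s about and who has access.

```python
# Toy metadata knowledge graph as subject-predicate-object triples.
# Dataset names, systems, and teams are invented for illustration.

triples = [
    ("supplier_orders", "stored_in", "procurement_warehouse"),
    ("supplier_orders", "about", "supply chain"),
    ("supplier_orders", "accessible_by", "procurement_team"),
    ("quotes", "stored_in", "sales_cloud"),
    ("quotes", "about", "quote-to-cash"),
    ("quotes", "accessible_by", "sales_ops"),
]

def describe(dataset: str) -> dict:
    """Answer where a dataset lives, what it's about, and who can access it."""
    facts = {}
    for subject, predicate, obj in triples:
        if subject == dataset:
            facts.setdefault(predicate, []).append(obj)
    return facts

print(describe("supplier_orders"))
# {'stored_in': ['procurement_warehouse'], 'about': ['supply chain'],
#  'accessible_by': ['procurement_team']}
```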

Data Fabric vs. Data Mesh: Is It Really Either-Or?

One debate I often hear when talking to other data leaders concerns the merits of a data mesh versus a data fabric, and whether an organization should focus on one or the other. While the two are independent concepts, they share the same goal of easy access to data. What surprises many people is that, under the right circumstances, it’s not an either-or decision; the two can in fact be complementary approaches.

Whereas a data fabric is primarily an architecture driven by metadata-based automation, a data mesh separates large enterprise databases into subsystems that can be managed by various teams. A data mesh aligns data sources by business domains or functions with data owners. In practice, this means that subject matter experts with better knowledge of particular business domains can create data products on their own, without having to rely on data engineers to clean and integrate data products downstream.

That’s why a data fabric sets the stage for the implementation of a data mesh, which then gives data owners more capabilities. These include cataloging data assets and transforming them into products; publishing data products to the catalog, searching and finding data products, and querying or visualizing data products; and using insights from metadata to automate tasks.
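
To ground those capabilities, here is a minimal, hypothetical sketch of a domain team publishing a data product to a shared catalog and a consumer discovering it by keyword. The fields and methods are illustrative, not a standard interface.

```python
from dataclasses import dataclass

# Hypothetical sketch: a domain team publishes a data product to a shared
# catalog, and any consumer searches for it. Fields are illustrative only.

@dataclass
class DataProduct:
    name: str
    domain: str         # owning business domain, e.g. "claims"
    owner: str          # subject-matter-expert team responsible for it
    description: str
    tags: tuple

class Catalog:
    def __init__(self):
        self._products = []

    def publish(self, product: DataProduct):
        """Register a domain-owned product so others can find it."""
        self._products.append(product)

    def search(self, keyword: str):
        """Find products whose description or tags mention the keyword."""
        keyword = keyword.lower()
        return [
            p for p in self._products
            if keyword in p.description.lower() or keyword in p.tags
        ]

catalog = Catalog()
catalog.publish(DataProduct(
    name="high_risk_patients",
    domain="claims",
    owner="clinical-analytics",
    description="Patients flagged as high risk from claims history.",
    tags=("claims", "risk"),
))
print([p.name for p in catalog.search("risk")])   # ['high_risk_patients']
```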

It's never been a more exciting, or challenging, time to be a data leader. Not only is there more opportunity for data to be put to work throughout an organization, but we’re also faced with relentless data growth and evolving regulatory environments, and we all bear the responsibility of helping ensure the AI we’re building is trustworthy. Organizations that build a data strategy around a data fabric architecture will empower their data leaders to act as change agents. At IBM, we’ve learned that embarking on a journey to reach the data fabric vision enables our leaders to face these challenges head-on and helps our organization use data as a true differentiator.

About the Author

Inderpal Bhandari is the Global Chief Data Officer at IBM. He has leveraged his extensive experience to lead the company’s data strategy, ensuring that IBM remains the number one AI and hybrid cloud provider for the enterprise. Under his leadership, IBM created the Cognitive Enterprise Blueprint, a roadmap for its clients on their transformation journeys.

Bhandari is an expert in transforming data into business value and improved customer experiences by delivering strategic, innovative capabilities that use analytic insights to enable growth and productivity. In 2017, he was named U.S. Chief Data Officer of the Year by the CDO Club, and he has been featured as an industry expert by The Wall Street Journal, The Washington Post, US News & World Report, CNN, and FOX.

Bhandari earned his Master of Science degree in electrical and computer engineering from the University of Massachusetts and holds a Ph.D. in electrical and computer engineering from Carnegie Mellon University.
