This article is a compilation of my learning notes focused on how GenAI (Generative Artificial Intelligence) is related to the current industry practice of “data and analytics” for CDAOs.
I have kept these 3 questions in mind:
How do CDAOs perceive GenAI?
How can CDAOs integrate GenAI into the current context of “data and analytics”?
What can CDAOs do to embrace GenAI as part of their strategic framework?
When we talk about AI, most people unconsciously or unintentionally use the “narrower definition” — they abbreviate “Generative AI” to just “AI.” In contrast, many data, analytics, and technology professionals use the academic or practitioners’ original and much broader definitions.
The CDAO community may need to quickly adjust its vocabulary to stay precise and relevant in conversations with the general population (the users of data and analytics solutions or technologies).
Artificial Intelligence (AI) is a system (including hardware, software, methodologies, and processes) that enables computers and machines to simulate human intelligence and problem-solving capabilities. (Source: IBM)
Generative AI refers to deep-learning models that can generate high-quality text, images, and other content based on the data they were trained on. (Source: IBM) The diagram below, from Dataiku, shows the relationship among AI, ML, DL, NLP, and LLM.
Conceptual phase of Artificial Intelligence: 1950s
Machine Learning phase: 1960s to 1990s
Deep Learning phase: 2000s to 2020s
Generative AI phase: 2022 to date (also could be nested in the Deep Learning phase)
Diagram below.
Over the last couple of years, we have seen some “irrational exuberance” in AI hype and AI-washing by visionaries, technological trailblazers, and marketers. However, from the viewpoint of data and analytics practitioners as well as business professionals, the AI hype cycle has diluted the rich content of a complex and complicated topic into a few sound bites and buzzwords.
The AI hype cycle has confused the targeted beneficiaries and users of AI, both consumers and enterprises, and created unrealistic expectations among them.
a) Short-term and long-term views: Roy Amara was an American scientist, futurist, and President of the Institute for the Future. He famously coined the adage that became known as Amara’s Law: “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.”
b) Tools evolution vs. a new digital species: Human history shows an evolution in the tools we leverage: domesticated animals, simple to complex mechanical tools (including steam engines and electricity), and digital tools.
In the future, quantum computers or memcomputers/biocomputers may exponentially improve our capacity to store data and perform computation for decision-making.
Others see GenAI leading to a new digital species that acts as a human companion, as Mustafa Suleyman argued in a recent TED talk (source).
Two categories of intelligence, digital and biological: Prof. Geoffrey Hinton delivered a Romanes Lecture (University of Oxford) titled "Will digital intelligence replace biological intelligence?"
Digital intelligence is known for being logical, accurate, precise, repeatable, and consistent. It is stronger in “immortality of computation,” “communication efficiency,” and “rapid improvement and learning.”
Biological intelligence is known for tolerance, fuzziness, ambiguity, aging, memory loss (or selective memory), emotion, biases, probability, and directional or ballpark thinking. It is stronger in “energy consumption efficiency” and “potential scalability.”
c) Treating GenAI as an early phase of a still-evolving human-like brain: GenAI is built on the foundation of deep learning with neural networks at a huge scale, enabled by (1) available “big data” with volume, variety, and velocity, (2) significant improvement in computing capabilities (huge scale, much lower cost than a few decades ago, and potentially even lower cost in the future, like electricity, to benefit the entire human population), and (3) breakthroughs in research methodologies and algorithm development.
We can review and reflect on how neural net systems work, compared with the human brain.
We can compare how GenAI works in the context of how a human child learns, grows, develops, and matures over time.
GenAI's flaws, such as hallucinations, can be seen as analogous to the mistakes human toddlers frequently make while they are learning and while their thinking and behaviors are being tested, validated, and fine-tuned.
We are effectively building GenAI, and future human-shaped AGI-powered robots, as systems that work similarly to the human brain: listening and seeing (and, now or in the future, smelling, touching, tasting, etc.), learning human language, reading, observing, interacting with humans and other systems, receiving feedback and incentives, improving, and repeating these processes millions of times.
It is critical to verify the input (to avoid content pollution before the value system is properly set up, just as we do for human children) and validate the output (teaching children what is right or wrong, and how to behave in the social context).
For business strategic and operational decision-making, to best utilize the available data, tools, and best practices, I see four complementary domains of knowledge and skill sets:
a) Technology
b) Data Management
c) Analytics
Descriptive analytics: what has happened? (e.g., static reports, interactive dashboards)
Diagnostic analytics: why did an event happen? (e.g., identifying drivers to an outcome, and their relative contribution to the outcome)
Predictive analytics: what will happen in the future? (e.g., how will customers respond to our marketing offer?)
Prescriptive analytics: what should we do now to optimize the future outcome? (e.g., how should we reroute our delivery truck drivers’ trips to reduce fuel cost and minimize traffic accidents?)
Cognitive analytics: help us make better decisions with the right information and reasons, and automate the decision-making process. (e.g., customer service chatbots with various language skills and product knowledge; autocompleting your sentences as you type; GenAI is the next generation of much more powerful cognitive analytics in the business decision context.) A minimal code sketch of the first two types follows this list.
d) Business decision-making and operations
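As a minimal, hypothetical sketch of the first two analytics types listed above (descriptive and predictive), the Python snippet below aggregates a toy sales table and fits a naive trend line; the data, column names, and forecast logic are made up for illustration only.

```python
# A hypothetical sketch: descriptive vs. predictive analytics on a toy sales table.
import numpy as np
import pandas as pd

sales = pd.DataFrame({
    "month": [1, 2, 3, 4, 5, 6],
    "region": ["East", "West", "East", "West", "East", "West"],
    "revenue": [100, 90, 110, 95, 120, 105],
})

# Descriptive analytics: what has happened? (total revenue by region)
print(sales.groupby("region")["revenue"].sum())

# Predictive analytics: what will happen? (naive linear trend, forecasting month 7)
slope, intercept = np.polyfit(sales["month"], sales["revenue"], 1)
print("Forecast for month 7:", round(slope * 7 + intercept, 1))
```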
Data and technology professionals are familiar with this framework: ACID in database management systems.
Atomicity: The entire transaction takes place at once or does not happen at all.
Consistency: The database must be consistent before and after a transaction.
Isolation: Multiple transactions occur independently and do not interfere with each other; the intermediate state of one transaction is not visible to the others.
Durability: The data remains in a permanent state in the database after a successful transaction. If the system suffers from a failure, the data should remain intact so that it can be accessed.
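As a brief illustration of these properties, the sketch below uses Python's built-in sqlite3 module, whose connection context manager commits a transaction on success and rolls it back on failure; the accounts table and transfer amounts are hypothetical.

```python
# A minimal sketch of atomicity and consistency semantics using SQLite;
# the accounts table and the transfer amounts are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 100.0), (2, 50.0)")
conn.commit()

try:
    with conn:  # commits the transaction on success, rolls back on any exception
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
        # If anything failed between the two updates, neither change would persist.
except sqlite3.Error:
    pass  # rolled back: the database stays in its previous consistent state

print(conn.execute("SELECT id, balance FROM accounts").fetchall())
```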
At its core, current GenAI handles language (documents, conversations) and audio/video content as input and output; I have not yet seen strong use cases for GenAI solving complex “data and analytics” problems.
The data fed into LLMs (Large Language Models) is mostly text from the Internet. Neither the input nor the output is yet organized into the ACID framework, even with enhancements such as Retrieval-Augmented Generation (RAG) or other techniques.
Retrieval-Augmented Generation (RAG) is a framework that improves Large Language Models (LLMs) by integrating external knowledge sources. RAG uses semantic similarity to retrieve relevant document chunks from an external knowledge base before generating a response. This process helps LLMs access real-time data, improve contextualization, and provide up-to-date responses.
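The toy sketch below illustrates that retrieve-then-generate flow; embed() is a stand-in for a real embedding model, and the assembled prompt would normally be sent to an LLM, so nothing here reflects any specific vendor's API.

```python
# A toy sketch of Retrieval-Augmented Generation. embed() is a placeholder for a
# real embedding model; the final prompt would normally be passed to an LLM.
import numpy as np

documents = [
    "Q3 revenue grew 12% year over year.",
    "The data warehouse refresh runs nightly at 2 a.m.",
    "Customer churn is defined as 90 days of inactivity.",
]

def embed(text: str) -> np.ndarray:
    """Placeholder: a pseudo-random stand-in vector; a real system would call an embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return rng.normal(size=128)

def retrieve(query: str, k: int = 2) -> list:
    """Rank documents by cosine similarity to the query and keep the top k."""
    q = embed(query)
    scores = []
    for doc in documents:
        d = embed(doc)
        scores.append(float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d))))
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

def build_prompt(query: str) -> str:
    """Ground the question in retrieved context before calling an LLM."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How is customer churn defined?"))
```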
The daily "bread-and-butter" work of CDAO teams does not overlap with what GenAI is particularly good at: reading, organizing, and summarizing large volumes of text content.
Large Language Models (LLMs) are named language models for a reason. They may not be compatible with the ACID data management framework discussed above.
GenAI tools excel at “autocompleting to finish our sentences” with contextual learning abilities. They have not yet proven capable of becoming fully autonomous, independent problem-solvers in highly structured data and analytics environments.
It is difficult for data and analytics professionals to translate business requirements into the technical language that database developers or data engineers need to do ETL (extract, transform, load) or ELT (extract, load, transform).
Will the next generation of GenAI be able to make business requirements gathering and ETL tasks easier and less time-consuming?
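As a hypothetical illustration of that translation gap, the sketch below turns one plain-language requirement ("count monthly active customers by region") into a small pandas transform; the table, columns, and activity rule are invented for illustration.

```python
# Hypothetical business requirement: "count monthly active customers by region,
# where a customer is active if they placed at least one order in that month."
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3],
    "region": ["East", "East", "West", "West", "West"],
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-02-10", "2024-01-20", "2024-02-02", "2024-02-15"]
    ),
})

# Transform step: derive the month, then count distinct active customers per region/month.
orders["month"] = orders["order_date"].dt.to_period("M")
active = (
    orders.groupby(["region", "month"])["customer_id"]
    .nunique()
    .reset_index(name="active_customers")
)
print(active)
```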
We need an AI system (or GenAI’s next generation) to:
(must do 1): Read messy raw data or half-cleaned / half-organized data,
(must do 2): Conduct the ETL or ELT, so that the data is properly structured, clean, up-to-date or real-time, and ready for wide-ranging business analytics use,
(must do 3): Provide relevant, accurate, curated, secure, and accessible data to the analysts and data scientists for specific use cases,
(optional 1): Generate descriptive analytics solutions,
(optional 2): Present diagnostic analytics summaries,
(optional 3): Predict various future events with useful probabilities,
(optional 4): Recommend various options and choose the best one to take action now to achieve optimal business outcomes (metrics).
(optional 5): Automate implementation of the best action plans listed above.
There are many articles on GenAI use cases, including “How People Are Really Using GenAI” in Harvard Business Review, by Marc Zao-Sanders, founder and CEO of Filtered.com. Looking through thousands of comments on sites such as Reddit and Quora, the author’s team found six top-level themes of what generative AI is being used for:
Technical Assistance and Troubleshooting – (23%)
Content Creation and Editing – (22%)
Personal and Professional Support – (17%)
Learning and Education – (15%)
Creativity and Recreation – (13%)
Research, Analysis, and Decision-Making – (10%)
The AI hype and AI-washing of the past two years have confused and disoriented non-technical audiences, making the CDAO teams' current work more difficult. At the same time, the hype has had unintended benefits: it forces our stakeholders to "drink from the fire hose," provokes the “why don't we” questions from CEOs and board members, and frees up a certain amount of budget to experiment and innovate.
Together, the negative and positive sides of AI hype present an opportunity for CDAOs to start thinking about how to integrate GenAI into their current workstreams.
GenAI is not a threat to CDAOs, but an opportunity.
We must get to know GenAI better.
We should appreciate what GenAI has been doing and will do for the enterprise.
We can also help accelerate GenAI's maturity journey to minimize risks, so that users will trust (but verify) GenAI's output, and so that GenAI or its next generation will deliver a solid return on investment.
CDAOs may allocate 70% of their time and energy to current data and analytics practices, and 30% to exploring new ways (including GenAI) to continuously maximize the value we deliver.
An interesting way to think about GenAI in the “traditional data and analytics” work:
“Traditional data and analytics” work requires higher levels of precision and carries higher risk or reward.
GenAI use cases (refer to the list of use cases above) work best for directional, imprecise, ad-hoc answers, centered on language and lower-stakes tasks.
We see the following examples of dichotomy:
Consumer banking: marketing vs. credit risk management
Marketing can tolerate a wider range of probabilities and directional decisions – if we market to the wrong customers, they will just ignore our messages and will not cause tangible harm to the business.
Risk management decisions require a much higher level of confidence – one single bankruptcy account can wipe out the gross profit from 20-100 good accounts.
Healthcare: non-clinical operational efficiency vs. clinical decisions
Non-clinical decisions can better tolerate the lack of precision.
Clinical decisions need to carry much higher precision (even though it is still a probabilistic play).
GenAI can be used for less risky decisions, while the “traditional data and analytics” approach will remain the best practice for decisions with higher stakes. The two are not mutually exclusive paths; they are complementary. CDAOs should embrace GenAI as an additional tool in the short term and stay flexible and adaptive in the long term.
We should move quickly but methodically to weave GenAI into the existing CDAO framework and roadmap.
CDAOs need to provide value-based services to internal and external customers.
What can be the bridge between “traditional data and analytics practices” and the new GenAI tools?
Potentially, the answer is a semantic layer. A semantic layer is a business representation of data that allows end users to access it independently using common business terms. It sits between the data store and the consumption tools used by end users.
The layer provides a unified view of data across an organization by mapping data definitions from different sources into familiar business definitions.
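A minimal sketch of the idea, assuming a hypothetical schema: a dictionary maps business terms to physical tables and column expressions, and a helper builds SQL from those terms so end users never have to touch the physical names.

```python
# A minimal sketch of a semantic layer: business terms mapped to a hypothetical
# physical schema, so a query can be built from plain business language.
SEMANTIC_LAYER = {
    "net revenue": {"table": "fct_orders", "expr": "SUM(order_amount - discount_amount)"},
    "region": {"table": "dim_customer", "expr": "sales_region"},
}

def build_query(metric: str, dimension: str) -> str:
    """Translate e.g. 'net revenue by region' into SQL against the physical tables."""
    m, d = SEMANTIC_LAYER[metric], SEMANTIC_LAYER[dimension]
    return (
        f"SELECT {d['expr']} AS {dimension.replace(' ', '_')}, "
        f"{m['expr']} AS {metric.replace(' ', '_')} "
        f"FROM {m['table']} JOIN {d['table']} USING (customer_id) "
        f"GROUP BY {d['expr']}"
    )

print(build_query("net revenue", "region"))
```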
The next generations of GenAI should go beyond language models. They should be able to digest data architectural design, data content, and business contexts, and perform process automation of the following workflow:
Raw data or operational data store
Data warehouse, data lake, or data lakehouse
Data management: semantic layer, data catalog, data quality, data profile, and/or lineage
Analytics solutions: descriptive, diagnostic, predictive, prescriptive, and cognitive
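A highly simplified sketch of that workflow as orchestrated pipeline stages; every function below is a placeholder for real ingestion, storage, data management, and analytics tooling, not an actual library call.

```python
# A highly simplified sketch of the workflow above; each function is a placeholder
# for real ingestion, storage, data management, and analytics tooling.
def ingest_raw_data(sources):
    """Raw data / operational data store."""
    ...

def load_to_warehouse(raw):
    """Data warehouse, data lake, or data lakehouse."""
    ...

def apply_data_management(tables):
    """Semantic layer, data catalog, data quality, profiling, and lineage."""
    ...

def run_analytics(curated):
    """Descriptive, diagnostic, predictive, prescriptive, and cognitive analytics."""
    ...

def pipeline(sources):
    raw = ingest_raw_data(sources)
    tables = load_to_warehouse(raw)
    curated = apply_data_management(tables)
    return run_analytics(curated)
```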
“Data Management” has been the most obvious logjam or bottleneck for CDAOs and their teams to generate the maximum business value.
I see a tremendous opportunity to use GenAI and/or its next-generation products to truly democratize data management and “lift the floodgate” for data-enabled insights.
Business leaders will be able to make strategic and operational decisions in real time with much higher confidence and higher accuracy.
AI (in the broadest sense), including GenAI and the “traditional data and analytics” practice, will enable CDAO teams to be more effective and more efficient than they are today.
About the Author
"Mr. Ge” Gary Cao advises CEOs and board of directors on analytics and AI strategy and serves as a fractional Chief Data and Analytics Officer (CDAO) or Chief AI Officer (CAIO). With 20 years of experience as a CDAO and serial founder of internal analytics startups, Cao has had a strong track record at 8 companies with revenue between US$40 million and US$120 billion.
Cao’s journey spans industries including healthcare (provider and payor), distribution, retail and ecommerce, financial services, banking, marketing, and credit/insurance risk. He is an expert advisor at the International Institute for Analytics and Rev1 Ventures startup studio and has been a speaker or panelist on various events and podcasts.