Embed Compliance and Ethics Into the Core of AI Development — Head of Data Management and Governance

(US & Canada) Marla Dans, Head of Data Management and Governance at Chief Data Office, speaks with Martin Couturier, Director of AI Innovation at Saige, in a video interview about leveraging AI in data governance, maintaining ethics and regulatory compliance, the AI hallucination issue, the role of InfoSec in governance, balancing the usage of non-transparent and transparent AI algorithms, and monitoring the performance of black box AI models.

Dans begins by stating that data governance is more than just the “data or AI police.” In fact, governance can leverage AI to streamline its processes for both people and data.

AI can help companies improve data governance programs in many ways, including automated data cataloging, classification, and metadata generation, says Dans. She further notes that data stewards find it challenging to write a definition from scratch, but engagement rates are higher when they are asked to validate an AI-generated definition instead.
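That draft-then-validate workflow is straightforward to prototype. Below is a minimal sketch using the OpenAI Python client; the model name, prompt, and data element are illustrative assumptions, not details from the interview:

```python
# Hedged sketch: an LLM proposes a business-glossary definition that a
# data steward then approves or edits. Model and prompt are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def draft_definition(term, sample_values):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                f"Draft a one-sentence business glossary definition for the "
                f"data element '{term}'. Example values: {sample_values}."
            ),
        }],
    )
    return response.choices[0].message.content

draft = draft_definition("customer_churn_flag", ["Y", "N"])
print(f"Proposed definition (pending steward validation):\n{draft}")
```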

Next, Dans shares that data quality tools also utilize AI to profile data; AI can propose hundreds of data quality rules beyond what any one practitioner would think of. She mentions temperature, top-k, and top-p sampling, the GenAI parameters used to vary the responses returned, and notes that AI identifies data security and privacy anomalies as well.
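For readers unfamiliar with those sampling parameters, here is a minimal, self-contained sketch of how temperature, top-k, and top-p jointly shape which token a generative model emits next. It is a toy illustration, not any particular vendor's implementation:

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_k=50, top_p=0.9, rng=None):
    """Pick the next token id from raw logits using temperature,
    top-k, and top-p (nucleus) filtering."""
    rng = rng or np.random.default_rng()
    # Temperature: <1 sharpens the distribution, >1 flattens it.
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Top-k: keep only the k most likely tokens.
    order = np.argsort(probs)[::-1][:top_k]
    kept_probs = probs[order]
    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    cutoff = np.searchsorted(np.cumsum(kept_probs), top_p) + 1
    order, kept_probs = order[:cutoff], kept_probs[:cutoff]
    kept_probs /= kept_probs.sum()
    return rng.choice(order, p=kept_probs)

# Higher temperature / larger k and p -> more varied completions.
print(sample_token([2.0, 1.0, 0.5, -1.0], temperature=0.7, top_k=3, top_p=0.95))
```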

Moving forward, Dans states that the data quality challenge lies not so much in identifying anomalies as in remediation. It boils down to finding the root cause in the data stack, finding the right person to fix the issue, identifying dependencies, and propagating the fix at the right time and location.

For smooth orchestration, some data quality tools have started leveraging AI and lineage for automated data cleansing, as sketched below. Other AI usage areas include bias detection and mitigation, predictive analytics for future data quality issues and compliance risk, and prescriptive analytics for data governance metrics.
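The remediation workflow Dans describes (walk the lineage upstream to find root-cause candidates, then propagate the fix downstream) can be sketched in a few lines. The lineage graph below is a hypothetical example, not a real data stack:

```python
from collections import deque

# Hypothetical lineage graph: dataset -> list of upstream sources.
LINEAGE = {
    "exec_dashboard": ["sales_mart"],
    "sales_mart": ["orders_clean"],
    "orders_clean": ["orders_raw"],
    "orders_raw": [],
}

def upstream_root_causes(failing_dataset):
    """Walk lineage upstream from a failing data quality check to
    collect candidate root-cause datasets, nearest first."""
    seen, queue, candidates = set(), deque([failing_dataset]), []
    while queue:
        node = queue.popleft()
        for parent in LINEAGE.get(node, []):
            if parent not in seen:
                seen.add(parent)
                candidates.append(parent)
                queue.append(parent)
    return candidates

def downstream_impact(fixed_dataset):
    """List datasets that consume the fixed one, so the fix can be
    propagated (re-run) in dependency order."""
    return [d for d, parents in LINEAGE.items() if fixed_dataset in parents]

print(upstream_root_causes("exec_dashboard"))  # ['sales_mart', 'orders_clean', 'orders_raw']
print(downstream_impact("orders_clean"))       # ['sales_mart']
```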

Maintaining regulatory compliance while striving for business success is a challenge, says Dans. Meeting it calls for a comprehensive strategic approach that integrates governance, transparency, continuous monitoring, and stakeholder engagement.

Organizations must embed compliance and ethics into the core of AI development for responsible innovation and risk mitigation, says Dans. When a data governance program covers this end to end, it also builds trust with regulators, customers, and broader society.

To maintain regulatory compliance, organizations must understand the regulatory requirements and develop standards for fairness, transparency, accountability, and privacy. It is also critical to create clear policies that support these standards and then weave ethics and compliance into the design of the development life cycle.

That life cycle spans data collection, data processing, model training, deployment, and monitoring, with regular audits of AI models for bias. Organizations should also ensure that the training data is representative of the population and use interpretable models for transparency in model decision-making. This is where data governance of algorithms comes into play; a simple bias-audit sketch follows.
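As one concrete example of what a bias audit can check, the sketch below computes a demographic parity gap, the difference in positive-prediction rates across groups. The data is a toy illustration, and real audits would use several fairness metrics (e.g., equalized odds), not just this one:

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Difference in positive-prediction rate between groups; one simple
    bias-audit signal among many."""
    group = np.asarray(group)
    y_pred = np.asarray(y_pred)
    rates = {g: y_pred[group == g].mean() for g in np.unique(group)}
    return max(rates.values()) - min(rates.values()), rates

# Toy audit: loan-approval predictions for two demographic groups.
gap, rates = demographic_parity_gap(
    y_pred=[1, 0, 1, 1, 0, 0, 1, 0],
    group=["a", "a", "a", "a", "b", "b", "b", "b"],
)
print(rates, gap)  # flag the model for review if the gap exceeds a threshold
```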

When asked about the hallucination issue in AI models, Dans believes that hallucinations must be rigorously managed. Critical management aspects include data quality preparation; clear ethical and bias guidelines to steer model development and validation; and monitoring and auditing of systems to continuously assess model performance, accuracy, and the incidence of hallucinations.
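Measuring the incidence of hallucinations in production usually means checking generated answers against trusted sources. The sketch below is a deliberately naive grounding screen based on token overlap; production systems typically use entailment models or retrieval-based verification instead:

```python
def unsupported_sentences(answer, source_text):
    """Naive grounding check: flag answer sentences whose content words
    rarely appear in the retrieved source. Only a rough first screen."""
    source_words = set(source_text.lower().split())
    flagged = []
    for sentence in answer.split("."):
        words = [w for w in sentence.lower().split() if len(w) > 3]
        if words and sum(w in source_words for w in words) / len(words) < 0.5:
            flagged.append(sentence.strip())
    return flagged

answer = "Revenue grew 12% in Q3. The CEO resigned in October."
source = "Quarterly filing: revenue grew 12% in Q3 on strong demand."
print(unsupported_sentences(answer, source))  # ['The CEO resigned in October']
```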

Then, she says, there must be a governance body or committee to monitor, audit, and make gray-area decisions regarding responsible AI usage.

Continuing, Dans suggests information security teams work closely with data governance by identifying and classifying sensitive information. Specifically, in the context of AI models, they can help prevent model manipulation by bad actors.

InfoSec teams enforce strict access controls using MFA and role-based access controls, along with data encryption and anonymization in both privacy and commercially sensitive data contexts, says Dans. They also adopt secure coding practices, vulnerability assessments, penetration testing, simulated attacks on AI model systems, and adversarial testing.
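Role-based access control, one of the measures she lists, reduces to a deny-by-default permission check. A minimal sketch, with hypothetical roles and permissions:

```python
# Hypothetical role-based access control for data and model assets.
ROLE_PERMISSIONS = {
    "data_steward": {"read:catalog", "write:definitions"},
    "ml_engineer": {"read:training_data", "deploy:model"},
    "auditor": {"read:catalog", "read:audit_logs"},
}

def is_allowed(role, action):
    """Deny by default: a role may act only if the permission is granted."""
    return action in ROLE_PERMISSIONS.get(role, set())

assert is_allowed("ml_engineer", "deploy:model")
assert not is_allowed("auditor", "deploy:model")
```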

InfoSec also ensures AI systems and data remain available and operational when needed, even in the face of cyberattacks. By implementing such robust security measures, companies can protect AI models from manipulation by threat actors.

Delving further, Dans discusses how to balance the usage of non-transparent and transparent AI algorithms. Elaborating, she states that there is sometimes value in a black-box model, and one must know how to take black-box algorithms and solutions and make them trustworthy.

For transparent models, organizations must rely on clean data, algorithm transparency, validation, result monitoring, and robust data governance for all training data. For black-box algorithms, one must understand the use case and stakeholder needs and manage risk with humans in the loop.

Furthermore, Dans suggests organizations validate prediction outcomes and take a cautious approach to prediction. She then mentions explainable AI techniques that account for a model's predictions by approximating the model with smaller interpretable models and by weighing the impact of individual features.
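One common form of that technique is a global surrogate: fit a small, interpretable model to the black box's own predictions and inspect it. A minimal sketch with scikit-learn, where the synthetic data and model choices are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

# Train the interpretable surrogate on the black box's own predictions,
# not the true labels, so the tree explains the model, not the data.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how often the surrogate agrees with the black box.
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"surrogate fidelity: {fidelity:.2%}")
print(export_text(surrogate, feature_names=[f"f{i}" for i in range(5)]))
```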

Continuing, Dans says there are also hybrid approaches that combine explainable and black-box models. She recommends using a transparent model for critical decision points and a black-box model for complex but less critical aspects.

Organizations could also layer the two models, wherein the interpretable model makes the initial predictions and the black box then refines them, Dans opines.
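One way to read that layering idea: let the transparent model decide whenever it is confident and defer the ambiguous cases to the black box. A sketch under those assumptions, where the models and the confidence threshold are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=8, random_state=1)
simple = LogisticRegression(max_iter=1000).fit(X, y)              # transparent layer
complex_model = RandomForestClassifier(random_state=1).fit(X, y)  # black box

# The transparent layer keeps cases it is confident about; the rest
# are refined by the black box.
proba = simple.predict_proba(X)[:, 1]
confident = np.abs(proba - 0.5) > 0.3
preds = np.where(confident, (proba > 0.5).astype(int), complex_model.predict(X))
print(f"{confident.mean():.0%} of cases decided by the transparent layer")
```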

Thereafter, she discusses ways to detect issues while monitoring a black box AI's performance. As an example, she refers to ChatGPT, a model she uses for explanations yet which remains a black box, and states that she relies on her own judgment rather than taking the output as is.

Key metrics for monitoring performance include the area under the ROC curve (AUC) and the confusion matrix. In addition, organizations can create synthetic data sets to test the model's behavior under various conditions, identify biases and vulnerabilities, and simulate real-world scenarios across diverse and extreme situations.
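Both metrics are one-liners with scikit-learn; the toy labels and scores below are illustrative:

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                    # ground-truth labels
y_score = [0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6]   # model scores
y_pred = [int(s >= 0.5) for s in y_score]            # thresholded predictions

print("AUC:", roc_auc_score(y_true, y_score))
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
```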

In conclusion, Dans also mentions setting up monitoring and alert mechanisms for significant performance deviations, as sketched below. Organizations can also engage external auditors to provide an unbiased review of the model's performance and maintain a feedback loop for users.
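A minimal sketch of such an alert, assuming an AUC baseline and a tolerance threshold (both values are hypothetical):

```python
# Compare a rolling window of live AUC scores against a baseline
# and flag significant drops.
BASELINE_AUC = 0.91
ALERT_DROP = 0.05

def check_performance(window_aucs):
    """Return an alert message if recent AUC falls well below baseline."""
    recent = sum(window_aucs) / len(window_aucs)
    if BASELINE_AUC - recent > ALERT_DROP:
        return f"ALERT: AUC dropped to {recent:.3f} (baseline {BASELINE_AUC})"
    return None

print(check_performance([0.90, 0.89, 0.91]))  # None: within tolerance
print(check_performance([0.84, 0.82, 0.85]))  # triggers an alert
```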

CDO Magazine appreciates Marla Dans for sharing her insights with our global community.
