Data Management: The Cornerstone of Successful AI

David Smith, head of GDPR Technology at SAS UK & Ireland, explains how data management underpins the success of AI systems.
AI is increasingly embedded in everyday life, yet misconceptions about the technology can fuel mistrust. Professor Jim Al‑Khalili, incoming president of the British Science Association, warns that a public backlash similar to the early GM debate could emerge unless AI is delivered with greater transparency and public engagement.
Beyond trust, unchecked autonomous models pose a control risk. The 2010 Flash Crash, when U.S. markets plunged 9% in 36 minutes, highlighted how algorithmic trading amplified volatility. Regulatory scrutiny now demands that models be monitored and their outputs auditable.
Using AI for Good
When applied responsibly, AI can transform healthcare—enhancing cancer detection through rapid image analysis—or support conservation by identifying wildlife footprints. Realising these benefits requires a framework that ensures fairness, accountability, transparency and explainability (FATE). In this article we focus on the transparency pillar, which is heavily influenced by how data is managed.
AI’s performance is bounded by the quality of its input data. Building an AI solution typically follows three key stages:
- Data cleansing—removing irrelevant or erroneous records that could bias the model.
- Transformation and enrichment—joining disparate sources and creating derived variables to feed the algorithm.
- Deployment—applying the trained model to live data so that decisions can be made in real time.
Each stage can add value but can also change the outcome. For example, eliminating outliers during cleansing may improve model stability for the majority of cases, yet it can also erase rare but critical signals. Dame Jocelyn Bell Burnell's discovery of pulsars in 1967 illustrates the danger of discarding outliers: she spotted a faint anomaly occupying roughly one part in 100,000 of her chart data and recognised its scientific significance.
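To make the cleansing trade-off concrete, here is a minimal Python sketch (the DataFrame, its 'signal' column and the three-sigma threshold are illustrative assumptions, not part of the original example) that flags statistical outliers for review instead of silently deleting them:

```python
import pandas as pd

def cleanse(readings: pd.DataFrame) -> pd.DataFrame:
    """Remove clearly erroneous records, but flag outliers rather than drop them."""
    # Drop records that are erroneous on their face (missing measurements).
    cleaned = readings.dropna(subset=["signal"]).copy()

    # Flag statistical outliers instead of discarding them, so a rare but
    # genuine signal (a "pulsar") survives for human review.
    mean, std = cleaned["signal"].mean(), cleaned["signal"].std()
    cleaned["is_outlier"] = (cleaned["signal"] - mean).abs() > 3 * std
    return cleaned
```

Keeping an explicit `is_outlier` flag preserves the audit trail: the model can be trained with or without the flagged rows, and the decision to exclude them remains visible and reversible.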
The Data Journey
Data quality is also vital for avoiding embarrassing errors. In 2014, Bank of America mailed a credit‑card offer addressed to the invalid name “Lisa Is A Slut McXxxxxx” after receiving corrupted member data from the Golden Key International Honour Society. Rigorous validation could have caught the error before the mailing went out.
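As a rough sketch of the kind of field-level check that might have caught it (the pattern, the token blocklist and the word-count heuristic below are illustrative assumptions, not a description of any bank's actual controls):

```python
import re

# Illustrative placeholder tokens; a production system would use a
# maintained dictionary of profanity and placeholder values.
SUSPECT_TOKENS = {"test", "unknown", "n/a", "xxx"}

# Letters plus the punctuation that legitimately appears in names.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z' .-]*$")

def is_plausible_name(name: str) -> bool:
    """Return False for names that should be routed to manual review."""
    if not name or not NAME_PATTERN.match(name):
        return False
    tokens = {t.lower().strip(".") for t in name.split()}
    # Unusually long names are flagged for review, not auto-rejected.
    return tokens.isdisjoint(SUSPECT_TOKENS) and len(tokens) <= 4
```

A failed check should route the record to a review queue rather than silently dropping it; the goal is to stop bad data reaching a customer-facing channel.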
Transformation moves data from highly normalised source systems into a single, flat table preferred by data scientists. This step often involves adding derived variables and must be implemented carefully to avoid introducing errors that could mislead the model.
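A minimal pandas-style illustration of this flattening step (table and column names such as customer_id and amount are assumptions made for the sketch):

```python
import pandas as pd

def build_analytics_table(customers: pd.DataFrame,
                          transactions: pd.DataFrame) -> pd.DataFrame:
    """Denormalise customer and transaction data into one flat table."""
    # Aggregate transactional detail to one row per customer.
    spend = (transactions.groupby("customer_id")["amount"]
             .agg(total_spend="sum", txn_count="count")
             .reset_index())

    # A left join keeps customers with no transactions; validate the
    # join so an unexpected duplicate key cannot silently inflate rows.
    flat = customers.merge(spend, on="customer_id",
                           how="left", validate="one_to_one")

    # Derived variable for the model: average value per transaction.
    flat["avg_txn_value"] = flat["total_spend"] / flat["txn_count"]
    return flat
```

The `validate` argument is the kind of defensive check that catches join errors before they quietly mislead the model.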
Deployment is the final stage and the one that directly affects business outcomes. Models have a finite useful life, and delays in reaching production can render them obsolete. Moreover, GDPR Article 22 restricts decisions based solely on automated processing, including profiling, of personal data unless strict conditions such as explicit consent are met. Controlled deployment provides an audit trail that shows which data fed the model and which analytics were applied, supporting regulatory compliance.
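A minimal sketch of what such an audit trail could look like in code (the model object, its predict method, the version string and the JSONL log file are all assumptions for illustration):

```python
import hashlib
import json
from datetime import datetime, timezone

MODEL_VERSION = "churn-model-1.4.2"  # hypothetical identifier

def score_with_audit(record: dict, model) -> float:
    """Score one record and log which data and model produced the decision."""
    prediction = float(model.predict(record))

    audit_entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": MODEL_VERSION,
        # Hash of the exact input, so a decision can be traced back to its
        # data without storing personal data in the audit log itself.
        "input_hash": hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest(),
        "prediction": prediction,
    }
    with open("audit_log.jsonl", "a") as log:
        log.write(json.dumps(audit_entry) + "\n")
    return prediction
```

Storing a hash rather than the raw record keeps personal data out of the log while still letting an auditor confirm exactly which inputs produced a given decision.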
In short, robust data management is the foundation that allows AI to deliver trustworthy, fair and effective results. Understanding every step of data processing upholds transparency—the core of responsible AI.