The Wrong Way Everyone Builds It
Ask most AI vendors how they handle personally identifiable information and the answer arrives in one of three rehearsed forms: the model was trained not to leak it; a filter scans the responses before they reach the user; or an instruction in the system prompt politely asks the model to avoid surfacing names and email addresses. Each of the three is security theatre dressed in vocabulary borrowed from a SOC 2 checklist, and the gap between the theatre and an actual security control is the entire reason CISOs are taking longer to approve enterprise AI than vendors expected.
The first answer is the worst. Training data policies are aspirational, not deterministic. A model trained on public data can reproduce fragments of what it saw during training, and a fine-tuned enterprise model can surface fragments of any customer record it was trained on. 'The model was trained not to leak PII' is not a control. It's a hope.
The second answer is slightly better but structurally fragile. Response filtering means the model has already seen your customer data. It formulated a response using that data. The filter then tries to catch PII before it leaves. Every filter has a false negative rate. Every false negative is a breach. And if the model is prompted cleverly enough, it can be coaxed into representing personal data in forms the filter doesn't recognise.
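The false-negative problem is easy to reproduce. The sketch below is illustrative only: `EMAIL_PATTERN` and `filter_response` are hypothetical names, and the single regex stands in for whatever pattern-based scanner a real filter would use. It masks a plainly written email address and lets a spelled-out version of the same address through untouched.

```python
import re

# Hypothetical output filter: one email regex standing in for any
# pattern-based response scanner. Not a production-grade detector.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def filter_response(text: str) -> str:
    """Mask anything in a model response that looks like an email address."""
    return EMAIL_PATTERN.sub("[REDACTED]", text)


# The same address, once written normally and once spelled out.
plain = "You can reach the buyer at jane.doe@example.com."
obfuscated = "You can reach the buyer at jane dot doe at example dot com."

filtered_plain = filter_response(plain)            # address is masked
filtered_obfuscated = filter_response(obfuscated)  # passes through unchanged

print(filtered_plain)
print(filtered_obfuscated)
```

A broader pattern would catch this particular spelling, but the filter is always matching against representations someone thought of in advance.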
The third answer, system prompt instructions, is the easiest to bypass. Prompt injection attacks have repeatedly demonstrated that telling a model not to do something is not the same as preventing it. An attacker with nothing more than a text box can often override a system prompt with a well-crafted user prompt.
The Right Way: Strip Before You Send
The only approach that actually works is to remove personal data before it ever reaches the model. Your customer names, email addresses, phone numbers, physical addresses, and account identifiers are replaced with neutral placeholders before any request is formulated. The AI layer receives anonymised data, reasons on it, returns a result, and the result is then recontextualised for the authorised user on the way back. This sits alongside structural tenant isolation, so even a successful prompt injection against the model has nothing meaningful to extract.
The AI never sees the real information, and the guarantee follows from that fact rather than from a policy. It cannot leak what it never processed, and there is correspondingly no training data to compromise, no filter to bypass, no system prompt to override on the way to a leak. The defence is structural at the data-handling layer, and the model is downstream of it.
- Personal data is removed before any AI request is constructed, not filtered out of responses after the fact
- The replacement is reversible for authorised users, so the experience is seamless and the real data surfaces in the UI exactly as expected
- The AI layer never processes, stores, or transmits actual personal information at any point in its reasoning loop
- This works for any model, any provider, any prompt, any attack vector that targets model behaviour
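The flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular product: `Redactor`, `fake_model`, and the placeholder format are hypothetical, and the sketch assumes the personal values are already known from structured record fields. Detecting PII in arbitrary free text is a separate problem that it does not address.

```python
from dataclasses import dataclass, field


@dataclass
class Redactor:
    """Swaps known PII values for placeholders; the mapping never leaves this side."""

    mapping: dict = field(default_factory=dict)  # placeholder -> real value

    def redact(self, text: str, pii: dict) -> str:
        # `pii` maps a label such as "NAME" to the real value from the record.
        # Longest values are replaced first so that a short value which happens
        # to be a substring of a longer one cannot corrupt it.
        for label, value in sorted(pii.items(), key=lambda kv: len(kv[1]), reverse=True):
            placeholder = f"[{label}_{len(self.mapping) + 1}]"
            self.mapping[placeholder] = value
            text = text.replace(value, placeholder)
        return text

    def restore(self, text: str) -> str:
        # Recontextualise the model output for the authorised user.
        for placeholder, value in self.mapping.items():
            text = text.replace(placeholder, value)
        return text


def fake_model(prompt: str) -> str:
    # Stand-in for a call to any model provider; it only ever sees placeholders.
    return f"Draft reply regarding: {prompt}"


redactor = Redactor()
record_pii = {"NAME": "Jane Doe", "EMAIL": "jane.doe@example.com"}
note = "Jane Doe (jane.doe@example.com) asked for a renewal quote."

safe_prompt = redactor.redact(note, record_pii)  # built before any AI request
model_output = fake_model(safe_prompt)           # contains placeholders only
restored = redactor.restore(model_output)        # real values return for the UI

print(safe_prompt)
print(restored)
```

The point of the design is that `mapping` lives on the caller's side of the boundary: whatever the model is induced to output, it can only echo placeholders it was given.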
“If your vendor tells you they rely on the model to protect PII, walk away. The only acceptable answer is that personal data never reaches the model in the first place. Everything else is theatre.”
Why This Matters for GDPR and HIPAA
GDPR Article 25 requires data protection by design and by default, and the phrase has been quoted in more vendor decks than it has been implemented in production. It is not satisfied by adding encryption at rest and calling the work finished; it asks that every processing decision begin from a minimisation premise, so that any data which is not strictly required to accomplish the task is not processed in the first place.
PII redaction at the architectural layer is the strongest possible answer to that premise. The AI does not need the customer's actual name to reason about the deal; it needs to reason about a deal, and the name is irrelevant to the reasoning. Sending it to the model anyway is a processing decision that fails the necessity test, and GDPR is explicit about exactly that kind of failure.
HIPAA goes further. Protected health information cannot be disclosed to a third party that handles it on a covered entity's behalf without a business associate agreement, and even then, only for the purposes that agreement permits. Most AI model providers are effectively third parties. Sending patient data to them, even under a signed agreement, creates a compliance exposure that many regulated healthcare customers will not accept. Architectural redaction means there is no disclosure to worry about. The model never receives the protected information.
For CISOs evaluating AI vendors in regulated industries, this is the question to ask first. Not 'do you encrypt data in transit'. Not 'are you SOC 2 certified'. The question is: when I ask your AI a question about a customer, does my customer's personal data leave my environment and enter yours? If the answer is yes, the rest of the conversation is irrelevant. If the answer is no, with a structural explanation of why, you have a vendor worth evaluating. We make the full architectural case in Compliance as Architecture, and the UK Information Commissioner's Office guidance on AI and data protection is explicit that data minimisation must be enforced before processing, not filtered after the fact. To see the redaction substrate running on your own stack, review our security model or get early access.


