Opinion

DSK publishes guidance on using AI systems with Retrieval Augmented Generation (RAG)

On October 17, 2025, the German Data Protection Conference (DSK) published guidance on whether the data protection impact assessment of an AI system is affected when a large language model (LLM) is supplemented with a Retrieval Augmented Generation (RAG) system.

The DSK concluded that RAG can positively influence accuracy, traceability and confidentiality, but that a RAG system does not alter the underlying LLM or remedy any unlawful training. Controllers must therefore conduct data protection impact assessments covering both the LLM and the RAG system, and must keep their technical and organisational measures up to date.

The DSK highlighted three main benefits. First, RAG can improve accuracy and reduce hallucinations by grounding answers in carefully curated, lawful and up-to-date reference material; errors in that material can be corrected or deleted without retraining the model. Second, if the controller records which documents or snippets the AI system relied on to answer a question, it can show where the input context came from, which provides a degree of traceability and transparency over the sources that informed the LLM's output when conducting a data protection assessment. Third, RAG can support the principle of data minimisation: controllers select which documents enter the vector database and can delete specific documents in line with retention rules, which also enables the targeted deletion of personal data.
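
By way of illustration only, the simplified Python sketch below shows how these mechanisms might look in practice: a curated document store standing in for a vector database, an audit log recording which documents grounded each answer, and targeted deletion driven by retention rules. The class and function names, and the keyword-overlap "retrieval", are hypothetical simplifications for this post and are not taken from the DSK guidance or any particular RAG product.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Document:
    doc_id: str
    text: str
    expires: date  # retention rule: remove the document after this date

@dataclass
class RagStore:
    """Illustrative stand-in for a vector database of curated reference documents."""
    docs: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def add(self, doc: Document) -> None:
        self.docs[doc.doc_id] = doc

    def delete(self, doc_id: str) -> None:
        # Targeted deletion: removing a document (e.g. one containing personal data)
        # does not require retraining the underlying LLM.
        self.docs.pop(doc_id, None)

    def enforce_retention(self, today: date) -> None:
        for doc_id in [d for d, doc in self.docs.items() if doc.expires < today]:
            self.delete(doc_id)

    def retrieve(self, query: str, top_k: int = 2) -> list:
        # Hypothetical retrieval: rank by word overlap instead of embedding similarity.
        q = set(query.lower().split())
        ranked = sorted(self.docs.values(),
                        key=lambda d: len(q & set(d.text.lower().split())),
                        reverse=True)
        hits = ranked[:top_k]
        # Record which documents grounded this answer (traceability).
        self.audit_log.append({"query": query, "sources": [d.doc_id for d in hits]})
        return hits

store = RagStore()
store.add(Document("policy-2024", "Retention policy for customer records", date(2026, 1, 1)))
store.add(Document("faq-01", "Customer records are kept for six years", date(2025, 1, 1)))

context = store.retrieve("how long are customer records kept")
prompt = "Answer using only this context:\n" + "\n".join(d.text for d in context)
# The prompt would be passed to the LLM; the audit log shows which sources were used.
print(store.audit_log[-1])

store.enforce_retention(today=date(2025, 6, 1))  # deletes the expired "faq-01" document
```

The point of the sketch is that the reference material, the source log and the deletion mechanism all sit outside the model itself, which is why they can be corrected or purged without retraining the LLM.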

However, RAG also has important limitations. It does not alter the underlying LLM: if the model was trained unlawfully, that training remains unlawful regardless of whether a RAG system is integrated. Nor does a RAG system itself make the LLM more transparent. Even if a controller knows which sources were fed to the LLM, it still cannot see inside the model to understand exactly how those sources were processed into an output.

Ultimately, whilst RAG can mitigate some weaknesses of an LLM, significant challenges remain in ensuring that an AI system fully complies with applicable data protection requirements. Controllers should document design choices, data flows and safeguards for both the RAG subsystem and the underlying LLM.

The press release is available here, and the guidance is available here (both only available in German).
