Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate new information.[1] With RAG, the LLM consults a specified set of documents before responding to a user query. These documents supplement information from the LLM's pre-existing training data.[2] This allows LLMs to use domain-specific or updated information that is not available in their training data.[2][3] For example, this helps LLM-based chatbots access internal company data or generate responses based on authoritative sources.
RAG improves LLMs by incorporating information retrieval before generating responses.[4] Unlike traditional LLMs that rely on static training data, RAG pulls relevant text from databases, uploaded documents, or web sources.[1] According to Ars Technica, "RAG is a way of improving LLM performance, in essence by blending the LLM process with a web search or other document look-up process to help LLMs stick to the facts." This method helps reduce AI hallucinations,[4][5] which have caused chatbots to describe policies that do not exist or cite nonexistent legal cases to lawyers seeking authority to support their arguments.[6]
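The following sketch illustrates this retrieve-then-generate pattern in Python. The document store, the keyword-overlap scoring, and the prompt format are illustrative placeholders rather than the API of any particular RAG system; a production pipeline would typically use a vector or keyword search index and then send the assembled prompt to an LLM.

```python
# Minimal sketch of retrieve-then-generate. The documents, scoring, and
# prompt wording below are illustrative assumptions, not a specific library's API.

documents = [
    "The 2024 employee handbook allows up to 20 days of remote work per year.",
    "Refunds are issued within 14 days of a returned purchase.",
    "Support tickets are answered within one business day.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query
    (a stand-in for a real vector or keyword search index)."""
    query_terms = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(query_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Prepend retrieved passages so the model answers from them
    rather than from its training data alone."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )

query = "How many remote work days are allowed?"
prompt = build_prompt(query, retrieve(query, documents))
# `prompt` would now be sent to an LLM; that call is omitted because it
# depends on the model or API being used.
print(prompt)
```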
RAG also reduces the need to retrain LLMs with new data, saving computational and financial costs.[1] Beyond efficiency gains, RAG allows LLMs to include sources in their responses, providing greater transparency: users can cross-check the cited material to verify its accuracy and relevance.
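One simple way to support such verification, sketched below, is to label each retrieved passage with an identifier and ask the model to cite those identifiers in its answer. The labeling scheme and prompt wording are illustrative assumptions, not a standard format.

```python
def build_prompt_with_sources(query: str, passages: list[str]) -> str:
    """Label each retrieved passage so the model's answer can cite it
    and users can trace claims back to the source text."""
    labeled = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the numbered context below, citing passage "
        f"numbers in brackets.\n\n{labeled}\n\nQuestion: {query}"
    )

# Example with a single (hypothetical) retrieved passage.
print(build_prompt_with_sources(
    "How many remote work days are allowed?",
    ["The 2024 employee handbook allows up to 20 days of remote work per year."],
))
```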
The term RAG was first introduced in a 2020 research paper[4] from Meta.[7][3]