Retrieval-augmented generation

Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate new information.[1] With RAG, an LLM first consults a specified set of documents before responding to a user query. These documents supplement the information in the LLM's pre-existing training data,[2] allowing the model to use domain-specific or up-to-date information that the training data does not contain.[2][3] For example, RAG lets LLM-based chatbots access internal company data or ground their responses in authoritative sources.
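The basic flow can be illustrated with a short sketch. The following Python example is a minimal illustration, not a production implementation: it uses simple keyword overlap in place of a real retriever (production systems typically use embedding-based vector search), and the query and documents are hypothetical. The retrieved passages are prepended to the prompt that would then be sent to an LLM.

```python
# Minimal sketch of the retrieve-then-generate flow described above.
# The overlap-based scoring and the example documents are illustrative
# placeholders; a real system would use an embedding-based retriever
# and send the augmented prompt to an actual LLM.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query and return the top k."""
    query_words = set(query.lower().split())

    def score(doc: str) -> int:
        return len(query_words & set(doc.lower().split()))

    return sorted(documents, key=score, reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Prepend retrieved passages so the model answers from them."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}\n"
    )


documents = [
    "The 2024 employee handbook allows up to three remote days per week.",
    "Expense reports must be filed within 30 days of purchase.",
    "The cafeteria is open from 8am to 3pm on weekdays.",
]

query = "How many remote days are allowed per week?"
prompt = build_prompt(query, retrieve(query, documents))
print(prompt)  # In a real deployment, this augmented prompt is sent to the LLM.
```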

RAG improves LLMs by incorporating information retrieval before generating responses.[4] Unlike traditional LLMs that rely on static training data, RAG pulls relevant text from databases, uploaded documents, or web sources.[1] According to Ars Technica, "RAG is a way of improving LLM performance, in essence by blending the LLM process with a web search or other document look-up process to help LLMs stick to the facts."[6] This method helps reduce AI hallucinations,[4][5] which have caused chatbots to describe nonexistent company policies or cite nonexistent legal cases to lawyers looking for support for their arguments.[6]

RAG also reduces the need to retrain LLMs on new data, saving computational and financial costs.[1] Beyond efficiency gains, RAG allows LLMs to include sources in their responses, giving users greater transparency: they can cross-check the cited material to verify its accuracy and relevance.
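Source attribution can be sketched in the same style. In the hypothetical example below, each retrieved passage keeps a source label (an invented file name), and the labels are included in the augmented prompt so the generated answer can cite them and a reader can check the underlying documents.

```python
# Illustrative sketch of source attribution: each retrieved passage carries a
# source label that is included in the prompt, so the model's answer can cite
# it and users can verify the claim. Names and documents are hypothetical.

from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # e.g. a document title or URL
    text: str


def build_cited_prompt(query: str, passages: list[Passage]) -> str:
    """Label each passage with its source so the model can cite [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({p.source}) {p.text}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question and cite the numbered sources you used.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
    )


passages = [
    Passage("hr-handbook-2024.pdf",
            "Employees may work remotely up to three days per week."),
    Passage("it-policy.pdf",
            "VPN access is required for all remote work."),
]

prompt = build_cited_prompt("What is the remote-work policy?", passages)
print(prompt)
# The model's answer can then reference [1] or [2], and a user can open
# hr-handbook-2024.pdf or it-policy.pdf to verify the cited claim.
```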

The term RAG was first introduced in a 2020 research paper[4] from Meta.[7][3]

  1. ^ a b c "What is retrieval-augmented generation?". IBM. 22 August 2023. Retrieved 7 March 2025.
  2. ^ a b Cite error: the named reference :2 was invoked but never defined.
  3. ^ a b Singhal, Rahul (30 November 2023). "The Power Of RAG: How Retrieval-Augmented Generation Enhances Generative AI". Forbes.
  4. ^ a b c Kiela, Douwe; Lewis, Patrick; Perez, Ethan; Piktus, Aleksandra; Petroni, Fabio; Karpukhin, Vladimir; Goyal, Naman; Küttler, Heinrich; Lewis, Mike; Yih, Wen-Tau; Rocktäschel, Tim; Riedel, Sebastian (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. pp. 9459–9474. arXiv:2005.11401. ISBN 978-1-7138-2954-6.
  5. ^ Turow, Jon; Kiela, Douwe (26 March 2025). "RAG Inventor Talks Agents, Grounded AI, and Enterprise Impact". Madrona.
  6. ^ "Can a technology called RAG keep AI models from making stuff up?". Ars Technica. 6 June 2024. Retrieved 7 March 2025.
  7. ^ "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks". ai.meta.com. 2020.
