Challenges with Naive RAG

Bad Retrieval

  • Low precision: Not all retrieved chunks are relevant
    • Hallucination + Lost-in-the-middle problems
  • Low recall: Not all relevant chunks are retrieved
    • Lacks enough context for LLM to synthesise an answer
  • Outdated information: The data is redundant or out of date

Bad Response Generation

  • Hallucination: Makes up an answer not in the context
  • Irrelevance: Does not answer the question
  • Toxicity / Bias: Harmful / offensive answer

Evaluation

We need a way to measure performance before we can improve it.

rag-evaluation-diagram

Retrieval

Evaluate the quality of retrieved chunks given a user query.

  1. Create a dataset
    • Input: query
    • Output: Ground-truth documents relevant to the query
  2. Run retriever over dataset
  3. Measure ranking metrics (see the sketch after this list)
    • Success rate / Hit-rate
    • MRR
    • NDCG
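
A minimal sketch of step 3, assuming you already have a labelled dataset of (query, relevant-document-ids) pairs and a retriever object; the `retriever.retrieve` call and the `node.id` attribute are hypothetical stand-ins for whatever your retrieval stack actually exposes:

```python
def hit_rate(retrieved_ids: list[str], relevant_ids: set[str]) -> float:
    """1.0 if any relevant document appears in the retrieved list, else 0.0."""
    return 1.0 if any(doc_id in relevant_ids for doc_id in retrieved_ids) else 0.0

def reciprocal_rank(retrieved_ids: list[str], relevant_ids: set[str]) -> float:
    """1 / rank of the first relevant document, 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def evaluate_retriever(retriever, dataset, top_k: int = 5):
    """dataset: list of (query, set_of_relevant_doc_ids) pairs from step 1."""
    hits, rrs = [], []
    for query, relevant_ids in dataset:
        retrieved_ids = [node.id for node in retriever.retrieve(query)][:top_k]
        hits.append(hit_rate(retrieved_ids, relevant_ids))
        rrs.append(reciprocal_rank(retrieved_ids, relevant_ids))
    return {"hit_rate": sum(hits) / len(hits), "mrr": sum(rrs) / len(rrs)}
```

NDCG follows the same pattern, but additionally discounts relevant hits by their rank position and can handle graded relevance labels.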

End-to-End (E2E)

Evaluate the final generated response given a user query.

  1. Create a dataset
    • Input: query
    • (Optional) Output: Ground-truth answer
  2. Run full RAG pipeline
  3. Collect evaluation metrics (see the sketch after this list)
    • Label-free evals: Faithfulness, relevancy, adherence to guidelines, toxicity
    • With-label evals: Correctness
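
A hedged sketch of steps 2–3, where `rag_pipeline`, `judge_llm`, and the `response.context` / `response.answer` attributes are hypothetical stand-ins for your own pipeline and an LLM used as a grader; it runs a label-free faithfulness check, plus a correctness check when ground-truth answers exist:

```python
FAITHFULNESS_PROMPT = (
    "Context:\n{context}\n\nAnswer:\n{answer}\n\n"
    "Is every claim in the answer supported by the context? Reply YES or NO."
)

CORRECTNESS_PROMPT = (
    "Question: {query}\nReference answer: {reference}\nModel answer: {answer}\n"
    "Does the model answer agree with the reference? Reply YES or NO."
)

def evaluate_e2e(rag_pipeline, judge_llm, queries, ground_truth=None):
    """Run the full pipeline on each query and grade responses with a judge LLM."""
    results = []
    for i, query in enumerate(queries):
        response = rag_pipeline.query(query)  # retrieval + synthesis
        verdict = judge_llm(FAITHFULNESS_PROMPT.format(
            context=response.context, answer=response.answer))
        record = {"query": query, "faithful": verdict.strip().upper().startswith("YES")}
        if ground_truth is not None:  # with-label correctness eval
            verdict = judge_llm(CORRECTNESS_PROMPT.format(
                query=query, reference=ground_truth[i], answer=response.answer))
            record["correct"] = verdict.strip().upper().startswith("YES")
        results.append(record)
    return results
```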

Optimising RAG Systems

from-simple-to-advanced-rag

Table Stakes

Chunk Sizes

  • Tuning the chunk size can have a significant impact on performance (a sweep sketch follows this list)
  • Not obvious that more retrieved tokens lead to higher performance
  • Reranking (reordering the retrieved context) isn’t always beneficial
    • Due to the lost-in-the-middle problem: information in the middle of the LLM context window tends to get lost, while information at the beginning and end is retained well
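
A small sketch of such a sweep, reusing the `evaluate_retriever` helper from the Retrieval section; `build_index_with_chunk_size` and `index.as_retriever` are hypothetical placeholders for however your indexing stack is parameterised:

```python
CHUNK_SIZES = [128, 256, 512, 1024]  # candidate chunk sizes, in tokens

def sweep_chunk_sizes(documents, eval_dataset, top_k=5):
    """Re-index with each candidate chunk size and compare retrieval metrics."""
    results = {}
    for chunk_size in CHUNK_SIZES:
        index = build_index_with_chunk_size(documents, chunk_size=chunk_size)
        retriever = index.as_retriever(top_k=top_k)
        results[chunk_size] = evaluate_retriever(retriever, eval_dataset, top_k)
    return results  # pick the best-scoring size, not simply the largest one
```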

Metadata Filtering

table-stakes-metadata-filtering

  • Metadata: Context you can inject into each text chunk
    • e.g., Page number, document title, summary of adjacent chunks, questions the chunk can answer (reverse HyDE)
    • Benefits
      • Can help retrieval
      • Can augment response quality
      • Integrates with VectorDB metadata filters (see the sketch below)
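
A hedged sketch of attaching metadata at indexing time and filtering on it at query time; `vector_db`, `embed`, and the `upsert` / `search` signatures are hypothetical, since real vector stores (and LlamaIndex's metadata filters) expose similar but not identical interfaces:

```python
# Indexing: store the chunk text together with its metadata
chunk = {
    "text": "EMEA revenue grew 12% year over year...",
    "metadata": {
        "document_title": "2023 Annual Report",
        "page_number": 12,
        "section_summary": "Revenue breakdown by region",
    },
}
vector_db.upsert(embedding=embed(chunk["text"]), payload=chunk)

# Query time: restrict the search to chunks from the relevant document
results = vector_db.search(
    query_embedding=embed("What was EMEA revenue in 2023?"),
    filters={"document_title": "2023 Annual Report"},  # metadata filter
    top_k=5,
)
```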

Advanced Retrieval

Small-to-Big

advanced-retrieval-small-to-big

advanced-retrieval-small-to-big-2 Image Source: LlamaIndex

  • Intuition: Embedding a large text chunk directly is often suboptimal, since a single embedding has to represent many topics at once
  • Solutions
    • Embed text at the sentence level, then expand to a larger window around each retrieved sentence during synthesis (sentence window retrieval)
    • Embed smaller references (e.g., child chunks, summaries, metadata) that link to a parent chunk, and use the parent chunk for synthesis (see the sketch below)
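
A hedged sketch of the second option: embed sentence-sized child chunks for retrieval, but pass the larger parent chunk to the LLM for synthesis. All helpers (`split_into_large_chunks`, `split_into_sentences`, `embed`, `vector_db`) are hypothetical stand-ins:

```python
# Indexing: embed small units, but remember which parent each one belongs to
parent_chunks = split_into_large_chunks(document)      # e.g. ~1024-token chunks
for parent_id, parent in enumerate(parent_chunks):
    for sentence in split_into_sentences(parent):
        vector_db.upsert(
            embedding=embed(sentence),                  # retrieve on small units
            payload={"parent_id": parent_id},           # link back to the parent
        )

def retrieve_small_to_big(query, top_k=5):
    """Search over sentences, then return the de-duplicated parent chunks."""
    hits = vector_db.search(query_embedding=embed(query), top_k=top_k)
    parent_ids = {hit.payload["parent_id"] for hit in hits}
    return [parent_chunks[pid] for pid in parent_ids]   # synthesise over parents
```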

Structured Retrieval

Agentic Behaviour

Multi-Document Agents

agentic-behaviour-multi-document-agents

Fine-Tuning

Embedding Model

llms-to-generate-labelled-data Image Source: Jo Kristian Bergum, vespa.ai

  • Intuition: Off-the-shelf embedding representations are not optimised for your custom dataset
  • Solution: Generate a synthetic query dataset from raw text chunks using LLMs, and use this synthetic dataset to finetune an embedding model (see the sketch below)
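
A minimal sketch of the synthetic-data step, with `llm` as a hypothetical completion function; the resulting (query, positive passage) pairs can then be fed to a contrastive fine-tuning recipe for the embedding model:

```python
def generate_synthetic_pairs(chunks, llm, questions_per_chunk=2):
    """Ask an LLM to write questions that each chunk can answer."""
    pairs = []
    for chunk in chunks:
        prompt = (
            f"Context:\n{chunk}\n\n"
            f"Write {questions_per_chunk} questions that this context answers, "
            "one per line."
        )
        for question in llm(prompt).splitlines():
            if question.strip():
                pairs.append({"query": question.strip(), "positive": chunk})
    return pairs
```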

LLM

  • Intuition: Weaker LLMs are relatively worse at response synthesis, reasoning, structured outputs, etc.
  • Solution: Generate a synthetic dataset from raw chunks using a strong LLM, and use it to finetune the weaker LLM (see the sketch below)
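
A hedged sketch of that distillation step, with `teacher_llm` as a hypothetical completion function and a generic chat-style JSONL output format (adapt it to whatever your fine-tuning tooling expects):

```python
import json

def build_finetuning_dataset(chunks, teacher_llm, path="rag_finetune.jsonl"):
    """Generate question-answer pairs over your chunks with a strong teacher LLM."""
    with open(path, "w") as f:
        for chunk in chunks:
            question = teacher_llm(f"Write one question answered by this text:\n{chunk}")
            answer = teacher_llm(f"Context:\n{chunk}\n\nQuestion: {question}\nAnswer:")
            example = {"messages": [
                {"role": "user", "content": f"Context:\n{chunk}\n\n{question}"},
                {"role": "assistant", "content": answer},
            ]}
            f.write(json.dumps(example) + "\n")
```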

References