
Chunking Strategies for RAG with Examples

If you’ve built a “Naive” RAG pipeline, you’ve probably hit a wall. You’ve indexed your documents, but the answers are… mediocre. They’re out of context, they miss the point, or they just feel wrong.

Here’s the truth: Your RAG system is only as good as its chunks.

Chunking, the process of breaking your documents into searchable pieces, is one of the most important decisions you will make in your RAG pipeline. It’s not just “preprocessing”; it is the foundation of your RAG application’s knowledge.

The problem is what I call the “Chunking Goldilocks Problem”:

  • Chunks too big? You get “noisy” context. The large language model (LLM) has to read a 10-page document to find one sentence, and the search itself is imprecise.
  • Chunks too small? You get “context-deficient” snippets. The LLM gets a single, out-of-context sentence and can’t formulate a meaningful answer.

Let’s walk through the evolution of chunking strategies, from the simple baseline to the state-of-the-art, so you can decide which one is right for your project.

0. The ‘Naive’ Way: Fixed-Size Chunking

This is the most basic method. You simply decide on a length (e.g., 500 characters) and a small overlap (e.g., 50 characters) and slice the document from top to bottom.

  • How it works: It’s a “dumb” slicer. It doesn’t know what a word, sentence, or paragraph is. It just counts characters and cuts (see the sketch below).

  • Pros: It’s dead simple, fast, and 100% predictable.

  • Cons: This is the source of most RAG problems. It will split sentences in half (“semantic fragmentation”) and separate key ideas from their context. It’s the “brute force” method and should be avoided for most production systems.
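
To make the mechanics concrete, here is a minimal sketch of a fixed-size splitter in plain Python. The function name and parameters are illustrative, not taken from any particular library.

```python
def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Slice text into chunk_size-character pieces, repeating `overlap` characters between neighbors."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

# Example: 500-character chunks with a 50-character overlap.
document_text = "..."  # your raw document text
chunks = fixed_size_chunks(document_text, chunk_size=500, overlap=50)
```

Notice that nothing stops this slicer from cutting mid-sentence; that is exactly the “semantic fragmentation” problem described above.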

1. Recursive Character Chunking

This is the default for most RAG tutorials and a smart step up from “fixed-size” chunking. Instead of a hard cut-off (e.g., “every 500 characters”), it splits text using a priority-ordered list of separators.

  • How it works: It tries to split on \n\n (paragraphs) first. If a resulting chunk is still too big, it splits that chunk by \n (lines). If that’s still too big, it splits on spaces, and so on (see the sketch below).
  • Pros: It’s fast, simple, and “context-aware” enough to respect basic document structure like paragraphs and lists.
  • Cons: It’s still “dumb” to the semantic meaning of the text. It can easily separate a key idea from its conclusion if they are in different paragraphs.
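
Here is a minimal sketch of the priority-ordered splitting logic in plain Python. Production implementations (for example, LangChain’s RecursiveCharacterTextSplitter) also merge small pieces back together and add overlap; this version only shows the recursion over separators.

```python
SEPARATORS = ["\n\n", "\n", " "]  # paragraphs first, then lines, then words

def recursive_split(text: str, chunk_size: int = 500, separators: list[str] = SEPARATORS) -> list[str]:
    """Split on the highest-priority separator; recurse into any piece that is still too big."""
    if len(text) <= chunk_size or not separators:
        return [text]  # small enough, or no separators left to try
    first, rest = separators[0], separators[1:]
    chunks: list[str] = []
    for piece in text.split(first):
        if len(piece) <= chunk_size:
            chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, chunk_size, rest))
    return chunks
```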

2. Parent-Child Chunking

This is the best “bang-for-your-buck” strategy and solves the Goldilocks Problem brilliantly. It separates the chunk you search for from the chunk you generate with.

  • How it works (a minimal sketch follows this list):
    1. Parent Split: First, you split your document into large, logical “Parent” chunks (e.g., an entire section of a manual).
    2. Child Split: Then, you split each Parent chunk into many small, precise “Child” chunks (e.g., individual paragraphs or sentences).
    3. Indexing: You only embed and index the small Child Chunks. Each child chunk stores a pointer to its Parent.
    4. Retrieval: The user’s query searches for the most relevant Child Chunks. But when it’s time to generate an answer, you retrieve the full Parent Chunks associated with those children.
  • Pros: You get the best of both worlds: the precise, targeted search of small chunks and the rich, full context of large chunks for the LLM.
  • Cons: It’s slightly more complex to set up your indexing pipeline.
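
Here is a minimal sketch of the indexing and retrieval flow. The split_sections, split_paragraphs, embed, and nearest callables are hypothetical placeholders for whatever section splitter, embedding model, and vector store you use; the point is the child-to-parent mapping.

```python
def build_parent_child_index(document: str, split_sections, split_paragraphs, embed):
    """Embed only the small child chunks, each of which remembers its large parent chunk."""
    parents = split_sections(document)                     # large "Parent" chunks (e.g. manual sections)
    child_vectors, child_to_parent = [], []
    for parent_id, parent_text in enumerate(parents):
        for child_text in split_paragraphs(parent_text):   # small "Child" chunks (paragraphs/sentences)
            child_vectors.append(embed(child_text))        # only children are embedded and indexed
            child_to_parent.append(parent_id)
    return parents, child_vectors, child_to_parent

def retrieve(query: str, parents, child_vectors, child_to_parent, embed, nearest, k: int = 4):
    """Search with the precise child vectors, but hand the LLM the full parent chunks."""
    child_ids = nearest(embed(query), child_vectors, k)        # indices of the top-k child chunks
    parent_ids = {child_to_parent[cid] for cid in child_ids}   # de-duplicate children sharing a parent
    return [parents[pid] for pid in parent_ids]
```

The de-duplication step matters: several top-ranked children often live in the same parent, and you only want to send that parent to the LLM once.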

3. Semantic Chunking

What if your document doesn’t have clear sections? What if it’s a dense, narrative essay or a long-form article? This is where Semantic Chunking shines. Instead of splitting by characters, it splits by meaning.

  • How it works (a minimal sketch follows this list):
    1. It breaks the document into individual sentences.
    2. It embeds every single sentence, converting them into vectors.
    3. It calculates the “distance” (how different the meaning is) between one sentence and the next.
    4. When it finds a large “semantic gap” between two sentences, it means the topic has changed, and it places a chunk boundary there.
  • Pros: Creates perfectly-sized, thematically coherent chunks. A chunk might be 3 sentences or 10, but it will always be about a single topic.
  • Cons: Requires an extra embedding step during indexing, which adds time and cost.
  • Further Reading: This approach has been systematically benchmarked. Recent research, such as “Evaluating Semantic Chunking for RAG” (arXiv:2410.13070), confirms that semantic-based approaches can significantly outperform traditional chunking on retrieval tasks.
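
Here is a minimal sketch of the boundary-finding step, assuming you already have one embedding vector per sentence (from any sentence-embedding model). The 0.3 threshold is illustrative; real implementations often pick it as a percentile of the observed distances instead.

```python
import numpy as np

def semantic_chunks(sentences: list[str], embeddings: np.ndarray, threshold: float = 0.3) -> list[str]:
    """Place a chunk boundary wherever the cosine distance between consecutive sentences jumps."""
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        a, b = embeddings[i - 1], embeddings[i]
        distance = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        if distance > threshold:          # large "semantic gap" -> the topic has changed
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```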

4. Propositional Chunking

This is the current state-of-the-art and a complete paradigm shift. The idea is simple: instead of indexing what the author wrote, you index what the author meant.

  • How it works (a minimal sketch follows this list):
    1. You run your document through an LLM first, instructing it to break every paragraph down into a list of “atomic facts” or “propositions.”
    2. You then embed and index these propositions instead of the original text.
    Original Text: “Agent Alpha, built by Acme Corp in 2024, is a data bot that excels at financial reports and has a 95% accuracy rate.”
    Propositions:
    • “Agent Alpha is a data bot.”
    • “Agent Alpha was built by Acme Corp.”
    • “Agent Alpha was built in 2024.”
    • “Agent Alpha excels at financial reports.”
    • “Agent Alpha has a 95% accuracy rate.”
  • Pros: This is hyper-precise. A query like “When was Agent Alpha built?” will make a 1:1 match with the third proposition. This is the ultimate “needle-in-a-haystack” solution.
  • Cons: This is the most expensive indexing strategy, as it requires an LLM call for every paragraph in your entire knowledge base.
  • Further Reading: This strategy was detailed in the foundational paper “Dense X Retrieval: What Retrieval Granularity Should We Use?” (arXiv:2312.06648). The idea is so powerful that new research is even applying it to the query itself, as seen in “Improving RAG Retrieval via Propositional Content Extraction” (arXiv:2503.10654).
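
Here is a minimal sketch of the indexing step. The llm, embed, and index_add callables are hypothetical stand-ins for your LLM client, embedding model, and vector store, and the prompt wording is only illustrative.

```python
import json

PROMPT = (
    "Decompose the following paragraph into a JSON array of short, self-contained, "
    "atomic factual statements (propositions). Resolve pronouns to full names.\n\n{paragraph}"
)

def extract_propositions(paragraph: str, llm) -> list[str]:
    """Ask the LLM to rewrite one paragraph as atomic facts, then parse its JSON answer."""
    response = llm(PROMPT.format(paragraph=paragraph))  # hypothetical LLM call returning text
    return json.loads(response)

def index_propositions(paragraphs: list[str], llm, embed, index_add) -> None:
    """Embed and index the propositions instead of the original paragraphs."""
    for paragraph in paragraphs:
        for proposition in extract_propositions(paragraph, llm):
            index_add(proposition, embed(proposition))  # hypothetical vector-store insert
```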

5. Structured Chunking

What about documents that aren’t just text? PDFs, financial reports, and web pages are full of tables, lists, and headers. Throwing them at a text splitter will create a mess.

  • How it works: This is a “content-aware” strategy that uses the document’s structure as its guide (see the sketch after this list).
    • It parses tables as a whole, often serializing them as Markdown or JSON, and indexes them as a “table chunk.”
    • It uses headers (<h1>, <h2>) as natural “Parent” boundaries.
    • It treats lists (<ul>, <ol>) as a single, coherent chunk.
  • Pros: By far the most effective method for complex, “real-world” documents. It preserves the data’s structure, which is often just as important as the text’s meaning.
  • Cons: Requires a robust parsing library (like unstructured.io or LlamaParse) and a more complex indexing pipeline.
  • Further Reading: The impact of this is huge in specific domains. A great example is “Financial Report Chunking for Effective Retrieval Augmented Generation” (arXiv:2402.05131), which shows that parsing a report’s structural elements (like tables) is critical for accuracy.
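
Here is a minimal sketch of the grouping logic for Markdown-like input: headers open a new chunk, and pipe-delimited table rows are kept together as a single “table chunk.” A real pipeline would lean on a parsing library such as unstructured.io or LlamaParse, but the core idea looks like this:

```python
def structured_chunks(markdown: str) -> list[dict]:
    """Group a Markdown document into header-bounded text chunks and whole-table chunks."""
    chunks: list[dict] = []
    current: list[str] = []
    current_type = "text"

    def flush():
        if current:
            chunks.append({"type": current_type, "text": "\n".join(current)})
            current.clear()

    for line in markdown.splitlines():
        is_table_row = line.lstrip().startswith("|")
        if line.startswith("#"):                             # a header closes the previous chunk
            flush()
            current_type = "text"
        elif is_table_row != (current_type == "table"):      # entering or leaving a table
            flush()
            current_type = "table" if is_table_row else "text"
        current.append(line)
    flush()
    return chunks
```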

When to Use Which Strategy: A Simple Guide

There is no single “best” method. The right choice depends on your documents, your accuracy needs, and your budget.

| If your primary need is… | The best strategy is… | Why? |
| --- | --- | --- |
| Rapid prototyping | Recursive Character Chunking | It’s fast, easy, and the default. Good enough to see if your RAG system is viable. |
| General-purpose Q&A (e.g., manuals, textbooks, legal) | Parent-Child Chunking | The best balance of search precision (small child chunks) and generation context (large parent chunks). |
| Dense, unstructured text (e.g., essays, research papers) | Semantic Chunking | It creates thematically pure chunks by finding the “topic breaks” in the narrative. |
| Extreme factual accuracy (e.g., high-stakes Q&A, fact-checking) | Propositional Chunking | It creates a 1:1 mapping between a fact and a query. Highest accuracy, highest cost. |
| Complex, “messy” documents (e.g., PDFs, tables, HTML) | Structured Chunking | It respects the document’s layout, preserving tables and sections, which are vital pieces of context. |

Final Thought

In 2025, “chunking” is no longer just a preprocessing step you can ignore. It’s the core of your retrieval strategy. The trend is clear: we are moving away from static, fixed chunks and toward intelligent, dynamic, and structured representations of knowledge.

Choose your chunking strategy for your upcoming RAG application wisely—it will make all the difference.

 

