
Chunking Strategies for RAG with Examples

If you’ve built a “Naive” RAG pipeline, you’ve probably hit a wall. You’ve indexed your documents, but the answers are… mediocre. They’re out of context, they miss the point, or they just feel wrong.

Here’s the truth: Your RAG system is only as good as its chunks.

Chunking, the process of breaking your documents into searchable pieces, is one of the most important decisions you will make in your RAG pipeline. It’s not just “preprocessing”; it is the foundation of your RAG application’s knowledge.

The problem is what I call the “Chunking Goldilocks Problem”:

  • Chunks too big? You get “noisy” context. The large language model (LLM) has to read a 10-page document to find one sentence, and the search itself is imprecise.
  • Chunks too small? You get “context-deficient” snippets. The LLM gets a single, out-of-context sentence and can’t formulate a meaningful answer.

Let’s walk through the evolution of chunking strategies, from the simple baseline to the state-of-the-art, so you can decide which one is right for your project.

0. The ‘Naive’ Way: Fixed-Size Chunking

This is the most basic method. You simply decide on a length (e.g., 500 characters) and a small overlap (e.g., 50 characters) and slice the document from top to bottom.

  • How it works: It’s a “dumb” slicer. It doesn’t know what a word, sentence, or paragraph is. It just counts characters and cuts (see the sketch below).

  • Pros: It’s dead simple, fast, and 100% predictable.

  • Cons: This is the source of most RAG problems. It will split sentences in half (“semantic fragmentation”) and separate key ideas from their context. It’s the “brute force” method and should be avoided for most production systems.
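
To make the mechanics concrete, here is a minimal sketch of a fixed-size splitter in plain Python. The function name and parameters are illustrative, not taken from any particular library.

```python
def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Slice text into chunk_size-character pieces, repeating `overlap` characters between neighbors."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

# Example: 500-character chunks with a 50-character overlap.
document_text = "..."  # your raw document text
chunks = fixed_size_chunks(document_text, chunk_size=500, overlap=50)
```

Notice that nothing stops this slicer from cutting mid-sentence; that is exactly the “semantic fragmentation” problem described above.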

1. Recursive Character Chunking

This is the default for most RAG tutorials and a smart step up from “fixed-size” chunking. Instead of a hard cut-off (e.g., “every 500 characters”), it splits text using a priority-ordered list of separators.

  • How it works: It tries to split on \n\n (paragraphs) first. If a resulting chunk is still too big, it splits that chunk by \n (lines). If that’s still too big, it splits on spaces, and so on (see the sketch below).
  • Pros: It’s fast, simple, and “context-aware” enough to respect basic document structure like paragraphs and lists.
  • Cons: It’s still “dumb” to the semantic meaning of the text. It can easily separate a key idea from its conclusion if they are in different paragraphs.
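
Here is a minimal sketch of the priority-ordered splitting logic in plain Python. Production implementations (for example, LangChain’s RecursiveCharacterTextSplitter) also merge small pieces back together and add overlap; this version only shows the recursion over separators.

```python
SEPARATORS = ["\n\n", "\n", " "]  # paragraphs first, then lines, then words

def recursive_split(text: str, chunk_size: int = 500, separators: list[str] = SEPARATORS) -> list[str]:
    """Split on the highest-priority separator; recurse into any piece that is still too big."""
    if len(text) <= chunk_size or not separators:
        return [text]  # small enough, or no separators left to try
    first, rest = separators[0], separators[1:]
    chunks: list[str] = []
    for piece in text.split(first):
        if len(piece) <= chunk_size:
            chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, chunk_size, rest))
    return chunks
```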

2. Parent-Child Chunking

This is the best “bang-for-your-buck” strategy and solves the Goldilocks Problem brilliantly. It separates the chunk you search for from the chunk you generate with.

  • How it works (a minimal sketch follows this list):
    1. Parent Split: First, you split your document into large, logical “Parent” chunks (e.g., an entire section of a manual).
    2. Child Split: Then, you split each Parent chunk into many small, precise “Child” chunks (e.g., individual paragraphs or sentences).
    3. Indexing: You only embed and index the small Child Chunks. Each child chunk stores a pointer to its Parent.
    4. Retrieval: The user’s query searches for the most relevant Child Chunks. But when it’s time to generate an answer, you retrieve the full Parent Chunks associated with those children.
  • Pros: You get the best of both worlds: the precise, targeted search of small chunks and the rich, full context of large chunks for the LLM.
  • Cons: It’s slightly more complex to set up your indexing pipeline.
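
Here is a minimal sketch of the indexing and retrieval flow. The split_sections, split_paragraphs, embed, and nearest callables are hypothetical placeholders for whatever section splitter, embedding model, and vector store you use; the point is the child-to-parent mapping.

```python
def build_parent_child_index(document: str, split_sections, split_paragraphs, embed):
    """Embed only the small child chunks, each of which remembers its large parent chunk."""
    parents = split_sections(document)                     # large "Parent" chunks (e.g. manual sections)
    child_vectors, child_to_parent = [], []
    for parent_id, parent_text in enumerate(parents):
        for child_text in split_paragraphs(parent_text):   # small "Child" chunks (paragraphs/sentences)
            child_vectors.append(embed(child_text))        # only children are embedded and indexed
            child_to_parent.append(parent_id)
    return parents, child_vectors, child_to_parent

def retrieve(query: str, parents, child_vectors, child_to_parent, embed, nearest, k: int = 4):
    """Search with the precise child vectors, but hand the LLM the full parent chunks."""
    child_ids = nearest(embed(query), child_vectors, k)        # indices of the top-k child chunks
    parent_ids = {child_to_parent[cid] for cid in child_ids}   # de-duplicate children sharing a parent
    return [parents[pid] for pid in parent_ids]
```

The de-duplication step matters: several top-ranked children often live in the same parent, and you only want to send that parent to the LLM once.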

3. Semantic Chunking

What if your document doesn’t have clear sections? What if it’s a dense, narrative essay or a long-form article? This is where Semantic Chunking shines. Instead of splitting by characters, it splits by meaning.

  • How it works (a minimal sketch follows this list):
    1. It breaks the document into individual sentences.
    2. It embeds every single sentence, converting them into vectors.
    3. It calculates the “distance” (how different the meaning is) between one sentence and the next.
    4. When it finds a large “semantic gap” between two sentences, it means the topic has changed, and it places a chunk boundary there.
  • Pros: Creates perfectly-sized, thematically coherent chunks. A chunk might be 3 sentences or 10, but it will always be about a single topic.
  • Cons: Requires an extra embedding step during indexing, which adds time and cost.
  • Further Reading: This approach has been systematically benchmarked. Recent research, such as “Evaluating Semantic Chunking for RAG” (arXiv:2410.13070), confirms that semantic-based approaches can significantly outperform traditional chunking on retrieval tasks.
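
Here is a minimal sketch of the boundary-finding step, assuming you already have one embedding vector per sentence (from any sentence-embedding model). The 0.3 threshold is illustrative; real implementations often pick it as a percentile of the observed distances instead.

```python
import numpy as np

def semantic_chunks(sentences: list[str], embeddings: np.ndarray, threshold: float = 0.3) -> list[str]:
    """Place a chunk boundary wherever the cosine distance between consecutive sentences jumps."""
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        a, b = embeddings[i - 1], embeddings[i]
        distance = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        if distance > threshold:          # large "semantic gap" -> the topic has changed
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```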

4. Propositional Chunking

This is the current state-of-the-art and a complete paradigm shift. The idea is simple: instead of indexing what the author wrote, you index what the author meant.

  • How it works (a minimal sketch follows this list):
    1. You run your document through an LLM first, instructing it to break every paragraph down into a list of “atomic facts” or “propositions.”
    2. You then embed and index these propositions instead of the original text.
    Original Text: “Agent Alpha, built by Acme Corp in 2024, is a data bot that excels at financial reports and has a 95% accuracy rate.”
    Propositions:
    • “Agent Alpha is a data bot.”
    • “Agent Alpha was built by Acme Corp.”
    • “Agent Alpha was built in 2024.”
    • “Agent Alpha excels at financial reports.”
    • “Agent Alpha has a 95% accuracy rate.”
  • Pros: This is hyper-precise. A query like “When was Agent Alpha built?” will make a 1:1 match with the third proposition. This is the ultimate “needle-in-a-haystack” solution.
  • Cons: This is the most expensive indexing strategy, as it requires an LLM call for every paragraph in your entire knowledge base.
  • Further Reading: This strategy was detailed in the foundational paper “Dense X Retrieval: What Retrieval Granularity Should We Use?” (arXiv:2312.06648). The idea is so powerful that new research is even applying it to the query itself, as seen in “Improving RAG Retrieval via Propositional Content Extraction” (arXiv:2503.10654).
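
Here is a minimal sketch of the indexing step. The llm, embed, and index_add callables are hypothetical stand-ins for your LLM client, embedding model, and vector store, and the prompt wording is only illustrative.

```python
import json

PROMPT = (
    "Decompose the following paragraph into a JSON array of short, self-contained, "
    "atomic factual statements (propositions). Resolve pronouns to full names.\n\n{paragraph}"
)

def extract_propositions(paragraph: str, llm) -> list[str]:
    """Ask the LLM to rewrite one paragraph as atomic facts, then parse its JSON answer."""
    response = llm(PROMPT.format(paragraph=paragraph))  # hypothetical LLM call returning text
    return json.loads(response)

def index_propositions(paragraphs: list[str], llm, embed, index_add) -> None:
    """Embed and index the propositions instead of the original paragraphs."""
    for paragraph in paragraphs:
        for proposition in extract_propositions(paragraph, llm):
            index_add(proposition, embed(proposition))  # hypothetical vector-store insert
```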

5. Structured Chunking

What about documents that aren’t just text? PDFs, financial reports, and web pages are full of tables, lists, and headers. Throwing them at a text splitter will create a mess.

  • How it works: This is a “content-aware” strategy that uses the document’s structure as its guide (see the sketch after this list).
    • It parses tables as a whole, often serializing them as Markdown or JSON, and indexes them as a “table chunk.”
    • It uses headers (<h1>, <h2>) as natural “Parent” boundaries.
    • It treats lists (<ul>, <ol>) as a single, coherent chunk.
  • Pros: By far the most effective method for complex, “real-world” documents. It preserves the data’s structure, which is often just as important as the text’s meaning.
  • Cons: Requires a robust parsing library (like unstructured.io or LlamaParse) and a more complex indexing pipeline.
  • Further Reading: The impact of this is huge in specific domains. A great example is “Financial Report Chunking for Effective Retrieval Augmented Generation” (arXiv:2402.05131), which shows that parsing a report’s structural elements (like tables) is critical for accuracy.
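
Here is a minimal sketch of the grouping logic for Markdown-like input: headers open a new chunk, and pipe-delimited table rows are kept together as a single “table chunk.” A real pipeline would lean on a parsing library such as unstructured.io or LlamaParse, but the core idea looks like this:

```python
def structured_chunks(markdown: str) -> list[dict]:
    """Group a Markdown document into header-bounded text chunks and whole-table chunks."""
    chunks: list[dict] = []
    current: list[str] = []
    current_type = "text"

    def flush():
        if current:
            chunks.append({"type": current_type, "text": "\n".join(current)})
            current.clear()

    for line in markdown.splitlines():
        is_table_row = line.lstrip().startswith("|")
        if line.startswith("#"):                             # a header closes the previous chunk
            flush()
            current_type = "text"
        elif is_table_row != (current_type == "table"):      # entering or leaving a table
            flush()
            current_type = "table" if is_table_row else "text"
        current.append(line)
    flush()
    return chunks
```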

When to Use Which Strategy: A Simple Guide

There is no single “best” method. The right choice depends on your documents, your accuracy needs, and your budget.

| If your primary need is… | The best strategy is… | Why? |
| --- | --- | --- |
| Rapid prototyping | Recursive Character Chunking | It’s fast, easy, and the default. Good enough to see if your RAG system is viable. |
| General-purpose Q&A (e.g., manuals, textbooks, legal) | Parent-Child Chunking | The best balance of search precision (small child chunks) and generation context (large parent chunks). |
| Dense, unstructured text (e.g., essays, research papers) | Semantic Chunking | It creates thematically pure chunks by finding the “topic breaks” in the narrative. |
| Extreme factual accuracy (e.g., high-stakes Q&A, fact-checking) | Propositional Chunking | It creates a 1:1 mapping between a fact and a query. Highest accuracy, highest cost. |
| Complex, “messy” documents (e.g., PDFs, tables, HTML) | Structured Chunking | It respects the document’s layout, preserving tables and sections, which are vital pieces of context. |

Final Thought

In 2025, “chunking” is no longer just a preprocessing step you can ignore. It’s the core of your retrieval strategy. The trend is clear: we are moving away from static, fixed chunks and toward intelligent, dynamic, and structured representations of knowledge.

Choose your chunking strategy for your upcoming RAG application wisely—it will make all the difference.

 

