The combination of Retrieval-Augmented Generation (RAG) and powerful language models enables the development of sophisticated applications that leverage large datasets to answer questions effectively. In this blog, we will explore the steps to build an LLM RAG application using LangChain.
Before diving into the implementation, ensure you have the required libraries installed. Execute the following command to install the necessary packages:
!pip install langchain langchain_community langchainhub langchain-openai tiktoken chromadb beautifulsoup4
LangChain integrates with LangSmith for tracing, which helps you debug and inspect each step of your RAG workflow, and with OpenAI for generating embeddings, the compact numerical representations of text used for retrieval. Set the required environment variables for both services:
import os
os.environ['LANGSMITH_TRACING'] = 'true'
os.environ['LANGSMITH_API_KEY'] = '<langsmith-api-key>'
os.environ['OPENAI_API_KEY'] = '<openai-api-key>'
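If you would rather not hardcode keys in your script, a minimal sketch (assuming an interactive session) is to prompt for them at runtime with Python's getpass module:
import getpass
# Prompt for keys only if they are not already set in the environment
if not os.environ.get('OPENAI_API_KEY'):
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API key: ')
if not os.environ.get('LANGSMITH_API_KEY'):
    os.environ['LANGSMITH_API_KEY'] = getpass.getpass('LangSmith API key: ')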
Indexing is the process of preparing your dataset for retrieval. In this example, we load and process a blog post for indexing.
We use WebBaseLoader to scrape the content from a blog URL. In this case, the content is restricted to certain HTML classes using BeautifulSoup:
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()
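As a quick, optional sanity check, you can confirm that the post loaded and see how much text was scraped:
# Illustrative check: one Document per web path, with the scraped text in page_content
print(len(blog_docs))
print(len(blog_docs[0].page_content))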
Large documents need to be divided into manageable chunks for efficient retrieval. This process ensures that the system can handle queries effectively by focusing on smaller, relevant sections of data instead of scanning an entire document. For example, in legal document review or scientific research, chunking helps pinpoint specific information quickly, improving both speed and accuracy of the retrieval process. We use RecursiveCharacterTextSplitter to split the blog into smaller pieces:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300,
    chunk_overlap=50
)
splits = text_splitter.split_documents(blog_docs)
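To see how the splitter behaved, you can count the chunks and preview one of them (purely illustrative, not required for the pipeline):
# Illustrative: count the chunks and preview the first one
print(f"Created {len(splits)} chunks")
print(splits[0].page_content[:200])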
The document chunks are converted into vector embeddings using OpenAI’s embedding model and stored in a vector database (Chroma). In the code below, Chroma.from_documents embeds each chunk and stores the resulting vectors, while as_retriever exposes the store through a retriever interface:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()
The retriever enables search over the indexed content, fetching the most relevant chunks for a query. For example, if you ask, ‘What are the key components of an AI agent?’, the retriever identifies and returns the most pertinent sections of the indexed blog. You can customize retrieval behavior by setting parameters such as the number of results (k):
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})
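Before wiring up the full chain, you can also query the retriever directly to see what it returns (a quick sketch; on older LangChain versions you may need get_relevant_documents instead of invoke):
# Illustrative: fetch the most relevant chunk(s) for a sample question
docs = retriever.invoke("What are the key components of an AI agent?")
for doc in docs:
    print(doc.page_content[:200])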
With the retriever in place, we now configure a language model to generate responses based on the retrieved context.
The prompt defines how the model should format and generate the response:
from langchain.prompts import ChatPromptTemplate
template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
You can also pull a ready-made prompt template from the LangChain Hub:
# Define prompt for question-answering
from langchain import hub
prompt = hub.pull("rlm/rag-prompt")
This is what the prompt looks like when you visit the rag-prompt page on the LangChain Hub:
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
We use OpenAI’s GPT-3.5-turbo model to handle the generation task. The temperature is set to 0 for deterministic outputs:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
LangChain provides a modular pipeline for combining retrieval and generation steps into a unified chain:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
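As written, the chain passes the retriever’s list of Document objects straight into the prompt’s {context} slot. A common variation (a sketch, not the only approach) is to join the retrieved page contents into plain text first:
# Variation: convert retrieved Documents to plain text before filling the prompt
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)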
Finally, invoke the RAG chain with your question and get a precise answer:
response = rag_chain.invoke("What is the difference between Self-Reflection and Task Decomposition?")
print(response)
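Because the chain is a standard LangChain Runnable, you can also stream the answer token by token rather than waiting for the full response:
# Stream the generated answer as it is produced
for chunk in rag_chain.stream("What is Task Decomposition?"):
    print(chunk, end="", flush=True)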
By following the steps outlined above, you can build a powerful RAG application capable of answering questions based on indexed content. The combination of LangChain’s modularity, OpenAI’s embeddings, and Chroma’s vector store makes the process seamless.
Start experimenting today and expand your application’s capabilities by integrating additional datasets, refining prompts, or enhancing retrieval strategies.