The combination of Retrieval-Augmented Generation (RAG) and powerful language models enables the development of sophisticated applications that leverage large datasets to answer questions effectively. In this blog, we will explore the steps to build an LLM RAG application using LangChain.
Before diving into the implementation, ensure you have the required libraries installed. Execute the following command to install the necessary packages:
!pip install langchain langchain_community langchainhub langchain-openai tiktoken chromadb beautifulsoup4
LangChain integrates with LangSmith for tracing, which helps you debug and inspect each step of your RAG workflow, and with OpenAI for generating embeddings, the compact numerical representations of text used for retrieval. Set the required environment variables for both services:
import os
os.environ['LANGSMITH_TRACING'] = 'true'
os.environ['LANGSMITH_API_KEY'] = '<langsmith-api-key>'
os.environ['OPENAI_API_KEY'] = '<openai-api-key>'
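If you would rather not hardcode keys in your script, a minimal sketch (assuming an interactive session) is to prompt for them at runtime with Python's getpass module:
import getpass
# Prompt for keys only if they are not already set in the environment
if not os.environ.get('OPENAI_API_KEY'):
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API key: ')
if not os.environ.get('LANGSMITH_API_KEY'):
    os.environ['LANGSMITH_API_KEY'] = getpass.getpass('LangSmith API key: ')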
Indexing is the process of preparing your dataset for retrieval. In this example, we load and process a blog post for indexing.
We use WebBaseLoader to scrape the content from a blog URL. In this case, the content is restricted to certain HTML classes using BeautifulSoup:
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()
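As a quick, optional sanity check, you can confirm that the post loaded and see how much text was scraped:
# Illustrative check: one Document per web path, with the scraped text in page_content
print(len(blog_docs))
print(len(blog_docs[0].page_content))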
Large documents need to be divided into manageable chunks for efficient retrieval. This process ensures that the system can handle queries effectively by focusing on smaller, relevant sections of data instead of scanning an entire document. For example, in legal document review or scientific research, chunking helps pinpoint specific information quickly, improving both speed and accuracy of the retrieval process. We use RecursiveCharacterTextSplitter to split the blog into smaller pieces:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300,
    chunk_overlap=50
)
splits = text_splitter.split_documents(blog_docs)
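To see how the splitter behaved, you can count the chunks and preview one of them (purely illustrative, not required for the pipeline):
# Illustrative: count the chunks and preview the first one
print(f"Created {len(splits)} chunks")
print(splits[0].page_content[:200])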
The document chunks are converted into vector embeddings using OpenAI’s embedding model and stored in a vector database (Chroma). In the code below, Chroma.from_documents embeds each chunk and stores the resulting vectors, while as_retriever exposes the store through a retriever interface:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()
The retriever enables search over the indexed content, fetching the most relevant chunks for a query. For example, if you ask, ‘What are the key components of an AI agent?’, the retriever identifies and returns the most pertinent sections of the indexed blog. You can customize retrieval behavior by setting parameters such as the number of results (k):
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})
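Before wiring up the full chain, you can also query the retriever directly to see what it returns (a quick sketch; on older LangChain versions you may need get_relevant_documents instead of invoke):
# Illustrative: fetch the most relevant chunk(s) for a sample question
docs = retriever.invoke("What are the key components of an AI agent?")
for doc in docs:
    print(doc.page_content[:200])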
With the retriever in place, we now configure a language model to generate responses based on the retrieved context.
The prompt defines how the model should format and generate the response:
from langchain.prompts import ChatPromptTemplate
template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
You can also pull a ready-made prompt template from the LangChain Hub:
# Define prompt for question-answering
from langchain import hub
prompt = hub.pull("rlm/rag-prompt")
This is what the prompt looks like when you visit the rag-prompt page on the LangChain Hub:
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
We use OpenAI’s GPT-3.5-turbo model to handle the generation task. The temperature is set to 0 for deterministic outputs:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
LangChain provides a modular pipeline for combining retrieval and generation steps into a unified chain:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
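As written, the chain passes the retriever’s list of Document objects straight into the prompt’s {context} slot. A common variation (a sketch, not the only approach) is to join the retrieved page contents into plain text first:
# Variation: convert retrieved Documents to plain text before filling the prompt
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)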
Finally, invoke the RAG chain with your question and get a precise answer:
response = rag_chain.invoke("What is the difference between Self-Reflection and Task Decomposition?")
print(response)
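Because the chain is a standard LangChain Runnable, you can also stream the answer token by token rather than waiting for the full response:
# Stream the generated answer as it is produced
for chunk in rag_chain.stream("What is Task Decomposition?"):
    print(chunk, end="", flush=True)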
By following the steps outlined above, you can build a powerful RAG application capable of answering questions based on indexed content. The combination of LangChain’s modularity, OpenAI’s embeddings, and Chroma’s vector store makes the process seamless.
Start experimenting today and expand your application’s capabilities by integrating additional datasets, refining prompts, or enhancing retrieval strategies.