The combination of Retrieval-Augmented Generation (RAG) and powerful language models enables the development of sophisticated applications that leverage large datasets to answer questions effectively. In this blog, we will explore the steps to build an LLM RAG application using LangChain.
Before diving into the implementation, ensure you have the required libraries installed. Execute the following command to install the necessary packages:
!pip install langchain langchain_community langchainhub langchain-openai tiktoken chromadb
LangChain integrates with external services for tracing and embedding generation: tracing (via LangSmith) helps you debug and inspect your workflows, while embeddings provide the compact numerical representations of text that make retrieval efficient in RAG applications. Set up the required environment variables for LangChain and OpenAI:
import os
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = '<langchain-api-key>'
os.environ['OPENAI_API_KEY'] = '<openai-api-key>'
Indexing is the process of preparing your dataset for retrieval. In this example, we load and process a blog post for indexing.
We use WebBaseLoader to scrape the content from a blog URL. In this case, the content is restricted to certain HTML classes using BeautifulSoup:
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()
Large documents need to be divided into manageable chunks for efficient retrieval. This process ensures that the system can handle queries effectively by focusing on smaller, relevant sections of data instead of scanning an entire document. For example, in legal document review or scientific research, chunking helps pinpoint specific information quickly, improving both speed and accuracy of the retrieval process. We use RecursiveCharacterTextSplitter to split the blog into smaller pieces:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300,
    chunk_overlap=50
)
splits = text_splitter.split_documents(blog_docs)
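If you want to verify the split before indexing, a quick optional check (not required for the pipeline) is to count the chunks and preview the first one:
# Optional: inspect the chunking result
print(f"Created {len(splits)} chunks")
print(splits[0].page_content[:200])  # preview the beginning of the first chunk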
The document chunks are converted into vector embeddings using OpenAI’s embedding model and stored in a vector database (Chroma):
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()
The retriever enables the search functionality for fetching the most relevant chunks of content based on a query. For example, if you ask, ‘What are the key components of an AI agent?’, the retriever identifies and retrieves the most pertinent section from the indexed blog, ensuring precise and contextually relevant results. You can customize retrieval behavior by setting parameters like the number of results (k):
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})
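Before wiring the retriever into a chain, you can optionally query it directly to see what comes back; this is just a quick sanity check (in recent LangChain versions, retrievers support .invoke()):
# Optional: inspect what the retriever returns for a sample question
docs = retriever.invoke("What are the key components of an AI agent?")
print(docs[0].page_content)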
With the retriever in place, we now configure a language model to generate responses based on the retrieved context.
The prompt defines how the model should format and generate the response:
from langchain.prompts import ChatPromptTemplate
template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
We use OpenAI’s GPT-3.5-turbo model to handle the generation task. The temperature is set to 0 for deterministic outputs:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
LangChain provides a modular pipeline for combining retrieval and generation steps into a unified chain:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
Finally, invoke the RAG chain with your question and get a precise answer:
response = rag_chain.invoke("What is the difference between Self Reflection and Task Composition?")
print(response)
By following the steps outlined above, you can build a powerful RAG application capable of answering questions based on indexed content. The combination of LangChain’s modularity, OpenAI’s embeddings, and Chroma’s vector store makes the process seamless.
Start experimenting today and expand your application’s capabilities by integrating additional datasets, refining prompts, or enhancing retrieval strategies.
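As one possible way to enhance the retrieval strategy (a sketch, not something covered in the walkthrough above), you could switch the retriever to Maximal Marginal Relevance (MMR) search, which trades off relevance against diversity among the returned chunks:
# Example tweak: MMR retrieval balances relevance with diversity
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "fetch_k": 10},
)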