Retrieval-Augmented Generation (RAG) is a generative AI technique that combines retrieval-based search with large language models (LLMs) to improve response accuracy and contextual relevance. Unlike traditional retrieval systems, which simply return existing documents, or generative models that rely solely on pre-trained knowledge, RAG dynamically retrieves information relevant to the query and supplies it as context to the LLM when generating a response. LangGraph, an advanced extension of LangChain, provides a structured workflow for developing RAG applications. This guide walks through the process of building a RAG system using LangGraph, with example implementations.
Setting Up the Environment
To get started, we need to install the necessary dependencies. The following command ensures that the required LangChain and LangGraph packages, along with requests and BeautifulSoup for fetching articles, are available:
!pip install langchain-openai langchain-community langchain-text-splitters langgraph requests beautifulsoup4 --quiet --upgrade
Next, configure environment variables to enable seamless API interactions. We set the LangSmith API key and enable tracing for observability, and set the OpenAI API key to integrate with OpenAI chat models.
import os
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = 'lsv2_pt-xxx'
os.environ["OPENAI_API_KEY"] = 'sk-proj-xxxx'
Initializing the Language Model
A core component of the RAG system is the LLM, which generates responses by incorporating the contextual information retrieved for the user's question. Here, we initialize OpenAI’s GPT-4o-mini model:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
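As a quick sanity check (assuming the API key is valid), the model can be invoked once directly:
# Smoke test: send a trivial prompt and print the model's reply
reply = llm.invoke("Reply with the single word: ready")
print(reply.content)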
Selecting an Embedding Model and Vector Store
To efficiently retrieve relevant information, we use an embedding model to convert text into vector representations. An embedding model maps words, sentences, or entire documents into high-dimensional numerical vectors, capturing semantic relationships between them. This transformation enables efficient similarity searches, allowing the system to retrieve contextually relevant information based on query inputs. OpenAI provides several powerful embedding models, including text-embedding-3-large, text-embedding-3-small, and text-embedding-ada-002, each designed for different performance and efficiency trade-offs. text-embedding-3-large offers high accuracy and deeper semantic understanding, making it suitable for complex retrieval tasks, while text-embedding-3-small is optimized for lower computational costs with good performance. text-embedding-ada-002, a widely used earlier model, balances performance and efficiency for various natural language processing tasks.
The following code initializes the embedding model used to convert text into vector representations. These vectors are stored in an in-memory vector database for quick lookup:
from langchain_openai import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
embedding_model = OpenAIEmbeddings(model="text-embedding-3-large")
vectorstore = InMemoryVectorStore(embedding_model)
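To make the idea of vector representations concrete, here is a small illustration (not part of the pipeline) that embeds a sample question and inspects the resulting vector; for text-embedding-3-large the default dimensionality is 3072:
# Illustration only: embed a sample sentence and inspect the dense vector
sample_vector = embedding_model.embed_query("What is a digital arrest scam?")
print(len(sample_vector))   # vector dimensionality (3072 by default for this model)
print(sample_vector[:5])    # first few components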
Collecting Data for the Knowledge Base
To enable retrieval, we need a dataset. In this example, we extract content from publicly available news articles related to digital arrest scams. This can be built into a chat application that helps users check whether they are being targeted by a digital arrest scam. The idea is to take content from recent news articles, create embeddings, and store them in a vector store. When a user sends a query, the pieces of information matching the query are retrieved from the vector store, and the context and query are passed as a prompt to the LLM to generate the answer.
import requests
from bs4 import BeautifulSoup
import json
urls = [
"https://timesofindia.indiatimes.com/city/vijayawada/seven-cybercons-arrested-for-digital-arrest-scam-in-andhra-pradesh/articleshow/117649089.cms",
"https://timesofindia.indiatimes.com/city/bengaluru/digital-arrest-fraud-elderly-law-professor-in-bengaluru-duped-of-rs-7-lakh/articleshow/117751902.cms"
]
documents = []
for url in urls:
    try:
        response = requests.get(url)
        if response.status_code == 200:
            soup = BeautifulSoup(response.text, 'html.parser')
            # The article body is embedded in JSON-LD <script> tags on these pages
            script_tags = soup.find_all('script', {'type': 'application/ld+json'})
            for script in script_tags:
                try:
                    json_data = json.loads(script.string)
                    if "articleBody" in json_data:
                        documents.append(json_data["articleBody"])
                        break
                except (json.JSONDecodeError, TypeError):
                    continue
        else:
            print(f"Failed to fetch {url}, Status code: {response.status_code}")
    except requests.RequestException as e:
        print(f"Error fetching {url}: {e}")
Processing Documents and Storing Vectors
To optimize retrieval, we split lengthy documents into smaller segments before storing them in the vector database. This process improves search precision and contextual matching:
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
docs = [Document(page_content=article) for article in documents]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)
vectorstore.add_documents(documents=all_splits)
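As an optional check, we can query the vector store directly to confirm that relevant chunks come back; the query string below is only an example:
# Sanity check: retrieve the two chunks most similar to a sample query
hits = vectorstore.similarity_search("How do digital arrest scammers contact victims?", k=2)
for hit in hits:
    print(hit.page_content[:150], "...")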
Designing the RAG Workflow
Setting Up the Prompt
The code initializes a predefined prompt template from LangChain’s hub to guide the response generation process. By pulling the “rlm/rag-prompt” template, it ensures that the language model follows a structured format when generating answers.
from langchain import hub
prompt = hub.pull("rlm/rag-prompt")
This is what the RAG prompt looks like:
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
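To see how the template is filled at runtime, it can be invoked with placeholder values (the question and context strings below are made up for illustration):
# Fill the template with dummy values and print the resulting message
example = prompt.invoke({"question": "What is a digital arrest scam?",
                         "context": "Scammers impersonate police over video calls."})
print(example.to_messages()[0].content)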
Defining the System State
The application state in this RAG workflow is defined using a TypedDict named State, which organizes data at different stages of processing. It includes the user’s input question, a list of retrieved documents providing contextual information, and the generated response. This structured representation ensures smooth data flow between retrieval and generation steps, enabling efficient knowledge retrieval and response synthesis within the LangGraph framework.
The RAG system requires a structured state that tracks user queries, retrieved contexts, and generated responses:
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str
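At runtime a State is simply a dictionary whose keys are filled in as the workflow progresses; for illustration:
# Example of a state as it looks before the retrieval step runs
example_state: State = {
    "question": "What is a digital arrest scam?",
    "context": [],   # populated by the retrieve step
    "answer": "",    # populated by the generate step
}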
Implementing Retrieval and Response Generation
The retrieval function fetches relevant content from the vector database, while the generation function synthesizes an AI-driven response:
def retrieve(state: State):
    retrieved_docs = vectorstore.similarity_search(state["question"])
    return {"context": retrieved_docs}

def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}
Building and Running the Workflow
The LangGraph workflow connects retrieval and generation in a structured manner to ensure seamless execution:
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
Testing the system with an example query:
response = graph.invoke({"question": "List three different scenarios of digital arrest?"})
print(response["answer"])
Conclusion
This guide provided a step-by-step approach to building a RAG system using LangGraph and LangChain. By integrating retrieval with AI-generated responses, we created a structured knowledge retrieval system that can process queries with improved accuracy.
For further learning, explore the official documentation for LangChain and LangGraph. Engaging with research papers on Retrieval-Augmented Generation and participating in community forums will provide deeper insights into this evolving field.