Large language models (LLMs), also termed large foundation models (LFMs), are enabling innovative software products that solve problems which were unimaginable until recently. Different stakeholders in the software engineering and AI arena need to learn how to build such LLM-powered applications, and the most important aspect of building them is the application architecture.
In this blog, we will learn about the key application architecture components of LLM-based applications. This will be helpful for product managers, software architects, LLM architects, ML engineers, and others. In the software engineering landscape, the LLM is often described as the application's reasoning engine.
You might want to check out this book to learn more – Building LLM-powered applications.
The following are the key architectural components of LLM-powered applications:
The following code uses LangChain as an AI orchestration framework to interact with embedding models, vector databases, LLM providers (such as OpenAI), etc. The snippets target the classic langchain package; newer releases move some of these imports into langchain-community and provider-specific packages.
Prompt converted to vector representation: Once the user enters a prompt, it is converted into a vector representation using an embedding model. The following is sample code:
from langchain.embeddings import HuggingFaceEmbeddings

# Load a sentence-transformers model and embed the user's prompt into a vector
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
prompt_vector = embedding_model.embed_query(prompt)
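As a related note, when the past conversations are first indexed, the same model can embed many texts in one call via embed_documents. A minimal sketch (the conversation snippets below are purely illustrative):

# Hypothetical conversation snippets to index (illustrative data only)
past_conversations = [
    "User asked how to reset a password; assistant listed the steps.",
    "User asked about pricing tiers; assistant summarized the plans.",
]
doc_vectors = embedding_model.embed_documents(past_conversations)  # one vector per text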
Similarity search from Vector DB to retrieve context: Use the prompt embedding from the step above to perform a similarity search in the vector database (e.g., Pinecone, FAISS, Milvus) and retrieve the most relevant past conversations. When LangChain's vector store wrappers are given raw query text, they apply the same embedding model under the hood.
import pinecone
from langchain.vectorstores import Pinecone

pinecone.init(api_key="your_api_key", environment="your_environment")
# Connect to an existing index; queries are embedded with embedding_model
vector_store = Pinecone.from_existing_index(index_name="conversation_index", embedding=embedding_model)
relevant_conversations = vector_store.similarity_search(prompt, k=5)  # returns Document objects
Context creation: Process the retrieved conversations into a context for the LLM. This could be as simple as concatenating the retrieved texts, or something more sophisticated such as trimming to a budget that fits the model's context window (see the sketch after the code below).
# Each result is a LangChain Document; its text lives in page_content
context = "\n".join(conv.page_content for conv in relevant_conversations)
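If the retrieved texts can overflow the model's context window, a slightly more careful approach is to stop adding passages once a budget is reached. A minimal sketch using a character budget as a rough proxy for tokens (the 4,000-character limit is an arbitrary illustration):

MAX_CONTEXT_CHARS = 4000  # illustrative budget; tune to your model's context window

selected, used = [], 0
for conv in relevant_conversations:  # results arrive most-similar first
    text = conv.page_content
    if used + len(text) > MAX_CONTEXT_CHARS:
        break
    selected.append(text)
    used += len(text) + 1  # +1 accounts for the joining newline
context = "\n".join(selected)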
Generate the response from the LLM: Pass the combined context along with the new prompt to the LLM to generate a response.
from langchain.llms import OpenAI

llm = OpenAI(openai_api_key="your_openai_api_key")
# Assemble the final prompt: retrieved context followed by the new user turn
full_prompt = f"{context}\nUser: {prompt}\nAssistant:"
user_response = llm(full_prompt)  # calling the LLM returns the completion text as a string
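Putting the four steps together, the whole flow can be wrapped in a single function. This is a minimal sketch under the same assumptions as above (an existing Pinecone index named conversation_index, placeholder API keys, and the vector_store and llm objects already initialized):

def answer_with_context(prompt: str, k: int = 5) -> str:
    """Embed the prompt, retrieve similar past conversations, and generate a reply."""
    docs = vector_store.similarity_search(prompt, k=k)      # retrieve relevant history
    context = "\n".join(doc.page_content for doc in docs)   # build the context block
    full_prompt = f"{context}\nUser: {prompt}\nAssistant:"  # combine context and new turn
    return llm(full_prompt)                                 # generate the response

print(answer_with_context("How do I reset my password?"))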