Large language models (LLMs), also termed large foundation models (LFMs), have recently enabled the creation of innovative software products that solve problems that were unimaginable only a few years ago. Stakeholders across the software engineering and AI landscape need to learn how to build such LLM-powered applications, and the most important aspect of building them is the application architecture.
In this blog, we will learn about the key application architecture components of LLM-based applications. This will be helpful for product managers, software architects, LLM architects, ML engineers, and others. In the software engineering landscape, the LLM itself is often described as the application's reasoning engine.
You might want to check this book to learn more – Building LLM-powered applications.
The following are the key application architectural components of LLM-powered applications:
The following code uses LangChain as the AI orchestration framework to interact with embedding models, vector databases, LLM providers (such as OpenAI), etc.
Prompt converted to vector representation: The prompt entered by the user is first converted into a vector representation (an embedding). The following is a sample code:
from langchain.embeddings import HuggingFaceEmbeddings

# Load a sentence-transformers model and embed the user's prompt
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
prompt_vector = embedding_model.embed_query(prompt)
Similarity search from Vector DB to retrieve context: Use the generated prompt vector in the above step to perform a similarity search in the vector database (e.g., Pinecone, FAISS, Milvus) to retrieve relevant past conversations.
from langchain.vectorstores import Pinecone

# Connect to an existing Pinecone index (the Pinecone client must already be
# initialized with your API key and environment)
vector_store = Pinecone.from_existing_index(index_name='conversation_index', embedding=embedding_model)
relevant_conversations = vector_store.similarity_search_by_vector(prompt_vector, k=5)
Context creation: Retrieve the most relevant past conversations and process them to create a context for the LLM. This could involve concatenating the retrieved conversations or using a more sophisticated method to combine them.
# The similarity search returns Document objects; join their text into one context string
context = "\n".join([conv.page_content for conv in relevant_conversations])
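Plain concatenation can overflow the model's context window when many long passages are retrieved. One more robust approach is to add passages only until a token budget is exhausted. Below is a minimal sketch of that idea; the `build_context` helper, the whitespace-based token count, and the 512-token budget are illustrative assumptions, not part of the original pipeline:

```python
def build_context(passages, max_tokens=512):
    """Concatenate retrieved passages until a rough token budget is reached.

    Uses whitespace splitting as a crude token estimate; a production
    pipeline would use the model's own tokenizer instead.
    """
    selected, used = [], 0
    for text in passages:
        n_tokens = len(text.split())
        if used + n_tokens > max_tokens:
            break  # adding this passage would exceed the budget
        selected.append(text)
        used += n_tokens
    return "\n".join(selected)
```

Because the most relevant passages come first in a similarity-search result, truncating from the tail discards the least useful context first.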
Generate the response from LLM: Pass the combined context along with the new prompt to the LLM to generate a response.
from langchain.llms import OpenAI

llm = OpenAI(openai_api_key='your_openai_api_key')
# Prepend the retrieved context so the model can ground its answer in it
full_prompt = f"{context}\nUser: {prompt}\nAssistant:"
# Calling the LLM wrapper with a single prompt returns the completion string
user_response = llm(full_prompt)
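The four steps above can be wired into a single retrieval-augmented generation function. The sketch below injects the embedding, search, and LLM calls as plain callables so the overall flow is visible independent of any specific provider; the function name `rag_answer` and its signature are illustrative assumptions, not an API from the original post:

```python
def rag_answer(prompt, embed_fn, search_fn, llm_fn, top_k=5):
    """End-to-end RAG flow: embed -> retrieve -> build context -> generate.

    embed_fn(text) -> vector; search_fn(vector, k) -> list of passage strings;
    llm_fn(full_prompt) -> completion string. Each callable would wrap the
    corresponding LangChain component in a real application.
    """
    prompt_vector = embed_fn(prompt)            # step 1: vectorize the prompt
    passages = search_fn(prompt_vector, top_k)  # step 2: similarity search
    context = "\n".join(passages)               # step 3: build the context
    full_prompt = f"{context}\nUser: {prompt}\nAssistant:"
    return llm_fn(full_prompt)                  # step 4: generate the response
```

Passing the components in as arguments also makes the pipeline easy to unit-test with stubs before connecting a live vector database or LLM provider.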