Large language models (LLMs), also termed large foundation models (LFMs), have recently enabled the creation of innovative software products that solve problems once considered out of reach. Stakeholders across the software engineering and AI landscape need to learn how to build such LLM-powered applications, and the most important aspect of building them is the application architecture.
In this blog, we will learn about the key application architecture components of LLM-based applications. This will be helpful for product managers, software architects, LLM architects, ML engineers, and others. In the software engineering landscape, the LLM itself is often termed the reasoning engine.
You might want to check out this book to learn more – Building LLM-powered Applications.
The following are the key application architectural components of LLM-powered applications:
The following code uses LangChain as the AI orchestration framework to interact with embedding models, vector databases, LLM providers (such as OpenAI), and so on.
Prompt converted to vector representation: Once the user enters a prompt, it is converted into a vector representation. The following is a sample code:
from langchain.embeddings import HuggingFaceEmbeddings
# Load a sentence-transformers model for embedding text
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
# embed_query converts a single text into a vector (list of floats)
prompt_vector = embedding_model.embed_query(prompt)
Similarity search in the vector DB to retrieve context: Use the prompt vector generated in the previous step to perform a similarity search in the vector database (e.g., Pinecone, FAISS, Milvus) and retrieve relevant past conversations.
from langchain.vectorstores import Pinecone
# Connect to an existing Pinecone index; the API key is picked up from
# the PINECONE_API_KEY environment variable
vector_store = Pinecone.from_existing_index(index_name='conversation_index', embedding=embedding_model)
# similarity_search_by_vector accepts a raw vector; k sets the number of results
relevant_conversations = vector_store.similarity_search_by_vector(prompt_vector, k=5)
Context creation: Retrieve the most relevant past conversations and process them to create a context for the LLM. This could involve concatenating the retrieved conversations or using a more sophisticated method to combine them.
# The similarity search returns Document objects; the text lives in page_content
context = "\n".join([doc.page_content for doc in relevant_conversations])
Generate the response from LLM: Pass the combined context along with the new prompt to the LLM to generate a response.
from langchain.llms import OpenAI
# The API key can also be supplied via the OPENAI_API_KEY environment variable
llm = OpenAI(openai_api_key='your_openai_api_key')
full_prompt = f"{context}\nUser: {prompt}\nAssistant:"
# Calling the LLM directly returns the completion text as a string
user_response = llm(full_prompt)
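The steps above can be sketched end to end without any external services. The snippet below is a minimal, framework-free illustration of the same flow: a toy in-memory corpus with made-up three-dimensional vectors stands in for a real embedding model and vector database, and the final prompt assembly mirrors the context-plus-question pattern shown earlier. All names, texts, and numbers here are purely illustrative.

```python
import math

# Toy "vector DB": past conversation snippets with hypothetical embeddings.
# In a real app these vectors come from an embedding model and live in a
# vector database such as Pinecone, FAISS, or Milvus.
corpus = [
    ("Resetting a password requires email verification.", [0.9, 0.1, 0.0]),
    ("Our refund policy allows returns within 30 days.", [0.1, 0.9, 0.0]),
    ("The mobile app supports fingerprint login.", [0.8, 0.2, 0.1]),
]

def cosine(a, b):
    # Cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(prompt_vector, k=2):
    # Rank corpus entries by similarity to the prompt vector and keep the top k
    ranked = sorted(corpus, key=lambda item: cosine(prompt_vector, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

prompt = "How do I reset my password?"
prompt_vector = [0.85, 0.15, 0.05]  # would come from the embedding model

# Build the context from retrieved snippets and assemble the final prompt
context = "\n".join(retrieve(prompt_vector))
full_prompt = f"{context}\nUser: {prompt}\nAssistant:"
print(full_prompt)
```

The `full_prompt` string is what would be handed to the LLM in the last step; swapping the toy pieces for a real embedding model and vector store recovers the LangChain-based pipeline described above.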