Large Language Models (LLMs) & Semantic Search

Ever wondered how a few typed words can bring up exactly the information you need from the sprawling web? Increasingly, that is the work of Large Language Models (LLMs), such as the GPT series from OpenAI. LLMs can be used to search for that needle-in-a-haystack piece of information you’re after. So, how do they do it? They rely on techniques such as Dense Retrieval, Reranking, and Generative Search. In this blog, you will learn about these techniques in an easy-to-understand way.

Dense Retrieval

Dense retrieval is a departure from traditional information retrieval approaches, which often rely on sparse representations such as Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF). These sparse methods create high-dimensional vectors, typically with one dimension per vocabulary term, in which most elements are zero. In contrast, dense retrieval generates dense, lower-dimensional representations for both queries and documents. The following are the key steps of the dense retrieval method:

  1. Encoding: The first step in dense retrieval involves encoding both the query and the documents. The encoding process uses a neural model, such as a transformer network, to convert the text of the query and the documents into dense vectors in a shared embedding space.
  2. Mapping: Each document in the database, and later each query, is mapped to a vector in the same embedding space. Semantically similar documents and queries should be close together in this space. This relies on a machine learning model that has been trained to capture the semantic relationships between different pieces of text.
  3. Query Processing: When a new query comes in, it is also transformed into a dense vector using the same model. The vector represents the semantic content of the query.
  4. Vector Comparison: The dense vector of the new query is compared to the vectors of the documents in the database. The comparison is typically done using cosine similarity, which is a measure of the cosine of the angle between two vectors. This measures how closely related the new query is to each document.
  5. Ranking: The documents are ranked based on their similarity to the query vector. The most similar documents (those whose vectors are most similar to the query vector) are considered the most relevant to the query.
  6. Retrieval: The top-ranked documents are then returned as the search results.
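
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a production setup: the `TOY_EMBEDDINGS` table, the `encode` function, and the sample documents are hypothetical stand-ins for a trained neural encoder and a real document collection. What the sketch does show faithfully is the flow: encode documents and query into the same space, compare with cosine similarity, rank, and return the top results.

```python
import math

# ASSUMPTION: hand-made 3-dimensional word vectors standing in for a trained
# neural encoder. A real system would use a transformer-based embedding model.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.0],
    "dog": [0.7, 0.3, 0.0],
    "pet": [0.8, 0.2, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "market": [0.0, 0.2, 0.8],
    "price": [0.1, 0.1, 0.8],
    "food": [0.3, 0.8, 0.1],
    "recipe": [0.1, 0.9, 0.1],
}
DIM = 3


def encode(text):
    """Mean-pool the toy word vectors into one dense vector (unknown words are skipped)."""
    vectors = [TOY_EMBEDDINGS[w] for w in text.lower().split() if w in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dense_search(query, documents, top_k=2):
    # Steps 1-2: encode every document (usually precomputed and stored in an index).
    doc_vectors = [encode(doc) for doc in documents]
    # Step 3: encode the incoming query with the same encoder.
    query_vector = encode(query)
    # Step 4: compare the query vector with each document vector.
    scored = [
        (cosine_similarity(query_vector, vec), doc)
        for vec, doc in zip(doc_vectors, documents)
    ]
    # Step 5: rank by similarity, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Step 6: return the top-ranked documents.
    return scored[:top_k]


documents = [
    "kitten food recipe",
    "stock market price",
    "dog pet food",
]
results = dense_search("cat", documents)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

Note that the query "cat" shares no word with any of the sample documents, yet with these toy vectors the pet-related documents rank above the finance one. A purely term-matching method would find no overlap at all here, which is the gap dense retrieval is meant to close.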


Reranking

Reranking, or re-scoring, is an essential step in many retrieval systems to enhance the quality of the initially retrieved results. After the initial retrieval step, which can be sparse or dense, a reranking model scores and reorders the returned results based on a more nuanced understanding of the query’s semantics.

Here are the key steps involved in the reranking process:

  1. Initial Retrieval: First, an initial set of relevant documents or results is retrieved using either a sparse retrieval method like Bag-of-Words (BoW) or Term Frequency-Inverse Document Frequency (TF-IDF), or a dense retrieval method like the one described above.
  2. Model Selection: Choose a model that is capable of understanding deeper semantic and contextual relationships in the text. This could be a pre-trained large language model (LLM) like BERT, GPT, or others.
  3. Feature Extraction: Convert the query and each initially retrieved document into a form that the model can process. For transformer-based rerankers, this typically means tokenizing the query and the document, often as a single combined input, so that the model can build contextual representations of the pair.
  4. Scoring: The model then assigns a new score to each document based on its relevance to the query. The model does this by considering the query and the document together, allowing it to take into account the semantic and contextual relationship between them. This is a significant advantage over initial retrieval methods, which often consider the query and documents separately.
  5. Ranking: The initially retrieved documents are then re-ranked based on the new scores. The goal is to move the most relevant documents to the top of the list, based on the deeper understanding of the text provided by the reranking model.
  6. Post-processing: Optional step depending on the use case. This could involve applying business rules, user preference, or further fine-tuning of the ranking based on additional features.
  7. Retrieval: The re-ranked list of documents is then returned as the final search results.

As with dense retrieval, the effectiveness of reranking depends heavily on the quality of the model and the data it was trained on.
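
The reranking steps described above can be sketched as follows. This is a hedged illustration only: `joint_score` is a hypothetical, hand-written scoring function that stands in for a trained reranking model, and the candidate list is made up. The one property it shares with a real reranker is the important one: it looks at the query and the document together rather than scoring them separately.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer, kept deliberately simple for the sketch."""
    return text.lower().split()


def joint_score(query, document):
    """ASSUMPTION: toy stand-in for a trained reranking model.

    Scores the (query, document) pair together: one point per query term
    found in the document, plus two points for each pair of adjacent query
    terms that also appears adjacent, in the same order, in the document.
    """
    q_tokens = tokenize(query)
    d_tokens = tokenize(document)
    if not q_tokens:
        return 0.0
    term_hits = sum(1 for term in q_tokens if term in d_tokens)
    doc_bigrams = set(zip(d_tokens, d_tokens[1:]))
    phrase_hits = sum(1 for pair in zip(q_tokens, q_tokens[1:]) if pair in doc_bigrams)
    return (term_hits + 2 * phrase_hits) / len(q_tokens)


def rerank(query, candidates, top_k=None):
    """Re-score the initially retrieved candidates and return them best first."""
    scored = [(joint_score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Candidates in the order an initial retrieval step might have returned them.
candidates = [
    "python snake care guide",
    "programming in java and python",
    "learn python programming with examples",
]
reranked = rerank("python programming", candidates)
for score, doc in reranked:
    print(f"{score:.2f}  {doc}")
```

In this example the document that was last in the initial list moves to the top, because it is the only one in which the query terms appear together as a phrase. Any post-processing, such as business rules or user preferences, would be applied to the `reranked` list before it is returned.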

Generative Search

Generative Search, on the other hand, opens a new frontier in semantic search. Traditional retrieval methods start by finding relevant documents based on the query and then extract answers from those documents. In contrast, generative models directly generate responses to queries.

LLMs, with their sophisticated sequence-generation capabilities, are ideally suited for this task. They do not just find relevant information but can generate answers in natural language, thus offering a conversational feel to the search process. This trait of LLMs has been particularly beneficial in areas such as chatbots and question-answering systems. However, the quality and reliability of generative search often depend on the training data and how well the model has been fine-tuned for the specific task.

Here are the key steps in the Generative Search process:

  1. Query Understanding: The model first parses the user’s query to understand its semantic and contextual meaning. This might involve encoding the query into an internal representation using learned embeddings.
  2. Model Selection: Choose a generative model capable of generating responses based on the query. This is often a pre-trained large language model (LLM) like GPT, which has been trained on a vast corpus of text and thus learned to generate meaningful and coherent text.
  3. Generation Process: The model uses the parsed query to generate a response. It builds the response one token (a word or word piece) at a time, using its internal representation of the query, the text generated so far, and its learned understanding of language to choose each next token so that the response stays coherent and relevant.
  4. Result Refinement: Depending on the model and use case, there may be a post-generation refinement step. This could involve filtering or editing the generated response to ensure it meets certain criteria, like maximum length, appropriateness, or relevance to the original query.
  5. Delivery: Finally, the generated response is returned to the user. In a chatbot or dialogue system, this could involve displaying the response as a message from the bot. In a search system, the response could be displayed as the search result.
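
The generation and refinement steps above can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: `NEXT_TOKEN_PROBS` is a hypothetical hand-written table that plays the role of a trained LLM, and it looks only at the previous token, whereas a real model conditions on the whole query and everything generated so far. The sketch uses greedy decoding (always pick the most likely next token); real systems often sample instead.

```python
# ASSUMPTION: hand-written next-token probabilities standing in for a trained LLM.
# Keys are the previous token; values map candidate next tokens to probabilities.
NEXT_TOKEN_PROBS = {
    "retrieval": {"maps": 0.7, "is": 0.3},
    "maps": {"text": 0.9, "<eos>": 0.1},
    "text": {"to": 0.8, "<eos>": 0.2},
    "to": {"vectors": 0.9, "text": 0.1},
    "vectors": {"<eos>": 1.0},
}


def generate(prompt, max_new_tokens=10):
    """Greedy decoding: repeatedly append the most probable next token."""
    prompt_tokens = prompt.lower().split()
    previous = prompt_tokens[-1] if prompt_tokens else "<start>"
    generated = []
    for _ in range(max_new_tokens):
        distribution = NEXT_TOKEN_PROBS.get(previous)
        if distribution is None:
            break  # the toy model knows nothing about this context
        next_token = max(distribution, key=distribution.get)
        if next_token == "<eos>":
            break  # the model signalled the end of the response
        generated.append(next_token)
        previous = next_token
    return generated


def refine(tokens, max_words):
    """Post-generation refinement: enforce a maximum response length."""
    return " ".join(tokens[:max_words])


response_tokens = generate("dense retrieval")
print(refine(response_tokens, max_words=10))
```

The `refine` function here only enforces a length limit; in practice this step is where filtering for appropriateness or relevance would also happen before the response is delivered to the user.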


In a nutshell, Large Language Models (LLMs), through techniques like Dense Retrieval, Reranking, and Generative Search, are rewriting the rules of information retrieval and semantic search. They’ve transformed how we pull precise information from the ever-expanding digital universe, enhancing both speed and accuracy. As we move forward, we can expect these models to get even better, bringing us closer to a seamless interaction between humans and machines. The future of search promises to be more intuitive, personal, and effective, making it an exciting space to watch.

Ajitesh Kumar

I have recently been working in the area of data analytics, including Data Science and Machine Learning / Deep Learning. I am also passionate about different technologies, including programming languages such as Java/JEE, JavaScript, Python, R, and Julia, and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, and big data. My latest book is titled First Principles Thinking: Building winning products using first principles thinking.