Have you ever wondered how your smartphone seems to know exactly what you’re going to type next? Or how virtual assistants like Alexa and Siri understand and respond to your queries with such precision? The magic is NLP language models. In this blog, we will explore the diverse types of language models in NLP that have evolved over time, each with its unique capabilities and applications. From the simplicity of N-gram models, which predict text based on preceding words, to the sophisticated neural network-based models like RNNs, LSTMs, and the groundbreaking large language models using Transformers, we will learn about the intricacies of these models, examples of real-world applications and Python code examples.
N-gram language models are fundamental in NLP. They rely on the probability of a word based on its preceding words, with the number of considered words defined by “N“. The “N” in N-gram refers to the number of words considered in a given context, and this leads to different types of N-gram models. The following are different types of N-gram language models:
The following are examples of real-world applications where N-gram language models can be used:
The following is an example of how to implement a basic 2-gram (bigram) language model for next word prediction in Python. This example uses the NLTK library, which is a popular toolkit for natural language processing in Python. Make sure to install NLTK (pip install nltk).
import nltk
from nltk.util import bigrams
from nltk.corpus import reuters
from collections import Counter, defaultdict
# Download necessary NLTK data
nltk.download('reuters')
nltk.download('punkt')
# Function to build a bigram model
def build_bigram_model():
model = defaultdict(lambda: defaultdict(lambda: 0))
for sentence in reuters.sents():
for w1, w2 in bigrams(sentence, pad_right=True, pad_left=True):
model[w1][w2] += 1
# Convert frequencies to probabilities
for w1 in model:
total_count = float(sum(model[w1].values()))
for w2 in model[w1]:
model[w1][w2] /= total_count
return model
# Build the model
model = build_bigram_model()
# Predict the next word
def predict_next_word(previous_word):
next_word = model[previous_word]
# Sort by probability
next_word = sorted(next_word.items(), key=lambda item: item[1], reverse=True)
return next_word[0][0] if next_word else None
# Example usage
previous_word = 'economic'
predicted_word = predict_next_word(previous_word)
print(f"The predicted next word after '{previous_word}' is '{predicted_word}'")
The Python code does the following:
Neural network based language models are superior to N-gram models in capturing complex relationships between words. These language models employ deep learning architectures to understand and generate human language. Many neural language models, especially those based on RNNs, LSTMs (Long Short-Term Memory), and GRUs (Gated Recurrent Units), are designed to process sequential data. This makes them adept at understanding the context and maintaining state over a sequence of words, essential for tasks like sentence completion and text generation. These models require more computational resources than N-gram models and can be slower to train, which might be a limiting factor in resource-constrained scenarios.
For implementing neural language models, TensorFlow or PyTorch are commonly used. They provide extensive support for building and training various neural network architectures. Here is an example using TensorFlow and Keras for an LSTM model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
# Model architecture
model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=embedding_dim))
model.add(LSTM(units=50))
model.add(Dense(units=vocab_size, activation='softmax'))
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy')
# Model training
# model.fit(x_train, y_train, epochs=num_epochs)
The following are examples of real-world applications where neural networks based language models such as RNN / LSTM / GRUs can be used:
Unlike RNNs and their derivatives such as LSTM and GRUs, transformers based language models are the go-to choice for complex language tasks like translation, question-answering, and text generation due to their superior ability to understand context and handle long-range dependencies.. The key innovation in transformers is the self-attention mechanism, allowing the model to weigh the importance of different words in a sentence, regardless of their positional distance. This ability to capture both short and long-range dependencies makes them incredibly powerful. Models like GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers) are prominent examples of transformer based language models. These models are also termed as large language models.
For transformers, the Hugging Face’s Transformers library is a popular choice. It provides pre-trained models like GPT and BERT, which can be fine-tuned for specific tasks. The Python code given below uses the Hugging Face transformers library to generate text using a pre-trained GPT-2 language model.
from transformers import GPT2LMHeadModel, GPT2Tokenizer
# Load pre-trained model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
# Encode text inputs
inputs = tokenizer("AI has taken the world by ", return_tensors="pt")
# Generate text
outputs = model.generate(inputs['input_ids'], max_length=20)
# Decode and print the output text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
The following are some of the key steps done in the above code:
When the above code is executed, the following gets printed: AI has taken the world by a storm. It has been a long time since I have
The following are examples of real-world applications where transformers based language models can be used:
Artificial Intelligence (AI) agents have started becoming an integral part of our lives. Imagine asking…
In the ever-evolving landscape of agentic AI workflows and applications, understanding and leveraging design patterns…
In this blog, I aim to provide a comprehensive list of valuable resources for learning…
Have you ever wondered how systems determine whether to grant or deny access, and how…
What revolutionary technologies and industries will define the future of business in 2025? As we…
For data scientists and machine learning researchers, 2024 has been a landmark year in AI…