NLP Pre-trained Models Explained with Examples


Natural Language Processing (NLP) is a branch of AI whose goal is to make machines capable of understanding and producing human language. NLP has been around for decades, but it has recently seen an explosion in popularity thanks to pre-trained models (PTMs), which NLP developers can put to use with minimal effort and time. This blog post introduces different types of pre-trained machine learning models for NLP and discusses their usage in real-world examples.

Before we look at different types of pre-trained models for NLP, let’s understand the concepts behind them.

What are pre-trained models for NLP?

Pre-trained models (PTMs) for NLP are deep learning models (such as transformers) that have been trained on a large dataset, typically with a general language-modeling objective, and can then be adapted to specific NLP tasks. When trained on a large corpus, PTMs can learn universal language representations, which benefit downstream NLP tasks and avoid the need to train a new model from scratch. In that sense, pre-trained models are reusable NLP models that developers can use to quickly build an NLP application. The Hugging Face Transformers library provides a suite of pre-trained deep learning models across different NLP tasks such as text classification, question answering, machine translation, etc. Many of these pre-trained models are available for free and can be used with little NLP expertise. First-generation pre-trained models were trained to learn good word embeddings, where each word gets a single vector regardless of context. The latest, second-generation PTMs are trained to learn contextual word embeddings, where the vector for a word depends on the sentence it appears in. Details on different types of pre-trained models will be taken up in my next blogs.

Pre-trained models can be easily loaded with deep learning frameworks such as PyTorch, TensorFlow, etc., and used for NLP tasks with very little extra effort from NLP developers. They are used more and more often on NLP tasks because they are easier to implement, often achieve high accuracy, and require less training time than custom-built models.
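
As a rough illustration of how little code this can take, here is a minimal sketch using the `pipeline` helper from the Hugging Face Transformers library. It assumes the `transformers` package and a backend such as PyTorch are installed, and that a default sentiment model can be downloaded on first use. The helper names `load_sentiment_model`, `best_label` and `classify` are made up for this example.

```python
def load_sentiment_model():
    """Load a pre-trained sentiment classifier (downloads weights on first call)."""
    # Imported lazily so best_label() below stays usable without the library installed.
    from transformers import pipeline
    return pipeline("sentiment-analysis")


def best_label(predictions):
    """Return the label with the highest score from a list of {'label', 'score'} dicts."""
    return max(predictions, key=lambda p: p["score"])["label"]


def classify(text):
    """Run the pre-trained model on one string and return its most likely label."""
    model = load_sentiment_model()
    # For a single string the pipeline returns a list of dicts
    # with "label" and "score" keys.
    return best_label(model(text))
```

Calling `classify("I really enjoyed this book")` would return whichever label the downloaded model assigns; the exact label names depend on the model that gets loaded.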

What are some real-world NLP examples where pre-trained models are used?

Some of the most popular real-world examples where pre-trained NLP models are used are the following:

  • Named Entity Recognition (NER) is an NLP task where the model tries to find the words or phrases in the input text that refer to entities and to assign each one a type. For example, given a sentence like “Chris Cairns was born on August 14th, 1980”, NER should recognize “Chris Cairns” as a person’s name, “August 14th” as the date and “1980” as the year. NER models are used in many scenarios like spam detection, customer support, chatbots, etc. Pre-trained NER models are provided by popular open-source NLP libraries such as NLTK, spaCy, and Stanford CoreNLP, and transformer models such as BERT can be fine-tuned for NER. BERT-based models can be loaded with TensorFlow or PyTorch and run for NER tasks.
  • Sentiment Analysis is an NLP task where a model tries to identify if the given text has positive, negative, or neutral sentiment. Sentiment analysis can be used in many real-world scenarios like customer support chatbots and spam detection. Pre-trained models for sentiment analysis are available in or for open-source NLP tools such as NLTK, spaCy, and Stanford CoreNLP, and BERT-based models fine-tuned for sentiment are also available.
  • Machine Translation is an NLP task where a model tries to translate sentences from one language into another. NER models are sometimes used as part of the machine translation pipeline to pre-process the input text, for example to mark names that should be carried over rather than translated, before it is passed to the translation model, which performs the actual sentence translation using Neural Machine Translation (NMT). Combining NER with NMT in this way can give better results than NMT alone. Pre-trained NMT models are available through open-source projects such as OpenNMT, BERT-NMT, etc.
  • Text Summarization is an NLP task where a model tries to condense the input text into a shorter version that preserves the important information. Models for other tasks, such as NER or sentiment analysis, are sometimes used in the pipeline to pre-process the input text before it is sent to a summarization model. Toolkits such as Stanford CoreNLP provide pipelines for annotation steps like NER and sentiment analysis that can feed such a system. Transformer models such as BERT, GPT, etc. can be used for text summarization.
  • Natural Language Generation (NLG) is an NLP task where the model tries to generate natural language sentences from input data or information supplied by developers. Pre-trained models for NLG are used to generate personalized content like emails, social media posts, etc. One does not need to write all the code for generating sentences from data, because pre-trained models can be put to use with less effort and time than custom-built models.
  • Speech Recognition is a task where a model tries to identify what the user is saying. Pre-trained models for speech recognition are used in many online services and APIs, such as Amazon Alexa, Google API, etc. Using pre-trained models through these APIs makes it possible to add speech recognition with high accuracy without training a model yourself.
  • Content Moderation is an NLP task where a model tries to identify content which might be inappropriate (offensive/explicit) or should not be shown on public channels like social media posts, comments, etc. Pre-trained models for content moderation are used in APIs like Clarifai API, Google Cloud NLP API, Microsoft Azure Cognitive Services Text Analytics API, etc.
  • Automated Question Answering Systems (QA): Automated QA systems try to answer user-defined questions automatically by looking at the input text. NER is often one of the key components in such systems because it helps a QA system match the type of answer a question expects (a person, a date, a place) with the relevant entities in the input text.
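
To make the NER example from the list above more concrete, here is a small sketch of how token-level tags are commonly turned into entities. It assumes the widely used BIO tagging scheme (B- starts an entity, I- continues it, O is outside any entity). The tags below are written by hand for illustration rather than produced by a real model, and `decode_bio` is a hypothetical helper name.

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities = []
    current_tokens, current_type = [], None

    def flush():
        # Store the entity collected so far (if any) and reset the buffer.
        nonlocal current_tokens, current_type
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or a stray I- tag that does not continue the current entity.
            flush()
    flush()
    return entities


# Hand-written tags for the example sentence (an assumption, not model output).
tokens = ["Chris", "Cairns", "was", "born", "on", "August", "14th", ",", "1980"]
tags = ["B-PER", "I-PER", "O", "O", "O", "B-DATE", "I-DATE", "O", "B-DATE"]

print(decode_bio(tokens, tags))
# [('Chris Cairns', 'PER'), ('August 14th', 'DATE'), ('1980', 'DATE')]
```

A pre-trained NER model would supply the `tags` list; the grouping step stays the same whichever model produces it.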

What are the different services/libraries which provide NLP pre-trained models?

There are several open-source libraries and cloud services that provide pre-trained models for NLP, each tailored to a certain type of NLP task. Some of the most popular ones are listed below:

  • Google BERT: BERT stands for Bidirectional Encoder Representations from Transformers. It is a transformer-based machine learning model that set the state of the art on many NLP tasks when it was released. Jacob Devlin and his colleagues developed BERT at Google in 2018, and Google released the code and pre-trained models as open source in November 2018, making it easier for developers and data scientists to build on it. BERT is pre-trained on unlabeled text with masked language modeling and next sentence prediction, and can then be fine-tuned for NLP tasks like NER, question answering, part-of-speech tagging etc.
  • CodeBERT: Researchers at Microsoft have published their pre-trained model, CodeBERT, on GitHub. CodeBERT is a bimodal pre-trained model for programming languages (PL) and natural language (NL). CodeBERT learns general-purpose representations that support downstream NL-PL applications such as natural language code search, code documentation generation, etc. CodeBERT is developed with a transformer-based neural architecture, and trained with a hybrid objective function that incorporates the pre-training task of replaced token detection, which is to detect plausible alternatives sampled from generators.
  • Hugging Face Transformers: The library provides a pipeline API that wraps pre-trained models for different NLP tasks behind a common interface. Check out the supported models on the GitHub page, including BERT, RoBERTa, GPT-2, XLNet, BlenderBot etc.
  • OpenNMT: OpenNMT is an open-source ecosystem for neural machine translation and neural sequence learning. Started in December 2016 by the Harvard NLP group and SYSTRAN, the project has since been used in several research projects and industrial applications. It is currently maintained by SYSTRAN and Ubiqus.
  • Facebook RoBERTa: Researchers at Facebook have published their pre-trained model, RoBERTa, on GitHub. RoBERTa improves upon Bidirectional Encoder Representations from Transformers, or BERT, the self-supervised method released by Google in 2018. RoBERTa builds on BERT’s language masking strategy, wherein the system learns to predict intentionally hidden sections of text within otherwise unannotated language examples.
  • ELMo: ELMo stands for “Embeddings from Language Models”. This pre-trained model was developed by researchers at the Allen Institute for AI. ELMo is a deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). These word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. They can be easily added to existing models.
  • GPT-3: GPT-3 is an autoregressive language model that uses deep learning to produce human-like text. It is the third-generation language prediction model in the GPT-n series (and the successor to GPT-2) created by OpenAI, a San Francisco-based artificial intelligence research laboratory.
  • XLNet: XLNet is a new unsupervised language representation learning method based on a novel generalized permutation language modeling objective. Additionally, XLNet employs Transformer-XL as the backbone model, exhibiting excellent performance for language tasks involving long context. 
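  • ULMFiT: ULMFiT (Universal Language Model Fine-tuning) enables robust inductive transfer learning for any NLP task, akin to fine-tuning ImageNet models. The same 3-layer LSTM architecture, with the same hyperparameters and no additions other than tuned dropout hyperparameters, outperforms highly engineered models and transfer learning approaches on six widely studied text classification tasks.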
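
Several of the models above differ mainly in what they are asked to predict during pre-training. The sketch below contrasts two of those objectives on a toy sentence: masked language modeling (the strategy BERT and RoBERTa build on) and autoregressive, left-to-right prediction (as in GPT-3). It only builds the training examples and does not train anything. The mask positions are picked by hand here, whereas real pre-training samples them at random, and the function names are made up for this illustration.

```python
MASK = "[MASK]"


def masked_lm_example(tokens, mask_positions):
    """Hide the tokens at mask_positions; the model must predict them
    using the visible context on both sides."""
    inputs = [MASK if i in mask_positions else tok for i, tok in enumerate(tokens)]
    targets = {i: tokens[i] for i in sorted(mask_positions)}
    return inputs, targets


def autoregressive_examples(tokens):
    """Return (left_context, next_token) pairs: each token is predicted
    only from the tokens that precede it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


sentence = ["the", "cat", "sat", "on", "the", "mat"]

inputs, targets = masked_lm_example(sentence, {1, 4})
print(inputs)   # ['the', '[MASK]', 'sat', 'on', '[MASK]', 'mat']
print(targets)  # {1: 'cat', 4: 'the'}

for context, target in autoregressive_examples(sentence)[:2]:
    print(context, "->", target)
# ['the'] -> cat
# ['the', 'cat'] -> sat
```

The masked setup lets the model look at words on both sides of a gap, which suits understanding tasks, while the left-to-right setup matches how text is generated one token at a time.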

NLP pre-trained models are useful for tasks like translating text, predicting missing parts of a sentence or even generating new sentences. They can be used in many NLP applications such as chatbots, NLP APIs, etc. There are many types of pre-trained models that you could use to get started with NER, text summarization, NMT (Neural Machine Translation) or NLG (Natural Language Generation), depending on your project needs. These include CodeBERT, OpenNMT, RoBERTa, GPT-3 etc. Greater details regarding each type of pre-trained model and library will be posted in the near future.

Ajitesh Kumar

