Category Archives: NLP

NLP Pre-trained Models: Concepts, Examples

NLP pretrained models

NLP (Natural Language Processing) is a branch of AI whose goal is to make machines capable of understanding and producing human language. NLP has been around for decades, but it has recently seen an explosion in popularity thanks to pre-trained models (PTMs), which NLP developers can put to work with minimal effort and time. This blog post will introduce you to different types of pre-trained machine learning models for NLP and discuss their usage in real-world examples. Before we get into looking at different types of pre-trained models for NLP, let’s understand the related concepts. What are pre-trained models for NLP …
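
To give a flavor of how little code a pre-trained model needs, here is a minimal sketch (not taken from the post) that loads a pre-trained checkpoint with the Hugging Face Transformers library. The checkpoint name is only an illustrative choice, and the snippet assumes transformers and PyTorch are installed.

```python
# Minimal sketch: loading a pre-trained NLP model with Hugging Face Transformers.
# Assumes `pip install transformers torch`; the checkpoint name is illustrative.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer("Pre-trained models save a lot of training effort.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)  # raw class scores from the pre-trained classifier
```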

Continue reading

Posted in Deep Learning, NLP.

Demystifying Encoder Decoder Architecture & Neural Network

encoder decoder architecture

In the field of AI / machine learning, the encoder-decoder architecture is a widely used framework for building neural networks that perform natural language processing (NLP) tasks such as language translation and other tasks that require sequence-to-sequence modeling. This architecture involves a two-stage process: the input data is first encoded into a fixed-length numerical representation, which is then decoded to produce an output in the desired format. As a data scientist, understanding the encoder-decoder architecture and its underlying neural network principles is crucial for building sophisticated models that can handle complex data sets. By leveraging the encoder-decoder neural network architecture, data scientists can design neural networks that can learn …
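
As an illustration of that two-stage idea (not the post's exact code), the following PyTorch sketch shows an encoder that compresses a variable-length input into a fixed-length hidden state and a decoder that expands that state into an output sequence. The vocabulary sizes and dimensions are arbitrary placeholders.

```python
# Illustrative encoder-decoder sketch in PyTorch.
# The encoder compresses a variable-length input sequence into a fixed-length
# hidden state; the decoder unrolls that state into the output sequence.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):                        # src: (batch, src_len)
        _, hidden = self.gru(self.embedding(src))
        return hidden                              # fixed-length representation

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tgt, hidden):                # tgt: (batch, tgt_len)
        output, hidden = self.gru(self.embedding(tgt), hidden)
        return self.out(output), hidden            # logits over target vocabulary

# Toy usage with made-up vocabulary sizes and random token ids
encoder, decoder = Encoder(vocab_size=1000), Decoder(vocab_size=1200)
src = torch.randint(0, 1000, (2, 7))               # batch of 2 source sequences
tgt = torch.randint(0, 1200, (2, 5))               # batch of 2 target sequences
logits, _ = decoder(tgt, encoder(src))
print(logits.shape)                                # torch.Size([2, 5, 1200])
```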

Continue reading

Posted in Deep Learning, Generative AI, Machine Learning, NLP.

NLP: Huggingface Transformers Code Examples

Do you want to build cutting-edge NLP models? Have you heard of Huggingface Transformers? Huggingface Transformers is a popular open-source library for NLP, which provides pre-trained machine learning models and tools to build custom NLP models. These models are based on the Transformer architecture, which has revolutionized the field of NLP by enabling state-of-the-art performance on a range of NLP tasks. In this blog post, I will provide Python code examples for using Huggingface Transformers for various NLP tasks such as text classification (sentiment analysis), named entity recognition, question answering, text summarization, and text generation. I used Google Colab for testing my code. Before getting started, get set up with transformers …
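
For a taste of the kind of examples the post walks through, here is a hedged snippet using the transformers pipeline API for sentiment analysis and named entity recognition with the library's default checkpoints; the exact models used in the post may differ.

```python
# Hedged example of Transformers pipelines (default checkpoints, illustrative input).
from transformers import pipeline

# Text classification (sentiment analysis)
sentiment = pipeline("sentiment-analysis")
print(sentiment("Huggingface Transformers makes NLP prototyping easy."))

# Named entity recognition, grouping word pieces into whole entities
ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Hugging Face is based in New York City."))
```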

Continue reading

Posted in Generative AI, Machine Learning, NLP.

Generative AI: Scaling Techniques for LLM Models

Scaling techniques for foundational LLMs

In the rapidly evolving world of artificial intelligence, large language models (LLMs) have emerged as a game-changing force, revolutionizing the way we interact with technology and transforming countless industries. These powerful models can perform a vast array of tasks, from text generation and translation to question-answering and summarization. However, unlocking the full potential of these LLMs requires a deep understanding of how to scale them effectively, ensuring optimal performance and capabilities. In this blog post, we will delve into the crucial concept of scaling techniques for LLM models and explore why mastering this aspect is essential for anyone working in the AI domain. As the complexity and size of …

Continue reading

Posted in AI, Deep Learning, Generative AI, Machine Learning, NLP.

Sequence to Sequence Models: Types, Examples

sequence-to-sequence model

Sequence to sequence (Seq2Seq) modeling is a powerful machine learning technique that has revolutionized the way we do natural language processing (NLP). It allows us to process input sequences of varying lengths and produce output sequences of varying lengths, making it particularly useful for tasks such as language translation, speech recognition, and chatbot development.  Sequence to sequence modeling also provides a great foundation for creating text summarizers, question answering systems, sentiment analysis systems, and more. With its wide range of applications, learning about sequence to sequence modeling concepts is essential for anyone who wants to work in the field of natural language processing. This blog post will discuss types of …
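
As a quick illustration (not from the post itself), a pre-trained sequence-to-sequence model can be used for translation through the Hugging Face pipeline API; the t5-small checkpoint below is just an assumed example.

```python
# Illustrative seq2seq usage: English-to-French translation with a pre-trained model.
# Assumes `pip install transformers sentencepiece torch`; checkpoint choice is an assumption.
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="t5-small")
result = translator("Sequence to sequence models map one sequence to another.")
print(result[0]["translation_text"])
```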

Continue reading

Posted in Data Science, Machine Learning, NLP.

Natural Language Processing (NLP) Task Examples

natural language processing tasks examples

Have you ever wondered how your phone’s voice assistant understands your commands and responds appropriately? Or how search engines are able to provide relevant results for your queries? The answer lies in Natural Language Processing (NLP), a subfield of artificial intelligence (AI) that focuses on enabling machines to understand and process human language.  NLP is becoming increasingly important in today’s world as more and more businesses are adopting AI-powered solutions to improve customer experiences, automate manual tasks, and gain insights from large volumes of textual data. With recent advancements in AI technology, it is now possible to use pre-trained language models such as ChatGPT to perform various NLP tasks with …

Continue reading

Posted in Data Science, NLP.

Sentiment Analysis & Machine Learning Techniques

sentiment analysis machine learning

Artificial intelligence (AI) / machine learning (ML) techniques are getting more and more popular. Many people use machine learning to analyze the sentiment of tweets, for example, to make predictions related to different business areas. In this blog post, you will learn about different machine learning / deep learning and NLP techniques which can be used for sentiment analysis. What is sentiment analysis? Sentiment analysis is about predicting the sentiment of a piece of text and then using this information to understand users’ (such as customers’) opinions. The principal objective of sentiment analysis is to classify the polarity of textual data, whether it is positive, negative, or neutral. Whether …
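
For instance, a simple polarity classifier can be sketched with scikit-learn using TF-IDF features and logistic regression; the tiny labeled data set below is made up purely for illustration and is not from the post.

```python
# Minimal assumed sketch of polarity classification with scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data (illustrative only)
texts = ["I love this product", "Terrible service, very disappointed",
         "Absolutely fantastic experience", "Worst purchase I ever made"]
labels = ["positive", "negative", "positive", "negative"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["The support team was wonderful"]))
```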

Continue reading

Posted in AI, Deep Learning, Machine Learning, NLP.

Spacy Tokenization Python Example

Spacy Tokenizer Python Example

In this post, you will quickly learn how to use spaCy for reading and tokenizing a document read from a text file or other source. As a data scientist starting on NLP, this is one of the first pieces of code you will write to read text using spaCy. First and foremost, make sure you have set up spaCy and loaded the English tokenizer. The following commands help you get set up in a Jupyter notebook. Reading text using spaCy: Once you have set up spaCy and loaded the English tokenizer, the following code can be used to read the text from a text file and tokenize it into words. Pay attention …
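
A minimal sketch of that flow might look like the following, assuming spaCy and the en_core_web_sm pipeline are installed; the file name sample.txt is a placeholder.

```python
# Sketch of the spaCy setup and tokenization flow described above.
# In Jupyter: !pip install spacy  and  !python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")    # load the English pipeline / tokenizer

with open("sample.txt", "r", encoding="utf-8") as f:   # placeholder file name
    text = f.read()

doc = nlp(text)
tokens = [token.text for token in doc]   # tokenize the text into words
print(tokens[:20])
```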

Continue reading

Posted in Data Science, NLP.

NLTK – How to Read & Process Text File

In this post, you will learn how to read one or more text files using NLTK and process the words contained in them. For data scientists starting to work on NLP, this Python code sample for reading multiple text files from local storage will be very helpful. Python Code Sample for Reading Text File using NLTK Here is the Python code sample for reading one or more text files. Pay attention to some of the following aspects: The class nltk.corpus.PlaintextCorpusReader is used for reading the text files. Its constructor takes input parameters such as the corpus root and a regular expression matching the files. The list of files that were read can be found using the fileids method. List …
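
A hedged sketch of that flow is shown below; the corpus root path and file name pattern are placeholders, not values from the post.

```python
# Assumed sketch of reading text files with nltk.corpus.PlaintextCorpusReader.
from nltk.corpus import PlaintextCorpusReader

corpus_root = "/path/to/text/files"          # placeholder directory holding the .txt files
reader = PlaintextCorpusReader(corpus_root, r".*\.txt")

print(reader.fileids())                      # list of files that were read
words = reader.words()                       # words across all files in the corpus
print(len(words), words[:10])
```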

Continue reading

Posted in AI, NLP.

Python – Extract Text from HTML using BeautifulSoup

Extracting Text from HTML Pages

In this post, you will learn how to use Python BeautifulSoup and NLTK to extract words from HTML pages and perform text analysis such as frequency distribution. The example in this post is based on reading HTML pages directly from the website and performing text analysis. However, you could also download the web pages and then perform text analysis by loading the pages from local storage. Python Code for Extracting Text from HTML Pages Here is the Python code for extracting text from HTML pages and performing text analysis. Pay attention to some of the following in the code given below: urllib request is used to read the HTML page …
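
A condensed sketch of that flow (with a placeholder URL rather than the one used in the post) could look like this:

```python
# Hedged sketch: fetch an HTML page with urllib, extract visible text with
# BeautifulSoup, then build an NLTK frequency distribution.
import urllib.request
from bs4 import BeautifulSoup
import nltk

url = "https://example.com"                       # placeholder page
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
text = soup.get_text()                            # visible text of the page

nltk.download("punkt", quiet=True)                # newer NLTK may also need "punkt_tab"
tokens = [t.lower() for t in nltk.word_tokenize(text) if t.isalpha()]
freq = nltk.FreqDist(tokens)
print(freq.most_common(10))
```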

Continue reading

Posted in AI, Data Science, NLP, Python.

Python – Extract Text from PDF file using PDFMiner

In this post, you will get a quick code sample on how to use PDFMiner, a Python library, to extract text from PDF files and perform text analysis. I will be publishing several other posts on how to use other Python libraries for extracting text from PDF files. In this post, the following topics will be covered: How to set up PDFMiner Python code for extracting text from a PDF file using PDFMiner Setting up PDFMiner Here is how you would set up PDFMiner.six. You could execute the following command to get set up with PDFMiner while working in a Jupyter notebook: Python Code for Extracting Text from PDF file …
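
As a minimal illustration (assuming pdfminer.six is installed), the high-level extract_text helper can pull the text out of a PDF; the file name below is a placeholder.

```python
# Minimal assumed example with pdfminer.six.
# Setup (e.g. in Jupyter): !pip install pdfminer.six
from pdfminer.high_level import extract_text

text = extract_text("document.pdf")   # placeholder file name; extracts all text from the PDF
print(text[:500])                     # preview the first 500 characters
```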

Continue reading

Posted in AI, NLP, Python.

NLTK Hello World Python Example

In this post, you will learn about getting started with natural language processing (NLP) with NLTK (Natural Language Toolkit), a platform for working with human languages using the Python language. The post is titled hello world because it helps you get started with NLTK while also learning some important aspects of processing language. In this post, the following will be covered: Install / Set up NLTK Common NLTK commands for language processing operations Install / Set up NLTK This is what you need to do to set up NLTK. Make sure you have the latest Python version set up, as NLTK requires Python version 3.5, 3.6, 3.7, or 3.8. In a Jupyter notebook, you could execute …
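
A tiny "hello world" session along those lines might look like the following sketch; the sample sentence is made up.

```python
# Sketch of a first NLTK session.
# In Jupyter: !pip install nltk
import nltk

nltk.download("punkt")                      # tokenizer models (newer NLTK may also need "punkt_tab")

text = "Hello world. NLTK makes basic language processing easy."
print(nltk.sent_tokenize(text))             # split into sentences
print(nltk.word_tokenize(text))             # split into words
```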

Continue reading

Posted in AI, NLP.

N-Gram Language Models Explained with Examples

Ngram language model explained with examples

Language models are models which assign probabilities to a sentence or a sequence of words, or the probability of an upcoming word given a previous set of words. Language models are used in fields such as speech recognition, spelling correction, machine translation, etc. Language models are primarily of two kinds: N-Gram language models and grammar-based language models such as probabilistic context-free grammars (PCFGs). In this post, you will learn about some of the following: Introduction to Language Models N-Gram language models Introduction to Language Models Language models, as mentioned above, are used to determine the probability of occurrence of a sentence or a sequence of words. Language models are created based on the following …
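
As a small worked illustration (not from the post), a maximum-likelihood bigram model estimates P(w_i | w_{i-1}) as count(w_{i-1}, w_i) / count(w_{i-1}); the toy corpus below is made up.

```python
# Illustrative maximum-likelihood bigram language model on a made-up toy corpus.
from collections import Counter

corpus = "i like nlp . i like deep learning . nlp is fun .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word):
    # P(word | prev) = count(prev, word) / count(prev)
    return bigrams[(prev, word)] / unigrams[prev]

print(bigram_prob("i", "like"))        # 2/2 = 1.0
print(bigram_prob("like", "nlp"))      # 1/2 = 0.5
```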

Continue reading

Posted in AI, NLP.

Quick Introduction to Smoothing Techniques for Language Models

smoothing techniques NLP

Smoothing techniques in NLP are used to address scenarios where we need to determine the probability / likelihood estimate of a sequence of words (say, a sentence) occurring together, when one or more words individually (unigram) or N-grams such as a bigram ([latex]w_{i}|w_{i-1}[/latex]) or trigram ([latex]w_{i}|w_{i-1}w_{i-2}[/latex]) in the given set have never occurred in the past. In this post, you will go through a quick introduction to the various smoothing techniques used in NLP, along with related formulas and examples. The following is a list of some of the smoothing techniques: Laplace smoothing (also called add-one smoothing) Additive smoothing Good-Turing smoothing Kneser-Ney smoothing Katz smoothing Church and Gale smoothing …
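
As a small illustration of add-one (Laplace) smoothing, the bigram estimate becomes P(w_i | w_{i-1}) = (count(w_{i-1}, w_i) + 1) / (count(w_{i-1}) + V), where V is the vocabulary size, so unseen bigrams get a small non-zero probability; the toy corpus below is made up.

```python
# Assumed illustration of add-one (Laplace) smoothed bigram probabilities.
from collections import Counter

corpus = "i like nlp . i like deep learning .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)                                  # vocabulary size

def laplace_bigram_prob(prev, word):
    # P(word | prev) = (count(prev, word) + 1) / (count(prev) + V)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

print(laplace_bigram_prob("i", "like"))            # seen bigram
print(laplace_bigram_prob("like", "learning"))     # unseen bigram, still non-zero
```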

Continue reading

Posted in AI, NLP.