Category Archives: NLP

Generative AI: Scaling Techniques for LLM Models

In the rapidly evolving world of artificial intelligence, large language models (LLMs) have emerged as a game-changing force, changing how we interact with technology across many industries. These models can perform a wide range of tasks, from text generation and translation to question answering and summarization. However, unlocking their full potential requires a deep understanding of how to scale them effectively so that performance and capabilities keep improving. In this blog post, we will delve into scaling techniques for LLMs and explore why mastering them is essential for anyone working in the AI domain. As the complexity and size of …

Posted in AI, Deep Learning, Generative AI, Machine Learning, NLP.

Sequence to Sequence Models: Types, Examples

Sequence-to-sequence (Seq2Seq) modeling is a powerful machine learning technique that has revolutionized the way we do natural language processing (NLP). It allows us to process input sequences of varying lengths and produce output sequences of varying lengths, making it particularly useful for tasks such as language translation, speech recognition, and chatbot development. Sequence-to-sequence modeling also provides a great foundation for creating text summarizers, question answering systems, sentiment analysis systems, and more. With its wide range of applications, learning sequence-to-sequence modeling concepts is essential for anyone who wants to work in the field of natural language processing. This blog post will discuss types of …

Posted in Data Science, Machine Learning, NLP.

Sentiment Analysis & Machine Learning Techniques

Artificial intelligence (AI) and machine learning (ML) techniques are becoming increasingly popular. Many people use machine learning to analyze the sentiment of tweets, for example, to make predictions related to different business areas. In this blog post, you will learn about different machine learning, deep learning, and NLP techniques that can be used for sentiment analysis. What is sentiment analysis? Sentiment analysis is about predicting the sentiment of a piece of text and then using this information to understand the opinions of users (such as customers). The principal objective of sentiment analysis is to classify the polarity of textual data as positive, negative, or neutral. Whether …

Posted in AI, Deep Learning, Machine Learning, NLP.

spaCy Tokenization Python Example

In this post, you will quickly learn how to use spaCy to read and tokenize a document loaded from a text file or another source. For a data scientist starting out with NLP, this is one of the first pieces of code you will write to read text using spaCy. First and foremost, make sure you have spaCy set up and the English tokenizer loaded. The following commands help you set up in a Jupyter notebook. Reading text using spaCy: Once you are set up with spaCy and have loaded the English tokenizer, the following code can be used to read the text from the text file and tokenize it into words. Pay attention …
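
As a rough illustration of the tokenization step described in this excerpt, here is a minimal sketch. It assumes spaCy is installed and uses spacy.blank("en"), which builds a tokenizer-only English pipeline, so no trained model is required; the full post may load a different pipeline. The helper names tokenize_text and tokenize_file are hypothetical and exist only for this example.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough here.
# The original post may load a trained model instead.
nlp = spacy.blank("en")


def tokenize_text(text):
    """Return the token strings spaCy produces for the given text."""
    return [token.text for token in nlp(text)]


def tokenize_file(path, encoding="utf-8"):
    """Read a text file and tokenize its full contents into words."""
    with open(path, encoding=encoding) as handle:
        return tokenize_text(handle.read())


tokens = tokenize_text("Hello, world!")
print(tokens)
```

Note that punctuation marks come back as separate tokens, which is usually what you want before counting or filtering words.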

Posted in Data Science, NLP.

NLTK – How to Read & Process Text File

In this post, you will learn how to read one or more text files using NLTK and process the words they contain. For data scientists starting to work on NLP, the Python code sample for reading multiple text files from local storage will be very helpful. Python Code Sample for Reading Text File using NLTK Here is the Python code sample for reading one or more text files. Pay attention to some of the following aspects: The class nltk.corpus.PlaintextCorpusReader is used for reading the text files. The constructor takes input parameters such as the corpus root and a regular expression representing the files. The list of files that are read can be found using the fileids method. List …
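
The points above can be sketched as follows. This is a minimal example, assuming NLTK is installed; the temporary directory, the file names a.txt and b.txt, and the helper load_corpus are hypothetical and exist only to keep the sketch self-contained. It relies on the reader's default word tokenizer, which needs no extra NLTK data downloads.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader


def load_corpus(corpus_root, file_pattern=r".*\.txt"):
    """Create a reader over every file under corpus_root matching the regex."""
    return PlaintextCorpusReader(corpus_root, file_pattern)


# Build a tiny throwaway corpus so the example runs anywhere (hypothetical data).
corpus_root = tempfile.mkdtemp()
samples = {
    "a.txt": "NLTK reads plain text.",
    "b.txt": "It splits text into words.",
}
for name, text in samples.items():
    with open(os.path.join(corpus_root, name), "w", encoding="utf-8") as handle:
        handle.write(text)

reader = load_corpus(corpus_root)
file_ids = reader.fileids()                # names of the files that were matched
words_in_a = list(reader.words("a.txt"))   # words from a single file
all_words = list(reader.words())           # words from every matched file

print(file_ids)
print(words_in_a)
```

To read your own files, point corpus_root at the directory that holds them and adjust the regular expression to match the file names you need.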

Posted in AI, NLP.

Python – Extract Text from HTML using BeautifulSoup

In this post, you will learn how to use Python BeautifulSoup and NLTK to extract words from HTML pages and perform text analysis such as frequency distribution. The example in this post is based on reading HTML pages directly from the website and performing text analysis. However, you could also download the web pages and then perform text analysis by loading pages from local storage. Python Code for Extracting Text from HTML Pages Here is the Python code for extracting text from HTML pages and performing text analysis. Pay attention to some of the following in the code given below: urllib.request is used to read the HTML page …
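
A minimal sketch of the extraction and frequency-count steps, assuming the beautifulsoup4 package is installed. To stay runnable offline it parses a hypothetical inline HTML string instead of fetching a page; with a real URL, the HTML could come from urllib.request.urlopen(url).read(). It also uses collections.Counter as a stand-in for the NLTK frequency distribution the post mentions, and the helper name extract_words is made up for this example.

```python
import re
from collections import Counter

from bs4 import BeautifulSoup


def extract_words(html):
    """Strip markup, scripts, and styles, then return lowercase words."""
    soup = BeautifulSoup(html, "html.parser")
    # Script and style bodies are not visible page text, so drop them first.
    for tag in soup(["script", "style"]):
        tag.decompose()
    text = soup.get_text(separator=" ")
    return re.findall(r"[a-z]+", text.lower())


# Hypothetical page content used only for this illustration.
html = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Text analysis</h1><p>Text from HTML pages.</p>"
    "<script>var x = 1;</script></body></html>"
)

words = extract_words(html)
freq = Counter(words)
print(freq.most_common(3))
```

Removing script and style tags before calling get_text keeps JavaScript and CSS tokens out of the word counts.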

Posted in AI, Data Science, NLP, Python.

Python – Extract Text from PDF file using PDFMiner

In this post, you will get a quick code sample showing how to use PDFMiner, a Python library, to extract text from PDF files and perform text analysis. I will be publishing several other posts on how to use other Python libraries for extracting text from PDF files. In this post, the following topics will be covered: How to set up PDFMiner Python code for extracting text from PDF file using PDFMiner Setting up PDFMiner Here is how you would set up pdfminer.six. You could execute the following command to set up PDFMiner while working in a Jupyter notebook: Python Code for Extracting Text from PDF file …

Posted in AI, NLP, Python.

NLTK Hello World Python Example

In this post, you will learn about getting started with natural language processing (NLP) using NLTK (Natural Language Toolkit), a platform for working with human languages in Python. The post is titled hello world because it helps you get started with NLTK while also learning some important aspects of processing language. In this post, the following will be covered: Install / Set up NLTK Common NLTK commands for language processing operations Install / Set up NLTK This is what you need to do to set up NLTK. Make sure you have Python set up, as NLTK requires Python version 3.5, 3.6, 3.7, or 3.8. In a Jupyter notebook, you could execute …

Posted in AI, NLP.

N-Gram Language Models Explained with Examples

Language models are models that assign a probability to a sentence or a sequence of words, or to an upcoming word given the previous words. Language models are used in fields such as speech recognition, spelling correction, and machine translation. Language models are primarily of two kinds: N-Gram language models Grammar-based language models such as probabilistic context-free grammars (PCFGs) In this post, you will learn about some of the following: Introduction to Language Models N-Gram language models Introduction to Language Models Language models, as mentioned above, are used to determine the probability of occurrence of a sentence or a sequence of words. Language models are created based on following …
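
To make the idea of assigning probabilities to word sequences concrete, here is a minimal bigram sketch using only the Python standard library. The three-sentence corpus, the whitespace tokenization, the `<s>` and `</s>` boundary markers, and the function names are all assumptions chosen for illustration; the full post may use different data and notation. Probabilities are plain maximum-likelihood estimates, count(previous word, word) divided by count(previous word).

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count unigrams and bigrams, padding each sentence with boundary markers."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigram_counts.update(tokens)
        bigram_counts.update(zip(tokens, tokens[1:]))
    return unigram_counts, bigram_counts


def bigram_probability(prev_word, word, unigram_counts, bigram_counts):
    """Maximum-likelihood estimate of P(word | prev_word)."""
    if unigram_counts[prev_word] == 0:
        return 0.0
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]


def sentence_probability(sentence, unigram_counts, bigram_counts):
    """Multiply the bigram probabilities along the sentence."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for prev_word, word in zip(tokens, tokens[1:]):
        probability *= bigram_probability(
            prev_word, word, unigram_counts, bigram_counts
        )
    return probability


# Hypothetical toy corpus.
corpus = ["I like NLP", "I like Python", "You like NLP"]
unigram_counts, bigram_counts = train_bigram_model(corpus)

p_seen = sentence_probability("I like NLP", unigram_counts, bigram_counts)
p_unseen = sentence_probability("Python like NLP", unigram_counts, bigram_counts)
print(p_seen, p_unseen)
```

A sentence that contains a bigram never seen in training gets probability zero under this estimate, which is the problem smoothing techniques are designed to address.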

Posted in AI, NLP.

Quick Introduction to Smoothing Techniques for Language Models

Smoothing techniques in NLP are used when estimating the probability / likelihood of a sequence of words (say, a sentence) occurring together when one or more of its individual words (unigrams) or N-grams, such as a bigram ([latex]w_{i} \mid w_{i-1}[/latex]) or a trigram ([latex]w_{i} \mid w_{i-2}w_{i-1}[/latex]), have never occurred in the given data. In this post, you will go through a quick introduction to various smoothing techniques used in NLP, along with related formulas and examples. The following is the list of some of the smoothing techniques: Laplace smoothing: Another name for the Laplace smoothing technique is add-one smoothing. Additive smoothing Good-Turing smoothing Kneser-Ney smoothing Katz smoothing Church and Gale Smoothing …
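
As a small illustration of the first two techniques in the list, the sketch below applies additive smoothing to bigram estimates: P(word | previous word) = (count(previous word, word) + k) / (count(previous word) + k * V), where V is the vocabulary size. With k = 1 this is Laplace (add-one) smoothing. The toy corpus, the boundary markers, the choice to count them in V, and the function names are assumptions made for this example only.

```python
from collections import Counter


def count_ngrams(sentences):
    """Count unigrams and bigrams over sentences padded with boundary markers."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigram_counts.update(tokens)
        bigram_counts.update(zip(tokens, tokens[1:]))
    return unigram_counts, bigram_counts


def smoothed_bigram_probability(prev_word, word, unigram_counts, bigram_counts, k=1.0):
    """Additive smoothing; k=1.0 gives Laplace (add-one) smoothing."""
    vocab_size = len(unigram_counts)  # assumption: boundary markers count as vocabulary
    numerator = bigram_counts[(prev_word, word)] + k
    denominator = unigram_counts[prev_word] + k * vocab_size
    return numerator / denominator


# Hypothetical toy corpus.
corpus = ["I like NLP", "I like Python", "You like NLP"]
unigram_counts, bigram_counts = count_ngrams(corpus)

# "you python" never occurs in the corpus, yet it gets a non-zero probability.
p_unseen = smoothed_bigram_probability("you", "python", unigram_counts, bigram_counts)
# "like nlp" occurs twice; smoothing shifts some of its mass to unseen bigrams.
p_seen = smoothed_bigram_probability("like", "nlp", unigram_counts, bigram_counts)
print(p_unseen, p_seen)
```

Smaller values of k move less probability mass to unseen bigrams, which is the usual reason to prefer general additive smoothing over add-one when the vocabulary is large.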

Posted in AI, NLP.