In this post, you will learn how to get started with natural language processing (NLP) using NLTK (Natural Language Toolkit), a platform for working with human language data in Python. The post is titled "hello world" because it helps you get started with NLTK while introducing some important aspects of language processing. It covers setting up NLTK, downloading and loading the NLTK book collection, and a few common NLTK operations such as tokenization, common contexts, frequency distributions, and filtering words.
This is what you need to do to set up NLTK.
#
# Install NLTK with pip (run this in a terminal, not in Python)
#
pip install nltk
#
# Import NLTK (in Python)
#
import nltk
You can start practicing NLTK commands by downloading the book collection, which comprises several texts. Here is what you need to execute:
#
# NLTK Book Download
#
nltk.download()
Executing the above command opens a downloader utility where you can select the book collection and download it.
Select the book collection and click Download. Once the download is complete, you can execute the following command to load the books.
#
# Load the books
#
from nltk.book import *
Executing the above command loads the texts and prints their names, for example text7 (Wall Street Journal).
Here are some common NLTK commands and what they are used for, starting with tokenization:
import nltk
#
# Sentence
#
intro = 'My name is Ajitesh Shukla. I work in HighRadius. I live in Hyderabad.'
#
# Tokenize using word_tokenize method
#
tokens = nltk.word_tokenize(intro)
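#
# Print the list of tokens (words and punctuation marks)
#
print(tokens)
#
# Print the unique tokens (a set has no guaranteed order)
#
print(set(tokens))
Here is what the output would look like: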
We will try to understand the common_contexts and similar methods using one of the texts (text7, the Wall Street Journal) loaded from nltk.book. For the words finance and improve, the common_contexts output is to_the and to_their. This means that both words occurred between "to" and "the", and between "to" and "their". If the output of common_contexts had been empty, the output of the similar method would also have been empty.
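To make the idea of a shared context concrete, here is a minimal pure-Python sketch. It is not NLTK's implementation; the helper names contexts_by_word and shared_contexts and the toy sentence are assumptions made up for illustration, and the sentence is not taken from text7.

```python
from collections import defaultdict


def contexts_by_word(tokens):
    """Map each lowercased word to the set of (left, right) neighbours seen around it."""
    contexts = defaultdict(set)
    lowered = [token.lower() for token in tokens]
    # The first and last tokens have no two-sided context, so they are skipped
    for i in range(1, len(lowered) - 1):
        contexts[lowered[i]].add((lowered[i - 1], lowered[i + 1]))
    return contexts


def shared_contexts(tokens, word_a, word_b):
    """Return the contexts that occur around both words, written as left_right."""
    contexts = contexts_by_word(tokens)
    common = contexts[word_a.lower()] & contexts[word_b.lower()]
    return sorted(f"{left}_{right}" for left, right in common)


# Hypothetical toy sentence used only for this sketch
sample = (
    "They plan to finance the project and to improve the service "
    "and to finance their homes and to improve their skills"
).split()

print(shared_contexts(sample, "finance", "improve"))  # ['to_the', 'to_their']
print(shared_contexts(sample, "finance", "project"))  # []
```

In this toy sentence, finance and improve both appear between "to" and "the" and between "to" and "their", so those two contexts are shared. The words finance and project share no context, which gives an empty result.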
import nltk
#
# Sentence
#
intro = 'My name is Ajitesh Shukla. I write blogs on Vitalflux.com. I live in Hyderabad. I love writing blogs. I also have good expertise in cloud computing. I am also good in AWS.'
#
# Tokenize using word_tokenize method
#
tokens = nltk.word_tokenize(intro)
#
# Create an instance of FreqDist
#
freqdist = nltk.FreqDist(tokens)
#
# Draw the frequency distribution of tokens
#
freqdist.plot()
Here is what the output plot would look like:
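The counts behind such a plot can also be inspected as plain numbers. The sketch below uses collections.Counter from the standard library, which NLTK's FreqDist extends, together with a simplified regular-expression tokenizer. The tokenizer is an assumption made for this sketch and may split some text differently from word_tokenize.

```python
import re
from collections import Counter

intro = ('My name is Ajitesh Shukla. I write blogs on Vitalflux.com. '
         'I live in Hyderabad. I love writing blogs. '
         'I also have good expertise in cloud computing. I am also good in AWS.')

# Simplified tokenizer (assumption, not NLTK's word_tokenize):
# words, optionally with inner dots such as "Vitalflux.com", or single punctuation marks
tokens = re.findall(r"\w+(?:\.\w+)*|[^\w\s]", intro)

# Count how often each token occurs
counts = Counter(tokens)

# Show the three most frequent tokens with their counts
for token, count in counts.most_common(3):
    print(token, count)
```

With this tokenizer, the full stop is the most frequent token (6 occurrences), followed by "I" (5) and "in" (3). Since FreqDist is a Counter subclass, freqdist.most_common(3) should work the same way on the NLTK tokens.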
import nltk
#
# Sentence
#
intro = 'My name is Ajitesh Shukla. I write blogs on Vitalflux.com. I live in Hyderabad. I love writing blogs. I also have good expertise in cloud computing. I am also good in AWS.'
#
# Tokenize using word_tokenize method
#
tokens = nltk.word_tokenize(intro)
#
# Keep only the tokens that are longer than 5 characters
#
long_words = [word for word in tokens if len(word) > 5]
#
# Print the filtered words
#
print(long_words)
This prints the tokens that are longer than 5 characters.
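The same filter can be wrapped in a small reusable function. This is a sketch under stated assumptions: filter_long_words is a hypothetical helper name, and the regular-expression tokenizer stands in for word_tokenize so that the example runs without NLTK data.

```python
import re


def filter_long_words(tokens, min_length=5):
    """Return the tokens that are strictly longer than min_length characters."""
    return [word for word in tokens if len(word) > min_length]


intro = ('My name is Ajitesh Shukla. I write blogs on Vitalflux.com. '
         'I live in Hyderabad. I love writing blogs. '
         'I also have good expertise in cloud computing. I am also good in AWS.')

# Simplified tokenizer (assumption, not NLTK's word_tokenize)
tokens = re.findall(r"\w+(?:\.\w+)*|[^\w\s]", intro)

long_words = filter_long_words(tokens)
print(long_words)

# Deduplicate and sort when only the distinct long words matter
print(sorted(set(long_words)))
```

With this tokenizer, words of exactly 5 characters such as "write", "blogs" and "cloud" are dropped because the condition is strictly greater than 5.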
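In summary, this post covered setting up NLTK, downloading and loading the book collection, and some common methods: word_tokenize, common_contexts and similar, FreqDist, and filtering tokens with a list comprehension.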