Let's load a set of messages, along with their classification labels, using the following command.
messages <- read.table(file.choose(), sep="\t", stringsAsFactors=FALSE)
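One thing to watch for: when the file has no header row, read.table names the columns V1 and V2, so a column called text will not exist. Below is a minimal sketch, assuming a tab-separated file without a header whose first column is the label and whose second column is the message. The sample rows, the temporary file, and the column names type and text are made up for illustration.

```r
# Hypothetical sample data written to a temporary file (illustration only)
sample_file <- tempfile(fileext = ".txt")
writeLines(c("ham\tSee you at 5, don't be late",
             "spam\tWIN a FREE prize! Call 08001234 now"),
           sample_file)

# quote = "" stops apostrophes in the text from being read as quote marks;
# comment.char = "" stops a '#' inside a message from truncating the line
messages <- read.table(sample_file, sep = "\t", header = FALSE,
                       col.names = c("type", "text"),
                       quote = "", comment.char = "",
                       stringsAsFactors = FALSE)

# Show the column names and types that were assigned
str(messages)
```

With real data, replace sample_file with file.choose() or a path to your own file.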
The messages data frame could have two features, such as type and text, where each piece of text is associated with an appropriate type. Once that is done, let's create a Corpus object out of all the message text with the following command. If you are new to the Corpus object, note that it comes as part of the well-known R text mining package “tm”.
library(tm)
corpus <- Corpus(VectorSource(messages$text))
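Before cleaning, it can help to confirm what the Corpus actually holds. The following is a small sketch, assuming the tm package is installed; the two messages are made up and stand in for messages$text.

```r
library(tm)

# Made-up messages standing in for messages$text (illustration only)
texts <- c("See you at 5", "WIN a FREE prize now")
corpus <- Corpus(VectorSource(texts))

print(corpus)               # short summary of the corpus
length(corpus)              # number of documents in the corpus
as.character(corpus[[1]])   # raw text of the first document
```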
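Once the Corpus object is created, it is time to clean the text.
The following is cleaned as part of the text cleaning activity: uppercase letters (converted to lowercase), numbers, stop words, punctuation, and extra whitespace.
The following set of commands achieves the above objectives: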
# Change all the words to lowercase
corpus_clean <- tm_map(corpus, content_transformer(tolower))
# Remove all the numbers
corpus_clean <- tm_map(corpus_clean, removeNumbers)
# Remove stop words such as "to", "and", "or"
corpus_clean <- tm_map(corpus_clean, removeWords, stopwords())
# Remove punctuation
corpus_clean <- tm_map(corpus_clean, removePunctuation)
# Collapse runs of whitespace into a single space
corpus_clean <- tm_map(corpus_clean, stripWhitespace)
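To see the effect of these steps, the same commands can be run on a couple of sample messages and the result pulled back out as plain text. This is only a sketch, assuming the tm package is installed; the sample messages and the variable name cleaned are made up for illustration.

```r
library(tm)

# Made-up sample messages (illustration only)
texts <- c("WIN a FREE prize!!! Call 08001234 now",
           "See you at 5,   and bring the notes")
corpus <- Corpus(VectorSource(texts))

# Same cleaning steps as above, in the same order
corpus_clean <- tm_map(corpus, content_transformer(tolower))
corpus_clean <- tm_map(corpus_clean, removeNumbers)
corpus_clean <- tm_map(corpus_clean, removeWords, stopwords())
corpus_clean <- tm_map(corpus_clean, removePunctuation)
corpus_clean <- tm_map(corpus_clean, stripWhitespace)

# Extract the cleaned text of each document as a character vector
cleaned <- vapply(seq_along(corpus_clean),
                  function(i) as.character(corpus_clean[[i]]),
                  character(1))
print(cleaned)
```

Removed words leave gaps behind, which is why stripWhitespace runs last. A leading or trailing space can still remain, and trimws(cleaned) would take care of that if needed.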