
Data Science – R Packages & Methods for naive Bayes Classification

This article describes R packages and related functions that can be used to create a naive Bayes classifier. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos.
The following are the key packages described later in this article (R package names are case-sensitive):
  • tm
  • wordcloud
  • e1071
  • gmodels

 

The following R packages can be used for naive Bayes classification:

  • tm Package: Originally created by Ingo Feinerer as a dissertation project at the Vienna University of Economics and Business, the tm package is a very popular package that provides a framework for text mining applications within R. More about the tm package can be found at http://tm.r-forge.r-project.org/. The following are some of the widely used functions in the tm package:
    • Corpus() creates an R object that stores a collection of text documents.
    • Source functions such as VectorSource(), which takes a vector of text as a parameter. A corpus object can be created from a vector of messages or text using a command such as Corpus(VectorSource(messages$text)). Here “messages” is a data frame with a column “text” that holds the messages.
    • tm_map() cleans the corpus object by applying transformations such as removing numbers, stopwords, punctuation, and extra whitespace, and converting text to lower case.
    • inspect() displays the contents of a corpus object created using Corpus().
    • DocumentTermMatrix() takes a corpus object as an argument and returns a sparse matrix. Each column/feature in this matrix represents a word (term) that appeared in the corpus. Each row represents a document. Each cell holds the number of times that term appears in that document.
    • findFreqTerms() returns a character vector of frequent terms. It takes a DocumentTermMatrix object and a lower frequency bound (lowfreq), and returns the terms whose total count across all documents is at least that bound.
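
The tm steps above can be sketched roughly as follows. This is a minimal illustration, not the article's original code: the toy `messages` data frame is hypothetical, and VCorpus() is used in place of Corpus() on the assumption that the in-memory corpus type works most predictably with tm_map() in recent tm versions.

```r
library(tm)

# Hypothetical toy data standing in for the article's `messages` data frame
messages <- data.frame(
  type = factor(c("spam", "ham", "ham", "spam")),
  text = c("WIN a FREE prize!!! Call 0800 today",
           "Meeting for lunch today?",
           "Lunch at noon works for me",
           "FREE entry: claim your prize today"),
  stringsAsFactors = FALSE
)

# Build the corpus from the character vector of messages
corpus <- VCorpus(VectorSource(messages$text))

# Clean the corpus; content_transformer() wraps base functions such as tolower()
corpus_clean <- tm_map(corpus, content_transformer(tolower))
corpus_clean <- tm_map(corpus_clean, removeNumbers)
corpus_clean <- tm_map(corpus_clean, removeWords, stopwords("english"))
corpus_clean <- tm_map(corpus_clean, removePunctuation)
corpus_clean <- tm_map(corpus_clean, stripWhitespace)

# Look at the first two cleaned documents
inspect(corpus_clean[1:2])

# One row per document, one column per term
dtm <- DocumentTermMatrix(corpus_clean)

# Terms whose total count across all documents is at least 2
freq_terms <- findFreqTerms(dtm, lowfreq = 2)
print(freq_terms)
```

In a real data set the lowfreq bound would usually be much higher, so that rare terms are dropped before training.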
  • wordcloud Package: The following function helps to visualize a tag cloud of a corpus object. More about wordcloud can be found at http://cran.r-project.org/web/packages/wordcloud/index.html
    • wordcloud() takes arguments such as a corpus object (or a vector of words with their frequencies), min.freq (words with a lower frequency are not plotted), max.words (maximum number of words to plot, with the least frequent dropped first), random.order, scale, etc.
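
As a small sketch of these arguments, the call below plots a cloud from explicit words and frequencies. The `freq` vector is made-up illustration data; in practice it might come from something like colSums(as.matrix(dtm)) on a document-term matrix.

```r
library(wordcloud)

# Hypothetical word frequencies (illustration only)
freq <- c(free = 12, prize = 9, today = 7, lunch = 4, call = 3, noon = 1)

set.seed(42)  # word placement is partly random; fix the seed for a repeatable layout
wordcloud(words = names(freq), freq = freq,
          min.freq = 3,          # skip words seen fewer than 3 times
          max.words = 5,         # plot at most 5 words
          random.order = FALSE,  # place the most frequent words first (center)
          scale = c(3, 0.5))     # range of font sizes, largest to smallest
```

With random.order = FALSE the most frequent words are drawn first and tend to end up near the center, which usually makes the cloud easier to read.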
  • e1071 Package: Developed at the statistics department at the Vienna University of Technology (TU Wien), the e1071 package provides the naiveBayes() function and a matching predict() method, which can be used to train a naive Bayes classifier and make predictions. They are used as follows:
    library(e1071)

    # Train the classifier. The first argument holds the training features derived from the
    # DocumentTermMatrix (as a matrix or data frame, e.g. via as.matrix()); the second is a
    # factor giving the class of each instance/row.
    msg_classifier <- naiveBayes(messages_dtm_train, messages_train$type)
    
    # Predict classes for the test features, prepared the same way as the training features
    msg_test_pred <- predict(msg_classifier, messages_dtm_test)
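
A self-contained sketch of the training and prediction step follows. The count matrix and labels are hypothetical stand-ins for what as.matrix() on a document-term matrix would give, and converting counts to "Yes"/"No" indicators is one common preparation choice, assumed here rather than taken from the article. The laplace = 1 argument adds simple smoothing so that an unseen word/class combination does not get a zero probability.

```r
library(e1071)

# Hypothetical term-count matrix (rows = messages, columns = terms)
counts <- matrix(
  c(1, 0, 1,
    0, 1, 0,
    0, 1, 0,
    1, 0, 1,
    1, 0, 0,
    0, 1, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("free", "lunch", "prize"))
)
type <- factor(c("spam", "ham", "ham", "spam", "spam", "ham"))

# Turn counts into categorical indicators: does the term appear at all?
convert_counts <- function(x) ifelse(x > 0, "Yes", "No")
features <- as.data.frame(apply(counts, 2, convert_counts),
                          stringsAsFactors = TRUE)

# Simple split: first four messages for training, last two for testing
train_x <- features[1:4, ]
train_y <- type[1:4]
test_x  <- features[5:6, ]

# Train with Laplace smoothing, then predict the held-out messages
msg_classifier <- naiveBayes(train_x, train_y, laplace = 1)
msg_test_pred  <- predict(msg_classifier, test_x)
print(msg_test_pred)
```

Keeping the feature columns as factors with the same levels in both the training and test sets matters, because the prediction step matches levels by position.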
    
  • gmodels Package: This package helps to evaluate the performance of the naive Bayes classifier. The following function can be used to evaluate the model:
    • CrossTable() cross-tabulates the predicted values against the actual values, showing how many instances of each class were classified correctly or incorrectly.
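
The evaluation step might look like the sketch below. The `predicted` and `actual` vectors are hypothetical; in the article's setting they would be msg_test_pred and the true labels of the test messages.

```r
library(gmodels)

# Hypothetical predictions and true labels (illustration only)
predicted <- factor(c("spam", "ham", "ham", "spam", "ham", "ham"),
                    levels = c("ham", "spam"))
actual    <- factor(c("spam", "ham", "spam", "spam", "ham", "ham"),
                    levels = c("ham", "spam"))

# Cross-tabulate predicted vs. actual classes (a confusion matrix with proportions)
CrossTable(predicted, actual,
           prop.chisq = FALSE,  # omit chi-square contributions
           prop.t = FALSE,      # omit table-wide proportions
           dnn = c("predicted", "actual"))

# Overall accuracy: share of messages whose predicted class matches the actual one
accuracy <- mean(predicted == actual)
print(accuracy)
```

The off-diagonal cells of the table are the misclassifications, for example spam messages predicted as ham.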
Ajitesh Kumar
