
Data Science – R Packages & Methods for naive Bayes Classification

This article describes R packages and related functions that can be used to build a naive Bayes classifier. Please feel free to comment or suggest if I have missed one or more important points.
Following are the key packages described later in this article:
  • tm
  • wordcloud
  • e1071
  • gmodels

 

The following R packages can be used together for naive Bayes classification:

  • tm package: Originally created by Ingo Feinerer as a dissertation project at the Vienna University of Economics and Business, tm is a popular package that provides a framework for text mining applications within R. More about the tm package can be found at http://tm.r-forge.r-project.org/. Following are some of its widely used functions:
    • Corpus() creates an R object that stores the text documents.
    • Source functions such as VectorSource(), which takes a character vector of text, describe where the documents come from. A Corpus object can be created from a vector of messages with a command such as Corpus(VectorSource(messages$text)). Here “messages” is a data frame with a column “text” that holds the messages.
    • tm_map() applies a transformation to every document in the corpus. It is used to clean the corpus by removing numbers, stop words, punctuation and extra whitespace, converting text to lower case, and so on.
    • inspect() displays the contents of the corpus object created with Corpus().
    • DocumentTermMatrix() takes a Corpus object as its argument and returns a sparse matrix. Each row represents a document and each column represents a term (word) that appears in the corpus. Each cell holds the number of times that term occurs in that document.
    • findFreqTerms() takes a DocumentTermMatrix object and a lower frequency bound (lowfreq), and returns a character vector of the terms whose total count across the corpus is at least that bound.
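The tm functions listed above can be chained into a small pipeline. The following is a minimal sketch, not a definitive recipe: the toy `messages` data frame is made up for illustration, and VCorpus() is used in place of Corpus() as an assumption, because it behaves consistently with tm_map() across tm versions.

```r
library(tm)

# Hypothetical toy data standing in for the article's `messages` data frame
messages <- data.frame(
  type = factor(c("spam", "ham", "ham", "spam")),
  text = c("WIN a FREE prize now!!! Call 12345",
           "Are we still meeting for lunch today?",
           "Call me when you are free",
           "Free entry: win cash prize, call now"),
  stringsAsFactors = FALSE
)

# Build a corpus from the character vector of messages
corpus <- VCorpus(VectorSource(messages$text))

# Clean the corpus one transformation at a time with tm_map()
corpus_clean <- tm_map(corpus, content_transformer(tolower))
corpus_clean <- tm_map(corpus_clean, removeNumbers)
corpus_clean <- tm_map(corpus_clean, removeWords, stopwords("english"))
corpus_clean <- tm_map(corpus_clean, removePunctuation)
corpus_clean <- tm_map(corpus_clean, stripWhitespace)

# Look at the first two cleaned documents
inspect(corpus_clean[1:2])

# Sparse matrix: one row per document, one column per term
messages_dtm <- DocumentTermMatrix(corpus_clean)

# Terms that occur at least twice across the whole corpus
frequent_terms <- findFreqTerms(messages_dtm, lowfreq = 2)
print(frequent_terms)
```

Stop words are removed before punctuation here so that contractions in the stop word list (such as "don't") can still match.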
  • wordcloud package: The following function helps to visualize a Corpus object as a tag cloud. More about wordcloud can be found at http://cran.r-project.org/web/packages/wordcloud/index.html
    • wordcloud() takes arguments such as the words to plot (a Corpus object, or a character vector of words together with their frequencies), min.freq (words with a lower frequency are not plotted), max.words (the maximum number of words to plot, dropping the least frequent ones), random.order, scale, etc.
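As a rough illustration of the wordcloud() arguments described above, the sketch below plots a handful of words. The word counts are invented for the example; in practice they could come from a corpus or a document-term matrix.

```r
library(wordcloud)

# Hypothetical word counts, made up for illustration
words <- c("free", "call", "prize", "win", "lunch", "meeting")
freq  <- c(12, 9, 7, 6, 3, 2)

set.seed(42)  # make the word placement reproducible

wordcloud(words = words, freq = freq,
          min.freq = 3,          # skip words seen fewer than 3 times
          max.words = 5,         # plot at most the 5 most frequent words
          random.order = FALSE,  # plot words in decreasing frequency order
          scale = c(3, 0.5))     # font size range, largest to smallest
```

Setting random.order = FALSE tends to place the most frequent words near the centre, which usually makes the cloud easier to read.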
  • e1071 package: Developed at the statistics department of the Vienna University of Technology (TU Wien), the e1071 package provides the naiveBayes() function for building a naive Bayes classifier and a predict() method for making predictions with it. They are used as follows:
    library(e1071)

    # Train the classifier. Arguments: the training features derived from the
    # DocumentTermMatrix (naiveBayes() expects a matrix or data frame) and a
    # factor giving the class of each instance/row.
    msg_classifier <- naiveBayes(messages_dtm_train, messages_train$type)
    
    # Predict the class of each test row. Arguments: the classifier and the test
    # features, prepared the same way as the training features.
    msg_test_pred <- predict(msg_classifier, messages_dtm_test)
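The naiveBayes() and predict() calls above can be tried on a tiny, self-contained data set. This is only a sketch under stated assumptions: the features are hypothetical "Yes"/"No" indicators of whether a word appears in a message (a common way to prepare document-term counts for naive Bayes, not something the calls above require), and laplace = 1 is an illustrative choice.

```r
library(e1071)

# Hypothetical indicator features: does the word appear in the message?
yes_no <- c("No", "Yes")
train <- data.frame(
  free  = factor(c("Yes", "Yes", "No", "No", "Yes", "No"), levels = yes_no),
  lunch = factor(c("No", "No", "Yes", "Yes", "No", "No"), levels = yes_no)
)
train_type <- factor(c("spam", "spam", "ham", "ham", "spam", "ham"))

# laplace = 1 adds one pseudo-count per level so that a word/class
# combination never seen in training does not get probability zero
classifier <- naiveBayes(train, train_type, laplace = 1)

# Two unseen messages described with the same features
test <- data.frame(
  free  = factor(c("Yes", "No"), levels = yes_no),
  lunch = factor(c("No", "Yes"), levels = yes_no)
)

pred <- predict(classifier, test)                # predicted class labels
prob <- predict(classifier, test, type = "raw")  # class probabilities per row
print(pred)
print(prob)
```

Calling predict() with type = "raw" returns the estimated class probabilities instead of labels, which is useful when a custom decision threshold is needed.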
    
  • gmodels package: This package helps to evaluate the performance of the naive Bayes classifier. The following function can be used to evaluate the model:
    • CrossTable() builds a cross-tabulation (confusion matrix) of the predicted values against the actual values, so that correct and incorrect classifications can be counted.
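A minimal sketch of such an evaluation follows. The `actual` and `predicted` vectors are hypothetical labels invented for the example; in a real run they would be the test set labels and the output of predict().

```r
library(gmodels)

# Hypothetical actual and predicted labels for six test messages
actual    <- factor(c("ham", "ham", "ham", "spam", "spam", "spam"))
predicted <- factor(c("ham", "ham", "spam", "spam", "spam", "ham"))

# Rows are predictions, columns are actual labels
CrossTable(predicted, actual,
           prop.chisq = FALSE,  # hide chi-square contributions
           prop.t = FALSE,      # hide table-wide proportions
           dnn = c("predicted", "actual"))

# Overall accuracy computed from the same two vectors
accuracy <- mean(predicted == actual)
print(accuracy)
```

The off-diagonal cells of the table are the misclassifications: messages predicted as spam that were actually ham, and the reverse.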
Ajitesh Kumar
