Data Science – Common Exploratory R Commands for Classification Problems

This article lists common exploratory R commands that can be used during the data preparation stage when solving classification problems. I came across them while working through the KNN and naive Bayes algorithms. The list below is not exhaustive, and I would love to hear about additional commands from you. Please feel free to comment or suggest if I missed any important points.

 

In the commands listed below, a data frame named messages_text is used. It holds a set of text data loaded with the read.table command, as follows:

messages_text <- read.table( file.choose(), sep="\t", stringsAsFactors=FALSE)
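For a reproducible variant that does not depend on an interactive file dialog, the same call can be pointed at inline text. The sketch below assumes a tab-separated file with no header row and two columns (a label and a message); the sample rows are made up for illustration and are not from the original data set.

```r
# Hypothetical sample data: two tab-separated columns, no header row.
sample_lines <- c(
  "ham\tSee you at lunch",
  "spam\tWin a free prize now",
  "ham\tCall me when you are home"
)

# read.table() accepts a `text` argument in place of a file path.
# quote = "" and comment.char = "" are optional here; they keep apostrophes
# and '#' characters inside messages from being treated as quotes or comments.
messages_text <- read.table(
  text = sample_lines,
  sep = "\t",
  stringsAsFactors = FALSE,
  quote = "",
  comment.char = ""
)

# With no header row, the columns get the default names V1 and V2.
names(messages_text)
```

Turning off quote and comment handling is a judgment call for free-form text; whether it is needed depends on the actual file.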
Common Exploratory R Commands for Data Preparation Stage

The commands listed below achieve the following:

  • Seeing the summary of loaded data using str command
  • Changing the names of columns to desired names
  • Converting target feature from character vector to factor
  • Analyzing the percentage occurrence of different categories
# Find the summary information about the data frame loaded using command such as
# read.csv, read.table etc.
str(messages_text)

# Change the column names to the desired names. At times, the text file has no header row
# and starts straight away with the data. When that happens, the features are named V1, V2 etc.
# Thus, it is a good idea to name the features appropriately.
names(messages_text) <- c( "type", "text")

# The as.factor command is frequently used to convert a categorical feature to a factor. When loaded
# with stringsAsFactors=FALSE, this variable is a character vector.
messages_text$type <- as.factor(messages_text$type)

# The table command, when used on a variable of class factor, gives the number of occurrences of
# the different categories
table(messages_text$type)

# The prop.table command, applied to the table of a categorical variable (of class factor),
# gives the proportion of each category; multiplying by 100 turns it into a percentage
prop.table(table(messages_text$type))*100

# The round command with prop.table gives the percentage occurrence of each category,
# rounded to the number of digits specified in the command
round(prop.table(table(messages_text$type))*100, digits=2)
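To see what these commands produce without loading a file, the steps above can be run end to end on a tiny data frame. This is a minimal sketch: the three rows are hypothetical stand-ins for the loaded data, and the names type_counts and type_pct are introduced here only for illustration.

```r
# Hypothetical stand-in for the loaded data frame (default column names V1, V2).
messages_text <- data.frame(
  V1 = c("ham", "spam", "ham"),
  V2 = c("See you at lunch", "Win a free prize now", "Call me later"),
  stringsAsFactors = FALSE
)

# Inspect the structure, then rename the columns.
str(messages_text)
names(messages_text) <- c("type", "text")

# Convert the target feature from a character vector to a factor.
messages_text$type <- as.factor(messages_text$type)
levels(messages_text$type)   # "ham" "spam"

# Counts per category, then percentages rounded to two digits.
type_counts <- table(messages_text$type)
type_pct <- round(prop.table(type_counts) * 100, digits = 2)
print(type_counts)           # ham: 2, spam: 1
print(type_pct)              # ham: 66.67, spam: 33.33
```

Storing the table in a variable avoids recomputing it for each of the later calls.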
Ajitesh Kumar


Ajitesh has been recently working in the area of AI and machine learning. Currently, his research area includes Safe & Quality AI. In addition, he is also passionate about various different technologies including programming languages such as Java/JEE, Javascript and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data etc.

He has also authored the book, Building Web Apps with Spring 5 and Angular.