Categories: Big Data

Learn R – How to Fix Read.Table Command Reading Lesser Rows

This article represents the problem statement related with read.table reading fewer or incorrect or lesser number of lines or rows when reading a text file having multiple columns, and the solution to the same. This is going to be a shorter blog. But since it solved a problem on which I spent some time, I chose to write about the same. Please feel free to comment/suggest if I missed to mention one or more important points. Also, sorry for the typos.
Problem Statement: Reading Fewer Lines with read.table Command

I have been learning the naive bayes classification. I downloaded this SMS collection data. I went ahead and tried to load the data using following command. And, it listed around 1630 rows, although there were 5574 rows.

messages <- read.table( file.choose(), sep="\t", stringsAsFactors=FALSE)

I check with commands such as dim(messages) and it gave me 1630 messages with 2 columns. This is lesser (and thus, incorrect) than what existed in the document.

Solution to getting exact number of rows

After investigation, I found that the messages consisted of single/double quotes and this needed to be disabled for read.table to read correct number of rows. I did the same with following command and it worked pretty well. Note the usage quote=” parameter.

messages <- read.table( file.choose(), sep="\t", stringsAsFactors=FALSE, quote='')

 

Ajitesh Kumar

I have been recently working in the area of Data analytics including Data Science and Machine Learning / Deep Learning. I am also passionate about different technologies including programming languages such as Java/JEE, Javascript, Python, R, Julia, etc, and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data, etc. For latest updates and blogs, follow us on Twitter. I would love to connect with you on Linkedin. Check out my latest book titled as First Principles Thinking: Building winning products using first principles thinking. Check out my other blog, Revive-n-Thrive.com

Recent Posts

Mean Squared Error vs Cross Entropy Loss Function

Last updated: 28th April, 2024 As a data scientist, understanding the nuances of various cost…

2 days ago

Cross Entropy Loss Explained with Python Examples

Last updated: 28th April, 2024 In this post, you will learn the concepts related to…

2 days ago

Logistic Regression in Machine Learning: Python Example

Last updated: 26th April, 2024 In this blog post, we will discuss the logistic regression…

3 days ago

MSE vs RMSE vs MAE vs MAPE vs R-Squared: When to Use?

Last updated: 22nd April, 2024 As data scientists, we navigate a sea of metrics to…

5 days ago

Gradient Descent in Machine Learning: Python Examples

Last updated: 22nd April, 2024 This post will teach you about the gradient descent algorithm…

1 week ago

Loss Function vs Cost Function vs Objective Function: Examples

Last updated: 19th April, 2024 Among the terminologies used in training machine learning models, the…

1 week ago