This article describes a problem where read.table reads fewer rows than a multi-column text file actually contains, and the solution to it. This is a short post, but since it solved a problem I spent some time on, I chose to write about it. Please feel free to comment or suggest if I missed any important points.
Problem Statement: Reading Fewer Lines with read.table Command
I have been learning Naive Bayes classification. I downloaded this SMS collection data and tried to load it using the following command. It loaded only around 1630 rows, although the file has 5574 rows.
messages <- read.table( file.choose(), sep="\t", stringsAsFactors=FALSE)
I checked with commands such as dim(messages), which reported 1630 rows and 2 columns. That is fewer than the number of messages in the file, and thus incorrect.
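One way to confirm such a mismatch is to compare the number of physical lines in the file with the number of rows read.table returns. The following is only a minimal sketch: the four-line sample file is made up for illustration and stands in for the SMS data, and the variable names are my own.

```r
# Hypothetical sample standing in for the SMS file: label, tab, message text.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "ham\tI don't know",
  "spam\tWIN a prize now",
  "ham\tThat's fine",
  "ham\tSee you later"
), path)

# Number of physical lines in the file, independent of any parsing rules.
physical_lines <- length(readLines(path))

# Number of rows read.table produces with the same arguments as above.
parsed <- read.table(path, sep = "\t", stringsAsFactors = FALSE)
parsed_rows <- nrow(parsed)

# A difference greater than zero means some lines were merged or dropped.
physical_lines - parsed_rows
```

On a file like this, I would expect the parsed row count to come out lower than the physical line count, which is the same symptom I saw with the real data.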
Solution: Getting the Exact Number of Rows
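After some investigation, I found that the messages contain single and double quote characters. By default, read.table treats both " and ' as quoting characters, so an apostrophe inside a message opens a quoted string that runs on until the next matching quote character, swallowing the lines in between into a single field. Quoting needs to be disabled for read.table to read the correct number of rows. I did that with the following command and it worked well. Note the quote='' argument.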
messages <- read.table( file.choose(), sep="\t", stringsAsFactors=FALSE, quote='')
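To see the effect of disabling quoting without needing the SMS file, here is a small self-contained sketch. The sample lines and the column names type and text are assumptions of mine for illustration; they are not taken from the original dataset.

```r
# Hypothetical tab-separated sample containing both kinds of quote characters.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "ham\tI can't make it today",
  "ham\tHe said \"hi\" to me",
  "spam\tYou've won a prize",
  "ham\tOk see you"
), path)

# quote = "" turns off quote handling, so every physical line becomes one row
# and quote characters are kept as ordinary text.
messages <- read.table(path, sep = "\t", stringsAsFactors = FALSE,
                       quote = "", col.names = c("type", "text"))

dim(messages)       # rows and columns actually read
messages$text[1:2]  # apostrophes and double quotes survive unchanged
```

With quoting disabled, the row count matches the number of lines written to the file, and the message text keeps its apostrophes and double quotes.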