Tag Archives: datascience

Data Science – R Packages & Methods for naive Bayes Classification

This article covers R packages and related methods that can be used to create a naive Bayes classifier. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. The key packages described later in this article are tm, wordcloud, e1071, and gmodels. Following is a list of R packages that can be used for naive Bayes classification. tm package: Originally created by Ingo Feinerer as a dissertation project at the Vienna University of Economics and Business, tm is a very popular package that provides a framework for text mining applications within R. More about the tm package could be …
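
As a rough illustration of how one of these packages is typically used, here is a minimal sketch of a naive Bayes classifier built with e1071. It assumes the e1071 package is installed and uses the built-in iris data set as a stand-in, since the excerpt does not show the article's own data or code.

```r
# Minimal sketch: naive Bayes with e1071 on the built-in iris data.
# Assumption: iris is only a stand-in data set, not the article's data.
if (requireNamespace("e1071", quietly = TRUE)) {
  # Train a classifier that predicts Species from the four measurements
  model <- e1071::naiveBayes(Species ~ ., data = iris)

  # Predict the class of each row (here, the training rows themselves)
  predicted <- predict(model, iris[, -5])

  # Compare predictions with the actual labels
  confusion <- table(predicted = predicted, actual = iris$Species)
  accuracy <- mean(predicted == iris$Species)
  print(confusion)
  print(accuracy)
}
```

The confusion table here is built with base R's table(); the gmodels package offers CrossTable() as a more detailed alternative for the same comparison.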

Posted in Big Data.

Learn R – How to Fix read.table Command Reading Fewer Rows

This article describes a problem where read.table reads fewer rows than expected from a text file with multiple columns, and the solution to it. This is going to be a shorter blog. But since it solved a problem on which I spent some time, I chose to write about it. Problem Statement: Reading Fewer Lines with the read.table Command. I have been learning naive Bayes classification. I downloaded this SMS collection data. I went ahead and tried to load …
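
The excerpt stops before the solution, so the following is only a sketch of one common cause and fix, not necessarily the one the article arrives at. It assumes the text column contains apostrophes, double quotes, or # characters, which read.table treats as quote and comment characters by default. The sample file and column names are made up for illustration.

```r
# Write a small tab-separated file whose text column contains
# apostrophes, double quotes and a '#' (hypothetical sample data).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "ham\tI'll call you later",
  "spam\tWin a \"free\" prize now",
  "ham\tDon't forget the meeting",
  "ham\tI am #1 at this",
  "ham\tSee you at 5"
), path)

# With default settings, quotes and '#' are interpreted specially, so the
# row count can come out wrong (or the call can fail outright).
default_rows <- tryCatch(
  nrow(suppressWarnings(read.table(path, sep = "\t", stringsAsFactors = FALSE))),
  error = function(e) NA_integer_
)

# Disabling quote and comment handling makes every line one row.
messages <- read.table(path, sep = "\t", quote = "", comment.char = "",
                       stringsAsFactors = FALSE,
                       col.names = c("type", "text"))

print(default_rows)
print(nrow(messages))
```

Comparing the row count against length(readLines(path)) is a quick way to confirm that nothing was dropped.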

Posted in Big Data.

Data Science – Data Cleaning R Commands for Text Classification Problems

This article covers the concepts and the related set of R commands used to clean text in order to make it ready for text classification. The commands belong to the tm package. Let's load a set of messages along with their classification using the following command: messages <- read.table(file.choose(), sep="\t", stringsAsFactors=FALSE) The messages data frame could have two features, such as type and text, where each piece of text is associated with a type. Once done, let's go ahead and create a Corpus object out of all the message text. …
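
The excerpt ends before the cleaning commands, so the sketch below only illustrates the kind of cleaning usually applied to message text. It deliberately uses base R functions as stand-ins rather than the tm commands the article describes, and the sample messages and the clean_text helper are hypothetical.

```r
# Hypothetical sample in the same shape as the messages data frame
messages <- data.frame(
  type = c("ham", "spam"),
  text = c("Hello THERE, call me at 5!", "WIN 1000 dollars   now!!!"),
  stringsAsFactors = FALSE
)

# Base R stand-in for typical cleaning steps (not the tm API):
# lower-case, drop digits, drop punctuation, collapse whitespace.
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:digit:]]+", "", x)
  x <- gsub("[[:punct:]]+", "", x)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

messages$clean <- clean_text(messages$text)
print(messages$clean)
```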

Posted in Big Data.

Data Science – Common Exploratory R Commands for Classification Problems

This article lists common exploratory R commands that could be used during the data preparation stage when solving classification problems. I found them being used when I was going through the KNN and naive Bayes algorithms. I know that there may be more to the list below. I would love to hear those additional commands from you. In the set of commands listed below, a data frame, messages_text, is used. It holds a set of text data loaded using the read.table command, such as the following: messages_text <- read.table( file.choose(), sep="\t", …
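
The list itself is cut off in the excerpt, so here is a short sketch of exploratory commands that are commonly used at this stage. The data frame below is a hypothetical stand-in for messages_text, and the choice of commands is an assumption rather than the article's exact list.

```r
# Hypothetical stand-in for the messages_text data frame
messages_text <- data.frame(
  type = c("ham", "spam", "ham", "ham"),
  text = c("See you at 5", "Win a prize now", "Call me later", "On my way"),
  stringsAsFactors = FALSE
)

# Structure: column names, types and a preview of the values
str(messages_text)

# Convert the label to a factor, as classifiers usually expect
messages_text$type <- factor(messages_text$type)

# Class counts and class percentages
type_counts <- table(messages_text$type)
type_share <- round(prop.table(type_counts) * 100, 1)
print(type_counts)
print(type_share)

# Summary statistics of a derived numeric feature (message length)
summary(nchar(messages_text$text))

# First few rows
head(messages_text, 2)
```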

Posted in Big Data.

Data Science – List of Key Machine Learning Algorithms

This article presents a list of key machine learning algorithms that are widely used by data scientists while doing data analysis. The list covers some of the most important and widely used algorithms, which could set the stage for one to get started with data science/analytics and create models for predictions. These machine learning algorithms fall under two high-level classifications: supervised learning and unsupervised learning. Following are some of the key tasks that are performed by machine learning algorithms …

Posted in Big Data.

Learn R – How to Add New Column to Data Frame

This article presents concepts and code samples on how to add new columns to a data frame using the R programming language. Let's create a student data frame. Following is the code:

    # Create a non-empty data frame with column names
    # Assign names to x
    x <- c("Calvin", "Chris", "Raj")
    # Assign ages to y
    y <- c(10, 25, 19)
    # Assign x to "First Name" as column name
    # Assign y to "Age" as column name
    student <- data.frame( …
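
The excerpt is cut off before any column is added, so the following is a sketch of three common ways to do it in base R. The completed data.frame() call, the check.names = FALSE argument, and the Grade, Is Adult and City columns are assumptions made for illustration.

```r
x <- c("Calvin", "Chris", "Raj")
y <- c(10, 25, 19)

# check.names = FALSE keeps the space in "First Name" (assumed call)
student <- data.frame("First Name" = x, "Age" = y,
                      check.names = FALSE, stringsAsFactors = FALSE)

# 1. Add a column with the $ operator (hypothetical Grade values)
student$Grade <- c("A", "B", "A")

# 2. Add a column whose name contains a space using [[ ]]
student[["Is Adult"]] <- student$Age >= 18

# 3. Add a column with cbind() (hypothetical City values)
student <- cbind(student, City = c("Pune", "Austin", "Delhi"))

print(student)
```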

Posted in Big Data.

Data Science – How to Scale or Normalize Numeric Data using R

This article covers why numeric data needs to be normalized or scaled, along with code samples in the R programming language that could be used to do so. The following topics are described later in this article: why normalize or scale the data, min-max normalization, and z-score standardization. Why Normalize or Scale the Data? There can be instances in a data frame where the values for one feature range between 1-100 and the values for another feature could …
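
The two techniques named above can be sketched in a few lines of base R. The helper names min_max and z_score and the sample data are hypothetical. Min-max normalization rescales each value to (x - min) / (max - min), and z-score standardization rescales it to (x - mean) / sd.

```r
# Min-max normalization: maps the smallest value to 0 and the largest to 1
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Z-score standardization: centers on the mean, scales by the standard deviation
z_score <- function(x) (x - mean(x)) / sd(x)

# Hypothetical features on very different scales
scores <- data.frame(age = c(10, 25, 19),
                     income = c(20000, 85000, 52000))

scores_minmax <- as.data.frame(lapply(scores, min_max))
scores_z <- as.data.frame(lapply(scores, z_score))

# Base R's scale() performs the same z-score standardization
scores_scaled <- as.data.frame(scale(scores))

print(scores_minmax)
print(scores_z)
```

After either transformation the two features are on comparable scales, which matters for distance-based methods such as KNN.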

Posted in AI, Big Data, Data Science.

Learn R – How to Append Rows to Data Frame

This article presents concepts and code samples on how to append rows to a data frame when working with the R programming language. The following key points are described later in this article: how to append one or more rows to an empty data frame, and how to append one or more rows to a non-empty data frame. For illustration purposes, we shall use a student data frame having the following information: How to Append One or More Rows to an Empty Data Frame: The following code shows how to create an empty data frame and …
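
Both cases can be sketched with rbind() in base R. The column names and values below are assumptions, since the student table itself is missing from the excerpt.

```r
# Create an empty data frame with typed columns (assumed column names)
student <- data.frame(name = character(), age = numeric(),
                      stringsAsFactors = FALSE)

# Append one row to the empty data frame
student <- rbind(student,
                 data.frame(name = "Calvin", age = 10,
                            stringsAsFactors = FALSE))

# Append several rows to the now non-empty data frame
more_students <- data.frame(name = c("Chris", "Raj"), age = c(25, 19),
                            stringsAsFactors = FALSE)
student <- rbind(student, more_students)

print(student)
```

rbind() matches columns by name, so the appended rows need the same column names as the target data frame.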

Posted in Big Data.

Learn R – How to Create Data Frame with Column Names

This article presents code in the R programming language that could be used to create a data frame with column names. The following key points are described later in this article: creating an empty data frame with column names, and creating a non-empty data frame with column names.

Create an Empty Data Frame with Column Names

Following is the code sample: Following gets printed:

Create a Non-empty Data Frame with Column Names

Following is the code sample: Following gets printed. Note the column names such as "First Name" and "Age".
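
The original code samples did not survive in this excerpt. As a sketch of what the two cases might look like in base R, the following uses the column names "First Name" and "Age" mentioned above. The check.names = FALSE argument and the sample values are assumptions.

```r
# Empty data frame: zero rows, but named and typed columns.
# check.names = FALSE preserves the space in "First Name".
empty_df <- data.frame("First Name" = character(),
                       "Age" = numeric(),
                       check.names = FALSE, stringsAsFactors = FALSE)
str(empty_df)

# Non-empty data frame with the same column names (sample values assumed)
student_df <- data.frame("First Name" = c("Calvin", "Chris", "Raj"),
                         "Age" = c(10, 25, 19),
                         check.names = FALSE, stringsAsFactors = FALSE)
print(student_df)
```

Without check.names = FALSE, data.frame() would rewrite "First Name" to the syntactically valid name First.Name.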

Posted in Big Data.

Data Science – How to Load Data included with R

This article covers different ways in which data sets bundled with R packages can be loaded. One of the important aspects of getting on board with Data Science is to play with data as much as possible while going through the learning phase. Some of the key activities include data loading, data extraction, data wrangling/munging, etc. I found that loading data from different R packages is one of the keys to getting access to these data sets, and hence decided to write this quick article. …
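
As a sketch of the usual base R approach, the data() function can list and load bundled data sets. The choice of mtcars and of the MASS package below is an assumption for illustration, and the MASS part only runs if that package is installed.

```r
# List the data sets shipped with the 'datasets' package
available <- data(package = "datasets")$results[, "Item"]
head(available)

# Load one of them by name into the current environment
data("mtcars", package = "datasets")
head(mtcars, 3)

# Load a data set from another package without attaching the package
if (requireNamespace("MASS", quietly = TRUE)) {
  data("cats", package = "MASS")
  print(head(cats, 3))
}
```

Calling data() with no arguments lists the data sets of every currently attached package.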

Posted in Big Data.

Data Science – Quick Start Guide for Machine Learning

This article presents very high-level information on different aspects of machine learning, with the objective of giving data science beginners a quick-start guide. One could grab one or more books on Machine Learning to learn the subject in detail. The following key points are described later in this article: what machine learning is, the key phases of machine learning, and the prediction API model of machine learning. What is Machine Learning? Simply speaking, Machine Learning is a set of artificial intelligence techniques which are used to solve one of …

Posted in Big Data.

Data Science – Top 5 Videos to Learn Bayes’ Theorem

This article lists the top 5 videos that I found great when I was trying to understand Bayes’ theorem from YouTube channels. Following are the top 5 videos that I found quite useful for understanding Bayes’ theorem: Bayes’ Theorem Formula: This one I liked most. A very short and sweet video that explains Bayes’ theorem with a very nice example of economy and stock values in just 6 minutes. For beginners, I would recommend this as the first video to get started with Bayes’ theorem. Bayes Theorem with …

Posted in Big Data.

Learn R – Hello World with R – Code Example

This article covers some of the basic concepts one needs to understand to write and execute Hello World using the R programming language. The following key points are described later in this article: basic concepts to write a Hello World function in R, and a Hello World code example. Basic Concepts to Write Hello World Function in R: Following are some key points to pay attention to while working through the Hello World example: R code is written as a set of one or more functions. In R, one could assign a function …
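
A minimal sketch of such a function follows. The function name hello_world and its name argument are hypothetical, as the excerpt ends before the article's own example.

```r
# A function is created with function() and assigned to a name with <-
hello_world <- function(name = "World") {
  greeting <- paste0("Hello, ", name, "!")
  print(greeting)
  # Return the greeting invisibly so it can also be reused by the caller
  invisible(greeting)
}

# Call the function with the default argument, then with a custom one
hello_world()
hello_world("R")
```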

Posted in Big Data.

Learn R – How to Get Started with GGPlot – Code Example

This article gives a quick introduction to GGPlot along with key concepts and code examples using the R programming language. The following key points are described later in this article: a quick introduction to GGPlot, installation and loading of GGPlot, and GGPlot key concepts. Quick Introduction to GGPlot: ggplot is a statistical package that facilitates the easy creation of different plots. One of the key concepts related to ggplot is that a plot is built up layer by layer. This means that one could start by initializing the plot using ggplot(data) …
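
The layer-by-layer idea can be sketched as follows. This assumes the ggplot2 package is installed (install.packages("ggplot2") would install it), and the choice of the mtcars data set and of the wt and mpg columns is an assumption for illustration.

```r
# Only runs when ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Step 1: initialize the plot with the data; nothing is drawn yet
  p <- ggplot(mtcars)

  # Step 2: add a layer of points, mapping columns to the x and y axes
  p <- p + geom_point(aes(x = wt, y = mpg))

  # Step 3: add labels on top of the existing layers
  p <- p + labs(title = "Mileage vs. weight",
                x = "Weight (1000 lbs)", y = "Miles per gallon")

  # print(p) draws the finished plot on the active graphics device
}
```

Each + returns a new plot object, so intermediate steps can be stored and reused to build variations of the same plot.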

Posted in Big Data.

Learn R – When to use Histogram, Scatterplot & Boxplot – Code Example

This article explains when to use which kind of plot, with code examples and plots, when working with the R programming language. The key plots described later in this article are the histogram, the scatterplot, and the boxplot. Following is the description of these plots along with code examples based on the base R package. Note that each of these plots could be created using different commands when using the ggplot2 package. Histogram: A histogram is one of the best forms of visualization when working with a single continuous variable. It plots the relative …
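
Here is a minimal base R sketch of the three plot types. It uses the built-in mtcars data set, which is an assumption, since the excerpt does not show which data the article uses.

```r
# Histogram: distribution of a single continuous variable
h <- hist(mtcars$mpg,
          main = "Distribution of mileage",
          xlab = "Miles per gallon")

# Scatterplot: relationship between two continuous variables
plot(mtcars$wt, mtcars$mpg,
     main = "Mileage vs. weight",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Boxplot: a continuous variable compared across groups
b <- boxplot(mpg ~ cyl, data = mtcars,
             main = "Mileage by number of cylinders",
             xlab = "Cylinders", ylab = "Miles per gallon")
```

hist() and boxplot() also return the computed bin counts and box statistics, which is handy when the numbers behind the plot are needed.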

Posted in Big Data.

How Can I Become A Data Scientist?

This article presents thoughts, primarily, on how to become a data scientist. The following aspects of being a data scientist are described later in this article: key skills of a data scientist, key roles and responsibilities of a data scientist, what it would take me to become a data scientist, and what I would create as a data scientist. Key Skills of a Data Scientist: Mathematics & Statistics Knowledge: A data scientist would do a great job if he/she has a strong mathematics and statistics background. …

Posted in Big Data.