Category Archives: Big Data

Machine Learning Use Cases for Pinterest.com & the Related Kosei Acquisition

This article presents thoughts on the recent acquisition of Kosei, a commerce recommendation system, by Pinterest.com. Please feel free to comment/suggest if I missed mentioning one or more important points. Also, sorry for the typos. Following are the key points described later in this article: How could Machine Learning help Pinterest fuel its overall growth? How could Kosei help Pinterest.com?

How could Machine Learning help Pinterest fuel its overall growth?

In yet another acquisition in the space of machine learning, Pinterest.com acquired Kosei to achieve some of the following objectives:

Better ad targeting for greater monetization from ad clicks. This looks to be a case of identifying user clusters based …

Posted in Big Data.

Data Science – List of Common Machine Learning Problems with Examples

This article presents quick examples for 5 different classes of machine learning problems/tasks. Please feel free to comment/suggest if I missed mentioning one or more important points. Also, sorry for the typos. Following is the set of 5 key machine learning problems/tasks whose examples are listed later in this article: Regression, Classification, Clustering, Association Rules, and Artificial Neural Networks.

Examples – Regression Models

Real Estate – Housing price estimation
Financial – Stock price estimation
Insurance – Estimating medical care expenses
Sales & Marketing – Sales vs. ad spend
Company growth estimation

Examples – Classification Models

Following are four different algorithms whose examples are listed below: Naive …
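
The excerpt is cut off before any code, but the "Sales vs. ad spend" regression example lends itself to a quick illustration. The following is a minimal sketch, assuming a base-R lm() fit on a small made-up data set (ad_spend, sales and marketing are hypothetical names, not from the article). The data are deliberately noise-free so the fitted slope and intercept are easy to check by eye.

```r
# Hypothetical toy data: ad spend (in $1000s) and resulting sales (in units).
# The numbers are invented for illustration: sales = 49 + 3 * ad_spend exactly.
ad_spend <- c(1, 2, 3, 4, 5, 6)
sales    <- c(52, 55, 58, 61, 64, 67)

marketing <- data.frame(ad_spend = ad_spend, sales = sales)

# Fit a simple linear regression: sales as a function of ad spend
fit <- lm(sales ~ ad_spend, data = marketing)

# Intercept and slope (slope = extra sales per extra unit of ad spend)
print(coef(fit))

# Predict sales for a new, hypothetical ad budget of 8
predict(fit, newdata = data.frame(ad_spend = 8))
```

With real data the points would scatter around the line, and summary(fit) would then be the place to look at standard errors and R-squared.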

Posted in Big Data.

Cheat Sheet – 10 Machine Learning Algorithms & R Commands

This article lists 10 popular machine learning algorithms and the related R commands (& package information) that could be used to create the respective models. The objective is to provide a quick reference page for beginner/intermediate-level R programmers who are working on machine learning problems. Please feel free to comment/suggest if I missed mentioning one or more important points. Also, sorry for the typos. Following are the different ML algorithms included in this article: Linear Regression, Logistic Regression, K-Means Clustering, K-Nearest Neighbors (KNN) Classification, Naive Bayes Classification, Decision Trees, Support Vector Machine (SVM), Artificial Neural Network (ANN), Apriori, and AdaBoost.

Cheat Sheet – ML Algorithms & R Commands

Linear regression: …
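
The cheat-sheet entries themselves are truncated here. To illustrate the pattern for one listed algorithm, logistic regression, here is a minimal base-R sketch using glm() with family = binomial on the built-in mtcars data. The data set, the am ~ wt formula and the 0.5 cut-off are my own assumptions and are not taken from the article.

```r
# Logistic regression in base R: glm() with family = binomial.
# mtcars ships with R; am is 0 (automatic) or 1 (manual), wt is weight in 1000 lbs.
data(mtcars)

logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)
summary(logit_fit)

# Predicted probability that each car has a manual transmission
p_manual <- predict(logit_fit, type = "response")

# Turn probabilities into class labels using a 0.5 cut-off (an arbitrary choice)
pred_class <- ifelse(p_manual > 0.5, 1, 0)

# Confusion table: predicted vs. actual transmission type
table(predicted = pred_class, actual = mtcars$am)
```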

Posted in Big Data.

Data Science – Examine Data Spread using Histogram and Density Plot

This article presents code samples in the R programming language which could be used to draw a histogram and a density plot. Note that these plots are very useful for examining the data spread. Please feel free to comment/suggest if I missed mentioning one or more important points. Also, sorry for the typos.

Code Sample – Draw Histogram and Density Plot

Histograms and density plots are very useful for examining the spread of a data variable. The following R commands, using the ggplot2 package, help in drawing histograms and density plots. As I am explaining with the ggplot2 package, I am using the diamonds data which comes with it. Pay attention to some of the …
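
As a dependency-free alternative to the ggplot2 approach described above, base R's hist() and density() give the same two views of a variable's spread. This sketch assumes the built-in faithful data set rather than diamonds, so it runs without extra packages. Calling hist() with plot = FALSE returns the bin counts so they can be inspected numerically before drawing anything.

```r
# Base-R sketch; the article itself uses ggplot2 and the diamonds data.
# faithful ships with R: eruptions is the eruption duration in minutes.
x <- faithful$eruptions

# Histogram: bin the values and count how many fall into each bin.
# plot = FALSE returns the bin midpoints and counts instead of drawing.
h <- hist(x, breaks = 20, plot = FALSE)
print(data.frame(mid = h$mids, count = h$counts))

# Kernel density estimate: a smoothed view of the same distribution
d <- density(x)

# Draw both on one chart; freq = FALSE puts the histogram on the density scale
hist(x, breaks = 20, freq = FALSE, main = "Eruption durations", xlab = "Minutes")
lines(d, lwd = 2)
```

Here the two peaks in both views show that the eruption durations are bimodal, which a single summary statistic such as the mean would hide.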

Posted in Big Data.

Data Science – Top 5 Videos to Get Started with Neural Networks

This article lists some good YouTube videos that I found useful to get started with understanding how the brain works and what neural networks are. Note that I needed to do this as I wanted to get started with machine learning and neural network algorithms. In order to do that effectively, I needed to understand what neural networks are, and the videos below helped me get started within an hour. Please feel free to suggest other great videos which I may have missed. Sorry for the typos.

From Neurons to Networks

I would rate it as one of the best videos I saw on how the human brain works. MUST watch!!! …

Posted in Big Data.

Data Science – 3 Key Aspects of Applying KMeans Algorithm for Clustering Tasks

This article presents key concepts around the KMeans algorithm, including key aspects and the formula/R command to use when you are working on clustering tasks. Please feel free to comment/suggest if I missed mentioning one or more important points. Also, sorry for the typos. Following are the key points described later in this article: key aspects of applying the KMeans algorithm, and the KMeans algorithm R command.

Key aspects of applying KMeans Algorithm

Key aspects of applying the KMeans algorithm are the following:

Selecting the right combination of features: In the data set, you may observe some of the following: There are one or more features having non-numeric or character data. As …
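
The R command part of the article is truncated. A minimal sketch of the points above, assuming the built-in iris data: drop the non-numeric column, standardize the remaining features with scale(), then call kmeans(). The scaling step is my own assumption (KMeans relies on Euclidean distances, so features on larger scales would otherwise dominate), and centers = 3, nstart = 25 and the seed are illustrative choices.

```r
# KMeans needs numeric features, so keep only the numeric columns of iris
# (this drops the character-like Species factor)
features <- iris[, sapply(iris, is.numeric)]

# Assumed extra step: z-score the features so no single one dominates the distances
features_scaled <- scale(features)

# KMeans starts from random centers; fix the seed so results are reproducible,
# and use several random starts (nstart) to reduce the risk of a poor local optimum
set.seed(42)
km <- kmeans(features_scaled, centers = 3, nstart = 25)

km$size                                             # points per cluster
table(cluster = km$cluster, species = iris$Species) # compare with the known labels
```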

Posted in Big Data.

Data Science – R Packages & Methods for naive Bayes Classification

This article presents different R packages and related methods which could be used to create a naive Bayes classifier. Please feel free to comment/suggest if I missed mentioning one or more important points. Also, sorry for the typos. Following are the key packages described later in this article: tm, wordcloud, e1071, and gmodels.

Following is a list of R packages that could be used for naive Bayes classification:

tm package: Originally created by Ingo Feinerer as a dissertation project at the Vienna University of Economics and Business, the tm package is a very popular package that provides a framework for text mining applications within R. More about the tm package could be …

Posted in Big Data.

Learn R – How to Fix read.table Command Reading Fewer Rows

This article presents the problem of read.table reading fewer (or incorrect) rows when reading a text file having multiple columns, and the solution to the same. This is going to be a short blog. But since it solved a problem on which I spent some time, I chose to write about it. Please feel free to comment/suggest if I missed mentioning one or more important points. Also, sorry for the typos.

Problem Statement: Reading Fewer Lines with read.table Command

I have been learning naive Bayes classification. I downloaded this SMS collection data. I went ahead and tried to load …
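
The excerpt stops before the fix. One common cause of read.table() returning fewer rows than the file has lines is its default settings quote = "\"'" and comment.char = "#": an apostrophe in a message such as "I'm" can open a quoted field that swallows the following lines, and a "#" drops the rest of its line. The sketch below uses a small hypothetical file rather than the SMS data and reads it with quoting and comments turned off; it is not necessarily the exact fix the article arrives at.

```r
# Hypothetical three-line, tab-separated file in the style of the SMS data.
# The apostrophes and the "#" are what can trip up read.table's defaults.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "ham\tI'm on my way home",
  "spam\tWin cash now #1 prize",
  "ham\tDon't forget the milk"
), tmp)

# quote = "" disables quote handling; comment.char = "" disables comment stripping,
# so every line of the file becomes exactly one row
messages <- read.table(tmp, sep = "\t", quote = "", comment.char = "",
                       col.names = c("type", "text"),
                       stringsAsFactors = FALSE)

nrow(messages)   # one row per line in the file
messages$text
```

A quick way to spot the problem on a real file is to compare nrow() of the result with length(readLines(file)).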

Posted in Big Data.

Data Science – Data Cleaning R Commands for Text Classification Problems

This article presents concepts and the related R command set used to clean text in order to make it ready for text classification. The R command set belongs to the tm package. Please feel free to comment/suggest if I missed mentioning one or more important points. Also, sorry for the typos. Let's load a set of messages along with their appropriate classification using the following command:

messages <- read.table(file.choose(), sep="\t", stringsAsFactors=FALSE)

The messages data frame could have two features, such as type and text, where each piece of text is associated with an appropriate type. Once done, let's go ahead and create a Corpus object out of all the message text. …
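
The tm-based cleaning commands are truncated in this excerpt. To show what such transformations do to a message, here is a base-R approximation of typical cleaning steps (lowercasing, removing numbers and punctuation, dropping stop words, collapsing whitespace) built on tolower() and gsub(). It deliberately does not use tm or its Corpus objects; clean_text(), the sample messages and the tiny stop-word list are all hypothetical.

```r
# Hypothetical helper approximating common tm cleaning steps in base R
clean_text <- function(x, stop_words = character(0)) {
  x <- tolower(x)                         # lowercase everything
  x <- gsub("[[:digit:]]+", "", x)        # remove numbers
  x <- gsub("[[:punct:]]+", "", x)        # remove punctuation
  tokens <- strsplit(x, "[[:space:]]+")   # split on runs of whitespace
  vapply(tokens, function(t) {
    t <- t[nzchar(t) & !(t %in% stop_words)]  # drop empty tokens and stop words
    paste(t, collapse = " ")                  # re-join with single spaces
  }, character(1))
}

# Hypothetical messages and a tiny, hypothetical stop-word list
raw <- c("WIN 1000 cash NOW!!!", "Call me at 5pm, ok?")
clean_text(raw, stop_words = c("me", "at"))
```

In tm, the corresponding steps would be applied to the whole Corpus, and a full stop-word list would normally replace the two-word list used here.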

Posted in Big Data.

Data Science – Common Exploratory R Commands for Classification Problems

This article presents common exploratory R commands that could be used during the data preparation stage when solving classification problems. I found them being used when I was going through the KNN and naive Bayes algorithms. I know that there may be more to the list below. I would love to hear those additional commands from you. Please feel free to comment/suggest if I missed mentioning one or more important points. Also, sorry for the typos.

In the set of commands listed below, a data frame, messages_text, is used, which is a set of text data loaded using the read.table command, such as the following:

messages_text <- read.table(file.choose(), sep="\t", …
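
The command list itself is cut off. Commands of the following kind are commonly used at this stage, though I cannot confirm they match the article's list exactly. The sketch uses a tiny hypothetical messages_text data frame in place of the file loaded with read.table.

```r
# Hypothetical stand-in for messages_text (the article loads it with read.table)
messages_text <- data.frame(
  type = c("ham", "spam", "ham", "ham"),
  text = c("See you soon", "Win a prize now", "Running late", "Call me later"),
  stringsAsFactors = FALSE
)

str(messages_text)   # column types and a preview of the values

# Class labels are usually converted to a factor before modelling
messages_text$type <- factor(messages_text$type)

table(messages_text$type)               # number of messages per class
prop.table(table(messages_text$type))   # share of each class
summary(nchar(messages_text$text))      # spread of message lengths
```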

Posted in Big Data.

Data Science – List of Key Machine Learning Algorithms

This article presents a list of key machine learning algorithms which are widely used by data scientists while doing data analysis. Please feel free to comment/suggest if I missed mentioning one or more important points. Also, sorry for the typos. The list of machine learning algorithms presented below covers some of the most important and widely used algorithms, which could set the stage for one to get started with data science/analytics and create models for predictions. These machine learning algorithms fall under two high-level classes: supervised learning and unsupervised learning. Following are some of the key tasks that are performed by machine learning algorithms …

Posted in Big Data.

Learn R – How to Add New Column to Data Frame

This article presents concepts and code samples on how to add new columns to a data frame using the R programming language. Please feel free to comment/suggest if I missed mentioning one or more important points. Also, sorry for the typos. Let's create a student data frame. Following is the code:

# Create non-empty data frame with column names
# Assign names to x
x <- c("Calvin", "Chris", "Raj")
# Assign ages to y
y <- c(10, 25, 19)
# Create a non-empty data frame with column names
# Assign x to "First Name" as column name
# Assign y to "Age" as column name
student <- data.frame( …
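
The excerpt stops inside the data.frame() call. The following self-contained sketch shows common ways to add a column, repeating the x and y vectors from above so it runs on its own. The check.names = FALSE argument is my addition, so that "First Name" keeps its space; the Section, Roll No, IsTeen and Score columns and their values are hypothetical.

```r
x <- c("Calvin", "Chris", "Raj")
y <- c(10, 25, 19)

# check.names = FALSE keeps the space in "First Name" (otherwise it becomes First.Name)
student <- data.frame("First Name" = x, "Age" = y,
                      check.names = FALSE, stringsAsFactors = FALSE)

# 1. Assign a vector to a new column name with $
student$Section <- c("A", "B", "A")

# 2. Same idea with [[ ]], handy when the name has spaces or is stored in a variable
student[["Roll No"]] <- c(101, 102, 103)

# 3. Derive a new column from an existing one
student$IsTeen <- student$Age >= 13 & student$Age <= 19

# 4. cbind() returns a new data frame with the extra column appended
student <- cbind(student, Score = c(88, 92, 75))

print(student)
```

Options 1 and 2 modify the data frame in place, while cbind() builds a new one, which is why its result is assigned back to student.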

Posted in Big Data.

Learn R – How to Convert Columns from Character to Factor

This article presents different ways in which one or more columns in a data frame could be converted to factor when working with the R programming language. Please feel free to comment/suggest if I missed mentioning one or more important points. Also, sorry for the typos. Following are the key points described later in this article: converting a single column to factor, and converting multiple columns to factor. The following data frame, df, is used in the code samples below: In the above data frame, both diagnosis and param_d are character vectors. One could quickly check the classes of all columns using the following command:

Convert Single Column to Factor

The following code samples demonstrate …
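
The data frame and commands are missing from this excerpt. A minimal sketch, assuming a hypothetical df whose diagnosis and param_d columns are character vectors, plus a numeric radius column for contrast. Here sapply(df, class) is one way to check the column classes; I cannot confirm it is the command the article uses.

```r
# Hypothetical df with two character columns, mirroring diagnosis and param_d
df <- data.frame(
  diagnosis = c("B", "M", "B", "M"),
  param_d   = c("low", "high", "high", "low"),
  radius    = c(12.1, 17.9, 13.4, 20.3),
  stringsAsFactors = FALSE
)

sapply(df, class)   # check the class of every column

# Convert a single column
df$diagnosis <- as.factor(df$diagnosis)

# Convert several columns at once: apply factor() to each selected column
cols <- c("diagnosis", "param_d")
df[cols] <- lapply(df[cols], factor)

sapply(df, class)
levels(df$param_d)  # levels are sorted alphabetically by default
```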

Posted in Big Data.

Data Science – How to Scale or Normalize Numeric Data using R

This article presents concepts around the need to normalize or scale numeric data, and code samples in the R programming language which could be used to normalize or scale the data. Please feel free to comment/suggest if I missed mentioning one or more important points. Also, sorry for the typos. Following are the key points described later in this article: why normalize or scale the data, and the two different ways which could be used to normalize it, Min-Max Normalization and Z-Score Standardization.

Why Normalize or Scale the data?

There can be instances in a data frame where values for one feature range between 1-100 and values for another feature could …
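
Both techniques come down to one-line formulas. Min-max normalization maps x to (x - min(x)) / (max(x) - min(x)), giving values in [0, 1]. Z-score standardization maps x to (x - mean(x)) / sd(x), giving mean 0 and standard deviation 1. A minimal base-R sketch with two hypothetical features on very different scales:

```r
# Min-max normalization: rescale to [0, 1] via (x - min) / (max - min)
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Z-score standardization: (x - mean) / sd, giving mean 0 and sd 1
standardize <- function(x) {
  (x - mean(x)) / sd(x)
}

# Hypothetical features on very different scales
df <- data.frame(
  age    = c(25, 32, 47, 51, 62),
  income = c(30000, 45000, 80000, 62000, 120000)
)

df_minmax <- as.data.frame(lapply(df, normalize))
df_z      <- as.data.frame(lapply(df, standardize))

# Base R's scale() performs the same z-score transformation (returns a matrix)
df_scaled <- scale(df)

summary(df_minmax)
```

Min-max keeps every feature in a fixed range but is sensitive to outliers, since a single extreme value sets max(x). Z-scores have no fixed range but are less distorted by one extreme point.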

Posted in AI, Big Data, Data Science.

Learn R – How to Append Rows to Data Frame

This article presents concepts and code samples on how to append rows to a data frame when working with the R programming language. Please feel free to comment/suggest if I missed mentioning one or more important points. Also, sorry for the typos. Following are the key points described later in this article: how to append one or more rows to an empty data frame, and how to append one or more rows to a non-empty data frame. For illustration purposes, we shall use a student data frame having the following information:

How to Append One or More Rows to an Empty Data Frame

The following code shows how to create an empty data frame and …
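
The code itself is truncated. A minimal sketch of both cases with rbind(), assuming the student columns are Name and Age (the article's exact columns are not shown here; "Dana" is a hypothetical extra student).

```r
# Empty data frame with the right column names and types
students <- data.frame(Name = character(0), Age = numeric(0),
                       stringsAsFactors = FALSE)

# Append a single row: rbind() matches data frame columns by name
students <- rbind(students,
                  data.frame(Name = "Calvin", Age = 10, stringsAsFactors = FALSE))

# Append several rows at once to the now non-empty data frame
more <- data.frame(Name = c("Chris", "Raj"), Age = c(25, 19),
                   stringsAsFactors = FALSE)
students <- rbind(students, more)

# Hypothetical extra student, appended the same way
students <- rbind(students,
                  data.frame(Name = "Dana", Age = 21, stringsAsFactors = FALSE))

print(students)
nrow(students)
```

Calling rbind() inside a long loop copies the data frame each time; for many rows it is usually faster to collect them in a list and combine once with do.call(rbind, ...).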

Posted in Big Data.

Learn R – How to Create Data Frame with Column Names

This article presents code in the R programming language which could be used to create a data frame with column names. Please feel free to comment/suggest if I missed mentioning one or more important points. Also, sorry for the typos. Following are the key points described later in this article: creating an empty data frame with column names, and creating a non-empty data frame with column names.

Create an Empty Data Frame with Column Names

Following is the code sample:

Following gets printed:

Create a Non-empty Data Frame with Column Names

Following is the code sample:

Following gets printed. Note the column names such as "First Name" and "Age".
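
The code samples for this post did not survive in this excerpt. A minimal sketch of both cases, assuming the "First Name" and "Age" columns mentioned above. The check.names = FALSE argument is needed because data.frame() would otherwise convert "First Name" into First.Name.

```r
# Empty data frame with column names: zero-length vectors fix names and types
empty_df <- data.frame("First Name" = character(0), "Age" = numeric(0),
                       check.names = FALSE, stringsAsFactors = FALSE)
str(empty_df)

# Non-empty data frame with column names
student <- data.frame("First Name" = c("Calvin", "Chris", "Raj"),
                      "Age" = c(10, 25, 19),
                      check.names = FALSE, stringsAsFactors = FALSE)
print(student)

# The column names can be inspected (or reassigned) with names()
names(student)
```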

Posted in Big Data.