Category Archives: Big Data
Data Science – 8 Steps to Multiple Regression Analysis
This article presents a list of steps, with related details, to follow when doing multiple regression analysis. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. Following are the key points described later in this article: 8 Steps to Multiple Regression Analysis; Techniques used in Multiple Regression Analysis. 8 Steps to Multiple Regression Analysis: Following is a list of 8 steps that could be used to perform multiple regression analysis: Identify a list of potential variables/features, both independent (predictor) and dependent (response); gather data on the variables; check the relationship between each predictor variable …
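The first steps listed above can be sketched in R roughly as follows. This is only a minimal illustration, not the article's own code: the built-in `mtcars` data set, the response `mpg`, and the predictors `wt`, `hp` and `disp` are all assumptions chosen for the example, and the correlation check is just one reading of the truncated third step.

```r
# Sketch only: mtcars, mpg and the three predictors are illustrative assumptions.
data(mtcars)

# Steps 1-2: pick candidate predictors and a response from data already gathered.
predictors <- c("wt", "hp", "disp")

# Step 3 (one interpretation): correlation of each predictor with the response.
cor_with_response <- sapply(predictors, function(p) cor(mtcars[[p]], mtcars$mpg))
print(round(cor_with_response, 2))

# Fit a multiple regression model with all candidate predictors.
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Inspect coefficients, standard errors and R-squared.
print(summary(fit))
```

Predictors that show only a weak relationship with the response are candidates for removal before refitting.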
Big Data – Top Education Resources from MIT
This article presents information on the Big Data initiative from MIT (Massachusetts Institute of Technology), including bookmarks for lecture notes from machine learning courses and MIT's machine learning video channel on YouTube. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. Following are the key points described later in this article: MIT CSAIL Big Data Initiative; Machine Learning Lecture Notes & Videos. MIT CSAIL Big Data Initiative: MIT has a website dedicated to the Big Data initiative from MIT CSAIL (Computer Science and Artificial Intelligence Laboratory). The following pages are worth visiting to understand ongoing research and listen to/view talks …
Weekly Roundup – Machine Learning & Statistics Bookmarks – 02 Feb 2015
This article presents links to some cool pages on machine learning & statistics that I thought were worth sharing. Please feel free to comment/suggest any other web pages that you have found to be good. Sorry for the typos. Machine Learning & Statistics Bookmarks: Andrew Ng: Anyone starting to learn machine learning is sure to come across a course, paper, or web page related to Andrew Ng, an Associate Professor at Stanford, Chief Scientist of Baidu, and Chairman and Co-Founder of Coursera. Some of the pages citing his work are the following: Courses, Publications, Research. Andrew W. Moore: Great set of tutorials by Andrew W. Moore, who is Dean of the School of Computer …
Machine Learning – 9 Most Common Usecases for Higher Business Growth
This article presents some of the most common use cases of machine learning algorithms which have been found to impact business growth (in terms of revenue) in a positive manner. These use cases are most commonly seen in businesses that run some form of ecommerce site to support one or more aspects of their business. I have tried to provide information on which algorithm (or class of algorithms) could be used to come up with a solution for each of these use cases. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. Following are different areas, at …
Top 4 Machine Learning Usecases for Energy Forecasting
This article presents the top 4 machine learning use cases for energy forecasting. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. Machine Learning Usecases for Energy Forecasting: Following are different use cases related to energy management where machine learning could be used for probabilistic energy forecasting. For those who are new to probabilistic forecasting, here is the definition from Wikipedia: Probabilistic forecasting summarises what is known, or opinions about, future events. In contrast to single-valued forecasts (such as forecasting that the maximum temperature at a given site on a given day will be 23 degrees Celsius or that the result …
Big Data – Free Hadoop Online Training Course from MapR
This article presents quick information on the free, on-demand online Hadoop training that was announced yesterday by MapR Technologies, the Hadoop distribution specialist. I took the Hadoop Essentials course and I must say that I liked the training session. The downside of these training sessions is that you soon run into MapR-specific technologies in relation to MapReduce, HBase and HDFS. That said, it's worth giving it a shot. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. Training Courses for Hadoop Developer, Hadoop Administrator & Data Analyst: The training includes topics related to a range of Hadoop technologies for …
Machine Learning Usecases for Pinterest.com & related Kosei Acquisition
This article presents thoughts on the recent acquisition of Kosei, a commerce recommendation system, by Pinterest.com. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. Following are the key points described later in this article: How could Machine Learning help Pinterest fuel its overall growth? How could Kosei help Pinterest.com? How could Machine Learning help Pinterest fuel its overall growth? In yet another acquisition in the machine learning space, Pinterest.com acquires Kosei to achieve some of the following objectives: Better ad targeting for greater monetization from ad clicks. This looks to be a case of identifying user clusters based …
Data Science – List of Common Machine Learning Problems with Examples
This article presents quick examples for 5 different classes of machine learning problems/tasks. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. Following is a set of 5 key machine learning problems/tasks whose examples are listed later in this article: Regression; Classification; Clustering; Association Rules; Artificial Neural Networks. Examples – Regression Models: Real Estate – Housing price estimation; Financial – Stock price estimation; Insurance – Estimate medical care expenses; Sales & Marketing – Sales vs Ad spend; Company growth estimation. Examples – Classification Models: Following are four different algorithms whose examples are listed below: Naive …
Cheat Sheet – 10 Machine Learning Algorithms & R Commands
This article lists 10 popular machine learning algorithms and the related R commands (& package information) that could be used to create the respective models. The objective is to provide a quick reference page for beginner/intermediate level R programmers who are working on machine learning related problems. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. Following are the different ML algorithms included in this article: Linear Regression; Logistic Regression; K-Means Clustering; K-Nearest Neighbors (KNN) Classification; Naive Bayes Classification; Decision Trees; Support Vector Machine (SVM); Artificial Neural Network (ANN); Apriori; AdaBoost. Cheat Sheet – ML Algorithms & R Commands: Linear regression: …
Data Science – Top 5 Videos to Get Started with Neural Networks
This article presents some good YouTube videos that I found useful for getting started with understanding how the brain works and what neural networks are. Note that I needed to do this as I wanted to get started with machine learning and neural network algorithms. In order to do that effectively, I needed to understand what neural networks are, and the videos below helped me get started within an hour. Please feel free to suggest other great videos which I may have missed. Sorry for the typos. From Neurons to Networks: I would rate it as one of the best videos I have seen on how the human brain works. MUST watch!!! …
Data Science – 3 Key Aspects of Applying KMeans Algorithm for Clustering Tasks
This article presents key concepts around the KMeans algorithm, including key aspects and the formula/R command to use when you are working on clustering tasks. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. Following are the key points described later in this article: Key aspects of applying KMeans algorithm; KMeans Algorithm – R Command. Key aspects of applying KMeans Algorithm: The key aspects of applying the KMeans algorithm are the following: Selecting the right combination of features: In the data set, you may observe some of the following: There are one or more features having non-numeric or character data. As …
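As a rough illustration of the feature selection point above, here is a minimal sketch of a KMeans run in R using the base `kmeans` function. The built-in `iris` data set, the choice of 3 clusters, and the scaling step are my assumptions for the example and are not taken from the article; dropping the non-numeric column is just one simple way to deal with character features.

```r
# Sketch only: iris, centers = 3 and the scaling step are illustrative assumptions.
data(iris)

# Keep only numeric features; kmeans cannot work with factor/character columns.
numeric_features <- iris[, sapply(iris, is.numeric)]

# Put the features on a comparable scale so no single feature dominates distances.
scaled <- scale(numeric_features)

# kmeans starts from random centers, so fix the seed for repeatable results.
set.seed(42)
clusters <- kmeans(scaled, centers = 3, nstart = 10)

# Cluster sizes, and how the clusters line up with the known species labels.
print(clusters$size)
print(table(clusters$cluster, iris$Species))
```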
Data Science – R Packages & Methods for naive Bayes Classification
This article presents different R packages and related methods which could be used to create a naive Bayes classifier. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. Following are the key packages described later in this article: tm; wordcloud; e1071; gmodels. Following is a list of R packages that could be used for naive Bayes classification: tm package: Originally created by Ingo Feinerer as a dissertation project at the Vienna University of Economics and Business, the tm package is a very popular package that provides a framework for text mining applications within R. More about tm package could be …
Learn R – How to Fix Read.Table Command Reading Lesser Rows
This article presents the problem of read.table reading fewer lines or rows than expected from a text file having multiple columns, and the solution to it. This is going to be a shorter blog. But since it solved a problem on which I spent some time, I chose to write about it. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. Problem Statement: Reading Fewer Lines with read.table Command: I have been learning naive Bayes classification. I downloaded this SMS collection data. I went ahead and tried to load …
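The excerpt above stops before the fix, so the following is only a sketch of one common cause of this symptom and not necessarily the article's solution. By default `read.table` treats both `"` and `'` as quote characters and `#` as a comment character, so free text containing apostrophes can cause several lines to be merged. The three-line sample file below is hypothetical.

```r
# Hypothetical three-line, tab-separated sample standing in for the SMS data.
path <- tempfile(fileext = ".txt")
writeLines(c("ham\tI don't know",
             "spam\tWin a prize now",
             "ham\tIt's fine"), path)

# Default settings: apostrophes are treated as quote characters, so
# lines can be merged; guard the call in case it fails outright.
default_read <- tryCatch(
  suppressWarnings(read.table(path, sep = "\t", stringsAsFactors = FALSE)),
  error = function(e) NULL
)
rows_default <- if (is.null(default_read)) NA else nrow(default_read)

# Quoting and comment handling disabled: one row per line of the file.
messages <- read.table(path, sep = "\t", quote = "", comment.char = "",
                       stringsAsFactors = FALSE)

cat("lines in file:", length(readLines(path)), "\n")
cat("rows with defaults:", rows_default, "\n")
cat("rows with quote = \"\":", nrow(messages), "\n")
```

Comparing `nrow()` against `length(readLines(path))` is a quick way to confirm that every line made it into the data frame.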
Data Science – Data Cleaning R Commands for Text Classification Problems
This article presents concepts and the related R commands used to clean text in order to make it ready for text classification. The R commands belong to the tm package. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. Let's load a set of messages along with their classification using the following command: messages <- read.table(file.choose(), sep="\t", stringsAsFactors=FALSE) The messages data frame could have two features, such as type and text, where each piece of text is associated with an appropriate type. Once done, let's go ahead and create a Corpus object out of all the message text. …
Data Science – Common Exploratory R Commands for Classification Problems
This article presents common exploratory R commands that could be used during the data preparation stage when solving classification problems. I found them being used while going through the KNN and naive Bayes algorithms. I know that there may be more to the list below. I would love to hear those additional commands from you. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. In the set of commands listed below, a data frame, messages_text, is used, which is a set of text data loaded using the read.table command as follows: messages_text <- read.table(file.choose(), sep="\t", …
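The article's own command list is cut off above, so here is a small sketch of the kind of exploratory commands that are typically useful at this stage. The inline `messages_text` data frame and its `type` and `text` columns are hypothetical stand-ins for the file loaded with `read.table`.

```r
# Hypothetical stand-in for the data frame loaded with read.table.
messages_text <- data.frame(
  type = c("ham", "spam", "ham", "ham"),
  text = c("See you at lunch", "Win a free prize now",
           "Call me later", "On my way"),
  stringsAsFactors = FALSE
)

# Structure of the data frame: column names, types and first values.
str(messages_text)

# Treat the class label as a factor, as classifiers usually expect.
messages_text$type <- factor(messages_text$type)

# Class counts and class proportions (in percent).
type_counts <- table(messages_text$type)
print(type_counts)
print(round(prop.table(type_counts) * 100, 1))

# Distribution of message lengths in characters.
print(summary(nchar(messages_text$text)))
```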
Data Science – List of Key Machine Learning Algorithms
This article presents a list of key machine learning algorithms which are most widely used by data scientists when doing data analysis. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. The list of machine learning algorithms presented below covers some of the most important and widely used algorithms, which could set the stage for one to get started with data science/analytics and create models for predictions. Following are the two high-level classifications under which these machine learning algorithms fall: Supervised learning; Unsupervised learning. Following are some of the key tasks that are performed by machine learning algorithms …