# Tag Archives: Data Science

## Machine Learning – When to Use Linear vs Gaussian Kernel with SVM

This article presents guidelines for deciding whether to use a linear kernel or a Gaussian kernel when working with a Support Vector Machine (SVM). Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the key points described later in this article: when to use a linear kernel, and when to use a Gaussian kernel.

When to Use Linear Kernel: When there is a large number of features and a comparatively small number of training examples, one would want to use a linear kernel. As a matter of fact, an SVM with a linear kernel can also be called an SVM with no kernel. One may …
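
To make the distinction concrete, here is a minimal pure-Python sketch of the two kernel functions; it is an illustration, not code from the article. The linear kernel is simply the dot product of two feature vectors, which is why an SVM using it is sometimes described as having "no kernel", while the Gaussian (RBF) kernel scores similarity as exp(-||x - z||² / (2σ²)). The function names and the bandwidth parameter `sigma` are illustrative assumptions.

```python
import math

def linear_kernel(x, z):
    # Linear kernel: the plain dot product of the two feature vectors,
    # i.e. no implicit mapping into a higher-dimensional space.
    return sum(a * b for a, b in zip(x, z))

def gaussian_kernel(x, z, sigma=1.0):
    # Gaussian (RBF) kernel: exp(-||x - z||^2 / (2 * sigma^2)).
    # sigma is the bandwidth; the name and default value are illustrative.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

x, z = [1.0, 2.0], [2.0, 0.0]
print(linear_kernel(x, z))    # 1*2 + 2*0 = 2.0
print(gaussian_kernel(x, x))  # identical points give the maximum similarity, 1.0
print(gaussian_kernel(x, z))  # squared distance 5, so exp(-2.5)
```

In scikit-learn, these two choices correspond to `SVC(kernel="linear")` and `SVC(kernel="rbf", gamma=...)`, where scikit-learn's `gamma` plays the role of 1 / (2σ²).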

Posted in Big Data.

## 8 Key Steps to Follow When Solving A Machine Learning Problem

This article presents some of the key steps one could take to create the most effective model for a given machine learning problem, using different machine learning algorithms. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos.

8 Key Steps for Solving a Machine Learning Problem: Gather the data set. This is one of the most important steps, where the objective is to gather as large a data set as possible. Given that features have been selected appropriately, a large data set helps to minimize the training data set error and also enables cross-validation and training data set error …
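
The excerpt mentions cross-validation, which a large data set makes practical. As a minimal sketch (not the author's code), here is one way to split a data set into k folds so that every example is used for validation exactly once; the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation.

    Examples are assigned to folds in their existing order, so shuffle the
    data first if it has any meaningful ordering.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

for train, val in k_fold_indices(5, 2):
    print(train, val)
# [3, 4] [0, 1, 2]
# [0, 1, 2] [3, 4]
```

A model would be trained on each `train` split and scored on the matching `validation` split, and the k scores averaged.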

Posted in Big Data.

## Data Science – 8 Steps to Multiple Regression Analysis

This article presents a list of steps and related details that one would want to follow when doing multiple regression analysis. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the key points described later in this article: 8 steps to multiple regression analysis, and techniques used in multiple regression analysis.

8 Steps to Multiple Regression Analysis: Following are the steps that could be used to perform multiple regression analysis. Identify a list of potential variables/features, both independent (predictor) and dependent (response). Gather data on the variables. Check the relationship between each predictor variable …
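
For readers who want to see what "fitting" a multiple regression model means numerically, here is a minimal pure-Python sketch that estimates the coefficients by ordinary least squares, solving the normal equations with Gaussian elimination. This is an illustration, not the article's procedure; in practice a statistics library (for example R's `lm()` function) would be used, and the function name `fit_multiple_regression` is hypothetical.

```python
def fit_multiple_regression(X, y):
    """Fit y ~ b0 + b1*x1 + ... + bp*xp by ordinary least squares.

    Solves the normal equations (A^T A) b = A^T y, where A is X with a
    leading column of ones for the intercept. Returns [b0, b1, ..., bp].
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Build the normal-equation system: AtA is p x p, Aty has length p.
    AtA = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    Aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    M = [AtA[i] + [Aty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(M[row][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("Predictors are linearly dependent (singular system)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution, from the last coefficient to the first.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
print([round(b, 6) for b in fit_multiple_regression(X, y)])  # [1.0, 2.0, 3.0]
```

The singularity check matters for the "check the relationship between predictors" step: perfectly collinear predictors make the normal equations unsolvable.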

Posted in Big Data.

## Learn R – How to Extract Rows & Columns from Data Frame

This article presents a set of commands in the R programming language that could be used to extract rows and columns from a given data frame. When working on data analytics or data science projects, these commands come in very handy for data cleaning activities. This article is meant for beginners getting started with R who want to see examples of extracting information from a data frame. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the key points described later in this article: commands to extract rows and columns, command to extract a column as a data frame, command …

Posted in Big Data.

## Machine Learning – How to Predict Software Developers Productivity

This article presents my thoughts on how machine learning techniques could be used to solve one of the most popular problems in the software industry, namely whether a software developer is productive or not. Based on all the effort I have made to solve this problem using traditional (rules-based) programming techniques, I can say that there is no definitive way of finding a concrete solution. As a matter of fact, I created a tool, AgileSQM, to capture software quality metrics (SQM) such as code coverage, duplication, and complexity, and to infer from the trending data whether a software developer is productive. However, I soon hit a roadblock in terms of acceptance …

Posted in Big Data.

## Data Science – Examples of Machine Learning Problems

This article presents different classes of machine learning problems, along with some examples taken from real-world problems. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the different categories, which cover 80% of machine learning problems: classification, clustering, and regression.

Machine Learning – Classification Problems: Simply speaking, if the answer to a problem consists of discrete values such as the following, the problem can be termed a classification problem: yes or no (e.g., 1 or 0), or a finite set of values, representing multi-class classification problems. “Logistic Regression” is one of the algorithms commonly used for such problems. Mathematically speaking, if “h(x)” …
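
As an illustration of how a yes/no classification answer can be produced, here is a minimal sketch of the logistic-regression hypothesis h(x) = g(θ·x), where g is the sigmoid function and the probability is thresholded at 0.5. This is a standard textbook formulation, not a reconstruction of the article's truncated definition of "h(x)"; the parameter values and function names are hypothetical.

```python
import math

def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + e^(-z)), mapping any real z into (0, 1).
    # Split by sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def h(theta, x):
    # Hypothesis h(x) = g(theta . x); x[0] is assumed to be 1 (the bias term).
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def predict(theta, x, threshold=0.5):
    # Turn the estimated probability into a discrete label: 1 (yes) or 0 (no).
    return 1 if h(theta, x) >= threshold else 0

theta = [-3.0, 1.0]                 # hypothetical parameters: boundary at x1 = 3
print(predict(theta, [1.0, 5.0]))   # theta . x = 2, h is about 0.88, so 1
print(predict(theta, [1.0, 1.0]))   # theta . x = -2, h is about 0.12, so 0
```

Multi-class problems are often handled by training one such classifier per class (one-vs-rest) and picking the class with the highest h(x).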

Posted in Big Data.

## Top 7 Data Science Subreddits to Follow

This article lists the top subreddits related to data science on reddit.com that data science aspirants or professionals could follow on a regular basis for news, stories, and discussions. Generally, I find reddit.com very useful for staying in touch with the latest interesting stories and keeping myself up to date. For those unaware of what a subreddit is: simply speaking, a subreddit is a topic-based group on reddit.com made up of users who want to publish or discuss news and stories related to that topic. For data science, there are multiple groups, each focused on a single topic, such as those mentioned below. Please feel free to comment or suggest if I missed one or more important …

Posted in Big Data.

## Data Science – Quick Start Guide for Machine Learning

This article presents very high-level information on different aspects of machine learning, with the objective of providing a quick-start guide for data science beginners. One could grab one or more books on machine learning to learn the subject in detail. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the key points described later in this article: what machine learning is, the key phases of machine learning, and the prediction API model of machine learning.

What is Machine Learning? Simply speaking, machine learning is a set of artificial intelligence techniques which are used to solve one of …

Posted in Big Data.

## Data Scraping – Top 5 Reasons for Using the Import.io Tool

This article presents my thoughts on why one would want to use the web data scraping tool import.io. I must say that I am glad I found this tool for data scraping. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the key points described later in this article: key aspects of Import.io, reasons why one should try Import.io for their next data scraping project, and use cases where the Import.io scraping tool could be used.

Key Aspects of Import.io Tool: Import.io is a cloud-based web scraping tool which could act as a boon for those looking …

Posted in Big Data.

## Data Science – 8 Steps to Perform Regression Analysis using R

This article presents my thoughts on the steps that may be required to perform regression analysis (simple linear or multiple) using the R programming language on a given data set where the response variable is a continuous variable. Remember that continuous variables are ones that can take any numeric value within a range, unlike discrete variables, which can take only a limited set of distinct values. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the key steps described later in this article: load the data, observe the data, clean the data, explore the data visually, fit the linear or multiple regression model …

Posted in Big Data.

## Data Science – Top 5 Videos to Learn Bayes’ Theorem

This article lists the top 5 YouTube videos that I found most helpful when I was trying to understand Bayes’ theorem. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the top 5 videos that I found quite useful for understanding Bayes’ theorem:

Bayes’ Theorem Formula: This is the one I liked most. It is a very short and sweet video which explains Bayes’ theorem with a very nice example of the economy and stock values in just 6 minutes. For beginners, I would recommend this as the first video to get started with Bayes’ theorem. Bayes’ Theorem with …
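
For reference while watching the videos, here is a minimal sketch of Bayes’ theorem for a binary hypothesis, P(A | B) = P(B | A) P(A) / P(B), with P(B) expanded by the law of total probability. The function name and the numbers in the example are hypothetical and are not taken from any of the videos.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) via Bayes' theorem for a binary hypothesis A.

    prior               = P(A)
    likelihood          = P(B | A)
    false_positive_rate = P(B | not A)
    The evidence P(B) comes from the law of total probability:
    P(B) = P(B | A) * P(A) + P(B | not A) * (1 - P(A)).
    """
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers, chosen only for illustration.
print(round(bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05), 4))  # 0.1538
```

The example shows the classic surprise of Bayes’ theorem: even with a 90% detection rate, a rare event (1% prior) yields a posterior of only about 15%, because false positives from the much larger "not A" group dominate the evidence.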

Posted in Big Data.

## Data Science – 6 Steps to Perform Data Analysis using R

This article presents steps that one could take to perform data analysis on available datasets using data science (machine learning algorithms) with the help of the R programming language. The objective of this article is to introduce an approach that data science beginners can use to get started with data analysis. However, as you gain experience, you could adopt your own techniques that work for you. These are just my thoughts, and there could be better ways of approaching data analysis. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the key steps which could be taken as a blueprint …

Posted in Big Data.

## Learn R – Different Data Types with Code Examples

This article presents quick concepts on the key data types in the R programming language, along with code examples and some good go-to links for further reading. For those new to R, I would like to quickly reiterate that the R programming language helps in performing data analysis and is an integral part of data science as a practice. In other words, it is one of the go-to languages/platforms for data scientists to work with data. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the different data types in R that are discussed in this article: vector, list, factor …

Posted in Big Data.

## Learn R – What are Vectors – Code Examples

This article presents high-level concepts related to the vector data type in the R programming language, along with code samples. For those new to the R language, it should be noted that R provides a console-based platform to perform analysis on data. R can be seen as a programming language for data scientists. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the key points described later in this article: what vectors are, and vectors with code examples.

What are Vectors? A vector, in R, can be defined as a collection of things of the same data type. Simply speaking, it …

Posted in Big Data.

## Data Science – Commonly Used Plot Parameters in R Programming

This article presents some of the plot parameters commonly used across different plot commands when you are working with different kinds of plots in R. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the key points described later in this article: some of the common plots (commands) in R, and commonly used plot parameters.

What are some of the common plots (commands) in R? Following are some of the plots (commands) used in the R language for different purposes. I shall be writing separate blogs on the different use cases where one should use one or more …