Tag Archives: Data Science

Data Science – Top 10 Websites to Bookmark for Daily News

This article lists the top 10 websites that publish data science news and articles on a daily or regular basis. These links are my favorites and help me stay up to date with the latest happenings in the field of data science. Please feel free to comment or suggest if I missed one or more important and interesting websites in the list given below. Also, sorry for the typos. Following are the key points described later in this article: Top 5 Data Science News Websites – Recommended Daily Visit; Top 5 Data Science News Websites – Recommended Regular Visit. Top 5 Data Science News Websites – Recommended …

Posted in Big Data.

Data Science – Hypothesis Testing & Type I and Type II Errors

This article describes Type I and Type II errors made during hypothesis testing, based on a couple of examples such as House on Fire and Swine Flu. Note that it is key to understand Type I and Type II errors, as these concepts show up when evaluating a hypothesis function such as those used by machine learning algorithms like linear regression and logistic regression. For example, in the case of linear regression models, the significance level (often set to 0.05, representing the probability of making a Type I error) is compared with the p-value, and the null hypothesis that the parameter/coefficient is equal to zero is …
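The comparison of a p-value with the significance level can be illustrated with a small simulation. This is only a sketch under stated assumptions: it swaps in a two-sided z-test on a sample mean (with a known standard deviation of 1) instead of the regression coefficient test mentioned above, and the sample size, effect size, trial count and the names `p_value` and `rejection_rate` are all hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05  # significance level: accepted probability of a Type I error


def p_value(sample, sigma=1.0):
    """Two-sided z-test p-value for H0: population mean == 0 (sigma assumed known)."""
    z = mean(sample) * len(sample) ** 0.5 / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_mean, n=20, trials=2000, seed=0):
    """Fraction of simulated samples for which H0 is rejected (p-value < ALPHA)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if p_value(sample) < ALPHA:
            rejections += 1
    return rejections / trials


# Type I error: H0 is true (mean really is 0) but gets rejected anyway.
type_1_rate = rejection_rate(true_mean=0.0)
# Type II error: H0 is false (mean is 0.5) but is not rejected.
type_2_rate = 1 - rejection_rate(true_mean=0.5)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

With the null hypothesis true, the rejection rate should land close to the chosen significance level. The Type II rate depends on the sample size and on how far the true mean is from zero.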

Posted in Big Data.

Data Science – Hypothesis Testing Explained with Examples

[Figure: Hypothesis testing workflow]

This article covers some of the key statistical concepts, with examples, on how to formulate a hypothesis for hypothesis testing. Knowledge of hypothesis formulation and hypothesis testing is key to building various machine learning models. In later articles, hypothesis formulation for machine learning algorithms such as linear regression and logistic regression will be explained. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the key points described later in this article: What is a hypothesis? How to formulate a hypothesis as a Null or Alternate Hypothesis? What is hypothesis testing? What is a …

Posted in AI, Data Science, Machine Learning.

Machine Learning – Mathematical Concepts for Linear Regression Models

This article covers some of the key mathematics and statistics concepts that one may need to learn in order to work with linear regression models. Understanding the following concepts helps with evaluating linear regression models in these ways: interpreting coefficients; evaluating the regression model; comparing multiple regression models and choosing the best among them. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the key mathematical concepts/topics described later in this article: statistical hypothesis testing; probability distributions; quantitative data analysis; plots. Key Mathematics & Statistics Topics for Linear Regression Models …

Posted in Big Data.

Data Science – Descriptive Vs Predictive Vs Prescriptive Analytics

This article describes the key types of analytics that business stakeholders, in this Big Data age, would want to adopt in order to make more informed and smarter decisions for better business outcomes. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the key categories of analytics described later in this article: Descriptive Analytics; Predictive Analytics; Prescriptive Analytics. What is Descriptive Analytics? Descriptive analytics summarizes, and gives insight into, the question “What has happened?”. This can be seen as the first stage of business analytics, and it still accounts for the majority of …

Posted in Big Data.

Data Science – Key Algebra Topics to Master

This article covers some of the key topics in algebra that one may need to brush up on or master in order to understand different aspects of machine learning algorithms. If you are gearing up to become a data scientist, the topics below may be worth your attention, as I eventually had to brush up on them when I was learning different machine learning algorithms. The concepts listed below, especially those related to linear algebra, touch almost all machine learning algorithms. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the key high-level topics which are …

Posted in Big Data.

Data Science – Key Probability & Statistics Topics to Master

[Figure: Table of contents for probability & statistics]

This article presents a list of key probability and statistics topics that one may need to master if one is aiming to become a data scientist. It lists the topics that have worked for me so far when working on data science problems. One could also see the list below as a table of contents for key probability and statistics topics for data science. However, I believe there are some topics that I might not have mentioned. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Probability & Statistics Topics: Following are some of the …

Posted in Big Data.

Learn R – How to Get Random Training and Test Data Set

This article presents sample source code which can be used to extract random training and test data sets from a data frame using the R programming language. The R code below can prove very handy while you are working to create a model using any machine learning algorithm. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos.

# Read the data from a file. The command below assumes that the working
# directory has already been set; one can set the working directory using
# the setwd() command.
sample_df <- read.csv("glass.data", header=TRUE, stringsAsFactors=FALSE)
# get a vector comprising of all indices …

Posted in Big Data.

Machine Learning – Bookmarks for Great Tutorials, Books & Videos

This article provides quick bookmarks to some good machine learning web pages, including tutorials, documents and videos. Please feel free to comment or suggest if you know of further good bookmarks. I shall be adding more bookmarks in time to come. Also, sorry for the typos. Following are the key bookmarks: List of Tutorial Pages on Different Machine Learning Topics: You will surely want to bookmark this page, as it consists of some really cool links covering different topics in machine learning. List of Machine Learning Books: Those looking for machine learning books to get started would want to bookmark this page, which consists of a list of some great books recommended …

Posted in Big Data.

Machine Learning – When to Use Logistic Regression vs. SVM

This article presents guidelines for deciding whether to use logistic regression or SVM with kernels when working on a classification problem. I gathered these guidelines from one of Andrew Ng's videos on SVM in his machine learning course on Coursera.org. As I wanted a place to turn to quickly in the future when I am working on a classification problem and need to check which algorithm to use, logistic regression or SVM, I decided to blog it here. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Key Criteria for Using Logistic Regression vs …

Posted in Big Data.

Machine Learning – When to Use Linear vs Gaussian Kernel with SVM

This article presents guidelines which can be used to decide whether to use a linear kernel or a Gaussian kernel when working with a Support Vector Machine (SVM). Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the key points described later in this article: When to Use Linear Kernel; When to Use Gaussian Kernel. When to Use Linear Kernel: When there is a large number of features and a comparatively small number of training examples, one would want to use a linear kernel. As a matter of fact, this can also be called SVM with no kernel. One may …
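For readers unfamiliar with the two kernels being compared, the following minimal sketch shows how each one scores the similarity of two feature vectors. It is an illustration, not code from the article: the function names are hypothetical, the Gaussian kernel is written in its usual form exp(-||x - y||^2 / (2 * sigma^2)), and the default `sigma=1.0` is an arbitrary assumption.

```python
import math


def linear_kernel(x, y):
    """Plain dot product; an SVM using it is effectively an SVM with no kernel."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, sigma=1.0):
    """Similarity that decays with squared distance; sigma sets the width (assumed value)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * sigma ** 2))


x1, x2 = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x1, x2))    # 11.0
print(gaussian_kernel(x1, x1))  # 1.0 (identical points are maximally similar)
```

The linear kernel adds no extra parameters, whereas the Gaussian kernel introduces a width parameter that has to be tuned.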

Posted in Big Data.

8 Key Steps to Follow When Solving A Machine Learning Problem

This article presents some of the key steps one could take in order to create the most effective model for a given machine learning problem, using different machine learning algorithms. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. 8 Key Steps for Solving A Machine Learning Problem. Gather the data set: This is one of the most important steps, where the objective is to gather as large a volume of data as possible. Given that features have been selected appropriately, a large data set helps to minimize the training data set error and also enable cross-validation and training data set error …

Posted in Big Data.

Data Science – 8 Steps to Multiple Regression Analysis

This article presents a list of steps and related details that one would want to follow when doing multiple regression analysis. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the key points described later in this article: 8 Steps to Multiple Regression Analysis; Techniques Used in Multiple Regression Analysis. 8 Steps to Multiple Regression Analysis: Following is a list of 8 steps that could be used to perform multiple regression analysis: identify a list of potential variables/features, both independent (predictor) and dependent (response); gather data on the variables; check the relationship between each predictor variable …

Posted in Big Data.

Learn R – How to Extract Rows & Columns from Data Frame

This article presents a set of commands in the R programming language which can be used to extract rows and columns from a given data frame. When working on data analytics or data science projects, these commands come in very handy in data cleaning activities. This article is meant for beginners getting started with R who want to see examples of extracting information from a data frame. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. Following are the key points described later in this article: commands to extract rows and columns; command to extract a column as a data frame; command …

Posted in Big Data.

Machine Learning – How to Predict Software Developers Productivity

This article presents my thoughts on how machine learning techniques could be used to solve one of the most popular problems of the software industry: determining whether a software developer is productive or not. From all the effort I have made to solve this problem using traditional (rules-based) programming techniques, I can say that there is no definitive way of finding a concrete solution. As a matter of fact, I created a tool, AgileSQM, to capture software quality metrics (SQM) such as code coverage, duplication and complexity, and to infer from the trending data whether a software developer is productive. However, I soon hit a roadblock in terms of acceptance …

Posted in Big Data.

Data Science – Examples of Machine Learning Problems

This article presents different categories of machine learning problems, along with some examples taken from real-world problems. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. The following categories cover 80% of machine learning problems: Classification; Clustering; Regression. Machine Learning – Classification Problems: Simply speaking, if the answer to a problem consists of discrete values such as the following, the problem can be termed a classification problem. These are sometimes called “Logistic Regression” problems, after one algorithm commonly used to solve them. Yes or no, e.g., 1 or 0; a finite set of values representing multi-class classification problems. Mathematically speaking, if “h(x)” …

Posted in Big Data.