Author Archives: Ajitesh Kumar

Ajitesh Kumar

I have recently been working in the area of data analytics, including Data Science and Machine Learning / Deep Learning. I am also passionate about different technologies, including programming languages such as Java/JEE, JavaScript, Python, R, Julia, etc., and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data, etc. I would love to connect with you on LinkedIn. Check out my latest book, First Principles Thinking: Building winning products using first principles thinking.

Learn R – How to Create Multiple Density Plots using GGPlot

This article presents code samples that can be used to create multiple density curves or plots using the ggplot2 package in the R programming language. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos.

Multiple Density Curves/Graphs with GGPlot

The code samples given below work with the “diamonds” dataset, which is loaded as part of the ggplot2 package. Two types of plots are shown below: density plots with multiple fills, and a density plot with a single fill.

Density Plots with Multiple Fills: The following code represents density plots with multiple fills. Pay attention to the “fill” parameter passed to the “aes” method.

# Create density plots for …
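The two plot types named above can be sketched as follows. This is a minimal illustration, not the article's own code: it assumes ggplot2 is installed, and the choice of the price and cut columns, the colour, and the alpha value are assumptions made for the example.

```r
library(ggplot2)  # also provides the diamonds dataset

# Density plots with multiple fills: fill is mapped inside aes(),
# so one curve is drawn per level of cut.
p_multi <- ggplot(diamonds, aes(x = price, fill = cut)) +
  geom_density(alpha = 0.4) +
  labs(title = "Price density by cut", x = "Price")

# Density plot with a single fill: fill is a constant set outside aes().
p_single <- ggplot(diamonds, aes(x = price)) +
  geom_density(fill = "steelblue", alpha = 0.4) +
  labs(title = "Price density", x = "Price")

# Typing p_multi or p_single at the console (or calling print() on them) draws the plot.
```

Mapping fill inside aes() is what produces several overlaid curves; setting it as a plain argument colours a single curve.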

Posted in Data Science.

Learn R – How to Create Density Plot over Histogram

This article presents code examples for overlaying a density curve on a histogram using the ggplot2 package in R. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos.

Code Samples to Overlay Density Curve on Histogram

In the code examples below, the diamonds dataset belonging to the ggplot2 package is used. One must load the ggplot2 package (require("ggplot2")) before executing the code samples given below.

# Most simplistic density curve
ggplot(diamonds, aes(x=carat)) + geom_histogram(aes(y=..density..)) + geom_density() + labs(title="Histogram & Density Curve", x="Carat")

The following diagram would be displayed by executing the above code.

# Density curve with histogram painted using body …
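A slightly more explicit variant of the overlay can be sketched as below. It assumes ggplot2 3.3.0 or later, where after_stat(density) is the current spelling of the older ..density.. notation; the bin width and colours are illustrative assumptions, not values from the article.

```r
library(ggplot2)

# Scale the histogram to density so it shares a y-axis with the density curve.
p_overlay <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1,
                 fill = "grey80", colour = "black") +
  geom_density(colour = "red") +
  labs(title = "Histogram & Density Curve", x = "Carat")

# Typing p_overlay at the console (or calling print(p_overlay)) draws the plot.
```

Without the density scaling the histogram would be drawn in counts and the curve would sit flat along the bottom of the plot.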

Posted in Data Structure.

Learn R – 3 Commands to Generate Random Numbers

This article presents 3 different commands, with code examples, that can be used to generate random numbers in the R programming language. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. The following key points are described later in this article: the runif command, the sample command, the rnorm command, and the difference between the runif and rnorm commands.

Runif: Generate Random Numbers based on Uniform Distribution

The “runif” command can be used for generating random numbers based on a uniform distribution. One can generate one or more random numbers within a range of numbers. One should note that the random numbers generated using runif commands are all …
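The three commands listed above can be tried side by side with a short base R sketch. The seed, sample sizes, and ranges are arbitrary values chosen for illustration.

```r
set.seed(42)  # fix the seed so the draws are reproducible

# runif: n draws from a uniform distribution between min and max
u <- runif(5, min = 1, max = 10)

# sample: draws from a finite set of values, here the integers 1..100
# without replacement
s <- sample(1:100, size = 5, replace = FALSE)

# rnorm: n draws from a normal distribution with the given mean and sd
z <- rnorm(5, mean = 0, sd = 1)

print(u)
print(s)
print(z)
```

runif and rnorm both return numeric vectors; they differ in the distribution the values are drawn from, while sample picks from the values it is given.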

Posted in Data Science.

Learn R – Extract Data Frame with One Column

This article presents a code sample that can be used to create/extract a data frame with one column from an existing data frame. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos.

Extract Data Frame with One Column

In the code sample below, the diamonds dataset from the ggplot2 package is used. To work with the example below, one needs to load the ggplot2 library using a command such as require("ggplot2"). In the command below, the as.data.frame method is used. Make a note of the drop=FALSE parameter: it belongs to the [ subsetting operator rather than to as.data.frame, and R spells the logical constant FALSE, not false.

dfn1 <- as.data.frame(diamonds[, c(1), drop=FALSE])
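The effect of drop can be seen on a plain data frame. The sketch below uses the built-in mtcars dataset as a stand-in (an assumption made so the example runs without ggplot2); in current ggplot2 versions diamonds is a tibble, for which single-column subsetting with [ already returns a data frame.

```r
# mtcars ships with base R and is a plain data.frame
vec <- mtcars[, 1]                # drop defaults to TRUE: the result is a numeric vector
df1 <- mtcars[, 1, drop = FALSE]  # drop = FALSE keeps a one-column data frame
df2 <- mtcars[1]                  # list-style indexing also keeps a data frame

print(is.data.frame(vec))
print(is.data.frame(df1))
print(names(df2))
```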

Posted in Data Science.

Learn R – How to Create Histogram using GGPlot

This article presents techniques (command samples) that can be used to create a histogram using the ggplot2 package in R. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. Following is a summary of the commands used to create a histogram using ggplot: using the simplest ggplot and geom_histogram call; using ggplot and geom_histogram with attributes such as col, fill, and alpha; and using ggplot, geom_histogram, and the scale_fill_gradient method.

Common Techniques to Create Histogram using ggplot2

In the code examples below, the diamonds dataset from the ggplot2 package is used. To work with the examples below, load the ggplot2 library prior to executing the commands given below. …
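The three techniques summarized above might look like the sketch below. It assumes ggplot2 3.3.0 or later (for after_stat); the carat column, bin width, and colours are assumptions for the example rather than the article's own values.

```r
library(ggplot2)

# 1. Simplest form: ggplot plus geom_histogram
h1 <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.1)

# 2. With attributes: col (outline colour), fill, and alpha (transparency)
h2 <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.1, col = "black", fill = "steelblue", alpha = 0.6)

# 3. Fill mapped to the bar count and coloured with scale_fill_gradient
h3 <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(aes(fill = after_stat(count)), binwidth = 0.1) +
  scale_fill_gradient(low = "lightblue", high = "darkblue")

# Typing h1, h2, or h3 at the console (or calling print() on them) draws the plot.
```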

Posted in Data Science.

Learn R – How to Create Data Frames using Existing Data Frame

This article presents commands that can be used to create data frames from an existing data frame. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. Following is a summary of the commands for creating data frames by extracting multiple columns from an existing data frame, with samples provided later in this article: column indices, column names, the subset command, and the data.frame command.

6 Techniques for Extracting Data Frame from Existing Data Frames

The following commands are based on the diamonds data frame, which is loaded as part of the ggplot2 library. Following is how the diamonds data …
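The four approaches in the summary can be sketched with base R alone. The built-in mtcars dataset is used here as a stand-in for diamonds (an assumption, so that the example runs without ggplot2), and the three columns picked are arbitrary.

```r
# By column indices
by_index <- mtcars[, c(1, 2, 4)]

# By column names
by_name <- mtcars[, c("mpg", "cyl", "hp")]

# Using the subset command with its select argument
by_subset <- subset(mtcars, select = c(mpg, cyl, hp))

# Using the data.frame command on individual columns
# (this form does not carry over the original row names)
by_ctor <- data.frame(mpg = mtcars$mpg, cyl = mtcars$cyl, hp = mtcars$hp)

print(names(by_index))
print(dim(by_ctor))
```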

Posted in Data Science.

Learn R – 5 Techniques to Create Empty Data Frames with Column Names

This article presents techniques for creating an empty data frame with column names. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos.

5 Techniques to Create Empty Data Frames

In each of the examples below, the data frame is created with three columns, namely, ‘name’, ‘rating’, ‘relyear’. They represent movie names, ratings, and the release year.

# Command data.frame is used
df1 <- data.frame(name="", rating="", relyear="", stringsAsFactors=FALSE)

# Command data.frame is used
df2 <- data.frame(name=character(), rating=character(), relyear=character(), stringsAsFactors=FALSE)

# Usage of read.table command to create empty data frame
df3 <- read.table(text = "", colClasses = c("character", …
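A quick way to check such a data frame is to look at its row count and column names. The sketch below uses base R only; the variable names df_a and df_b and the col.names argument are assumptions for the example. Note that data.frame(name = "") produces one row holding an empty string, so zero-length vectors are used here to get zero rows.

```r
# Zero-length typed vectors give a data frame with 0 rows and named columns
df_a <- data.frame(name = character(), rating = character(), relyear = character(),
                   stringsAsFactors = FALSE)

# read.table on empty text, with explicit column classes and names
df_b <- read.table(text = "",
                   colClasses = c("character", "character", "character"),
                   col.names = c("name", "rating", "relyear"))

print(nrow(df_a))
print(names(df_b))
```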

Posted in Data Science.

Learn R – How to Get Data Frames Columns as Vectors

This article presents different ways in which one can get a data frame column as a vector. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos.

4 Techniques to Get Data Frame Column as Vector

In the examples below, the diamonds dataset from the ggplot2 package is considered. This is what the diamonds dataset looks like: Following are four different techniques with which one can retrieve a data frame column as a vector.

# In the data set shown above, carat represents column name and hence, [['carat']]
carat1 <- diamonds[['carat']]

# In the data set shown above, carat represents 1st column …
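Common ways of pulling a column out as a vector can be compared directly. The sketch below uses the built-in mtcars dataset as a stand-in for diamonds (an assumption, so that it runs without ggplot2); the four forms shown are typical options and not necessarily the same four as in the full article.

```r
v1 <- mtcars[["mpg"]]   # by column name with [[ ]]
v2 <- mtcars[[1]]       # by column position with [[ ]]
v3 <- mtcars$mpg        # by column name with $
v4 <- mtcars[, "mpg"]   # matrix-style indexing; drops to a vector on a plain
                        # data.frame, but a tibble returns a one-column tibble

print(is.vector(v1))
print(identical(v1, v4))
```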

Posted in Big Data.

Dummies Notes – How SAML-based SSO Authentication Works?

This article presents dummies notes on how one could implement SSO using SAML. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. The following key points are described later in this article: What is SAML? How does SSO authentication happen using SAML? What are the key components of SSO design, in general?

What is SAML?

For those of you unaware of what SAML is, here is the definition from the Wikipedia page on SAML: Security Assertion Markup Language (SAML, pronounced sam-el) is an XML-based, open-standard data format for exchanging authentication and authorization data between parties, in particular, between …

Posted in Application Security, Software Engg.

Top 8 Data Science Training Institutes in India

This article lists the top 8 data science/analytics training institutes in India. Some of them, including INSOFE, provide only classroom coaching, while others such as Edureka provide online training. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. Following is the list of training institutes which are detailed later in this article: INSOFE, Jigsaw Academy, UReach Solutions, AnalytixLabs, Edureka, SpringPeople, SimpliLearn, EduPristine.

INSOFE

International School of Engineering was launched in 2011 with an aim to transform the applied engineering education space in India. Their current focus area is Big Data Analytics / Data Science. Out of all of …

Posted in Big Data, Career Planning.

Top 5 Usecases of Solr to Power Your Web & Mobile Search

This article presents the top 5 use cases for using Solr to power your web and mobile search. Note that for mobile search requirements, Solr exposes APIs that can be used to retrieve data from the Solr index server and serve it to the mobile client. The article also presents a classification of websites that use Solr to fulfill their search requirements. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. The following key points are described later in this article: the top 5 use cases for Solr search, and the different classes of websites using Solr to power search engines.

Top 5 Usecases for Solr Search

Search Engine: Many …

Posted in Big Data.

Dummies Notes – What is B-Tree and Why Use Them?

This article presents quick notes on what the B-Tree data structure is and why to use it. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. I found this page (Memory locality & the magic of B-Trees!) on B-Trees a very interesting read and would recommend that anyone and everyone go through it to quickly understand the nuances of the B-Tree. A B-Tree could be defined as a linked, sorted, distributed range array with a predefined sub-array size, which allows searches, sequential access, insertions and deletions in logarithmic time. Simply speaking, a B-Tree is nothing but a generalization of a binary search tree. One may …

Posted in Data Structure, Dummies, Software Engg.

Key Training Topics for Hadoop Developer

This article presents key topics that one would want to learn in order to become a Hadoop Developer. One may also check these topics against the topics provided by the training vendor. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. Following are the key areas to focus on for learning/training, which are described later in this article: Java Essentials and Hadoop Essentials.

Java Essentials

As Hadoop is based on the Java programming language, one would want to gain expertise of at least an intermediate level to do well with Hadoop development. Following are some of the key concepts that one would want to …

Posted in Career Planning.

Dummies Notes on How Distributed Computing Works using Hadoop

This article intends to present dummies notes on how distributed computing works using Hadoop. As Hadoop is inspired by the Google GFS/Map-Reduce/BigTable papers, I have tried to refer to GFS/Map-Reduce/BigTable in this article wherever appropriate. One must note that the distributed computing paradigm has become mainstream with the advent of Big Data related large-scale project implementations going on in several companies. Please feel free to shout if you find discrepancies with my understanding and help me correct the mistakes. Simply speaking, distributed computing refers to the computing paradigm in which processing happens on multiple different boxes holding the data, and the results are then aggregated appropriately to produce the final result. In traditional …

Posted in Big Data, Dummies.

60 Most Commonly Used R Packages in R Programming Language

This article presents a comprehensive list of the 60 most commonly used R packages, which help to achieve some of the following objectives when working with data science/analytics projects: Predictive modeling Data handling/manipulation Visualization Integration Hadoop GUI Database

60 Most Commonly Used R Packages

Following is the list of 60 or so R packages which help take care of different aspects when working to create predictive models:

Predictive Modeling: Represents packages which help in working with various predictive models (linear/multivariate/logistic regression models, SVM, neural network etc.)

caret: Stands for Classification And REgression Training. Provides a set of functions which could be used to do some of the following when …

Posted in Big Data.

API Tips – How to Write API Documentation

This article presents tips on how to write documentation for APIs that are going to be published to developers, both internal and external. It touches upon some of the important areas/points that need to be included in API documentation so that developers find it easy to work with the APIs. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos.

3 Areas to Cover while doing API Documentation

A landing page which provides details such as high-level information about the APIs, links to API pages, release information, and changelog details. A summary page providing an overview of the APIs in general, list of API …

Posted in API Development, Enterprise Architecture.