Author Archives: Ajitesh Kumar

Ajitesh Kumar

I have recently been working in the area of data science and machine learning / deep learning. In addition, I am passionate about various technologies, including programming languages such as Java/JEE, JavaScript, Python, R and Julia, and technologies such as blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms and big data. I would love to connect with you on LinkedIn.

Top Data Sources for Climate Change Research


In this post, you will learn about the top online data sources for climate change research, where you can both learn about the topic and get data for your own research. Vitalflux is committing itself to AI and climate change research for the next 15 years. You will also learn about climate change and how data science and machine learning can be leveraged to tackle it in the years to come. Without further ado, let's list the data sources related to climate change research: United Kingdom’s Met Office Hadley Centre: Researchers at the Met Office Hadley Centre produce and maintain a range of gridded datasets of meteorological variables for use in climate monitoring and climate …


Posted in Climate Change.

Python Scraper for GoogleNews, Twitter, Reddit & Arxiv


In this post, you will get Python code for scraping the latest news on any topic from Google News, Twitter, Reddit and Arxiv. This can be very useful for data scientists and machine learning enthusiasts who want to keep track of the latest developments in artificial intelligence. If you are doing research, these code snippets are handy for accessing information quickly. The code in this post was worked out in a Google Colab notebook. First and foremost, import the necessary Python libraries for GoogleNews, Twitter and Arxiv, such as the following. Python Code for Mining GoogleNews: Here …
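Each source returns results in its own format. One way to combine them is to normalise every result into a common record and merge the lists. This is a minimal sketch, not the code from the post: the `NewsItem` record and the `merge_feeds` helper are hypothetical names, and it assumes each scraper's results have already been converted into `NewsItem` objects.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class NewsItem:
    """Hypothetical common record for a result from any source."""
    source: str  # e.g. "GoogleNews", "Twitter", "Reddit", "Arxiv"
    title: str
    url: str
    published: datetime


def merge_feeds(*feeds):
    """Merge several lists of NewsItem, dropping repeated URLs, newest first."""
    seen_urls = set()
    merged = []
    for feed in feeds:
        for item in feed:
            # Keep only the first occurrence of each URL.
            if item.url not in seen_urls:
                seen_urls.add(item.url)
                merged.append(item)
    # Sort so the most recently published items come first.
    return sorted(merged, key=lambda item: item.published, reverse=True)
```

Deduplicating on URL is a simple design choice; the same story posted on Reddit and Twitter with different URLs would still appear twice.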


Posted in Data Science, Python.

Reddit Scraper Code using Python & Reddit API


In this post, you will get a Python code sample that you can use to search Reddit for posts in specific subreddits, including hot posts. The code uses the Reddit API. It will be helpful if you want to quickly scrape Reddit for popular posts in fields such as machine learning (subreddit r/machinelearning), data science (subreddit r/datascience) and deep learning (subreddit r/deeplearning). Scraping the popular posts of a specific subreddit takes two steps: Python code for authentication and authorization, and Python code for retrieving the popular posts. Check the Reddit API documentation page to learn more about the Reddit APIs. Python code for …
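The two steps above can be sketched with the Python standard library alone. This is a hedged sketch rather than the post's code: it assumes a confidential Reddit app (script or web type) whose client id and secret you already have, and it uses Reddit's application-only `client_credentials` grant. The helper names (`build_token_request`, `build_hot_posts_request`, `extract_titles`, `fetch_json`) are hypothetical. Reddit asks for a descriptive `User-Agent` on every request, so both requests set one.

```python
import base64
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://www.reddit.com/api/v1/access_token"
API_BASE = "https://oauth.reddit.com"


def build_token_request(client_id, client_secret, user_agent):
    """Step 1: request an application-only OAuth token (HTTP Basic auth)."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    request = urllib.request.Request(TOKEN_URL, data=body, method="POST")
    request.add_header("Authorization", "Basic " + base64.b64encode(credentials).decode("ascii"))
    request.add_header("User-Agent", user_agent)
    return request


def build_hot_posts_request(token, subreddit, user_agent, limit=10):
    """Step 2: request the hot posts of a subreddit using the bearer token."""
    query = urllib.parse.urlencode({"limit": limit})
    request = urllib.request.Request(f"{API_BASE}/r/{subreddit}/hot?{query}")
    request.add_header("Authorization", f"bearer {token}")
    request.add_header("User-Agent", user_agent)
    return request


def extract_titles(listing):
    """Pull post titles out of a Reddit listing JSON response."""
    return [child["data"]["title"] for child in listing["data"]["children"]]


def fetch_json(request):
    """Send a request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

To run it against the live API, pass the token request to `fetch_json`, read the `access_token` field from the response, and use it in `build_hot_posts_request`.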


Posted in Python.

First Principles Thinking Explained with Examples


In this post, you will learn about the concept of first principles thinking with the help of examples. The following topics will be covered in this post: What are first principles? What is first principles thinking? Examples of first principles thinking. First principles thinking can be used to solve any type of problem, including real-life problems, product-related problems, science-related problems, etc. What are First Principles? As per Wikipedia, a first principle is a basic proposition or assumption that cannot be deduced from any other proposition or assumption. From a mathematics perspective, first principles can be thought of as axioms. Axioms are propositions that are regarded …


Posted in Reasoning.

Mining Twitter Data – Python Code Example


In this post, you will learn how to get started with mining Twitter data. This will be very helpful if you would like to build machine learning models based on NLP techniques. The Python source code used in this post was worked out in a Jupyter notebook. The following are the key aspects of getting started with the Python Twitter APIs: setting up a Twitter dev app and the Python Twitter package; establishing a connection with Twitter; Twitter API examples such as location-based trends and user timelines; and searching Twitter by hashtags. Setup Twitter Dev App & Python Twitter Package: In this section, you will learn about the following two key aspects before you get started with …
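As a rough sketch of the last step, searching by hashtags, the code below builds a request for the Twitter API v2 recent search endpoint and counts hashtags in tweet texts, a typical first NLP preprocessing step. It uses only the standard library instead of the Python Twitter package mentioned above. The bearer token, the helper names and the sample tweets are assumptions, and because Twitter's API access tiers have changed over time, check the current documentation before relying on the endpoint.

```python
import re
import urllib.parse
import urllib.request
from collections import Counter

# Assumed endpoint: Twitter API v2 recent search.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def build_hashtag_search(bearer_token, hashtag, max_results=10):
    """Build (but do not send) a recent-search request for one hashtag."""
    query = urllib.parse.urlencode({"query": f"#{hashtag}", "max_results": max_results})
    request = urllib.request.Request(f"{SEARCH_URL}?{query}")
    request.add_header("Authorization", f"Bearer {bearer_token}")
    return request


def count_hashtags(texts):
    """Count hashtags across tweet texts, case-insensitively."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG_PATTERN.findall(text))
    return counts
```

Lowercasing the tags treats #ML and #ml as the same hashtag, which is usually what you want when counting topics.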


Posted in Data Mining, Python.

Spend Analytics – 5 Ws of Spend Analysis


In this post, you will learn about the 5 Ws of spend analytics. If you are a procurement professional looking to understand use cases related to spend analytics, you may find this post very useful. In simple words, spend analytics is about extracting insights from spend across different procurement categories. What are we spending on? First and foremost, it is important to get visibility into what items we are spending on. This can be achieved using a dashboard. This form of analytics is also called descriptive analytics. Analyzing spend by item can be termed item spend analytics. The items can relate to direct or indirect procurement. Indirect …
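As a minimal illustration of item spend analytics, the pandas sketch below aggregates spend per item and per procurement type, the kind of summary a descriptive dashboard would display. The column names and purchase records are hypothetical.

```python
import pandas as pd

# Hypothetical purchase records; real data would come from a procurement system.
purchases = pd.DataFrame({
    "item": ["Laptops", "Steel", "Office chairs", "Steel", "Laptops"],
    "procurement_type": ["Indirect", "Direct", "Indirect", "Direct", "Indirect"],
    "amount": [12000.0, 50000.0, 3000.0, 45000.0, 8000.0],
})

# What are we spending on? Total spend per item, largest first.
item_spend = purchases.groupby("item")["amount"].sum().sort_values(ascending=False)

# Split of spend between direct and indirect procurement.
type_spend = purchases.groupby("procurement_type")["amount"].sum()

print(item_spend)
print(type_spend)
```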


Posted in Analytics, Data Science, Procurement.

Python Scraper Code to Search Arxiv Latest Papers


In this post, you will learn about Python source code for searching Arxiv for relevant and recent machine learning and data science research papers. If you are looking for a faster way to research Arxiv papers without going to the Arxiv website, you may want to keep this piece of code in your toolkit. You can further automate the Arxiv search to get notified based on some logic. Without further ado, let’s get started. Step 1: Install the Python Arxiv Library. As a first step, install the Python Arxiv library using code such as the following in your Jupyter notebook or Google Colab instance: Step 2: Execute the …
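If you cannot install the Arxiv library, a comparable search can be sketched against arXiv's public Atom API using only the standard library. This is a substitute for the library, not the post's code: the example query `cat:cs.LG`, the helper names and the choice to sort by submission date are assumptions. Parsing is kept separate from fetching so it can be tried on a saved response.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def build_arxiv_query(search_query, max_results=10):
    """Build a query URL that returns the newest matching papers first."""
    params = {
        "search_query": search_query,  # e.g. "cat:cs.LG" (assumed example)
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_entries(atom_xml):
    """Extract title, publication date and link from an Atom feed."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        papers.append({
            # Titles may contain line breaks; collapse whitespace.
            "title": " ".join(title.split()),
            "published": entry.findtext("atom:published", default="", namespaces=ATOM),
            "link": entry.findtext("atom:id", default="", namespaces=ATOM),
        })
    return papers
```

To fetch live results, open the URL with `urllib.request.urlopen` and pass the response bytes to `parse_entries`; arXiv asks API users to keep their request rate low.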


Posted in Python.

Google News Search Python API Example

In this post, you will learn how to use the GoogleNews search Python library to retrieve news from Google News for the last N days. This is very helpful if you want to track new work or projects related to machine learning, data science, deep learning or any other field, including sports and politics. Without further ado, let's jump right in. You can log into Google Colab and practice the code. Step 1: First and foremost, let's install the GoogleNews Python library. Step 2: Instantiate the GoogleNews object. You can pass the language and period when instantiating the object. The parameter, period, represents the news …
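The period parameter restricts results to a recent window of N days. The sketch below mimics that filtering on already-retrieved items so the idea can be tried offline. It is a hypothetical helper, not part of the GoogleNews library: it assumes a period string of the form `'7d'` (check the library's documentation for the formats it actually accepts) and items represented as dicts with a `date` key.

```python
from datetime import date, timedelta


def parse_period(period):
    """Hypothetical helper: turn a period string such as '7d' into a timedelta."""
    if not period.endswith("d") or not period[:-1].isdigit():
        raise ValueError(f"unsupported period: {period!r}")
    return timedelta(days=int(period[:-1]))


def within_period(items, period, today):
    """Keep only items published within the last N days before `today`."""
    cutoff = today - parse_period(period)
    return [item for item in items if item["date"] >= cutoff]
```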


Posted in Python.

Python – How to Create Dictionary using Pandas Series

In this post, you will learn about one of the important fundamental Pandas data structures, namely Series, and how it can be used as a dictionary. It will be useful for beginner data scientists to understand the concept of the Pandas Series object. A dictionary is a structure that maps arbitrary keys to a set of arbitrary values. A Pandas Series is a one-dimensional array of indexed data. It can be created using a list or an array. A Pandas Series can be thought of as a specialization of a Python dictionary: it is a structure that maps typed keys to a set of typed values. Here are the three different ways in …
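The dictionary analogy can be checked directly. In the sketch below (the keys and values are arbitrary examples), the same Series is built from a dict and from a list with an explicit index, and then queried with dictionary-style operations. One difference from a plain dict is that a Series also supports slicing by label, and label slices include the end point.

```python
import pandas as pd

# Built from a dict: the keys become the index.
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# Built from a list with an explicit index: the same mapping.
from_list = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Dictionary-style access.
print(from_dict["b"])           # value for key "b"
print("a" in from_dict)         # membership checks the index
print(list(from_dict.keys()))   # keys are the index labels
print(dict(from_dict.items()))  # (key, value) pairs

# Unlike a dict, a Series can be sliced by label (end label included).
print(from_dict["a":"b"])
```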


Posted in Data Science, Python.

Support Vector Machine (SVM) Interview Questions – Set 1


This quiz consists of questions and answers on Support Vector Machines (SVM). It is a practice test (objective questions and answers) that can be useful when preparing for interviews. The questions in this and upcoming practice tests could prove useful primarily for data scientists or machine learning interns, freshers and beginners. The questions focus on the following areas: introduction to SVM; types of SVM, such as the maximum-margin classifier, the soft-margin classifier and the support vector machine. Some of the key SVM concepts to understand while preparing for machine learning interviews are the following: SVM concepts and objective functions; SVM kernel functions and the kernel trick; the concepts of the C and gamma values; Scikit-learn libraries for …
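To make C and gamma concrete, the sketch below computes the RBF (Gaussian) kernel K(x, z) = exp(-gamma * ||x - z||^2) and the soft-margin objective in its hinge-loss form, 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b)). It is plain Python for illustration, not scikit-learn's implementation. A larger gamma makes the kernel more local (similarity drops off faster with distance), and a larger C penalizes margin violations more heavily.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel: similarity decays with squared Euclidean distance."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)


def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 plus C times the total hinge loss (labels y are +1/-1)."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    hinge = 0.0
    for xi, yi in zip(X, y):
        decision = sum(wj * xj for wj, xj in zip(w, xi)) + b
        # Points inside the margin or misclassified contribute a positive loss.
        hinge += max(0.0, 1.0 - yi * decision)
    return regularizer + C * hinge
```

With w = [1, 0] and b = 0, the points [2, 0] and [-2, 0] sit outside the margin and cost nothing, while [0.5, 0] with label +1 falls inside it and adds a hinge loss of 0.5, so raising C from 1 to 10 raises the objective from 1.0 to 5.5.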


Posted in Data Science, Interview questions, Machine Learning.

Machine Learning – Feature Selection vs Feature Extraction


In this post, you will learn about the difference between feature extraction and feature selection concepts and techniques. Both feature selection and feature extraction are used for dimensionality reduction, which is key to reducing model complexity and overfitting. Dimensionality reduction is one of the most important aspects of training machine learning models. As a data scientist, you must get a good understanding of dimensionality reduction techniques such as feature extraction and feature selection. The following topics will be covered in this post: feature selection concepts and techniques; feature extraction concepts and techniques; when to use feature selection and feature extraction. Feature Selection Concepts & Techniques: Simply speaking, feature selection is …
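The difference can be seen in a few lines of NumPy. Feature selection below keeps a subset of the original columns (those whose variance exceeds a threshold), while feature extraction builds new columns as combinations of the originals (principal component analysis via SVD). Both are common examples chosen for illustration; the data and function names are hypothetical.

```python
import numpy as np


def select_by_variance(X, threshold):
    """Feature selection: keep original columns whose variance exceeds threshold."""
    mask = X.var(axis=0) > threshold
    return X[:, mask], mask


def extract_pca(X, n_components):
    """Feature extraction: project centered data onto the top principal directions."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


# Column 1 is constant; columns 0 and 2 are perfectly correlated.
X = np.array([
    [1.0, 0.0, 5.0],
    [2.0, 0.0, 3.0],
    [3.0, 0.0, 1.0],
    [4.0, 0.0, -1.0],
])

X_selected, kept = select_by_variance(X, threshold=0.0)  # original columns 0 and 2
Z = extract_pca(X, n_components=1)                      # one new, derived feature
```

Selection keeps features a reader can still interpret by name; extraction can compress more aggressively (here one component carries all the variance) at the cost of producing features that are mixtures of the originals.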


Posted in Data Science, Machine Learning.

Python – Replace Missing Values with Mean, Median & Mode

Boxplot for deciding whether to use mean, mode or median for imputation

In this post, you will learn how to impute, or replace, missing values with the mean, median and mode in one or more numeric feature columns of a Pandas DataFrame while building machine learning (ML) models with Python. You will also learn how to decide which central tendency measure (mean, median or mode) to use when imputing missing values in a feature column. Data scientists need to understand this technique because handling missing values is one of the key aspects of data preprocessing when training ML models. The dataset used for illustration is related to campus recruitment and is taken from the Kaggle page on Campus Recruitment. As a first step, the …
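A minimal pandas sketch of the three strategies follows, using a small hypothetical column rather than the campus recruitment data. The outlier 10.0 pulls the mean up to 3.6 while the median and mode stay at 2.0, which is why the median is often preferred when a boxplot shows skew or outliers.

```python
import numpy as np
import pandas as pd


def impute(series, strategy="mean"):
    """Return a copy of a numeric Series with NaNs filled by a central tendency."""
    if strategy == "mean":
        fill_value = series.mean()          # NaNs are skipped by default
    elif strategy == "median":
        fill_value = series.median()
    elif strategy == "mode":
        fill_value = series.mode().iloc[0]  # first mode if there are ties
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return series.fillna(fill_value)


# Hypothetical feature column with one outlier and one missing value.
feature = pd.Series([1.0, 2.0, 2.0, 3.0, 10.0, np.nan], name="feature")

for strategy in ("mean", "median", "mode"):
    print(strategy, impute(feature, strategy).iloc[-1])
```

To impute a column of a DataFrame in place, assign the result back, for example `df["col"] = impute(df["col"], "median")`.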


Posted in Data Science, Machine Learning, Python.

Free Online Books – Machine Learning with Python


This post lists free online books for machine learning with Python. These books cover topics related to machine learning, deep learning and NLP. This post will be updated from time to time as I discover more books. Here are the titles of these books: Python Data Science Handbook; Building Machine Learning Systems with Python; Deep Learning with Python; Natural Language Processing with Python; Think Bayes; Scikit-learn tutorial – statistical learning for scientific data processing. Python Data Science Handbook: covers topics such as the following: introduction to NumPy; data manipulation with Pandas; visualization with Matplotlib; machine learning topics (linear regression, SVM, random forest, principal component analysis, K-means clustering, Gaussian …


Posted in Data Science, Machine Learning, Python.

42 Free Online Books on Machine Learning & Data Science


This post presents a comprehensive list of 42 free books on machine learning that are available online for self-paced learning. This will be very helpful for data scientists starting to learn or gain expertise in machine learning and deep learning. Please feel free to comment or suggest if I have missed any important books that you like and would like to share. Also, sorry for the typos. The books are categorized under the following key areas: Pattern Recognition & Machine Learning; Probability & Statistics; Neural Networks & Deep Learning. List of 42 Online Free eBooks on Machine Learning: Following is a list of 35 FREE online …


Posted in Big Data, Data Science, Machine Learning.

Great Site for Matrix Multiplication Demo


Here is a great website for a matrix multiplication demo. If you are a beginner data scientist, you will love it. The website has just one page, and it shows how multiplication happens for matrices of different dimensions. Other resources for understanding matrix multiplication concepts include: Khan Academy – Matrix multiplication
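The dimension rule such a demo illustrates is that an m × n matrix can be multiplied by an n × p matrix, giving an m × p result whose entry (i, j) is the dot product of row i of the first matrix with column j of the second. A plain-Python sketch for illustration (in practice NumPy's `@` operator does this efficiently):

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (lists of rows)."""
    rows_a, cols_a = len(A), len(A[0])
    rows_b, cols_b = len(B), len(B[0])
    # The inner dimensions must agree: columns of A == rows of B.
    if cols_a != rows_b:
        raise ValueError(f"cannot multiply {rows_a}x{cols_a} by {rows_b}x{cols_b}")
    return [
        [sum(A[i][k] * B[k][j] for k in range(cols_a)) for j in range(cols_b)]
        for i in range(rows_a)
    ]


A = [[1, 2, 3],
     [4, 5, 6]]      # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]       # 3 x 2
print(matmul(A, B))  # 2 x 2 result: [[58, 64], [139, 154]]
```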

Posted in Data Science.

Different types of Machine Learning Problems

types of learning problems

This post describes the most popular types of machine learning problems using several images. The following are the various types of machine learning problems: supervised learning, unsupervised learning, reinforcement learning, transfer learning, imitation learning and meta-learning. In this post, the image shows supervised, unsupervised and reinforcement learning. You may want to check the explanation in this YouTube lecture video. Unsupervised Learning Problems: In unsupervised learning problems, the learning algorithm learns the structure of the data from the given dataset and generates fakes or insights. In the above diagram, you can see that what is given is the unlabeled dataset X. The unsupervised learning algorithm learns the structure of data …
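To illustrate an unsupervised algorithm learning structure from an unlabeled dataset X, the sketch below runs k-means, one common clustering algorithm chosen here for illustration (the post does not name a specific algorithm), on one-dimensional points. No labels are given; the algorithm finds the two groups on its own. The deterministic initialization is a simplification; practical implementations usually use random or k-means++ initialization.

```python
def kmeans_1d(points, k, iterations=20):
    """Cluster 1-D points into k groups; returns (centroids, labels)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    ordered = sorted(points)
    # Simplified deterministic start: spread k centroids across the sorted data.
    if k == 1:
        centroids = [ordered[0]]
    else:
        centroids = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(p - centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, labels


unlabeled_X = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]
centroids, labels = kmeans_1d(unlabeled_X, k=2)
print(centroids, labels)
```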


Posted in Data Science, Machine Learning.