Tag Archives: python

Elbow Method vs Silhouette Score – Which is Better?

In K-means clustering, the elbow method and silhouette analysis (the silhouette score) are used to find the optimal number of clusters in a dataset. The elbow method looks for the “elbow” point, where adding more clusters no longer reduces the within-cluster sum of squares by much. The silhouette score measures how close each sample is to the other samples in its own cluster compared with the samples in the nearest neighboring cluster. In this post, you will learn about these two methods for finding the optimal number of clusters in K-means clustering. Selecting the optimal number of clusters is key to applying a clustering algorithm to a dataset. As a data scientist, knowing these two techniques to find …
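
To make the two criteria above concrete, here is a minimal pure-Python sketch of the quantities behind them: the within-cluster sum of squares (inertia) used by the elbow method, and the mean silhouette coefficient. It is an illustration under stated assumptions, not the code from the post: the points are made up, and the cluster labels for each k are written by hand to stand in for the output of K-means.

```python
import math


def group_by_label(points, labels):
    """Collect points into a dict keyed by cluster label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def inertia(points, labels):
    """Within-cluster sum of squared distances to each cluster centroid."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total


def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b); needs at least two clusters."""
    clusters = group_by_label(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for single-member clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Made-up 2-D points forming two visible groups (illustrative only).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]

# Hand-written candidate labelings standing in for K-means output at each k.
labelings = {
    1: [0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 2],
}

for k, labels in labelings.items():
    line = f"k={k}  inertia={inertia(points, labels):.2f}"
    if k > 1:
        line += f"  silhouette={silhouette_score(points, labels):.2f}"
    print(line)
```

On this toy data, inertia falls sharply from k=1 to k=2 and only slightly from k=2 to k=3, which is the elbow, and the silhouette score is higher for k=2 than for k=3, so both criteria point to two clusters.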

Posted in Data Science, Machine Learning, Python.

Hello World – Altair Python Install in Jupyter Notebook

Altair visualization python

This blog post will walk you through the steps needed to install the Altair visualization library in Jupyter Notebook. For data scientists, the Altair visualization library can prove to be very useful. In this blog, we’ll look at how to download and install Altair, as well as some examples of using Altair capabilities for data visualization. What is Altair? Altair is a free, declarative statistical visualization library for Python, built on Vega-Lite. It produces interactive charts that render in Jupyter notebooks and can be saved as HTML or JSON. Altair is also easy to learn, with an intuitive, declarative API …

Posted in Data Science, Python.

Free Python & R Training from Spoken Tutorial Initiative

spoken tutorial on python and R

Many people today are interested in learning Python and R. Are you starting out in data science and machine learning and looking to build Python and R skills? These two programming languages are very popular for data analysis. The training offered by the Spoken Tutorial Initiative will introduce you to Python and R, while also providing helpful tips on how to use them effectively. The Spoken Tutorial Initiative by IIT Bombay is an initiative of the National Mission on Education through ICT (NMEICT), Ministry of Education (MoE), Govt. of India, to promote IT literacy on free and open source software (FOSS) by …

Posted in Career Planning, Python, Tutorials.

14 Python AutoML Frameworks Data Scientists Can Use

Python automl frameworks

In this post, you will learn about Automated Machine Learning (AutoML) frameworks for Python that you can use to train machine learning models. For data scientists, especially beginners, who are unfamiliar with AutoML, it is a set of tools designed to make the process of building machine learning models automated, user-friendly, and less time-consuming. The goal of AutoML is not just to make life easier for machine learning (ML) developers but also to democratize access to model development. What is AutoML? AutoML refers to automating some or all steps of building machine learning models, including selection and configuration of training data, tuning the performance metric(s), selecting/constructing features, training multiple models, evaluating …

Posted in Data Science, Machine Learning, Python.

Python Scraper for GoogleNews, Twitter, Reddit & Arxiv

Python scraper GoogleNews Twitter Reddit Arxiv

In this post, you will get the Python code for scraping the latest news about any topic from Google News, Twitter, Reddit and Arxiv. This could prove very useful for data scientists and machine learning enthusiasts who want to keep track of the latest happenings in the field of artificial intelligence. If you are doing research work, these pieces of code will be handy for quickly accessing the information. The code in this post has been worked out in a Google Colab notebook. First and foremost, import the necessary Python libraries such as the following for GoogleNews, Twitter and Arxiv.  Python Code for mining GoogleNews Here …

Posted in Data Science, Python.

Reddit Scraper Code using Python & Reddit API

Reddit app client id and secret token

In this post, you will get a Python code sample that you can use to search Reddit for posts in specific subreddits, including hot posts. The Reddit API is used in the Python code. This code will be helpful if you want to quickly scrape Reddit for popular posts in the field of machine learning (subreddit – r/machinelearning), data science (subreddit – r/datascience), deep learning (subreddit – r/deeplearning) etc. There are two steps to follow to scrape Reddit for popular posts in any specific subreddit: (1) Python code for authentication and authorization, and (2) Python code for retrieving the popular posts. Check the Reddit API documentation page to learn about Reddit APIs. Python code for …

Posted in Python.

Mining Twitter Data – Python Code Example

Twitter data mining with Python Twitter API

In this post, you will learn how to get started with mining Twitter data. This will be very helpful if you would like to build machine learning models based on NLP techniques. The Python source code used in this post was worked out in a Jupyter notebook. The following are key aspects of getting started with the Python Twitter APIs: set up the Twitter dev app and Python Twitter package; establish a connection with Twitter; Twitter API examples – location-based trends, user timeline, etc.; search Twitter by hashtags. Setup Twitter Dev App & Python Twitter Package In this section, you will learn about the following two key aspects before you get started with …

Posted in Data Mining, Python.

Python Scraper Code to Search Arxiv Latest Papers

python arxiv library

In this post, you will learn about Python source code for searching Arxiv for relevant and recent machine learning and data science research papers. If you are looking for a faster way to research Arxiv papers without going to the Arxiv website, you may want to keep this piece of code handy. You can further automate the Arxiv search to get notified based on some logic. Without further ado, let’s get started.  Step 1: Install Python Arxiv Library As a first step, install the Python Arxiv library using code such as the following in your Jupyter notebook or Google Colab instance: Step 2: Execute the …

Posted in Python.

Google News Search Python API Example

In this post, you will learn how to use the GoogleNews search Python library to retrieve news from Google News for the last N days. This would be very helpful for someone wanting to track new work or projects in machine learning, data science, deep learning or any other field, including sports, politics etc. Without further ado, let’s jump right in. You can log into Google Colab and practise the code.  Step 1: First and foremost, let’s install the GoogleNews Python library. Step 2: Instantiate the GoogleNews object. One can pass the language and period to instantiate the object. The parameter, period, represents the news …

Posted in Python.

Python – How to Create Dictionary using Pandas Series

In this post, you will learn about one of the fundamental Pandas data structures, the Series, and how it can be used as a dictionary. It will be useful for beginner data scientists to understand the concept of the Pandas Series object.  A dictionary is a structure that maps arbitrary keys to a set of arbitrary values. A Pandas Series is a one-dimensional array of indexed data. It can be created using a list or an array. A Pandas Series can be thought of as a special case of a Python dictionary. It is a structure which maps typed keys to a set of typed values. Here are the three different ways in …
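
The dictionary analogy described above can be sketched as follows. This is a small illustration rather than the code from the post: the `prices` dictionary and its values are made up, and the calls shown (`pd.Series`, `keys()`, `items()`, `to_dict()`) are standard Pandas Series methods.

```python
import pandas as pd

# Made-up fruit prices used only for illustration.
prices = {"apple": 1.2, "banana": 0.5, "cherry": 3.0}

# Building a Series from a dict: keys become the index, values the data.
series = pd.Series(prices)

# Dictionary-style operations work on the Series.
print(series["banana"])        # look up a value by key
print("cherry" in series)      # membership test checks the index (keys)
print(list(series.keys()))     # keys() returns the index
print(list(series.items()))    # (key, value) pairs

# Unlike a dict, the values share one dtype and support array-style operations.
print(series.dtype)
print(series["apple":"banana"])  # label-based slicing includes both endpoints
print((series * 2).to_dict())    # vectorised arithmetic, then back to a dict
```

The typed index and typed values are what make the Series a "special case" of a dictionary: lookups behave the same way, but slicing and arithmetic apply to the whole structure at once.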

Posted in Data Science, Python.

Free Online Books – Machine Learning with Python

Python data science

This post lists free online books for machine learning with Python. These books cover topics related to machine learning, deep learning, and NLP. This post will be updated from time to time as I discover more books.  Here are the titles of these books: Python data science handbook; Building machine learning systems with Python; Deep learning with Python; Natural language processing with Python; Think Bayes; Scikit-learn tutorial – statistical learning for scientific data processing. Python Data Science Handbook Covers topics such as some of the following: Introduction to Numpy; Data manipulation with Pandas; Visualization with Matplotlib; Machine learning topics (Linear regression, SVM, random forest, principal component analysis, K-means clustering, Gaussian …

Posted in Data Science, Machine Learning, Python.

Gradient Boosting Regression Python Examples

Gradient Boosting Regressor Feature Importances

In this post, you will learn about the concepts of Gradient Boosting Regression with the help of a Python Sklearn code example. The gradient boosting algorithm is one of the key boosting algorithms in machine learning, alongside AdaBoost and XGBoost.  What is Gradient Boosting Regression? The gradient boosting algorithm is used to generate an ensemble model by combining weak learners, or weak predictive models. It can be used to train models for both regression and classification problems. The Gradient Boosting Regression algorithm is used to fit a model which predicts a continuous value. Gradient boosting builds an additive model by using multiple decision trees of fixed size as weak learners or …
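
The additive idea described above can be sketched in a few lines of plain Python. This is a from-scratch illustration under stated assumptions, not the Sklearn code from the post: it uses squared-error loss, a single numeric feature, depth-1 trees (stumps) as the fixed-size weak learners, and made-up data. The helper names (`fit_stump`, `fit_gradient_boosting`, and so on) are hypothetical.

```python
def fit_stump(xs, targets):
    """Fit a depth-1 regression tree: pick the threshold with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # dropping the max keeps both sides non-empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((t - left_mean) ** 2 for t in left)
        error += sum((t - right_mean) ** 2 for t in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gradient_boosting(xs, ys, n_estimators=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_estimators):
        # For squared-error loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * predict_stump(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up training data (illustrative only).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 6.1, 6.8, 8.1]

model = fit_gradient_boosting(xs, ys)
baseline = (sum(ys) / len(ys), [], 0.1)  # mean-only model with no stumps

print("baseline MSE:", round(mean_squared_error(baseline, xs, ys), 3))
print("boosted MSE: ", round(mean_squared_error(model, xs, ys), 3))
```

Each round adds a small correction aimed at the errors the ensemble still makes, which is why the boosted model's training error ends up below that of the mean-only baseline.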

Posted in Data Science, Machine Learning, Python.

Keras CNN Image Classification Example

In this post, you will learn how to train a Keras Convolutional Neural Network (CNN) for image classification. Before going ahead and looking at the Python / Keras code examples and related concepts, you may want to check my post on Convolution Neural Network – Simply Explained in order to get a good understanding of CNN concepts. Keras CNN Image Classification Code Example First and foremost, we will need to get the image data for training the model. In this post, the Keras CNN used for image classification uses the Kaggle Fashion MNIST dataset. Fashion-MNIST is a dataset of Zalando’s article images, consisting of a training set of 60,000 examples and a …

Posted in Data Science, Deep Learning, Machine Learning.

Keras Neural Network for Regression Problem

Keras Neural network for regression problem

In this post, you will learn how to train a neural network for regression machine learning problems using Python Keras. Regression problems are those which involve predicting a continuous numerical value based on input parameters / features. You may want to check out some of the following posts on how to use Keras to train a neural network for classification problems: Keras – How to train neural network to solve multi-class classification; Keras – How to use learning curve to select most optimal neural network configuration for training classification model. In this post, the following topics are covered: Design Keras neural network architecture for regression Keras neural network …

Posted in Data Science, Deep Learning.

Keras Multi-class Classification using IRIS Dataset

Python keras for multi-class classification model using IRIS dataset

In this post, you will learn how to train a neural network for multi-class classification using Python Keras libraries and the Sklearn IRIS dataset. As a deep learning enthusiast, it will be good to learn how to use Keras for training a multi-class classification neural network. The following topics are covered in this post: Keras neural network concepts for training a multi-class classification model; Python Keras code for fitting a neural network using the IRIS dataset. Keras Neural Network Concepts for training Multi-class Classification Model Training a neural network for multi-class classification using Keras requires the following seven steps: Loading Sklearn IRIS dataset Prepare the dataset for training and testing …

Posted in Data Science, Deep Learning, Machine Learning, Python.

Python – How to Add Trend Line to Line Chart / Graph

Chris Gayle - Rohit Sharma - Dhoni - Virat Kohli IPL Batting Average Score Trendline

In this post, you will learn how to add a trend line to a line chart / line graph using Python Matplotlib. As a data scientist, it proves helpful to learn the concepts and related Python code which can be used to draw or add a trend line to line charts, as it helps to understand the trend and make decisions. In this post, we will consider an example of the IPL average batting scores of Virat Kohli, Chris Gayle, MS Dhoni and Rohit Sharma over the last 10 years, and assess the trend in their overall performance using trend lines. Let’s say that the main reason why we want to …
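
A trend line of the kind described above is usually a least-squares straight line through the data. The sketch below computes one in plain Python. It is an assumption-laden illustration, not the code from the post: the season numbers and batting averages are hypothetical values, not real IPL statistics, and `fit_trend_line` is a hypothetical helper. With NumPy available, `numpy.polyfit(x, y, 1)` returns the same slope and intercept.

```python
def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: season index and a batsman's average score per season.
seasons = list(range(1, 11))
averages = [28.0, 31.5, 30.2, 35.1, 33.8, 38.4, 36.9, 41.2, 40.3, 44.0]

slope, intercept = fit_trend_line(seasons, averages)
trend = [slope * x + intercept for x in seasons]

print(f"slope={slope:.2f} runs per season, intercept={intercept:.2f}")

# To draw it with Matplotlib, plot the raw series and the fitted values together:
#   import matplotlib.pyplot as plt
#   plt.plot(seasons, averages, marker="o", label="average")
#   plt.plot(seasons, trend, linestyle="--", label="trend line")
#   plt.legend()
#   plt.show()
```

A positive slope indicates an upward trend over the seasons and a negative slope a decline, which is what makes the line useful for comparing players at a glance.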

Posted in Python, statistics.