Category Archives: Data Science

Stock Price Prediction using Machine Learning Techniques

Stock movement machine learning techniques

In the past few decades, many advances have been made in the field of data analytics. Predictive models now allow researchers to predict stock prices with higher accuracy than before. These techniques use data on previous stock price movements and look for patterns that could indicate future price changes in the market. Machine learning techniques can help investors make better-informed decisions, with the aim of maximizing returns and minimizing losses. In this blog post, you will learn about some of the popular machine learning techniques for predicting stock price movement (the direction of the stock price) and …
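
As a rough illustration of "using previous stock price movements" to predict direction, the sketch below turns a list of closing prices into rows of lagged daily returns (features) and a next-day up/down label. The function names, the number of lags and the prices are hypothetical. A real pipeline would add more features and train a classifier (for example logistic regression or a random forest) on these rows, always splitting the data in time order.

```python
def make_direction_dataset(prices, n_lags=3):
    """Build (features, label) rows from a list of closing prices.

    features: the previous n_lags daily returns (all known at the close of day t)
    label:    1 if the next close is higher than today's close, else 0
    """
    # returns[j] is the simple return from day j to day j + 1
    returns = [(prices[j + 1] - prices[j]) / prices[j] for j in range(len(prices) - 1)]
    rows = []
    for t in range(n_lags, len(returns)):
        features = returns[t - n_lags:t]    # uses closes up to and including day t
        label = 1 if returns[t] > 0 else 0  # direction from day t to day t + 1
        rows.append((features, label))
    return rows


def time_ordered_split(rows, train_fraction=0.8):
    """Split without shuffling so the model is never trained on future data."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical closing prices, for illustration only
prices = [100.0, 101.0, 100.0, 102.0, 103.0, 101.0]
rows = make_direction_dataset(prices, n_lags=2)
for features, label in rows:
    print([round(r, 4) for r in features], label)
```

Keeping the split time-ordered matters here: a shuffled split would let the model see returns from after the days it is asked to predict.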

Posted in Data Science, Machine Learning.

Type I & Type II Errors in Hypothesis Testing: Examples

This article describes Type I and Type II errors made during hypothesis testing, using a couple of examples: House on Fire and Covid-19. It is key to understand Type I and Type II errors, as these concepts show up when evaluating hypotheses such as those related to machine learning algorithms (linear regression, logistic regression, etc.). For example, in linear regression models, the p-value of each coefficient is compared with the significance level, and the null hypothesis that the coefficient is equal to zero is either rejected or not rejected. You may want to check my …
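
The four possible outcomes of a test can be written down directly, and a short simulation shows that when the null hypothesis is true, a test run at significance level alpha makes a Type I error roughly alpha of the time. This is a sketch under assumptions of our own: a two-sided one-sample z-test with known standard deviation, samples of size 25, and a true mean of 0.5 for the Type II case. None of these values come from the House on Fire or Covid-19 examples.

```python
import random
from statistics import NormalDist


def classify_outcome(null_is_true, rejected_null):
    """Label the outcome of a single hypothesis test."""
    if null_is_true and rejected_null:
        return "Type I error (false positive)"
    if not null_is_true and not rejected_null:
        return "Type II error (false negative)"
    return "Correct decision"


def z_test_rejects(sample, mu0, sigma, alpha):
    """Two-sided one-sample z-test with known sigma; True means H0 is rejected."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / n ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(z) > critical


def error_rate(true_mean, mu0=0.0, sigma=1.0, n=25, alpha=0.05, trials=10_000, seed=42):
    """Fraction of simulated tests that end in a Type I or Type II error."""
    rng = random.Random(seed)
    null_is_true = true_mean == mu0
    errors = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        outcome = classify_outcome(null_is_true, z_test_rejects(sample, mu0, sigma, alpha))
        if outcome != "Correct decision":
            errors += 1
    return errors / trials


print("Type I error rate :", error_rate(true_mean=0.0))  # should be near alpha = 0.05
print("Type II error rate:", error_rate(true_mean=0.5))  # theoretical value is about 0.29
```

Lowering alpha reduces the Type I error rate but, for the same sample size, raises the Type II error rate; that tension is why both errors have to be considered together.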

Posted in Data Science, statistics.

Hypothesis Testing Explained with Real-life Examples

Hypothesis Testing Workflow

Hypothesis testing is a statistical technique that helps researchers test the validity of their theories. It is used in statistics and data science to decide whether sample data provide enough evidence to reject a default assumption (the null hypothesis) about a population. This blog post covers some of the key statistical concepts, along with examples of how to formulate a hypothesis for hypothesis testing. Knowing how to formulate and test hypotheses is key to building many machine learning models. In later articles, hypothesis formulation for machine learning algorithms such as linear regression and logistic regression models will be explained. What is a Hypothesis? Simply speaking, hypothesis testing …
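
The usual workflow (state the hypotheses, pick a significance level, compute a test statistic and its p-value, then decide) can be sketched with a one-sample z-test. The scenario is hypothetical: a filling machine is supposed to dispense 10 units on average, the population standard deviation is assumed known (sigma = 1), and we test H0: mu = 10 against H1: mu != 10 at alpha = 0.05.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, assuming the population sigma is known."""
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))  # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value
    reject_null = p_value < alpha                     # decision rule
    return z, p_value, reject_null


# Hypothetical measurements from 16 filled containers
sample = [10, 11] * 8
z, p, reject = one_sample_z_test(sample, mu0=10, sigma=1)
print(f"z = {z:.2f}, p-value = {p:.4f}, reject H0: {reject}")
```

With these numbers the sample mean is 10.5, z = 2.00 and the p-value is about 0.0455, so H0 is rejected at the 5% level; at alpha = 0.01 it would not be.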

Posted in AI, Data Science, Machine Learning.

Data Science: P-Value Explained with Examples

P-value explained with examples

Many describe the p-value as the probability that the null hypothesis is true. That is an incorrect definition: the p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one actually observed. The concept of the p-value is understood differently by different people and is considered one of the most used and abused concepts in statistics. In this blog post, you will learn p-value concepts through multiple examples. It is extremely important to get a good understanding of the p-value if you are starting to learn data science/machine learning, as p-values are key to hypothesis testing. The hypotheses made about the population in the following use cases will either be rejected or not rejected based on the p-value: Whether a coin is fair …
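
For the fair-coin use case, the p-value can be computed exactly from the binomial distribution. The data (60 heads in 100 tosses) are made up for illustration. The two-sided p-value below adds up the probabilities of every outcome that is no more likely than the observed one under H0: P(heads) = 0.5, which is one common definition of the two-sided exact test (R's binom.test uses it, for example).

```python
from math import comb


def binomial_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def two_sided_binomial_p_value(heads, tosses, p_null=0.5):
    """P(an outcome at least as unlikely as the observed one | H0: P(heads) = p_null)."""
    observed = binomial_pmf(heads, tosses, p_null)
    # a small relative tolerance keeps outcomes with equal probability from being lost to rounding
    return sum(
        binomial_pmf(k, tosses, p_null)
        for k in range(tosses + 1)
        if binomial_pmf(k, tosses, p_null) <= observed * (1 + 1e-7)
    )


p_value = two_sided_binomial_p_value(heads=60, tosses=100)
print(f"p-value = {p_value:.4f}")
```

Here the p-value is about 0.057: if the coin were fair, a result at least this lopsided would occur roughly 5.7% of the time, so H0 is not rejected at alpha = 0.05. It does not mean there is a 5.7% chance that the coin is fair.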

Posted in Data Science, statistics.

Bias-Variance Trade-off Concepts & Interview Questions

Bias variance concepts and interview questions

The bias-variance trade-off is a central challenge in building machine learning models. In this post, you will learn about the concepts of bias and variance in relation to machine learning (ML) models. Bias is the error introduced by a model's simplifying assumptions: the gap between the model's average prediction and the true value. Variance refers to how much the model's predictions change when it is trained on different training data. In addition to learning the concepts related to the bias-variance trade-off, you will also get a chance to take a quiz, which will help you prepare for data scientist / ML engineer interviews. As a data scientist / ML engineer, you must get a good understanding of bias and variance concepts …
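
To make the two terms concrete, the simulation below estimates bias² and variance for a deliberately simple "model": predicting a population mean from a sample of 10 points, optionally shrunk toward zero by a factor c. Each trial draws a fresh training sample, which mimics retraining the model on new data. The setup (true mean 0.3, standard deviation 1, the shrinkage factors) is an assumption chosen for illustration.

```python
import random


def bias_variance(shrink, true_mean=0.3, sigma=1.0, n=10, trials=20_000, seed=0):
    """Estimate bias^2 and variance of the estimator shrink * sample_mean.

    Each trial draws a new training sample, so the spread of the estimates
    shows how sensitive the model is to the data it was trained on.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        estimates.append(shrink * sum(sample) / n)
    average_estimate = sum(estimates) / trials
    bias_sq = (average_estimate - true_mean) ** 2
    variance = sum((e - average_estimate) ** 2 for e in estimates) / trials
    return bias_sq, variance


for c in (1.0, 0.5):
    b2, var = bias_variance(c)
    print(f"shrink={c}: bias^2={b2:.4f}, variance={var:.4f}, total={b2 + var:.4f}")
```

In theory, c = 1 gives bias² = 0 and variance = sigma²/n = 0.1, while c = 0.5 gives bias² = 0.0225 and variance = 0.025. Accepting some bias buys a larger drop in variance, so the total error falls; with a larger true mean the same shrinkage would make things worse.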

Posted in Data Science, Interview questions, Machine Learning.

Difference between Parametric vs Non-Parametric Models

Machine learning models can be parametric or non-parametric. Parametric models assume a fixed functional form with a fixed number of parameters, which are learned from the training data. Non-parametric models make fewer assumptions about the form of the function, and their complexity can grow with the amount of training data; this lets them fit more complex relationships, but they usually need more data and are more prone to overfitting. This blog post discusses parametric vs non-parametric machine learning models with examples, along with the key differences. What are parametric and non-parametric models? Training machine learning models is about finding a function approximation built using input or predictor variables, and whose output represents the response variable. It is called function approximation because there is always an error in relation to the …
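
The difference shows up in what a fitted model has to keep. In the sketch below, a least-squares line (parametric) is summarized by two numbers no matter how much data it sees, while a k-nearest-neighbors regressor (non-parametric) keeps the whole training set and predicts from it. The data and the choice k = 3 are arbitrary, for illustration only.

```python
def fit_linear(xs, ys):
    """Parametric: least-squares line y = a + b*x. The whole model is (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return a, b


def fit_knn(xs, ys):
    """Non-parametric: the 'model' is the training data itself, so it grows with n."""
    return list(zip(xs, ys))


def predict_knn(model, x, k=3):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(model, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


xs = [0, 1, 2, 3]
ys = [2, 5, 8, 11]  # exactly y = 2 + 3x
a, b = fit_linear(xs, ys)
knn_model = fit_knn(xs, ys)
print("linear model parameters:", (a, b))        # always 2 numbers
print("k-NN stored points:", len(knn_model))     # one per training example
print("k-NN prediction at x=1:", predict_knn(knn_model, 1))
```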

Posted in Data Science, Machine Learning.

Overfitting & Underfitting Concepts & Interview Questions

Overfitting and underfitting represented using Model error vs complexity plot

Machine learning models are trained on training data, evaluated on test data, and used to make predictions on new, unseen data. A model is said to overfit when it learns patterns (including noise) that exist only in the training set: it predicts the training data with high accuracy but performs poorly on the test data. On the other hand, a model underfits if it fails to capture the relationship between the variables, so it performs poorly on both the training and test data sets. In this post, you will learn about some of the key concepts of overfitting and underfitting in relation to machine learning models. In addition, you will also get a chance to test your understanding by attempting the quiz. The …
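
A quick way to see both failure modes is to compare training and test error. In this sketch (synthetic data from y = sin(x) plus Gaussian noise, with all values assumed for illustration), a 1-nearest-neighbor regressor memorizes the training set and gets zero training error but a higher test error, while using every training point as a neighbor just predicts the overall mean and does poorly on both sets.

```python
import math
import random


def knn_predict(train, x, k):
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, data, k):
    """Mean squared error of a k-NN model fitted on `train`, evaluated on `data`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)


def make_data(n, rng, noise=0.3):
    xs = [rng.uniform(0, 6) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0, noise)) for x in xs]


rng = random.Random(1)
train = make_data(100, rng)
test = make_data(200, rng)

for k, label in [(1, "k=1 (tends to overfit)"), (5, "k=5"), (len(train), "k=n (underfits)")]:
    print(f"{label:24s} train MSE={mse(train, train, k):.3f}  test MSE={mse(train, test, k):.3f}")
```

The gap between training and test error is the usual symptom of overfitting; high error on both sets is the usual symptom of underfitting.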

Posted in Data Science, Interview questions, Machine Learning.

Data Readiness Levels Assessment: Concepts

data readiness levels assessment

Data readiness levels (DRLs) and related assessments are an important part of data analytics. Data readiness levels describe a series of stages that represent the quality and maturity of data. Data science is becoming increasingly popular, but not all companies have the right level of data readiness for this type of work. Performing a data readiness levels assessment is important because it gives insight into the quality and quantity of your current datasets and helps determine whether a data analytics project is likely to succeed. This blog post explains what data readiness levels are and why assessing them is important. What are data readiness levels? Data readiness …

Posted in Data, Data analytics, Data Science.

Data Science / AI Team Structure – Roles & Responsibilities

Data Science Team Roles & Responsibilities

Setting up a successful artificial intelligence (AI) / data science or advanced analytics practice or center of excellence (CoE) is key to the success of AI in your organization. To set up a successful data science CoE, building a well-organized data science team with clearly defined roles and responsibilities is essential. Are you planning to set up an AI or data science team in your organization and looking for ideas about data science team structure and the related roles and responsibilities? In this post, you will learn about the following aspects of building a data science/machine learning team: focus areas, and roles & responsibilities. Data Science Team – Focus …

Posted in AI, Data Science, Machine Learning.

Clinical Trials & Predictive Analytics Use Cases

clinical trials predictive analytics machine learning use cases

Analytics plays a big role in modeling clinical trials, and predictive analytics is one such technique that has been embraced by clinical researchers. Machine learning algorithms can be applied at various stages of the drug discovery process, from early compound selection to clinical trial simulation. Data scientists have been applying machine learning algorithms to clinical trial data to identify predictive patterns and correlations between clinical outcomes, patient demographics, drug response phenotypes, medical history, and genetic information. Predictive analytics has the potential to enhance clinical research by helping accelerate clinical trials: predictive modeling of clinical outcome probabilities can support better treatment decisions while reducing clinical trial costs. In …

Posted in Data Science, Healthcare, Machine Learning, Pharma.

Local & Global Minima Explained with Examples

Optimization problems containing many local minima remain a critical problem in a variety of domains, including operations research, informatics, and material design. Efficient global optimization remains a problem of general research interest, with applications to a range of fields including operations design, network analysis, and bioinformatics. Within the fields of chemical physics and material design, efficient global optimization is particularly important for finding low potential energy configurations of isolated groups of atoms (clusters) and periodic systems (crystals). In the case of machine learning (ML) algorithms, there is a need to optimize (minimize) the cost or loss function. In order to become very good at finding solutions to optimization problems (relating to minimizing …
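
A one-dimensional example makes the distinction concrete. The function f(x) = x**4 - 3*x**2 + x (chosen here for illustration) has a global minimum near x = -1.30 and a separate local minimum near x = 1.13. Plain gradient descent simply follows the slope downhill, so where it ends up depends on where it starts. The learning rate and step count are assumptions that happen to be stable for this function.

```python
def f(x):
    return x ** 4 - 3 * x ** 2 + x


def grad_f(x):
    return 4 * x ** 3 - 6 * x + 1


def gradient_descent(x0, learning_rate=0.01, steps=1000):
    """Follow the negative gradient from x0; it settles in whichever minimum's basin x0 is in."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x


for start in (2.0, -2.0):
    x_min = gradient_descent(start)
    print(f"start={start:+.1f} -> x={x_min:.3f}, f(x)={f(x_min):.3f}")
```

Starting at x = 2 the iterate gets stuck in the local minimum near 1.13 (f is about -1.07), while starting at x = -2 it reaches the global minimum near -1.30 (f is about -3.51). Simple remedies such as running the optimizer from several random starting points and keeping the best result follow directly from this behavior.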

Posted in Data Science.

Most Common Machine Learning Tasks

common machine learning tasks

This article presents some of the most common machine learning tasks that one may come across while trying to solve machine learning problems. Each task is followed by a set of machine learning methods that could be used to solve it. Please feel free to comment or suggest if I have missed one or more important points. Also, sorry for the typos. You might want to check out the post on What is machine learning? Different aspects of machine learning concepts are explained there with the help of examples. Here is an excerpt from the page: Machine learning is about approximating mathematical functions (equations) representing real-world scenarios. These mathematical functions …

Posted in AI, Big Data, Data Science, Machine Learning.

Binomial Distribution Explained with Examples

binomial experiment coin tossing 100 experiments 50 trials

The binomial distribution is a probability distribution that applies to binomial experiments. It describes the number of successes in a fixed number of independent trials, where each trial has the same probability of success. For example, it gives the probability of each possible number of heads in an experiment consisting of a fixed number of coin flips. In this blog post, we will learn about the binomial distribution with the help of examples. If you are an aspiring data scientist looking to understand the binomial distribution better, this post might be very helpful. What is a Binomial Distribution? The binomial distribution is a discrete probability distribution that represents the probabilities of binomial random …
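
The probability mass function is P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), where C(n, k) counts the ways to choose which k of the n trials are successes. A direct translation, applied to the coin example with n = 10 fair flips (numbers chosen for illustration):

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.5
exactly_five = binomial_pmf(5, n, p)
at_least_eight = sum(binomial_pmf(k, n, p) for k in range(8, n + 1))
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))

print(f"P(X = 5)  = {exactly_five:.4f}")
print(f"P(X >= 8) = {at_least_eight:.4f}")
print(f"mean      = {mean:.1f}")
```

This gives P(X = 5) = 252/1024, about 0.2461, and P(X >= 8) = 56/1024, about 0.0547; the mean equals n * p = 5.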

Posted in AI, Data Science, Machine Learning, statistics.

Python – Replace Missing Values with Mean, Median & Mode

Boxplot for deciding whether to use mean, mode or median for imputation

Missing values are common in real-world problems when the data is aggregated over long time stretches from disparate sources, and reliable machine learning modeling demands careful handling of missing data. One strategy is imputing the missing values, and a wide variety of algorithms exist, spanning simple statistics (mean, median, mode), matrix factorization methods like SVD, statistical models like Kalman filters, and deep learning methods. Missing value imputation (replacement) techniques help machine learning models learn from incomplete data. The three simplest imputation techniques use the mean, median, and mode. Mean is the average of all values in a set, median is the middle number in …
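
A minimal standard-library sketch of the three strategies, using a made-up list of ages in which None marks a missing value. The outlier (95) shows why a boxplot helps with the choice: it pulls the mean upward, while the median and mode are unaffected. The function name impute is our own; in pandas the same idea is usually written with Series.fillna, for example s.fillna(s.median()).

```python
import statistics


def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


ages = [25, None, 30, 30, 95, None, 28]  # hypothetical data; 95 is an outlier
for strategy in ("mean", "median", "mode"):
    print(strategy, impute(ages, strategy))
```

Here mean imputation fills the gaps with 41.6, whereas median and mode imputation both use 30, which is closer to the bulk of the data.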

Posted in Data Science, Machine Learning, Python.

Building Machine Learning Models & Dev Challenges

machine learning models development and deployment challenges

The machine learning and AI implementation industry is booming. The demand for machine learning models has never been higher, but the challenges of machine learning development and deployment have also increased. In this post, we will discuss a few common machine learning development and deployment challenges; in future blogs, we will learn about solutions to overcome them. This post will help you understand some of the key challenges you may face if you are planning to start a machine learning practice in your organization. These challenges are also very relevant if you have machine learning engineers and data scientists working across different offices/locations on …

Posted in Data Science, Machine Learning.

Fixed vs Random vs Mixed Effects Models – Examples

fixed and random effects models

Have you ever wondered what fixed effects, random effects, and mixed effects models are? Or, more importantly, how they differ from one another? In this post, you will learn about the concepts of fixed and random effects models, along with when to use fixed effects models and when to go for fixed + random effects (mixed) models. The concepts will be explained with examples. As a data scientist, you must get a good understanding of these concepts, as they help you build better linear models such as general linear mixed models or generalized linear mixed models (GLMMs). What are fixed, random & mixed effects models? First, we will take a real-world example and try to understand …
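
One common way to treat group differences as fixed effects is to give every group its own intercept. With ordinary least squares, that is equivalent to subtracting each group's mean from x and y before fitting the slope (often called the "within" estimator). The sketch below uses made-up data for two groups with different baselines but the same slope of 2; ignoring the groups gives a badly wrong, even negative, slope. It illustrates only the fixed-effects idea and is not necessarily the approach taken in the post; random and mixed effects models, which treat group intercepts as draws from a distribution, are usually fitted with dedicated tools such as statsmodels' MixedLM in Python or lme4 in R.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Ordinary least-squares slope with a single shared intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


def within_slope(groups, xs, ys):
    """Fixed-effects slope: demean x and y inside each group, then regress with no intercept."""
    members = defaultdict(list)
    for g, x, y in zip(groups, xs, ys):
        members[g].append((x, y))
    num = den = 0.0
    for points in members.values():
        gx = sum(x for x, _ in points) / len(points)
        gy = sum(y for _, y in points) / len(points)
        for x, y in points:
            num += (x - gx) * (y - gy)
            den += (x - gx) ** 2
    return num / den


# Hypothetical data: group "a" has intercept 1, group "b" has intercept -10, both have slope 2
groups = ["a", "a", "a", "b", "b", "b"]
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 5, 7, -2, 0, 2]

print("pooled slope (ignores groups):", round(ols_slope(xs, ys), 3))
print("fixed-effects (within) slope:", within_slope(groups, xs, ys))
```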

Posted in Data Science, statistics.