
Data Science – Examples of Machine Learning Problems

This article presents different categories of machine learning problems, along with examples taken from real-world problems. Please feel free to comment or suggest if I have missed one or more important points.

Listed below are the categories which, by rough estimate, cover about 80% of machine learning problems:

Classification
Clustering
Regression

Machine Learning – Classification Problems

Simply speaking, if the answer to a problem consists of discrete values such as the following, the problem can be termed a classification problem. Logistic regression, despite its name, is one commonly used technique for such problems.

Yes or no, e.g., 1 or 0 (binary classification)
One of a finite set of values (multi-class classification)

Mathematically speaking, if h(x) is the hypothesis function, the value of h(x) falls between 0 and 1. h(x) can be read as the estimated probability that the output is 1 for input x. To predict whether the output is 1 (yes, the positive class) or 0 (no, the negative class), we compare the value of h(x) with 0.5. If the value is greater than 0.5, we predict the output as 1 (yes); otherwise we predict 0 (no). If you are familiar with naive Bayes, you may be smiling by now. This is because h(x) can be written as the conditional probability P(y = 1 | x), that is, the probability that y = 1 given that x has occurred. Following are some examples:

Whether an email is spam. This is a classic classification problem. The answer can be either yes (1) or no (0), or a value close to yes or no when the problem is solved with a probabilistic classification algorithm such as naive Bayes. For example, an email fed to a naive Bayes spam classifier could produce an output of 70%, which implies a 70% chance that the email is spam. Since this is above the 0.5 threshold, the email is filtered as spam.
Whether an online transaction is fraudulent. This is widely used in banking and financial applications to classify whether a particular transaction is fraudulent.
Whether a software developer is productive. This can be considered a classification problem because the answer is either yes or no, or close to yes or no.
Whether a resume matches a job description. This can be considered a classification problem because the answer is either yes or no, or close to yes or no.
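
The 0.5 threshold rule described above can be sketched in a few lines of Python. This is only an illustration using a logistic (sigmoid) hypothesis. The two features, the weights and the bias are hypothetical values picked by hand, not parameters learned from real email data.

```python
import math


def hypothesis(x, weights, bias):
    """Logistic hypothesis h(x): estimated probability that the output is 1."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, bias, threshold=0.5):
    """Predict 1 (yes) if h(x) is greater than the threshold, otherwise 0 (no)."""
    return 1 if hypothesis(x, weights, bias) > threshold else 0


# Hypothetical parameters and features, chosen only for illustration.
# Feature 0: count of suspicious words; feature 1: count of known contacts.
weights = [1.5, -2.0]
bias = -0.5

spam_like = [2.0, 0.5]  # z = 1.5, so h(x) is above 0.5
ham_like = [0.0, 1.0]   # z = -2.5, so h(x) is below 0.5

print(predict(spam_like, weights, bias))
print(predict(ham_like, weights, bias))
```

In practice the weights would be learned from labeled examples, and the threshold can be moved away from 0.5 when false positives and false negatives have different costs.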

Following are some of the algorithms which could be used to solve classification problems:

Naive Bayes
K-nearest neighbors (KNN)
Support vector machines (SVM)
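
As a rough illustration of one of the algorithms listed above, here is a minimal k-nearest neighbors classifier in plain Python. The function name knn_predict and the toy data points are assumptions made up for this sketch; a real project would more likely use a library implementation.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the query.
    by_distance = sorted(
        ((math.dist(point, query), label)
         for point, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy, made-up data: class 0 sits near (1, 1), class 1 sits near (8, 8).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (7, 7)))
```

An odd k avoids tied votes in the two-class case.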

Machine Learning – Clustering Problems

Clustering problems are about grouping similar things together. Take a look at some of the following examples:

Grouping similar news items from different websites under one headline. This can be seen live on Google News.
Top 5 areas of expertise in a resume. While conducting interviews, most of us struggle to find the top 5 areas in which a candidate excels. This could be solved using clustering algorithms.
Market segmentation. Data can be grouped to identify emerging market segments and help sales and marketing teams achieve better results.
Software developers’ areas to improve. By examining code review comments and bugs over time, clustering algorithms could help identify the top 2-3 areas where developers need to improve.
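
To make "grouping similar things together" concrete, here is a minimal sketch of k-means, one common clustering algorithm. The article does not name a specific algorithm, so k-means is an assumption chosen for illustration, and the data points and starting centroids are made up.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Toy, made-up data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])

print(centroids)
print(clusters)
```

Unlike classification, no labels are supplied here; the groups emerge from the distances between the points alone.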

Machine Learning – Regression Problems

Regression problems are about predicting numeric values from input data. Mathematically, the following formulas give the hypothesis used in regression problems:

# Linear regression; h(x) is hypothesis function, m is parameter, x is feature (or variable) and c is constant.
h(x) = mx + c
# Multiple regression; m1, m2, m3, .... mn are parameters and x1, x2, x3...xn are features
h(x) = c + m1.x1 + m2.x2 + m3.x3 + ... + mn.xn
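
The two hypothesis functions above can be sketched in Python as follows. For the single-feature case, m and c are estimated with ordinary least squares, which is one standard way to fit a line and is assumed here because the article does not name a fitting method. The function names and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Estimate m and c in h(x) = mx + c by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


def h(x, m, c):
    """Linear regression hypothesis: h(x) = mx + c."""
    return m * x + c


def h_multi(features, params, c):
    """Multiple regression hypothesis: h(x) = c + m1.x1 + ... + mn.xn."""
    return c + sum(m_i * x_i for m_i, x_i in zip(params, features))


# Made-up data lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)

print(m, c)
print(h(5.0, m, c))
print(h_multi([2.0, 3.0], [0.5, 2.0], 1.0))
```

With more than one feature, the parameters are usually estimated with a numerical library or gradient descent rather than by hand.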

Following are some of the examples:

Predict house prices. This is a classic multiple linear regression problem, which takes several features into account to predict the price of a house.
How many seats will a party win in the upcoming election?
Predict stock prices

Author

Ajitesh Kumar
