Following are the different categories which cover 80% of machine learning problems:
Machine Learning – Classification Problems
Simply speaking, if the answer to a problem consists of discrete values such as the following, the problem can be termed a classification problem. Logistic regression is one commonly used algorithm for such problems; despite its name, it is a classification algorithm, not a regression algorithm.
- Yes or no, e.g., 1 or 0 (binary classification).
- A finite set of values (multi-class classification).
Mathematically speaking, if h(x) is the hypothesis function, the value of h(x) falls between 0 and 1. h(x) can be read as the estimated probability that the output is 1 on input x. To predict whether the output is 1 (yes, or the positive class) or 0 (no, or the negative class), we may compare the value of h(x) with 0.5. If the value is greater than 0.5, we predict the output as 1 (yes); otherwise, we predict 0 (no). Well, if you are aware of naive Bayes, you may be smiling by now. This is because the above definition of h(x) can be written as the conditional probability P(y = 1 | x), i.e., the probability that y = 1 given input x. Following are some of the examples:
- Whether an email is spam is a classic classification problem. The answer can be either yes (1) or no (0), or a probability close to yes or no when the problem is solved using a classification algorithm such as naive Bayes. For example, a naive Bayes spam classifier could output 70% for an email, which implies there is a 70% chance that the email is spam. Since this is above the 0.5 threshold, the email is filtered as spam.
- Whether an online transaction is fraudulent. This is widely used in banking and financial applications to classify whether a particular transaction is fraudulent.
- Whether a software developer is productive. This could be considered a classification problem, as the answer could be either yes or no, or closer to yes or no.
- Does the resume match the job description? This can be considered a classification problem, as the answer could be either yes or no, or closer to yes or no.
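The rule of comparing h(x) with 0.5 can be sketched in a few lines of Python. This is a minimal sketch that assumes a logistic-regression hypothesis, h(x) = 1 / (1 + e^-(w·x + b)); the functions `h` and `predict` and the weights `w` and bias `b` are hypothetical, hand-picked for illustration rather than learned from data.

```python
import math

def h(x, w, b):
    """Logistic hypothesis: estimated probability that y = 1 for input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b, threshold=0.5):
    """Predict 1 (positive class) if h(x) is greater than the threshold, else 0."""
    return 1 if h(x, w, b) > threshold else 0

# Hypothetical, hand-picked parameters for a two-feature input (not trained on data).
w = [2.0, -1.0]
b = -0.5

print(round(h([1.0, 0.5], w, b), 3))  # 0.731
print(predict([1.0, 0.5], w, b))      # 1
print(predict([0.0, 1.0], w, b))      # 0
```

In practice `w` and `b` would be fitted to labelled examples (for instance, spam vs. non-spam emails); the 0.5 threshold can also be moved when false positives and false negatives have different costs.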
Following are some of the algorithms which could be used to solve classification problems:
- Naive Bayes
- K-nearest neighbors (k-NN)
- Support vector machines (SVM)
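To make the spam example concrete, here is a minimal sketch of a multinomial naive Bayes spam filter with add-one (Laplace) smoothing, written in plain Python. The training emails, the helper names `train` and `spam_probability`, and the whitespace tokenization are all hypothetical simplifications; real filters use far more data and better text preprocessing.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (text, label) pairs, where label 1 = spam and 0 = not spam."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab

def spam_probability(text, model):
    """Estimated P(spam | text) from a multinomial naive Bayes model with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        # Log prior P(label) plus the log likelihood of each word given the label.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalize the two scores into a probability (subtract the max for numerical stability).
    top = max(log_scores.values())
    spam, ham = math.exp(log_scores[1] - top), math.exp(log_scores[0] - top)
    return spam / (spam + ham)

# Hypothetical training emails, made up for illustration.
docs = [
    ("win money now", 1),
    ("cheap money offer", 1),
    ("win a free offer", 1),
    ("meeting notes attached", 0),
    ("lunch meeting tomorrow", 0),
    ("project notes for tomorrow", 0),
]
model = train(docs)

p = spam_probability("free money offer", model)
print(round(p, 2))                         # 0.95
print("spam" if p > 0.5 else "not spam")  # spam
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on longer emails.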
Machine Learning – Clustering Problems
Clustering problems are about grouping similar things together. Take a look at the following examples:
- Grouping similar news items from different websites under a common topic. This can be seen live on Google News.
- Identifying the top 5 areas of expertise in a resume. While taking interviews, many of us struggle to find the top 5 areas where the candidate excels. Clustering algorithms could help here.
- Market segmentation. Data could be grouped together to identify emerging market segments and help sales and marketing teams achieve better results.
- Software developers’ areas to improve. By examining code review comments and bugs over time, clustering algorithms could help identify the top 2-3 areas where developers need to improve.
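The grouping idea behind these examples can be sketched with k-means, one common clustering algorithm; this section does not name a specific algorithm, so that choice is an assumption. The sketch below works on made-up 2-D points, and `k_means` with its parameters is a hypothetical helper. Real inputs such as news articles or review comments would first have to be turned into numeric feature vectors, a step this sketch skips.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Cluster points (tuples of numbers) into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid as is
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return centroids, labels

# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = k_means(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```

Note that k must be chosen up front (for example, k = 5 for "top 5 areas"), and results can depend on the random starting centroids, which is why the sketch fixes a `seed`.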
Machine Learning – Regression Problems
Regression problems are about predicting numeric values based on input data sets. Mathematically, the hypothesis functions for linear and multiple regression take the following forms:
Linear regression, where h(x) is the hypothesis function, m is a parameter, x is the feature (or variable) and c is a constant:

$$h(x) = mx + c$$

Multiple regression, where $m_1, m_2, m_3, \ldots, m_n$ are parameters and $x_1, x_2, x_3, \ldots, x_n$ are features:

$$h(x) = c + m_1 x_1 + m_2 x_2 + m_3 x_3 + \cdots + m_n x_n$$
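For the single-feature case h(x) = mx + c, the parameters m and c can be estimated with ordinary least squares. A minimal sketch in plain Python follows; `fit_line` is a hypothetical helper name and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit h(x) = m*x + c by ordinary least squares and return (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The least-squares line passes through (mean_x, mean_y), which fixes c.
    c = mean_y - m * mean_x
    return m, c

# Hypothetical data points, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
m, c = fit_line(xs, ys)
print(round(m, 2), round(c, 2))  # 1.97 1.09
print(round(m * 6 + c, 2))       # predicted h(6): 12.91
```

Once m and c are fitted, predicting for a new input is just evaluating h(x) = mx + c.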
Following are some of the examples:
- Predict house prices. This is a classic multiple (multivariable) linear regression problem, which takes into account different features of a house to predict its price.
- How many seats will a party win in the upcoming election?
- Predict stock prices
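The house-price example uses the multiple regression hypothesis h(x) = c + m1.x1 + m2.x2 + ... + mn.xn. Below is a minimal sketch that fits it by solving the least-squares normal equations in plain Python. The features (size in thousands of square feet, number of bedrooms) and the prices are hypothetical: the prices were generated from the made-up rule price = 50 + 100 × size + 20 × bedrooms (in thousands of dollars), so the fit should recover those coefficients. The helpers `fit_multiple_regression` and `predict_price` are illustrative names; in practice a library routine such as NumPy's `numpy.linalg.lstsq` would normally be used instead of a hand-written solver.

```python
def fit_multiple_regression(rows, ys):
    """Fit h(x) = c + m1*x1 + ... + mn*xn by least squares; returns [c, m1, ..., mn]."""
    # Prepend a constant 1 to every row so the intercept c is learned like a parameter.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Build the normal equations (X^T X) beta = X^T y.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= factor * A[col][k]
            b[r] -= factor * b[col]
    # Back-substitution to recover [c, m1, ..., mn].
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][k] * beta[k] for k in range(i + 1, p))) / A[i][i]
    return beta

def predict_price(beta, row):
    """Evaluate h(x) = c + m1*x1 + ... + mn*xn for one feature row."""
    return beta[0] + sum(m * x for m, x in zip(beta[1:], row))

# Hypothetical houses: [size in 1000 sq ft, bedrooms]; prices in $1000s.
rows = [[1.0, 2], [1.5, 3], [2.0, 3], [2.5, 4], [3.0, 4]]
prices = [190, 260, 310, 380, 430]
beta = fit_multiple_regression(rows, prices)
print([round(v, 2) for v in beta])                # [50.0, 100.0, 20.0]
print(round(predict_price(beta, [2.2, 3]), 2))  # 330.0
```

With real, noisy data the fitted coefficients would not match any rule exactly; least squares picks the ones that minimize the total squared prediction error.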