
Top 10 Solution Approaches for Supervised Learning Problems

This article presents ten solution approaches that can be used to solve supervised learning problems. For those unfamiliar with supervised learning, here is the definition from Wikipedia:
Supervised learning is the machine learning task of inferring a function from labeled training data.[1] The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.

The following are two kinds of supervised learning problems, each of which is later associated with different solution approaches:

Numerical problems, in which one predicts a quantity represented as a number. Algorithms such as regression, SVM, neural networks, and decision trees are used to solve this kind of problem.
Classification problems, in which one predicts a class such as yes/no, positive/negative, or good/bad/ugly. Algorithms such as logistic regression, SVM, neural networks, K-NN, and decision trees are used to solve this kind of problem.

Please feel free to comment or make suggestions if I have missed one or more important points. Also, sorry for any typos.

10 Machine Learning Approaches to Supervised Learning Problems

Regression: Regression is used for problems that require predicting a numerical value, for example housing prices, stock prices, or inventory stocking quantities, based on a model built from past data. A regression model can take different forms, such as simple linear, multivariate, quadratic, or polynomial.
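As a minimal sketch of the simple linear case, the following fits a line by ordinary least squares. The function name `fit_simple_linear` and the house-size and price figures are hypothetical, made up purely for illustration.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical past data: house size (hundreds of sq ft) and price (thousands)
sizes = [10, 15, 20, 25, 30]
prices = [150, 200, 250, 300, 350]

slope, intercept = fit_simple_linear(sizes, prices)
predicted_price = slope * 22 + intercept  # prediction for an unseen house size
print(slope, intercept, predicted_price)
```

Because the made-up data lies exactly on a line, the fitted model reproduces it; real data would leave residual error.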
Logistic regression: Logistic regression is used for predicting classes such as yes/no, positive/negative, or good/bad/ugly. The sigmoid function maps a weighted sum of the input features to a probability between 0 and 1, which is then compared with a threshold to predict the class of an example.
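The role of the sigmoid function can be sketched with a one-feature logistic regression trained by plain gradient descent. This is an illustrative sketch, not a production implementation: the names `train_logistic` and `predict_class`, the learning rate, the epoch count, and the study-hours data are all assumptions.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_class(x, w, b, threshold=0.5):
    """Predict class 1 when the estimated probability reaches the threshold."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Hypothetical data: hours studied and whether the exam was passed (1) or not (0)
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(hours, passed)
print(predict_class(2, w, b), predict_class(7, w, b))
```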
Bayesian methods: Bayesian methods such as the naive Bayes classifier are used to solve classification problems, such as deciding whether an email is spam or ham.
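A minimal sketch of a naive Bayes spam/ham classifier follows. It assumes a bag-of-words model with add-one (Laplace) smoothing, which is one common choice and not something the article specifies; the tiny training messages and the function names are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(docs):
    """docs is a list of (text, label) pairs."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return label_counts, word_counts, vocab


def classify(text, label_counts, word_counts, vocab):
    """Return the label with the highest log posterior score."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages
training = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]
model = train_naive_bayes(training)
print(classify("free money", *model), classify("noon meeting", *model))
```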
Decision trees: A decision tree is a flowchart-like model that uses a tree-like graph of decisions and their possible consequences. Internal nodes represent tests on an input attribute, branches represent the outcomes of those tests, and leaf nodes represent the final consequence (the predicted class). Decision trees are used to solve classification problems such as the following:
- Whether a person would click on a link or not
- Whether a person would buy a product or not
- Whether a person is suffering from diabetes or not
- Whether an existing customer would leave the service and opt for another service (customer churn problem)
The following are some of the different kinds of decision tree algorithms:
- CART models
- Conditional inference trees
- ID3 and C5.0
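To make the idea of a decision at a tree node concrete, here is a rough sketch of how a single split could be chosen on one numeric feature by minimizing weighted Gini impurity, the criterion used by CART-style trees. The churn data and the names `gini` and `best_split` are hypothetical; a full tree would apply this step recursively to each side of the split.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold t so that splitting on value <= t minimizes impurity."""
    n = len(values)
    best_threshold, best_impurity = None, float("inf")
    for threshold in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        if not left or not right:
            continue  # a split must send examples to both branches
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Hypothetical churn data: months since last purchase and whether the customer left
months = [1, 2, 3, 8, 9, 10]
churned = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_split(months, churned)
print(threshold, impurity)
```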
Support vector machines (SVM): SVMs are used to solve both classification and numerical problems. Briefly, an SVM classifier is a maximum-margin classifier that applies the kernel trick for non-linear classification. An example is predicting whether a person’s diagnostic report is positive or negative.
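The kernel trick mentioned above can be illustrated without training a full SVM. The sketch below assumes a degree-2 polynomial kernel on two-dimensional inputs (an arbitrary choice for illustration) and checks that the kernel value equals a dot product in an explicit higher-dimensional feature space, which is why the SVM never has to compute that space directly.

```python
import math


def poly_kernel(x, z):
    """Degree-2 polynomial kernel: k(x, z) = (x . z) ** 2."""
    dot = sum(a * b for a, b in zip(x, z))
    return dot ** 2


def feature_map(x):
    """Explicit feature map whose dot product reproduces poly_kernel in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, 4.0)

kernel_value = poly_kernel(x, z)
explicit_value = sum(a * b for a, b in zip(feature_map(x), feature_map(z)))

# Both routes give the same similarity (up to floating-point rounding),
# but the kernel route works only with the original 2-D vectors.
print(kernel_value, explicit_value)
```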
Artificial neural networks: Neural networks are used to solve both classification and numerical problems. There are different kinds of neural network configurations, such as the perceptron and the multi-layer perceptron.
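The simplest of these configurations, the perceptron, can be sketched in a few lines. The example below trains a single perceptron on the logical AND function using the classic perceptron update rule; the choice of task, the learning rate, and the epoch count are assumptions made for illustration.

```python
def perceptron_predict(weights, bias, x):
    """Fire (1) when the weighted sum of inputs is positive, otherwise 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def perceptron_train(data, lr=1.0, epochs=20):
    """Perceptron learning rule: nudge weights by the prediction error."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - perceptron_predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Truth table of logical AND, which is linearly separable
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = perceptron_train(and_data)
outputs = [perceptron_predict(weights, bias, x) for x, _ in and_data]
print(outputs)
```

A single perceptron can only learn linearly separable functions; a multi-layer perceptron stacks such units with non-linear activations to go beyond that.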
Random forests: Random forests are an ensemble learning method for classification, regression, and other tasks. They operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Read further on the Wikipedia page on random forests.
Nearest neighbours: Nearest neighbours, more precisely known as K-NN (k-nearest neighbours), is used to solve classification problems. The class for an input vector is chosen based on the majority class among its k nearest neighbours.
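The majority-vote rule can be sketched as follows, assuming Euclidean distance and k = 3 (both are illustrative choices). The labelled points and the function name `knn_predict` are hypothetical. Note that `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train is a list of (point, label) pairs; return the majority label
    among the k training points closest to the query."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labelled points forming two clusters
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]

print(knn_predict(train, (2, 2)), knn_predict(train, (6, 5)))
```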
Bagging: One of the popular ensemble methods, also known as bootstrap aggregation, bagging is a technique that takes a set of classifiers or regression models and creates a model with better performance, based on model averaging. Each model is trained on a different sample drawn with replacement from the training set. For classification, the models vote on the output and the majority is taken as the predicted class. For regression, each model predicts the output and the final output is the average of those predictions. The objective is to reduce overfitting, that is, high variance.
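The classification case can be sketched as below. The base learner here is a deliberately simple "nearest class mean" rule on one feature, chosen only to keep the example short; any classifier could take its place. The data, the number of models, the seed, and all function names are assumptions.

```python
import random
from collections import Counter


def fit_nearest_mean(sample):
    """Base learner (an assumption): remember the mean feature value per class."""
    sums, counts = {}, Counter()
    for x, label in sample:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_nearest_mean(model, x):
    """Predict the class whose stored mean is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


def bagging_fit(data, n_models=25, seed=0):
    """Train each model on its own bootstrap sample of the training set."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: len(data) draws with replacement
        sample = [rng.choice(data) for _ in data]
        models.append(fit_nearest_mean(sample))
    return models


def bagging_predict(models, x):
    """Majority vote over the individual models."""
    votes = Counter(predict_nearest_mean(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical one-feature data with two well-separated classes
data = [(1, "low"), (2, "low"), (3, "low"), (4, "low"),
        (10, "high"), (11, "high"), (12, "high"), (13, "high")]

models = bagging_fit(data)
print(bagging_predict(models, 2), bagging_predict(models, 12))
```

For regression, the vote in `bagging_predict` would be replaced by the average of the individual predictions.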
Boosting: One of the popular classes of ensemble methods, boosting is an approach to machine learning based on the idea of creating a highly accurate predictor by combining many weak learners. Boosting algorithms have enjoyed practical success in fields such as biology, vision, and speech processing.
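As one concrete instance of combining weak learners, the sketch below uses AdaBoost with decision stumps (one-threshold rules) on a single feature. The article does not name a specific boosting algorithm, so AdaBoost is an assumption here, as are the toy data and the function names. No single stump classifies this data perfectly, but a weighted vote of three stumps does.

```python
import math


def stump_predict(x, threshold, polarity):
    """polarity +1: predict +1 below the threshold and -1 above; -1 flips it."""
    return polarity if x < threshold else -polarity


def best_stump(xs, ys, weights):
    """Weak learner: the stump with the lowest weighted training error."""
    values = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost_fit(xs, ys, rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified examples, decrease the rest
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data with labels +1 / -1
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single_stump_error = best_stump(xs, ys, [0.1] * 10)[0]
ensemble = adaboost_fit(xs, ys, rounds=3)
predictions = [adaboost_predict(ensemble, x) for x in xs]
print(single_stump_error, predictions == ys)
```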

Author
Ajitesh Kumar

I have recently been working in the area of data analytics, including data science, machine learning / deep learning, and BI. I would love to connect with you on LinkedIn.
Check out my books, Designing Decisions and First Principles Thinking.