9 Most Common Machine Learning Tasks

This article describes some of the most common machine learning tasks that one may come across while solving a machine learning problem. Under each task is a set of machine learning methods that could be used to address it. Please feel free to comment or suggest if I missed one or more important points.

Following are the key machine learning tasks briefed later in this article:

  • Data preprocessing
  • Exploratory data analysis (EDA)
  • Feature engineering / selection
  • Training machine learning models of the following kinds:
    • Regression
    • Classification
    • Clustering
  • Multivariate querying
  • Density estimation
  • Dimension reduction
  • Model / Algorithm selection
  • Testing and matching

Following are the nine most common machine learning tasks that one could come across while solving an advanced analytics problem:

  1. Data Preprocessing: Before starting to train models, it is of utmost importance to prepare the data appropriately. Data preprocessing includes some of the following:
    • Data cleaning
    • Handling missing data
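A minimal sketch of these two preprocessing steps, assuming pandas and a hypothetical toy dataset (the column names and values are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with missing values and a duplicate row.
df = pd.DataFrame({
    "age":    [25, np.nan, 35, 35],
    "income": [50000, 60000, np.nan, np.nan],
})

# Data cleaning: drop exact duplicate rows.
df = df.drop_duplicates()

# Handling missing data: impute numeric columns with the column median.
df = df.fillna(df.median())
```

Real pipelines often use richer imputation (e.g. scikit-learn's `SimpleImputer`), but the idea is the same: no missing values should reach the model.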
  2. Exploratory Data Analysis: Once the data is preprocessed, the next step is to perform exploratory data analysis to understand the data distribution and the relationships within the data.
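A quick EDA pass might look like the following sketch, again with a hypothetical pandas dataset; `describe()` summarizes each column's distribution and `corr()` surfaces pairwise relationships:

```python
import pandas as pd

# Hypothetical dataset: price grows with size, so correlation should be high.
df = pd.DataFrame({
    "price": [100, 150, 200, 250],
    "size":  [50, 70, 90, 110],
})

summary = df.describe()   # per-column distribution (mean, std, quartiles, ...)
corr = df.corr()          # pairwise Pearson correlations
```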
  3. Feature Engineering / Selection: Feature selection is one of the critical tasks when building machine learning models. It is important because selecting the right features not only helps build models of higher accuracy but also helps achieve objectives such as building simpler models and reducing overfitting. The following techniques could be used for feature selection:
    • Filter methods, which select features based on the outcomes of statistical tests. Some of the statistical tests used are:
      • Pearson’s correlation
      • Linear discriminant analysis (LDA)
      • Analysis of Variance (ANOVA)
      • Chi-square tests
    • Wrapper methods, which select features by training models on subsets of features and comparing model accuracy. Some of the algorithms used are:
      • Forward selection
      • Backward elimination
      • Recursive feature elimination
    • Regularization techniques, which penalize one or more features appropriately to arrive at the most important features. Some of the algorithms used are:
      • LASSO (L1) regularization
      • Ridge (L2) regularization
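As an illustration of the filter-method approach, here is a sketch using scikit-learn's `SelectKBest` with ANOVA F-scores on synthetic data (the dataset shape and `k=3` are assumptions for the example):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 10 features, of which only 3 are informative.
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# Filter method: keep the 3 features with the strongest ANOVA F-scores.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)
```

The same `fit_transform` pattern applies to wrapper methods (e.g. `RFE`) and to regularization-based selection (e.g. `SelectFromModel` with a LASSO estimator).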
  4. Training Models:
    • Regression: Regression tasks deal with the estimation of numerical values (continuous variables). Examples include estimating housing prices, product prices, and stock prices. Some of the following ML methods could be used to solve regression problems:
      • Kernel regression (Higher accuracy)
      • Gaussian process regression (Higher accuracy)
      • Regression trees
      • Linear regression
      • Support vector regression
      • LASSO / Ridge
      • Deep learning
      • Random forests
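A minimal regression sketch in the housing-price spirit, using scikit-learn's `LinearRegression` on hypothetical size/price numbers:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy housing-style data: price grows linearly with size (hypothetical numbers).
X = np.array([[50], [70], [90], [110]])   # size
y = np.array([100.0, 140.0, 180.0, 220.0])  # price

model = LinearRegression().fit(X, y)
pred = model.predict(np.array([[80]]))    # estimate price for a size-80 home
```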
    • Classification: Classification tasks are about predicting the category of data (discrete variables). One of the most common examples is predicting whether an email is spam or ham. Common use cases are also found in healthcare, such as determining whether a person is suffering from a particular disease, and in finance, such as determining whether a transaction is fraudulent. ML methods such as the following could be applied to solve classification tasks:
      • Kernel discriminant analysis (Higher accuracy)
      • K-Nearest Neighbors (Higher accuracy)
      • Artificial neural networks (ANN) (Higher accuracy)
      • Support vector machine (SVM) (Higher accuracy)
      • Random forests (Higher accuracy)
      • Decision trees
      • Boosted trees
      • Logistic regression
      • Naive Bayes
      • Deep learning
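A spam/ham-flavored classification sketch with logistic regression; the single feature (a count of suspicious words) and the labels are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature: count of suspicious words per email.
X = np.array([[0], [1], [2], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1])   # 0 = ham, 1 = spam

clf = LogisticRegression().fit(X, y)
pred = clf.predict(np.array([[0], [9]]))   # classify two new emails
```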
    • Clustering: Clustering tasks are all about finding natural groupings in data and a label associated with each of these groupings (clusters). Common examples include customer segmentation and identifying product features for a product roadmap. Some common ML methods are:
      • Mean-shift (Higher accuracy)
      • Hierarchical clustering
      • K-means
      • Topic models
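A customer-segmentation-style clustering sketch with K-means; the two obvious groups in the toy data are an assumption for the example:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer data (e.g. spend vs. visits) with two clear segments.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 2.0],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_   # cluster label per customer
```

Note that cluster IDs are arbitrary: which group gets label 0 vs. 1 can vary, so downstream code should not rely on specific label values.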
  5. Multivariate querying: Multivariate querying is about querying for or finding similar objects. Some of the following ML methods could be used for such problems:
    • Nearest neighbors
    • Range search
    • Farthest neighbors
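Both nearest-neighbor and range (radius) queries are supported by scikit-learn's `NearestNeighbors`; the stored points and query below are hypothetical:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical stored objects in 2-D feature space.
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [5.5, 5.0]])

nn = NearestNeighbors(n_neighbors=2).fit(X)

# Nearest neighbors: the 2 objects most similar to the query point.
dist, idx = nn.kneighbors([[5.2, 5.0]])

# Range search: every object within radius 1.0 of the query point.
rdist, ridx = nn.radius_neighbors([[5.2, 5.0]], radius=1.0)
```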
  6. Density estimation: Density estimation problems deal with finding the likelihood or frequency of objects. In probability and statistics, density estimation is the construction of an estimate, based on observed data, of an unobservable underlying probability density function. Some of the following ML methods could be used to solve density estimation tasks:
    • Kernel density estimation (Higher accuracy)
    • Mixture of Gaussians
    • Density estimation tree
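A kernel density estimation sketch on a simple bimodal sample (the two modes and the bandwidth are assumptions for the example):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# 1-D sample drawn around two modes at 0 and 10.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 200), rng.normal(10, 1, 200)])[:, None]

kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X)

# score_samples returns log-density; the estimate near a mode (x=0) should
# exceed the estimate in the gap between modes (x=5).
log_dens = kde.score_samples(np.array([[0.0], [5.0]]))
```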
  7. Dimension reduction: As per the Wikipedia page on dimensionality reduction, dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction. The following ML methods could be used for dimension reduction:
    • Manifold learning/KPCA (Higher accuracy)
    • Principal component analysis
    • Independent component analysis
    • Gaussian graphical models
    • Non-negative matrix factorization
    • Compressed sensing
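A PCA sketch on 3-D data constructed (as an assumption for the example) to lie on a 2-D plane, so two components capture essentially all the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

# 3-D data whose third coordinate is a linear combination of the first two,
# i.e. the data actually lives on a 2-D plane.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 2))
X = np.column_stack([A[:, 0], A[:, 1], A[:, 0] + A[:, 1]])

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)   # project down to 2 dimensions
```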
  8. Model / Algorithm selection: Often, multiple models are trained using different algorithms. An important task is to select the most optimal model for deployment in production.
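One common way to choose among candidate models is cross-validated accuracy; the two candidate algorithms below are illustrative choices, not a recommendation:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Compare candidate models by 5-fold cross-validated accuracy, pick the best.
candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
}
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
```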
  9. Testing and matching: Testing and matching tasks relate to comparing data sets. The following methods could be used for such problems:
    • Minimum spanning tree
    • Bipartite cross-matching
    • N-point correlation
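As one example from this list, a minimum spanning tree can be computed with SciPy's `csgraph` module; the four 1-D point positions are hypothetical:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

# Pairwise distance matrix for 4 points on a line at positions 0, 1, 2, 10.
pos = np.array([0.0, 1.0, 2.0, 10.0])
dist = np.abs(pos[:, None] - pos[None, :])

mst = minimum_spanning_tree(dist)
total = mst.sum()   # MST edges: 0-1 (1), 1-2 (1), 2-3 (8) => total weight 10
```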

Ajitesh Kumar
