Machine learning is about learning one or more mathematical functions (models) from data in order to solve a particular task. Any machine learning problem can be described by three components.
Machine Learning Problem = < T, P, E >
In the above expression, T stands for task, P stands for performance and E stands for experience (past data). A machine learning model learns to perform a task from past data, and how well it does so is measured by a performance metric (for example, its error).
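The three components can be made concrete with a small sketch. Everything here is an illustrative assumption: the numbers are made up, the task is fitting a single slope, and mean squared error stands in for the performance measure.

```python
# Experience E: past observations as (input, target) pairs (made-up numbers).
experience = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]


def fit_slope(data):
    """Task T: learn w so that w * x approximates y (least squares through the origin)."""
    numerator = sum(x * y for x, y in data)
    denominator = sum(x * x for x, _ in data)
    return numerator / denominator


def mean_squared_error(w, data):
    """Performance P: average squared prediction error (lower is better)."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


# Learn from all of the experience, then from only half of it.
w = fit_slope(experience)
error_all = mean_squared_error(w, experience)
error_less = mean_squared_error(fit_slope(experience[:2]), experience)

print(f"slope={w:.3f} error_all={error_all:.5f} error_less={error_less:.5f}")
```

In this toy setup, the slope fitted on all four observations cannot do worse on those observations than the slope fitted on only two of them, which is one way to see performance on the task improving with experience.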
What are features in machine learning?
Features are the input variables of a machine learning model. Solving a specific machine learning problem involves choosing the set of features (variables), learning the coefficients (parameters) attached to those features, and setting the hyperparameters that control how the function is learned.
Features can be raw data, taken from the real world as is. However, not every problem can be solved with data in its original form. Often the data needs to be represented, or encoded, in a different form. For example, a color can be represented in RGB format or in HSV format, so the same color has two different representations (encodings). Each representation is useful for different kinds of problems: a task that is difficult with one representation can become easy with another. For example, the task “select all red pixels in the image” is simpler in the RGB format, whereas “make the image less saturated” is simpler in the HSV format.
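The two tasks mentioned above can be sketched with Python's standard colorsys module. The tiny list of pixels, the thresholds and the helper names is_red and desaturate are assumptions made for this illustration.

```python
import colorsys

# A tiny "image": a list of (r, g, b) pixels with channels in [0, 1].
pixels = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.9, 0.1, 0.1), (0.5, 0.5, 0.5)]


def is_red(pixel, high=0.8, low=0.2):
    """RGB makes this easy: a high red channel with low green and blue."""
    r, g, b = pixel
    return r >= high and g <= low and b <= low


def desaturate(pixel, factor=0.5):
    """HSV makes this easy: scale the S channel, then convert back to RGB."""
    h, s, v = colorsys.rgb_to_hsv(*pixel)
    return colorsys.hsv_to_rgb(h, s * factor, v)


red_pixels = [p for p in pixels if is_red(p)]
softer = [desaturate(p) for p in pixels]

print(red_pixels)
print(softer)
```

Selecting red pixels is a simple comparison on the RGB channels, while reducing saturation needs only one multiplication once the pixel is in HSV. Doing either task in the other representation would take noticeably more work.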
Machine-learning models are all about finding appropriate representations / features for their input data—transformations of the data that make it more amenable to the task at hand, such as a classification task.
In classical machine learning, it is the responsibility of data scientists to hand-craft useful representations (features) of the data. In deep learning, the feature representations are learned automatically by the underlying algorithm. One of the most important reasons why deep learning took off so quickly is that it completely automates what used to be the most crucial step in a machine-learning workflow: feature engineering.
The figure given below shows how hand-crafted representations (features) and raw data are used in building machine learning models.
The process of coming up with features, whether raw or derived, is called feature engineering.
Hand-crafted features are also called derived features.
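As a minimal sketch of deriving features by hand, the snippet below keeps one raw feature and derives two more from a timestamp. The records, the field names and the helper add_derived_features are hypothetical and exist only for this example.

```python
from datetime import datetime

# Hypothetical raw records, as they might arrive from a transaction log.
raw_rows = [
    {"timestamp": "2024-03-02 23:15:00", "amount": 120.0},
    {"timestamp": "2024-03-04 09:30:00", "amount": 35.5},
]


def add_derived_features(row):
    """Keep the raw amount and derive two hand-crafted features from the timestamp."""
    ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
    return {
        "amount": row["amount"],          # raw feature, used as is
        "hour": ts.hour,                  # derived: hour of the day (0-23)
        "is_weekend": ts.weekday() >= 5,  # derived: Saturday or Sunday
    }


features = [add_derived_features(row) for row in raw_rows]

for feature_row in features:
    print(feature_row)
```

A raw timestamp string is of little direct use to most models, whereas the hour of the day and a weekend flag expose patterns that a model can pick up.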
The subsequent step is to select the most appropriate features out of these candidates. This is called feature selection.
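One simple way to select features, shown here only as an illustrative heuristic and not as the only or best method, is to rank each candidate by the absolute value of its Pearson correlation with the target and keep the top few. The data, the feature names and the helper select_top_k are made up for this sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def select_top_k(candidates, target, k):
    """Return the names of the k features most correlated with the target."""
    ranked = sorted(
        candidates,
        key=lambda name: abs(pearson(candidates[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Made-up candidate features for five houses, and the price to predict.
candidates = {
    "size": [50, 60, 80, 100, 120],
    "rooms": [2, 2, 3, 4, 4],
    "noise": [3, 1, 4, 1, 5],
}
price = [150, 180, 240, 310, 360]

selected = select_top_k(candidates, price, k=2)
print(selected)
```

A correlation filter like this only captures linear relationships between a single feature and the target, so it is best seen as a first pass before more thorough selection methods.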