In this post, you will learn about a very popular methodology called the Drivetrain approach, coined by Jeremy Howard. The approach provides a process for designing data products that deliver actionable outcomes using one or more machine learning models. It is useful for data scientists and machine learning enthusiasts at all levels, and it is a particularly good guide for data science architects, whose key responsibilities include designing data products. Without further ado, let’s do a deep dive.
Why the Drivetrain approach?
Before getting into the Drivetrain approach and its basic concepts, let’s understand why we need it in the first place.
Many times, you set out to create a model that solves only one piece of the business problem. Such a model is hard to use, because it is difficult to arrive at an actionable outcome from its predictions alone. What is needed is a framework for designing a system in which several predictive models are combined to answer what-if questions (a simulator), and in which the combination of input values that produces the most optimal actionable outcome is then selected. This is where the Drivetrain approach comes to the rescue.
What is the Drivetrain approach?
The Drivetrain approach is a four-step process for designing data products that leverage machine learning models. The four steps are as follows:
- Set the objective: First and foremost, set the business objective you want to achieve. The following are some examples of clear business objectives:
- Google search engine: Provide the most relevant search results for the user’s query
- Insurance price: Set the insurance price that maximizes the business benefit
- Recommendation engine: Provide users with the recommendations most likely to lead them to buy the item
- eCommerce: Set the most appropriate price for one or more items so that users purchase them
- Airline flight ticket: Set the most appropriate ticket price on any given day so that users purchase flight tickets
- Identify the levers: Which inputs to the system can be controlled? These inputs can be visualized as levers that can be pulled to influence the final outcome. Inputs are of the following two kinds:
- Controllable inputs or variables: The levers, or inputs to the system, that you can control. For example, for a search engine that must show the most relevant results, the lever that can be pulled to influence the search outcome is the ranking of the search results.
- Uncontrollable inputs or variables: Inputs to the system over which you have no control. For example, the search engine cannot control which query a user types.
- Collect the data: The next step is to determine what data you already have and what data you need to collect. Don’t limit yourself to the internal data you already hold; look for external datasets as well.
- Design and create the model assembly line: The final step is to design and create one or more predictive models and combine them so that they produce actionable outcomes. Which models you create depends on the objective you set in the first place, the levers you identified, and the data you have and collected. The following diagram shows the assembly line. Note that the assembly line has three parts:
- Modeler: The modeler is the set of one or more predictive models that provide predictions of different kinds. For example, in self-driving cars, the modeler comprises hundreds or even thousands of models.
- Simulator: The simulator combines the models so that you can ask “what if” questions and see how the levers affect the distribution of the final outcome. The simulator can produce multiple outcomes depending on the input (lever) values. In the case of self-driving cars, the simulator finds the outcomes that result from different values of the input levers.
- Optimizer: Finally, the optimizer identifies the combination of input levers that results in the best, most optimal outcome. It can also identify catastrophic outcomes and show how to avoid them. In the case of self-driving cars, the car must optimize over the simulated outcomes to pick the best combination of acceleration, braking, steering, and signaling.
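To make the modeler, simulator, and optimizer concrete, here is a minimal Python sketch for the eCommerce pricing objective listed earlier. It is an illustration under stated assumptions, not Jeremy Howard's implementation. The logistic purchase model, the normally distributed willingness to pay, the unit cost of 20, and all function names (`purchase_probability`, `simulate_profit`, `optimize_price`) are hypothetical stand-ins. In a real data product, the modeler would be one or more trained models. Here, price is the controllable lever and the customer's willingness to pay is the uncontrollable input.

```python
import math
import random
from statistics import mean

# MODELER (hypothetical): a hand-written stand-in for a trained model that
# predicts the probability a customer buys at a given price.
def purchase_probability(price: float, willingness: float) -> float:
    """Probability of purchase given the controllable lever (price) and an
    uncontrollable input (the customer's willingness to pay)."""
    return 1.0 / (1.0 + math.exp((price - willingness) / 5.0))

# SIMULATOR: answers "what if we set the price to X?" by sampling the
# uncontrollable input and returning a distribution of outcomes.
def simulate_profit(price: float, unit_cost: float = 20.0,
                    n_customers: int = 1000, seed: int = 0) -> list[float]:
    # Fixed seed: every candidate price is evaluated on the same customers.
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_customers):
        # Uncontrollable input; the normal distribution is an assumption.
        willingness = rng.gauss(50.0, 10.0)
        p_buy = purchase_probability(price, willingness)
        # Expected profit from this simulated customer.
        outcomes.append(p_buy * (price - unit_cost))
    return outcomes

# OPTIMIZER: tries each candidate lever value and keeps the one with the
# highest average simulated outcome (a simple grid search).
def optimize_price(candidates: list[float]) -> tuple[float, float]:
    best_price, best_profit = candidates[0], -math.inf
    for price in candidates:
        avg_profit = mean(simulate_profit(price))
        if avg_profit > best_profit:
            best_price, best_profit = price, avg_profit
    return best_price, best_profit

if __name__ == "__main__":
    prices = [float(p) for p in range(20, 81, 5)]
    price, profit = optimize_price(prices)
    print(f"best price: {price}, expected profit per customer: {profit:.2f}")
```

A grid search is the simplest possible optimizer. With many levers, you would typically swap in a proper optimization routine. Because every candidate price reuses the same random seed, each price is evaluated against the same simulated customers. As a result, differences in profit reflect the lever rather than sampling noise.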