In this post, you will learn about interview questions that may be asked if you are going for a data science architect job. A data science architect needs knowledge of both data science/machine learning and cloud architecture. In addition, it helps if the person is hands-on with programming languages such as Python and R. Without further ado, let's get into some of the common questions right away. I will add more questions over time.
Q. How do you go about architecting a data science or machine learning solution for any business problem?
- Set the objective: The objective represents the business outcome that needs to be achieved
- Identify the levers: The levers represent the inputs to the system that can influence the business outcome. These inputs include variables that can be controlled (levers that can be pulled) and variables that cannot be controlled.
- Collect the data: The next step is to determine what data you already have and what you would need to collect.
- Design one or more models and combine them as the solution: Once the objective, input levers, and data are set, the final step is to design one or more models whose predictions can be combined into a solution comprising a modeler, a simulator, and an optimizer.
In relation to the above, see the related post on the drivetrain approach for machine learning, which also walks through examples of designing machine learning solutions using that approach.
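The modeler/simulator/optimizer decomposition above can be sketched in a few lines of Python. This is a toy illustration, not a real solution: the objective is revenue, the lever is price, and the "model" is a stand-in linear demand curve where a trained ML model would sit in practice.

```python
def demand_model(price):
    """Modeler: predict units sold at a given price.

    A hypothetical linear demand curve stands in for a trained model.
    """
    return max(0.0, 100.0 - 2.0 * price)

def simulate_revenue(price):
    """Simulator: combine the model's prediction with the business objective."""
    return price * demand_model(price)

def optimize_price(candidate_prices):
    """Optimizer: pull the lever (price) that maximizes the simulated objective."""
    return max(candidate_prices, key=simulate_revenue)

best_price = optimize_price([10, 20, 25, 30, 40])
```

The same three-layer structure carries over to real problems; only the model and the search over lever settings become more sophisticated.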
Q. How would you go about deploying a machine learning model in the cloud and serve predictions through APIs?
Here are the steps for deploying machine learning models in the cloud. The points below describe a couple of options for deploying models on the Amazon cloud (AWS).
- Deployment using Python Flask App
- Deploy the model file (say, a Python pickle file) to Amazon S3 storage.
- Create a Python Flask-based app that loads the model to serve predictions. The Flask app can be containerized with Docker and deployed using Amazon Elastic Container Service (ECS).
- Expose the Flask app through a REST API. The REST API can be exposed using Amazon API Gateway.
- Deployment using Amazon Sagemaker
- Train the model using Amazon SageMaker Studio.
- Deploy the trained model to a SageMaker endpoint directly from within SageMaker. The endpoint can then be invoked from an AWS Lambda function, fronted by Amazon API Gateway, to serve predictions.
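A Lambda handler that forwards requests to a SageMaker endpoint might look like the sketch below. The endpoint name is a hypothetical placeholder, and the runtime client is injectable purely to make the handler testable without AWS access.

```python
import json

def lambda_handler(event, context, runtime_client=None):
    """Forward a prediction request to a SageMaker endpoint.

    `runtime_client` defaults to the real boto3 client; it is a parameter
    only so the handler can be exercised with a stub in tests.
    """
    if runtime_client is None:
        import boto3  # imported lazily; only needed when running on AWS

        runtime_client = boto3.client("sagemaker-runtime")

    payload = json.dumps(event["features"])
    response = runtime_client.invoke_endpoint(
        EndpointName="my-model-endpoint",  # hypothetical endpoint name
        ContentType="application/json",
        Body=payload,
    )
    prediction = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```

API Gateway would map the incoming HTTP request to the `event` dict, so the client never talks to the SageMaker endpoint directly.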
Q. What will be your governance strategy for machine learning-based solutions?
The governance strategy for machine learning-based solutions is about measuring the performance of the models and taking appropriate actions when model accuracy dips below a particular threshold. One can use a Red-Amber-Green (RAG) system. Note that the accuracy thresholds mentioned below are hypothetical and can vary based on your requirements.
- If the model accuracy is above 85% or so, the model can be tagged Green. No action is needed.
- If the model accuracy stays in the range of, say, 70-85%, the model can be tagged Amber. One should examine the reason for the dip in accuracy and take appropriate action, such as re-training the model.
- If the model accuracy dips below 70%, the model can be tagged Red. In this case, the model should be replaced with the last best-performing model, an alternate rules-based solution should be deployed, or there should be a provision for exception handling.
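The RAG scheme reduces to a simple threshold check that a monitoring job could run on each accuracy measurement. The default thresholds below mirror the hypothetical ones above and would be tuned per model.

```python
def model_status(accuracy, green_threshold=0.85, amber_threshold=0.70):
    """Classify a model's health under hypothetical RAG thresholds.

    Green: no action needed; Amber: investigate and consider re-training;
    Red: fall back to the last good model or a rules-based solution.
    """
    if accuracy >= green_threshold:
        return "Green"
    if accuracy >= amber_threshold:
        return "Amber"
    return "Red"
```

In practice this check would run on a schedule against fresh evaluation data, with Amber and Red statuses raising alerts to the team.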
Q. Talk about a cloud-based platform that the data science team could use for training machine learning models.
One can design the data science workbench using Amazon SageMaker Studio (an IDE for machine learning). It is a great tool and provides a cost-effective platform for training machine learning models. The best part is that it integrates easily with a data lake on Amazon S3. Other cloud platforms, such as Azure and Google Cloud, offer comparable options.