Data science projects pass through several lifecycle stages on the way to becoming successful. At each stage, different stakeholders get involved, much as in a traditional software development lifecycle.
In this post, you will learn some of the key stages/milestones of the data science project lifecycle. This article is intended to help the following project stakeholders, who play key roles in implementing data science projects:
- Product managers
- Project managers
- ML architects
The data science project lifecycle has six high-level stages:
- Planning
- Model development & testing
- Product-level changes
- Model deployment
- Monitoring the model
- Model Enhancement
Data Science Project Lifecycle – Planning
- ML problem identification: First and foremost, product managers work with business executives, sales & marketing, and CSR executives to identify problems that can be solved using machine learning techniques.
- Stakeholders: Product manager, business stakeholders, CSR, sales & marketing executives
- Deliverables: Problem statement
- Requirements elicitation: In this phase, the product managers work with machine learning (ML) architects to describe the problem at length and analyze whether machine learning can really solve it. In other words, they decide whether the problem is a valid machine learning problem. The null and alternative hypotheses are also formulated in this phase.
- Stakeholders: Product manager, ML architects
- Deliverables: Product requirement specification
- Analysis & design phase: In this phase, the product manager, project manager, and ML architect drill down further and decide on items such as the business value proposition, delivery prioritization, architecture, and design.
- Stakeholders: Product manager, ML architect, project manager
- Deliverables: Technical design specification (TDS), which includes details such as:
- Initial set of features
- Machine learning algorithm to be used
- Development and testing procedures
- Architecture related details
- Project planning: In this phase, the product manager, project manager, and ML architect discuss and decide on items such as project timelines, project resources, and release planning.
- Stakeholders: Product manager, ML architect, project manager
- Deliverables: Project plan
Data Science Project Lifecycle – Model Development & Testing/Evaluation
- Feature identification/engineering: To start with, the ML architect and the machine learning engineer or data scientist work with the product manager to identify the initial set of features. These features are then processed further as part of feature engineering to shortlist the most suitable ones.
- Manage/process data: After feature identification, the following steps take place as part of data processing:
- Data gathering: Data is gathered from different data sources.
- Data preparation: Data is prepared appropriately as per the features.
- Train ML models: In this phase, different models are created based on different machine learning algorithms. The data is split into a training set and a test set; the models are trained on the training set and tested on the test set.
- Evaluate ML models: Once a model is built, the next step is to evaluate it in terms of accuracy/error. Techniques such as the confusion matrix are used for this. Different algorithms have different ways of measuring accuracy (a confusion matrix, for example, applies to classification models), so the evaluation technique is chosen to match the model.
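The training and evaluation steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is made up, the "model" is a toy one-feature threshold classifier rather than a real machine learning algorithm, and all function names (`train_test_split`, `train_threshold_model`, `confusion_matrix`, `accuracy`) are hypothetical helpers written for this example, not a particular library's API.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle (feature, label) rows and split them into train and test sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train_threshold_model(train_rows):
    """Toy model: a cut-off halfway between the mean feature value of each class."""
    positives = [x for x, label in train_rows if label == 1]
    negatives = [x for x, label in train_rows if label == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def predict(threshold, x):
    """Predict class 1 when the feature value reaches the learned cut-off."""
    return 1 if x >= threshold else 0


def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["tn" if actual == 0 else "fn"] += 1
    return counts


def accuracy(cm):
    """Share of predictions that were correct."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


# Made-up data: one numeric feature, label 1 for the larger values.
rows = [(x, 0) for x in range(20)] + [(x, 1) for x in range(20, 40)]
train_rows, test_rows = train_test_split(rows)
threshold = train_threshold_model(train_rows)

# Evaluate only on the held-out test set, never on the training rows.
y_true = [label for _, label in test_rows]
y_pred = [predict(threshold, x) for x, _ in test_rows]
cm = confusion_matrix(y_true, y_pred)
print(cm, accuracy(cm))
```

In a real project you would most likely replace the toy model and helpers with a machine learning library, but the flow stays the same: split, train on one part, evaluate on the other.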
Data Science Project Lifecycle – Product-level Changes
- Product-level changes: Once deployed in production, a data science model is consumed by one or more product features. The product therefore needs to be changed accordingly (UI, APIs) so that it can prepare the data, send it to the model, receive the prediction in response, and show the output to the end customer.
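As a rough sketch of what such a product-side change might look like, the snippet below walks through the four steps: prepare the data, send it to the model, read the prediction, and format it for the customer. Everything here is an assumption made for illustration: the churn use case, the field names (`age`, `monthly_spend`, `score`), and the `fake_model_endpoint` stand-in, which replaces a real network call so the example can run on its own.

```python
import json


def prepare_features(raw_form):
    """Convert raw UI input (strings) into the numeric features the model expects."""
    return {
        "age": int(raw_form["age"]),
        "monthly_spend": float(raw_form["monthly_spend"]),
    }


def request_prediction(features, send):
    """Serialize the features, send them to the model, and parse the reply.

    `send` is any callable that takes a JSON request body and returns a JSON
    response body; in production it would wrap an HTTP call to the model API.
    """
    response_body = send(json.dumps({"features": features}))
    return json.loads(response_body)


def render_result(prediction):
    """Turn the raw prediction into the text shown to the end customer."""
    label = "likely to churn" if prediction["score"] >= 0.5 else "unlikely to churn"
    return f"Customer is {label} (score {prediction['score']:.2f})"


def fake_model_endpoint(request_body):
    """Hypothetical stand-in for the deployed model (no network involved)."""
    features = json.loads(request_body)["features"]
    score = min(1.0, features["monthly_spend"] / 200.0)
    return json.dumps({"score": score})


form_input = {"age": "42", "monthly_spend": "150"}
features = prepare_features(form_input)
prediction = request_prediction(features, fake_model_endpoint)
print(render_result(prediction))
```

Passing the transport in as the `send` argument keeps the product code testable without a running model service.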
Data Science Project Lifecycle – Model Deployment
In this phase, ML models are deployed into production.
- Deploy models: Generally, the model is built using the R or Python programming language. However, it may be deployed using a different technology or programming language. This is where the data science team needs to work with the model deployment team to move the models into production.
- Make predictions: Once the model is deployed into production, the product calls the model through APIs to make predictions.
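One possible way to hand a model over to a deployment team that uses a different technology is to export its learned parameters in a language-neutral format. The sketch below assumes a simple linear classifier and uses JSON for the hand-off; the field names, the `version` key, and the functions `export_model`, `load_model`, and `predict` are all hypothetical and chosen only for this example.

```python
import json


def export_model(weights, bias, feature_names):
    """Data science side: write the learned parameters as a JSON document."""
    return json.dumps(
        {"version": 1, "features": feature_names, "weights": weights, "bias": bias}
    )


def load_model(serialized):
    """Serving side: read the parameters back (any language with JSON can do this)."""
    return json.loads(serialized)


def predict(model, features):
    """Score one record with the linear model and return the predicted class."""
    score = model["bias"] + sum(
        weight * features[name]
        for weight, name in zip(model["weights"], model["features"])
    )
    return 1 if score >= 0 else 0


# Made-up parameters standing in for a trained model.
artifact = export_model([2.0, -1.0], -1.0, ["a", "b"])
model = load_model(artifact)
print(predict(model, {"a": 1.0, "b": 0.5}))
```

In production, `predict` would typically sit behind the API that the product calls.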
Data Science Project Lifecycle – Monitoring the Model
- Monitor predictions: The model deployed in production needs to be monitored regularly to check its prediction accuracy. This helps in deciding whether to retrain the model with a new data set or a different feature set.
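A minimal sketch of such monitoring is shown below. It assumes that the actual outcome for each prediction eventually becomes known, and it tracks accuracy over a sliding window of the most recent predictions. The class name `AccuracyMonitor`, the window size, and the accuracy threshold are illustrative assumptions, not recommended values.

```python
from collections import deque


class AccuracyMonitor:
    """Track prediction accuracy over the most recent `window_size` outcomes."""

    def __init__(self, window_size=100, min_accuracy=0.8):
        # A bounded deque drops the oldest outcome once the window is full.
        self.outcomes = deque(maxlen=window_size)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether a single production prediction turned out correct."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded yet."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag retraining only once the window is full and accuracy is too low."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.min_accuracy


monitor = AccuracyMonitor(window_size=4, min_accuracy=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 1), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.needs_retraining())
```

Waiting for a full window before raising the flag avoids reacting to the first few predictions after a deployment.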
Data Science Project Lifecycle – Model Enhancement
- Feedback/Retrain model: If the model needs to be retrained, the following needs to happen:
- Feature engineering to select a new set of features
- Repeat the earlier stages: develop and test the retrained model, evaluate it, and finally deploy it into production.
Summary
In this post, you learned about the different phases of the data science project lifecycle.
Did you find this article useful? Do you have any questions or suggestions about the data science project lifecycle? Leave a comment with your questions and I will do my best to address them.