As I go deeper into data science, I am starting to believe that measuring software developer productivity can be framed as a machine learning problem. More specifically, it looks like a binary classification problem (productive or not) that could be tackled with logistic regression. The following steps outline how one might create a model that predicts whether a software developer is productive:
- Identify the features that could be used to predict software developer productivity. Some candidate features:
  - Number of story points or function points delivered (this puts the problem complexity in context).
  - Problem-solving skills (great, decent, bad). Another approach could be to rate developers on a scale of 1 to 100.
  - Level of participation in code reviews. The answer could again be a discrete value (active, inactive).
  - How communicative the developer is, both orally and in writing. This is also a discrete value: high, medium, or low.
  - Contribution towards new initiatives (yes or no).
  - Code-related data for each developer, such as code coverage and code complexity, which could be gathered from tools such as Sonar. Here the focus would be on the delta (change) over the period: for example, change in coverage (positive or negative) and change in code complexity (positive or negative).
- Gather the above data (a value for every feature) for every developer from key technology stakeholders, such as the tech leads of different teams. For uniformity, one may need to agree on a baseline for how to answer each question so that responses are consistent across stakeholders.
- Along with gathering this data, have the stakeholders state whether each developer is productive or not. This becomes the label the model learns from. To reduce bias, there needs to be a set of baseline criteria that stakeholders use to make this judgment.
- Gather this data every quarter and continue the process for 6-8 quarters.
- Train a machine learning model on the collected data. The model could be improved further by retraining it with fresh data after every quarter.
- Use the model to predict the productivity of a developer by gathering values for the same features described above.
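The steps above can be sketched in code. This is a minimal illustration under stated assumptions, not a definitive implementation: the field names (`story_points`, `problem_solving`, and so on), the ordinal encodings, the scaling constants, and the six training records are all made up for illustration. The model is a small hand-written logistic regression trained with gradient descent so that the example needs only the standard library; in practice a library implementation such as scikit-learn's `LogisticRegression` would likely be the more sensible choice.

```python
import math

# Hypothetical ordinal encodings for the discrete ratings described above.
SKILL = {"bad": 0, "decent": 1, "great": 2}
COMMUNICATION = {"low": 0, "medium": 1, "high": 2}


def encode(record):
    """Turn one stakeholder response into a numeric feature vector."""
    return [
        record["story_points"] / 100.0,                # scaled to a small range
        SKILL[record["problem_solving"]] / 2.0,
        1.0 if record["code_reviews"] == "active" else 0.0,
        COMMUNICATION[record["communication"]] / 2.0,
        1.0 if record["new_initiatives"] else 0.0,
        record["coverage_delta"] / 10.0,               # change in coverage
        record["complexity_delta"] / 10.0,             # change in complexity
    ]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Likelihood that the developer described by x is productive."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, label in zip(X, y):
            error = predict_proba(weights, bias, x) - label
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def make(sp, skill, reviews, comm, init, cov, cx):
    """Build one record; argument order mirrors the feature list."""
    return {"story_points": sp, "problem_solving": skill,
            "code_reviews": reviews, "communication": comm,
            "new_initiatives": init, "coverage_delta": cov,
            "complexity_delta": cx}


# Invented toy data: NOT real measurements, only here to make the sketch run.
records = [
    make(40, "great", "active", "high", True, 5, -2),
    make(35, "decent", "active", "medium", True, 2, 0),
    make(30, "great", "active", "medium", False, 3, -1),
    make(15, "bad", "inactive", "low", False, -4, 3),
    make(20, "decent", "inactive", "low", False, -1, 2),
    make(10, "bad", "inactive", "medium", False, -3, 1),
]
labels = [1, 1, 1, 0, 0, 0]  # stakeholder verdict: 1 = productive

weights, bias = fit([encode(r) for r in records], labels)

# Score two new (also invented) developers.
strong = make(38, "great", "active", "high", True, 4, -1)
weak = make(12, "bad", "inactive", "low", False, -4, 3)
p_strong = predict_proba(weights, bias, encode(strong))
p_weak = predict_proba(weights, bias, encode(weak))
print(f"strong: {p_strong:.0%} likely productive")
print(f"weak:   {p_weak:.0%} likely productive")
```

With only a handful of records the fitted weights mean very little; the point is the shape of the pipeline: encode the survey answers, fit on labeled quarters, then ask for a likelihood on new data.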
Given new data, the model's responses would look like the following:
- There is a 90% likelihood that the developer is productive.
- There is a 55% likelihood that the developer is productive.
- There is a 20% likelihood that the developer is productive.
Based on its own baseline, an organization could then choose a threshold above which a developer is called productive. For example, suppose your organization, ABC, sets the threshold at 60%: only those developers with a 60% or greater likelihood of being productive would be termed productive.
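As a small sketch of that thresholding step, the snippet below applies a cut-off to the three example likelihoods. The function name `classify`, the developer names, and the second, more lenient 50% threshold are placeholders introduced only for illustration.

```python
def classify(likelihoods, threshold=0.60):
    """Label each developer by comparing the predicted likelihood to a threshold."""
    return {
        name: "productive" if p >= threshold else "not productive"
        for name, p in likelihoods.items()
    }


# The three example likelihoods from above; the developer names are placeholders.
likelihoods = {"dev_a": 0.90, "dev_b": 0.55, "dev_c": 0.20}

labels_abc = classify(likelihoods)            # organization ABC's 60% threshold
labels_lenient = classify(likelihoods, 0.50)  # hypothetical, more lenient threshold

print(labels_abc)
print(labels_lenient)
```

The developer at 55% flips from "not productive" to "productive" when the threshold drops from 60% to 50%, which shows how much the final label depends on the organization's chosen baseline rather than on the model alone.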
This is just a thought. I am preparing test data to see whether this approach could work in real-world scenarios and help with predicting software developer productivity. In the meantime, please share your opinion on whether I am on the right track.