Machine Learning – How to Predict Software Developers' Productivity

This article presents my thoughts on how machine learning techniques could be used to address one of the most common problems in the software industry: determining whether a software developer is productive. Having tried to solve this problem with traditional, rules-based programming techniques, I can say that they offer no definitive way of arriving at a concrete answer. In fact, I created a tool, AgileSQM, to capture software quality metrics (SQM) such as code coverage, duplication, and complexity, and to infer from the trending data whether a software developer is productive. However, I soon hit a roadblock in getting the tool accepted by a wider audience, because many more features need to be captured and analyzed before anything can be inferred about a developer's productivity. Now that I am deep into machine learning, I have started to believe that machine learning algorithms could be used to predict software developers' productivity. Please feel free to comment or make suggestions if I have missed one or more important points. Also, sorry for the typos.

As I have been going deeper into data science in general, I have come to believe that measuring software developer productivity is a machine learning problem, specifically a binary classification problem (productive or not) that could be solved using a logistic regression algorithm. The following steps could be used to create a model that predicts whether a software developer is productive:

  • Identify the features that could be used to predict software developer productivity. Some of these features could be the following:
    • Number of story points/function points (this puts the problem complexity in context)
    • Problem-solving skills (great, decent, bad). Another approach could be to rate developers on a 100-point scale.
    • Level of participation in code reviews. The answer could again fall within a discrete range of values (active, inactive).
    • How communicative the developer is (both oral and written). This looks like a discrete value: high, medium (communicative at times), or low.
    • Developer's contribution to new initiatives (yes or no)
    • Code-related data for individual developers, such as code coverage and code complexity, which could be gathered from tools such as Sonar. Here, the focus may be on the delta (change) in each metric, for example the change in coverage (positive or negative) and the change in code complexity (positive or negative).
  • Gather the above data (for every feature) for every developer from key technology stakeholders, such as tech leads from different teams. For uniformity, one may need to establish a baseline for how to answer the above questions consistently.
  • Along with gathering the data, also have the stakeholders indicate whether the developer is productive or not; this becomes the label the model learns from. To avoid bias, there needs to be a set of baseline criteria that stakeholders could use to judge productivity.
  • Try to gather the above data every quarter, and continue this process for 6-8 quarters.
  • Create a machine learning model using the above data. This model could be improved further by feeding it new data after every quarter.
  • Use the model to predict the productivity of a developer by gathering data for the features mentioned above.
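
As a rough sketch of the steps above, the following Python code encodes a few survey answers as numbers and fits a logistic regression model using plain gradient descent. The records, the field order, the numeric encodings, and the scaling factors are all made up for illustration, and the training loop is only one simple way to fit such a model; in practice one would more likely rely on an established machine learning library and far more data.

```python
import math

# Hypothetical numeric encodings for the discrete survey answers.
PROBLEM_SOLVING = {"bad": 0.0, "decent": 0.5, "great": 1.0}
COMMUNICATION = {"low": 0.0, "medium": 0.5, "high": 1.0}


def encode(rec):
    """Convert one survey record (a tuple) into a numeric feature vector."""
    points, solving, review, comm, initiative, cov_delta, cx_delta = rec
    return [
        points / 100.0,                      # story points, scaled down
        PROBLEM_SOLVING[solving],            # problem-solving skills
        1.0 if review == "active" else 0.0,  # code review participation
        COMMUNICATION[comm],                 # communication level
        1.0 if initiative else 0.0,          # contributed to a new initiative
        cov_delta / 10.0,                    # change in code coverage
        cx_delta / 10.0,                     # change in code complexity
    ]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Estimated likelihood that the developer is productive."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(rows, labels, epochs=2000, lr=0.5):
    """Fit weights by batch gradient descent on the log loss."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y
            grad_b += error
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Made-up training records; the second element of each pair is the
# stakeholder's label (1 = productive, 0 = not productive).
SURVEY = [
    ((40, "great", "active", "high", True, 5, -2), 1),
    ((35, "decent", "active", "medium", True, 3, -1), 1),
    ((30, "great", "active", "high", False, 2, 0), 1),
    ((15, "bad", "inactive", "low", False, -4, 3), 0),
    ((20, "decent", "inactive", "low", False, -1, 2), 0),
    ((10, "bad", "inactive", "medium", False, -2, 1), 0),
]

rows = [encode(rec) for rec, _ in SURVEY]
labels = [label for _, label in SURVEY]
weights, bias = train(rows, labels)

# Score two new (also made-up) developers.
strong = encode((38, "great", "active", "high", True, 4, -1))
weak = encode((12, "bad", "inactive", "low", False, -3, 2))
p_strong = predict_proba(weights, bias, strong)
p_weak = predict_proba(weights, bias, weak)
print(f"{p_strong:.0%} likelihood that the developer is productive")
print(f"{p_weak:.0%} likelihood that the developer is productive")
```

With only six hand-written records the numbers themselves mean nothing; the point is the shape of the pipeline: encode the answers, fit on labeled quarters, then score a new developer.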

The following is what the response could look like when a new data set is fed to the model:

  • There is a 90% likelihood that the developer is productive.
  • There is a 55% likelihood that the developer is productive.
  • There is a 20% likelihood that the developer is productive.

Based on its own baseline, an organization could then choose a threshold above which a developer is classified as productive. For example, suppose your organization, ABC, sets the threshold at 60%: only those developers with a 60% or greater likelihood of being productive would be termed productive.
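
A minimal sketch of this thresholding step, using the three example likelihoods above and the 60% threshold of the hypothetical organization ABC, might look like the following. The developer names and the `classify` function are placeholders introduced only for illustration.

```python
def classify(probabilities, threshold=0.60):
    """Mark a developer as productive when the predicted likelihood
    is at or above the organization's threshold."""
    return {name: p >= threshold for name, p in probabilities.items()}


# Example likelihoods from the text, keyed by placeholder developer names.
predicted = {"dev_a": 0.90, "dev_b": 0.55, "dev_c": 0.20}
verdicts = classify(predicted)

for name, productive in verdicts.items():
    print(name, "productive" if productive else "not productive")
```

Here only the developer with the 90% likelihood clears the 60% threshold; an organization with a different baseline would simply pass a different `threshold` value.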

This is just a thought. I am preparing test data to see whether the above approach could work in real-world scenarios and help solve the problem of predicting software developers' productivity. In the meantime, please share your opinion on whether I am on the right track.

Ajitesh Kumar

I have recently been working in the area of data analytics, including data science and machine learning / deep learning. I am also passionate about different technologies, including programming languages such as Java/JEE, JavaScript, Python, R, and Julia, and technologies such as blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, and big data. I would love to connect with you on LinkedIn. Check out my latest book, First Principles Thinking: Building winning products using first principles thinking.
