Big Data & Predictive Modelling

Mention big data, and the first thing that comes to an engineer's mind is Hadoop and related technologies. What many developers working on Big Data miss time and again is the skill of reading, understanding, and learning from the data, and of designing algorithms that meet objectives such as derivations and predictions.

Predictive Modelling

One of the key aspects of data science, and one that is equally key to Big Data, is predictive modelling. I wanted to do some quick research and develop an understanding of this topic. While researching, however, I found that the topic includes some complex underlying mathematical models, which most software engineers are likely to find hard to follow.

Let's try to understand the basics of predictive modelling.

Predictive modelling is a process in which a model is created or used to predict the probability of an outcome based on a set of input data. Let's take a very simple example. Companies publish their plans to set up one or more plants or factories in a certain region. This information can be used to predict that more jobs are going to be created in that region. That prediction can in turn be used to predict the money liquidity in the region, which leads to further investments of different sorts, such as real estate, hospitals, and schools. Businesses can use these predictions to plan their own investments in that region.
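To make "predict the probability of an outcome based on a set of input data" a little more concrete, here is a minimal sketch using logistic regression, one of the models listed later in this post. Everything in it is an assumption made for illustration: the toy numbers are invented (they are not real regional data), and the names fit_logistic and predict_probability are hypothetical helpers, not part of any library.

```python
import math

def sigmoid(z):
    # Squash any real number into the (0, 1) range so it reads as a probability.
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    # Fit p(y=1 | x) = sigmoid(w * x + b) with plain gradient descent
    # on the average log-loss. Single input feature, for simplicity.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_probability(w, b, x):
    # Probability of the outcome (here: job growth) for input x.
    return sigmoid(w * x + b)

# Invented toy data (assumption): number of plants announced in a region,
# and whether jobs grew there afterwards (1) or not (0).
plants_announced = [0, 0, 1, 1, 2, 2, 3, 4, 5]
jobs_grew = [0, 0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic(plants_announced, jobs_grew)
p_none = predict_probability(w, b, 0)
p_many = predict_probability(w, b, 5)
print(f"P(job growth | 0 plants) = {p_none:.2f}")
print(f"P(job growth | 5 plants) = {p_many:.2f}")
```

The point of the sketch is only the shape of the process: historical inputs and outcomes go in, a fitted model comes out, and that model returns a probability for a new input. A real project would use many more features and a tested library rather than a hand-written loop.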

Recently, I have been working on a project where the objective is to come up with different models to predict growth in a region based on investments. I have also been researching different models to predict a company's growth and its next moves based on its past and present investments.

Predictive modelling can be done with a number of different models. Some of those listed on the Wikipedia page on predictive modelling are given below; I shall be detailing them in due course:

Group method of data handling
Naive Bayes
Majority Classifier
Support vector machines
Logistic regression
K-nearest neighbor algorithm
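As a taste of what one of these models looks like in practice, below is a rough sketch of the k-nearest neighbor idea: a new data point gets the label that is most common among the k training points closest to it. The regions, feature values, and labels are invented for illustration, and knn_predict is a hypothetical helper name, not a library function.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of (features, label) pairs; query: a feature tuple.
    # Sort training points by Euclidean distance to the query, keep the
    # k closest, and return the most common label among them.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented toy data (assumption): (investment in arbitrary units,
# number of plants announced) -> what happened in the region afterwards.
regions = [
    ((1.0, 0), "flat"),
    ((1.5, 0), "flat"),
    ((2.0, 1), "flat"),
    ((4.0, 2), "growth"),
    ((5.0, 3), "growth"),
    ((6.0, 3), "growth"),
]

print(knn_predict(regions, (4.5, 3)))
print(knn_predict(regions, (1.2, 0)))
```

Note that k-nearest neighbor has no training step at all; it simply looks up similar past cases. That makes it easy to understand, but the features need to be on comparable scales for the distance to be meaningful.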

Big Data, known for its four V's (volume, variety, velocity, and veracity), is certainly a candidate for predictive modelling. Software service providers that claim Big Data expertise but lack the expertise to work with the data itself may not add much value to Big Data implementation projects.

Author

Ajitesh Kumar
