
How to Start a Big Data Practice

This article covers the key aspects of starting a Big Data practice in your organization. I have recently started working in this area, and this post is the result of my research. I hope you find it useful. Please feel free to comment or suggest anything important that I have missed.

 

Big Data Center of Excellence (COE)

It may be a good idea to plan around setting up a Big Data Center of Excellence (COE) whose main objective would be to take a holistic approach to the following two key aspects of Big Data, covering activities such as setting up a team, evaluating tools and frameworks, and doing proofs of concept (POCs):

  • Data processing
  • Data analytics

Senior Management Commitment: One of the most important aspects of running a successful Big Data COE is senior management commitment (in the form of sponsorship) for a minimum of 1-1.5 years, which is how long it takes for results to start showing. It is also important to hire a dedicated Big Data team of at least 2-3 staff with expertise in both data processing and data analytics.
Big Data vis-a-vis Business Domains: Another important point is the business domain in which you want to do POCs. The idea is to pick one or two of the domains below and identify data use cases around which to do one or more POCs. The following are some of the business domains (verticals) for your consideration:

  • Online + Advertising
  • Tele-communications
  • Financial
  • Insurance
  • Healthcare & Biotechnology
  • Automotive
  • Retail & eCommerce
  • Industrial controls

Key aspects of Big Data COE: As part of setting up the COE, the following are the key areas to focus on:

  • Big Data Team
  • Big Data Lab
  • Proof-of-concepts (POCs)

Each of these three aspects of the Big Data COE is discussed below.

 

Big Data Team

While setting up a team for Big Data, pay attention to the fact that the team needs a balance of staff with skill sets in the following areas:

  • Data processing: For data processing, the staff would need to know (or get trained in) the following two areas:
    • Open-source frameworks such as Hadoop, HBase, Hive, Pig, Solr, etc.
    • Big Data platforms provided by different vendors
  • Data analytics: The team needs to include staff with expertise in different aspects of data science. This is quite a tricky one. It is predicted that there is going to be a great shortage of data scientists in the near future. Having said that, not everyone can simply decide to take up data science and become an expert at it. Several aspects of data science require a person to be quite analytical and good at algorithms.

Of the two, data scientists are becoming the harder to find, whereas companies are generally able to build a team with expertise in the Hadoop stack (data processing).
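For team members who are new to the data processing side, it may help to see the map/shuffle/reduce idea behind Hadoop's MapReduce in miniature. The following is only a single-machine sketch in plain Python, not Hadoop code; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are made up for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word, as a mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group the emitted values by key, mimicking the shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


# Hypothetical sample input, for illustration only.
sample = ["big data needs big tools", "data tools"]
counts = word_count(sample)
print(counts)
```

On a real cluster the map and reduce steps run in parallel across many machines and the framework handles the shuffle; the sketch only shows the shape of the computation.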

 

Big Data Lab

Once the team is taken care of, it is equally important to set up a Big Data lab, which would consist primarily of the following:

  • Hardware with considerably more RAM than usual for Big Data processing. When we started with Big Data, we soon hit a roadblock because our usual laptops had only 8 GB of RAM.
  • Software consisting of the open-source technology stack and commercial Big Data platforms.

 

Big Data Proof-Of-Concepts (Case Studies)

Once you are set up with a Big Data team and lab, it is of utmost importance to identify a couple of proof-of-concept (POC) projects that you can showcase to your potential customers. This is crucial for demonstrating to potential customers that you have enough capability in Big Data processing and analytics to take on large projects.

Ajitesh Kumar

