How to Start a Big Data Practice

This article covers the key aspects of starting a Big Data practice in your organization. I have recently started working in this area, and this post is the result of my research. I hope you find it useful. Please feel free to comment or suggest additions if I have missed any important points. Also, apologies for any typos.

 

Big Data Center of Excellence (COE)

It may be a good idea to set up a Big Data Center of Excellence (COE) whose main objective would be to take a holistic approach to the following two key aspects of Big Data, covering activities such as setting up a team, evaluating tools and frameworks, and doing proofs of concept (POCs):

  • Data processing
  • Data analytics

Senior Management Commitment: One of the most important aspects of running a successful Big Data COE is senior management commitment (in the form of sponsorship) for a minimum of 1 to 1.5 years, which is roughly how long it takes for results to start showing. It is also important to hire a dedicated Big Data team of at least 2-3 people with expertise in both data processing and data analytics.

Big Data vis-a-vis Business Domains: Another important point is the business domain in which you want to do POCs. The idea is to pick one or two of the domains below and identify data use cases around which to build one or more POCs. The following are some of the business domains (verticals) to consider:

  • Online + Advertising
  • Telecommunications
  • Financial
  • Insurance
  • Healthcare & Biotechnology
  • Automotive
  • Retail & eCommerce
  • Industrial controls

Key aspects of Big Data COE: When setting up the COE, the following are the key areas to focus on:

  • Big Data Team
  • Big Data Lab
  • Proof-of-concepts (POCs)

Each of these three aspects of the Big Data COE is discussed below.

 

Big Data Team

When setting up a Big Data team, make sure it has a balance of staff with skill sets in the following areas:

  • Data processing: For data processing, the staff need to know (or be trained in) the following two areas:
    • Open-source frameworks such as Hadoop, HBase, Hive, Pig, Solr, etc.
    • Big Data platforms provided by different vendors
  • Data analytics: The team needs to include staff with expertise in different aspects of data science. This is the tricky part. A significant shortage of data scientists is predicted in the near future. Moreover, not everyone can simply decide to take up data science and become an expert at it: the field requires a person to be strongly analytical and good with algorithms.

Of the two, companies are finding it increasingly difficult to hire data scientists, although they are generally able to build a team with expertise in the Hadoop stack (data processing).

 

Big Data Lab

Once the team is in place, it is equally important to set up a Big Data lab, which would consist primarily of the following:

  • Hardware: Machines with considerably more RAM than usual for Big Data processing. When we started with Big Data, we quickly hit a roadblock because our usual laptops had only 8 GB of RAM.
  • Software: The open-source technology stack and commercial Big Data platforms.
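
To make the RAM point concrete, here is a rough sizing sketch in Python. It is only a back-of-the-envelope check, not a benchmark: the function name, the expansion factor (how much larger data tends to become once loaded into memory), and the usable fraction of RAM are all assumptions chosen for illustration, and you should replace them with numbers measured on your own workloads.

```python
def fits_in_memory(dataset_gb: float, ram_gb: float,
                   expansion_factor: float = 3.0,
                   usable_fraction: float = 0.75) -> bool:
    """Rough check of whether a dataset can be processed in RAM on one machine.

    expansion_factor and usable_fraction are illustrative assumptions,
    not measured values.
    """
    # Memory the dataset is assumed to need once loaded and processed.
    needed_gb = dataset_gb * expansion_factor
    # Memory assumed to be left after the OS and other processes take their share.
    usable_gb = ram_gb * usable_fraction
    return needed_gb <= usable_gb


if __name__ == "__main__":
    # Compare an 8 GB laptop with larger lab machines for a 5 GB dataset.
    for ram_gb in (8, 32, 128):
        print(f"{ram_gb} GB RAM:", fits_in_memory(dataset_gb=5, ram_gb=ram_gb))
```

Under these assumed factors, a 5 GB dataset does not fit on an 8 GB laptop but does fit on a 32 GB machine, which is the kind of gap a dedicated lab is meant to close.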

 

Big Data Proof-Of-Concepts (Case Studies)

Once the Big Data team and lab are set up, it is of utmost importance to identify a couple of proof-of-concept (POC) projects that you can showcase to potential customers. It is crucial to demonstrate to them that you have sufficient capability in Big Data processing and analytics to take on large projects.

Ajitesh Kumar

I have recently been working in the area of data analytics, including data science and machine learning / deep learning. I am also passionate about different technologies, including programming languages such as Java/JEE, JavaScript, Python, R, Julia, etc., and technologies such as blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data, etc. I would love to connect with you on LinkedIn. Check out my latest book, First Principles Thinking: Building winning products using first principles thinking.
