
How to Start a Big Data Practice

This article covers key aspects of starting a Big Data practice in your organization. I have recently started working in this area, and this post is the result of my research. I hope you find it useful. Please feel free to comment or suggest additions if I have missed any important points.

 

Big Data Center of Excellence (COE)

It may be a good idea to set up a Big Data Center of Excellence (COE) whose main objective would be to take a holistic approach to the following two key aspects of Big Data, from perspectives such as setting up a team, evaluating tools and frameworks, and doing POCs:

  • Data processing
  • Data analytics

Senior Management Commitment: One of the most important aspects of running a successful Big Data COE is senior management commitment (in the form of sponsorship) for a minimum of 1-1.5 years, so that results have time to start showing up. It is also important to hire a dedicated Big Data team of at least 2-3 staff with expertise in both data processing and data analytics.

Big Data vis-a-vis Business Domains: Another important point is the business domain in which you want to do POCs. The idea is to pick one or two of the domains below and identify data use cases around which to do one or more POCs. The following are some business domains (verticals) for your consideration:

  • Online + Advertising
  • Telecommunications
  • Financial
  • Insurance
  • Healthcare & Biotechnology
  • Automotive
  • Retail & eCommerce
  • Industrial controls

Key aspects of Big Data COE: When setting up the COE, the following are the key areas to focus on:

  • Big Data Team
  • Big Data Lab
  • Proof-of-concepts (POCs)

Each of these three aspects of the Big Data COE is discussed below.

 

Big Data Team

When setting up a Big Data team, make sure the team has a balance of staff with skill sets in the following areas:

  • Data processing: For data processing, the staff would need to know (or get trained in) the following two areas:
    • Open-source frameworks such as Hadoop, HBase, Hive, Pig, Solr, etc.
    • Big Data platforms provided by different vendors
  • Data analytics: The team needs to include staff with expertise in different aspects of data science. This is quite a tricky one. A great shortage of data scientists is predicted in the near future. At the same time, not everyone can simply decide to take up data science and become an expert at it, because several aspects of data science require a person to be quite analytical and good at algorithms.

Of these two, data analytics is the harder one to staff: companies are finding it difficult to hire data scientists, although they are able to build teams with expertise in the Hadoop stack (data processing).

 

Big Data Lab

Once the team is taken care of, it is equally important to set up a Big Data lab, which would consist primarily of the following:

  • Hardware/Boxes with considerably more RAM than usual for Big Data processing. When we started with Big Data, we hit a roadblock early because of the 8 GB RAM limit of our usual laptops.
  • Software consisting of an open-source technology stack and commercial Big Data platforms.

 

Big Data Proof-Of-Concepts (Case Studies)

Once you are set up with a Big Data team and lab, it is of utmost importance to identify a couple of proof-of-concept (POC) projects that you can showcase to your potential customers. This is crucial for demonstrating to potential customers that you have enough capability in Big Data processing and analytics to take on large projects.

Ajitesh Kumar

