Tag Archives: big data

CEP vs Traditional Database Examples

February 2, 2024 by Ajitesh Kumar · Leave a comment

In this blog, we will learn about the differences between complex event processing (CEP) and traditional database querying with the help of examples. We will learn about how these two methodologies tackle data to extract meaningful insights but in fundamentally different ways. In complex event processing, data flows dynamically which is then matched with pre-defined patterns thereby generating insights in real-time. Traditional Database Querying In a conventional database querying scenario, the data is stored first, and then queries are run against this stored data to find patterns or retrieve information. This process is reactive, in that the query is formulated based on a need to find out something specific about …

Continue reading →

Posted in Big Data, Database. Tagged with big data, database.

Hypothesis Testing Steps & Examples

August 30, 2023 by Ajitesh Kumar · Leave a comment

Hypothesis testing is a technique that helps scientists, researchers, or for that matter, anyone test the validity of their claims or hypotheses about real-world or real-life events in order to establish new knowledge. Hypothesis testing techniques are often used in statistics and data science to analyze whether the claims about the occurrence of the events are true, whether the results returned by performance metrics of machine learning models are representative of the models or they happened by chance. This blog post will cover some of the key statistical concepts including steps and examples in relation to what is hypothesis testing, how to formulate them and how to use them in …

Continue reading →

Posted in AI, Data Science, Machine Learning. Tagged with ai, big data, Data Science, machine learning.

85+ Free Online Books, Courses – Machine Learning & Data Science

June 25, 2022 by Ajitesh Kumar · Leave a comment

This post represents a comprehensive list of 85+ free books/ebooks and courses on machine learning, deep learning, data science, optimization, etc which are available online for self-paced learning. This would be very helpful for data scientists starting to learn or gain expertise in the field of machine learning / deep learning. Please feel free to comment/suggest if I missed mentioning one or more important books that you like and would like to share. Also, sorry for the typos. Following are the key areas under which books are categorized: Data science Pattern Recognition & Machine Learning Probability & Statistics Neural Networks & Deep Learning Optimization Data mining Mathematics Here is my post …

Continue reading →

Posted in Big Data, Books, Career Planning, Data Science, Deep Learning, Machine Learning, Online Courses. Tagged with big data, Data Science, Deep Learning, machine learning.

HBase Architecture Components for Beginners

October 15, 2017 by Ajitesh Kumar · Leave a comment

This blog represents high-level concepts on HBase architecture components. Following diagram represents the same: HBase Architecture Components – Key Building Blocks Following diagram represents the same: Pay attention to some of the following in relation to above diagram: HMaster: Responsible for coordinating the region servers including assigning regions on startup as well as recovery, and, monitoring region servers using Zookeeper Region Servers: Manages one or more regions Zookeeper: Zookeeper is used as a distributed coordination service for maintaining the server state of the cluster. Regions: Records in HBase tables are split horizontally based on the key range. Each of these splits can be called as Regions. A region contains all rows in …

Continue reading →

Posted in Big Data. Tagged with big data, hbase.

When a Spark application starts on Spark Standalone Cluster?

January 4, 2017 by Ajitesh Kumar · Leave a comment

This article represents detailed view on what happens when a driver program (spark application) is started on one of the worker node when working with Spark standalone cluster. Please feel free to comment/suggest if I missed to mention one or more important points. Also, sorry for the typos. Following are the key points described later in this article: Snapshot into what happens when Spark Standalone Cluster Starts? Snapshot into what happens when a spark application (Spark Shell) starts on one of the worker nodes? Snapshot into what happens when a spark application (Spark Shell) stops on the worker node? Snapshot into what happens when Spark Standalone Cluster Starts? In our …

Continue reading →

Posted in Big Data, Dockers. Tagged with big data, spark.

Hello World with Apache Spark Standalone Cluster on Docker

December 30, 2016 by Ajitesh Kumar · 1 Comment

This article presents instructions and code samples for Docker enthusiasts to quickly get started with setting up Apache Spark standalone cluster with Docker containers. Thanks to the owner of this page for putting up the source code which has been used in this article. Please feel free to comment/suggest if I missed to mention one or more important points. Also, sorry for the typos. Following are the key points described later in this article: Basic concepts on Apache Spark Cluster Steps to setup the Apache spark standalone cluster Code sample for Setting up Spark Code sample for Docker-compose to start the cluster Code sample for starting the Driver program using Spark …

Continue reading →

Posted in Big Data. Tagged with big data, spark.

Top 5 Pages listing Big Data Conferences in 2016

January 22, 2016 by Ajitesh Kumar · Leave a comment

This article represents top 5 pages listing global big data conferences coming up in 2016. Please feel free to comment/suggest if I missed to mention any other important pages. Also, sorry for the typos. Following are the top 5 pages: Global Big Data Conference KDNuggets List of Meetings/Conferences on Analytics, Big Data, Data Mining, Data Science Important Big Data events coming up in 2016 Big Data conference directory listing big data conferences happening around the world. O’Reilly List of conferences of on various topics including Big Data

Posted in Big Data. Tagged with big data.

Big Data – How Data is Retrieved and Written from/to HDFS?

November 30, 2015 by Ajitesh Kumar · Leave a comment

This blog represents my notes on how data is read and written from/to HDFS. Please feel free to suggest if it is done otherwise. Following are steps using which clients retrieve data from HDFS: Clients ask Namenode for a file/data block Name-node returns data node information (ID) where the file/data blocks are located Client retrieves data directly from the data node. Following are steps in which data is written to HDFS: Clients ask Name-node that they want to write one or more data blocks pertaining to a file. Name-node returns data nodes information to which these data blocks needs to be written Clients write each data block to the data nodes suggested. The …

Continue reading →

Posted in Big Data. Tagged with big data, Hadoop, HDFS.

Top 5 Usecases of Solr to Power Your Web & Mobile Search

May 13, 2015 by Ajitesh Kumar · Leave a comment

This article represents top 5 usecases for using Solr to power your web and mobile search. Note that in case of mobile search requirements, Solr exposes APIs that could be used to retrieve data from Solr index server and serve to mobile client. It also presents a classification of websites which are using Solr to fulfill their search requirements. Please feel free to comment/suggest if I missed to mention one or more important points. Also, sorry for the typos. Following are the key points described later in this article: Top 5 Usecases for Solr Search Different Classes of Websites using Solr to Power Search Engines Top 5 Usecases for Solr Search Search Engine: Many …

Continue reading →

Posted in Big Data. Tagged with big data.

Key Training Topics for Hadoop Developer

May 12, 2015 by Ajitesh Kumar · Leave a comment

This article represents key topics that one would want to learn in order to become a Hadoop Developer. One may also check these topics against topics provider by the training vendor. Please feel free to comment/suggest if I missed to mention one or more important points. Also, sorry for the typos. Following are the key areas tof focus for learning/training which are described later in this article: Java Essentials Hadoop Essentials Java Essentials As Hadoop is based on Java programming language, one would want to get expertise of at least intermediary level to do good with Hadoop development. Following are some of the key concepts that one would want to …

Continue reading →

Posted in Career Planning. Tagged with big data, Hadoop.

Dummies Notes on How Distributed Computing Works using Hadoop

May 11, 2015 by Ajitesh Kumar · Leave a comment

This article intends to present dummies notes on how distributed computing works using Hadoop. As Hadoop is inspired by Google GFS/Map-Reduce/BigTable paper,I have tried and refer to GFS/Map-Reduce/BigTable in this article appropriately wherever possible. One must note that distributed computing paradigm has become mainstream given the advent of Big Data related large scale project implementation going on in several companies. Please feel free to shout if you find discrepancies with my understanding and help me correct the mistakes. Simply speaking, distributed computing refers to the computing paradigm in which processing happens on multiple different boxes consisting of data and, the result is, then, aggregated appropriately to display the final result. In traditional …

Continue reading →

Posted in Big Data, Dummies. Tagged with big data.

Document Search Architecture to Search Millions of Documents

April 27, 2015 by Ajitesh Kumar · Leave a comment

This article represents different document search architectural models using which one could create a search architecture that could search through 100s of millions of documents in faster time (milliseconds) with most up-to-date and fresh results. If you are planning to create a document search infrastructure which could search millions of documents, and shows up results in less than a second time, go ahead and explore different models and adopt the one that suits your needs at this stage. Note that the models given below could scale to multiple data centers. In this blog, we shall try and examine different architecture models that could achieve the search timing of less than a …

Continue reading →

Posted in Big Data. Tagged with big data, nosql.

Machine Learning – Top 16 Learning Resources on Statistics

April 23, 2015 by Ajitesh Kumar · Leave a comment

This article represents some of the top learning resources (webpages, videos etc) on my frequent visit list. Please feel free to comment/suggest if I missed to mention one or more important points. Also, sorry for the typos. Following are the key categories of webpages/videos that are expanded later in this article: Websites Quora Youtube Videos Coursera courses Khan Academy Top 16 Learning Resources on Statistics Folllowing is the list of URLs for these learning resources: Websites on Statistics Stattrek.com Elementary Statistics with R StatsDirect.com Usable Stats Quora.com Statistics Channel Probability & Statistics Statistics (Acacedmic Discipline) Bayesian Inference Youtube Videos Playlists on Statistics Brandon Foltz StatisticsFun JBStatistics Quantitative Specialists Coursera Courses …

Continue reading →

Posted in Big Data. Tagged with big data, machine learning.

Data Science – 175 Probability & Statistics Interview Questions

April 20, 2015 by Ajitesh Kumar · 1 Comment

This article presents URL and short description of around 175 probability & statistics objective questions which could prove very useful and helpful for those who are planning to attend one or more data scientist interviews in time to come. These tests/quizzes were created when I was learning probability and statistics some time back and, found various concepts interesting enough to be converted into quizzes for my future references. As probability & statistics form key to data science, it may be worth spending some time on these tests and check your understanding. You may also use this for your future reference. These questions could also be used for checking your concepts …

Continue reading →

Posted in Big Data, Career Planning, Interview questions. Tagged with big data, Data Science, interview, Interview questions.

Quick Cheat Sheet for Big Data Technologies

April 19, 2015 by Ajitesh Kumar · Leave a comment

This article represents quick details on some of the key open-source technologies (tools & frameworks) associated with Big Data. The objective of this article is to present quick details on open-source tools & frameworks in a well-categorized manner using top-down approach where data engineering and data science aspects of Big Data is associated with relevant tools & framework. Most of these tools and frameworks could be found with commercial Hadoop distributions such as Cloudera, Hortonworks, MapR etc. Please feel free to comment/suggest if I missed to mention one or more important frameworks. Also, sorry for the typos. Following is the key classication of tools/frameworks that have been briefed later in …

Continue reading →

Posted in Big Data. Tagged with big data.

I found it very helpful. However the differences are not too understandable for me

Very Nice Explaination. Thankyiu very much,

in your case E respresent Member or Oraganization which include on e or more peers?

Such a informative post. Keep it up

Thank you....for your support. you given a good solution for me.

Tag Archives: big data

CEP vs Traditional Database Examples

Hypothesis Testing Steps & Examples

85+ Free Online Books, Courses – Machine Learning & Data Science

HBase Architecture Components for Beginners

When a Spark application starts on Spark Standalone Cluster?

Hello World with Apache Spark Standalone Cluster on Docker

Top 5 Pages listing Big Data Conferences in 2016

Big Data – How Data is Retrieved and Written from/to HDFS?

Top 5 Usecases of Solr to Power Your Web & Mobile Search

Key Training Topics for Hadoop Developer

Dummies Notes on How Distributed Computing Works using Hadoop

Document Search Architecture to Search Millions of Documents

Machine Learning – Top 16 Learning Resources on Statistics

Data Science – 175 Probability & Statistics Interview Questions

Quick Cheat Sheet for Big Data Technologies

ChatGPT Prompts (250+)

Recent Posts

Data Science / AI Trends

Free Online Tools

Newsletter

Recent Comments

Tag Archives: big data

ChatGPT Prompts (250+)

Recent Posts

Data Science / AI Trends

Free Online Tools

Newsletter

Tag Cloud

Recent Comments