Tag Archives: Hadoop

Hadoop Map-Reduce Explained with an Example

This article walks through the key steps of a Hadoop MapReduce job using a word count example. Please feel free to comment or suggest corrections if I missed one or more important points. Also, sorry for the typos. Following are the key steps of how Hadoop MapReduce works in a word count problem: Input is fed to a program, say a RecordReader, that reads data line by line or record by record. The mapping process then starts, which includes the following steps: Combining: pairs each word with a count, such as 1. Partitioning: assigns each word to a partition, so that every occurrence of the same word lands in the same partition. Shuffling: moves the words to the right partition. Sorting: sorts each partition by word. The last step is Reducing, which comes up with …
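The word count steps above can be sketched as a toy simulation in plain Python. This is only an illustration of the data flow, not Hadoop code: the function names (`map_phase`, `partition`, `shuffle_and_sort`, `reduce_phase`, `word_count`) are hypothetical, and the byte-sum partition function is an assumed stand-in for a real hash-based partitioner.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def partition(word, num_partitions):
    """Pick a partition for a word.

    Assumption: a simple byte sum stands in for a hash function so the
    result is deterministic. The same word always maps to the same partition.
    """
    return sum(word.encode("utf-8")) % num_partitions


def shuffle_and_sort(pairs, num_partitions):
    """Move each pair to its partition, group by word, then sort by word."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for word, count in pairs:
        partitions[partition(word, num_partitions)][word].append(count)
    return [sorted(p.items()) for p in partitions]


def reduce_phase(sorted_partitions):
    """Sum the counts collected for each word."""
    totals = {}
    for part in sorted_partitions:
        for word, counts in part:
            totals[word] = sum(counts)
    return totals


def word_count(lines, num_partitions=2):
    """Run the whole pipeline: map, partition/shuffle/sort, reduce."""
    pairs = map_phase(lines)
    return reduce_phase(shuffle_and_sort(pairs, num_partitions))


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "the fox"]))
```

Because each word is always routed to the same partition, the totals do not depend on how many partitions are used; only the grouping of the intermediate data changes.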

Posted in Big Data.

Big Data – How Data is Retrieved and Written from/to HDFS?

This blog post contains my notes on how data is read from and written to HDFS. Please feel free to suggest corrections if it works otherwise. Following are the steps by which clients retrieve data from HDFS: The client asks the NameNode for a file or data block. The NameNode returns information (IDs) about the DataNodes where the file's data blocks are located. The client retrieves the data directly from those DataNodes. Following are the steps by which data is written to HDFS: The client tells the NameNode that it wants to write one or more data blocks belonging to a file. The NameNode returns information about the DataNodes to which these data blocks need to be written. The client writes each data block to the suggested DataNodes. The …
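The read and write steps above can be modelled with a small in-memory sketch in Python. This is not the HDFS client API: the `NameNode` and `DataNode` classes, their methods, the round-robin placement, and the 4-byte block size are all assumptions made for illustration. The point it shows is that the NameNode only hands out block locations, while the data itself moves between the client and the DataNodes.

```python
class NameNode:
    """Holds metadata only: which DataNodes store each block of a file."""

    def __init__(self, datanode_ids, replication=2):
        self.datanode_ids = list(datanode_ids)
        self.replication = replication
        self.block_map = {}  # file name -> list of (block_id, [datanode ids])

    def allocate_blocks(self, filename, num_blocks):
        """Write path: tell the client where each block should go."""
        placements = []
        for i in range(num_blocks):
            # Assumption: simple round-robin placement of replicas.
            targets = [
                self.datanode_ids[(i + r) % len(self.datanode_ids)]
                for r in range(self.replication)
            ]
            placements.append((f"{filename}#blk{i}", targets))
        self.block_map[filename] = placements
        return placements

    def locate_blocks(self, filename):
        """Read path: tell the client where each block lives."""
        return self.block_map[filename]


class DataNode:
    """Stores the actual block contents."""

    def __init__(self):
        self.blocks = {}

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


def write_file(namenode, datanodes, filename, data, block_size=4):
    """Split data into blocks and write each to the suggested DataNodes."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    placements = namenode.allocate_blocks(filename, len(chunks))
    for (block_id, targets), chunk in zip(placements, chunks):
        for dn_id in targets:
            datanodes[dn_id].write_block(block_id, chunk)


def read_file(namenode, datanodes, filename):
    """Ask the NameNode for locations, then read directly from DataNodes."""
    parts = []
    for block_id, locations in namenode.locate_blocks(filename):
        parts.append(datanodes[locations[0]].read_block(block_id))
    return b"".join(parts)


if __name__ == "__main__":
    nodes = {"dn1": DataNode(), "dn2": DataNode(), "dn3": DataNode()}
    nn = NameNode(nodes.keys())
    write_file(nn, nodes, "notes.txt", b"hello hdfs!")
    print(read_file(nn, nodes, "notes.txt"))
```

In this sketch the client writes every replica itself, which mirrors the steps as listed above; it makes no attempt to model how replicas are actually propagated in a real cluster.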

Posted in Big Data.

Key Training Topics for Hadoop Developer

This article lists key topics that one would want to learn in order to become a Hadoop developer. One may also check these topics against the topics provided by the training vendor. Please feel free to comment or suggest corrections if I missed one or more important points. Also, sorry for the typos. Following are the key areas to focus on for learning/training, which are described later in this article: Java Essentials and Hadoop Essentials. Java Essentials: As Hadoop is based on the Java programming language, one would want at least intermediate-level expertise in Java to do well with Hadoop development. Following are some of the key concepts that one would want to …

Posted in Career Planning.

Big Data – Free Hadoop Online Training Course from MapR

This article gives quick information on the free, on-demand Hadoop online training that was announced yesterday by MapR Technologies, the Hadoop distribution specialist. I took the Hadoop Essentials course and I must say that I liked the training session. The downside of these training sessions is that you soon run into MapR-specific technologies in relation to MapReduce, HBase, and HDFS. That said, it's worth giving it a shot. Please feel free to comment or suggest corrections if I missed one or more important points. Also, sorry for the typos. Training Courses for Hadoop Developer, Hadoop Administrator & Data Analyst: The training includes topics related to a range of Hadoop technologies for …

Posted in Big Data, Career Planning.

Google Glass & Big Data – Boon for Crime Control

A number of bloggers and writers have been writing about Google Glass hurting privacy. This may pose a barrier to widespread acceptance of the Google Glass device. However, Google Glass will surely act as a boon to crime control and, sooner rather than later, governments will get on board with adopting the device for police personnel. Google Glass for Capturing Pictures from Crime Spot: To think of one of the out-of-the-box benefits provided by Google Glass, namely “take a picture”, this may prove to be a boon to police departments across the globe. Imagine police personnel starting to wear a cool glass device. They could easily capture …

Posted in Big Data, Google Glass.

Google Glass to Revolutionize Big Data

The Google Glass project, once in full swing and with full acceptance by consumers, will turn out to be one of the biggest sources of data, which could best be handled by applying big data technologies. Simply speaking, big data is a data set having the following characteristics: Volume, Velocity, Variety, and Veracity. That said, Google Glass will add a variety of data in greater volume at much greater velocity. Some of the existing big data technologies that could help a great deal in storing and processing data acquired by Google Glass are the following: Hadoop (HDFS & MapReduce); HBase, a non-relational database for working with data stored in Hadoop; Hive for business analytics; Solr (Lucene) …

Posted in Big Data, Google Glass.