The following are the key steps in how Hadoop MapReduce works on a word count problem:
- The input is fed to a program, say a RecordReader, that reads the data line by line or record by record.
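- The mapping process then starts, which includes the following steps:
- Combining: Pairs each word with a count of 1, giving key-value pairs such as (word, 1). In Hadoop itself, the Combiner is an optional step that pre-sums these counts on the map side.
- Partitioning: Assigns each word to a partition so that every occurrence of the same word ends up in the same partition. By default, Hadoop picks the partition from a hash of the key and the number of reducers.
- Shuffling: Moves each word to its assigned partition
- Sorting: Sorts each partition by word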
- The last step is reducing, which produces the result, such as the total count for each distinct word.
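The steps above can be sketched in plain Python. This is only a single-process simulation to make the data flow concrete, not the Hadoop API: the function names, the two-partition setting, and the character-sum partition function are all assumptions chosen for illustration.

```python
from collections import defaultdict

NUM_PARTITIONS = 2  # assumed number of reducers, chosen for illustration


def read_records(text):
    """Stand-in for a RecordReader: yield the input one line at a time."""
    for line in text.splitlines():
        yield line


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield (word, 1)


def partition_for(word, num_partitions=NUM_PARTITIONS):
    """Pick a partition from the key alone, so equal words share a partition.

    Uses a simple deterministic character sum (an illustrative choice)
    rather than Python's hash(), which is randomized per process for strings.
    """
    return sum(ord(ch) for ch in word) % num_partitions


def word_count(text):
    # Shuffle: route every (word, 1) pair to its assigned partition.
    partitions = defaultdict(list)
    for line in read_records(text):
        for word, count in map_words(line):
            partitions[partition_for(word)].append((word, count))

    result = {}
    for index in sorted(partitions):
        # Sort: order the pairs in this partition by word.
        pairs = sorted(partitions[index])
        # Reduce: sum the counts for each distinct word.
        for word, count in pairs:
            result[word] = result.get(word, 0) + count
    return result


counts = word_count("the quick brown fox\nthe lazy dog\nthe fox")
print(counts)
```

Because the partition depends only on the word, all three occurrences of "the" land in the same partition, which is what lets a single reducer produce its complete total.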
The following diagram represents the above steps.
- Map: This phase processes data in the form of key-value pairs.
- Partitioning/Shuffling/Sorting: This phase groups identical keys together and sorts them.
- Reduce: This phase aggregates the values for each key and outputs the final result alongside the key.
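Seen as three phases, the same idea fits in a small generic driver. The sketch below is an in-memory illustration only: `run_map_reduce`, `word_mapper`, and `sum_reducer` are hypothetical names, not Hadoop classes. It shows that the middle phase is a sort followed by a group-by on the key.

```python
from itertools import groupby
from operator import itemgetter


def run_map_reduce(records, mapper, reducer):
    # Map: turn each input record into zero or more (key, value) pairs.
    pairs = [pair for record in records for pair in mapper(record)]
    # Sort, then group: sorting puts identical keys next to each other,
    # which is what lets groupby collect them into one group per key.
    pairs.sort(key=itemgetter(0))
    grouped = groupby(pairs, key=itemgetter(0))
    # Reduce: collapse the values of each key into one result for that key.
    return [(key, reducer(key, [value for _, value in group]))
            for key, group in grouped]


def word_mapper(line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    return [(word, 1) for word in line.split()]


def sum_reducer(key, values):
    # Word count reduces by summing the 1s emitted for the key.
    return sum(values)


output = run_map_reduce(["b a", "a c a"], word_mapper, sum_reducer)
print(output)  # [('a', 3), ('b', 1), ('c', 1)]
```

Swapping in a different mapper and reducer changes the job without touching the sort-and-group step in the middle.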