Apache Hadoop MapReduce
What is it?
Why use it?
How does it work?
Some examples
Big users
MapReduce – What is it?
Processing engine of Hadoop
Developers write Map and Reduce functions, which Hadoop runs as jobs
Used for big data batch processing
Parallel processing of huge data volumes
Fault tolerant
Scalable
MapReduce – Why use it?
Your data is in the terabyte to petabyte range
Your workload is I/O-heavy
The Hadoop framework takes care of
Job and task management
Failures
Storage
Replication
You just write the Map and Reduce code
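As a rough illustration of how little the developer has to write, here is a minimal sketch of a word-count mapper and reducer. It assumes the Hadoop Streaming convention of tab-separated key/value text records. The function names `mapper` and `reducer` are illustrative, and the `sorted()` call is only a stand-in for the sorting that the framework would do between the two phases.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts for each word; records must arrive sorted by key."""
    pairs = (record.split("\t") for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Stand-in for the framework: run map, sort by key, then reduce.
    mapped = sorted(mapper(["Deer Bear River", "Car Car River"]))
    for record in reducer(mapped):
        print(record)
```

Everything outside these two functions (splitting the input, moving records between machines, retrying failed tasks, storing the output) is what the framework is said to handle.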
MapReduce – How does it work?
Take word counting as an example, a task that search engines such as Google perform constantly.
MapReduce – How does it work?
Input data is divided into splits (shards)
Each split is mapped to key,value pairs, e.g. Bear,1
Mapped data is shuffled and sorted by key, e.g. Bear
Sorted data is reduced, e.g. Bear,2
Final data is stored on HDFS
There might be an extra map layer before the shuffle
The JobTracker controls all tasks in a job
TaskTrackers control the map and reduce tasks
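The steps above can be sketched as a small in-memory simulation. This is not Hadoop code: it runs in a single process, and the helper names (`split_input`, `map_split`, `shuffle`, `reduce_group`, `word_count`) and the three-line sample input are assumptions made for illustration. It only shows how the data changes shape at each stage.

```python
from collections import defaultdict


def split_input(text, num_splits):
    """Divide the input lines into roughly equal splits, one per map task."""
    lines = text.splitlines()
    size = max(1, -(-len(lines) // num_splits))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_split(lines):
    """Map: emit a (word, 1) pair for every word in the split."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(mapped_splits):
    """Shuffle/sort: gather all values for the same key, ordered by key."""
    grouped = defaultdict(list)
    for pairs in mapped_splits:
        for key, value in pairs:
            grouped[key].append(value)
    return sorted(grouped.items())


def reduce_group(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(text, num_splits=3):
    """Run split -> map -> shuffle -> reduce over the whole input."""
    splits = split_input(text, num_splits)
    mapped = [map_split(s) for s in splits]
    return [reduce_group(k, v) for k, v in shuffle(mapped)]


print(word_count("Deer Bear River\nCar Car River\nDeer Car Bear"))
# [('Bear', 2), ('Car', 3), ('Deer', 2), ('River', 2)]
```

In a real cluster each call to `map_split` and `reduce_group` would run as a separate task on a different node, and the final list would be written to HDFS rather than printed.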
MapReduce – Some examples
A visual example with colours to show you the cycle
Split -> Map -> Shuffle -> Reduce
MapReduce – Some examples
A visual example of MapReduce with job and task trackers added to individual map and reduce jobs.
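To make the division of labour between a job-level controller and per-task workers more concrete, here is a toy sketch. It is not Hadoop's scheduler. `run_job` and `flaky_count` are hypothetical names, a thread pool stands in for the cluster nodes, and the failure is simulated. The point is only that a failed task can be rescheduled without rerunning the whole job.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(task_fn, task_inputs, workers=2, max_attempts=3):
    """Toy 'job tracker': run one task per input on a worker pool and
    resubmit any task that raises, up to max_attempts tries per task."""
    results = {}
    attempts = {task_id: 0 for task_id in range(len(task_inputs))}
    pending = list(attempts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            futures = {}
            for task_id in pending:
                attempts[task_id] += 1
                futures[task_id] = pool.submit(task_fn, task_inputs[task_id])
            pending = []
            for task_id, future in futures.items():
                try:
                    results[task_id] = future.result()
                except Exception:
                    if attempts[task_id] >= max_attempts:
                        raise
                    pending.append(task_id)  # reschedule the failed task
    return [results[i] for i in range(len(task_inputs))], attempts


seen = set()


def flaky_count(line):
    """Hypothetical map task: fails on its first attempt at one input."""
    if line == "Car Car River" and line not in seen:
        seen.add(line)
        raise RuntimeError("simulated node failure")
    return len(line.split())


counts, attempts = run_job(
    flaky_count, ["Deer Bear River", "Car Car River", "Deer Car Bear"]
)
print(counts)    # [3, 3, 3]
print(attempts)  # {0: 1, 1: 2, 2: 1}
```

Only the second task is attempted twice. The other two finish on their first attempt and are never rerun.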
Hadoop MapReduce – Big users
Users
Facebook
Yahoo
Amazon
eBay