An-Introduction-to-Apache.pptx

Transcript An-Introduction-to-Apache.pptx

Apache Hadoop MapReduce

What is it ?

Why use it ?

How does it work ?

Some examples

Big users
MapReduce – What is it ?

Processing engine of Hadoop

Developers create Map and Reduce jobs

Used for big data batch processing

Parallel processing of huge data volumes

Fault tolerant

Scalable
MapReduce – Why use it ?

Your data is in the terabyte / petabyte range

You have huge I/O

Hadoop framework takes care of


Job and task management

Failures

Storage

Replication
You just write Map and Reduce jobs
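As a rough illustration of that division of labour, the sketch below separates "framework" code from "user" code. It is plain Python, not the Hadoop API: `run_job` is a hypothetical stand-in for the framework, and the temperature readings are made-up sample data. It only schedules map calls and groups results by key; it does not model failures, storage or replication.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_job(records, map_fn, reduce_fn, workers=4):
    """Hypothetical stand-in for the framework (not a Hadoop API).

    Runs map_fn over the records in a thread pool, groups the emitted
    (key, value) pairs by key, then calls reduce_fn once per key.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_fn, records))  # results keep input order

    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


# --- Everything below is what the developer would write ---

def map_fn(record):
    """Map: turn one 'city,temperature' record into a (city, temperature) pair."""
    city, temperature = record.split(",")
    return [(city, int(temperature))]


def reduce_fn(key, values):
    """Reduce: keep the highest temperature seen for a city."""
    return max(values)


# Made-up sample readings, for illustration only.
READINGS = ["Oslo,3", "Cairo,31", "Oslo,7", "Cairo,28"]

result = run_job(READINGS, map_fn, reduce_fn)
print(result)
```

The point of the split is that `map_fn` and `reduce_fn` know nothing about scheduling or grouping, so the same two functions would work unchanged if the runner were swapped for something distributed.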
MapReduce – How does it work ?
Take word counting as an example, something that Google does
all of the time.
MapReduce – How does it work ?

Input data split into shards

Split data mapped to key,value pairs, e.g. Bear,1

Mapped data shuffled/sorted by key, e.g. Bear

Sorted data reduced, e.g. Bear,2

Final data stored on HDFS

There might be an extra map layer before the shuffle

JobTracker controls all tasks in a job

TaskTracker controls the map and reduce tasks
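The word count cycle can be sketched in a few lines of plain Python. This is a single-process illustration of the split, map, shuffle and reduce steps only, not Hadoop code: the function names and the three-line sample input are assumptions made for the example, and nothing here touches HDFS, the JobTracker or the TaskTrackers.

```python
from collections import defaultdict


def split_input(text):
    """Split: treat each non-empty input line as one shard."""
    return [line for line in text.splitlines() if line.strip()]


def map_shard(shard):
    """Map: emit a (word, 1) pair for every word in the shard."""
    return [(word, 1) for word in shard.split()]


def shuffle(mapped_pairs):
    """Shuffle/sort: group the values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_group(key, values):
    """Reduce: sum the 1s collected for a word."""
    return key, sum(values)


def word_count(text):
    """Run the full cycle: split -> map -> shuffle -> reduce."""
    shards = split_input(text)
    mapped = [pair for shard in shards for pair in map_shard(shard)]
    grouped = shuffle(mapped)
    return [reduce_group(key, values) for key, values in grouped.items()]


# Assumed sample input, chosen so that "Bear" maps to Bear,1 twice
# and reduces to Bear,2 as on the slide.
SAMPLE = "Deer Bear River\nCar Car River\nDeer Car Bear"

if __name__ == "__main__":
    for word, count in word_count(SAMPLE):
        print(f"{word},{count}")
```

In a real job each shard would be mapped on a different node and the final pairs written to HDFS; here everything runs in one process so the data flow is easy to follow.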
MapReduce - Some examples
A visual example with colours to show you the cycle
Split -> Map -> Shuffle -> Reduce
MapReduce - Some examples
A visual example of MapReduce with job and task trackers added to
individual map and reduce jobs.
Hadoop MapReduce – Big users

Users


Facebook

Yahoo

Amazon

eBay