Map-Reduce Tutorial

Cloud Computing: Project Tutorial
Hadoop Map-Reduce Programming
Wei Zhu
Department of Computer Science
University of Texas at Dallas
Agenda
• Map Reduce Environment Configuration
• Map Reduce Structure
    • Mapper Configuration
    • Combiner Configuration
    • Partitioner Configuration
    • Reducer Configuration
• Useful Logs
7/12/2016
Cloud Computing: Project Tutorial Hadoop MapReduce Programming
2 of 13
Map Reduce Environment Configuration
• Configuration files on Ubuntu (/etc/hadoop/conf)
    • hadoop-env.sh
    • log4j.properties
        • Controls where the logs are written
    • hdfs-site.xml
        • Hadoop cluster (HDFS) configuration
    • mapred-site.xml
        • Job-related (MapReduce) settings
    • ……
Map Reduce Structure
• Programmers must specify:
    • map (k, v) → <k’, v’>*
    • reduce (k’, [v’]) → <k’’, v’’>*
    • All values with the same key are reduced together
• Optionally, also:
    • partition (k’, number of partitions) → partition for k’
        • Often a simple hash of the key, e.g., hash(k’) mod n
        • Divides up the key space for parallel reduce operations
    • combine (k’, [v’]) → <k’, v’>*
        • Mini-reducers that run in memory after the map phase
        • Used as an optimization to reduce network traffic
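The map, group-by-key, and reduce steps listed on the structure slide can be sketched in plain Java using word count as the example. This is only a single-process simulation under stated assumptions: it uses no Hadoop classes, and the class name WordCountFlowSketch and its methods are hypothetical names chosen for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of map -> group by key -> reduce (no Hadoop classes).
// All names in this class are hypothetical and exist only for illustration.
class WordCountFlowSketch {

    // map (k, v) -> <k', v'>*: emit (word, 1) for every word in one input line.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
            }
        }
        return out;
    }

    // reduce: receives one key together with all values emitted for that key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Stand-in for the shuffle: group every emitted value under its key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(offset, line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
            offset += line.length() + 1;
        }
        // All values with the same key are reduced together.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {a=2, b=2, c=1}
        System.out.println(run(Arrays.asList("a b a", "b c")));
    }
}
```

In a real job the framework performs the grouping across machines; here a TreeMap stands in for it so the data flow is visible in one place.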
Map Reduce Structure
• Class Mapper:
    • setup (Mapper.Context context)
        • Called once at the beginning of the task
    • map (k, v) → <k’, v’>*
        • Called once for each input key/value pair
    • cleanup (Mapper.Context context)
        • Called once at the end of the task
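The calling order of setup, map, and cleanup within one map task can be illustrated with a small plain-Java stand-in. This is a sketch of the lifecycle only, not Hadoop's Mapper class: MapperLifecycleSketch, its run method, and the recorded call log are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a map task; it only records the order of calls.
class MapperLifecycleSketch {
    final List<String> calls = new ArrayList<>();

    // Called once at the beginning of the task (e.g., to open resources).
    void setup() {
        calls.add("setup");
    }

    // Called once for each input record.
    void map(long key, String value) {
        calls.add("map:" + value);
    }

    // Called once at the end of the task (e.g., to flush buffered output).
    void cleanup() {
        calls.add("cleanup");
    }

    // Drives the task: setup once, map per record, cleanup once.
    void run(List<String> records) {
        setup();
        try {
            long key = 0;
            for (String record : records) {
                map(key++, record);
            }
        } finally {
            cleanup();
        }
    }

    public static void main(String[] args) {
        MapperLifecycleSketch task = new MapperLifecycleSketch();
        task.run(Arrays.asList("line one", "line two"));
        // prints [setup, map:line one, map:line two, cleanup]
        System.out.println(task.calls);
    }
}
```

Because setup and cleanup run once per task rather than once per record, they are the natural place for per-task state such as a lookup table or an in-memory buffer.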
Mapper Configuration
• How many maps?
    • Number of maps
        • The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.
        • The right level of parallelism for maps seems to be around 10-100 maps per node.
    • setNumMapTasks(int)
        • Can be used to set the number of maps even higher, but it only provides a hint to the framework.
        • Exists only in the old API (JobConf).
Mapper Configuration
• How many maps?
    • FileInputFormat
        • mapred.max.split.size, set with setMaxInputSplitSize(Job, long)
        • mapred.min.split.size, set with setMinInputSplitSize(Job, long)
    • HDFS block size: for small data, set it to a smaller value using dfs.block.size in hdfs-site.xml
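How the minimum split size, maximum split size, and block size interact can be sketched with a little arithmetic. The sketch assumes the split size is chosen as max(minSize, min(maxSize, blockSize)), which is how FileInputFormat in the new API is generally understood to behave, and it approximates the number of maps as the file size divided by the split size, rounded up. The class SplitSizeSketch and the 128 MB and 1 GB figures are illustrative assumptions, and no Hadoop classes are used.

```java
// Plain-Java arithmetic sketch; the class name and the sizes are assumptions.
class SplitSizeSketch {

    // Assumed rule: the block size, clamped between the min and max split sizes.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough number of splits (and so of maps) for one file: ceil(file / split).
    static long approxSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        long blockSize = 128 * MB;  // assumed HDFS block size
        long fileSize = 1024 * MB;  // assumed 1 GB input file

        // No limits set: the split size equals the block size.
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        // prints 8
        System.out.println(approxSplits(fileSize, defaultSplit));

        // A 32 MB maximum split size yields smaller splits and more maps.
        long smallSplit = computeSplitSize(blockSize, 1L, 32 * MB);
        // prints 32
        System.out.println(approxSplits(fileSize, smallSplit));
    }
}
```

Lowering the maximum split size raises the number of maps without touching the HDFS block size, while raising the minimum split size above the block size has the opposite effect.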
Combiner Configuration
• Class Combiner
    • A "semi-reducer" in MapReduce
    • Has the same interface as Reducer
        • reduce()
    • Processes the output of map tasks before it is sent to the reducers
    • Works on the output of a single mapper
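The effect of a combiner can be sketched by pre-aggregating the output of a single mapper before anything is sent to the reducers. The example below assumes a word-count style job in which every map output record is (word, 1). CombinerSketch and its combine method are hypothetical names, and no Hadoop classes are involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of local aggregation; all names are hypothetical.
class CombinerSketch {

    // Sums the implicit value 1 per key over the output of ONE mapper.
    static Map<String, Integer> combine(List<String> mapOutputKeys) {
        Map<String, Integer> partialSums = new TreeMap<>();
        for (String key : mapOutputKeys) {
            partialSums.merge(key, 1, Integer::sum);
        }
        return partialSums;
    }

    public static void main(String[] args) {
        // Keys emitted by one mapper, each with the value 1.
        List<String> mapOutput = Arrays.asList("the", "cat", "the", "the");
        Map<String, Integer> combined = combine(mapOutput);

        // prints records without combiner: 4
        System.out.println("records without combiner: " + mapOutput.size());
        // prints records with combiner: 2
        System.out.println("records with combiner: " + combined.size());
        // prints {cat=1, the=3}
        System.out.println(combined);
    }
}
```

The reducers still see partial sums from every mapper and add them up, so this only works when the operation can be applied in stages, as summing can.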
Partitioner Configuration
• Class Partitioner
    • Partitioning
        • Determines which reducer instance receives which intermediate keys and values.
    • getPartition()
        • Receives a key, a value, and the number of partitions to split the data across, and returns the partition for that key/value pair.
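A hash-based getPartition can be sketched in a few lines of plain Java. The formula below, the key's hash code with the sign bit masked off, taken modulo the number of partitions, matches the "hash(k’) mod n" idea from the structure slide. HashPartitionSketch is a hypothetical standalone class, not Hadoop's Partitioner, and the value argument is accepted but ignored.

```java
// Plain-Java sketch of hash partitioning; the class name is hypothetical.
class HashPartitionSketch {

    // Maps a key to a partition in the range [0, numPartitions).
    // Masking with Integer.MAX_VALUE keeps the result non-negative
    // even when hashCode() is negative.
    static int getPartition(Object key, Object value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int reducers = 4;
        String[] keys = {"apple", "banana", "cherry", "apple"};
        for (String key : keys) {
            // The same key always lands in the same partition.
            System.out.println(key + " -> partition "
                    + getPartition(key, null, reducers));
        }
    }
}
```

Since the partition depends only on the key, every value for a given key reaches the same reducer, which is what allows all values with the same key to be reduced together.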
Reducer Configuration
• Class Reducer:
    • Job.setReducerClass(YourReducer.class)
    • setup (Reducer.Context context)
        • Called once at the beginning of the task
    • reduce (k’, [v’]) → <k’’, v’’>*
    • cleanup (Reducer.Context context)
        • Called once at the end of the task
• Number of reducers
    • Job.setNumReduceTasks(int)
Useful Logs
• Node Resource Manager Logs
    • Location: /var/log/hadoop-yarn/yarn
    • Configuration: yarn-site.xml
    • Contents: application name, start date, user name, Hadoop queue, job outcome (success or failure), duration, maximum memory allocated, percent of cluster used by the job, details of the job executed, ……
Useful Logs
• Job History Logs
    • Location: /mr-history/done
    • Configuration: mapred-site.xml
    • These files contain a wealth of performance data on the execution of Mappers and Reducers, including HDFS statistics, data volume processed, and memory allocated.
Useful Logs
• Logs can also be viewed from the GUI.