Learn Hadoop Online


The Learn Hadoop Online training course is designed to build the knowledge and skills you need to become a successful Hadoop developer. It covers the core concepts in depth, along with implementations for a variety of industry use cases.
The Hadoop Admin and Developer course content is outlined below.
Skype Id: info.vibloo
Email: [email protected]
USA: +1-248-809-1418
IND: +91-40-3296-5222
Introduction to Hadoop
• What is Hadoop?
• The Hadoop Distributed File System
• How Hadoop MapReduce Works
• Anatomy of a Hadoop Cluster
• Master Daemons
  • Name Node
  • Job Tracker
  • Secondary Name Node
• Slave Daemons
  • Data Node
  • Task Tracker
www.vibloo.com/Hadoop-Online-Training
HDFS (Hadoop Distributed File System)
• Blocks and Splits
• Input and HDFS Splits
• Data Replication
• Hadoop Rack Awareness
• Data High Availability
• Programming Practices
• Developing MapReduce Programs in
• Running without HDFS and MapReduce
• Running all daemons in a single node
• Running daemons on dedicated nodes
• Data Integrity
• Cluster architecture and block placement
• Accessing HDFS
• Java and CLI Approach
• Local Mode
• Pseudo-distributed Mode
• Fully Distributed Mode
Setting Up Hadoop Clusters with Apache, Cloudera and Hortonworks
• Make a fully distributed Hadoop cluster on a single laptop/desktop
• Name Node in Safe Mode
• Metadata Backup
• Integrating Kerberos security in Hadoop
Writing a MapReduce Program
• Examining a Sample MapReduce Program, with several examples
• Basic API Concepts
• The Driver Code
• The Mapper
• The Reducer
• Hadoop's Streaming API
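As a taste of the mapper, reducer and driver roles listed above, here is a minimal word-count sketch in plain Python. It does not use the Hadoop API: the function names (mapper, reducer, run_job) are hypothetical, and the sort inside run_job only stands in for the shuffle that a real cluster performs between the map and reduce phases.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs):
    """Sum the counts for each word; pairs must arrive sorted by key."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_job(lines):
    """Local stand-in for the driver: map, sort (the shuffle), then reduce."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    print(run_job(["big data big cluster", "data node"]))
```

With Hadoop's Streaming API, the same mapper and reducer logic would typically live in two separate scripts that read from standard input and write tab-separated key/value lines to standard output.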
Performing Several Hadoop Jobs
• The configure and close Methods
• Sequence Files
• Record Reader
• Record Writer
• Role of Reporter
• Output Collector
• Processing XML Files
• Counters
• Directly Accessing HDFS
• Tool Runner
• Using the Distributed Cache
Common MapReduce Algorithms
• Sorting and Searching
• Creating an Inverted Index
• Indexing
• Identity Mapper
• Classification/Machine Learning
• Term Frequency - Inverse Document Frequency
• Word Co-Occurrence
• Identity Reducer
• MapReduce Applications
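To make one of the algorithms above concrete, the following is a small sketch of an inverted index built in the MapReduce style, written in plain Python rather than against the Hadoop API. The names map_document and build_inverted_index are hypothetical, and the dictionary of sets stands in for the grouping that the framework would do between map and reduce.

```python
from collections import defaultdict


def map_document(doc_id, text):
    """Map step: emit (word, doc_id) once for each distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def build_inverted_index(documents):
    """Group the emitted pairs by word and return word -> sorted list of doc ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word, doc in map_document(doc_id, text):
            index[word].add(doc)
    # Reduce step: turn each posting set into a sorted posting list.
    return {word: sorted(ids) for word, ids in index.items()}


if __name__ == "__main__":
    docs = {"d1": "hadoop stores data", "d2": "hadoop processes data fast"}
    for word, postings in sorted(build_inverted_index(docs).items()):
        print(word, postings)
```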
Debugging MapReduce Programs
• Testing with MRUnit
• Logging
• Other Debugging Strategies
Advanced MapReduce Programming
• A Recap of the MapReduce Flow
• The Secondary Sort
• Customized Input Formats and Output Formats
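The idea behind the secondary sort can be sketched in a few lines of plain Python. This is an illustration under assumptions, not Hadoop code: the function name secondary_sort and the sample sensor readings are hypothetical. The sketch sorts on a composite (key, value) pair, as a sort comparator would, and then groups on the natural key alone, as a grouping comparator would, so each reducer call sees its values already in order.

```python
from itertools import groupby


def secondary_sort(records):
    """Return {key: [values in ascending order]} for an iterable of (key, value) pairs."""
    # Sort on the composite key (natural key, value).
    ordered = sorted(records, key=lambda kv: (kv[0], kv[1]))
    # Group on the natural key only; values within a group stay sorted.
    return {
        key: [value for _, value in group]
        for key, group in groupby(ordered, key=lambda kv: kv[0])
    }


if __name__ == "__main__":
    readings = [("sensor-b", 7), ("sensor-a", 30), ("sensor-a", 12), ("sensor-b", 3)]
    print(secondary_sort(readings))
```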
Monitoring and Debugging on a Production Cluster
• Counters
• Skipping Bad Records
• Rerunning failed tasks with Isolation Runner
Tuning for Performance in MapReduce
• Reducing network traffic with a combiner
• Partitioners
• Using Compression
• Reusing the JVM
• Running with speculative execution
• Refactoring code and rewriting algorithms
• Parameters affecting Performance
• Other Performance Aspects
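The first tuning item above, reducing network traffic with a combiner, can be illustrated with a plain-Python sketch. The functions map_split, combine and reduce_all are hypothetical names, and each inner list in the example stands in for one input split processed by one map task. The combiner pre-aggregates each map task's output locally, so fewer pairs would cross the network, while the final result stays the same.

```python
from collections import Counter


def map_split(lines):
    """Map step for one input split: emit (word, 1) for every word."""
    return [(word, 1) for line in lines for word in line.lower().split()]


def combine(pairs):
    """Combiner: sum counts locally on the map side before the shuffle."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def reduce_all(pairs):
    """Reduce step: sum all counts for each word across every split."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


if __name__ == "__main__":
    splits = [["a b a a"], ["b b a"]]
    raw = [pair for split in splits for pair in map_split(split)]
    combined = [pair for split in splits for pair in combine(map_split(split))]
    print(len(raw), "pairs shuffled without a combiner")
    print(len(combined), "pairs shuffled with a combiner")
    print(reduce_all(raw) == reduce_all(combined))
```

A combiner is only safe when the reduce operation is associative and commutative, as summation is here.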
HBase
• HBase concepts
• HBase architecture
• Create database
• Region server architecture
• Develop and run sample applications
• File storage architecture
• Access data stored in HBase using clients like Java, Python and Perl
• HBase basics
• Column access
• HBase and Hive integration
• Scans
• HBase admin tasks
• HBase use cases
• Defining schema and basic operations
• Install and configure HBase on a multi-node cluster
Pig
• Pig basics
• Install and configure Pig on a cluster
• Pig vs. MapReduce and SQL
• Pig vs. Hive
• Write sample Pig Latin scripts
• Modes of running Pig
• Running in the Grunt shell
• Programming in Eclipse
• Running as a Java program
• Pig UDFs
• Pig macros
Flume, Chukwa, Avro, Scribe, Thrift
• Flume and Chukwa concepts
• Use cases of Thrift
• Avro and Scribe
• Install and configure Flume on a cluster
• Create a sample application to capture logs from Apache using Flume
CDH4 Enhancements
• Name Node High Availability
• Name Node Federation
• Fencing
• YARN
Hadoop Challenges
• Hadoop disaster recovery
• Cases suitable for Hadoop