Learn Hadoop Online
The Hadoop Online training course is designed to build the knowledge and skills
you need to become a successful Hadoop developer. The course covers core concepts
in depth, along with implementation on varied industry use cases.
Take a look at the Hadoop Admin and Developer course content:
Skype Id: info.vibloo
Email: [email protected]
USA: +1-248-809-1418
IND: +91-40-3296-5222
Introduction to Hadoop
What is Hadoop?
The Hadoop Distributed File System
How Hadoop MapReduce Works
Anatomy of a Hadoop Cluster
Master Daemons
NameNode
JobTracker
Secondary NameNode
Slave Daemons
DataNode
TaskTracker
www.vibloo.com/Hadoop-Online-Training
HDFS (Hadoop Distributed File System)
Blocks and Splits
Input and HDFS Splits
Data Replication
Hadoop Rack Awareness
Data high availability
Programming Practices
Developing MapReduce Programs in
Local Mode: running without HDFS and MapReduce
Pseudo-distributed Mode: running all daemons on a single node
Fully distributed Mode: running daemons on dedicated nodes
Data Integrity
Cluster architecture and block placement
Accessing HDFS
Java and CLI approaches
Set up a Hadoop cluster using the Apache, Cloudera, and Hortonworks distributions
Build a fully distributed Hadoop cluster on a single laptop/desktop
NameNode in safe mode
Metadata backup
Integrating Kerberos security in Hadoop
Writing a MapReduce Program
Examining a Sample MapReduce Program, with several examples
Basic API Concepts
The Driver Code
The Mapper
The Reducer
Hadoop's Streaming API
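As an illustration of the Streaming API topic: Hadoop Streaming lets a mapper and a reducer be written as ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output. The sketch below shows only the word-count logic such scripts might contain. The function names map_lines and reduce_lines are hypothetical, and the sorting that the framework performs between the two stages is simulated here with sorted().

```python
from itertools import groupby


def map_lines(lines):
    """Mapper logic: emit one 'word<TAB>1' record per word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer logic: sum the counts of adjacent records sharing a key.

    The reducer receives mapper output sorted by key, so records with
    equal keys are adjacent and groupby can collect them.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local dry run: sorted() stands in for the framework's sort between stages.
mapped = sorted(map_lines(["the cat", "The dog"]))
for record in reduce_lines(mapped):
    print(record)
```

In an actual streaming job, each function would typically live in its own script and be fed from sys.stdin; the local dry run above is just a convenient way to check the logic before submitting a job.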
Performing several Hadoop jobs
The configure and close Methods
Sequence Files
RecordReader
RecordWriter
Role of Reporter
OutputCollector
Processing XML files
Counters
Directly Accessing HDFS
ToolRunner
Using The Distributed Cache
Common MapReduce Algorithms
Sorting and Searching
Creating an Inverted Index
Indexing
Identity Mapper
Classification/Machine Learning
Term Frequency - Inverse Document Frequency
Word Co-Occurrence
Identity Reducer
MapReduce applications
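To make one of the algorithms above concrete, here is a minimal in-memory sketch of an inverted index expressed as map, shuffle, and reduce steps. It is plain Python rather than Hadoop code: the shuffle function only imitates the grouping the framework performs, and all function names and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def map_document(doc_id, text):
    """Map step: emit a (term, doc_id) pair for every word in the document."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Stand-in for the framework's shuffle: group values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_postings(term, doc_ids):
    """Reduce step: collapse duplicates into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(documents):
    """Run map, shuffle, and reduce over a dict of {doc_id: text}."""
    pairs = (
        pair
        for doc_id, text in documents.items()
        for pair in map_document(doc_id, text)
    )
    grouped = shuffle(pairs)
    return dict(reduce_postings(term, ids) for term, ids in grouped.items())


# Hypothetical sample input.
docs = {"d1": "hadoop stores data", "d2": "hadoop processes data data"}
index = build_inverted_index(docs)
print(index["hadoop"])
```

The same three-step shape carries over to a real job: the mapper emits (term, document) pairs, the framework groups them by term, and the reducer writes one posting list per term.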
Debugging MapReduce Programs
Testing with MRUnit
Logging
Other Debugging Strategies
Advanced MapReduce Programming
A Recap of the MapReduce Flow
The Secondary Sort
Customized Input Formats and Output Formats
Monitoring and debugging on a Production Cluster
Counters
Skipping Bad Records
Rerunning failed tasks with IsolationRunner
Tuning for Performance in MapReduce
Reducing network traffic with combiner
Partitioners
Using Compression
Reusing the JVM
Running with speculative execution
Refactoring code and rewriting algorithms
Parameters affecting Performance
Other Performance Aspects
HBase
HBase concepts
HBase architecture
Create database
Region server architecture
Develop and run sample applications
File storage architecture
Access data stored in HBase using clients such as Java, Python, and Perl
HBase basics
Column access
HBase and Hive Integration
Scans
HBase admin tasks
HBase use cases
Defining Schema and basic operation
Install and configure HBase on a multi node cluster
Pig
Pig basics
Install and configure Pig on a cluster
Pig vs. MapReduce and SQL
Pig vs. Hive
Write sample Pig Latin scripts
Modes of running Pig
Running in Grunt shell
Programming in Eclipse
Running as Java program
Pig UDFs
Pig Macros
Flume, Chukwa, Avro, Scribe, Thrift
Flume and Chukwa concepts
Use cases of Thrift
Avro and Scribe
Install and configure Flume on a cluster
Create a sample application to capture logs from Apache using Flume
CDH4 Enhancements
NameNode High Availability
NameNode Federation
Fencing
YARN
Hadoop Challenges
Hadoop disaster recovery
Suitable use cases for Hadoop