Hadoop-HBase-Tutorial - CSE Labs User Home Pages


Gowtham Rajappan

- HDFS – Hadoop Distributed File System, modeled on Google GFS
- Hadoop MapReduce – similar to Google MapReduce
- HBase – similar to Google Bigtable

Master: hadoop01.cselabs.umn.edu

Slaves: hadoop02.cselabs.umn.edu – hadoop05.cselabs.umn.edu

You will need a CSE Labs account to access this cluster. You can log in to any of these machines from any CS/CSE Labs machine.

Data is divided into tables. A table is composed of columns, and columns are grouped into column families.
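As a rough mental model of tables, column families, and columns, the toy sketch below (hypothetical class and names; not HBase's storage format or client API) treats a table as a map from row key to column family to column qualifier to value. Columns are addressed as `family:qualifier`; families are fixed when the table is created, while qualifiers can be added freely per row.

```python
from collections import defaultdict


class ToyTable:
    """Toy HBase-style table: row key -> family -> qualifier -> value."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)  # column families are declared up front
        # row key -> {family -> {qualifier -> value}}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value):
        # Columns are addressed as "family:qualifier".
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][family][qualifier] = value

    def get(self, row, column):
        # .get() avoids creating empty rows/families as a side effect.
        family, qualifier = column.split(":", 1)
        return self.rows.get(row, {}).get(family, {}).get(qualifier)

    def scan(self):
        # Rows are visited in sorted row-key order.
        for row in sorted(self.rows):
            yield row, {f: dict(cols) for f, cols in self.rows[row].items()}
```

For example, `ToyTable("users", ["info", "stats"])` accepts `put("alice", "info:email", ...)` but rejects a column in an undeclared family such as `"misc:x"`.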

Partitioning

- A table is horizontally partitioned into regions; each region holds a contiguous, sorted range of row keys.
- Each region is managed by a RegionServer, and a single RegionServer may host multiple regions.

Persistence and data availability

- HBase stores its data in HDFS. It does not replicate RegionServers; instead, it relies on HDFS replication for data availability.
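To make key-range partitioning concrete, here is a toy sketch (hypothetical region boundaries and server names; not HBase's actual lookup code) that finds the region whose [start key, next start key) range contains a row key by binary search over sorted start keys. In a real cluster, clients look this mapping up in HBase's catalog (meta) table.

```python
import bisect

# Hypothetical regions: each covers row keys from its start key up to (but not
# including) the next region's start key. The first starts at "" (smallest key),
# the last is open-ended. One server may host several regions.
REGIONS = [
    ("", "regionserver-a"),   # keys < "g"
    ("g", "regionserver-b"),  # "g" <= key < "p"
    ("p", "regionserver-a"),  # key >= "p"
]
START_KEYS = [start for start, _ in REGIONS]


def locate_region(row_key):
    """Return (region index, server) for the region containing row_key."""
    # bisect_right gives the first start key greater than row_key;
    # the containing region is the one just before it.
    idx = bisect.bisect_right(START_KEYS, row_key) - 1
    return idx, REGIONS[idx][1]
```

Because each region is one contiguous key range, a scan over a range of row keys only has to visit the regions that overlap that range.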

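- Region data is cached in memory.
- Updates are written to an in-memory store (the MemStore), and reads are served from memory when the data is cached there.
- The MemStore is flushed to HDFS periodically (typically when it grows past a configured size).
- A Write-Ahead Log (WAL), also stored in HDFS, makes updates durable: each update is recorded in the log before it is applied to the MemStore, so unflushed updates can be recovered after a RegionServer failure.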
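The write path above can be sketched as a toy model (a simplification with hypothetical names; not HBase's implementation): each update is appended to a log, then applied to an in-memory store; when the store reaches a threshold it is written out as an immutable sorted "file", and replaying the log rebuilds unflushed data after a crash. Here the threshold counts entries rather than bytes, and Python lists stand in for HDFS files.

```python
class ToyRegion:
    """Toy write path: append to WAL, apply to MemStore, flush to files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: entries, not bytes
        self.wal = []       # stand-in for the write-ahead log in HDFS
        self.memstore = {}  # in-memory map of recent updates
        self.files = []     # stand-in for immutable sorted files in HDFS

    def put(self, key, value):
        self.wal.append((key, value))  # 1. log first, for durability
        self.memstore[key] = value     # 2. then apply in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the MemStore as a sorted file. In this toy model the logged
        # entries it covers are simply discarded, since they are now on "disk".
        self.files.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.wal = []

    def get(self, key):
        # Newest data wins: check memory first, then files newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for f in reversed(self.files):
            for k, v in f:
                if k == key:
                    return v
        return None

    def crash_and_recover(self):
        # A crash loses memory; replaying the WAL rebuilds the MemStore.
        self.memstore = {}
        for key, value in self.wal:
            self.memstore[key] = value
```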

The HBase shell provides interactive commands for manipulating the database:

- Create/delete tables
- Insert, update, and read data in tables
- Manage regions

HBase provides atomic operations on a single row:

- CheckAndPut – similar to test-and-set
- CheckAndDelete

All operations on a single row are atomic, no matter how many columns are involved. HBase also provides row-level exclusive locks, which can be used to implement single-row transactions (these explicit client-side row locks were removed in later HBase releases).
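The single-row semantics above can be sketched as a compare-and-set guarded by a per-row lock. This is a toy model with hypothetical names (`ToyRowStore`, `check_and_put`), not HBase's client API; following CheckAndPut, an expected value of `None` here means "the column must be absent".

```python
import threading


class ToyRowStore:
    """Toy store where every operation on one row holds that row's lock."""

    def __init__(self):
        self._rows = {}                 # row key -> {column -> value}
        self._locks = {}                # row key -> threading.Lock
        self._guard = threading.Lock()  # protects creation of row locks

    def _lock_for(self, row):
        # Create each row's lock exactly once, even with concurrent callers.
        with self._guard:
            if row not in self._locks:
                self._locks[row] = threading.Lock()
            return self._locks[row]

    def put(self, row, columns):
        # All columns are applied under the row lock, so a multi-column
        # update is never observed half-applied.
        with self._lock_for(row):
            self._rows.setdefault(row, {}).update(columns)

    def get(self, row):
        with self._lock_for(row):
            return dict(self._rows.get(row, {}))

    def check_and_put(self, row, column, expected, columns):
        """Apply `columns` only if row[column] == expected; True if applied.

        The check and the write share one critical section (test-and-set).
        """
        with self._lock_for(row):
            current = self._rows.get(row, {}).get(column)
            if current != expected:
                return False
            self._rows.setdefault(row, {}).update(columns)
            return True
```

For example, if several clients race to claim `seat-12` with `check_and_put("seat-12", "info:owner", None, {"info:owner": name})`, all of them check for absence, but only the first to acquire the row lock succeeds.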

HBase stores multiple versions of each column value in a row. Each version is identified by a timestamp (a 64-bit long integer).

- By default, the system time (in milliseconds) is used as the version timestamp; however, the user can specify a logical timestamp instead.
- Each update to a column in a row creates a new version of that column's value.

A version can be accessed or deleted using its timestamp, and HBase can return the list of all stored versions (up to the number of versions the column family is configured to keep).
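Versioning can be sketched with a toy cell (hypothetical class and parameter names; not HBase's API) that keeps (timestamp, value) pairs newest first and trims them to `max_versions`, which plays the role of the column family's retained-versions setting. Timestamps default to the current time in milliseconds but may be passed explicitly as logical timestamps.

```python
import time


class ToyVersionedCell:
    """Toy multi-version cell holding (timestamp, value) pairs, newest first."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # analogous to a family's version limit
        self._versions = []               # sorted by timestamp, descending

    def put(self, value, timestamp=None):
        # Default to wall-clock milliseconds; callers may pass a logical timestamp.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        # In this toy model, writing an existing timestamp replaces that version.
        self.delete_version(timestamp)
        self._versions.append((timestamp, value))
        self._versions.sort(key=lambda tv: tv[0], reverse=True)
        del self._versions[self.max_versions:]  # keep only the newest versions

    def get(self, timestamp=None):
        """Newest value, or the value stored at exactly `timestamp`."""
        for ts, value in self._versions:
            if timestamp is None or ts == timestamp:
                return value
        return None

    def all_versions(self):
        """All retained (timestamp, value) pairs, newest first."""
        return list(self._versions)

    def delete_version(self, timestamp):
        self._versions = [(ts, v) for ts, v in self._versions if ts != timestamp]
```

With `max_versions=2`, writing versions at logical timestamps 1, 2, and 3 leaves only the versions at 3 and 2; the oldest one is dropped.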

  

Hadoop Home - http://hadoop.apache.org/

HBase Home - http://hbase.apache.org/

HBase API - http://hbase.apache.org/apidocs/