
Introduction to HBase
Reporter: Hu Yi
2009-3-11
Overview
HBase is an Apache open source project
whose goal is to provide storage for the
Hadoop Distributed Computing
Environment.
Data is logically organized into tables,
rows and columns.
Outline
• Data Model
• Architecture and Implementation
• Examples & Tests
Conceptual View
• A data row has a sortable row key and an arbitrary number of columns.
• A timestamp is assigned automatically unless the client sets one explicitly.
• Columns are named <family>:<label>.

Row key           Time Stamp  Column “contents:”  Column “anchor:”
“com.apache.www”  t12         “<html>…”
                  t11         “<html>…”
                  t10                             “anchor:apache.com” = “APACHE”
“com.cnn.www”     t15                             “anchor:cnnsi.com” = “CNN”
                  t13                             “anchor:my.look.ca” = “CNN.com”
                  t6          “<html>…”
                  t5          “<html>…”
                  t3          “<html>…”
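The conceptual view can be read as a nested sorted map: row key, then column (<family>:<label>), then timestamp, then value. The sketch below is a plain in-memory illustration of that idea, not the HBase API; ConceptualTable, put, and getLatest are hypothetical names, and the numeric timestamps stand in for t5, t6, and so on.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical in-memory model of the conceptual view; not the HBase API.
final class ConceptualTable {
    // row key (ascending) -> column (ascending) -> timestamp (descending) -> value
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, TreeMap<Long, String>>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell is empty.
    String getLatest(String row, String column) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        TreeMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        ConceptualTable table = new ConceptualTable();
        table.put("com.cnn.www", "contents:", 5L, "<html>... (version at t5)");
        table.put("com.cnn.www", "contents:", 6L, "<html>... (version at t6)");
        table.put("com.cnn.www", "anchor:cnnsi.com", 9L, "CNN");
        System.out.println(table.getLatest("com.cnn.www", "contents:"));
        System.out.println(table.getLatest("com.cnn.www", "anchor:my.look.ca"));
    }
}
```

Because the innermost map sorts timestamps in descending order, the first entry is always the most recent version of the cell.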
Physical Storage View
• Physically, tables are stored on a per-column-family basis.
• In a column-oriented storage format, empty cells are not stored.
• Each column family is managed by an HStore.

[Figure labels: HStore, Memcache, Data MapFile (Key/Value), Index MapFile (Index key).]

HStore for column “contents:”
Row key           TS   Column “contents:”
“com.apache.www”  t12  “<html>…”
                  t11  “<html>…”
“com.cnn.www”     t6   “<html>…”
                  t5   “<html>…”
                  t3   “<html>…”

HStore for column “anchor:”
Row key           TS   Column “anchor:”
“com.apache.www”  t10  “anchor:apache.com” = “APACHE”
“com.cnn.www”     t9   “anchor:cnnsi.com” = “CNN”
                  t8   “anchor:my.look.ca” = “CNN.com”
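A minimal sketch of per-column-family storage, assuming one sorted in-memory map per family. It only illustrates that each family keeps its own cells and that an absent cell occupies no entry; FamilyStores and its methods are hypothetical names, and the composite string key is a simplification.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: one sorted store per column family; not the HBase API.
final class FamilyStores {
    // family name (for example "anchor:") -> cells keyed by "row/column/timestamp"
    private final Map<String, TreeMap<String, String>> stores = new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        // The family is everything up to and including the first colon.
        String family = column.substring(0, column.indexOf(':') + 1);
        stores.computeIfAbsent(family, f -> new TreeMap<String, String>())
              .put(row + "/" + column + "/" + timestamp, value);
    }

    // Number of cells physically held for one family; empty cells are simply absent.
    int cellCount(String family) {
        TreeMap<String, String> store = stores.get(family);
        return store == null ? 0 : store.size();
    }

    public static void main(String[] args) {
        FamilyStores stores = new FamilyStores();
        stores.put("com.cnn.www", "contents:", 6L, "<html>...");
        stores.put("com.cnn.www", "anchor:cnnsi.com", 9L, "CNN");
        stores.put("com.cnn.www", "anchor:my.look.ca", 8L, "CNN.com");
        System.out.println("contents: cells = " + stores.cellCount("contents:"));
        System.out.println("anchor: cells = " + stores.cellCount("anchor:"));
    }
}
```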
Row Ranges: Regions
• Cells are sorted by row key and column in ascending order, and by timestamp in descending order.
• Physically, tables are broken into row ranges (regions), each containing the rows from a start key to an end key.

[Figure: a table with row keys aaaa to aaae, timestamps t3 to t15, and columns “contents:” and “anchor:” (anchor:ad, anchor:be, anchor:cc, anchor:cd), divided into row ranges.]
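The sort order and the row-range lookup can be sketched as follows. This is an illustration only: RowRanges, Key, and regionFor are hypothetical names, and the example assumes each region is identified by its start key, with the first region starting at the empty string.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical sketch of cell ordering and row-range lookup; not the HBase API.
final class RowRanges {
    static final class Key {
        final String row;
        final String column;
        final long timestamp;

        Key(String row, String column, long timestamp) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
        }
    }

    // Row key and column ascending, timestamp descending (newest first).
    static final Comparator<Key> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) {
            return c;
        }
        c = a.column.compareTo(b.column);
        if (c != 0) {
            return c;
        }
        return Long.compare(b.timestamp, a.timestamp);
    };

    // regionsByStartKey maps each region's start key to a region name.
    // The region holding a row is the one with the greatest start key <= row.
    static String regionFor(TreeMap<String, String> regionsByStartKey, String row) {
        return regionsByStartKey.floorEntry(row).getValue();
    }

    public static void main(String[] args) {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "region-1");      // rows before "aaac" (assumed split point)
        regions.put("aaac", "region-2");  // rows from "aaac" onward
        System.out.println(regionFor(regions, "aaab"));
        System.out.println(regionFor(regions, "aaad"));
    }
}
```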
Outline
• Data Model
• Architecture and Implementation
• Examples & Tests
Three major components
• The HBaseMaster
• The HRegionServer
• The HBase client
HBaseMaster
• Assigns regions to HRegionServers.
  1. The ROOT region locates all the META regions.
  2. Each META region maps a number of user regions.
  3. User regions are assigned to the HRegionServers.
• Enables/disables tables and changes table schemas.
• Monitors the health of each server.

[Figure: the HBaseMaster, one ROOT region, and several META regions; ROOT, META, and USER regions are distributed across the servers.]
ROOT/META Table
• Each row in the ROOT and META tables is approximately 1 KB in size. At the default region size of 256 MB, one region therefore holds about 2^18 rows.

1 ROOT table → 2^18 META regions → 2^18 × 2^18 USER regions → 2^54 KB = 2^64 bytes = 2^24 TB
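The arithmetic behind these figures can be checked with a few lines of code. The sketch assumes exactly 1 KB per catalog row and exactly 256 MB per region; CatalogCapacity and its members are hypothetical names.

```java
import java.math.BigInteger;

// Worked arithmetic for the ROOT/META capacity estimate (assumed exact sizes).
final class CatalogCapacity {
    static final long ROW_BYTES = 1L << 10;     // about 1 KB per ROOT/META row
    static final long REGION_BYTES = 1L << 28;  // 256 MB default region size

    // Rows that fit in one catalog region: 2^28 / 2^10 = 2^18.
    static long rowsPerRegion() {
        return REGION_BYTES / ROW_BYTES;
    }

    // ROOT maps 2^18 META regions, each mapping 2^18 user regions of 2^28 bytes.
    static BigInteger addressableBytes() {
        BigInteger perRegion = BigInteger.valueOf(rowsPerRegion());
        BigInteger userRegions = perRegion.multiply(perRegion);
        return userRegions.multiply(BigInteger.valueOf(REGION_BYTES));
    }

    public static void main(String[] args) {
        System.out.println("rows per region  = " + rowsPerRegion());
        System.out.println("addressable bytes = " + addressableBytes());
        System.out.println("addressable TB    = " + addressableBytes().shiftRight(40));
    }
}
```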
HRegionServer: Write Requests
• Write Requests
• Read Requests
• Cache Flushes
• Compactions
• Region Splits

[Figure: write request. Labels: HLog; Hstore1 with Memcache1, Mapfile1.1, and Mapfile1.2; Hstore2 with Memcache2. The example table from the Conceptual View is shown alongside.]
HRegionServer: Read Requests
[Figure: read request. Labels: Hstore1 with Memcache1, Mapfile1.1, and Mapfile1.2. The example table from the Conceptual View is shown alongside.]
HRegionServer: Cache Flushes
[Figure: cache flush. Labels: HLog; Hstore1 with Memcache1, Mapfile1.1, Mapfile1.2, and Mapfile1.3. The example table from the Conceptual View is shown alongside.]
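The write, read, and cache-flush paths of a single HStore can be sketched as below. This is a simplified in-memory illustration of the general design (log the edit, buffer it in the memcache, flush the memcache to a new MapFile, read the newest data first), not the HBase implementation; StoreSketch and its members are hypothetical names, and each key holds only one value here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical single-store sketch of write, read, and flush; not HBase code.
final class StoreSketch {
    final List<String> hlog = new ArrayList<>();                       // append-only log of edits
    final TreeMap<String, String> memcache = new TreeMap<>();          // sorted in-memory buffer
    final List<TreeMap<String, String>> mapFiles = new ArrayList<>();  // flushed files, oldest first

    // Write: record the edit in the log first, then buffer it in memory.
    void write(String key, String value) {
        hlog.add(key + "=" + value);
        memcache.put(key, value);
    }

    // Cache flush: the memcache contents become a new immutable MapFile.
    void flush() {
        if (memcache.isEmpty()) {
            return;
        }
        mapFiles.add(new TreeMap<>(memcache));
        memcache.clear();
    }

    // Read: check the memcache, then the MapFiles from newest to oldest.
    String read(String key) {
        String value = memcache.get(key);
        if (value != null) {
            return value;
        }
        for (int i = mapFiles.size() - 1; i >= 0; i--) {
            value = mapFiles.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        StoreSketch store = new StoreSketch();
        store.write("com.cnn.www/anchor:cnnsi.com", "CNN");
        store.flush();
        System.out.println("MapFiles after flush: " + store.mapFiles.size());
        System.out.println("Read: " + store.read("com.cnn.www/anchor:cnnsi.com"));
    }
}
```

Each flush adds one more MapFile, which is why a store accumulates files such as Mapfile1.1, Mapfile1.2, and Mapfile1.3 over time.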
HRegionServer: Compactions
[Figure: compaction. Labels: Hstore1 with Memcache1; Mapfile1.1 and Mapfile1.2 shown together with Mapfile1. The example table from the Conceptual View is shown alongside.]
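A compaction can be pictured as merging several sorted MapFiles into one. The sketch below assumes the files are passed oldest first and that a newer entry for the same key replaces an older one; real HBase keys also carry a timestamp, so multiple versions can survive. CompactionSketch and compact are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of merging MapFiles; not the HBase compaction code.
final class CompactionSketch {
    // Merges the files in order, so entries from later (newer) files win.
    static TreeMap<String, String> compact(List<TreeMap<String, String>> mapFilesOldestFirst) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : mapFilesOldestFirst) {
            merged.putAll(file);
        }
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, String> mapfile11 = new TreeMap<>();
        mapfile11.put("com.apache.www", "old value");
        TreeMap<String, String> mapfile12 = new TreeMap<>();
        mapfile12.put("com.apache.www", "new value");
        mapfile12.put("com.cnn.www", "value");

        List<TreeMap<String, String>> files = new ArrayList<>();
        files.add(mapfile11);
        files.add(mapfile12);
        System.out.println(compact(files));
    }
}
```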
HRegionServer: Region Splits
[Figure: region split. Labels: Hstore1 with Memcache1 and Mapfile1. The example table from the Conceptual View is shown alongside.]
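A region split divides one row range into two at a chosen row key. The sketch below assumes the split point is the middle row key of a non-empty region, which is a simplification; SplitSketch and split are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of splitting a sorted region in two; not HBase code.
final class SplitSketch {
    // Returns two halves: rows before the middle key, and rows from it onward.
    // The region must contain at least one row.
    static List<TreeMap<String, String>> split(TreeMap<String, String> region) {
        List<String> keys = new ArrayList<>(region.keySet());
        String midKey = keys.get(keys.size() / 2);
        TreeMap<String, String> lower = new TreeMap<>(region.headMap(midKey, false));
        TreeMap<String, String> upper = new TreeMap<>(region.tailMap(midKey, true));
        return Arrays.asList(lower, upper);
    }

    public static void main(String[] args) {
        TreeMap<String, String> region = new TreeMap<>();
        for (String row : new String[] {"aaaa", "aaab", "aaac", "aaad", "aaae"}) {
            region.put(row, "value");
        }
        List<TreeMap<String, String>> halves = split(region);
        System.out.println("lower: " + halves.get(0).keySet());
        System.out.println("upper: " + halves.get(1).keySet());
    }
}
```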
HBase Client
[Figure: the HBase client contacts the ROOT region, then a META region, then the user region; the location information is cached.]
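The lookup chain from ROOT to META to user region, with client-side caching, can be sketched as below. The catalog contents and server names are made up for illustration, and the cache is keyed by row for simplicity, whereas a real client would cache per region. ClientLookupSketch and its members are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of ROOT -> META -> user region lookup; not the HBase client.
final class ClientLookupSketch {
    final TreeMap<String, String> root = new TreeMap<>();               // start row -> META region name
    final Map<String, TreeMap<String, String>> meta = new HashMap<>();  // META region -> (start row -> server)
    final Map<String, String> cache = new HashMap<>();                  // row -> server, filled on first lookup
    int catalogLookups = 0;                                             // counts trips through ROOT and META

    String locate(String row) {
        String cached = cache.get(row);
        if (cached != null) {
            return cached;  // cached information avoids the catalog entirely
        }
        catalogLookups++;
        String metaRegion = root.floorEntry(row).getValue();             // ROOT points to a META region
        String server = meta.get(metaRegion).floorEntry(row).getValue(); // META points to the user region
        cache.put(row, server);
        return server;
    }

    public static void main(String[] args) {
        ClientLookupSketch client = new ClientLookupSketch();
        client.root.put("", "meta-1");
        TreeMap<String, String> metaRows = new TreeMap<>();
        metaRows.put("", "server-a");
        metaRows.put("m", "server-b");
        client.meta.put("meta-1", metaRows);

        System.out.println(client.locate("com.cnn.www"));
        System.out.println(client.locate("com.cnn.www"));
        System.out.println("catalog lookups: " + client.catalogLookups);
    }
}
```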
Outline
• Data Model
• Architecture and Implementation
• Examples & Tests
Create MyTable
Table layout: Row Key | Timestamp | columnFamily1: | columnFamily2:

// config is assumed to be an existing HBaseConfiguration
HBaseAdmin admin = new HBaseAdmin(config);
// Describe the two column families
HColumnDescriptor[] column = new HColumnDescriptor[2];
column[0] = new HColumnDescriptor("columnFamily1:");
column[1] = new HColumnDescriptor("columnFamily2:");
// Describe the table and attach the families
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
desc.addFamily(column[0]);
desc.addFamily(column[1]);
// Create the table
admin.createTable(desc);
Insert Values
// table is assumed to be an open HTable for MyTable; timestamp is a long
BatchUpdate batchUpdate = new BatchUpdate("myRow", timestamp);
batchUpdate.put("columnFamily1:labela", Bytes.toBytes("labela value"));
batchUpdate.put("columnFamily1:labelb", Bytes.toBytes("labelb value"));
table.commit(batchUpdate);

Row Key  Timestamp  columnFamily1:
myRow    ts1        labela = labela value
         ts2        labelb = labelb value
Insert
[Chart: HBase insert test. Only axis ticks survive (one axis from 0 to 160000, one logarithmic axis from 1 to 100000); the data points are not recoverable.]
Insert
[Chart: insert time (ms) for HBase and MySQL, Row*10, Column=1, on logarithmic axes; the data points are not recoverable.]
Search
Select value from table where key='com.apache.www' AND label='anchor:apache.com'

Row key           Time Stamp  Column “anchor:”
“com.apache.www”  t12
                  t11
                  t10         “anchor:apache.com” = “APACHE”
“com.cnn.www”     t9          “anchor:cnnsi.com” = “CNN”
                  t8          “anchor:my.look.ca” = “CNN.com”
                  t6
                  t5
                  t3
Search Scanner
Select value from table where anchor='cnnsi.com'

Row key           Time Stamp  Column “anchor:”
“com.apache.www”  t12
                  t11
                  t10         “anchor:apache.com” = “APACHE”
“com.cnn.www”     t9          “anchor:cnnsi.com” = “CNN”
                  t8          “anchor:my.look.ca” = “CNN.com”
                  t6
                  t5
                  t3
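The two searches differ in access path: a query that names the row key can go straight to that row, while a query on a column alone has to scan every row. The sketch below illustrates the difference on an in-memory sorted map holding the anchor values from the example table; it is not the HBase client or scanner API, and SearchSketch, get, and scan are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch contrasting a row-key lookup with a full scan.
final class SearchSketch {
    // row key -> column -> value, using the anchor values from the example table
    static TreeMap<String, TreeMap<String, String>> sampleTable() {
        TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();
        TreeMap<String, String> apache = new TreeMap<>();
        apache.put("anchor:apache.com", "APACHE");
        TreeMap<String, String> cnn = new TreeMap<>();
        cnn.put("anchor:cnnsi.com", "CNN");
        cnn.put("anchor:my.look.ca", "CNN.com");
        table.put("com.apache.www", apache);
        table.put("com.cnn.www", cnn);
        return table;
    }

    // Search by row key and column: a direct lookup in the sorted row index.
    static String get(TreeMap<String, TreeMap<String, String>> table, String row, String column) {
        TreeMap<String, String> columns = table.get(row);
        return columns == null ? null : columns.get(column);
    }

    // Search by column only: every row has to be visited.
    static List<String> scan(TreeMap<String, TreeMap<String, String>> table, String column) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> entry : table.entrySet()) {
            String value = entry.getValue().get(column);
            if (value != null) {
                hits.add(entry.getKey() + " -> " + value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeMap<String, TreeMap<String, String>> table = sampleTable();
        System.out.println(get(table, "com.apache.www", "anchor:apache.com"));
        System.out.println(scan(table, "anchor:cnnsi.com"));
    }
}
```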
Summary
Column-oriented modification is more flexible.
Higher performance on row key clusters.
Future work
More testing
Search optimization
Thank you