Introduction to HBase
Reporter: Hu Yi
2009-3-11
Overview
HBase is an Apache open source project
whose goal is to provide storage for the
Hadoop Distributed Computing
Environment.
Data is logically organized into tables,
rows and columns.
Outline
Data Model
Architecture and Implementation
Examples & Tests
Conceptual View
A data row has a sortable row key and an arbitrary number of columns.
Each column is named <family>:<label>.
A timestamp is assigned automatically unless the client sets one explicitly.
Row key            Time Stamp   Column “contents:”   Column “anchor:”
“com.apache.www”   t12          “<html>…”
                   t11          “<html>…”
                   t10                               “anchor:apache.com”   “APACHE”
“com.cnn.www”      t15                               “anchor:cnnsi.com”    “CNN”
                   t13                               “anchor:my.look.ca”   “CNN.com”
                   t6           “<html>…”
                   t5           “<html>…”
                   t3           “<html>…”
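The conceptual view can be pictured as a nested sorted map. The sketch below is an in-memory illustration only, not HBase code; the class name ConceptualTable and its methods are hypothetical, and it assumes versions of a cell are kept newest first.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Illustrative in-memory model of the conceptual view; not part of HBase.
class ConceptualTable {
    // row key (ascending) -> column "<family>:<label>" (ascending)
    //   -> timestamp (descending) -> value
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(column,
                k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell is empty.
    String getLatest(String rowKey, String column) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(rowKey);
        if (columns == null) {
            return null;
        }
        TreeMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Returns the smallest row key, showing that rows stay sorted.
    String firstRowKey() {
        return rows.firstKey();
    }

    public static void main(String[] args) {
        ConceptualTable table = new ConceptualTable();
        table.put("com.cnn.www", "contents:", 5L, "older page");
        table.put("com.cnn.www", "contents:", 6L, "newer page");
        table.put("com.apache.www", "anchor:apache.com", 10L, "APACHE");
        System.out.println(table.getLatest("com.cnn.www", "contents:"));
        System.out.println(table.firstRowKey());
    }
}
```

An empty cell simply has no entry in the inner maps, which is why a lookup for a column that was never written returns null.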
Physical Storage View
Physically, tables are stored on a per-column-family basis.
Because the storage format is column-oriented, empty cells are not stored.
Each column family is managed by an HStore.
HStore components shown: Memcache, Data MapFile, Index MapFile (index key: “com.cnn.www”).
HStore for column “contents:”
Row key            TS    Column “contents:”
“com.apache.www”   t12   “<html>…”
                   t11   “<html>…”
“com.cnn.www”      t6    “<html>…”
                   t5    “<html>…”
                   t3    “<html>…”
HStore for column “anchor:”
Row key            TS    Column “anchor:” (key / value)
“com.apache.www”   t10   “anchor:apache.com”   “APACHE”
“com.cnn.www”      t9    “anchor:cnnsi.com”    “CNN”
                   t8    “anchor:my.look.ca”   “CNN.com”
Row Ranges: Regions
Rows are sorted by row key and column in ascending order, and by timestamp in descending order.
Physically, tables are broken into row ranges (regions); each region contains the rows from a start key to an end key.
[Figure: a table with row keys aaaa to aaae, columns such as anchor:ad, anchor:be, anchor:cc and anchor:cd, and timestamps t3 to t15, divided into regions by row range.]
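Because rows are sorted and each region covers one contiguous key range, the region for a row key can be found by looking up the greatest start key that is not larger than the row key. A minimal sketch, assuming hypothetical region names and split points (RegionLocator is not an HBase class):

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative lookup of the region that holds a row key; not HBase code.
class RegionLocator {
    // region start key (inclusive) -> region name; a region ends where the
    // next region starts
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // Returns the region whose start key is the greatest key <= rowKey,
    // or null if no region covers the row key.
    String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        RegionLocator locator = new RegionLocator();
        locator.addRegion("", "region-1");      // empty start key: table start
        locator.addRegion("aaac", "region-2");  // hypothetical split point
        locator.addRegion("aaae", "region-3");  // hypothetical split point
        System.out.println(locator.regionFor("aaab"));
        System.out.println(locator.regionFor("aaad"));
    }
}
```

With these split points, "aaab" falls in the first region and "aaad" in the second.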
Outline
Data Model
Architecture and Implementation
Examples & Tests
Three major components
The HBaseMaster
The HRegionServer
The HBase client
Master
The HBaseMaster:
Assigns regions to HRegionServers.
1. The ROOT region locates all the META regions.
2. Each META region maps a number of user regions.
3. User regions are assigned to the HRegionServers.
Enables/disables tables and changes table schemas.
Monitors the health of each server.
[Figure: the HBaseMaster and several servers hosting the ROOT region, META regions and USER regions.]
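ROOT/META Table
Each row in the ROOT and META tables is approximately 1 KB in size. At the default region size of 256 MB, one region therefore holds about 2^18 rows:
1 ROOT table -> 2^18 META regions -> 2^18 x 2^18 = 2^36 USER regions
2^36 USER regions x 256 MB = 2^54 KB = 2^64 bytes = 2^24 TB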
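The capacity estimate can be checked with a few lines of arithmetic. The sketch below assumes exactly 1 KB per catalog row and 256 MB per region, as stated on the slide; the class and method names are illustrative.

```java
import java.math.BigInteger;

// Reproduces the ROOT/META capacity estimate; names are illustrative.
class MetaCapacity {
    static final long ROW_SIZE_BYTES = 1L << 10;       // about 1 KB per row
    static final long REGION_SIZE_BYTES = 256L << 20;  // 256 MB per region

    // Rows that fit in one ROOT or META region: 2^28 / 2^10 = 2^18.
    static long rowsPerRegion() {
        return REGION_SIZE_BYTES / ROW_SIZE_BYTES;
    }

    // One ROOT region maps 2^18 META regions, each mapping 2^18 user regions.
    static long userRegions() {
        return rowsPerRegion() * rowsPerRegion();
    }

    // 2^36 regions of 2^28 bytes is 2^64 bytes, which overflows a long,
    // so the product is computed with BigInteger.
    static BigInteger addressableBytes() {
        return BigInteger.valueOf(userRegions())
                .multiply(BigInteger.valueOf(REGION_SIZE_BYTES));
    }

    public static void main(String[] args) {
        System.out.println("rows per region: " + rowsPerRegion());
        System.out.println("user regions:    " + userRegions());
        System.out.println("bytes:           " + addressableBytes());
        System.out.println("terabytes:       " + addressableBytes().shiftRight(40));
    }
}
```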
HRegionServer: Write Requests
Operations handled by an HRegionServer:
Write Requests
Read Requests
Cache Flushes
Compactions
Region Splits
[Figure: write path. Components shown: HLog; Hstore1 with Memcache1, Mapfile1.1 and Mapfile1.2; Hstore2 with Memcache2.]
Row key            Time Stamp   Column “contents:”   Column “anchor:”
“com.apache.www”   t12          “<html>…”
                   t11          “<html>…”
                   t10                               “anchor:apache.com”   “APACHE”
“com.cnn.www”      t9                                “anchor:cnnsi.com”    “CNN”
                   t8                                “anchor:my.look.ca”   “CNN.com”
                   t6           “<html>…”
                   t5           “<html>…”
                   t3           “<html>…”
HRegionServer: Read Requests
[Figure: read path. Components shown: Hstore1 with Memcache1, Mapfile1.1 and Mapfile1.2, serving the table with rows “com.apache.www” and “com.cnn.www”.]
HRegionServer: Cache Flushes
[Figure: cache flush. Components shown: HLog; Hstore1 with Memcache1, Mapfile1.1 and Mapfile1.2, plus an additional Mapfile1.3; the table with rows “com.apache.www” and “com.cnn.www”.]
HRegionServer: Compactions
[Figure: compaction. Components shown: Hstore1 with Memcache1; Mapfile1.1 and Mapfile1.2 alongside a single Mapfile1; the table with rows “com.apache.www” and “com.cnn.www”.]
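The slides show these operations only as figures. The following sketch illustrates the commonly described behavior under stated assumptions: a write is logged and then buffered in the memcache, a read checks the memcache before the MapFiles (newest first), a flush turns the memcache into a new MapFile, and a compaction merges all MapFiles into one. StoreSketch is a hypothetical class, cell versions and log truncation are ignored, and none of this is the HBase implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simplified, in-memory illustration of one store; not HBase code.
class StoreSketch {
    private final List<String> log = new ArrayList<>();           // stands in for the HLog
    private TreeMap<String, String> memcache = new TreeMap<>();   // sorted write buffer
    private final List<TreeMap<String, String>> mapFiles = new ArrayList<>(); // oldest first

    // Write request: append to the log first, then update the memcache.
    void write(String key, String value) {
        log.add(key + "=" + value);
        memcache.put(key, value);
    }

    // Read request: check the memcache, then the MapFiles from newest to oldest.
    String read(String key) {
        String value = memcache.get(key);
        if (value != null) {
            return value;
        }
        for (int i = mapFiles.size() - 1; i >= 0; i--) {
            value = mapFiles.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Cache flush: the memcache content becomes a new MapFile.
    void flush() {
        if (memcache.isEmpty()) {
            return;
        }
        mapFiles.add(memcache);
        memcache = new TreeMap<>();
    }

    // Compaction: merge all MapFiles into one; newer files overwrite older ones.
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : mapFiles) {
            merged.putAll(file);
        }
        mapFiles.clear();
        if (!merged.isEmpty()) {
            mapFiles.add(merged);
        }
    }

    int mapFileCount() {
        return mapFiles.size();
    }

    int logSize() {
        return log.size();
    }

    public static void main(String[] args) {
        StoreSketch store = new StoreSketch();
        store.write("com.cnn.www", "first");
        store.flush();
        store.write("com.cnn.www", "second");
        store.flush();
        store.compact();
        System.out.println(store.read("com.cnn.www") + ", files: " + store.mapFileCount());
    }
}
```

Keeping the MapFiles immutable and merging them later is what lets a flush be a simple sequential write; the cost is that a read may have to consult several files until the next compaction.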
HRegionServer: Region Splits
[Figure: region split. Components shown: Hstore1 with Memcache1 and Mapfile1; the table with rows “com.apache.www” and “com.cnn.www”.]
HBase Client
1. HBase Client -> ROOT Region
2. HBase Client -> META Region
3. HBase Client -> User Region
The location information is cached by the client.
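The lookup chain and the client-side cache can be sketched as follows. This is a simplified illustration with hypothetical names (ClientLookupSketch, addUserRegion, locate); it caches per row key for brevity, whereas a real client would more plausibly cache per region.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative two-level lookup with a client-side cache; not HBase code.
class ClientLookupSketch {
    // ROOT: META region start key -> rows of that META region
    // META: user region start key -> server hosting the user region
    private final TreeMap<String, TreeMap<String, String>> root = new TreeMap<>();
    private final Map<String, String> cache = new HashMap<>();
    int catalogLookups = 0; // counts lookups that had to go through ROOT and META

    void addUserRegion(String metaStartKey, String userStartKey, String server) {
        root.computeIfAbsent(metaStartKey, k -> new TreeMap<>())
            .put(userStartKey, server);
    }

    // Returns the server for a row key, consulting the cache first.
    String locate(String rowKey) {
        String cached = cache.get(rowKey);
        if (cached != null) {
            return cached;
        }
        catalogLookups++;
        // Step 1: the ROOT region tells which META region covers the row key.
        Map.Entry<String, TreeMap<String, String>> meta = root.floorEntry(rowKey);
        if (meta == null) {
            return null;
        }
        // Step 2: the META region tells which user region (server) covers it.
        Map.Entry<String, String> user = meta.getValue().floorEntry(rowKey);
        if (user == null) {
            return null;
        }
        // Step 3: remember the answer so later requests skip steps 1 and 2.
        cache.put(rowKey, user.getValue());
        return user.getValue();
    }

    public static void main(String[] args) {
        ClientLookupSketch client = new ClientLookupSketch();
        client.addUserRegion("", "", "server-a");          // hypothetical servers
        client.addUserRegion("", "com.cnn", "server-b");
        System.out.println(client.locate("com.cnn.www"));
        System.out.println(client.locate("com.cnn.www"));
        System.out.println("catalog lookups: " + client.catalogLookups);
    }
}
```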
Outline
Data Model
Architecture and Implementation
Examples & Tests
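Create MyTable
Table layout: Row Key | Timestamp | columnFamily1: | columnFamily2:
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;
// config is assumed to be an HBaseConfiguration for the target cluster.
HBaseAdmin admin = new HBaseAdmin(config);
// Declare the two column families; family names end with a colon.
HColumnDescriptor[] column = new HColumnDescriptor[2];
column[0] = new HColumnDescriptor("columnFamily1:");
column[1] = new HColumnDescriptor("columnFamily2:");
// Describe the table, register both families and create it.
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
desc.addFamily(column[0]);
desc.addFamily(column[1]);
admin.createTable(desc);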
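Insert Values
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.util.Bytes;
// table is assumed to be an open HTable for "MyTable"; timestamp is a long chosen by the caller.
BatchUpdate batchUpdate = new BatchUpdate("myRow", timestamp);
batchUpdate.put("columnFamily1:labela", Bytes.toBytes("labela value"));
batchUpdate.put("columnFamily1:labelb", Bytes.toBytes("labelb value"));
// Apply both puts to the row in a single commit.
table.commit(batchUpdate);
Row Key   Timestamp   columnFamily1:
myRow     ts1         labela   labela value
          ts2         labelb   labelb value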
Insert
[Charts: insert time (ms) for HBase and MySQL; axis label: Row*10, Column=1.]
Search
Select value from table where key='com.apache.www' AND label='anchor:apache.com'
Row key            Time Stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com”   “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com”    “CNN”
                   t8           “anchor:my.look.ca”   “CNN.com”
                   t6
                   t5
                   t3
Search Scanner
Select value from table where anchor='cnnsi.com'
Row key            Time Stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com”   “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com”    “CNN”
                   t8           “anchor:my.look.ca”   “CNN.com”
                   t6
                   t5
                   t3
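The two queries differ in how much of the table they touch: the first names a row key and a column, so it is a direct lookup, while the second names only a column, so every row has to be visited. A minimal in-memory sketch of that difference (SearchSketch and its methods are hypothetical and do not use the HBase client API; timestamps are left out):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative contrast between a keyed lookup and a scan; not HBase code.
class SearchSketch {
    // row key -> column "<family>:<label>" -> value
    private final TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();

    void put(String rowKey, String column, String value) {
        table.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Keyed search: goes straight to one row and one column.
    String get(String rowKey, String column) {
        TreeMap<String, String> columns = table.get(rowKey);
        return columns == null ? null : columns.get(column);
    }

    // Scanner-style search: walks all rows in key order and keeps the
    // values stored under the requested column.
    List<String> scanColumn(String column) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> row : table.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null) {
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        SearchSketch search = new SearchSketch();
        search.put("com.apache.www", "anchor:apache.com", "APACHE");
        search.put("com.cnn.www", "anchor:cnnsi.com", "CNN");
        search.put("com.cnn.www", "anchor:my.look.ca", "CNN.com");
        System.out.println(search.get("com.apache.www", "anchor:apache.com"));
        System.out.println(search.scanColumn("anchor:cnnsi.com"));
    }
}
```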
Summary
Column-oriented storage makes modification more flexible.
Higher performance on clustered row keys.
Future work
More testing
Search optimization
Thank you