Breaking with Relational DBMS and Dating with Hbase
Gaurav Kohli
1
Xebia
me
Gaurav Kohli
[email protected]
Consultant
Xebia IT Architects
2
Why are we here?
Something about RDBMS
Limitations of RDBMS
Why HBase or any NoSQL solution
Overview of HBase
Specific Use cases
Paradigm shift in Schema Design
Architecture of HBase
HBase Interface – Java API, Thrift
Conclusion
3
Databases
4
Relational Databases have a lot of limitations
5
Data sets going into petabytes
RDBMS don't scale inherently
Scale up / scale out (load balancing + replication)
Hard to shard / partition
High read and write throughput at the same time is not possible
Transactional / analytical databases
Specialized hardware ... is very expensive
Oracle clustering
6
Diagram: Master -> Replication -> Slave
7
Diagram: Master (writes), Slave nodes (reads)
MySQL master becomes a problem
All Slaves must have the same write capacity as master
Single point of failure, no easy failover
8
Diagram: Master, Master -> Replication -> Slave
9
2006.11   Google releases paper on BigTable
2007.2    Initial HBase prototype created as Hadoop contrib
2007.10   First usable HBase
2008.1    Hadoop becomes Apache top-level project and HBase becomes a subproject
2010.5~   HBase becomes Apache top-level project
2010.6    HBase 0.20.5 released
2010.10   HBase 0.89.2010092 – third developer release
12
Distributed (uses HDFS for storage)
Column-Oriented
Multi-Dimensional (versions)
High-Availability
High-Performance
Storage System
13
HBase is not a SQL database
No joins, no query engine, no datatypes, no SQL
No schema
Denormalized data
Wide and sparsely populated data structure (key-value)
No DBA needed
14
Bigness
Big data, big number of users, big number of computers
Massive write performance
Facebook handles 135 billion messages a month
Twitter stores 7 TB of data per day
Fast key-value access
Write availability
No Single point of failure
15
Specific use cases
Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, etc.
Real-time inserts, updates, and queries
Fraud detection by comparing transactions to known patterns in real time
Analytics: use MapReduce, Hive, or Pig to perform analytical queries
16
Column-oriented database
Tables are sorted by row key
Table schema only defines column families
A column family can have any number of columns
Each cell value has a timestamp
17
SortedMap(
    RowKey, List(
        SortedMap(
            Column, List(
                Value, Timestamp
            )
        )
    )
)
SortedMap(RowKey, List(SortedMap(Column, List(Value, Timestamp))))
20
A BIG SORTED MAP
Row Key + Column Key + Timestamp => Value
Student table (sorted by row key and column key)

Row Key   Column Key   Timestamp       Value
1         info:name    1273516197868   Gaurav
1         info:age     1273871824184   28
1         info:age     1273871823022   34
1         info:sex     1273746281432   Male
2         info:name    1273863723227   Harsh
3         info:name    1273822456433   Raman

In a column key such as info:name, "info" is the column family and "name" is the column qualifier (name)
Timestamp is a long value
Row 1 holds 2 versions of info:age
21
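The "big sorted map" idea can be sketched in a few lines of plain Python. This is only an illustration of the data model using the Student table cells from the slide; the helper names (`build_table`, `sorted_cells`, `latest`) are made up for this sketch, and real HBase stores keys and values as byte arrays rather than Python strings.

```python
from collections import defaultdict

# Cells from the Student table: (row key, column key, timestamp, value).
CELLS = [
    ("1", "info:name", 1273516197868, "Gaurav"),
    ("1", "info:age", 1273871824184, "28"),
    ("1", "info:age", 1273871823022, "34"),
    ("1", "info:sex", 1273746281432, "Male"),
    ("2", "info:name", 1273863723227, "Harsh"),
    ("3", "info:name", 1273822456433, "Raman"),
]


def build_table(cells):
    """Nest cells as row key -> column key -> list of (timestamp, value)."""
    table = defaultdict(lambda: defaultdict(list))
    for row, column, ts, value in cells:
        table[row][column].append((ts, value))
    return table


def sorted_cells(table):
    """Yield cells ordered by row key, then column key, then newest first."""
    for row in sorted(table):
        for column in sorted(table[row]):
            for ts, value in sorted(table[row][column], reverse=True):
                yield row, column, ts, value


def latest(table, row, column):
    """Return the value with the highest timestamp, or None if absent."""
    versions = table.get(row, {}).get(column, [])
    return max(versions)[1] if versions else None
```

Reading `info:age` for row 1 through `latest` picks the cell with the larger timestamp, which is how the two versions of that cell are told apart.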
Example of a Student and Subject (RDBMS)
Student table: id (PK), name, age, sex
Subject table: id (PK), title, introduction, teacher_id
Student-Subject table: student_id, subject_id, type
Student and Subject are related m:n through the Student-Subject table
22
Example of a Student and Subject

Student table
key   name     age   sex
1     Gaurav   28    Male

Subject table
id   title   introduction    teacher_id
1    Hbase   Hbase is cool   10

Student-Subject table
student_id   subject_id   type
1            1            elective
23
Student-Subject schema - HBase

Student table
Row Key      Column Family   Column Keys
student_id   info            name, age, sex
student_id   subjects        Subject IDs as qualifiers (keys)

Subject table
Row Key      Column Family   Column Keys
subject_id   info            title, introduction, teacher_id
subject_id   students        Student IDs as qualifiers (keys)
24
Student-Subject schema - HBase

Student table
key   info               subjects
1     info:name=Gaurav   subjects:1="elective"
      info:age=28        subjects:2="main"
      info:sex=Male

Subject table
key   info                              students
1     info:title=Hbase                  students:1
      info:introduction=Hbase is cool   students:2
      info:teacher_id=10
25
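A rough sketch of how the denormalized Student-Subject schema behaves without joins: the relationship is written on both sides, and each side is read back from a single row. The dictionaries and the helpers `enroll` and `columns_in_family` are hypothetical stand-ins written for this illustration, not HBase client objects.

```python
# Two in-memory "tables": row key -> {column key: value}.
# Hypothetical stand-ins for the Student and Subject tables.
students = {
    "1": {"info:name": "Gaurav", "info:age": "28", "info:sex": "Male"},
}
subjects = {
    "1": {
        "info:title": "Hbase",
        "info:introduction": "Hbase is cool",
        "info:teacher_id": "10",
    },
}


def enroll(student_id, subject_id, kind):
    """Record the relationship on both sides, since there are no joins."""
    students.setdefault(student_id, {})[f"subjects:{subject_id}"] = kind
    subjects.setdefault(subject_id, {})[f"students:{student_id}"] = ""


def columns_in_family(row, family):
    """Return the qualifiers stored under one column family of a row."""
    prefix = family + ":"
    return sorted(key[len(prefix):] for key in row if key.startswith(prefix))


enroll("1", "1", "elective")
enroll("1", "2", "main")
```

Here `columns_in_family(students["1"], "subjects")` lists the subjects of student 1 from one row, with no lookup in a link table. The price is that every enrollment is two writes, and in HBase those two writes go to different rows, so they are not atomic as a pair.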
Attribute     Possible Values          Default
COMPRESSION   NONE, GZ, LZO            NONE
VERSIONS      1+                       3
TTL           1-2147483647 (seconds)   2147483647
BLOCKSIZE     1 byte - 2 GB            64k
IN_MEMORY     true, false              false
BLOCKCACHE    true, false              true
26
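To make the VERSIONS and TTL attributes more concrete, here is a simplified sketch of which versions of a single cell would still be visible to a reader. The function `visible_versions` is invented for this example and only approximates the behaviour; how and when HBase physically discards old cells is not modelled.

```python
def visible_versions(cells, now_ms, max_versions=3, ttl_seconds=2147483647):
    """Return the (timestamp_ms, value) pairs a read would still see.

    cells: list of (timestamp_ms, value) for a single row and column.
    Defaults mirror the VERSIONS and TTL defaults in the table above.
    """
    cutoff_ms = now_ms - ttl_seconds * 1000
    # Drop cells older than the time-to-live.
    alive = [cell for cell in cells if cell[0] >= cutoff_ms]
    # Keep only the newest max_versions cells.
    alive.sort(key=lambda cell: cell[0], reverse=True)
    return alive[:max_versions]
```

With four stored versions and the defaults, only the three newest come back; lowering `ttl_seconds` removes old versions regardless of how many are allowed.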
Region: contiguous set of lexicographically sorted rows
hbase.hregion.max.filesize (default: 256 MB): a region that grows past this size is split
Regions are hosted by Region Servers
Each table is partitioned into regions
27
Regions and
Region: row1 to row200
Region: row201 to row500
New row inserted
28
Regions and
Region: row1 to row200
Region: row201 to row350
Region: row351 to row501
29
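The region slides can be approximated with a toy model: regions are contiguous ranges of sorted row keys, a row is routed to the region whose start key precedes it, and a region that grows too large is split in two. The class below is a sketch written for this purpose; it counts rows instead of bytes (a stand-in for hbase.hregion.max.filesize) and is not how HBase is implemented.

```python
import bisect


class Table:
    """Toy table split into regions by sorted start keys (not HBase code)."""

    def __init__(self, max_rows_per_region):
        self.max_rows = max_rows_per_region  # stand-in for a max file size
        self.start_keys = [""]               # first region starts at the empty key
        self.regions = [[]]                  # each region: sorted list of row keys

    def region_index(self, row_key):
        """Find the region whose key range contains row_key."""
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key):
        """Insert a row key into its region, splitting if it grows too big."""
        i = self.region_index(row_key)
        rows = self.regions[i]
        pos = bisect.bisect_left(rows, row_key)
        if pos == len(rows) or rows[pos] != row_key:
            rows.insert(pos, row_key)
        if len(rows) > self.max_rows:
            self._split(i)

    def _split(self, i):
        """Split region i at its middle row into two contiguous regions."""
        rows = self.regions[i]
        mid = len(rows) // 2
        self.regions[i:i + 1] = [rows[:mid], rows[mid:]]
        self.start_keys.insert(i + 1, rows[mid])
```

Because ordering is lexicographic, a key such as "row10" sorts before "row2", so it lands in the same region as "row1" rather than after "row9".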
Architecture diagram: Master, ZooKeeper, RegionServers, HDFS, MapReduce
30
HBase Interface – Java API, Thrift...
32
HBase Interface – Java API, Thrift...
Java
Thrift (Ruby, PHP, Python, Perl, C++...)
REST
Groovy DSL
MapReduce
HBase Shell
33
HBase Interface – Java API, Thrift...
Java
Get
Put
Delete
Scan
incrementColumnValue
34
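The meaning of the five operations can be illustrated with a small in-memory model. This is explicitly not the HBase Java client API: `ToyTable` and its method names are invented for this sketch, versions and timestamps are left out, and the column `stats:visits` is a hypothetical counter.

```python
class ToyTable:
    """In-memory stand-in for a table; NOT the HBase client API."""

    def __init__(self):
        self.rows = {}  # row key -> {column key: value}

    def put(self, row, column, value):
        """Write one cell, creating the row if needed."""
        self.rows.setdefault(row, {})[column] = value

    def get(self, row):
        """Return a copy of all columns of one row (empty if missing)."""
        return dict(self.rows.get(row, {}))

    def delete(self, row):
        """Remove a whole row if it exists."""
        self.rows.pop(row, None)

    def scan(self, start_row="", stop_row=None):
        """Yield (row, columns) in row-key order; stop_row is exclusive."""
        for row in sorted(self.rows):
            if row < start_row:
                continue
            if stop_row is not None and row >= stop_row:
                break
            yield row, dict(self.rows[row])

    def increment_column_value(self, row, column, amount=1):
        """Add amount to a counter cell and return the new value."""
        columns = self.rows.setdefault(row, {})
        columns[column] = columns.get(column, 0) + amount
        return columns[column]
```

A scan walks rows in key order between a start row and an exclusive stop row, which is why row key design matters so much: rows that should be read together need keys that sort next to each other.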
HBase vs. RDBMS
HBase is not a replacement for an RDBMS
It solves only a small subset (~5%) of problems
36
Where SQL makes life easy
Joining
Secondary indexing
Referential integrity (updates)
ACID
Where HBase makes life easy
Dataset scale
Read/write scale
Replication
Batch analysis
37
HBase at Apache (http://hbase.apache.org/)
HBase Wiki (wiki.apache.org/hadoop/Hbase)
HBase blog (blog.hbase.org)
Images from Google Search
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html
40