Breaking with Relational DBMS and Dating with Hbase


About me
Gaurav Kohli
[email protected]
Consultant
Xebia IT Architects

Why are we here?
Something about RDBMS
Limitations of RDBMS
Why HBase or any NoSQL solution
Overview of HBase
Specific use cases
Paradigm shift in schema design
Architecture of HBase
HBase interface – Java API, Thrift
Conclusion
Databases
Relational Databases have a lot of

Data sets are growing into petabytes
RDBMSs don't scale inherently
Scale up / scale out (load balancing + replication)
Hard to shard / partition
High read and write throughput at the same time is not possible
Transactional / analytical databases
Specialized hardware is very expensive
Oracle clustering
[Figure: a master replicating to a slave]
[Figure: writes go to the master, reads are served by the slave nodes]

The MySQL master becomes a problem
All slaves must have the same write capacity as the master
Single point of failure, no easy failover
[Figure: two masters replicating to each other, plus a slave]






2006.11  Google releases paper on BigTable
2007.2   Initial HBase prototype created as Hadoop contrib
2007.10  First usable HBase
2008.1   Hadoop becomes Apache top-level project and HBase becomes subproject
2010.5~  HBase becomes Apache top-level project
2010.6   HBase 0.20.5 released
2010.10  HBase 0.89.2010092 – third developer release

Distributed (uses HDFS for storage)
Column-oriented
Multi-dimensional (versions)
High availability
High performance
Storage system
HBase is not
A SQL database: no joins, no query engine, no data types, no SQL
No schema
Denormalized data
Wide and sparsely populated data structure (key-value)
No DBA needed

Bigness: big data, big number of users, big number of computers
Massive write performance
Facebook needs to handle 135 billion messages a month
Twitter stores 7 TB of data per day
Fast key-value access
Write availability
No single point of failure
Specific use cases
Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, etc.
Real-time inserts, updates, and queries
Fraud detection by comparing transactions to known patterns in real time
Analytics: use MapReduce, Hive, or Pig to perform analytical queries

Column-oriented database
Tables are sorted by row key
Table schema only defines column families
Each column family can have any number of columns
Each cell value has a timestamp
SortedMap(
    RowKey, List(
        SortedMap(
            Column, List(
                Value, Timestamp
            )
        )
    )
)

A BIG SORTED MAP
Row Key + Column Key + Timestamp => Value

Student table

Row Key   Column Key   Timestamp       Value
1         info:name    1273516197868   Gaurav
1         info:age     1273871824184   28
1         info:age     1273871823022   34
1         info:sex     1273746281432   Male
2         info:name    1273863723227   Harsh
3         info:name    1273822456433   Raman

The table is sorted by row key and column key.
A column key is a column family (info) plus a column qualifier/name (name, age, sex).
The timestamp is a long value.
Row 1 holds 2 versions of info:age.
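
The "big sorted map" idea can be sketched in a few lines of Python. This is only a toy in-memory model, not the HBase API: the class name ToyTable and its methods are made up for illustration, and row keys are plain strings. It loads the Student table rows from this slide and shows that a read returns the newest version first.

```python
from collections import defaultdict


class ToyTable:
    """Toy model (not HBase): row key -> column key -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        # Writing the same row and column with a new timestamp adds a version.
        self._rows[row][column][timestamp] = value

    def get(self, row, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        cells = self._rows.get(row, {}).get(column, {})
        return sorted(cells.items(), reverse=True)[:versions]

    def cells(self):
        """Yield cells ordered by row key, column key, then newest timestamp."""
        for row in sorted(self._rows):
            for column in sorted(self._rows[row]):
                versions = sorted(self._rows[row][column].items(), reverse=True)
                for timestamp, value in versions:
                    yield row, column, timestamp, value


table = ToyTable()
table.put("1", "info:name", 1273516197868, "Gaurav")
table.put("1", "info:age", 1273871824184, "28")
table.put("1", "info:age", 1273871823022, "34")
table.put("1", "info:sex", 1273746281432, "Male")
table.put("2", "info:name", 1273863723227, "Harsh")
table.put("3", "info:name", 1273822456433, "Raman")

latest_age = table.get("1", "info:age")              # newest version only
both_ages = table.get("1", "info:age", versions=2)   # both stored versions
```

Iterating over table.cells() walks the whole table in sorted order, which is what makes range scans over row keys cheap in this kind of layout.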

Example of a Student and Subject (RDBMS)

Student table:          id (PK), name, age, sex
Subject table:          id (PK), title, introduction, teacher_id
Student-Subject table:  student_id, subject_id, type

Student and Subject are in an m:n relationship through the Student-Subject table.
Example of a Student and Subject

Student table
key   name     age   sex
1     Gaurav   28    Male

Subject table
id   title   introduction    teacher_id
1    Hbase   Hbase is cool   10

Student-Subject table
student_id   subject_id   type
1            1            elective

Student-Subject schema - HBase

Student table
Row Key      Column family   Column Keys
student_id   info            name, age, sex
student_id   subjects        Subject IDs as qualifier (key)

Subject table
Row Key      Column family   Column Keys
subject_id   info            title, introduction, teacher_id
subject_id   students        Student IDs as qualifier (key)
Student-Subject schema - HBase

Student table
key   info               subjects
1     info:name=Gaurav   subjects:1="elective"
      info:age=28        subjects:2="main"
      info:sex=Male

Subject table
key   info                              students
1     info:title=Hbase                  students:1
      info:introduction=Hbase is cool   students:2
      info:teacher_id=10
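
A rough sketch of what this denormalized schema means for application code, using plain Python dictionaries in place of HBase tables. The helper names put, enroll and subjects_of are hypothetical. The point is that the m:n relationship is written twice, once per table, so that either side can be read with a single row lookup instead of a join.

```python
# Each "table" maps row key -> column family -> qualifier -> value.
student_table = {}
subject_table = {}


def put(table, row_key, family, qualifier, value):
    """Store one cell, creating the row and family on first use."""
    table.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value


def enroll(student_id, subject_id, enrollment_type):
    """Record the relationship on both sides (no join table)."""
    # Student row: the subject id is the qualifier, the type is the value.
    put(student_table, student_id, "subjects", subject_id, enrollment_type)
    # Subject row: the student id is the qualifier, the value is left empty.
    put(subject_table, subject_id, "students", student_id, "")


def subjects_of(student_id):
    """All subject ids of a student, read from a single row."""
    return sorted(student_table.get(student_id, {}).get("subjects", {}))


put(student_table, "1", "info", "name", "Gaurav")
put(student_table, "1", "info", "age", "28")
put(student_table, "1", "info", "sex", "Male")
put(subject_table, "1", "info", "title", "Hbase")

enroll("1", "1", "elective")
enroll("1", "2", "main")
```

The cost of this design is that the application has to keep both tables consistent itself, since nothing enforces referential integrity between them.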
Attribute     Possible Values          Default
COMPRESSION   NONE, GZ, LZO            NONE
VERSIONS      1+                       3
TTL           1-2147483647 (seconds)   2147483647
BLOCKSIZE     1 byte – 2 GB            64k
IN_MEMORY     true, false              false
BLOCKCACHE    true, false              true
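
To make VERSIONS and TTL more concrete, here is a small Python sketch of how the two attributes limit which versions of a cell remain readable. The function visible_versions is a made-up illustration, not HBase code, and it simplifies the expiry rule to "older than TTL seconds means gone"; the defaults are taken from the table above.

```python
def visible_versions(cells, now_ms, max_versions=3, ttl_seconds=2147483647):
    """Return the (timestamp_ms, value) pairs still readable, newest first.

    cells is a list of (timestamp_ms, value) pairs for one cell.
    """
    # TTL: drop versions older than ttl_seconds.
    alive = [(ts, value) for ts, value in cells
             if now_ms - ts <= ttl_seconds * 1000]
    # VERSIONS: keep only the newest max_versions of what is left.
    return sorted(alive, reverse=True)[:max_versions]


cells = [(1000, "a"), (2000, "b"), (3000, "c"), (4000, "d")]

kept_default = visible_versions(cells, now_ms=5000)               # 3 newest
kept_ttl = visible_versions(cells, now_ms=5000, ttl_seconds=2)    # last 2 seconds
kept_single = visible_versions(cells, now_ms=5000, max_versions=1)
```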

Region: contiguous set of lexicographically sorted rows
hbase.hregion.max.filesize (default: 256 MB)
Regions are hosted by region servers
Each table is partitioned into regions
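Regions and splits
[Figure: two regions, row1 to row200 and row201 to row500, with a new row arriving]
[Figure: after the split there are three regions: row1 to row200, row201 to row350, and row351 to row501]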
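
The region behaviour shown in these figures can be approximated with a short Python sketch. It is a toy model under stated assumptions: the class ToyRegions is hypothetical, and it splits a region when it holds too many rows, whereas HBase decides based on size (hbase.hregion.max.filesize). What it does show is that regions stay contiguous and sorted, and that finding the region for a row key is a binary search over region start keys.

```python
import bisect


class ToyRegions:
    """Toy model: a table kept as a list of regions, each a sorted list of row keys."""

    def __init__(self, max_rows_per_region):
        # Stand-in for hbase.hregion.max.filesize: rows are counted, not bytes.
        self.max_rows = max_rows_per_region
        self.regions = [[]]  # start with a single empty region

    def find_region(self, row_key):
        """Return the index of the region whose key range contains row_key."""
        start_keys = [region[0] for region in self.regions[1:]]
        return bisect.bisect_right(start_keys, row_key)

    def insert(self, row_key):
        index = self.find_region(row_key)
        region = self.regions[index]
        position = bisect.bisect_left(region, row_key)
        if position < len(region) and region[position] == row_key:
            return  # row already stored
        region.insert(position, row_key)
        if len(region) > self.max_rows:
            middle = len(region) // 2
            # Split into two contiguous regions that are still sorted.
            self.regions[index:index + 1] = [region[:middle], region[middle:]]

    def boundaries(self):
        """First and last row key of every non-empty region."""
        return [(region[0], region[-1]) for region in self.regions if region]


table = ToyRegions(max_rows_per_region=300)
for i in range(1, 501):
    table.insert(f"row{i:03d}")  # zero-padded so string order matches numeric order

region_ranges = table.boundaries()
```

Row keys are zero-padded here because regions are sorted lexicographically, so "row10" would otherwise sort before "row2".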

Master
ZooKeeper
RegionServers
HDFS
MapReduce
HBase Interface – Java API, Thrift...
Java
Thrift (Ruby, PHP, Python, Perl, C++...)
REST
Groovy DSL
MapReduce
HBase Shell
Java
Get
Put
Delete
Scan
incrementColumnValue
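
The semantics of these five operations can be sketched with a toy in-memory store in Python. This is not the HBase Java API: ToyClient and its method names are hypothetical stand-ins that only mirror the operation names on the slide, and the stats:views column is invented for the counter example. The scan assumes an inclusive start row and an exclusive stop row.

```python
class ToyClient:
    """Toy in-memory store (not HBase): row key -> column key -> value."""

    def __init__(self):
        self._rows = {}

    def put(self, row, column, value):
        self._rows.setdefault(row, {})[column] = value

    def get(self, row):
        """Return a copy of all cells in one row (empty if the row is missing)."""
        return dict(self._rows.get(row, {}))

    def delete(self, row, column=None):
        """Delete one column, or the whole row when no column is given."""
        if column is None:
            self._rows.pop(row, None)
            return
        cells = self._rows.get(row, {})
        cells.pop(column, None)
        if not cells:
            self._rows.pop(row, None)  # drop rows that became empty

    def scan(self, start_row="", stop_row=None):
        """Yield rows in sorted key order, start inclusive, stop exclusive."""
        for row in sorted(self._rows):
            if row < start_row:
                continue
            if stop_row is not None and row >= stop_row:
                break
            yield row, dict(self._rows[row])

    def increment_column_value(self, row, column, amount=1):
        """Add amount to a counter cell and return the new value."""
        cells = self._rows.setdefault(row, {})
        cells[column] = cells.get(column, 0) + amount
        return cells[column]


client = ToyClient()
client.put("1", "info:name", "Gaurav")
client.put("2", "info:name", "Harsh")
client.put("3", "info:name", "Raman")

scanned = [row for row, _ in client.scan(start_row="2")]
views = client.increment_column_value("1", "stats:views")  # hypothetical counter
client.delete("3")
```

In a real cluster the increment is carried out on the server side, which is what makes it usable as a counter under concurrent writers; the toy version above does not model concurrency at all.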

HBase vs. RDBMS
Not a replacement
Solves only a small subset (~5%)


Where SQL makes life easy
Joining
Secondary indexing
Referential integrity (updates)
ACID

Where HBase makes life easy
Dataset scale
Read/write scale
Replication
Batch analysis

HBase Apache (http://hbase.apache.org/)
HBase Wiki (wiki.apache.org/hadoop/Hbase)
HBase blog (blog.hbase.org)
Images from Google Search
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html