Database overview
Part II NoSQL Database
(BigTable and Hbase)
Yuan Xue
Introduction
BigTable Background
Development began in 2004 at Google (published 2006)
need to store/handle large amounts of (semi)-structured data
Many Google projects store data in BigTable
Google’s web crawl
Google Earth
Google Analytics
HBase Background
open-source implementation of BigTable built on top of HDFS
Initial HBase prototype in 2007
Hadoop became an Apache top-level project and HBase became a Hadoop subproject in 2008
Road Map
Database User/Application Developer: How to use?
Database System Designer: How to design?
(Logical) data model and CRUD operations
Under the hood: (Physical) data model and distribution algorithm
Database Designer: How to link application needs with database design
Schema design
Data Model
A sparse, distributed, persistent multidimensional sorted map
Map indexed by a row key, column key, and a timestamp
(row:string, column:string, time:int64) → uninterpreted byte array
Rows maintained in sorted lexicographic order based on row key
A row key is an arbitrary string
Every read or write of data under a single row is atomic.
Row ranges dynamically partitioned into tablets
Unit of distribution and load balancing
Applications can exploit this property for efficient row scans
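The sorted-row property can be sketched with a plain `TreeMap`. This is an in-memory illustration only, not the HBase API; the class name `SortedRowsSketch` and the sample row keys are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory illustration only: rows kept in lexicographic order by row key.
class SortedRowsSketch {
    // row key -> (column key -> value); TreeMap keeps the row keys sorted
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // Row scan over [startRow, stopRow): a contiguous slice of the sorted map
    List<String> scan(String startRow, String stopRow) {
        SortedMap<String, TreeMap<String, String>> range = rows.subMap(startRow, stopRow);
        return new ArrayList<>(range.keySet());
    }
}
```

Because neighbouring row keys sit next to each other, a scan touches one contiguous range instead of probing the whole table.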
Columns grouped into column families
Column key = family:qualifier
Column family must be created before data can be stored in a column key.
Column families provide locality hints
Unbounded number of columns
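A rough sketch of the family/qualifier rule for a single row: families are declared up front, qualifiers are free-form. The class `ColumnFamilySketch` and its method names are hypothetical and only model the rule; they are not HBase calls.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

// Illustration only: one row whose column keys have the form family:qualifier.
class ColumnFamilySketch {
    private final Set<String> families = new HashSet<>();
    private final TreeMap<String, String> cells = new TreeMap<>(); // column key -> value

    // Families are part of the schema and must exist before any write
    void createFamily(String family) {
        families.add(family);
    }

    // Qualifiers need no declaration, so the number of columns is unbounded
    void put(String family, String qualifier, String value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("unknown column family: " + family);
        }
        cells.put(family + ":" + qualifier, value);
    }

    int columnCount() {
        return cells.size();
    }
}
```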
Timestamps
64-bit integers, assigned by:
Bigtable: real time in microseconds
Client application: when unique timestamps are a necessity
Items in a cell are stored in decreasing timestamp order.
Application specifies how many versions (n) of data items are maintained in a cell.
Bigtable garbage collects obsolete versions.
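The versioning rules for one cell can be sketched as follows. This is a simplified model under the assumption that garbage collection happens eagerly on every write; the class `VersionedCellSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Illustration only: one cell holding up to maxVersions timestamped values.
class VersionedCellSketch {
    private final int maxVersions;
    // timestamp -> value, kept in decreasing timestamp order
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());

    VersionedCellSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry = oldest timestamp, now obsolete
        }
    }

    // A read without an explicit version returns the newest value
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    List<String> all() {
        return new ArrayList<>(versions.values());
    }
}
```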
Data Model – MiniTwitter Example
View as a Map of Map
Operations & APIs in HBase
Create and delete tables and column families; Modify meta-data
Operations are based on row keys
Single-row operations: Put, Get, Delete
Multi-row operations: Scan, MultiPut
Atomic read-modify-write sequences on data stored under a single row key (no support for transactions across multiple rows).
No built-in joins
Can be done in the application
Using scan() and get() operations
Using MapReduce
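An application-side join built from one scan plus one get per match might look like the sketch below. The two tables are modelled as in-memory maps, and the row-key layout `follower|followed` is an assumption made for this example, not something the slides specify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: join "follows" with "users" in application code.
class AppSideJoinSketch {
    // follows: row key "follower|followed" -> followed user id (hypothetical layout)
    // users:   user id -> display name
    static List<String> followedNames(TreeMap<String, String> follows,
                                      Map<String, String> users,
                                      String follower) {
        List<String> names = new ArrayList<>();
        String start = follower + "|";
        String stop = follower + "|\uffff";
        // scan(): all rows whose key starts with "follower|"
        for (String followedId : follows.subMap(start, stop).values()) {
            String name = users.get(followedId); // get(): one lookup per match
            if (name != null) {
                names.add(name);
            }
        }
        return names;
    }
}
```

Each matching row costs one extra lookup, which is the price of having no built-in join.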
Creating a Table
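import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// Classic (pre-1.0) HBase client API, as used on these slides
Configuration config = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(config);
// Family names are given without a trailing ':' (HBase 0.20 and later reject colons)
HColumnDescriptor[] column = new HColumnDescriptor[2];
column[0] = new HColumnDescriptor("columnFamily1");
column[1] = new HColumnDescriptor("columnFamily2");
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
desc.addFamily(column[0]);
desc.addFamily(column[1]);
admin.createTable(desc);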
Altering a Table
Disable the table before changing the schema
Single-row operations: Put()
Insert a new record (with a new key), or
insert a record for an existing key
Implicit version number (timestamp)
Explicit version number
Put() in MiniTwitter
Update information
Single-row operations: Get()
Given a key return corresponding record
For each value return the highest version
Can control the number of versions you want
Get() in MiniTwitter
Single-row operations: Delete()
Marking table cells as deleted
Multiple levels
Can mark an entire column family as deleted
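Can mark all column families of a given row as deleted
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.util.Bytes;

// Mark an entire row as deleted
Delete d = new Delete(Bytes.toBytes("rowkey"));
userTable.delete(d);
// Mark all versions of one column (family "cf", qualifier "attr") as deleted
Delete d2 = new Delete(Bytes.toBytes("rowkey"));
d2.deleteColumns(
Bytes.toBytes("cf"),
Bytes.toBytes("attr"));
userTable.delete(d2);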
Multi-row operations: Scan()
Road Map
Database User/Application Developer: How to use?
(Logical) data model and CRUD operations
Database System Designer: How to design?
Under the hood: (Physical) data model and distribution algorithm
Single Node
Write, Read, Delete
Distributed System
Database Designer: How to link application needs with database design
Schema design
Basics
Terms (BigTable / HBase)
SSTable / HFile
memtable / MemStore
tablet / region
tablet server / RegionServer
HFile/SSTable
Basic building block of Bigtable
Persistent, ordered immutable map from keys to values
Sequence of blocks on disk plus an index for block lookup
Stored in GFS (HDFS)
Can be completely mapped into memory
Supported operations:
Look up value associated with key
Iterate key/value pairs within a key range
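The block-plus-index layout and its two operations can be sketched in memory as below. Blocks are sized by entry count rather than 64K, and the class `SSTableSketch` is hypothetical; real SSTables and HFiles are on-disk formats with more structure than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: an immutable sorted map split into blocks, plus a block index.
class SSTableSketch {
    private final List<TreeMap<String, String>> blocks = new ArrayList<>();
    // first key of each block -> block number
    private final TreeMap<String, Integer> index = new TreeMap<>();

    // Built once from already sorted data and never modified afterwards
    SSTableSketch(TreeMap<String, String> sorted, int entriesPerBlock) {
        TreeMap<String, String> block = null;
        for (Map.Entry<String, String> e : sorted.entrySet()) {
            if (block == null || block.size() == entriesPerBlock) {
                block = new TreeMap<>();
                index.put(e.getKey(), blocks.size());
                blocks.add(block);
            }
            block.put(e.getKey(), e.getValue());
        }
    }

    // Look up: the index picks the single block that could hold the key
    String get(String key) {
        Map.Entry<String, Integer> slot = index.floorEntry(key);
        return slot == null ? null : blocks.get(slot.getValue()).get(key);
    }

    // Iterate the values for keys in [startKey, stopKey)
    List<String> scan(String startKey, String stopKey) {
        List<String> out = new ArrayList<>();
        Map.Entry<String, Integer> slot = index.floorEntry(startKey);
        int first = slot == null ? 0 : slot.getValue();
        for (int i = first; i < blocks.size(); i++) {
            TreeMap<String, String> block = blocks.get(i);
            if (block.firstKey().compareTo(stopKey) >= 0) {
                break; // this block and all later ones start past the range
            }
            out.addAll(block.subMap(startKey, stopKey).values());
        }
        return out;
    }
}
```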
[Figure: an SSTable drawn as a sequence of 64K blocks followed by an index]
HDFS: Hadoop Distributed File System
How is an HFile/SSTable stored? On HDFS
HDFS as a reliable storage layer for HBase
Handles checksums, replication, failover
Each file has three copies by default
Design
Client requests meta data about a file from namenode
Data is served directly from datanode
File Read/Write in HDFS
File Read
1. open (HDFS client calls DistributedFileSystem)
2. get block locations (from the NameNode)
3. read (through FSDataInputStream)
4. read from the closest node
5. read from the 2nd closest node
6. close
File Write
1. create (HDFS client calls DistributedFileSystem)
2. create (on the NameNode)
3. write (through FSDataOutputStream)
4. get a list of 3 data nodes
5. write packet
6. ack packet
7. close
8. complete (reported to the NameNode)
If a data node crashes during a write, it is removed from the pipeline, the current block receives a new ID so that the partial data on the crashed node can be deleted later, and the NameNode allocates another node.
HBase: Logical storage vs Physical storage
Tablet (Region)
Dynamically partitioned range of rows
Built from multiple SSTables
Column-Family oriented storage
[Figure: a tablet covering rows Start:Alice00 to End:Dave11, built from two SSTables, each drawn as 64K blocks plus an index]
Table (HTable)
Multiple tablets make up the table
The entire BigTable is split into tablets of contiguous ranges of rows
Approximately 100MB to 200MB each
Tablets are split as their size grows
SSTables can be shared
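One way to picture a size-triggered split is to pick the row key nearest the middle of the tablet's data. The sketch below is an assumption about how such a split point could be chosen, not Bigtable's or HBase's actual policy; `TabletSplitSketch` and `splitPoint` are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration only: choose a split row once a tablet exceeds a size limit.
class TabletSplitSketch {
    // Returns the first row key of the new right-hand tablet,
    // or null if the tablet is still small enough (or has a single row).
    static String splitPoint(TreeMap<String, byte[]> tablet, long maxBytes) {
        long total = 0;
        for (byte[] value : tablet.values()) {
            total += value.length;
        }
        if (total <= maxBytes || tablet.size() < 2) {
            return null;
        }
        long before = 0; // bytes stored in rows strictly before the current one
        for (Map.Entry<String, byte[]> e : tablet.entrySet()) {
            if (before * 2 >= total) {
                return e.getKey(); // at least half of the data lies to the left
            }
            before += e.getValue().length;
        }
        return tablet.lastKey(); // one dominant last row: split just before it
    }
}
```

Rows below the returned key stay in the left tablet (`headMap`), the rest move to the right one (`tailMap`), so each tablet remains a contiguous row range.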
[Figure: an HTable made up of two tablets built from SSTables, with row-key labels Alice00, Dave11, Darth, and Emily]
Each column family is stored in a separate file
Key and version numbers are replicated with each column family
Empty cells are not stored
Source: Graphic from slides by Erik Paulson
Table to Region
[Figure: a table divided into Tablet1 and Tablet2]
Physical Storage: MiniTwitter Example
HTable
Write Path in HBase
HLog: append-only write-ahead log (WAL) on HDFS, one per RegionServer
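A minimal in-memory sketch of the write path: log the edit first, apply it to the MemStore, and flush a full MemStore as a new immutable file. The matching read checks the MemStore and then the files from newest to oldest. The class `WritePathSketch`, the entry-count flush threshold, and the string-encoded log are simplifications chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustration only: WAL + MemStore + flushed files for one region.
class WritePathSketch {
    final List<String> wal = new ArrayList<>();                     // stands in for the HLog
    TreeMap<String, String> memStore = new TreeMap<>();             // sorted in-memory buffer
    final List<TreeMap<String, String>> hfiles = new ArrayList<>(); // flushed files, oldest first
    private final int flushThreshold;

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String key, String value) {
        wal.add(key + "=" + value);              // 1. append to the log so the edit survives a crash
        memStore.put(key, value);                // 2. apply to the MemStore
        if (memStore.size() >= flushThreshold) { // 3. flush a full MemStore as a new file
            hfiles.add(memStore);
            memStore = new TreeMap<>();
        }
    }

    // Newest data wins: MemStore first, then files from newest to oldest
    String get(String key) {
        if (memStore.containsKey(key)) {
            return memStore.get(key);
        }
        for (int i = hfiles.size() - 1; i >= 0; i--) {
            String value = hfiles.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```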
Read Path in HBase
Deletion and Compaction in HBase
Delete() will mark the record for deletion
A new “tombstone” record is written for that value
Compaction terms (BigTable / HBase)
Merging compaction / Minor compaction
Minor compaction / flush
Major compaction / Major compaction
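How tombstones interact with compaction can be sketched as a merge of sorted files. The `TOMBSTONE` marker string and the class `CompactionSketch` are hypothetical; the point is only that deleted data physically disappears when everything is rewritten in a major compaction.

```java
import java.util.List;
import java.util.TreeMap;

// Illustration only: merge sorted files, newest value per key wins.
class CompactionSketch {
    static final String TOMBSTONE = "<deleted>"; // hypothetical delete marker

    // files must be given oldest first. A compaction over only some files must keep
    // tombstones, because older files left out may still hold the deleted value.
    // A major compaction rewrites everything and can drop them.
    static TreeMap<String, String> compact(List<TreeMap<String, String>> files, boolean major) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : files) {
            merged.putAll(file); // later (newer) files overwrite earlier ones
        }
        if (major) {
            merged.values().removeIf(TOMBSTONE::equals);
        }
        return merged;
    }
}
```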
Data Distribution and Serving -- Big Picture
Placement of Tablets and Data Serving
A tablet is assigned to one tablet server at a time.
Metadata for tablet locations and start/end row are stored in a special Bigtable cell
Master maintains:
The set of live tablet servers,
Current assignment of tablets to tablet servers (including the unassigned ones)
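Finding the server for a row from the start-row metadata can be sketched as a floor lookup in a sorted map. `TabletLocatorSketch` and the server names are made up for the example; the real lookup goes through metadata tables and client-side caching.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration only: map a row key to the server of the tablet that contains it.
class TabletLocatorSketch {
    // start row of each tablet -> server currently serving it
    private final TreeMap<String, String> startRowToServer = new TreeMap<>();

    void assign(String startRow, String server) {
        startRowToServer.put(startRow, server);
    }

    // The tablet holding a row is the one with the greatest start row <= that row
    String locate(String row) {
        Map.Entry<String, String> entry = startRowToServer.floorEntry(row);
        return entry == null ? null : entry.getValue();
    }
}
```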
Region Servers – Physical Layout
RegionServer and DataNode
Interacting with HBase
HBase Schema Design
How many column families should the table have?
What data goes into what column family?
How many columns should be in each column family?
What should the column names be? Although column names don't have to be defined on table creation, you need to know them when you write or read data.
What information should go into the cells?
How many versions should be stored for each cell?
What should the row key structure be, and what should it contain?
MiniTwitter Review
Read operation
Whom does TheFakeMT follow?
Does TheFakeMT follow TheRealMT?
Who follows TheFakeMT?
Does TheRealMT follow TheFakeMT?
Write operation
A user follows someone
A user unfollows someone
MiniTwitter- Version 1
Version 2
Read operation
How many people does a user follow?
Atomic operation!
Version 3
Get rid of the counter
Problem
Row access overhead
Key cardinality
Version 4
Wide table vs tall table
Version 4 – client code
Version 5
Trick with hash code
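The slide does not spell the trick out. One common form, assumed here, is to build the row key from fixed-length hashes of the two user names, so that every key has the same length whatever the names look like. `HashedRowKeySketch` and `rowKey` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustration only (assumed design): row key = md5(follower) + md5(followed).
class HashedRowKeySketch {
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is required on every Java platform
        }
    }

    // Always 32 bytes: 16 for the follower hash, 16 for the followed hash
    static byte[] rowKey(String follower, String followed) {
        byte[] key = new byte[32];
        System.arraycopy(md5(follower), 0, key, 0, 16);
        System.arraycopy(md5(followed), 0, key, 16, 16);
        return key;
    }
}
```

All relationships of one follower share the same 16-byte prefix, so "whom does this user follow?" remains a prefix scan, and "does A follow B?" becomes a single get on a key the client can compute.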
Normalization vs Denormalization
Additional Reading and Reference