Transcript: Study of HBase
Evaluation of HBase Read/Write
(A study of HBase and its benchmarks)
By Vaibhav Nachankar and Arvind Dwarakanath
Recap of HBase
HBase is an open-source, distributed, column-oriented, sorted-map data store.
It is the Hadoop database; it sits on top of HDFS.
HBase supports reliable storage of, and efficient access to, huge amounts of structured data.
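The "sorted-map" description above can be sketched as a toy in-memory table. This is illustrative only (our own class names, not HBase's API); real HBase also versions each cell by timestamp, groups columns into physical column families, and persists to HDFS:

```python
class ToyHBaseTable:
    """Minimal sketch of HBase's data model: a map sorted by row key,
    where each row holds {"family:qualifier": value} cells."""

    def __init__(self):
        self.rows = {}  # row_key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column):
        return self.rows.get(row_key, {}).get(column)

    def scan(self, start_row=None, stop_row=None):
        # Like an HBase scan, rows come back in sorted row-key order.
        for key in sorted(self.rows):
            if start_row is not None and key < start_row:
                continue
            if stop_row is not None and key >= stop_row:
                break
            yield key, self.rows[key]

table = ToyHBaseTable()
table.put("row-2", "info:msg", "hello")
table.put("row-1", "info:msg", "world")
```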
HBase Architecture
Recap of HBase (contd.)
Modeled after Google's BigTable.
Integrates with Hadoop MapReduce; optimized for real-time queries.
No single point of failure.
Random-access performance comparable to MySQL.
Application: Facebook's messaging database.
HBase Benchmark Techniques
'Hadoop Hbase-0.20.2 Performance Evaluation' by D. Carstoiu, A. Cernian, and A. Olteanu, University of Bucharest.
STRATEGY: uses random reads and writes to benchmark Hadoop with HBase.
HBase Benchmark Techniques (contd.)
'Hadoop Hbase-0.20.2 Performance Evaluation' by Kareem Dana at Duke University. It presents a varied set of test cases for exercising HBase.
STRATEGY: tests column families, columns, sorts, and interspersed reads/writes.
Yahoo! Cloud Serving Benchmark (YCSB)
‘Benchmarking Cloud Serving Systems with YCSB’ by Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears.
This paper/project is designed to benchmark existing and emerging cloud storage systems.
So far the benchmark has been run on HBase, Cassandra, MongoDB, Project Voldemort, and MySQL.
YCSB
The benchmark tool uses workload files, which users can customize.
For example, you can specify a 50/50 read/write mix, a 95/5 mix, and so on.
The code for the project is available on GitHub:
https://github.com/brianfrankcooper/YCSB.git
Example of a Workload
# Yahoo! Cloud System Benchmark
# Workload A: Update heavy workload
#   Application example: Session store recording recent actions
#
#   Read/update ratio: 50/50
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
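The proportion settings above drive how the client picks an operation for each of the `operationcount` requests. A minimal Python sketch of that selection, under our own assumptions (this is not YCSB's actual code):

```python
import random

def choose_operation(proportions, rng):
    """Pick one operation, weighted by workload proportions,
    e.g. {"read": 0.5, "update": 0.5} as in Workload A."""
    r = rng.random()
    cumulative = 0.0
    for op, p in proportions.items():
        cumulative += p
        if r < cumulative:
            return op
    return op  # guard against floating-point slack in the proportions

# Simulate Workload A: 50/50 read/update, operationcount=1000
rng = random.Random(0)
workload_a = {"read": 0.5, "update": 0.5}
ops = [choose_operation(workload_a, rng) for _ in range(1000)]
```

With these proportions, roughly half of the 1000 operations come out as reads; changing the file to `readproportion=0.95` / `updateproportion=0.05` gives the Workload B mix.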
Example of a Workload
# Yahoo! Cloud System Benchmark
# Workload B: Read mostly workload
#   Application example: photo tagging; adding a tag is an update, but most operations are to read tags
#
#   Read/update ratio: 95/5
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.95
updateproportion=0.05
scanproportion=0
insertproportion=0
Our Project
Install HBase and get Hadoop to interface with it; study benchmark techniques.
Build a suite of programs and run them on Hadoop/HBase.
Include basic get, put, and scan operations.
Extend Word Count's MapReduce job to write its output to HBase.
Compare with Brisk (Cassandra).
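The Word Count extension can be sketched in plain Python (not the Hadoop Java API); here `table` is a dict standing in for an HBase table, and the function and column names are our own illustration:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit (word, 1) for every word in the input.
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # Reduce: sum the counts for each word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["hbase sits on hdfs", "hbase runs on hadoop"]
table = {}  # stand-in for an HBase table keyed by word
for word, count in reduce_phase(map_phase(lines)).items():
    # In the real job, the reducer would Put this cell into HBase
    # (e.g. via HBase's TableReducer) instead of writing to HDFS.
    table[word] = {"count:n": count}
```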
About Brisk
Cassandra is a NoSQL, BigTable-based database.
DataStax built Brisk to interface Hadoop with Cassandra: Hadoop + Cassandra = Brisk!
Brisk Architecture
Challenges Faced
Configuring HBase is a tedious job! Not for the weak of will!
Successive HBase releases do not keep their APIs consistent, so we ran into many 'deprecated API' error messages.
Hadoop's compatibility with HBase has to be verified before proceeding with installation.
Challenges Faced (contd.)
Very few documents cover HBase installation in detail.
Even fewer cover Brisk!
Performance for Word Count (2 nodes / 2 cores each)
[Chart: time in secs over 5 readings, 1 mapper / 3 reducers. Average = 45.484 s]
Performance for Word Count (contd.)
[Chart: time in secs over 5 readings, 2 mappers / 3 reducers. Average = 49.664 s]
Performance for Word Count (contd.)
[Chart: time in secs over 5 readings, 2 mappers / 2 reducers. Average = 43.7008 s]
Performance for a simple get/put/scan (2 nodes / 2 cores)
[Chart: time in secs over 5 readings. Averages: get = 1.84 s, scan = 1.6266 s, put = 1.71 s]
Performance for Word Count (3 nodes/2 cores each)
[Chart: time in secs over 5 readings, 1 mapper / 3 reducers. Average = 34.047 s]
Performance for Word Count (contd.)
[Chart: time in secs over 5 readings, 2 mappers / 3 reducers. Average = 36.1012 s]
Performance for Word Count (contd.)
[Chart: time in secs over 5 readings, 2 mappers / 2 reducers. Average = 37.4358 s]
Conclusions
Brisk seems the more promising tool, as it integrates Cassandra and Hadoop without much ado.
HBase/Hadoop APIs should be made consistent; with standardization, they would be easier to work with.
HBase reads are faster than writes.
Thank You
Questions?