Performance Evaluation on Hadoop HBase

By
Abhinav Gopisetty
Manish Kantamneni
Introduction
• HBase is an open-source, non-relational, distributed database.
• HBase is modeled after Google's BigTable and is written in Java.
• It is developed as part of the Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed Filesystem), providing BigTable-like capabilities for Hadoop.
• It is a column-oriented, semi-structured data store.
• That is, it provides a fault-tolerant way of storing large quantities of sparse data, and it also provides strong consistency.
Interesting facts
• Facebook's Messaging Platform is built using HBase.
• Twitter runs HBase across its entire Hadoop cluster.
• Yahoo! uses HBase to store document fingerprints for detecting duplicates.
HBase
• HBase tables can serve as the input and output for MapReduce jobs run in Hadoop.
• Accessed through the Java API.
• Also accessible through the Avro and REST gateway APIs.
Related Work
• Previous evaluations of HBase covered versions 0.19, 0.20, and 0.89.
• Research has been ongoing since 2007.
• There hasn't been a significant performance evaluation of HBase 0.90.
• FutureGrid project
• LZO compression: LZO is a real-time data compression library.
LZO Compression
• Install the LZO library, then copy the compression JAR and native libraries into HBase:
apt-get install liblzo2-dev
> cp build/hadoop-gpl-compression-0.1.0-dev/hadoop-gpl-compression-0.1.0dev.jar $HBASE_HOME/lib/
> tar -cBf - -C build/hadoop-gpl-compression-0.1.0-dev/lib/native . | tar -xBvf - -C $HBASE_HOME/lib/native
• To compile the native library for a 64-bit platform, first set:
$ export CFLAGS="-m64"
• Now, using LZO, we can create a compressed table in the HBase shell like this:
create 'mytable', {NAME=>'colfam:', COMPRESSION=>'lzo'}
Compatibility Issue
• Not all HBase and Hadoop versions are compatible with each other.
• Hence we are focusing on evaluating HBase 0.90.2 on Hadoop 0.20.
Implementation
• A black-box approach is not enough; performance testing helps determine the cost of using the system in each of its roles:
• As a data store for loading and inserting large datasets
• As a store for large datasets analyzed by MapReduce jobs
• As a backend for real-time query services
Implementation
• Install and configure Hadoop and HBase.
• Study the Hadoop/HBase API and write several HBase test programs to demonstrate functionality.
• Run a performance evaluation of HBase by exercising its data model operations: get, put, scan, and delete.
Data Operations
• Get: returns the attributes of a specific row.
• Put: adds new rows to a table or updates existing rows.
• Scan: allows iteration over multiple rows for specified attributes.
• Delete: removes a row from the table.
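To make the semantics of the four operations concrete, here is a minimal in-memory sketch in Python. It is not the HBase client API: the class name ToyTable, its method signatures, and the "colfam:qualifier" column strings are assumptions chosen only for illustration. It does mirror two HBase behaviors: rows are kept sorted by row key, and a scan covers the range from a start key (inclusive) to a stop key (exclusive).

```python
from bisect import bisect_left, insort


class ToyTable:
    """Hypothetical in-memory stand-in for a table (not the HBase API)."""

    def __init__(self):
        self._rows = {}   # row key -> {column name: value}; rows may be sparse
        self._keys = []   # row keys kept in sorted order

    def put(self, row, column, value):
        """Add a new row, or update one column of an existing row."""
        if row not in self._rows:
            insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row, columns=None):
        """Return the attributes of one row, optionally only some columns."""
        cells = self._rows.get(row, {})
        if columns is None:
            return dict(cells)
        return {c: v for c, v in cells.items() if c in columns}

    def scan(self, start_row, stop_row, columns=None):
        """Yield (key, cells) for start_row <= key < stop_row, in key order."""
        i = bisect_left(self._keys, start_row)
        while i < len(self._keys) and self._keys[i] < stop_row:
            key = self._keys[i]
            yield key, self.get(key, columns)
            i += 1

    def delete(self, row):
        """Remove a whole row from the table, if it exists."""
        if row in self._rows:
            del self._rows[row]
            self._keys.pop(bisect_left(self._keys, row))


# Example usage with made-up row keys and columns.
table = ToyTable()
table.put("user2", "colfam:name", "Bob")
table.put("user1", "colfam:name", "Alice")
table.put("user1", "colfam:city", "Austin")   # row user1 has an extra column
for key, cells in table.scan("user1", "user3", columns=["colfam:name"]):
    print(key, cells)
table.delete("user2")
```

Rows here hold only the columns that were actually written, which is the sense in which the stored data is sparse.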
The performance analysis varies the data size and the number of nodes to observe how these operations behave.
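The shape of such a measurement can be sketched as a small timing harness in Python. This is only an illustration under stated assumptions: the store is a plain dict standing in for an HBase table, the function names time_operation and run_benchmark are hypothetical, and only the data size is varied, since varying the number of nodes requires a real cluster.

```python
import time


def time_operation(operation, keys):
    """Run operation(key) for every key; return (elapsed seconds, ops/second)."""
    start = time.perf_counter()
    for key in keys:
        operation(key)
    elapsed = time.perf_counter() - start
    throughput = len(keys) / elapsed if elapsed > 0 else float("inf")
    return elapsed, throughput


def run_benchmark(sizes):
    """Time put, get, and delete against a dict for each dataset size."""
    results = []
    for n in sizes:
        store = {}  # hypothetical stand-in for an HBase table
        keys = ["row%010d" % i for i in range(n)]
        put_s, _ = time_operation(lambda k: store.__setitem__(k, b"value"), keys)
        get_s, _ = time_operation(store.__getitem__, keys)
        del_s, _ = time_operation(store.__delitem__, keys)
        results.append(
            {"rows": n, "put_s": put_s, "get_s": get_s, "delete_s": del_s}
        )
    return results


# Vary the data size and print the timings for each run.
for row in run_benchmark([1000, 10000, 100000]):
    print(row)
```

In a real evaluation, the three callables would be replaced by calls to the HBase client, and each run would be repeated on clusters of different sizes.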
Thank you