NoSQL Databases: MongoDB vs Cassandra
INTRODUCTION
What is a Database?
“… a repository with organized and structured data, …”
(Abramova & Bernardino, 2013-07)
Data can be accessed using a DBMS (Database Management System)
What is a DBMS?
“DBMS can be defined as a collection of mechanisms that enables storage, edit and extraction of data” (Abramova & Bernardino, 2013-07)
SQL
SQL: Structured Query Language
Became the standard for:
Data interaction
Data manipulation
Data is stored as a set of tables
Data from different tables can be accessed at the same time (for example, with a join).
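A minimal sketch of reading from two tables at once, using Python's built-in sqlite3 module. The tables customers and orders and their rows are hypothetical; a single JOIN query matches orders.customer_id to customers.id.

```python
import sqlite3

# In-memory database with two hypothetical tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# One query reads from both tables by matching each order to its customer
rows = conn.execute("""
    SELECT c.name, o.total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

for name, total in rows:
    print(name, total)
```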
NOSQL
Carlo Strozzi introduced the name NoSQL in 1998; back then, it referred to an open-source database that did not use a SQL interface.
Carlo Strozzi preferred to call it “noseequel” or “NoRel”
Principal difference
The term became popular after a conference held in San Francisco in 2009
Why do we need NoSQL?
In SQL databases, the efficiency of information extraction degrades as the amount of stored and used data grows
CAP THEOREM
The CAP theorem defines the following guarantees:
Consistency
Availability
Partition tolerance
Both the relational and the NoSQL principles derive from the CAP theorem
ACID
“ACID is a principle based on CAP theorem and used as set of rules for relational database transactions.” (Abramova & Bernardino, 2013-07)
ACID guarantees:
Atomic
Consistent
Isolated
Durable
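A small illustration of the atomic guarantee, assuming Python's sqlite3 module and a hypothetical accounts table. A transfer updates two rows inside one transaction; when the second update violates a CHECK constraint (the consistency rule here), the first update is rolled back as well, so the database never shows half a transfer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint forbids negative balances
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)        # succeeds
failed = transfer(conn, "alice", "bob", 500)   # alice would go negative: CHECK fails
balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(ok, failed, balances)  # True False {'alice': 70, 'bob': 80}
```

Bob's balance is 80, not 580: the credit from the failed transfer was undone together with the failing debit.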
What if the amount of data is large?
ACID may be hard to accomplish!
BASE PRINCIPLE & NOSQL
BASE principle:
Basically Available
Soft state
Eventually consistent
BASE still follows the CAP theorem.
In a distributed system, only two of the three guarantees can be provided at the same time.
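The trade-off can be sketched with a toy two-replica store. This is a hypothetical model, not how MongoDB or Cassandra are implemented. In "CP" mode the store refuses writes during a partition; in "AP" mode it keeps accepting writes on each side, and the replicas converge once the partition heals, which is the "eventually consistent" behavior of BASE. Reconciliation uses last-write-wins on a logical clock, which is only one possible strategy.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    # key -> (logical timestamp, value); the timestamp drives last-write-wins merging
    data: dict = field(default_factory=dict)

class TwoReplicaStore:
    """Toy store with two replicas; mode is "CP" or "AP" (hypothetical model)."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica("a"), Replica("b")
        self.partitioned = False
        self.clock = 0  # shared logical clock, so every write gets a unique timestamp

    def write(self, replica, key, value):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse writes that cannot reach the other replica
            raise RuntimeError("unavailable during partition")
        self.clock += 1
        entry = (self.clock, value)
        if self.partitioned:
            replica.data[key] = entry  # AP: accept locally, reconcile later
        else:
            self.a.data[key] = entry   # no partition: update both replicas
            self.b.data[key] = entry

    def read(self, replica, key):
        entry = replica.data.get(key)
        return entry[1] if entry else None

    def heal(self):
        """End the partition and make both replicas converge (last write wins)."""
        self.partitioned = False
        for key in set(self.a.data) | set(self.b.data):
            entries = [e for e in (self.a.data.get(key), self.b.data.get(key)) if e]
            winner = max(entries)  # the entry with the highest timestamp wins
            self.a.data[key] = self.b.data[key] = winner

# AP: both sides accept writes during the partition, then converge
ap = TwoReplicaStore("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap.write(ap.a, "x", 2)
ap.write(ap.b, "x", 3)
during = (ap.read(ap.a, "x"), ap.read(ap.b, "x"))  # (2, 3): replicas disagree
ap.heal()
after = (ap.read(ap.a, "x"), ap.read(ap.b, "x"))   # (3, 3): eventually consistent

# CP: the same kind of write is rejected while partitioned
cp = TwoReplicaStore("CP")
cp.partitioned = True
try:
    cp.write(cp.a, "x", 1)
    cp_accepted = True
except RuntimeError:
    cp_accepted = False
```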
TYPES OF NOSQL DATABASES
More than 150 different NoSQL databases exist
Based on the same principles
Each has some different characteristics
Categories:
Key-value Store
Document Store
Column-family
Graph database
KEY-VALUE STORE
Data is stored as key-value pairs
All keys are unique
Data is accessed by looking up the value associated with a key
A hash table holds all keys so that values can be retrieved when needed
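A minimal in-memory key-value store, assuming a Python dict as the hash table. The KeyValueStore class and the "user:42"-style keys are hypothetical.

```python
class KeyValueStore:
    """Minimal in-memory key-value store backed by a hash table (a Python dict)."""

    def __init__(self):
        self._table = {}  # every key maps to exactly one value

    def put(self, key, value):
        self._table[key] = value  # keys are unique: a second put overwrites

    def get(self, key, default=None):
        return self._table.get(key, default)  # average O(1) lookup by key

    def delete(self, key):
        self._table.pop(key, None)

store = KeyValueStore()
store.put("user:42", "Ana")
store.put("user:42", "Ana Silva")  # same key: value replaced, not duplicated
store.put("user:7", "Ben")
print(store.get("user:42"), store.get("user:99", "missing"))  # Ana Silva missing
```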
DOCUMENT STORE
The database is defined as a set of key-value stores that are transformed into documents.
Each document is identified by a unique key
Data can be accessed using:
a key
a specific field value
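A toy document store sketch (the DocumentStore class is hypothetical): documents are Python dicts stored under a unique key, and they can be fetched either by key or by searching for a specific field value. Real document stores typically build indexes on fields instead of scanning every document as this sketch does.

```python
import uuid

class DocumentStore:
    """Toy document store: each document is a dict identified by a unique key."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc, key=None):
        key = key or str(uuid.uuid4())  # generate a unique key if none is supplied
        self._docs[key] = dict(doc)
        return key

    def get(self, key):
        """Access by key."""
        return self._docs.get(key)

    def find(self, field, value):
        """Access by a specific value (linear scan; real stores use indexes)."""
        return [doc for doc in self._docs.values() if doc.get(field) == value]

db = DocumentStore()
db.insert({"name": "Ana", "city": "Coimbra"}, key="u1")
db.insert({"name": "Ben", "city": "Lisbon", "tags": ["admin"]}, key="u2")  # fields may differ per document
auto_key = db.insert({"name": "Cai", "city": "Coimbra"})                  # key generated automatically

print(db.get("u2")["name"])                             # Ben
print([d["name"] for d in db.find("city", "Coimbra")])  # ['Ana', 'Cai']
```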
COLUMN FAMILY
Similar to the relational database model
Structure:
Column
Super-Column
Column family
The database structure is defined by super columns and column families.
Data is accessed by specifying the column family, key, and column to get a value, using the following structure:
<columnFamily>.<key>.<column> = <value>
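The <columnFamily>.<key>.<column> lookup can be modeled as three nested dictionaries. This is a simplified sketch with hypothetical data and helper names; it leaves out super columns, which would add one more level of nesting.

```python
# Hypothetical data: columnFamily -> row key -> column -> value
database = {
    "users": {
        "u1": {"name": "Ana", "email": "ana@example.com"},
        "u2": {"name": "Ben"},  # rows in the same family may have different columns
    },
}

def get_value(db, column_family, key, column):
    """Resolve <columnFamily>.<key>.<column> to a value, or None if any part is missing."""
    return db.get(column_family, {}).get(key, {}).get(column)

def set_value(db, column_family, key, column, value):
    """Write <columnFamily>.<key>.<column> = <value>, creating levels as needed."""
    db.setdefault(column_family, {}).setdefault(key, {})[column] = value

set_value(database, "users", "u2", "email", "ben@example.com")
print(get_value(database, "users", "u1", "name"))  # users.u1.name -> Ana
```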
GRAPH DATABASE
These databases are used when data can be represented as a graph, for example, social networks.
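A minimal sketch of a social network stored as a graph (an adjacency list of hypothetical people), with two typical graph queries: degrees of separation via breadth-first search, and friends of friends.

```python
from collections import deque

# Undirected "friend" graph as an adjacency list (hypothetical people)
friends = {
    "Ana": {"Ben", "Cai"},
    "Ben": {"Ana", "Dev"},
    "Cai": {"Ana", "Dev"},
    "Dev": {"Ben", "Cai", "Eli"},
    "Eli": {"Dev"},
}

def degrees_of_separation(graph, start, goal):
    """Breadth-first search: number of friendship hops from start to goal."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        if person == goal:
            return dist
        for neighbor in graph.get(person, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # the two people are not connected

def friends_of_friends(graph, person):
    """People two hops away who are not already direct friends."""
    direct = graph.get(person, set())
    return {fof for f in direct for fof in graph.get(f, set())} - direct - {person}

print(degrees_of_separation(friends, "Ana", "Eli"))  # 3
print(friends_of_friends(friends, "Ana"))            # {'Dev'}
```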
MONGODB
“MongoDB is an open source NoSQL database
developed in C++” (Abramova & Bernardino, 2013-07).
MongoDB is a document store database
Documents are gathered into groups (collections) according to their structure
CAP theorem
Consistency
Partition tolerance
MONGODB (CONT.)
Description
Data is flushed to disk every 60 seconds.
Everything is flushed to disk when new data files are created
Each document is identified by an “_id” field
An index on the “_id” field is created automatically
Characteristics
Durability
Concurrency
MONGODB CHARACTERISTICS
Durability
Durability is achieved by creating replicas.
Master-slave technique
Master: reads & writes
Slave: reads only
The slave with the most recent data becomes the master if the master goes down
Replication is asynchronous
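The failover rule can be sketched as follows. This is a simplified model: the Node class, the lag values, and the operation counters are hypothetical, and real MongoDB replica sets (which call these roles primary and secondary) choose the new primary through an election. Because replication is asynchronous, writes the old master had not yet replicated are missing on the promoted slave.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    role: str          # "master" or "slave"
    last_op: int = 0   # position of the last operation this node has applied
    up: bool = True

def replicate_async(master, slaves, lag):
    """Asynchronous replication: each slave may trail the master by some operations."""
    for slave in slaves:
        slave.last_op = max(0, master.last_op - lag[slave.name])

def fail_over(nodes):
    """If the master is down, promote the live slave with the most recent data."""
    master = next(n for n in nodes if n.role == "master")
    if master.up:
        return master
    candidates = [n for n in nodes if n.up and n.role == "slave"]
    if not candidates:
        raise RuntimeError("no live slave to promote")
    new_master = max(candidates, key=lambda n: n.last_op)
    master.role, new_master.role = "slave", "master"
    return new_master

nodes = [Node("m", "master", last_op=100), Node("s1", "slave"), Node("s2", "slave")]
replicate_async(nodes[0], nodes[1:], lag={"s1": 5, "s2": 2})  # s1 at op 95, s2 at op 98
nodes[0].up = False                                            # the master crashes
new_master = fail_over(nodes)
lost_ops = nodes[0].last_op - new_master.last_op               # writes never replicated
print(new_master.name, lost_ops)                               # s2 2
```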
Concurrency
Locks
CASSANDRA
“Cassandra is a NoSQL database developed by Apache
Software Foundation; written in Java” (Abramova & Bernardino, 2013-07)
Similar to the usual relational model
The difference is that stored data can be:
semi-structured
unstructured
CAP theorem
Partition tolerance
High Availability
Designed to store large amounts of data and to handle huge volumes efficiently.
CASSANDRA (CONT.)
Peer-to-peer architecture (NO MASTER)
High availability
High scalability
Replicates data over multiple nodes in a cluster
Replication factor (RF): total number of replicas of each row
RF(1): 1 copy of each row, on 1 node
RF(2): 2 copies of each row, on 2 different nodes
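How a replication factor maps rows to nodes can be sketched with a token ring, in the style of Cassandra's SimpleStrategy: the row key is hashed to a token, the first replica goes to the first node clockwise from that token, and the remaining RF - 1 replicas go to the following nodes. The node tokens are hypothetical, MD5 stands in for the partitioner (Cassandra's default is Murmur3Partitioner), and virtual nodes are ignored.

```python
import bisect
import hashlib

def token(key):
    """Hash a row key onto a 32-bit ring (MD5 used here only for illustration)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**32

class Ring:
    def __init__(self, node_tokens):
        # (token, node) pairs sorted by position on the ring
        self.ring = sorted((t, name) for name, t in node_tokens.items())
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key, rf):
        """First node clockwise from the key's token, then the next rf - 1 nodes."""
        if not 1 <= rf <= len(self.ring):
            raise ValueError("replication factor must be between 1 and the number of nodes")
        # A node owns the range ending at its token: take the first token >= the key's token
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(rf)]

# Four hypothetical nodes spread evenly around the ring
ring = Ring({"n1": 0, "n2": 2**30, "n3": 2**31, "n4": 3 * 2**30})
print(ring.replicas("row-1", 1))  # RF(1): one node
print(ring.replicas("row-1", 2))  # RF(2): that node plus the next one clockwise
```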
Failed nodes can be replaced with no downtime; failures are detected using a “gossip” protocol
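A simplified sketch of gossip-based failure detection (the GossipNode class is hypothetical). Every round, each live node bumps its own heartbeat and exchanges its heartbeat table with one peer; a node is suspected once its heartbeat has not advanced for more than a timeout. Real gossip picks peers at random, and Cassandra uses a phi accrual failure detector rather than a fixed timeout; this sketch rotates through peers so that its result is deterministic.

```python
class GossipNode:
    def __init__(self, name, peers):
        self.name = name
        self.peers = peers             # names of all other nodes
        self.heartbeats = {name: 0}    # node -> highest heartbeat seen so far
        self.last_update = {name: 0}   # node -> round in which that heartbeat last advanced
        self.alive = True

    def merge(self, remote, now):
        """Adopt any heartbeat that is newer than what this node already knows."""
        for node, beat in remote.items():
            if beat > self.heartbeats.get(node, -1):
                self.heartbeats[node] = beat
                self.last_update[node] = now

    def suspects(self, now, timeout):
        """Nodes whose heartbeat has not advanced for more than `timeout` rounds."""
        return {n for n, t in self.last_update.items()
                if n != self.name and now - t > timeout}

def gossip_round(nodes, now):
    for node in nodes.values():
        if not node.alive:
            continue
        node.heartbeats[node.name] += 1
        node.last_update[node.name] = now
        # Rotate through peers (deterministic stand-in for random peer selection)
        peer = nodes[node.peers[now % len(node.peers)]]
        if peer.alive:  # a dead peer simply never answers
            peer.merge(node.heartbeats, now)
            node.merge(peer.heartbeats, now)

names = ["n1", "n2", "n3", "n4"]
nodes = {n: GossipNode(n, [p for p in names if p != n]) for n in names}
for now in range(1, 21):
    gossip_round(nodes, now)
nodes["n4"].alive = False          # n4 crashes after round 20
for now in range(21, 41):
    gossip_round(nodes, now)
print(nodes["n1"].suspects(now=40, timeout=5))  # {'n4'}
```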
CASSANDRA (CONT.)
Replication Strategy:
Simple: single data center
Network Topology: multiple data centers
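Both strategies are chosen per keyspace in CQL. The helper below is hypothetical and only builds the statement text; the keyspace name "shop" and the data center names "dc1" and "dc2" are assumptions, and real data center names must match those reported by the cluster's snitch.

```python
def create_keyspace_cql(name, replication_factor=None, per_dc=None):
    """Build a CQL CREATE KEYSPACE statement (hypothetical helper).

    Pass replication_factor for SimpleStrategy (single data center), or
    per_dc={"dc_name": rf, ...} for NetworkTopologyStrategy (multiple data centers).
    """
    if (replication_factor is None) == (per_dc is None):
        raise ValueError("give exactly one of replication_factor or per_dc")
    if per_dc is None:
        opts = f"'class': 'SimpleStrategy', 'replication_factor': {replication_factor}"
    else:
        dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in per_dc.items())
        opts = f"'class': 'NetworkTopologyStrategy', {dcs}"
    return f"CREATE KEYSPACE {name} WITH replication = {{{opts}}};"

single = create_keyspace_cql("shop", replication_factor=3)
multi = create_keyspace_cql("shop", per_dc={"dc1": 3, "dc2": 2})
print(single)
print(multi)
```

With NetworkTopologyStrategy each data center gets its own replication factor, so the two data centers in this example keep 3 and 2 copies of each row respectively.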
Cassandra Characteristics:
Durability:
Two replication types:
Synchronous
Asynchronous
All writes and redundancies are recorded in a commit log.
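The commit-log idea can be sketched as a write-ahead log: each write is appended and synced to a log file before it is applied to the in-memory table (called a memtable in Cassandra), so a restarted node can replay the log. This is a hypothetical, simplified model (the TinyNode class and the JSON-lines format are assumptions); Cassandra's actual commit log format and the flushing of memtables to SSTables are not shown.

```python
import json
import os
import tempfile

class TinyNode:
    """Write path sketch: append to a commit log first, then update the in-memory table."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}
        self._replay()  # recover writes that were logged before a crash

    def write(self, key, value):
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"k": key, "v": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())  # make the log entry durable before applying it
        self.memtable[key] = value

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                self.memtable[entry["k"]] = entry["v"]

log_path = os.path.join(tempfile.mkdtemp(), "commitlog.jsonl")
node = TinyNode(log_path)
node.write("user:1", "Ana")
node.write("user:1", "Ana Silva")
del node                           # simulate a crash: the in-memory table is lost
recovered = TinyNode(log_path).memtable
print(recovered)                   # {'user:1': 'Ana Silva'}
```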
Indexing:
“Each node maintains the indexes of the table it manages”
Data is manipulated using CQL (Cassandra Query Language)
YCSB
“The YCSB – Yahoo! Cloud Serving Benchmark
is one of the most used benchmarks to test
NoSQL databases” (Abramova & Bernardino, 2013-07).
YCSB has a client that consists of two parts:
Workload generator
Set of workloads
Workloads are combinations of:
read
write
update
Operations are performed on randomly chosen records.
WORKLOAD A: 50% READS & 50% UPDATES
Source: Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra, p. 19
WORKLOAD B: 95% READS & 5% UPDATES
Source: Abramova & Bernardino (2013-07), p. 20
WORKLOAD C: 100% READS
Source: Abramova & Bernardino (2013-07), p. 20
WORKLOAD F: READ-MODIFY-WRITE
Source: Abramova & Bernardino (2013-07), p. 20
WORKLOAD G: 5% READS & 95% UPDATES
Source: Abramova & Bernardino (2013-07), p. 20
WORKLOAD H: 100% UPDATES
Source: Abramova & Bernardino (2013-07), p. 21
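A sketch of a workload generator in the spirit of the YCSB client, using the read/update mixes of workloads A, B, C, G and H (workload F, read-modify-write, is left out). It is not YCSB itself, which is a Java tool; the generate function and the "user<N>" record keys are hypothetical. Records are picked uniformly at random, as the YCSB slide describes, while YCSB's own workload files can configure other request distributions.

```python
import random
from collections import Counter

# Operation mixes of workloads A, B, C, G and H
WORKLOADS = {
    "A": {"read": 0.50, "update": 0.50},
    "B": {"read": 0.95, "update": 0.05},
    "C": {"read": 1.00},
    "G": {"read": 0.05, "update": 0.95},
    "H": {"update": 1.00},
}

def generate(workload, num_ops, record_count, rng):
    """Yield (operation, record key) pairs; records are picked uniformly at random."""
    mix = WORKLOADS[workload]
    ops, weights = list(mix), list(mix.values())
    for _ in range(num_ops):
        op = rng.choices(ops, weights=weights)[0]
        yield op, f"user{rng.randrange(record_count)}"

rng = random.Random(42)  # fixed seed so runs are repeatable
counts = Counter(op for op, _ in generate("B", 10_000, 1_000, rng))
print(counts)  # roughly 9,500 reads and 500 updates
```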