NoSQL Databases: MongoDB vs Cassandra


NOSQL DATABASES:
MONGODB VS CASSANDRA
INTRODUCTION

What is a Database?
“… a repository with organized and structured data, …”
(Abramova & Bernardino, 2013-07)


Data can be accessed using a DBMS (Database
Management System)
What is a DBMS?
“DBMS can be defined as a collection of mechanisms
that enables storage, edit and extraction of data” (Abramova
& Bernardino, 2013-07)
SQL

SQL: Structured Query Language

Became the standard for:
Data interaction
Data manipulation

Data is stored as a set of tables

Data from different tables can be accessed at the same
time.
NOSQL


Carlo Strozzi introduced the name NoSQL in 1998. Back then, it
referred to an open source database that did not use an SQL
interface.
Carlo Strozzi preferred to call it “noseequel” or
“NoRel”

Principal difference

Became popular after a conference held in San Francisco in 2009

Why do we need NoSQL?

In SQL databases, the efficiency of information extraction suffers as
the amount of data stored and used grows
CAP THEOREM

Based on the CAP theorem, the following
guarantees can be defined:
Consistency
Availability
Partition tolerance

The principles of both relational and NoSQL
databases derive from the CAP theorem
ACID


“ACID is a principle based on CAP theorem and
used as set of rules for relational database
transactions.“ (Abramova & Bernardino, 2013-07)
ACID guarantees:
Atomic
Consistent
Isolated
Durable


What if the amount of data is large?

ACID may be hard to accomplish!
BASE PRINCIPLE & NOSQL

BASE principle:
Basically Available
Soft state
Eventually consistent

BASE still follows the CAP theorem.

If the system is distributed, only two of the three
guarantees can be selected.
TYPES OF NOSQL DATABASES

There are more than 150 different NoSQL databases
Based on the same principles
With some different characteristics

Categories:
Key-value store
Document store
Column family
Graph database

KEY-VALUE STORE

Data is stored as key-value pairs

All keys are unique

Data is accessed by relating those keys to
values
A hash contains all keys in order to provide
information when needed
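
The access pattern above can be sketched with a hash table. This is only a minimal illustration, assuming an in-memory store; the class name KeyValueStore and its put/get methods are hypothetical and do not belong to any particular product.

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a hash table (a Python dict)."""

    def __init__(self):
        self._data = {}  # the hash holds every key

    def put(self, key, value):
        # Keys are unique: writing an existing key replaces its value.
        self._data[key] = value

    def get(self, key, default=None):
        # Access is done by relating the key to its value.
        return self._data.get(key, default)


store = KeyValueStore()
store.put("user:1", "Alice")
store.put("user:1", "Bob")  # same key, so the old value is replaced
print(store.get("user:1"))
print(store.get("user:2"))  # unknown key falls back to the default
```

Because every lookup goes through the key, such a store cannot be searched by value without scanning all entries.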
DOCUMENT STORE

Databases are defined as sets of key-value stores
that are transformed into documents.

Each document is identified by a unique key

Data access can be done using:
key
specific value
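
The two access paths listed above might look as follows. This is a sketch under assumptions: the documents, field names and helper functions (find_by_key, find_by_value) are invented for illustration and are not the API of any real document store.

```python
# Each document is a set of key-value pairs, identified by a unique key.
documents = {
    "doc1": {"name": "Alice", "city": "Lisbon"},
    "doc2": {"name": "Bob", "city": "Porto"},
    "doc3": {"name": "Carol", "city": "Lisbon"},
}


def find_by_key(key):
    """Access a single document through its unique key."""
    return documents.get(key)


def find_by_value(field, value):
    """Access documents through a specific value stored inside them."""
    return sorted(key for key, doc in documents.items() if doc.get(field) == value)


print(find_by_key("doc2"))
print(find_by_value("city", "Lisbon"))
```

The second function is what distinguishes a document store from a plain key-value store: the content of the value is visible to queries.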

COLUMN FAMILY
Similar to the relational database model
Structure:

Column
Super-column
Column family

The structure of the database is defined by super-columns and column families.
Data access is accomplished by specifying the column
family, key and column in order to get the value,
using the following structure:
<columnFamily>.<key>.<column> = <value>
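
The addressing scheme above can be mimicked with nested dictionaries. This is only a sketch; the column family "Users", the row key "user42" and the column names are made-up examples, and super-columns are left out for brevity.

```python
# columnFamily -> key -> column -> value
database = {
    "Users": {                    # column family
        "user42": {               # row key
            "name": "Alice",      # column = value
            "email": "alice@example.com",
        },
    },
}


def get_value(column_family, key, column):
    """Resolve <columnFamily>.<key>.<column> to its value."""
    return database[column_family][key][column]


print(get_value("Users", "user42", "name"))
```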

GRAPH DATABASE

Graph databases are used when data can be
represented as a graph, for example, social
networks.
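
As an illustration of why a graph suits social network data, the sketch below stores friendships as an adjacency map and answers a "friends of friends" query. The people and the function name friends_of_friends are hypothetical; a real graph database would expose its own query language instead.

```python
# Each person (node) maps to the set of people they are connected to (edges).
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob"},
}


def friends_of_friends(person):
    """People two hops away who are not already direct friends."""
    direct = friends[person]
    two_hops = set()
    for friend in direct:
        two_hops |= friends[friend]
    return two_hops - direct - {person}


print(friends_of_friends("alice"))
```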
MONGODB


“MongoDB is an open source NoSQL database
developed in C++” (Abramova & Bernardino, 2013-07).
MongoDB is a document store database


Documents are grouped into collections according to
their structure
CAP theorem
Consistency
Partition tolerance

MONGODB (CONT.)

Description
Data is sent to disk every 60 seconds.
Everything is flushed to disk once new files are
created
Each document is identified by an “_id” field
An index for the “_id” field is created

Characteristics
Durability
Concurrency

MONGODB CHARACTERISTICS

Durability
Durability of data is accomplished by the creation of
replicas.
Master-Slave technique

Master: read & write
Slave: read
The slave with the most recent data becomes the master if the master goes
down

Replication is asynchronous
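
The failover rule above can be sketched as follows. This is a simplified illustration, not MongoDB's actual election procedure; the Node class, the last_applied counter and the elect_master function are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_applied: int  # hypothetical counter: latest write this node has applied
    alive: bool = True


def elect_master(nodes):
    """Pick the live node holding the most recent data."""
    candidates = [node for node in nodes if node.alive]
    if not candidates:
        raise RuntimeError("no live node available")
    return max(candidates, key=lambda node: node.last_applied)


master = Node("master", last_applied=100)
# Replication is asynchronous, so slaves may lag behind the master.
slaves = [Node("slave-a", last_applied=97), Node("slave-b", last_applied=99)]

master.alive = False  # the master goes down
new_master = elect_master([master] + slaves)
print(new_master.name)
```

Because the slaves lag, writes 100 and later that never reached slave-b would be missing after the switch, which is the cost of asynchronous replication.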
Concurrency
Locks
CASSANDRA


“Cassandra is a NoSQL database developed by Apache
Software Foundation; written in Java” (Abramova & Bernardino, 2013-07)
Similar to the usual relational model

The difference is that stored data can be:
semi-structured
unstructured
CAP theorem
Partition tolerance
High Availability

Designed to store large amounts of data and deal with
huge volumes in an efficient way.
CASSANDRA (CONT.)

Peer-to-peer architecture (NO MASTER)


High availability
High scalability

Replicates data over multiple nodes in a cluster.

Replication Factor (RF): total number of replicas.
RF(1): 1 copy of each row on 1 node
RF(2): 2 copies of each row on 2 different nodes
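
A rough sketch of how a replication factor maps a row to nodes is given below. It assumes a simple ring in which the row key is hashed to a starting node and the following nodes hold the extra copies; the function replicas_for and the node names are hypothetical, and Cassandra's real partitioners and token assignment are more elaborate.

```python
import hashlib


def replicas_for(row_key, nodes, replication_factor):
    """Return the nodes that hold a copy of the row (simplified ring placement)."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    # A stable hash of the key picks the first replica on the ring.
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    # The remaining copies go to the next nodes, walking the ring clockwise.
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


nodes = ["node1", "node2", "node3", "node4"]
print(replicas_for("row-17", nodes, 1))  # RF(1): one node
print(replicas_for("row-17", nodes, 2))  # RF(2): two different nodes
```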


Failed nodes are detected using a “gossip” protocol and
replaced with no downtime
CASSANDRA (CONT.)

Replication Strategy:
Simple: single data center
Network Topology: multiple data centers
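
In CQL, the strategy and replication factor are chosen when a keyspace is created. The helper below only builds the statement text so the two strategies can be compared side by side; the function create_keyspace_cql, the keyspace name "demo" and the data center names "dc1" and "dc2" are illustrative assumptions, and nothing is sent to a cluster.

```python
def create_keyspace_cql(name, replication):
    """Render a CREATE KEYSPACE statement from a replication settings dict."""
    parts = []
    for key, value in replication.items():
        # Strings are quoted in the CQL map literal, numbers are not.
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        parts.append(f"'{key}': {rendered}")
    body = ", ".join(parts)
    return "CREATE KEYSPACE " + name + " WITH replication = {" + body + "};"


# Simple strategy: one data center, one replication factor for the cluster.
single_dc = create_keyspace_cql(
    "demo", {"class": "SimpleStrategy", "replication_factor": 1}
)
# Network topology strategy: one replica count per data center.
multi_dc = create_keyspace_cql(
    "demo", {"class": "NetworkTopologyStrategy", "dc1": 3, "dc2": 2}
)
print(single_dc)
print(multi_dc)
```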


Cassandra Characteristics:

Durability:
Two replication types:
Synchronous
Asynchronous
All writes and redundancies are tracked using a commit log.


Indexing:


“Each node maintains the indexes of the table it manages”
Data is manipulated using CQL (Cassandra Query Language)
YCSB


“The YCSB – Yahoo! Cloud Serving Benchmark
is one of the most used benchmarks to test
NoSQL databases” (Abramova & Bernardino, 2013-07).
YCSB has a client that consists of two parts:

Workload generator
Set of workloads
Workloads are combinations of:
read
write
update
Operations are performed on randomly chosen records.
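
The idea of a workload generator can be sketched in a few lines. This is not the YCSB implementation (YCSB is a separate tool); the WORKLOADS table, the generate_operations function and the "user<n>" key format are assumptions for illustration, with the read/update mixes taken from the workloads used in this comparison. Workload F is left out because it chains a read and a write on the same record.

```python
import random

# Share of each operation type per workload.
WORKLOADS = {
    "A": {"read": 0.50, "update": 0.50},
    "B": {"read": 0.95, "update": 0.05},
    "C": {"read": 1.00},
    "G": {"read": 0.05, "update": 0.95},
    "H": {"update": 1.00},
}


def generate_operations(workload, record_count, operation_count, seed=None):
    """Yield (operation, record key) pairs on randomly chosen records."""
    rng = random.Random(seed)
    mix = WORKLOADS[workload]
    kinds = list(mix)
    weights = [mix[kind] for kind in kinds]
    operations = []
    for _ in range(operation_count):
        kind = rng.choices(kinds, weights=weights)[0]
        record = rng.randrange(record_count)  # uniform random record choice
        operations.append((kind, f"user{record}"))
    return operations


for operation in generate_operations("A", record_count=1000, operation_count=5, seed=1):
    print(operation)
```

Passing a seed makes a run repeatable, which helps when the same operation sequence has to be replayed against two databases.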

WORKLOAD A: 50% READS & 50% UPDATES
Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra, p. 19.
WORKLOAD B: 95% READS & 5% UPDATES
(Abramova & Bernardino, 2013-07, p. 20)
WORKLOAD C: 100% READS
(Abramova & Bernardino, 2013-07, p. 20)
WORKLOAD F: READ-MODIFY-WRITE
(Abramova & Bernardino, 2013-07, p. 20)
WORKLOAD G: 5% READS & 95% UPDATES
(Abramova & Bernardino, 2013-07, p. 20)
WORKLOAD H: 100% UPDATES
(Abramova & Bernardino, 2013-07, p. 21)