Template - The MariaDB Blog

ScaleDB: Persistence for Stream Data
ScaleDB: Big Fast Data w/MariaDB
Axes: Data Velocity (Driven by Performance) vs. Data Volume (Driven by Cost – DRAM vs. Disk)
• In-Memory: SAP HANA, BigQuery
• High-Velocity / Disk: ScaleDB
• Disk: MariaDB, Oracle, SQL Server, etc.
• Disk: Hadoop
Demo
• Payment Table
• P.K.; FK: Account, Time; Fields: Store, Amount, Coupon
• Inserts
• Lookup by Primary Key
• Lookup by Account (Foreign Key)
• Complex queries - BI & analytics
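The four demo operations can be sketched roughly as follows. This uses Python's built-in sqlite3 module purely as a stand-in for MariaDB with the ScaleDB engine. The table layout follows the slide, but the column names, types, index, and sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite is only a stand-in; the demo itself targets MariaDB/ScaleDB.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE payment (
        id      INTEGER PRIMARY KEY,  -- P.K. (column name assumed)
        account INTEGER NOT NULL,     -- FK: Account
        time    INTEGER NOT NULL,     -- Time (epoch seconds, assumed)
        store   TEXT,
        amount  REAL,
        coupon  TEXT
    )
    """
)
# Index on the foreign key so lookups by account do not scan the table.
conn.execute("CREATE INDEX idx_payment_account ON payment (account)")

# 1. Inserts (sample rows are made up)
rows = [
    (1, 100, 1400000000, "store-a", 12.50, None),
    (2, 100, 1400000060, "store-b", 7.25, "SPRING"),
    (3, 200, 1400000120, "store-a", 30.00, None),
]
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?, ?)", rows)

# 2. Lookup by primary key
by_pk = conn.execute(
    "SELECT store, amount FROM payment WHERE id = ?", (2,)
).fetchone()

# 3. Lookup by account (foreign key)
by_account = conn.execute(
    "SELECT id FROM payment WHERE account = ? ORDER BY time", (100,)
).fetchall()

# 4. A simple analytic query: total amount per store
per_store = conn.execute(
    "SELECT store, SUM(amount) FROM payment GROUP BY store ORDER BY store"
).fetchall()

print(by_pk, by_account, per_store)
```

The same statements should carry over to MariaDB with minor type changes, since they use only standard SQL.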
Demo
© Copyright 2014 ScaleDB. The information contained herein is subject to change without notice.
ScaleDB’s Solution
• 1M Inserts/Second (indexed) with Simultaneous Queries
• Commodity “Cloud” Instances
  • Total: 6 nodes, 48 cores, 0.2TB main memory
  • ~1M inserts/second, cost is less than $15,000
• SAP HANA (in-memory DBMS)
  • Cluster total: 100 nodes, 4,000 cores, 100TB of main memory
  • “1.5M inserts/second” (Vishal Sikka, SAP TechEd)
  • In memory: DRAM cost alone is ~$2M
More Than 2 Orders of Magnitude Cost Advantage
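As a quick check of the "2 orders of magnitude" claim, the sketch below compares the two cost figures quoted on the slide directly. It takes $15,000 as the ScaleDB cost (the slide says "less than") and $2M as the HANA DRAM cost, and it does not normalize for the different insert rates. The variable names are illustrative.

```python
import math

scaledb_cluster_cost = 15_000   # USD, upper bound quoted on the slide
hana_dram_cost = 2_000_000      # USD, DRAM alone, as quoted on the slide

ratio = hana_dram_cost / scaledb_cluster_cost
orders_of_magnitude = math.log10(ratio)

print(f"cost ratio: {ratio:.0f}x ({orders_of_magnitude:.2f} orders of magnitude)")
```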
Data Volumes are Exploding
• Tweets per Day
• iPhone Downloads
• AWS S3 & Dropbox Data Objects
…Driven by new data sources and data types
• Devices
• Social
• Log Files
• Analytics
• Business
Faster Insights = More Value
(Complements Kinesis, Storm, etc.)
[Figure: Value of the Data to Users/Advertisers (Lower to Higher) vs. Response Latency (0 ms; Milliseconds to minutes; Later. Possibly much later). Label shown: Twitter Storm.]
Fast Data
• Real-Time Data
• Ad Hoc (SQL) Processing
• ScaleDB & Stream Processors
Big Data
• Pools of Data at Rest
• Batch (programmatic) Processing
• Hadoop
[Logos shown: Twitter Storm, MillWheel, BigQuery]
Hadoop’s Batch Processing
“…MapReduce technologies are good at handling large
volumes of data. But they are fundamentally batch-based, and
struggle with enabling real-time decisions on a neverending—and never fully complete—stream of data.”
Terry Hanold
Vice President of New Business Initiatives
Amazon AWS
Fast Data: The Car Metaphor
• Limited View / Real-Time Data; No Historical View
• Historical View; “Batch Lag”
• Real-Time Data; Historical View; SQL Support
DRAM Too Expensive for Stream Data
Media Costs Based upon Data Volume (DRAM vs. Disk)
Data Volume   DRAM          Disk
1TB           $20,000       $43
10TB          $200,000      $430
100TB         $2,000,000    $4,300
1 Petabyte    $20,000,000   $43,000
This is why Amazon uses disk-based S3 (non-DBMS) for Kinesis
• 1M inserts/second (100 byte rows), 24 hours = >8.5 TB/Day
• Disk Media Cost = ~ $370
• DRAM Media Cost = ~ $172,800 (>450X more)
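The daily figures above can be reproduced with a few lines of arithmetic. This sketch assumes decimal terabytes (10^12 bytes) and the per-TB media prices given on this slide ($20,000 for DRAM, $43 for disk). The variable names are illustrative.

```python
inserts_per_second = 1_000_000
row_bytes = 100
seconds_per_day = 24 * 60 * 60

# Daily ingest volume in decimal terabytes (assumption: 1 TB = 10**12 bytes)
tb_per_day = inserts_per_second * row_bytes * seconds_per_day / 1e12

dram_cost_per_tb = 20_000   # USD per TB, from the slide
disk_cost_per_tb = 43       # USD per TB, from the slide

dram_cost = tb_per_day * dram_cost_per_tb
disk_cost = tb_per_day * disk_cost_per_tb
ratio = dram_cost / disk_cost

print(f"{tb_per_day:.2f} TB/day, disk ${disk_cost:,.0f}, "
      f"DRAM ${dram_cost:,.0f}, ratio {ratio:.0f}x")
```

Because both costs scale linearly with volume, the ratio is simply the ratio of the per-TB prices, independent of the insert rate.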
But Data Volumes Increase 78% CAGR
According to IDC [1] and Gartner [2], data volumes have been measured to increase ten-fold every five years.
1. Gantz, John F. The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth Through 2011. Tech. An IDC White Paper.
2. Paquet, Raymond. “Technology Trends You Can’t Afford to Ignore.” Lecture. Gartner Webinar. Gartner.com. Gartner Inc., Jan. 2010.
In-Memory & Big Data
Data Volume Growth Dramatically Outpaces DRAM Affordability
[Charts: Increase Multiplier (Volume/Affordability) by Years, one chart over 5 years (scale 0 to 12) and one over 10 years (scale 0 to 200). Series: Data Volumes (78%), DRAM Prices (-30%), DRAM Affordability (30%).]
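The gap between the two curves comes from compounding. The sketch below reproduces the multipliers under stated assumptions: rates compound annually, year 1 is the baseline (multiplier 1.0), and DRAM affordability grows 30% per year as labeled on the chart. The function name is illustrative.

```python
volume_growth = 0.78          # data volumes, per year (from the slide)
affordability_growth = 0.30   # DRAM affordability, per year (as labeled on the chart)


def multiplier(rate, year):
    """Compound multiplier relative to year 1 (assumed baseline of 1.0)."""
    return (1 + rate) ** (year - 1)


for year in (1, 5, 10):
    volume = multiplier(volume_growth, year)
    affordability = multiplier(affordability_growth, year)
    print(f"year {year:2d}: volume x{volume:6.1f}, "
          f"affordability x{affordability:5.1f}, "
          f"gap x{volume / affordability:5.1f}")
```

Under these assumptions the volume multiplier reaches roughly ten by year 5, which fits the 0 to 12 scale of the five-year chart.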
ScaleDB: Big Fast Data w/MariaDB
Axes: Data Velocity (Driven by Performance) vs. Data Volume (Driven by Cost – DRAM vs. Disk)
At 1,000,000 Inserts per second:
• In-Memory: SAP HANA, BigQuery (BigQuery Cost: $86,400/day)
• High-Velocity / Disk: ScaleDB (ScaleDB Cost*: $46/day)
• Disk: MariaDB, Oracle, SQL Server, etc.
• Disk: Hadoop
* AWS: $28 for 8.4TB storage, $18 for 6 instances of heavy usage EBS optimized
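The daily ScaleDB figure is the sum of the two AWS line items in the footnote. The short sketch below adds them up and compares the total with the quoted BigQuery figure. It assumes both footnote amounts are per-day costs, and the variable names are illustrative.

```python
scaledb_storage_cost = 28     # USD/day for 8.4TB storage (from the footnote)
scaledb_instance_cost = 18    # USD/day for 6 instances (from the footnote)
bigquery_daily_cost = 86_400  # USD/day, as quoted on the slide

scaledb_daily_cost = scaledb_storage_cost + scaledb_instance_cost
ratio = bigquery_daily_cost / scaledb_daily_cost

print(f"ScaleDB ${scaledb_daily_cost}/day, BigQuery ${bigquery_daily_cost:,}/day, "
      f"ratio {ratio:,.0f}x")
```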
How it Works
Scaling the Database
[Diagram: MariaDB DBMS Instance with storage engines MyISAM, InnoDB, and ScaleDB; ScaleDB Storage Instance; Data]
Scaling the Database Tier
[Diagram: 4 DBMS Instances, Cluster Manager, 2 Storage Instances]
Scaling the Storage Tier
[Diagram: 4 DBMS Instances, Cluster Manager, 5 Storage Instances]
High-Availability
[Diagram: 4 DBMS Instances, Cluster Manager, 5 Storage Instances with Mirrored Volumes]
NoSQL v. MySQL
Function                                               NoSQL                                 ScaleDB
Transactions                                           No                                    Yes
Joins                                                  No                                    Yes
Data Consistency                                       No (Eventual)                         Yes
SQL Support                                            No                                    Yes
ACID Compliant                                         No                                    Yes
Mature Ecosystem (e.g. MySQL tools, apps, developers)  No                                    Yes
Optimal for Analytics / BI / Reporting                 No                                    Yes
Disk-Based Insert Performance                          25,000-40,000/second                  1,000,000/second
Ideal Use Case                                         Storing/Accessing Individual Objects  Processing Large Quantities of Data
Push-Down: Distributed Parallel Processing
• Push Processing to the Data
• Result: High-Performance Parallel Processing
• Similar to Map/Reduce
[Diagram: MariaDB with ScaleDB exchanges Query/Response with three ScaleDB Storage nodes.]
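The push-down idea can be illustrated with a small scatter/gather sketch: each storage node aggregates its own rows and returns only a compact partial result, which the database node then merges. This is not ScaleDB's implementation. The data, function names, and the use of threads to stand in for separate storage nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands for the rows held by one
# storage node, as (store, amount) pairs.
storage_nodes = [
    [("store-a", 10.0), ("store-b", 5.0)],
    [("store-a", 2.5), ("store-c", 8.0)],
    [("store-b", 1.0), ("store-c", 4.0)],
]


def partial_sum(rows):
    """Runs 'on the storage node': aggregate local rows into a small result."""
    totals = {}
    for store, amount in rows:
        totals[store] = totals.get(store, 0.0) + amount
    return totals


def merge(partials):
    """Runs 'on the database node': combine the per-node partial results."""
    combined = {}
    for totals in partials:
        for store, amount in totals.items():
            combined[store] = combined.get(store, 0.0) + amount
    return combined


# Fan the query out to every node, then merge the responses.
with ThreadPoolExecutor(max_workers=len(storage_nodes)) as pool:
    partials = list(pool.map(partial_sum, storage_nodes))

result = merge(partials)
print(result)
```

Only the per-node totals cross the "network" here, not the raw rows, which is the point of pushing processing to the data.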
Customer Success Story
Customer Success Story: Statricks
Target: 300M-450M Listings per Day
From: eBay, Craigslist ….
Processing:
• Price trends
• Listing Longevity
• Spam Detection
• Ad Metrics
• Price Trend Time Series
• Statistical Analysis
Thank You