Storage for Data Management
and Physics Databases
Luca Canali, IT-DM
After-C5 Presentation
CERN, May 8th, 2009
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it
Outline
• An overview of the storage challenges for data management and physics DBs
– Description of the main architectural details
– Production experience
– Our working model for storage sizing, architecture and rollout
• Our activities in testing new storage solutions
– Detailed results for FC and iSCSI
Data management and storage
• Service managers handle a 'deep technology stack'. Example: DBAs at CERN
– More involved in the lower layers of the stack than 'traditional DBAs'
– Run the DB service (match user requirements, help application developers optimize their applications)
– Keep systems up, tune SQL execution, backup, security, replication
– Database software installation, monitoring, patching
– Involved in hardware provisioning and setup, tuning and monitoring: servers, storage, network
Enterprise-class vs. commodity HW
[Diagram: grid-like, scale-out setup built from commodity HW, with RAC database servers on top of ASM-managed storage]
A real-world example, RAC7
Why storage is a very interesting area in the coming years
• The storage market is very conservative
– Few vendors share the market for large enterprise solutions
– Enterprise storage typically carries a high premium
• Opportunities
– Commodity HW / grid-like solutions provide an order-of-magnitude gain in cost/performance
– New products coming to the market promise many changes: solid state disks, high-capacity disks, high-performance and low-cost interconnects
Commodity HW for critical data handling services
• Both high-end and low-end storage can be bought and used out of the box
• Low-end storage for critical services, however, requires customization
• Recipe for production rollout:
– Understand requirements
– Consult storage and HW procurement experts
– Decide on a suitable architecture
– Test and measure (and learn from production)
– Deploy the 'right' hardware and software to achieve the desired level of high availability and performance, and share with the Tier1s and online
HW layer – HD, the basic element
• Hard disk technology
– The basic building block of storage for the last 40 years
– Main intrinsic limitation: latency
HD specs
• HDs are limited
– Seek time in particular is unavoidable (7.2k to 15k rpm, ~2-10 ms)
– 100-200 IOPS per disk (a back-of-envelope check follows below)
– Throughput ~100 MB/s, typically limited by the interface
– Capacity range 300 GB - 2 TB
– Failures: mechanical, electrical, magnetic, firmware issues
– In our experience with ~2000 disks in production we see about 1 disk failure per week
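The 100-200 IOPS figure follows directly from the mechanics; a minimal back-of-envelope sketch, assuming typical seek times rather than any measurement from this deck:

    # Back-of-envelope estimate of random-read IOPS for a single hard disk.
    # Assumed typical values: seek time plus half a rotation dominate the
    # service time of a small random read.
    def disk_iops(seek_ms, rpm):
        rotational_latency_ms = 0.5 * 60_000.0 / rpm   # half a revolution, in ms
        service_time_ms = seek_ms + rotational_latency_ms
        return 1000.0 / service_time_ms                # small random I/Os per second

    print(round(disk_iops(seek_ms=8.5, rpm=7200)))     # ~80  IOPS, typical 7.2k SATA drive
    print(round(disk_iops(seek_ms=3.5, rpm=15000)))    # ~180 IOPS, typical 15k enterprise drive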
Enterprise disks
• Performance
– Enterprise disks offer more 'performance'
– They spin faster and use better interconnect protocols (e.g. SAS vs. SATA)
– Typically low capacity
– Our experience: often not competitive in cost/performance vs. SATA
• Reliability
– Evidence that low-end and high-end disks do not differ significantly
HD wrap-up
• HD is an old but evergreen technology
– Disk capacities have increased by an order of magnitude in just a few years
– At the same time prices have gone down (below 0.1 USD per GB for consumer products)
– 1.5 TB consumer disks and 450 GB enterprise disks are common
– 2.5'' drives are becoming standard to reduce power consumption
Scaling out the disk
• The challenge for storage systems
– Scale out the disk performance to meet demands:
– Throughput
– IOPS
– Latency
– Capacity
• Sizing storage systems
– Must focus on the critical metric(s)
– Avoid the 'capacity trap' (see the sizing sketch below)
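A minimal sizing sketch of the point above: the required disk count is driven by whichever metric is most demanding, and sizing on capacity alone (the 'capacity trap') can badly under-provision IOPS. All input figures here are illustrative assumptions, not CERN requirements:

    import math

    # Illustrative requirements (assumptions, not CERN figures)
    required_iops = 20_000          # small random reads
    required_mbps = 2_000           # sequential throughput, MB/s
    required_tb   = 30              # usable (mirrored) capacity, TB

    # Illustrative per-disk figures for a 7.2k SATA drive
    disk_iops, disk_mbps, disk_tb = 100, 80, 1.0
    mirror_copies = 2               # two copies of every block

    disks_for_iops     = math.ceil(required_iops / disk_iops)
    disks_for_mbps     = math.ceil(required_mbps / disk_mbps)
    disks_for_capacity = math.ceil(required_tb * mirror_copies / disk_tb)

    print(disks_for_iops, disks_for_mbps, disks_for_capacity)
    # 200 disks needed for IOPS vs. 60 for capacity: the critical metric here is
    # IOPS, so sizing on capacity alone would under-provision by a factor of ~3.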
RAID and redundancy
• Storage arrays with RAID are the traditional approach
– The array implements RAID to protect data
– Parity based: RAID5, RAID6
– Stripe and mirror: RAID10
• Scalability problem of RAID
– For very large configurations the time between two disk failures can become close to the RAID volume rebuild time (illustrated in the sketch below)
– That is also why RAID6 is becoming more popular than RAID5
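A hedged illustration of the rebuild-window argument (the MTBF, capacity and rebuild rate below are assumed values, not measurements):

    # Illustration of the RAID rebuild-window argument (all numbers assumed).
    disk_mtbf_hours   = 500_000        # vendor MTBF of one drive
    disk_capacity_gb  = 2_000
    rebuild_rate_mbps = 50             # effective rebuild speed under production load

    rebuild_hours = disk_capacity_gb * 1_000 / rebuild_rate_mbps / 3_600

    for disks_in_pool in (100, 2_000, 50_000):
        hours_between_failures = disk_mtbf_hours / disks_in_pool
        print(f"{disks_in_pool:6d} disks: a failure every ~{hours_between_failures:7.1f} h, "
              f"rebuild takes ~{rebuild_hours:.1f} h")
    # As the pool grows, the failure interval shrinks towards the rebuild time,
    # i.e. rebuilds start overlapping with new failures; RAID6 tolerates a second
    # failure during the rebuild window, RAID5 does not.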
Beyond RAID
• Google and Amazon don't use RAID
• Main idea:
– Divide data into 'chunks'
– Write multiple copies of each chunk
– Examples: Google File System, Amazon S3
• Additional advantages:
– Removes the constraint of storing redundancy locally inside one storage array
– Facilitates moving, refreshing and relocating data chunks
– Allows the deployment of low-cost arrays
Our experience
• Physics DB storage uses Oracle ASM
– Volume manager and cluster file system integrated with Oracle
– Soon to serve also as a general-purpose cluster file system (we are involved in 11gR2 beta testing)
– Oracle files are divided into chunks
– Chunks are distributed evenly across the storage
– Chunks are written in multiple copies (2 or 3, depending on file type and configuration)
– Allows the use of low-cost storage arrays: no RAID support needed, JBOD is enough (a toy sketch of the idea follows below)
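A toy sketch (Python, not Oracle code) of the chunk-and-mirror idea described above; the array names, chunk size and copy count are illustrative assumptions:

    import random

    # Toy model of chunk-level mirroring across storage arrays (not Oracle code,
    # just an illustration of the idea ASM and the systems above share).
    ARRAYS   = ["array1", "array2", "array3", "array4"]  # JBOD arrays, no internal RAID
    CHUNK_MB = 1                                         # ASM-like allocation unit
    N_COPIES = 2                                         # two copies of each chunk

    def place_chunks(file_size_mb):
        """Return, for each chunk, the arrays holding its mirrored copies."""
        placement = []
        for chunk in range(file_size_mb // CHUNK_MB):
            copies = random.sample(ARRAYS, N_COPIES)     # copies never share an array
            placement.append((chunk, copies))
        return placement

    for chunk, copies in place_chunks(8):
        print(f"chunk {chunk}: copies on {copies}")
    # Losing any single array leaves at least one copy of every chunk elsewhere,
    # and chunks spread evenly, so reads are balanced across all arrays.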
The interconnect
• Several technologies available with different characteristics
– SAN
– NAS
– iSCSI
– Direct attach
The interconnect
• Throughput challenge
– It takes about 1 day to copy/backup 10 TB over a 1 Gbps network (worked out below)
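The figure is easy to verify with a short worked check (the ~100 MB/s payload rate is an assumption about protocol overhead on a fully loaded 1 Gbps link):

    # How long does it take to move 10 TB over a 1 Gbps link?
    data_tb         = 10
    usable_mb_per_s = 100            # ~1 Gbps minus protocol overhead

    seconds = data_tb * 1_000_000 / usable_mb_per_s
    print(seconds / 3600 / 24)       # ~1.2 days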
Fibre Channel SAN
• FC SAN is currently the most widely used architecture for enterprise-level storage
– Fast, with low overhead on the server CPU
• Used for physics DBs and at the Tier1s
• SAN networks with up to 64 ports at low cost
– Measured: 8 Gbps transfer rate per server (4+4 Gbps dual-ported HBAs for redundancy and load balancing)
– Proof of concept of LAN-free backup over FC reached full utilization of the tape heads
– Scalable: proof-of-concept 'Oracle supercluster' of 410 SATA disks and 14 dual quad-core servers
Case Study: the largest cluster I have ever installed, RAC5
• The test used 14 servers
Multipathed Fibre Channel
• 8 FC switches: 4 Gbps (10 Gbps uplink)
Many spindles
• 26 storage arrays (16 SATA disks each)
Case Study: I/O metrics for the RAC5 cluster
• Measured, sequential I/O
– Read: 6 GB/s
– Read + write: 3 + 3 GB/s
• Measured, small random I/O
– Read: 40K IOPS (8 KB read operations)
• Note (a per-component breakdown follows below):
– 410 SATA disks, 26 HBAs on the storage arrays
– Servers: 14 x 4+4 Gbps HBAs, 112 cores, 224 GB of RAM
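Divided over the hardware listed, these measurements translate into per-disk figures consistent with the single-disk numbers quoted earlier; a short breakdown using only numbers from this slide:

    # Per-component breakdown of the RAC5 figures.
    disks, servers      = 410, 14
    seq_read_gb_s       = 6.0
    random_read_iops    = 40_000
    hba_gbit_per_server = 8                     # 4+4 Gbps dual-ported HBAs

    print(seq_read_gb_s * 1000 / disks)         # ~15 MB/s sequential read per SATA disk
    print(random_read_iops / disks)             # ~100 small random IOPS per disk
    print(servers * hba_gbit_per_server / 8)    # ~14 GB/s nominal server-side FC bandwidth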
Testing storage
• ORION
– Oracle provides a testing utility that has proven to give the same results as more complex SQL-based tests
– Sharing experience: it is not only a DBA tool, it can be used to test storage for other purposes
– Used for stress testing (in our experience it identified controller problems in RAC5 in 2008)
• In the following, some examples of results
– Metrics measured for various disk types
– FC results
– iSCSI 1 Gbps and 10 GigE results
Metrics of interest
• Basic I/O metrics measured by ORION
– IOPS for random I/O (8 KB)
– MBPS for sequential I/O (in chunks of 1 MB)
– Latency associated with the I/O operations
• Simple to use (a small post-processing sketch follows below)
– Get started: ./orion_linux_em64t -run simple -testname mytest -num_disks 2
– More info: https://twiki.cern.ch/twiki/bin/view/PSSGroup/OrionTests
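As a small convenience around the command above, a sketch that pulls the peak IOPS out of a run's result file; it assumes ORION's usual <testname>_iops.csv output (a matrix of IOPS per simulated load point), which may differ between ORION versions:

    import csv
    import re

    # Hedged post-processing helper: report the peak IOPS of an ORION run,
    # assuming a <testname>_iops.csv result file exists for the test name used.
    def peak_iops(csv_path):
        values = []
        with open(csv_path, newline="") as f:
            for row in csv.reader(f):
                for cell in row:
                    if re.fullmatch(r"\d+(\.\d+)?", cell.strip()):
                        values.append(float(cell))
        return max(values) if values else None

    print(peak_iops("mytest_iops.csv"))   # peak small random IOPS of the run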
ORION output, an example
[Chart: small random IOPS (8 kB, read-only, 128 LUNs) as a function of the load generated by ORION]
ORION results, small random read IOPS

Disks Used                    Array              IOPS    IOPS/disk   Mirrored capacity
128x SATA                     Infortrend 16-bay  12000   100         24 TB
120x Raptor 2.5''             Infortrend 12-bay  17600   150         18 TB
144x WD 'Green disks'         Infortrend 12-bay  10300   70          72 TB
96x Raptor 3.5'' (cmsonline)  Infortrend 16-bay  16000   160         6.5 TB
80x SAS                       Netapp RAID-DP     17000   210         7.5 TB
(row label not recoverable)   –                  12600   90          22 TB
iSCSI
• iSCSI is interesting for cost reduction
– Gets rid of the 'specialized' FC network
• Many concerns about performance though, due to
– IP interconnect throughput
– CPU usage
• Adoption seems to be limited to low-end systems at the moment
• 10GigE tests are very promising
iSCSI 1 Gbps, Infortrend
• Scalability tests, IOPS: FC vs. iSCSI
[Chart: small random (8 kB) I/O, IOPS vs. load (1-12) for iSCSI 1 Gbps (Infortrend) and FC 4 Gbps (Infortrend)]
Data: D. Wojcik
iSCSI 1 Gbps, Infortrend
• Scalability tests, throughput: FC vs. iSCSI
[Chart: sequential I/O throughput (MBPS) vs. load (1-12) for iSCSI 1 Gbps (Infortrend) and FC 4 Gbps (Infortrend)]
Data: D. Wojcik
iSCSI tests, 10 GigE
• Recent ORION tests on 10 GigE iSCSI
– 'CERN-made' disk servers that export their storage as iSCSI over 10 GigE
– Details of the HW on the next slide
– ORION tests with up to 3 disk arrays (of 14 drives each)
– Almost linear scalability
– Up to 42 disks tested -> 4000 IOPS at saturation
– 85% CPU idle during the test
– IOPS of a single disk: ~110
– Overall, these are preliminary test data
Data: A. Horvath
iSCSI on 10 GigE, HW details
• Test HW installed by IT-FIO and IT-CS
– 2 Clovertown quad-core processors at 2.00 GHz
– Blackford mainboard
– 8 GB of RAM
– 16 SATA-2 drives of 500 GB, 7'200 rpm
– 3ware 9650SE-16ML RAID controller
– Intel 10GigE dual-port server adapter, PCI Express (EXPX9502CX4 - Oplin)
– HP ProCurve 10GigE switch
Data: H. Meinhard
NAS
• IT-DES experience of using NAS for databases
• A NetApp filer can serve several protocols, the main one being NFS
– Compared to FC, throughput is limited by Gbps Ethernet; trunking or the use of 10 GigE is also possible
• Overall a different solution from SAN and iSCSI
– The filer contains a server with its own CPU and OS
– In particular, the proprietary WAFL filesystem can create read-only snapshots
– The proprietary Data ONTAP OS runs on the filer box
– Higher cost due to the high-end features
The quest for ultimate latency reduction
• Solid state disks provide unique specs
– Seek times are at least one order of magnitude better than the best HDs (see the arithmetic below)
– A single disk can provide >10k random read IOPS
– High read throughput
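The same service-time arithmetic used for hard disks earlier explains the >10k IOPS claim; the latency values below are typical assumptions, not measurements:

    # Why a single SSD can exceed 10k random read IOPS (assumed typical latencies).
    ssd_read_latency_ms = 0.1           # flash random read, typical order of magnitude
    hd_read_latency_ms  = 10.0          # seek + rotation of a spinning disk

    print(1000 / ssd_read_latency_ms)   # ~10,000 random read IOPS from one SSD
    print(1000 / hd_read_latency_ms)    # ~100 from one hard disk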
SSD (flash) problems
• Flash-based SSDs still suffer from major problems for enterprise use
– Cost/GB: more than 10 times that of 'normal' HDs
– Small capacity compared to HDs
– Several issues with write performance
– Limited number of erase cycles
– Need to write entire cells (an issue for transactional activity)
– Some workarounds for write performance and cell lifetime are being implemented, with different quality from different vendors and grades
– A field in rapid evolution
Conclusions
• Storage technologies are in a very interesting evolution phase
• On one side, 'old-fashioned' storage technologies give more capacity and performance for a lower price every year
– Currently used for production by the physics DB services (offline and online) and by the Tier1s
• New ideas and implementations are emerging for scaling out very large data sets without RAID
– Google File System, Amazon S3, Sun's ZFS
– Oracle's ASM (which is in production at CERN and at the Tier1s)
• 10 GigE Ethernet and SSD are new players in the storage game with high potential
– 10 GigE iSCSI tests with FIO and CS are very promising
Acknowledgments
• Many thanks to Dawid, Jacek, Maria, Andras, Andreas, Helge, Tim Bell, Bernd, Eric, Nilo