Transcript Document

Storage Tank in Data Grid
August 23, 2003
Shin, SangYong (syshin, #6468)
IBM Grid Computing
Storage Architecture Model
[Diagram: layered storage stack]
- Application: application data is in files
- File System: files are stored on block storage
- Block Virtualization
- Storage Devices (block subsystem)
- Storage Management: all layers are managed by storage management software
Block Virtualization
[Diagram: today vs. emerging SAN configurations]
Today:
- No common view of block storage
- Server impact on storage change
Emerging:
- Common view of block storage
- No server impact on storage change
Block Virtualization
- IBM block virtualization is Lodestone
- Extending Lodestone for Grid
[Diagram: hosts run applications against virtual disks; LVE instances sit in RAID bricks, in high-end disk arrays (Shark, Brand X), and in midrange disk arrays (FAStT, Brand Y)]
Functions (see the sketch below):
- Providing virtual disks
- Online, dynamic volume sizing
- Advanced copy functions
- Economical disaster-recovery solutions
- Different levels of performance
- Data backup with low-priced disk
- No service downtime
- etc.
LVE = Lodestone Virtualization Engine
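
To make the virtualization idea concrete, here is a minimal illustrative sketch, not Lodestone code, of how an engine like the LVE might map a virtual disk onto extents spread across physical arrays; every name and the extent granularity are hypothetical:

    # Illustrative sketch only, not Lodestone code: hosts see one contiguous
    # virtual disk while the engine maps each virtual extent to a physical
    # extent on some device, which is what makes online resizing possible.
    EXTENT_SIZE = 1 << 20  # hypothetical 1 MB extent granularity

    class VirtualDisk:
        def __init__(self):
            # virtual extent number -> (device id, physical extent number)
            self.mapping = {}

        def extend(self, device, phys_extents):
            """Grow the virtual disk online by appending physical extents."""
            base = len(self.mapping)
            for i, pe in enumerate(phys_extents):
                self.mapping[base + i] = (device, pe)

        def resolve(self, virt_offset):
            """Translate a virtual byte offset to (device, physical offset)."""
            ve, within = divmod(virt_offset, EXTENT_SIZE)
            device, pe = self.mapping[ve]
            return device, pe * EXTENT_SIZE + within

    vd = VirtualDisk()
    vd.extend("shark0", [7, 8])   # extents on a high-end array
    vd.extend("fastt1", [42])     # ...and on a midrange array
    print(vd.resolve(3 * EXTENT_SIZE // 2))  # -> ('shark0', ...)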
File Systems - Current Capabilities vs. Grid Requirements
GPFS (SAN):
- HPC, engineering, digital media
- Access from servers in a cluster
- Concurrent multiple I/Os
- AIX and Linux OS only
- No access to other file systems' data
Storage Tank (SAN):
- Commercial, file sharing, DB serving
- Access from servers on the SAN
- All servers and OSes
- No access to other file systems' data
Grid requirements:
- Access from any machine, any OS, anywhere
- Access to all file system data
Planned Approach:
- Allow remote access to our file systems
- Provide multi-site support
- Integrate data from other sources
NFSv4 support for our file systems
We believe NFSv4 will be an important protocol for the grid:
- It has the necessary extensions for robust security and WAN access
- It is the first NFS protocol to come through the standards process
- Proposed standard in Dec. 2002; expected to be a draft standard by 4Q03
Our plan is to provide NFSv4 support for our file systems (J2, GPFS, and Storage Tank)
- Best case will be late 2004
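
Once a file system exports over NFSv4, a grid client mounts it like any NFS share. A minimal usage sketch, assuming a Linux client with an NFSv4-capable kernel; the server name and paths are made up:

    # Usage sketch only: mount an NFSv4 export on a Linux grid client.
    # Server and paths are invented; needs root and an NFSv4-capable kernel.
    import subprocess

    def mount_nfs4(server, export, mountpoint):
        subprocess.run(["mount", "-t", "nfs4",
                        f"{server}:{export}", mountpoint], check=True)

    mount_nfs4("tank.example.com", "/gpfs/lhc", "/mnt/lhc")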
Storage Tank (ST) - a SAN file system
[Diagram: ST clients (Win2K, AIX, Solaris, and Linux hosts, each running an ST agent) exchange file attributes, file location info, and control info with the ST server cluster over the LAN; GridFTP and NFS provide external access; data, metadata, and data backup travel over the SAN]
Prototypes: 2H02-1H03
Customer: CERN
Capabilities:
- Access to ST data through the Globus GridFTP interface (see the sketch below)
- Register ST files in the Globus Replica Location Service
- Enabled to support OGSA services (e.g. replication)
- Centralized, policy-based storage management
- Cross-platform file sharing
- Performance comparable to a local file system, with a direct client-to-storage data path
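
As a usage illustration of the first capability, a grid user could pull a file out of Storage Tank through its GridFTP interface with the standard Globus globus-url-copy client; the host and paths below are hypothetical:

    # Hypothetical example: fetch a file from Storage Tank via its GridFTP
    # interface using the standard Globus globus-url-copy client.
    import subprocess

    src = "gsiftp://tank-gridftp.example.com/stfs/lhc/run42/events.dat"
    dst = "file:///tmp/events.dat"
    subprocess.run(["globus-url-copy", src, dst], check=True)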
CERN Requirements
Data analysis of Large Hadron Collider (LHC) experiments:
- Basic unit of data is an LHC event
- An event represents a physical collision between two protons
- 1 to a few MB per event
- Stored within 1 GB files (see the back-of-envelope sketch below)
- Event metadata stored in an RDBMS
Tiered structure:
- CERN is Tier 0
- Event data and metadata distributed to Tier 1 centers
- Physicists at Tier 2 centers analyze data at Tier 1 centers
2.4 PB of disk and 14 PB of tape by 2007
Grid access (AFS/DFS-like), simple storage management
IP SANs, not FC
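
A back-of-envelope check of these figures; the 2 MB average event size is our assumption, since events run 1 to a few MB:

    # Back-of-envelope arithmetic from the figures above.
    # Assumes a 2 MB average event (events are "1 to a few MB").
    event_size = 2 * 10**6     # bytes, assumed average
    file_size = 10**9          # 1 GB files
    disk_2007 = 2.4 * 10**15   # 2.4 PB of disk by 2007

    print(file_size // event_size, "events per 1 GB file")     # 500
    print(f"{disk_2007 / file_size:,.0f} 1-GB files on disk")  # 2,400,000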
Our Proposal
- Use Storage Tank for the basic storage infrastructure
- Use iSCSI disks
  • FAStT with an iSCSI gateway, or the 200i
- DB2 for event metadata (schema sketched below)
- Research extensions:
  • NAS head for Storage Tank
  • Grid access to Storage Tank
  • Object store prototype for disks
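
As a sketch of the event-metadata piece: one row per event pointing into the 1 GB files. The schema is entirely hypothetical, and sqlite3 stands in for DB2 so the sketch is self-contained:

    # Hypothetical event-metadata schema; sqlite3 stands in for DB2 here.
    # One RDBMS row per LHC event points into the 1 GB file holding it.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE event (
        event_id    INTEGER PRIMARY KEY,
        run_id      INTEGER,
        file_path   TEXT,     -- ST file holding this event
        file_offset INTEGER,  -- byte offset within the 1 GB file
        nbytes      INTEGER)""")
    db.execute("INSERT INTO event VALUES "
               "(1, 42, '/stfs/lhc/run42/f0001.dat', 0, 2000000)")

    # Locate every event of a run for analysis at a Tier 1 center.
    for row in db.execute("SELECT file_path, file_offset, nbytes "
                          "FROM event WHERE run_id = 42"):
        print(row)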
Extend ST to Multiple Sites – Distributed Storage Tank
Single namespace across multiple sites:
- Replication of files for good performance
- Extended protocols for consistency across replicas (illustrated after the diagram below)
- Joint research with Johns Hopkins underway
ST Extensions
Prototype: 1H04
Customer: CERN, JHU
[Diagram: three sites (SFO, NYC, Fargo), each with a metadata server cluster and Win2K, AIX, Solaris, and Linux hosts running ST agents against data on a local SAN; the sites are linked by an IP control network, and a branch office connects through an integrated ST/NAS appliance]
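
The cross-replica consistency protocol is exactly what the joint research is exploring; purely to illustrate the problem, the sketch below has the owning site stamp a version number that lets a remote replica detect staleness before a read (all names hypothetical):

    # Illustration only: a naive version-stamp scheme. The owning site
    # bumps a version on write; a replica validates it before each read.
    class Replica:
        def __init__(self, site, data, version):
            self.site, self.data, self.version = site, data, version

    class OwnedFile:
        def __init__(self, data):
            self.version, self.data = 1, data

        def replicate_to(self, site):
            return Replica(site, self.data, self.version)

        def write(self, data):
            self.version += 1  # implicitly invalidates every replica
            self.data = data

        def read_via(self, replica):
            if replica.version < self.version:  # stale: refresh from owner
                replica.data, replica.version = self.data, self.version
            return replica.data

    f = OwnedFile(b"run42 events")      # owned at SFO, say
    nyc = f.replicate_to("NYC")
    f.write(b"run42 events, revised")   # update at the owning site
    print(f.read_via(nyc))              # NYC replica refreshed on read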
Ultimate Vision for Federated Grid File Systems
[Diagram: clients in Organization 1 and Organization 2 reach shared file sources through access servers, with a proxy server bridging the organizations; per-source exporters adapt each file source into the federation]
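
A sketch of the federation model under the obvious reading of the diagram: each exporter adapts one kind of file source to a common interface, and an access server composes exporters into a single namespace. All interfaces here are hypothetical:

    # Hypothetical sketch of the federation model: exporters adapt foreign
    # file sources to one interface; an access server stitches them together.
    class Exporter:
        def list(self, path): raise NotImplementedError
        def read(self, path): raise NotImplementedError

    class LocalExporter(Exporter):
        """Exports an in-memory tree, standing in for any real file source."""
        def __init__(self, files): self.files = files
        def list(self, path): return sorted(self.files)
        def read(self, path): return self.files[path]

    class AccessServer:
        def __init__(self): self.mounts = {}
        def mount(self, prefix, exporter): self.mounts[prefix] = exporter
        def read(self, path):
            for prefix, exp in self.mounts.items():
                if path.startswith(prefix + "/"):
                    return exp.read(path[len(prefix) + 1:])
            raise FileNotFoundError(path)

    srv = AccessServer()
    srv.mount("/org1", LocalExporter({"a.dat": b"org1 data"}))
    srv.mount("/org2", LocalExporter({"b.dat": b"org2 data"}))
    print(srv.read("/org1/a.dat"))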
Extend ST to access data from other file systems/sources
[Diagram: two ST installations, each with a metadata server cluster, an IP control network, and Win2K, AIX, Solaris, and Linux hosts running ST agents against data on a SAN; clients also reach a grid data repository via GridFTP and NFS, and a NAS data repository via NFS]
Storage Management in Grid Computing Environment
- IBM storage management products today (TSM, TSRM, ITSANM) and planned products (Merlot) cover a reasonable set of functions
- We are converging, with the industry, on CIM/XML as the standard for storage device management
- In support of grid, we expect:
  • to convert our management solutions to Web/OGSA services
  • to enhance functionality
[Diagram: applications reach storage management services either through OGSA via an OGSA-CIM wrapper or directly via CIM/XML; the services in turn speak CIM/XML to the CIM provider interfaces of Lodestone, Storage Tank, Shark, tape, etc.]
We are just starting to focus on grid implications for storage management
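
As a small illustration of what storage device management over CIM/XML looks like from a client, this sketch uses the open-source pywbem library as an example WBEM client; the host, credentials, and class choice are made up:

    # Illustrative CIM/XML client call using the open-source pywbem
    # library; host, credentials, and class choice are made up.
    import pywbem

    conn = pywbem.WBEMConnection("http://cimom.example.com:5988",
                                 ("admin", "secret"),
                                 default_namespace="root/cimv2")
    # Enumerate managed volumes via the standard CIM schema.
    for vol in conn.EnumerateInstances("CIM_StorageVolume"):
        print(vol["ElementName"], vol["BlockSize"] * vol["NumberOfBlocks"])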
Summary of Data Grid
[Diagram: layered summary of the data grid stack - Application over File System (extend ST & GPFS) over Block Virtualization (Lodestone) over Storage Devices (block subsystems); Storage Management runs alongside, supporting an OGSA upper interface and a CIM lower interface]