Transcript Document
Storage Tank in Data Grid
August 23, 2003
Shin, SangYong (syshin, #6468)
IBM Grid Computing

Storage Architecture Model
- Application: application data is in files
- File System: files are stored on block storage
- Block Virtualization
- Storage Devices (block subsystems)
- Storage Management: all layers are managed by storage management software

Block Virtualization
- Today: no common view of block storage; storage changes impact servers
- Emerging: common view of block storage; no server impact on storage change
- IBM's block virtualization is Lodestone; we are extending Lodestone for the grid
[Diagram: hosts running applications reach LVEs over the SAN; the LVEs front a RAID brick, a high-end disk array (Shark, Brand X), and a midrange disk array (FAStT, Brand Y)]
- LVE = Lodestone Virtualization Engine
- LVE functions: providing virtual disks; online/dynamic volume sizing; advanced copy functions; economical disaster-recovery solutions; different levels of performance; data backup on low-price disk; no service downtime; etc.

File Systems - Current Capabilities vs. Grid Requirements
- GPFS: HPC, engineering, digital media; access from servers in a cluster; concurrent multiple I/Os; AIX and Linux only; no access to other file systems' data
- Storage Tank: commercial, file sharing, DB serving; access from servers on the SAN; all servers and OSes; no access to other file systems' data
- Grid requirements: access from any machine, any OS, anywhere; access to all file system data
- Planned approach: allow remote access to our file systems; provide multi-site support; integrate data from other sources

NFSv4 Support for Our File Systems
- We believe NFSv4 will be an important protocol for the grid
  • has the necessary extensions for robust security and WAN access
  • is the first NFS protocol to come through the standards process
  • proposed standard in Dec. 2002; expected to be a draft standard by 4Q03
- Our plan is to provide NFSv4 support for our file systems (J2, GPFS, and Storage Tank)
  • best case will be late 2004

Storage Tank (ST) - a SAN File System
[Diagram: ST clients (Win2K, AIX, Solaris, Linux, each with an ST agent) on a LAN, reachable via GridFTP and NFS; an ST server cluster holds the metadata (file attributes, file location info, control info); data and data backup flow directly over the SAN]
- Prototypes: 2H02-1H03
- Customer: CERN
- Capabilities:
  • access to ST data through the Globus GridFTP interface
  • register ST files in the Globus Replica Location Service (see the sketch below)
  • enabled to support OGSA services (e.g., replication)
  • centralized, policy-based storage management
  • cross-platform file sharing
  • performance comparable to a local file system, via a direct client-to-storage data path
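As a rough illustration of the access path named in the capabilities above, the following is a minimal sketch of a client registering a Storage Tank file in the Globus Replica Location Service and then fetching it over GridFTP. It assumes the standard Globus Toolkit command-line tools (globus-rls-cli and globus-url-copy) are installed; the server names, logical file name, and paths are hypothetical placeholders, not details from the talk.

```python
import subprocess

# Hypothetical endpoints: an RLS server and a Storage Tank GridFTP door.
RLS_SERVER = "rls://rls.example.com"
LFN = "lfn://example.com/lhc/run42/events.dat"                # logical file name
PFN = "gsiftp://tank.example.com/stfs/lhc/run42/events.dat"   # physical replica

# Register the Storage Tank file as a replica in the Replica Location
# Service (create establishes the LFN -> PFN mapping).
subprocess.run(["globus-rls-cli", "create", LFN, PFN, RLS_SERVER], check=True)

# Look the replica up, then fetch it over GridFTP with globus-url-copy.
subprocess.run(["globus-rls-cli", "query", "lrc", "lfn", LFN, RLS_SERVER],
               check=True)
subprocess.run(["globus-url-copy", PFN, "file:///tmp/events.dat"], check=True)
```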
CERN Requirements
- Data analysis of Large Hadron Collider (LHC) experiments
  • the basic unit of data is an LHC event
  • an event represents a physical collision between two protons
  • 1 to a few MBs, stored within 1 GB files (see the sizing sketch after the summary)
  • event metadata is stored in an RDBMS
- Tiered structure
  • CERN is Tier 0
  • event data and metadata are distributed to Tier 1 centers
  • physicists at Tier 2 centers analyze data at Tier 1 centers
- 2.4 PB of disk and 14 PB of tape by 2007
- Grid access (AFS/DFS-like), simple storage management
- IP SANs, not FC

Our Proposal
- Use Storage Tank for the basic storage infrastructure
- Use iSCSI disks
  • FAStT with an iSCSI gateway, or the 200i
- DB2 for event metadata
- Research extensions
  • NAS head for Storage Tank
  • grid access to Storage Tank
  • Object Store prototype for disks

Extend ST to Multiple Sites - Distributed Storage Tank
- Single namespace across multiple sites
  • replication of files for good performance
  • extended protocols for consistency across replicas
  • joint research with Johns Hopkins underway
- ST Extensions Prototype: 1H04
- Customers: CERN, JHU
[Diagram: sites (SFO, NYC, Fargo), each with a metadata server cluster, ST agents on Win2K/AIX/Solaris/Linux clients, a SAN data path, and an IP control network; a branch office runs an integrated ST/NAS appliance]

Ultimate Vision for Federated Grid File Systems
[Diagram: clients in Organization 1 and Organization 2 reach shared file sources through access servers, a proxy server, and exporters]

Extend ST to Access Data from Other File Systems/Sources
[Diagram: ST clients (Win2K, AIX, Solaris, Linux with ST agents) keep their direct SAN data path and metadata server cluster on the IP control network, while also reaching a grid data repository via GridFTP and a NAS data repository via NFS]

Storage Management in a Grid Computing Environment
- IBM storage management products today (TSM, TSRM, ITSANM) and planned products (Merlot) cover a reasonable set of functions
- We are converging, with the industry, on CIM/XML as the standard for storage device management
- In support of grid, we expect:
  • to convert our management solutions to Web/OGSA services
  • to enhance functionality
[Diagram: applications reach the storage management services either through an OGSA-CIM wrapper (OGSA) or directly via CIM/XML; the services use CIM provider interfaces to manage Lodestone, Storage Tank, and devices such as Shark and tape]
- We are just starting to focus on grid implications for storage management (see the CIM/XML sketch below)

Summary of Data Grid Support
[Diagram: the storage architecture model annotated with planned grid work: support an OGSA upper interface and a CIM lower interface; extend ST and GPFS at the file system layer; Lodestone provides block virtualization over the storage devices (block subsystems); storage management spans the stack]
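To make the CERN numbers concrete, here is a back-of-envelope sizing sketch. It assumes a representative event size of 1 MB (the slide says 1 to a few MBs) and decimal units; every figure is derived only from the values quoted on the CERN Requirements slide.

```python
# Back-of-envelope sizing for the CERN LHC numbers quoted above.
# Assumption: a representative event size of 1 MB, decimal units
# (1 GB = 1e9 bytes, 1 PB = 1e15 bytes).

EVENT_SIZE  = 1e6     # bytes: one LHC event (slide: "1 to few MBs")
FILE_SIZE   = 1e9     # bytes: events are packed into ~1 GB files
DISK_BUDGET = 2.4e15  # bytes: 2.4 PB of disk by 2007
TAPE_BUDGET = 14e15   # bytes: 14 PB of tape by 2007

events_per_file = FILE_SIZE / EVENT_SIZE    # ~1,000 events per file
files_on_disk   = DISK_BUDGET / FILE_SIZE   # ~2.4 million 1 GB files
events_on_disk  = DISK_BUDGET / EVENT_SIZE  # ~2.4 billion events
tape_to_disk    = TAPE_BUDGET / DISK_BUDGET # tape holds ~5.8x the disk tier

print(f"{events_per_file:,.0f} events/file, "
      f"{files_on_disk:,.0f} files on disk, "
      f"{events_on_disk:,.0f} events on disk, "
      f"tape/disk ratio {tape_to_disk:.1f}x")
```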
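The storage-management slide names CIM/XML as the industry standard for storage device management. As an illustration of what a CIM/XML client interaction looks like, here is a minimal sketch using the open-source pywbem library (not one of the IBM products named above); the CIMOM host, credentials, and the choice of class to enumerate are assumptions, not details from the talk.

```python
import pywbem

# Connect to a CIM object manager (CIMOM) over CIM/XML.
# The host, port, credentials, and namespace are hypothetical placeholders.
conn = pywbem.WBEMConnection("https://cimom.example.com:5989",
                             ("admin", "secret"),
                             default_namespace="root/cimv2")

# Enumerate standard CIM storage-volume instances -- the kind of model a
# CIM provider for Lodestone or a disk array such as Shark would expose.
for vol in conn.EnumerateInstances("CIM_StorageVolume"):
    print(vol["DeviceID"], vol["BlockSize"], vol["NumberOfBlocks"])
```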