High Performance Presentation:
5 slides/Minute?
(65 slides / 15 minutes)
IO and DB “stuff” for LSST
A new world record?
Jim Gray
Microsoft Research
1
TerraServer Lessons Learned
• Hardware is 5 9's, 99.999% (with clustering)
• Software is 5 9's, 99.999% (with clustering)
• Admin is 4 9's, 99.99% (offline maintenance)
• Network is 3 9's, 99.9% (mistakes, environment)
• Simple designs are best
• 10 TB DB is the management limit
  1 PB = 100 x 10 TB DBs
  This is 100x better than 5 years ago.
  (Yahoo! and Hotmail are 300 TB, Google is 2 PB)
• Minimize use of tape
– Backup to disk (snapshots)
– Portable disk TBs
2
Serving BIG images
• Break into tiles (compressed):
– 10KB for modems
– 1MB for LANs
• Mosaic the tiles for pan, crop
• Store image pyramid for zoom
– 2x zoom only adds 33% overhead:
  1 + ¼ + 1/16 + … (worked out below)
• Use a spatial index
  to cluster & find objects
[Figure: image pyramid, a .2x.2 km² tile inside .4x.4, .8x.8, and 1.6x1.6 km² images]
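The 33% figure is just the geometric series over the pyramid levels (a quick check, not from the slide itself):

  1 + \frac{1}{4} + \frac{1}{16} + \cdots = \sum_{k=0}^{\infty} \left(\frac{1}{4}\right)^{k} = \frac{4}{3} \approx 1.33,

so storing every 2x-zoom level on top of the full-resolution tiles costs about one third more space.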
3
Economics
• People are more than 50% of costs
• Disks are more than 50% of capital
• Networking is the other 50%
– People
– Phone bill
– Routers
• CPUs are free (they come with the disks)
4
SkyServer/ SkyQuery Lessons
• DB is easy
• Search
  – It is BEST to index
  – You can put objects and attributes in a row
    (SQL puts big blobs off-page)
  – If you can't index, you can extract attributes and quickly compare
  – SQL can scan at 5M records/cpu/second
  – Sequential scans are embarrassingly parallel
  (a small index-vs-scan sketch follows below)
• Web services are easy
• XML DataSets:
  – a universal way to represent answers
  – minimize round trips: 1 request/response
  – Diffgrams allow disconnected update
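A minimal sketch of the index-vs-scan point above; the table and column names follow SDSS SkyServer conventions but are only illustrative here:

  -- An index on (type, r) lets the common case seek instead of scan.
  CREATE INDEX idx_PhotoObj_type_r ON PhotoObj (type, r);

  -- Indexed query: seeks straight to galaxies (type = 3) in a magnitude range.
  SELECT objID, ra, dec, r
  FROM   PhotoObj
  WHERE  type = 3 AND r BETWEEN 18.0 AND 18.5;

  -- Predicate on a derived expression with no index: falls back to the
  -- sequential scan that runs at ~5M records/cpu/second and parallelizes well.
  SELECT COUNT(*)
  FROM   PhotoObj
  WHERE  (u - g) > 2.2;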
5
How Will We Find Stuff?
Put everything in the DB (and index it)
• Need DBMS features: consistency, indexing,
  pivoting, queries, speed/scalability, backup,
  replication.
  If you don't use one, you're creating one!
• Simple logical structure:
  – Blob and link is all that is inherent
  – Additional properties (facets == extra tables)
    and methods on those tables (encapsulation)
    (a sketch of this follows below)
• More than a file system
• Unifies data and meta-data
• Simpler to manage
• Easier to subset and reorganize
• Set-oriented access
• Allows online updates
• Automatic indexing, replication
• SQL
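A minimal sketch of the "blob and link" core plus one facet table, assuming a SQL Server-style schema; the table and column names here are invented for illustration:

  -- The inherent structure: a blob plus a link.
  CREATE TABLE Item (
      itemID   BIGINT PRIMARY KEY,
      content  VARBINARY(MAX),                     -- the blob itself
      parentID BIGINT REFERENCES Item(itemID)      -- the link
  );

  -- A facet: extra, queryable attributes kept in an extra table,
  -- with views/stored procedures on it providing the encapsulating methods.
  CREATE TABLE ImageFacet (
      itemID BIGINT PRIMARY KEY REFERENCES Item(itemID),
      ra     FLOAT,
      dec    FLOAT,
      band   CHAR(1)
  );
  CREATE INDEX idx_ImageFacet_radec ON ImageFacet (ra, dec);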
6
How Do We Represent Data
To The Outside World?
• File metaphor too primitive: just a blob
• Table metaphor too primitive: just records
• Need metadata describing the data's context:
  – Format
  – Provenance (author/publisher/citations/…)
  – Rights
  – History
  – Related documents
• In a standard format: XML and XML Schema
• DataSet is a great example of this
• World is now defining standard schemas

Example DataSet (schema, then data as a diffgram):

<?xml version="1.0" encoding="utf-8" ?>
<DataSet xmlns="http://WWT.sdss.org/">
  <xs:schema id="radec" xmlns=""
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
    <xs:element name="radec" msdata:IsDataSet="true">
      <xs:element name="Table">
        <xs:element name="ra" type="xs:double" minOccurs="0" />
        <xs:element name="dec" type="xs:double" minOccurs="0" />
        …
      </xs:element>
    </xs:element>
  </xs:schema>
  <diffgr:diffgram
      xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
      xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
    <radec xmlns="">
      <Table diffgr:id="Table1" msdata:rowOrder="0">
        <ra>184.028935351008</ra>
        <dec>-1.12590950121524</dec>
      </Table>
      …
      <Table diffgr:id="Table10" msdata:rowOrder="9">
        <ra>184.025719033547</ra>
        <dec>-1.21795827920186</dec>
      </Table>
    </radec>
  </diffgr:diffgram>
</DataSet>
7
Emerging Concepts
• Standardizing distributed data
  – Web Services, supported on all platforms
  – Custom configure remote data dynamically
  – XML: Extensible Markup Language
  – SOAP: Simple Object Access Protocol
  – WSDL: Web Services Description Language
  – DataSets: standard representation of an answer
• Standardizing distributed computing
  – Grid Services
  – Custom configure remote computing dynamically
  – Build your own remote computer, and discard
  – Virtual Data: new data sets on demand
8
Szalay’s Law:
The utility of N comparable datasets is N²
• Metcalfe's law applies to telephones, fax, the Internet.
• Szalay argues as follows:
  each new dataset gives new information, and
  2-way combinations give new information (see the arithmetic below).
• Example: combine these 3 datasets:
  – (ID, zip code)
  – (ID, birth day)
  – (ID, height)
• Other example: the quark star:
  Chandra X-ray + Hubble optical + 600-year-old records.
  Drake, J. J. et al., "Is RX J185635-375 a Quark Star?", preprint (2002).
[Image: X-ray, optical, infrared, and radio views of the nearby Crab Nebula,
 now in a state of chaotic expansion after a supernova explosion first sighted
 in 1054 A.D. by Chinese astronomers.]
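The N² claim is just the count of pairwise combinations (a back-of-the-envelope restatement, not a quote from the slide):

  \binom{N}{2} = \frac{N(N-1)}{2} \sim N^{2},

e.g. the 3 datasets above already admit 3 pairwise joins (zip code vs. birth day, zip code vs. height, birth day vs. height), and every new dataset can be joined against all the existing ones.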
9
Science is hitting a wall
FTP and GREP are not adequate
• You can GREP 1 MB in a second
• You can GREP 1 GB in a minute
• You can GREP 1 TB in 2 days
• You can GREP 1 PB in 3 years
• You can FTP 1 MB in 1 second
• You can FTP 1 GB in a minute (= 1 $/GB)
• … a TB takes 2 days and 1K$
• … a PB takes 3 years and 1M$
• Oh!, and 1 PB ~ 10,000 disks
• At some point you need
  – indices to limit search
  – parallel data search and analysis
  – search and analysis tools
• This is where databases can help
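An order-of-magnitude check of the petabyte line, assuming a single scan stream of roughly 10 MB/s (my assumption, not the slide's):

  \frac{10^{15}\ \text{bytes}}{10^{7}\ \text{bytes/s}} = 10^{8}\ \text{s} \approx 3\ \text{years},

which is why brute-force GREP and FTP stop being options long before the petabyte scale.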
10
Networking:
Great hardware & Software
• WANs @ 5 GBps (1 λ = 40 Gbps)
• Gbps Ethernet common (~100 MBps)
  – Offload gives ~2 Hz/byte
  – Will improve with RDMA & zero-copy
  – 10 Gbps mainstream by 2004
• Faster I/O
  – 1 GB/s today (measured)
  – 10 GB/s under development
  – SATA (serial ATA) 150 MBps/device
11
Bandwidth:
3x bandwidth/year for 25 more years
• Today:
  – 40 Gbps per channel (λ)
  – 12 channels per fiber (WDM): 500 Gbps
  – 32 fibers/bundle = 16 Tbps/bundle
• In the lab: 3 Tbps/fiber (400 x WDM)
• In theory: 25 Tbps per fiber
  (1 fiber = 25 Tbps)
• 1 Tbps = USA 1996 WAN bisection bandwidth
• Aggregate bandwidth doubles every 8 months!
12
Hero/Guru Networking
[Map: a 5626 km, 10-hop path linking Redmond/Seattle WA, San Francisco CA,
 Arlington VA, and New York. Participants: Microsoft, Qwest, University of
 Washington, Pacific Northwest Gigapop, Information Sciences Institute,
 HSCC (High Speed Connectivity Consortium), DARPA.]
13
Real Networking
• Bandwidth for the 1 Gbps "stunt" cost 400k$/month
  – ~200$/Mbps/month (at each end + hardware + admin)
  – Price not improving very fast
  – Doesn't include operations / local hardware costs
• Admin… costs more: ~1$/GB to 10$/GB
• Challenge: go home and FTP from a "fast" server
• The Guru Gap: FermiLab <-> JHU
  – Both "well connected"
  – vBNS, NGI, Internet2, Abilene, ….
  – Actual desktop-to-desktop ~100 KBps
  – 12 days/TB (but it crashes first).
• The reality: to move 10 GB, mail it!
  TeraScale Sneakernet
14
How Do You Move A Terabyte?
Context      Speed (Mbps)   Rent ($/month)   $/Mbps   $/TB sent   Time/TB
Home phone   0.04           40               1,000    3,086       6 years
Home DSL     0.6            70               117      360         5 months
T1           1.5            1,200            800      2,469       2 months
T3           43             28,000           651      2,010       2 days
OC3          155            49,000           316      976         14 hours
OC192        9,600          1,920,000        200      617         14 minutes
100 Mbps     100            -                -        -           1 day
Gbps         1,000          -                -        -           2.2 hours

Source: TeraScale Sneakernet, Microsoft Research, Jim Gray et al.
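A sanity check on one row, using the table's own rent and speed: moving a terabyte over Home DSL takes

  \frac{8 \times 10^{12}\ \text{bits}}{0.6 \times 10^{6}\ \text{bits/s}} \approx 1.3 \times 10^{7}\ \text{s} \approx 5\ \text{months},

and 5 months of a 70 $/month line is roughly the table's 360 $/TB.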
15
There Is A Problem
Niklaus Wirth:
Algorithms + Data Structures = Programs
• GREAT!!!!
– XML documents are portable objects
– XML documents are complex objects
– WSDL defines the methods on objects
(the class)
• But will all the implementations match?
– Think of UNIX or SQL or C or…
• This is a work in progress.
16
Changes To DBMS’s
• Integration of Programs and Data
  – Put programs inside the database (allows OODB)
  – Gives you parallel execution
• Integration of Relational, Text, XML, Time
• Scaleout (even more)
• AutoAdmin ("no knobs")
• Manage petascale databases
  (utilities, geoplex, online, incremental)
17
Publishing Data
Roles        Traditional   Emerging
Authors      Scientists    Collaborations
Publishers   Journals      Project web site
Curators     Libraries     Data+Doc Archives
Archives     Archives      Digital Archives
Consumers    Scientists    Scientists
18
The Core Problem:
No Economic Model
• The archive user has not yet been born.
How can he pay you to curate the data?
• The Scientist gathered data for his own purpose
Why should he pay (invest time) for your needs?
• Answer to both:
that’s the scientific method
• Curating data
  (documenting the design, the acquisition, and the processing)
  is very hard and there is no reward for doing it.
  The results are rewarded, not the process of getting them.
• Storage/archive NOT the problem (it’s almost free)
• Curating/Publishing is expensive.
19
SDSS Data Inflation – Data Pyramid
• Level 1A (pixel data; level 1A in NASA terms)
  – Grows 5 TB of pixels/year, growing to 25 TB
  – ~2 TB/y compressed, growing to 13 TB
  – ~4 TB today
• Level 2: derived data products, ~10x smaller
  – But there are many catalogs.
• Publish a new edition each year
  – Fixes bugs in the data.
  – Must preserve old editions
  – Creates a data pyramid
• Store each edition:
  – 1 + 2 + 3 + 4 + … + N ~ N² bytes (see the sum below)
• Net data inflation: L2 ≥ L1
[Figure: data pyramid, editions E1-E4 of level 1A source data over time,
 each with its level 2 derived products.]
4 editions of level 2 derived data products. Note that each derived product is
small, but they are numerous. This proliferation combined with the data
pyramid implies that level2 data more than doubles the total storage volume.
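The pyramid arithmetic, spelled out (a quick check, assuming edition i holds roughly i units of data, as the 1, 2, 3, 4 … N bullet suggests):

  \sum_{i=1}^{N} i = \frac{N(N+1)}{2} \sim \frac{N^{2}}{2},

so keeping every edition makes total storage grow quadratically in the number of editions, even though each year's increment is only linear.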
20
What’s needed?
(not drawn to scale)
[Diagram: Scientists bring science data & questions; Miners bring data mining
 algorithms; Plumbers provide the database to store data and execute queries;
 question & answer plus visualization tools tie them together.]
21
CS Challenges For Astronomers
• Objectify your field:
  – Precisely define what you are talking about.
  – Objects and Methods / Attributes
  – This is REALLY difficult.
  – UCDs are a great start, but there is a long way to go.
• "Software is like entropy, it always increases."
  -- Norman Augustine, Augustine's Laws
  – Beware of legacy software: the cost can eat you alive.
  – Share software where possible.
  – Use standard software where possible.
  – Expect it will cost you 25% to 40% of the project.
• Explain what you want to do with the VO
  – 20 queries or something like that.
22
Challenge to Data Miners:
Linear and Sub-Linear Algorithms
• Today most correlation / clustering algorithms
  are polynomial: N² or N³ or…
• N² is VERY big when N is big (10^18 is big; see the arithmetic below)
• Need sub-linear algorithms
• Current approaches are near optimal
  given current assumptions.
• So, need new assumptions,
  probably heuristic and approximate
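To see why N² is hopeless at that scale (my arithmetic, assuming a generous 10^12 operations per second per machine):

  N = 10^{18} \;\Rightarrow\; N^{2} = 10^{36}\ \text{operations} \;\approx\; 10^{24}\ \text{s on one machine},

so only algorithms that are near-linear, or better, in N have any chance.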
23
Challenge to Data Miners:
Rediscover Astronomy
• Astronomy needs a deep understanding of physics.
• But, some of it was discovered
  as variable correlations,
  then "explained" with physics.
• Famous example:
  the Hertzsprung-Russell Diagram,
  star luminosity vs. color (= temperature)
• Challenge 1 (the student test):
  How much of astronomy can data mining discover?
• Challenge 2 (the Turing test):
  Can data mining discover NEW correlations?
24
Plumbers:
Organize and Search Petabytes
• Automate
  – instrument-to-archive pipelines
    • It is a messy business – very labor intensive
    • Most current designs do not scale (too many manual steps)
    • BaBar (1 TB/day) and the ESO pipeline seem promising.
    • A job-scheduling or workflow system
  – Physical database design & access
    • Data access patterns are difficult to anticipate
    • Aggressively and automatically use indexing, sub-setting.
    • Search in parallel
• Goals
  – Answer easy queries in 10 seconds.
  – Answer hard queries (correlations) in 10 minutes.
25
Scaleable Systems
• Scale UP: grow by adding components
  to a single system.
• Scale OUT: grow by adding more systems.
26
What’s New – Scale Up
• 64 bit & TB size main memory
• SMP on chip: everything's SMP
• 32… 256 SMP: locality/affinity matters
• TB size disks
• High-speed LANs
27
Who needs 64-bit addressing?
You! Need 64-bit addressing!
• "640K ought to be enough for anybody."
  Bill Gates, 1981
• But that was 21 years ago,
  = 21 x 2/3 = 14 bits of addressing ago.
• 20 bits + 14 bits = 34 bits, so…
  "16 GB ought to be enough for anybody"
  Jim Gray, 2002
• 34 bits > 31 bits, so…
  34 bits == 64 bits
• YOU need 64-bit addressing!
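Unpacking the bit arithmetic (my restatement of the slide's joke): 640 KB needs about 20 bits of address space; at one address bit every 18 months, 21 years adds 21 / 1.5 = 14 bits; and

  2^{20 + 14} = 2^{34}\ \text{bytes} = 16\ \text{GB},

which is already past what a 32-bit pointer can reach, so the next practical step is 64-bit addressing.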
28
64 bit – Why bother?
• 1966 Moore's law:
  4x more RAM every 3 years,
  i.e. 1 bit of addressing every 18 months.
  36 years later: 36 x 2/3 = 24 more bits.
  Not exactly right, but…
• 32 bits is not enough for servers
• 32 bits gives no headroom for clients
• So, time is running out (has run out)
• Good news:
  Itanium™ and Hammer™ are maturing,
  and so is the base software (OS, drivers, DB, Web, ...)
  Windows & SQL @ 256 GB today!
29
64 bit – why bother?
• Memory-intensive calculations:
  – You can trade memory for IO and processing
• Example: data analysis & clustering at JHU
  – In-memory CPU time is ~N log N, N ~ 100M
  – On disk, in M chunks → time ~ M²
  – Must run many times
  – Now running on an HP Itanium,
    Windows .NET Server 2003, SQL Server
[Chart: CPU time in hours (log scale, from hours up to a decade) vs. number of
 galaxies in millions (0-100), for main memory sizes of 1, 4, 32, and 256 GB.
 Graph courtesy of Alex Szalay & Adrian Pope of Johns Hopkins University.]
30
Amdahl’s balanced System Laws
•
•
1 mips needs 4 MB ram and needs 20 IO/s
At 1 billion instructions per second
need 4 GB/cpu
need 50 disks/cpu!
4 GB
RAM
•
1 bips
cpu
64 cpus … 3,000 disks
50 disks
10,000 IOps
7.5 TB
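Reading the figure's numbers back through the rule (my arithmetic, assuming ~200 IO/s and 150 GB per disk, as elsewhere in the talk):

  1000\ \text{MIPS} \times 4\ \tfrac{\text{MB}}{\text{MIPS}} = 4\ \text{GB RAM}, \quad
  50\ \text{disks} \times 200\ \tfrac{\text{IO/s}}{\text{disk}} = 10{,}000\ \text{IO/s}, \quad
  50 \times 150\ \text{GB} = 7.5\ \text{TB}.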
31
The 5 Minute Rule – Trade RAM
for Disk Arms
• If data is re-referenced every 5 minutes,
  it is cheaper to cache it in RAM
  than to get it from disk.
• A disk access/second ~ 50$, or
  ~50 MB for 1 second, or
  ~50 KB for 1,000 seconds (unpacked below).
• Each app has a memory "knee":
  up to the knee, more memory helps a lot.
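One way to read those numbers (my framing, assuming RAM at roughly 1 $/MB in 2002 dollars): 50 $ buys either one disk access per second or about 50 MB of RAM, so the break-even cache size for data re-referenced every T seconds is

  B \approx \frac{\$50 / T}{\$1/\text{MB}} = \frac{50}{T}\ \text{MB},

which gives the slide's ~50 MB at T = 1 second and ~50 KB at T = 1,000 seconds.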
32
64 bit Reduces IO, Saves Disks
• Large memory reduces IO
• 64-bit simplifies code
• Processors can be faster (wider word)
• RAM is cheap (4 GB ~ 1k$ to 20k$)
• Can trade RAM for disk IO
• GBs help a LOT!
  Better response time,
  even if the cpu clock is slower.
• Example: three tpcC benchmarks,
  4x1.6 GHz IA32 + 8 GB vs.
  4x1.6 GHz IA32 + 32 GB vs.
  4x1 GHz Itanium2 + 48 GB:
  40 extra GB → 60% extra throughput
[Chart: transactions per second, 0-100,000, for the three configurations.]
33
AMD Hammer™ Coming Soon
• AMD Hammer™ is 64-bit capable
• 2003: millions of Hammer™ CPUs will ship
• 2004: most AMD CPUs will be 64-bit
• 4 GB of RAM is less than 1,000$ today,
  less than 500$ in 2004
• Desktops (Hammer™) and servers (Opteron™).
• You do the math, …
  Who will demand 64-bit capable software?
34
A 1TB Main Memory
• Amdahl's law: 1 MIPS/MB, now more like 1:5,
  so ~20 x 10 GHz cpus need 1 TB of RAM
• 1 TB of RAM costs
  ~250k$ … 2m$ today,
  ~25k$ … 200k$ in 5 years
• 128 million pages
  – Takes a LONG time to fill
  – Takes a LONG time to refill
• Needs new algorithms
• Needs parallel processing
• Which leads us to…
  – the memory hierarchy
  – SMP
  – NUMA
35
Hyper-Threading: SMP on chip
• If the cpu is always waiting for memory:
  predict memory requests and prefetch
  – done
• If the cpu is still always waiting for memory:
  multi-program it (multiple hardware threads per cpu)
  – Hyper-Threading: everything is SMP
  – 2 threads now, more later
  – Also multiple cpus per chip
• If your program is single-threaded
  – You waste ½ the cpu and memory bandwidth
  – Eventually waste 80%
• App builders need to plan for threads.
36
The Memory Hierarchy
• Locality REALLY matters
• CPU at 2 GHz, RAM at 5 MHz:
  RAM is no longer random access.
• Organizing the code gives 3x (or more)
• Organizing the data gives 3x (or more)

Level      Latency (clocks)   Size
Registers  1                  1 KB
L1         2                  32 KB
L2         10                 256 KB
L3         30                 4 MB
Near RAM   100                16 GB
Far RAM    300                64 GB
37
[Diagram: the memory hierarchy on and off chip: registers and the arithmetic
 logical unit, I-cache and D-cache (L1), L2 cache, then off-chip L3 / remote
 cache, the bus, local RAM, and remote RAM on other cpus.]
38
Scaleup Systems
Non-Uniform Memory Architecture (NUMA)
Coherent but… remote memory is even slower
[Diagram: NUMA cells, each with cpus, memory, chipset, and I/O, joined by a
 crossbar/switch system interconnect, with a service processor, config DB,
 and partition manager.]
• All cells see a common memory
• Slow local main memory, slower remote main memory
• Scale up by adding cells
• Interconnect, service processor, and partition management are vendor specific
• Several vendors doing this, on Itanium and Hammer
• Planning for 64 cpu, 1 TB RAM
39
Changed Ratios Matter
• If everything changes by 2x,
  then nothing changes.
• So, it is the different rates that matter.
• Improving FAST: CPU speed; memory & disk size; network bandwidth
• Slowly changing: speed of light; people costs; memory bandwidth; WAN prices
40
Disks are becoming tapes
• Capacity:
  – 150 GB now,
    300 GB this year,
    1 TB by 2007
• Bandwidth:
  – 40 MBps now,
    150 MBps by 2007
• Access rate:
  – ~150 IO/s for a 150 GB disk now,
    ~200 IO/s for a 1 TB disk by 2007
• Read time (see the arithmetic below):
  – 2 hours sequential, 2 days random now;
    4 hours sequential, 12 days random by 2007
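Roughly where the "days random" figure comes from (my arithmetic, assuming ~8 KB random reads, which the slide does not state): reading 150 GB as 8 KB pages at ~150 IO/s takes

  \frac{150 \times 10^{9}}{8192 \times 150} \approx 1.2 \times 10^{5}\ \text{s} \approx 1.4\ \text{days},

while a sequential pass at 40 MBps finishes in an hour or two, which is exactly the point of the next slide: treat the disk like a tape and read it sequentially.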
41
Disks are becoming tapes
Consequences
• Use most disk capacity for archiving
• Copy-on-Write (COW) file systems
  in Windows and other OSs.
• RAID10 saves arms, costs space (OK!).
• Backup to disk
• Pretend it is a 100 GB disk + a 1 TB disk
  – Keep the hot 10% of data on the fastest part of the disk.
  – Keep the cold 90% on the colder part of the disk.
• Organize computations to read/write disks
  sequentially in large blocks.
42
Wiring is going serial
and getting FAST!
• Gbps Ethernet and SATA
  built into chips
• RAID controllers: inexpensive and fast.
• 1U storage bricks @ 2-10 TB
• SAN or NAS
  (iSCSI or CIFS/DAFS)
[Diagram: storage bricks on Ethernet at 100 MBps/link.]
43
NAS – SAN Horse Race
• Storage hardware: ~1k$/TB/y
• Storage management: 10k$ … 300k$/TB/y
• So, as with server consolidation:
  storage consolidation
• Two styles:
  – NAS (Network Attached Storage): file server
  – SAN (Storage Area Network): disk server
• I believe NAS is more manageable.
44
SAN/NAS Evolution
[Figure: storage evolving from monolithic, to modular, to sealed bricks.]
45
IO Throughput
K Accesses Per Second vs. RPM
[Chart: Kaps (thousands of accesses per second, 0-200) vs. disk rotational
 speed (0-20,000 RPM).]
46
Comparison Of Disk Cost
$’s for similar performance
Seagate Disk Prices*
Model #    Size      Speed      Connect.   Cost    $/K Rev
ATA 100    40 GB     5400 RPM   ATA        $86     $15.9
ATA 1000   40 GB     7200 RPM   ATA        $101    $14.0
36 ES 2    36.7 GB   10K RPM    SCSI       $325    $32.5
X15 36LP   36.7 GB   15K RPM    SCSI       $455    $29.7
X15 36LP   36.7 GB   15K RPM    Fibre      $455    $29.7
*Source: Seagate online store, quantity one prices
47
Comparison Of Disk Costs
¢/MB for different systems
Mfg.      Size     Type       Cost    Cost/MB
Dell      80 GB    Int. ATA   $115    1.4¢
WD        120 GB   Ext. ATA   $276    2.3¢
Seagate   181 GB   Int SCSI   $1155   6.4¢
EMC       XX GB    SAN                xx¢

Source: Dell
48
Why Serial ATA Matters
• Modern interconnect
• Point-to-point drive connection
– 150 MBps → 300 MBps
• Facilitates ATA
disk arrays
• Enables inexpensive
“cool” storage
49
Performance (on Y2K SDSS data)
• Run times on a 15k$ HP Server (2 cpu, 1 GB RAM, 8 disks):
  – Some queries take 10 minutes
  – Some take 1 minute
  – Median ~ 22 sec.
  – GHz processors are fast!
• ~1,000 IOs/cpu second; ~64 MB of IO/cpu second
• (10 MIPS/IO, 200 instructions/byte)
• 2.5 M records/s/cpu
[Charts: IO count vs. CPU seconds, and cpu vs. elapsed time per query, for the
 20 benchmark queries Q01-Q20.]
50
NVO: How Will It Work?
Define commonly used `atomic’ services
Build higher level toolboxes/portals on top
We do not build `everything for everybody’
Use the 90-10 rule:
1
– Define the standards and interfaces
– Build the framework
– Build the 10% of services
that are used by 90%
– Let the users build the rest
from the components
0.9
0.8
0.7
# of users
•
•
•
•
0.6
0.5
0.4
0.3
0.2
0.1
0
0
0.1
0.2
0.3
0.4
0.5
# of s e rvice s
51
Data Federations of Web Services
• Massive datasets live near their owners:
  – Near the instrument's software pipeline
  – Near the applications
  – Near data knowledge and curation
  – Super Computer centers become Super Data Centers
• Each archive publishes a web service
  – Schema: documents the data
  – Methods on objects (queries)
• Scientists get "personalized" extracts
• Uniform access to multiple archives → federation
  – A common global schema
52
Grid and Web Services Synergy
• I believe the Grid will be many web services
  that share data (computrons are free)
• IETF standards provide
  – Naming
  – Authorization / Security / Privacy
  – Distributed Objects:
    Discovery, Definition, Invocation, Object Model
  – Higher-level services: workflow, transactions, DB, ..
• Synergy: commercial Internet & Grid tools
53
Web Services: The Key?
• Web SERVER:
  – Given a URL + parameters
  – Returns a web page (often dynamic)
• Web SERVICE:
  – Given an XML document (SOAP msg)
  – Returns an XML document
  – Tools make this look like an RPC:
    F(x,y,z) returns (u, v, w)
  – Distributed objects for the web,
    + naming, discovery, security, ..
• Internet-scale distributed computing
[Diagram: your program calls a web server and gets a page back; your program
 calls a web service and the data lands in your address space.]
54
Grid?
• Harvesting spare cpu cycles is not important
– They are “free” (1$/cpu day)
– They need applications and data (which are not free)
(1$/GB shipped)
• Accessing distributed data IS important
– Send the programs to the data
– Send the questions to the databases.
• Super Computer Centers become
Super Data Centers
Super Application Centers
55
The Grid: Foster & Kesselman
(Argonne National Laboratory)
"Internet computing and GRID technologies promise to change the way we tackle
complex problems. They will enable large-scale aggregation and sharing of
computational, data and other resources across institutional boundaries …
transform scientific disciplines ranging from high-energy physics to the
life sciences."
56
Grid/Globus
• Leader of the pack for GRID middleware
• Layered software toolkit
– 1: Grid Fabric (OS, TCP)
– 2: Grid Services
  • Globus Resource Allocation Manager
  • Globus Information Service (meta-computing directory service)
  • Grid Security Infrastructure
  • GridFTP
– 3: Application Toolkits
  • Job submission
  • MPICH-G2 message passing interface
– 4: Specific Applications
  • OVERFLOW Navier-Stokes flow solver
57
Globus in gory detail
SHELL SCRIPTS

# Query MDS for a host's 1-minute cpu load and print "host load".
globus-mds-search \
  '(&(hn=denali.mcs.anl.gov)(objectclass=GlobusSystemDynamicInformation))' \
  cpuload1 |
  sed -n -e '/^hn=/p' -e '/^cpuload1=/p' |
  sed -e 's/,.*$//' -e 's/=/ /g' |
  awk '/^hn/{printf "%s", $2} /^cpuload/{printf " %s\n", $2}'

# Submit <n> copies of ./myjobtest through GRAM with globusrun.
if [ $# -eq 0 ]; then
  echo "provide argument <number of processes to start>" 1>&2
  exit 1
fi
if [ -z "$GRAMCONTACT" ]; then
  GRAMCONTACT="`globus-hostname2contacts -type fork pitcairn.mcs.anl.gov`"
fi
pwd=`/bin/pwd`
rsl="&(executable=${pwd}/myjobtest)(count=$1)"
arch=`${GLOBUS_INSTALL_PATH}/sbin/config.guess`
${GLOBUS_INSTALL_PATH}/tools/${arch}/bin/globusrun \
  -o -r "${GRAMCONTACT}" "${rsl}"

LIBRARIES

/* get process id and hostname */
pid = getpid();
rc = globus_libc_gethostname(hn, 256);
globus_assert(rc == GLOBUS_SUCCESS);

/* get current time and convert to string format. setting [25] to zero
   will strip the newline character. */
mytime = time(GLOBUS_NULL);
timestr = globus_libc_ctime_r(&mytime, buf, 30);
timestr[25] = '\0';
globus_libc_printf("%s : process %d on %s came to life\n", timestr, pid, hn);

/* THE BARRIER!!! */
globus_duroc_runtime_barrier();

/* Passed the barrier: get current time again and print it out. */
mytime = time(GLOBUS_NULL);
timestr = globus_libc_ctime_r(&mytime, buf, 30);
globus_libc_printf("%s : process %d on %s passed the barrier\n",
                   timestr, pid, hn);

/* TODO 1: get the layout of the DUROC job using first
   globus_duroc_runtime_intra_subjob_rank() and then
   globus_duroc_runtime_inter_subjob_structure(). */

/* We are done. */
rc = globus_module_deactivate_all();
globus_assert(rc == GLOBUS_SUCCESS);
return 0;
58
Shielding Users
• Users do not want to deal with XML,
they want their data
• Users do not want to deal with configuring
grid computing, they want results
• SOAP: data appears in user memory, XML
is invisible
• SOAP call: just a remote procedure
59
Atomic Services
• Metadata information about resources
– Waveband
– Sky coverage
– Translation of names to universal dictionary
(UCD)
• Simple search patterns on the resources
– Cone Search (sketched below)
– Image mosaic
– Unit conversions
• Simple filtering, counting, histogramming
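One sketch of what a cone-search atomic service might execute underneath (assumed, not from the slide; fGetNearbyObjEq stands in for whatever spatial-index helper the archive exposes, in the spirit of SkyServer):

  -- Objects within 3 arcminutes of (ra, dec) = (185.0, -0.5) degrees.
  SELECT o.objID, o.ra, o.dec, o.r
  FROM   fGetNearbyObjEq(185.0, -0.5, 3.0) n     -- spatial-index lookup
  JOIN   PhotoObj o ON o.objID = n.objID
  ORDER BY n.distance;

The spatial index does the cone cut, and the join then touches only the handful of matching rows.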
60
Higher Level Services
• Built on Atomic Services
• Perform more complex tasks
• Examples
– Automated resource discovery
– Cross-identifications
– Photometric redshifts
– Outlier detections
– Visualization facilities
• Expectation:
61
SkyQuery
• Distributed query tool using a set of web services
• Feasibility study, built in 6 weeks from scratch by
  – Tanu Malik (JHU CS grad student)
  – Tamas Budavari (JHU astro postdoc)
• Implemented in C# and .NET
• Won 2nd prize in the Microsoft XML Contest
• Allows queries like:

  SELECT o.objId, o.r, o.type, t.objId
  FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
  WHERE XMATCH(o,t) < 3.5
    AND AREA(181.3, -0.76, 6.5)
    AND o.type=3 AND (o.i - t.m_j) > 2
62
Architecture
[Diagram: a web page talks to the SkyQuery portal and an image-cutout service;
 SkyQuery federates SkyNodes for SDSS, 2MASS, and FIRST.]
63
Cross-id Steps
• Parse query
• Get counts
• Sort by counts
• Make plan
• Cross-match
  – Recursively, from small to large
• Select necessary attributes only
• Return output
• Insert cutout image

  SELECT o.objId, o.r, o.type, t.objId
  FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
  WHERE XMATCH(o,t) < 3.5
    AND AREA(181.3, -0.76, 6.5)
    AND (o.i - t.m_j) > 2
    AND o.type = 3
64
Show Cutout Web Service
65