The Sloan Digital
Sky Survey
Alex Szalay
Department of Physics and Astronomy
The Johns Hopkins University
The Sloan Digital Sky Survey
A project run by the Astrophysical Research Consortium (ARC)
The University of Chicago
Princeton University
The Johns Hopkins University
The University of Washington
Fermi National Accelerator Laboratory
US Naval Observatory
The Japanese Participation Group
The Institute for Advanced Study
Max Planck Inst, Heidelberg
SLOAN Foundation, NSF, DOE, NASA
Goal: To create a detailed multicolor map of the Northern Sky
over 5 years, with a budget of approximately $80M
Data Size: 40 TB raw, 2 TB processed
Alex Szalay, JHU
Scientific Motivation
Create the ultimate map of the Universe:
The Cosmic Genome Project!
Study the distribution of galaxies:
What is the origin of fluctuations?
What is the topology of the distribution?
Measure the global properties of the Universe:
How much dark matter is there?
Local census of the galaxy population:
How did galaxies form?
Find the most distant objects in the Universe:
What are the highest quasar redshifts?
Cosmology Primer
The Universe is expanding:
the galaxies move away from us
spectral lines are redshifted
The fate of the universe depends
on the balance between gravity
and the expansion velocity
$v = H_0 r$ (Hubble's law)
$\Omega$ = density / critical density
if $\Omega < 1$, expand forever
Most of the mass in the Universe
is dark matter, and it may be
cold (CDM)
The spatial distribution of galaxies
is correlated, due to small ripples
in the early Universe
$P(k) = \langle \delta_k \delta_k^* \rangle$: power spectrum
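The two relations on this slide can be tried out numerically. The sketch below is only illustrative: the critical density formula $\rho_c = 3H_0^2/(8\pi G)$ is the standard textbook expression and is not written on the slide, and the value $H_0 = 65$ km/s/Mpc is an assumed mid-range choice.

```python
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
MPC_IN_M = 3.086e22    # one megaparsec in metres

def recession_velocity_km_s(h0_km_s_mpc: float, distance_mpc: float) -> float:
    """Hubble's law v = H0 * r, with r in Mpc and v in km/s."""
    return h0_km_s_mpc * distance_mpc

def critical_density_kg_m3(h0_km_s_mpc: float) -> float:
    """Standard critical density 3 H0^2 / (8 pi G), in kg/m^3."""
    h0_si = h0_km_s_mpc * 1000.0 / MPC_IN_M   # convert H0 to 1/s
    return 3.0 * h0_si ** 2 / (8.0 * math.pi * G)

def density_parameter(density_kg_m3: float, h0_km_s_mpc: float) -> float:
    """Omega = density / critical density."""
    return density_kg_m3 / critical_density_kg_m3(h0_km_s_mpc)

H0 = 65.0   # assumed value, km/s/Mpc
velocity = recession_velocity_km_s(H0, 100.0)   # galaxy at 100 Mpc
rho_crit = critical_density_kg_m3(H0)
print(f"v = {velocity:.0f} km/s, rho_crit = {rho_crit:.2e} kg/m^3")
```

A universe with a mean density below `rho_crit` has $\Omega < 1$ and, in the picture on this slide, expands forever.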
The ‘Naught’ Problem
What are the global parameters of the Universe?
$H_0$: the Hubble constant, 55-75 km/s/Mpc
$\Omega_0$: the density parameter, 0.25-1
$\Lambda_0$: the cosmological constant, 0-0.7
Their values are still quite uncertain today...
Goal:
measure these parameters with an accuracy of a few percent
High Precision Cosmology!
The Cosmic Genome Project
The SDSS will create the ultimate map
of the Universe, with much more detail
than any other measurement before
[Figure: redshift survey maps from Gregory and Thompson 1978; de Lapparent, Geller and Huchra 1986; da Costa et al. 1995; SDSS Collaboration 2002]
Area and Size of Redshift Surveys
[Figure: number of objects (10^3 to 10^9) versus survey volume in Mpc^3 (10^4 to 10^11) for SDSS photo-z, SDSS main, SDSS abs line, SDSS red, CfA+SSRS, 2dF, 2dFR, LCRS, SAPM and QDOT]
Clustering of Galaxies
We will measure the spectrum of the
density fluctuations to high precision
even on very large scales
The error in the amplitude of the fluctuation spectrum:
1970: x100
1990: x2
1995: ±0.4
1998: ±0.2
1999: ±0.1
2002: ±0.05
Relevant Scales
Distances measured in Mpc [megaparsec]
1 Mpc = $3 \times 10^{24}$ cm
5 Mpc = distance between galaxies
3000 Mpc = scale of the Universe
if >200 Mpc
fluctuations have a PRIMORDIAL shape
if <100 Mpc
gravity creates sharp features, like walls,
filaments and voids
Biasing
conversion of mass into light is nonlinear
light is much more clumpy than the mass
The Topology of Local Universe
Measure the Topology of the Universe
Does it consist of walls and voids
or is it randomly distributed?
Finding the Most Distant Objects
Intermediate and high redshift QSOs
Multicolor selection function.
Luminosity functions and spatial clustering.
High redshift QSOs (z>5).
Features of the SDSS
Special 2.5m telescope, located at Apache Point, NM
3 degree field of view.
Zero distortion focal plane.
Two surveys in one:
Photometric survey in 5 bands.
Spectroscopic redshift survey.
Huge CCD Mosaic
30 CCDs 2K x 2K (imaging)
22 CCDs 2K x 400 (astrometry)
Two high resolution spectrographs
2 x 320 fibers, with 3 arcsec diameter.
R=2000 resolution with 4096 pixels.
Spectral coverage from 3900Å to 9200Å.
Automated data reduction
Over 120 man-years of development effort.
(Fermilab + collaboration scientists)
Very high data volume
Expect over 40 TB of raw data.
About 2 TB processed catalogs.
Data made available to the public.
Apache Point Observatory
Located in New Mexico,
near White Sands National Monument
The Telescope
Special 2.5m telescope
3 degree field of view
Zero distortion focal plane
Wind screen moved separately
The Photometric Survey
Northern Galactic Cap
5 broad-band filters (u', g', r', i', z')
limiting magnitudes (22.3, 23.3, 23.1, 22.3, 20.8)
drift scan of 10,000 square degrees
55 sec exposure time
40 TB raw imaging data -> pipeline ->
100,000,000 galaxies
50,000,000 stars
calibration to 2% at r'=19.8
only done in the best seeing (20 nights/yr)
pixel size is 0.4 arcsec,
astrometric precision is 60 milliarcsec
Southern Galactic Cap
multiple scans (> 30 times) of the same stripe
Continuous data rate of 8 Mbytes/sec
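The 55 sec exposure and the 40 TB total can be cross-checked from the other numbers on these slides. This is a rough sketch under stated assumptions: the scan is taken to run at the sidereal rate on the celestial equator (about 15.04 arcsec per second, a value not given on the slide), the CCD width is taken as 2048 pixels from the "2K x 2K" imaging chips, and TB and Mbytes are read as decimal units.

```python
SIDEREAL_RATE_ARCSEC_PER_S = 15.041   # assumed drift rate on the celestial equator
PIXEL_SCALE_ARCSEC = 0.4              # from the slide
CCD_WIDTH_PIXELS = 2048               # "2K x 2K" imaging CCDs

def drift_scan_exposure_s(n_pixels: int, pixel_scale: float, drift_rate: float) -> float:
    """Time a star takes to drift across the full width of one CCD."""
    return n_pixels * pixel_scale / drift_rate

def hours_to_collect(total_bytes: float, rate_bytes_per_s: float) -> float:
    """Observing time needed to accumulate a data volume at a steady rate."""
    return total_bytes / rate_bytes_per_s / 3600.0

exposure = drift_scan_exposure_s(CCD_WIDTH_PIXELS, PIXEL_SCALE_ARCSEC,
                                 SIDEREAL_RATE_ARCSEC_PER_S)
scan_hours = hours_to_collect(40e12, 8e6)   # 40 TB at 8 Mbytes/sec
print(f"exposure ~ {exposure:.1f} s, scanning time ~ {scan_hours:.0f} hours")
```

Under these assumptions the crossing time comes out close to the quoted 55 sec, and 40 TB corresponds to roughly 1400 hours of continuous scanning.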
The Footprint of the Survey
Survey Strategy
Overlapping 2.5 degree wide stripes
Avoiding the Galactic Plane (dust)
Multiple exposures on the three
Southern stripes
The Spectroscopic Survey
Measure redshifts of objects => distance
SDSS Redshift Survey:
1 million galaxies
100,000 quasars
100,000 stars
Two high throughput spectrographs
spectral range 3900-9200 Å.
640 spectra simultaneously.
R=2000 resolution.
Automated reduction of spectra
Very high sampling density and completeness
Objects in other catalogs also targeted
Optimal Tiling
Fields have 3 degree diameter
Centers determined by an
optimization procedure
A total of 2200 pointings
640 fibers assigned simultaneously
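A quick count shows how the tiling numbers relate to the spectroscopic targets listed on the previous slide. The sketch only multiplies figures quoted in the talk; what the remaining fiber slots are used for is not stated here, so the "margin" below is just the arithmetic difference.

```python
POINTINGS = 2200
FIBERS_PER_POINTING = 640

# Target counts from the spectroscopic survey slide.
TARGETS = {"galaxies": 1_000_000, "quasars": 100_000, "stars": 100_000}

fiber_slots = POINTINGS * FIBERS_PER_POINTING   # total spectra the tiling allows
planned_targets = sum(TARGETS.values())
margin = fiber_slots - planned_targets
margin_fraction = margin / fiber_slots

print(f"{fiber_slots:,} fiber slots for {planned_targets:,} targets "
      f"({margin_fraction:.1%} margin)")
```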
The Mosaic Camera
Photometric Calibrations
The SDSS will create a new
photometric system:
u' g' r' i' z'
Primary standards:
observed with the USNO
40-inch telescope in Flagstaff
Secondary standards:
observed with the SDSS
20-inch telescope at Apache
Point – calibrating the SDSS
imaging data
The Spectrographs
Two double spectrographs
very high throughput
two 2048x2048 CCD detectors
mounted on the telescope
light fed through slithead
The Fiber Feed System
Galaxy images are captured by optical fibers
lined up on the spectrograph slit
Manually plugged during the day into Al plugboards
640 fibers in each bundle
The largest fiber system today
Spectrograph Status
Spectrographs:
Laboratory observations of solar spectrum
First astronomical observations March 1999
First Light Images
Telescope:
First light May 9th 1998
Equatorial scans
The First Stripes
Camera:
5 color imaging of >100 square degrees
Multiple scans across the same fields
Photometric limits as expected
NGC 2068
UGC 3214
NGC 6070
The First Quasars
The four highest redshift quasars have been found in the first SDSS test data!
Methane/T Dwarf
Discovery of several new
objects by SDSS & 2MASS
SDSS T-dwarf
(June 1999)
Detection of Gravitational Lensing
28,000 foreground galaxies and 2,045,000 background galaxies in test data
(McKay et al. 1999)
The first 35,000 redshifts
SDSS Data Flow
Data Processing Pipelines
Concept of the SDSS Archive
Operational Archive (raw + processed data)
Science Archive (products accessible to users)
Other Archives
SDSS Data Products
Object catalog: parameters of >$10^8$ objects (400 GB)
Redshift catalog: parameters of $10^6$ objects (1 GB)
Atlas images: 5 color cutouts of >$10^8$ objects (1.5 TB)
Spectra: in a one-dimensional form (60 GB)
Derived catalogs: clusters, QSO absorption lines (20 GB)
4x4 pixel all-sky map: heavily compressed (60 GB)
All raw data saved in a tape vault at Fermilab
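As a consistency check, the product sizes on this slide can be summed and compared with the "2 TB processed" figure quoted earlier in the talk. The pairing of sizes with products follows the slide layout as best it can be read, and 1 TB is taken as 1000 GB; both are assumptions of this sketch.

```python
# Sizes in GB, as listed on the slide (1.5 TB written as 1500 GB).
PRODUCTS_GB = {
    "object catalog": 400,
    "redshift catalog": 1,
    "atlas images": 1500,
    "spectra": 60,
    "derived catalogs": 20,
    "4x4 pixel all-sky map": 60,
}

total_gb = sum(PRODUCTS_GB.values())
total_tb = total_gb / 1000   # decimal terabytes

print(f"total processed data: {total_gb} GB = {total_tb:.2f} TB")
```

The sum lands at about 2 TB, with the atlas images contributing roughly three quarters of it.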
How will the data be analyzed?
The data are inherently multidimensional
=> positions, colors, size, redshift
Improved classifications result in complex N-dimensional volumes
=> complex constraints, not ranges
Spatial relations will be investigated
=> nearest neighbors
=> other objects within a radius
Data Mining: finding the ‘needle in the haystack’
=> separate typical from rare
=> recognize patterns in the data
Output size can be prohibitively large for intermediate files
=> import output directly into analysis tools
Geometric Approach
The Main Problem:
•fast, indexed, complex searches of Terabytes in k-dim space
•searches are not necessarily parallel to the axes
=> traditional indexing (b-tree) does not work
Geometric Approach:
•Use the geometric nature of the k-dimensional data
•Quantize data into containers of ‘friends’:
objects of similar colors
close on the sky
stored together
=> efficient cache performance
•Containers represent a coarse grained density map of the data
multidimensional index tree: k-d tree + r-tree
Geometric Indexing
“Divide and Conquer”
Attributes (number): partitioning
Sky Position (3): Hierarchical Triangular Mesh
Multiband Fluxes (N = 5+): split as k-d tree, stored as r-tree of bounding boxes
Other (M = 100+): using regular indexing techniques
Sky coordinates
Stored as Cartesian coordinates:
projected onto a unit sphere
Longitude and Latitude lines:
intersections of planes and the sphere
Boolean combinations:
query polyhedron
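The idea on this slide, that sky constraints become plane cuts once positions are stored as unit vectors, can be sketched as follows. The function names are hypothetical and chosen for illustration; this is not the SDSS archive code. Each constraint is a half-space `point . normal >= offset`, and a query is the Boolean AND of several of them.

```python
import math

def radec_to_unit_vector(ra_deg: float, dec_deg: float):
    """Project a sky position onto the unit sphere as (x, y, z)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def in_halfspace(point, normal, offset: float) -> bool:
    """True if the point lies on the positive side of the cutting plane."""
    return sum(p * n for p, n in zip(point, normal)) >= offset

def cone_constraint(ra_deg: float, dec_deg: float, radius_deg: float):
    """A circle on the sky is one plane cut: normal = centre, offset = cos(radius)."""
    return radec_to_unit_vector(ra_deg, dec_deg), math.cos(math.radians(radius_deg))

def dec_band_constraints(dec_min_deg: float, dec_max_deg: float):
    """A latitude band is the intersection of two planes normal to the z axis."""
    return [((0.0, 0.0, 1.0), math.sin(math.radians(dec_min_deg))),
            ((0.0, 0.0, -1.0), -math.sin(math.radians(dec_max_deg)))]

def in_query(point, constraints) -> bool:
    """Boolean AND of half-spaces: the point must satisfy every plane cut."""
    return all(in_halfspace(point, n, d) for n, d in constraints)

query = [cone_constraint(10.0, 20.5, 1.0)] + dec_band_constraints(19.0, 21.0)
print(in_query(radec_to_unit_vector(10.0, 20.0), query))
```

Because every test is a dot product and a comparison, no trigonometry is needed per object once the unit vectors are stored.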
Sky Partitioning
Hierarchical Triangular Mesh - based on octahedron
Hierarchical Subdivision
Hierarchical subdivision of spherical triangles
represented as a quadtree
In SDSS the tree is 5 levels deep - 8192 triangles
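The triangle count follows from the construction: the octahedron has 8 faces and every level splits each triangle into 4, so 5 levels give 8 x 4^5 = 8192. Below is a minimal sketch of such a subdivision, assuming the split is made at the edge midpoints pushed back onto the sphere; vertex ordering and the naming of the triangles are ignored here.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def midpoint(a, b):
    """Midpoint of two points on the sphere, projected back onto it."""
    return normalize(tuple(x + y for x, y in zip(a, b)))

def octahedron_faces():
    """The 8 spherical triangles spanned by the +/-x, +/-y, +/-z axes."""
    return [((sx, 0.0, 0.0), (0.0, sy, 0.0), (0.0, 0.0, sz))
            for sx in (1.0, -1.0) for sy in (1.0, -1.0) for sz in (1.0, -1.0)]

def subdivide(triangle):
    """Split one spherical triangle into four children (quadtree step)."""
    a, b, c = triangle
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]

def build_mesh(depth: int):
    """All triangles at the given depth below the octahedron."""
    triangles = octahedron_faces()
    for _ in range(depth):
        triangles = [child for t in triangles for child in subdivide(t)]
    return triangles

mesh = build_mesh(5)
print(len(mesh))
```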
Result of the Query
Magnitudes and Multicolor Searches
Galaxy fluxes
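• large dynamic range
• errors divergent as $x \to 0$!
$m = -2.5 \log_{10}(f/f_0) = -2.5 \log_{10} x$
$\sigma_m^2 = \left(\frac{\partial m}{\partial x}\right)^2 \sigma_x^2 = \left(\frac{2.5}{\ln 10}\right)^2 \frac{\sigma_x^2}{x^2}$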
For multicolor magnitudes
the error contours can be
very anisotropic and skewed,
extremely poor localization!
But: this is an artifact of the logarithm at zero flux,
in flux space the object is well localized
Novel Magnitude Scale
$\mu = -\frac{2.5}{\ln 10}\,\sinh^{-1}\!\left(\frac{f}{b}\right) + c$
b: softness
c: set to match normal magnitudes
Advantages:
monotonic
degrades gracefully
objects have small error ellipse
unified handling of detections
and upper limits!
Disadvantages:
unusual
(Lupton, Gunn and Szalay, AJ 99)
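The behaviour listed above can be checked numerically. The sketch below uses the form $-\frac{2.5}{\ln 10}\sinh^{-1}(f/b) + c$ with $f$ a normalized flux; the choice $c = 2.5\log_{10}(2/b)$ is derived here from the large-flux limit $\sinh^{-1}(u) \approx \ln(2u)$ so that the scale matches ordinary magnitudes, and the softness value is an arbitrary illustration, not a survey parameter.

```python
import math

POGSON = 2.5 / math.log(10.0)   # 2.5 / ln 10

def log_magnitude(flux: float) -> float:
    """Ordinary magnitude; undefined for flux <= 0."""
    return -2.5 * math.log10(flux)

def asinh_magnitude(flux: float, softness: float) -> float:
    """asinh magnitude, defined for zero and negative measured flux as well."""
    offset = 2.5 * math.log10(2.0 / softness)   # matches log magnitudes at high flux
    return -POGSON * math.asinh(flux / softness) + offset

B = 1e-10   # hypothetical softness, in units of the reference flux

for flux in (1e-6, 1e-9, 1e-10, 0.0, -1e-10):
    classic = f"{log_magnitude(flux):7.3f}" if flux > 0 else "  undef"
    print(f"f = {flux:9.1e}  log mag = {classic}  asinh mag = {asinh_magnitude(flux, B):7.3f}")
```

Bright objects get the same value on both scales, while at zero or negative flux the asinh magnitude stays finite and keeps decreasing monotonically with flux, which is what allows detections and upper limits to be handled together.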
Flux Indexing
Split along alternating flux directions
Create balanced partitions
Store bounding boxes at each step
Build a 10-12 level tree in each triangle
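The four steps on this slide map onto a short recursive routine. This is a minimal sketch, not the SDSS implementation: the class and function names are hypothetical, the points are random numbers standing in for 5-band fluxes, and the depth is kept at 3 instead of 10-12 so the result is easy to inspect.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    lo: Point                              # lower corner of the bounding box
    hi: Point                              # upper corner of the bounding box
    points: Optional[List[Point]] = None   # filled in for leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def bounding_box(points: List[Point]) -> Tuple[Point, Point]:
    dims = range(len(points[0]))
    return (tuple(min(p[d] for p in points) for d in dims),
            tuple(max(p[d] for p in points) for d in dims))

def build(points: List[Point], depth: int = 0, max_depth: int = 10) -> Node:
    lo, hi = bounding_box(points)              # store the box at every step
    if depth >= max_depth or len(points) < 2:
        return Node(lo, hi, points=list(points))
    axis = depth % len(points[0])              # alternate the flux direction
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                    # median split: balanced halves
    return Node(lo, hi,
                left=build(ordered[:mid], depth + 1, max_depth),
                right=build(ordered[mid:], depth + 1, max_depth))

def leaves(node: Node) -> List[Node]:
    if node.points is not None:
        return [node]
    return leaves(node.left) + leaves(node.right)

rng = random.Random(0)
fluxes = [tuple(rng.random() for _ in range(5)) for _ in range(64)]   # toy data
tree = build(fluxes, max_depth=3)
leaf_sizes = [len(leaf.points) for leaf in leaves(tree)]
print(leaf_sizes)
```

Splitting at the median keeps every leaf the same size, and the stored boxes let a search discard a whole subtree when its box misses the query region.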
How to build compact cells?
The SDSS will measure fluxes in 5 bands
=> asinh magnitudes
Axis-parallel splits in median flux,
in 8 separate zones in Galactic latitude
=> 5 dimensional bounding boxes
The fluxes are strongly correlated
=> 2 + dimensional distribution of typical objects
=> widely scattered rare objects
=> large density contrasts
Therefore:
first create a local density and split on its value (Csabai et al. 96)
typical (98%)
rare (2%)
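One way to picture the typical/rare split is with a k-th nearest neighbour distance as a stand-in for local density. This estimator is an assumption made for illustration and is not necessarily the one used in the cited work; the data are synthetic, with a tight clump playing the role of typical objects and two far-away points playing the rare ones.

```python
import math
import random

def knn_distance(points, k: int):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

def split_typical_rare(points, k: int = 5, rare_fraction: float = 0.02):
    """Call the lowest-density fraction 'rare' (large k-NN distance = low density)."""
    radii = knn_distance(points, k)
    order = sorted(range(len(points)), key=lambda i: radii[i])   # densest first
    n_rare = max(1, round(rare_fraction * len(points)))
    typical = [points[i] for i in order[:-n_rare]]
    rare = [points[i] for i in order[-n_rare:]]
    return typical, rare

rng = random.Random(1)
clump = [tuple(rng.gauss(0.0, 0.1) for _ in range(5)) for _ in range(98)]   # toy data
outliers = [(10.0,) * 5, (-10.0,) * 5]
typical, rare = split_typical_rare(clump + outliers)
print(len(typical), len(rare))
```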
Distributed Implementation
[Diagram: User Interface -> Analysis Engine -> Master SX Engine (Objectivity Federation) -> four Slave SX engines, each with its own Objectivity database on RAID storage]
Exploring new methods
New spectral classification techniques
galaxy spectra can be expressed as a superposition
of a few (<5) principal components
=> objective classification of 1 million spectra!
Photometric redshifts
galaxy colors systematically change with redshift,
the SDSS photometry works like a 5-pixel spectrograph
=> $\Delta z = 0.05$, but with 100 million objects!
Measuring cosmological parameters
before: data analysis was limited by small number statistics
after: dominant errors are systematic (extinction)
=> new analysis methods are required!
Photometric redshifts
Multicolor photometry maps physical parameters
luminosity L
observed fluxes
redshift z
spectral type T
Inversion: u', g', r', i', z' => z, L, T
Redshifts are statistical, with large errors: $\Delta z \approx 0.05$
The data set is huge, more than 100 million galaxies
Easy to subdivide into coarse z bins, and by type
=> study evolution
=> enormous volume - 1 Gpc$^3$
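The inversion from colors to redshift can be illustrated with a toy empirical estimator. This is only a sketch: it uses a nearest-neighbour average over a training set with known redshifts, which is one possible approach and not necessarily the method used for the SDSS, and all data are synthetic with made-up slopes and noise.

```python
import math
import random

SLOPES = (2.0, 1.5, 1.0, 0.5)   # hypothetical color change per unit redshift
NOISE = 0.02                    # hypothetical photometric error in each color

def fake_galaxy(rng: random.Random):
    """Synthetic galaxy: four colors drifting linearly with redshift, plus noise."""
    z = rng.uniform(0.0, 0.5)
    colors = tuple(s * z + rng.gauss(0.0, NOISE) for s in SLOPES)
    return colors, z

def knn_redshift(colors, training, k: int = 10) -> float:
    """Mean redshift of the k training galaxies closest in color space."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], colors))[:k]
    return sum(z for _, z in nearest) / k

rng = random.Random(42)
training = [fake_galaxy(rng) for _ in range(500)]   # stands in for spectroscopic redshifts
test_set = [fake_galaxy(rng) for _ in range(50)]

errors = [abs(knn_redshift(colors, training) - z) for colors, z in test_set]
mean_abs_error = sum(errors) / len(errors)
print(f"mean |z_photo - z_true| = {mean_abs_error:.3f}")
```

Each individual estimate is noisy, but with many objects the redshifts are good enough to sort galaxies into coarse z bins, which is the use described on this slide.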
Spectra from Photometry
New development:
low resolution spectra from multicolor photometry
many galaxies => oversampling => spectra
(Csabai, Budavari, Connolly, Szalay 99)
Measuring P(k)
Karhunen-Loeve transform:
Signal-to-noise eigenmodes of the redshift survey
Optimal extraction of clustering signal
Maximal rejection of systematic errors
(Vogeley and Szalay 96, Matsubara, Szalay and Landy 99)
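Pilot project using the Las Campanas Redshift Survey with 22,000 galaxies.
We simultaneously measure the values of
the redshift-distortion parameter ($\beta = \Omega^{0.6}/b$),
the normalization ($\sigma_8$) and
the CDM shape parameter ($\Gamma = \Omega h$).
$\beta$: North $0.48^{+0.22}_{-0.20}$, South $0.31^{+0.22}_{-0.19}$, Combined $0.40^{+0.15}_{-0.14}$
$\sigma_8$: North $0.82^{+0.06}_{-0.06}$, South $0.75^{+0.05}_{-0.05}$, Combined $0.78^{+0.04}_{-0.04}$
$\Gamma$: North $0.15^{+0.05}_{-0.05}$, South $0.14^{+0.05}_{-0.05}$, Combined $0.14^{+0.03}_{-0.03}$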
Trends
• Future dominated by detector improvements
• Moore’s Law growth in CCD capabilities
• Gigapixel arrays on the horizon
• Improvements in computing and storage will track growth in data volume
• Investment in software is critical, and growing
[Figure: total area of 3m+ telescopes in the world in m^2 (Glass) and total number of CCD pixels in megapixels (CCDs), as a function of time, 1970-2000, on a logarithmic scale. Growth over 25 years is a factor of 30 in glass, 3000 in pixels.]
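The two growth factors in the caption can be turned into doubling times, which makes the comparison with Moore's Law concrete. The sketch assumes steady exponential growth over the 25 years, which the caption does not state.

```python
import math

def doubling_time_years(growth_factor: float, span_years: float) -> float:
    """Doubling time for steady exponential growth by growth_factor over span_years."""
    return span_years * math.log(2.0) / math.log(growth_factor)

glass_doubling = doubling_time_years(30.0, 25.0)      # telescope collecting area
pixel_doubling = doubling_time_years(3000.0, 25.0)    # CCD pixel count

print(f"glass doubles every {glass_doubling:.1f} years, "
      f"pixels every {pixel_doubling:.1f} years")
```

Under that assumption the pixel count doubles roughly every two years, while collecting area takes about five years to double.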
The Age of Mega-Surveys
The next generation of astronomical archives with
Terabyte catalogs will dramatically change astronomy
top-down design
large sky coverage
built on sound statistical plans
uniform, homogeneous, well calibrated
well controlled and documented systematics
The technology to acquire, store and index the data is here
we are riding Moore’s Law
Data mining in such vast archives will be a challenge,
but possibilities are quite unimaginable
Integrating these archives into a single entity is a
project for the whole community
=> National Virtual Observatory
New Astronomy – Different!
Systematic Data Exploration
will have a central role in the New Astronomy
Digital Archives of the Sky
will be the main access to data
Data “Avalanche”
the flood of Terabytes of data is already happening,
whether we like it or not!
Transition to the new
may be organized or chaotic
NVO: The Challenges
Size of the archived data
• 40,000 square degrees is 2 trillion pixels
• One band: 4 Terabytes
• Multi-wavelength: 10-100 Terabytes
• Time dimension: few Petabytes
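The first two figures can be reproduced with a short calculation. The pixel scale of 0.5 arcsec and the 2 bytes per pixel are assumptions chosen for this sketch because they are consistent with the quoted numbers; neither is stated on the slide.

```python
ARCSEC_PER_DEGREE = 3600.0

def pixel_count(area_sq_deg: float, pixel_scale_arcsec: float) -> float:
    """Number of square pixels needed to tile a given sky area."""
    area_sq_arcsec = area_sq_deg * ARCSEC_PER_DEGREE ** 2
    return area_sq_arcsec / pixel_scale_arcsec ** 2

SKY_AREA_SQ_DEG = 40_000.0    # from the slide
PIXEL_SCALE_ARCSEC = 0.5      # assumed
BYTES_PER_PIXEL = 2           # assumed (16-bit pixels)

pixels = pixel_count(SKY_AREA_SQ_DEG, PIXEL_SCALE_ARCSEC)
one_band_tb = pixels * BYTES_PER_PIXEL / 1e12   # decimal terabytes

print(f"{pixels:.2e} pixels, {one_band_tb:.1f} TB per band")
```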
The development of
• new archival methods
• new analysis tools
• new standards
(metadata, interchange formats)
Hardware/networking requirements
Training the next generation!
Summary
The SDSS project combines astronomy, physics, and computer science
It promises to fundamentally change our view of the universe
It will determine how the largest structures in the universe were formed
It will serve as the standard astronomy reference for several decades
Its ‘virtual universe’ can be explored by both scientists and the public
Through its archive it will create a new paradigm in astronomy