Transcript Slide 1

Unidata 2008:
Shaping the Future of Data Use in the
Geosciences
Expanding Horizons: Using Environmental Data
for Education, Research, and Decision Making
23 June 2003
Boulder, CO
Mohan Ramamurthy
Unidata Program Center
UCAR Office of Programs
Boulder, CO
Thank you, Dave
• I wish to take this opportunity to extend my sincerest gratitude to Dave Fulker, the founding director of Unidata, for his distinguished service to the Unidata Community for nearly 20 years.
• Unidata would not be what it is today without his vision, leadership, energy and his many extraordinary qualities.
• And thank you to Ben Domenico for his excellent stewardship during the transition.
The Word of the Day for Jun 23 is:
bloviate \BLOH-vee-ayt\ verb
: to speak or write verbosely and windily
[Courtesy: Jo Hansen, Unidata Program Center]
Example sentence:
Mohan can bloviate on a par with the windiest of
professors, but he's also capable of being concise
and getting right to the point. (yeah, right)
Expanding Horizons





New Strategic Plan
New Director
New 5-year proposal
Many new and
exciting initiatives
New logo!
Unidata
Mission Statement:
Provide data, tools, and community leadership for enhanced Earth-system education and research.
At the Unidata Program Center, we
• Facilitate Data Access
• Provide Tools
• Support Faculty and Staff
• Build and Advocate for a Community (user workshops come under this activity)
Technology Portfolio
1) McIDAS: A client/server analysis and display
package, originally developed by U. Wisconsin/SSEC,
that emphasizes image processing of data from
satellite-borne sensors;
2) GEMPAK: An analysis, display, and product
generation package for meteorological data;
3) Local Data Manager: Software for capturing,
disseminating, and organizing data in near-real time; it is the
heart of the Internet Data Distribution (IDD) system;
4) NetCDF: A software interface for platform-independent access to self-describing datasets;
5) Integrated Data Viewer: Java-based, platform-independent data analysis and 3D visualization tools;
6) THREDDS: A project to facilitate remote access to
thematic, distributed, interdisciplinary data servers.
Unidata as a Diverse Community
More than 150 sites participate in the Unidata Internet Data Distribution (IDD) system
• 120 or so of those sites are in academia and the rest in government and research labs
User community is interdisciplinary: two-thirds of sites have users outside atmospheric sciences
Internet Data Distribution
• Approximately 2 GB of data injected/hour from distributed sources;
• Unidata IDD/LDM uses more of the Internet2 than any other advanced application;
• Approx. 5 terabytes of data transmitted each week. (Amount varies with weather)
[Diagram: model, satellite, and radar sources feed LDM nodes, which relay data to further LDM nodes across the Internet.]
By design, the system has no data center.
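The two volume figures on this slide can be related with a little arithmetic. The sketch below is only a back-of-the-envelope check; it assumes decimal units (1 TB = 1000 GB) and a constant injection rate, neither of which the slide states.

```python
# Back-of-the-envelope check of the IDD volume figures quoted on the slide.
# Assumptions (not from the slide): decimal units (1 TB = 1000 GB) and a
# constant injection rate over the whole week.

HOURS_PER_WEEK = 24 * 7
SECONDS_PER_WEEK = HOURS_PER_WEEK * 3600

injected_gb_per_hour = 2.0        # "approximately 2 GB of data injected/hour"
transmitted_tb_per_week = 5.0     # "approx. 5 terabytes ... each week"

injected_gb_per_week = injected_gb_per_hour * HOURS_PER_WEEK
transmitted_gb_per_week = transmitted_tb_per_week * 1000.0

# How many times, on average, each injected byte is relayed across the network.
relay_factor = transmitted_gb_per_week / injected_gb_per_week

# Average aggregate throughput implied by the weekly total, in megabits/second.
avg_mbps = transmitted_gb_per_week * 8000.0 / SECONDS_PER_WEEK

print(f"Injected per week:  {injected_gb_per_week:.0f} GB")
print(f"Relay factor:       {relay_factor:.1f}x")
print(f"Average throughput: {avg_mbps:.0f} Mbit/s")
```

Under these assumptions, about 336 GB is injected per week and each byte is relayed roughly 15 times, which is consistent with a relay network that has no central data center.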
Proposed WSR-88D Data Flow
(NWS Plans)
Education Drivers
(a.k.a. A Community-Articulated Need)
• Active, student-centered learning
• Earth-system science or “holistic” approach to education
• Learning science by doing science
  • Observations (data)
  • Tools (models, visualization)
  • Discovery
Science Drivers
Grand Challenges in Environmental Sciences
National Research Council
NSF Director Rita Colwell, 1998: "Interdisciplinary connections are absolutely
fundamental. They are synapses in this new capability to look over and
beyond the horizon. Interfaces of the sciences are where the excitement will
be the most intense... ."
Multidisciplinary Problems
Fire danger determination requires taking into account past, present and future weather, fuel types, and the state of both live and dead fuel moisture.
• Dead Fuel Moisture
• Live fuel moisture (NDVI)
• Drought conditions
• Atmospheric stability
• Lightning maps
• Lightning ignition efficiency
• Airflow
• Recent rainfall
• Rainfall forecast
Dual-Polarization Radar use in Fire
Weather Management
• The differential reflectivity (ZDR) values are noteworthy in the smoke signal.
• Many regions show ZDR > +6 dB.
• This suggests flattened ash particles (like corn flakes).
Source: CHILL Radar Group, CSU
Flooding due to Tropical Storms
Tropical Storm Allison
Research studies and emergency management of hurricane-induced
flooding involve integrating data from atmospheric sciences,
oceanography, hydrology, geology, geography, and social sciences.
Multidisciplinary Synthesis
• Requires integration of disparate datasets and databases from diverse sources that are distributed geographically and disciplinarily;
• Needs integration of Scientific Information Systems with Geographic Information Systems;
• The integration poses numerous challenges;
• However, such integration is critical to solving societal problems and advancing science;
• Metadata is crucial to achieving integration.
Remote Sensing & Data Explosion
• In the next 10 years, about 100 new satellite instruments will be launched to monitor the environment
• A five-orders-of-magnitude increase in satellite data is expected during that period
  • GIFTS (Geostationary Imaging Fourier Transform Spectrometer) will have about 1700 channels and a resolution of 4 km
  • Each NPOESS satellite will generate one terabyte of data each day
• Advances in radar technology
  • 28-fold increase in WSR-88D data volume in 5 years
  • Phased-array radars will generate a 100-fold increase in data
By 2004, NOAA will ingest more data in one year than was contained in the total archive in 1998.
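To compare the growth figures on this slide, it can help to convert them to equivalent annual growth factors. The sketch below assumes smooth exponential growth over each stated period, which is an illustrative assumption and not a claim made on the slide.

```python
# Convert the multi-year growth figures on the slide into equivalent
# compound annual growth factors. Assumption (not from the slide): growth
# is smooth and exponential over the stated period.

def annual_growth_factor(total_factor: float, years: float) -> float:
    """Return g such that g ** years == total_factor."""
    return total_factor ** (1.0 / years)

# Five orders of magnitude increase in satellite data over 10 years.
satellite = annual_growth_factor(1e5, 10)

# 28-fold increase in WSR-88D data volume over 5 years.
radar = annual_growth_factor(28, 5)

print(f"Satellite data: x{satellite:.2f} per year")
print(f"WSR-88D data:   x{radar:.2f} per year")
```

Under that assumption, satellite data volume would roughly triple every year, and WSR-88D volume would nearly double every year.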
Advances in Modeling
• Shift from a purely deterministic to a more probabilistic approach, requiring the use of ensemble modeling techniques.
• Growing emphasis on multidisciplinary studies, requiring coupled models:
  • e.g., Hurricane landfall flooding problem: Atmospheric model (WRF/MM5), Ocean model (ROMS), Hydrologic model (HMS)
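The shift from deterministic to probabilistic forecasting mentioned above can be illustrated with a toy ensemble. This is a minimal sketch only: the chaotic logistic map stands in for a forecast model, and the function names, perturbation size, and threshold are all hypothetical choices made for illustration.

```python
import random

def logistic_run(x0: float, steps: int, r: float = 3.9) -> float:
    """Iterate the chaotic logistic map, a stand-in for a forecast model."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x

def ensemble_probability(x0, steps, threshold, members=200, spread=1e-3, seed=0):
    """Fraction of perturbed-initial-condition runs ending above threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(members):
        # Perturb the initial state to represent analysis uncertainty,
        # keeping it strictly inside (0, 1) where the map is defined.
        perturbed = min(max(x0 + rng.gauss(0.0, spread), 1e-6), 1.0 - 1e-6)
        if logistic_run(perturbed, steps) > threshold:
            hits += 1
    return hits / members

# A single deterministic run gives one answer; the ensemble gives a probability.
deterministic = logistic_run(0.4, steps=50)
probability = ensemble_probability(0.4, steps=50, threshold=0.5)
print(f"Deterministic forecast: {deterministic:.3f}")
print(f"P(state > 0.5) from ensemble: {probability:.2f}")
```

Because tiny initial differences grow quickly in a chaotic system, the single run says little on its own, while the ensemble fraction expresses the forecast as a probability.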
Local Modeling: A Notable Trend
• Over 30 universities are now running mesoscale models locally.
• One can think of this aggregation as a national forecasting instrument.
• However, only one or two groups are initializing their model runs with local observations.
• As the scale of these local model runs becomes finer, there is a natural desire to integrate their output with information from other sources (e.g., hydrology, infrastructure, societal datasets in GIS form).
Iowa St. Linux Cluster
Technology Drivers
• Object-oriented programming
• Open Standards, Interoperability and Open Source Movement
  • Metcalfe's Law: the usefulness, or utility, of a network increases as the square of the number of users.
• Web services (HTTP, Java, XML, SOAP, UDDI, …)
• Digital libraries (Metadata, discovery, information services…)
• Grid environments and distributed computing
• Commodity microprocessors
• Cluster computing
• High bandwidth networks: 10GigE, Fast IP, …
• Broadband access
• Wireless networks: 802.11 networks, GPRS, 3G
• IPv6: Next-generation internet protocol
• Collaborative computing
• Scientific data mining and knowledge discovery
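Metcalfe's Law, cited in the list above, is commonly motivated by counting the possible pairwise connections among n participants, n(n-1)/2, which grows as the square of n. The sketch below simply evaluates that count; the helper name is hypothetical, and 150 is used because it matches the approximate number of IDD sites mentioned earlier in the talk.

```python
def pairwise_links(n: int) -> int:
    """Number of distinct participant-to-participant connections among n users."""
    return n * (n - 1) // 2

# Evaluate the count for a few network sizes.
for sites in (10, 50, 150):
    print(f"{sites:>4} sites -> {pairwise_links(sites):>6} possible links")
```

Going from 50 to 150 sites triples the membership but raises the number of possible connections about ninefold, from 1225 to 11175.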
Web Services and the Wild and Woolly
World of Markup Languages
[Diagram: Services connect Users, a Metadata repository, and Collections (data, tools, educational materials).]
• Web services is a technology and process for discovery and connection.
• The eXtensible Markup Language, XML, is accepted as THE emerging standard for data interchange on the Web.
• XML allows authors to create their own markup, which has led to the proliferation of “MyOwn Markup Language”.
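A short sketch shows how easily a new vocabulary can be invented, which is the point behind the “MyOwn Markup Language” remark. The element and attribute names below (datasetRecord, variable, and so on) are made up for illustration and do not belong to any real schema; only the Python standard library's xml.etree.ElementTree is used.

```python
import xml.etree.ElementTree as ET

# Build a record using made-up element names; nothing here is a real schema.
record = ET.Element("datasetRecord", attrib={"id": "example-001"})
ET.SubElement(record, "title").text = "Example radar reflectivity grid"
var = ET.SubElement(record, "variable",
                    attrib={"name": "reflectivity", "units": "dBZ"})
var.text = "Base reflectivity"

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# Any XML parser can read the record back, but the parser alone does not
# know what the tags mean; that still requires an agreed vocabulary.
parsed = ET.fromstring(xml_text)
for v in parsed.iter("variable"):
    print(v.get("name"), v.get("units"))
```

The syntax is interoperable, but the meaning of each tag is not, which is why shared metadata conventions matter as much as XML itself.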
Five-year Core Funding NSF Proposal
• Title: Unidata 2008: Shaping the Future of Data Use in the Geosciences
• We are moving from an era of data provision to one in which data- and related web-services are emphasized
• Six endeavors are proposed, focusing on Community and Support Services and Data Services, Systems, and Tools
• The proposed endeavors will enable the community to advance scientific exploration, education, and decision-making.
“The unanimous finding of the panel is that the Unidata Program Center program be supported as fully as possible by NSF for the years 2003-2008.”
Proposed Endeavors
• Endeavor 1. Responding to a broader and more diverse community.
  • Respond to increased emphasis on Earth-system science (e.g., bring new data sets to the community)
  • Establish new partnerships with related communities (e.g., with Hydrology via CUAHSI)
  • Support new tools in technically less-sophisticated institutions (e.g., community colleges)
• Endeavor 2. Comprehensive support services
  • Deploy web-based training modules
  • Simplify installation and maintenance for all supported packages
  • Explore new technologies (e.g., Access Grid) to facilitate remote collaboration
Endeavor 3: Real-time, self-managing
data flows
• More flexibility and control
  • Many more feed types for finer control over routing and subsetting
  • Configurable product priorities
• Self-managing data flows (automatic dynamic routing)
  • Application-level multicast looks promising for hundreds of sites (IP multicast not suitable due to limitations)
  • NLDM: data flooding via Usenet protocols may provide a practical routing solution (needs more testing)
• Support for new standards
  • Use of IP version 6 protocols
  • Internet2, Grid and e-services standards (authentication, resource use, ...)
  • Location-transparency for data
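The ideas of feed types for subsetting and configurable product priorities can be sketched with a small priority queue. This is an illustration only and not the LDM implementation or its configuration syntax: the feed-type names, priority values, request patterns, and product identifiers are all hypothetical.

```python
import heapq
import re
from itertools import count

# Hypothetical feed-type names and priorities (lower number = sent first).
PRIORITY = {"RADAR": 0, "SATELLITE": 1, "MODEL": 2}

# A downstream site's request: feed type -> regular expression on product IDs.
REQUEST = {"RADAR": re.compile(r"^KFTG"), "MODEL": re.compile(r".*")}

def wanted(feed, product_id):
    """True if the site requested this feed type and the product ID matches."""
    pattern = REQUEST.get(feed)
    return pattern is not None and pattern.search(product_id) is not None

def send_order(products):
    """Return requested product IDs in priority order, FIFO within a priority."""
    heap, seq = [], count()
    for feed, product_id in products:
        if wanted(feed, product_id):
            # The sequence number breaks ties so arrival order is preserved.
            heapq.heappush(heap, (PRIORITY[feed], next(seq), product_id))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

incoming = [
    ("MODEL", "eta_212_f006"),
    ("SATELLITE", "goes_ir_1200"),
    ("RADAR", "KFTG_N0R_1201"),
    ("RADAR", "KLOT_N0R_1201"),
    ("MODEL", "eta_212_f012"),
]
print(send_order(incoming))
# → ['KFTG_N0R_1201', 'eta_212_f006', 'eta_212_f012']
```

Finer-grained feed types make the request table more selective, and the priority table lets time-critical products jump ahead of bulk model output.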
LDM-5 vs. LDM-6 Latencies
CONDUIT Experience
Average delivery time: ~20 seconds to top-tier sites
Endeavor 4. Software to analyze and
visualize geoscience data
• Integrate diverse datasets
• Support analysis and visualization of local and climate modeling efforts
• Develop collaborative tools to make effective use of shared visualizations
• Allow customized user experiences
• Adapt to GIS frameworks
Cloud water isosurface from COMMAS storm model data
(courtesy Adam Houston and Dan Bramer, NCSA/UIUC)
[Diagram labels: People; Discovery and Publication Tools; Analysis and Visualization Tools; THREDDS Middleware; Discovery and Publication Services; Data Services; Documents; Data.]
THREDDS, GIS, DL Interoperability
[Diagram, reconstructed from its labels:]
• GIS Client Applications: OGC or proprietary GIS protocols
• THREDDS Client Applications: OGC or OPeNDAP, ADDE, FTP, … protocols
• OpenGIS Protocols: WMS, WFS, WCS
• GIS Servers: demographic, infrastructure, societal impacts, … datasets; metadata crosswalk
• THREDDS Servers: satellite, radar, forecast model output, … datasets; metadata crosswalk
• Open Archives Initiative (OAI) Metadata Harvesting
• Digital Library Discovery Systems
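A metadata crosswalk, as labeled in the diagram, is at its simplest a mapping from one community's field names to another's. The sketch below assumes hypothetical source field names and target names loosely modeled on Dublin Core elements (title, description, coverage, creator); a real crosswalk would follow the published schemas on both sides.

```python
# Hypothetical source field names mapped to target names loosely modeled on
# Dublin Core elements. Neither side is taken from a real server's schema.
CROSSWALK = {
    "title": "title",
    "abstract": "description",
    "bounding_box": "coverage",
    "originator": "creator",
}

def crosswalk(record):
    """Map source fields to target fields; report fields with no mapping."""
    mapped, unmapped = {}, []
    for field, value in record.items():
        target = CROSSWALK.get(field)
        if target is None:
            unmapped.append(field)   # keep track of what would be lost
        else:
            mapped[target] = value
    return mapped, unmapped

gis_record = {
    "title": "County flood zones",
    "abstract": "Flood-zone polygons for emergency planning",
    "bounding_box": [-96.0, 29.0, -94.5, 30.5],
    "projection": "EPSG:4326",
}
mapped, unmapped = crosswalk(gis_record)
print(mapped)
print("No mapping for:", unmapped)
```

Reporting the unmapped fields makes visible what a harvester would lose when records from one community are exposed to another's discovery system.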
Endeavor 6: Improved data access
infrastructure
NetCDF-HDF Integration
• Extend netCDF to high-performance computing environment
• Implement parallel I/O, large grids, etc.
• Work will directly benefit WRF and CCSM communities
[Diagram: Current implementation: Application, netCDF, POSIX I/O, File. Proposed implementation: Application, netCDF, HDF5 (serial and/or parallel), with HDF5 writing through POSIX I/O (file), split files (metadata and raw data), MPI-I/O (parallel file system), custom (user-defined device), or stream (network or to/from another application).]
The Visual Geophysical Exploration
Environment (VGEE)
• The VGEE is an integrated framework in which students use visualization tools, data, and curricular materials to learn basic physical principles of atmospheric science
• It includes:
  • A learner interface to the IDV
  • Java-based concept models to support physical insight
  • A curriculum to guide inquiry
  • A catalog of data (THREDDS)
VGEE: An Integrated Framework
Concept Models, which are used to
explore relations in an idealized
context.
Students notice that
the Western Pacific is
considerably warmer
than the East.
Identify
Relate
Explain
Integrate
Concluding Remarks
• We live in an exciting moment in the history of the Earth sciences.
• Workshops like this and the diversity of representation from academia are testimony to the vibrancy of the community and the program.
• The portfolio of tools and technologies within Unidata, coupled with the energies of a creative and collaborative community, puts us in an ideal position to meet the important challenges facing the education and research communities in the atmospheric and related sciences.