Presentation Title - Apache Software Foundation

A Look into the Apache OODT Ecosystem
Chris A. Mattmann
NASA JPL/Univ. Southern California/ASF
[email protected] November 9, 2011
And you are?
• Senior Computer Scientist at NASA JPL in Pasadena, CA, USA
• Software Architecture/Engineering Prof at Univ. of Southern California
• Apache Member involved in
– OODT (VP, PMC), Tika (VP, PMC), Nutch (PMC), Incubator (PMC), SIS (Mentor), Lucy (Mentor) and Gora (Champion), MRUnit (Mentor), Airavata (Mentor)
Welcome to the Apache in Space! (OODT) Track
Agenda
• Overview of OODT and its history
• How we got it to Apache
• How other projects can follow our model
• Existing successful deployments of OODT
• Pointers to papers, and more information including case studies
Lessons from 90’s era missions
• Increasing data volumes (exponential growth)
• Increasing complexity of instruments and algorithms
• Increasing availability of proxy/sim/ancillary data
• Increasing rate of technology refresh
… all of this while NASA Earth Mission funding was decreasing
A data system framework based on a standard architecture and
reusable software components for supporting all future missions.
Enter OODT
Object Oriented Data Technology http://oodt.apache.org
• Funded initially in 1998 by NASA’s Office of Space Science
• Envisaged as a national software framework for sharing data across heterogeneous, distributed data repositories
• OODT is both an architecture and a reference implementation providing
– Data Production
– Data Distribution
– Data Discovery
– Data Access
• OODT is Open Source and available from the Apache Software Foundation
Apache OODT
• Originally funded by NASA to focus on
– distributed science data system environments
– science data generation
– data capture, end-to-end
– distributed access to science data repositories by the community
• A set of building blocks/services to exploit common system patterns for reuse
• Supports deployment based on a rich information model
• Selected as a top level Apache Software Foundation project in January 2011
• Runner up for NASA Software of the Year
• Used for a number of science data system activities in planetary, earth, biomedicine, astrophysics
http://oodt.apache.org
Apache OODT Press
[Screenshot: InformationWeek article, http://www.informationweek.com/news/government/enterprise-app...]
NASA To Host Open Source Summit
The agency plans to bring experts together March 29-30 to discuss open source policy and how NASA can better support the community.
By Elizabeth Montalbano, InformationWeek
March 15, 2011 11:37 AM
NASA will continue its support of the open source community by hosting its first-ever summit around the technology.
Why Apache and OODT?
• OODT is meant to be a set of tools to help build data systems
– It’s not meant to be “turn key”
– It attempts to exploit the boundary between bringing in capability vs. being overly rigid in science
– Each discipline/project extends
• Apache is the elite open source community for software developers
– Fewer than 100 projects have been promoted to top level (Apache Web Server, Tomcat, Solr, Hadoop)
– Differs from other open source communities; it provides a governance and management structure
Governance Model+NASA=♥
• NASA and other government agencies have tons of process
– They like that
Publicly accessible and searchable archives
• http://svnsearch.org/svnsearch/repos/ASF/search?path=%2Foodt
• http://mailarchives.apache.org/mod_mbox/oodt-dev/
• http://mailarchives.apache.org/mod_mbox/oodt-user/
• 100+ mailing list subscriptions
Great Metrics and Insight
• http://www.ohloh.net/p/oodt
Movement to the ASF
• Meeting held June 15, 2007 at JPL with ASF President Justin Erenkrantz
– Develop plan moving forward to bring first NASA project to Apache
– Discuss obstacles, sponsorship
– Discuss outlook
2007: original goals
• Come up with incubation proposal
– Chris Mattmann was one of the principal contributors to the proposal for the Tika project, and to other Incubation activities (Apache SIS)
– Send out emails to the Incubator mailing list
• Look for mentors
• Get sponsorship from ranking Apache PMC member or board member
– Justin and others
• Top-level project versus sub project outlook heading out of incubation
OODT Incubator Planning
• Monthly Updates (for first 3 months, then quarterly)
– Status
– Progress
– Community
– Acceptance
• Plan for exiting incubation
– How to have a solid user base
– How to operate as a unit in the Apache way
– Maintenance of user interest and community going forward
OODT’s next steps circa 2007
• JPL to tackle legal issues
– Is OODT releasable as an Apache product?
– http://www.apache.org/licenses/software-grant.txt
• This needs to be signed by the powers that be at JPL
– Contributor License Agreement
• Do we need a corporate one?
• In parallel to this
– Draft OODT incubation proposal
– Start identifying who would initially be interested
• The more external, non-JPL people who are interested, the better
• Justin to get slides from other incubator people
…2 years later
• Worked it out with JPL legal
– Turns out the ALv2 license is extremely friendly and is something that JPL (note not all of NASA) was amenable to
• Developed OODT incubator proposal
– http://wiki.apache.org/incubator/OODTProposal
• Found willing Apache mentors besides Justin
– Jean-Frederic Clere, Ross Gardler, Ian Holsman
• …Put OODT at Apache!
Apache OODT Community
• Includes PMC members from
– NASA JPL, Univ. of Southern California, Google, Children’s Hospital Los Angeles (CHLA), Vdio, South African SKA Project
• Projects that are deploying it operationally at
– Decadal-survey recommended NASA Earth science missions, NIH, and NCI, CHLA, USC, South African SKA project
• Use in the classroom
– My graduate-level software architecture and search engines courses
OODT Framework
[Figure: OBJECT ORIENTED DATA TECHNOLOGY FRAMEWORK. Components shown: OODT/Science Web Tools, Archive Client, Navigation Service, Catalog & Archive Service, Profile Service, Product Service, Query Service, Bridge to External Services, Other Service 1, Other Service 2, Profile XML Data, Data System 1, Data System 2.]
[Callouts on the figure: “Andrew Hart and Emily Law will talk about these later”; “You’ll hear about this later today”; “I’ll tell you about these now”.]
Architectural Principles
• Division of Labor
– Don’t make one component the workhorse!
• Technology Independence
– Don’t get bitten in the rear when a software vendor decides to charge you a lot of $$$ for their previously low cost technology
• Metadata as a first-class citizen
– Descriptions of resources come in handy
• Separation of software and data models
– Allow each to evolve independently
OODT Architecture
• Reference Architecture
– Four pairs of component types
• Product Client/Server, Profile Client/Server, Query Client/Server, Catalog and Archive Client/Server
– Two connector types
• Messaging layer discussed in http://sunset.usc.edu/~mattmann/pubs/ICSE.pdf
• Handler connector (discussed in this presentation)
• Instantiated for different domains using these fundamental building blocks
Product Client and Server
-Deliver data from underlying data store
-Accept uniform query structure that identifies 0 or more “products” (data items) to retrieve
-Many-to-Many
[Figure: Product Clients (A), (B), (C) connected to Product Servers (A), (B), (C); back-end stores labeled Web site, RAID, MSSQL, Disk.]
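To make the many-to-many idea concrete, here is a minimal Python sketch. It is not OODT code: the class names (Product, ProductServer, ProductClient) and the in-memory stores are assumptions made up for illustration. It only shows one client sending the same query structure to several servers and getting back zero or more products.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Product:
    """A data item returned by a product server (hypothetical shape)."""
    product_id: str
    payload: bytes


class ProductServer:
    """Fronts one underlying data store and answers uniform queries."""

    def __init__(self, name: str, store: Dict[str, bytes]):
        self.name = name
        self._store = store  # stand-in for a disk, database, or web site

    def query(self, predicate: Callable[[str], bool]) -> List[Product]:
        # The same query structure works for every server, whatever the store.
        return [Product(pid, data)
                for pid, data in sorted(self._store.items())
                if predicate(pid)]


class ProductClient:
    """Sends one query to many servers (the many-to-many relationship)."""

    def __init__(self, servers: List[ProductServer]):
        self._servers = servers

    def retrieve(self, predicate: Callable[[str], bool]) -> List[Product]:
        results: List[Product] = []
        for server in self._servers:
            results.extend(server.query(predicate))
        return results  # may be empty: a query identifies 0 or more products


disk = ProductServer("disk", {"sst-001": b"...", "sst-002": b"..."})
database = ProductServer("database", {"wind-001": b"..."})
client = ProductClient([disk, database])
sst_products = client.retrieve(lambda pid: pid.startswith("sst-"))
```

The client never learns what kind of store sits behind each server; that is the point of the uniform query structure.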
How about an example of a product?
Profile Client and Server
-Deliver metadata from underlying metadata store
-Metadata gives user enough information about where to find actual data
-Housekeeping information
-Resource information
-Domain-specific information
-Many-to-Many
[Figure: Profile Clients (A), (B), (C) connected to Profile Servers (A), (B), (C); back-end stores labeled Web site, MSSQL, Oracle.]
How about an example of a profile?
Attributes
Relationships
Query Client and Server
-Query Server seeded with initial set of pointers to Profile Servers
-Profile Servers point to actual resources (Product Servers, even other Profile Servers)
-Interactive (metadata returned) and non-interactive (data returned)
-Many-to-Many
[Figure: Query Clients (A), (B), (C) connected to Query Servers (A), (B), (C), which reach Profile Servers (A) and (B) and Product Servers (A) and (B); labels “Initial set” and “Discovered” mark the Profile Servers.]
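The discovery step can be sketched as a simple graph walk. This is an illustration only, not the OODT Query Server: the server names and the PROFILE_SERVERS table are hypothetical. Starting from a seed set of profile servers, the walk follows pointers to other profile servers and to product servers, and guards against cycles.

```python
from typing import Dict, List, Set

# Hypothetical registry: each profile server name maps to the resources
# its profiles point at (other profile servers or product servers).
PROFILE_SERVERS: Dict[str, List[str]] = {
    "profile-A": ["profile-B", "product-A"],
    "profile-B": ["product-B", "profile-A"],  # cycles are possible
}


def discover(seeds: List[str]) -> Dict[str, List[str]]:
    """Walk from the seed profile servers to everything reachable."""
    seen: Set[str] = set()
    profile_servers: List[str] = []
    product_servers: List[str] = []
    pending = list(seeds)
    while pending:
        name = pending.pop(0)
        if name in seen:
            continue  # already visited; guards against cycles
        seen.add(name)
        if name in PROFILE_SERVERS:
            profile_servers.append(name)
            pending.extend(PROFILE_SERVERS[name])
        else:
            # Anything that is not a known profile server is treated
            # as a product server in this sketch.
            product_servers.append(name)
    return {"profile": profile_servers, "product": product_servers}


found = discover(["profile-A"])
```

Seeding with only profile-A still reaches profile-B and both product servers, which mirrors the “initial set” versus “discovered” distinction on the slide.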
Catalog and Archive Client and Server (CAS)
-Ingest data into repository and metadata into registry
-Serve back Repository data with Product Server
-Serve back Registry metadata with Profile Server
-Run processing algorithms on data/metadata upon ingestion
-Workflow support
-Many-to-Many
[Figure: Archive Clients (A), (B), (C) connected to Archive Servers (A) and (B), each with a Repository and a Registry, served back through Product Server (A) and Profile Server (A).]
Some notes about CAS
• All Core components implemented as web services
– XML-RPC used to communicate between components
– Servers implemented in Java
– Clients implemented in Java, scripts, Python, PHP and web-apps
– Service configuration implemented in ASCII and XML files
Credit: D. Woollard
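For readers who have not used XML-RPC, here is a small self-contained Python sketch of the communication style, using only the standard library. It does not call any real CAS service: the ingest and get_metadata methods, the in-memory registry, and the sample metadata are all hypothetical stand-ins.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# In-memory stand-in for a metadata registry (hypothetical).
registry = {}


def ingest(product_id, metadata):
    """Record metadata for a product; returns True on success."""
    registry[product_id] = metadata
    return True


def get_metadata(product_id):
    """Return stored metadata, or an empty struct if unknown."""
    return registry.get(product_id, {})


# Port 0 asks the OS for any free port; logRequests=False keeps output quiet.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(ingest)
server.register_function(get_metadata)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any XML-RPC client can now talk to the service, whatever its language.
client = ServerProxy(f"http://127.0.0.1:{port}/")
ingested = client.ingest("granule-001", {"instrument": "example", "year": 2011})
metadata = client.get_metadata("granule-001")

server.shutdown()
server.server_close()
```

Because the wire format is plain XML over HTTP, the same calls could come from a Java, PHP, or script client, which is why the slide lists clients in several languages.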
Handler Connectors
-Encapsulate (meta-)data coordination and communication
-Allow for dynamic addition and removal of different classes of back end metadata and data stores
[Figure: a Product/Profile Server with a DBMS Product Handler, a Flat File Product Handler, and a Web Site Product Handler; back-end stores labeled MSSQL, Web site, RAID, Disk.]
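A rough Python sketch of the handler idea follows. None of these classes come from OODT: ProductHandler, FlatFileHandler, DatabaseHandler, and Server are hypothetical names, and the stores are plain in-memory objects. The sketch only illustrates that handlers for different back ends can be added and removed while the server's query interface stays the same.

```python
from typing import Dict, List, Protocol


class ProductHandler(Protocol):
    """Anything that can answer a query against one kind of back end."""

    def handle(self, query: str) -> List[str]: ...


class FlatFileHandler:
    def __init__(self, filenames: List[str]):
        self._filenames = filenames

    def handle(self, query: str) -> List[str]:
        return [name for name in self._filenames if query in name]


class DatabaseHandler:
    def __init__(self, rows: Dict[str, str]):
        self._rows = rows  # stand-in for a real DBMS connection

    def handle(self, query: str) -> List[str]:
        return [key for key, value in self._rows.items() if query in value]


class Server:
    """A product/profile server that delegates to pluggable handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, ProductHandler] = {}

    def add_handler(self, name: str, handler: ProductHandler) -> None:
        self._handlers[name] = handler

    def remove_handler(self, name: str) -> None:
        self._handlers.pop(name, None)

    def query(self, query: str) -> List[str]:
        results: List[str] = []
        for handler in self._handlers.values():
            results.extend(handler.handle(query))
        return results


server = Server()
server.add_handler("files", FlatFileHandler(["ocean_2010.nc", "land_2010.nc"]))
server.add_handler("db", DatabaseHandler({"row-7": "ocean salinity"}))
before = server.query("ocean")
server.remove_handler("db")
after = server.query("ocean")
```

Callers of query() are unaffected when the database handler is removed; they simply see fewer results.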
Example handler connectors
• XMLPS
– http://oodt.apache.org/components/maven/xmlps/
– XML config file specifies recipe for extracting
records from an RDBMS and turning them into a
NoSQL repository
• OPeNDAPps
– XML configurable profile server to unlock
OPeNDAP datasets and pass them to OODT
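The “XML config as extraction recipe” idea can be illustrated with a toy Python sketch. The mapping format below is invented for this example and is not the real XMLPS configuration schema; the table, column, and field names are likewise hypothetical. It reads an XML mapping, runs the corresponding SELECT against an in-memory SQLite database, and returns records keyed by the configured field names.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping format, invented for this sketch; it is NOT the
# real XMLPS configuration schema.
MAPPING_XML = """
<mapping table="specimens">
  <field name="SpecimenId" column="id"/>
  <field name="CollectionSite" column="site"/>
</mapping>
"""


def load_mapping(xml_text):
    """Parse the config into (table, [(output_name, column), ...])."""
    root = ET.fromstring(xml_text)
    fields = [(f.get("name"), f.get("column")) for f in root.findall("field")]
    return root.get("table"), fields


def extract_records(conn, table, fields):
    """Run the SELECT described by the mapping and rename the columns."""
    # Identifiers come from the trusted config file, not from user input.
    columns = ", ".join(column for _, column in fields)
    rows = conn.execute(f"SELECT {columns} FROM {table} ORDER BY id")
    return [{name: value for (name, _), value in zip(fields, row)}
            for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimens (id INTEGER, site TEXT)")
conn.executemany("INSERT INTO specimens VALUES (?, ?)",
                 [(1, "site-a"), (2, "site-b")])
table, fields = load_mapping(MAPPING_XML)
records = extract_records(conn, table, fields)
conn.close()
```

Changing which columns are exposed, or what they are called, then means editing the XML file rather than the code.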
So, how do you piece them together: NASA VODC
• NASA’s Virtual Oceanographic Data Center (VODC)
• http://vodc.jpl.nasa.gov
• Information integration using OODT components
[Figure: VODC architecture. Components shown: VODC Portal and Search Interface (free-text, facet and forms search; download of datasets and granules as zips); VODC Search Layer with VODC Free Text Search (lucene index), VODC Solr Layer (facet-based search), and VODC Query Server (forms-based search); Ocean Metadata Crawler; NOAA NODC Gateway with NOAA NODC Product Server and NOAA NODC Profile Server; GHRSST Gateway with GHRSST Product Server and GHRSST Profile Server. Edge labels: “product id or URI”, “dataset or granule”, “dataset or granule metadata”.]
VODC core data access, metadata discovery: Profile, Product, Query; also uses Apache Solr and Plone
VODC Profile Servers are driven by the Core VODC data model, a unified set of oceans information
So, how do you piece them together: JPL’s CDX
• CDX = Climate Data Exchange
• Provide comparison of remote sensing data and model outputs
• Existing systems remain in place; services expose data and functions over the network; support the era of the IPCC 5th assessment and distributed petabytes of data
Who’s doing what?
• Children’s Hospital Los Angeles
– Improving upon XMLPS, and CAS (Andrew Hart + Ricky Nguyen will talk about this)
– Supporting data analytics
• Google
– Brian Foster working on command line improvements and data protocol push/pull
• SKA South Africa
– Deploying file manager and crawler for use in KAT-7 pipeline ingestion
• NIH/NCI
– Maintaining the XMLPS components, and CAS components
– Helping with user interfaces
• Various JPL and NASA research projects
– OPeNDAPps, XMLPS
• Various NASA missions
– Workflow, PCS, services, OPSui, other web apps
Latest release: 0.3
• First appearance of PCS
– Core, Services (JAX-RS)
• Web Applications
– Balance (PHP), and Wicket (Java)-based apps for file management and workflow monitoring
• First release deployed to Maven Central
– We did backport 0.2 there after this
– Over 60 issues fixed in JIRA
• June 2011: recommended stable release
Working on: 0.4
• Operator Interface (OODT-157)
– Andrew Hart and I will talk about this
• Workflow2 integration (OODT-215) and all of its sub-issues
– Global workflow conditions, dynamic workflows, parallel/sequential model, new workflow engine, etc.
• OODT RADIX for super easy deployment (OODT-120)
– Paul Ramirez and Cameron Goodale will discuss this
• Solr sync with File Manager (OODT-326)
• Improvements to XMLPS (OODT-333) and new crawler actions (OODT-33, OODT-34, OODT-35, OODT-36, OODT-37)
• Over 48 issues currently resolved
• Likely to come before end of Q4 2011
Using Apache OODT as a testbed for software process
• Missions maintain their own local CMs
• Local mission CMs contain forks of existing OSS software
– Forks can be patch based or CM based
• Changes found particularly effective are discussed within the community and eventually brought before a CCB that reviews their generality, etc.
Credit: D. Freeborn
Some Grand Challenges I’m interested in: OODT can help!
• How do we handle 700 TB/sec of data coming off the wire when we actually have to keep it around?
– Required by the Square Kilometre Array
• Joe scientist says I’ve got an IDL or Matlab algorithm that I will not change and I need to run it on 10 years of data from the Colorado River Basin and store and disseminate the output products
– Required by the Western Snow Hydrology project
Some Grand Challenges I’m interested in: OODT can help!
• How do we compare petabytes of climate model output data in a variety of formats (HDF, NetCDF, Grib, etc.) with petabytes of remote sensing data to improve climate models for the next IPCC assessment?
– Required by the 5th IPCC assessment and the Earth System Grid and NASA
• How do we catalog all of NASA’s current planetary science data?
Key Takeaway
OODT is already doing and/or preparing the world to handle all of these diverse use cases!
It’s a constantly evolving and improving framework – join up and help.
It’s free and open source from Apache and helping government demonstrate the public good
OODT Project Contact Info
• Learn more and track our progress at:
– http://oodt.apache.org
– WIKI: https://cwiki.apache.org/OODT/
– JIRA: https://issues.apache.org/jira/browse/OODT
• Join the mailing list:
– [email protected]
• Chat on IRC:
– #oodt on irc.freenode.net
• Acknowledgements
– Key Members of the OODT teams: Chris Mattmann, Daniel J. Crichton, Steve Hughes, Andrew Hart, Sean Kelly, Sean Hardman, Paul Ramirez, David Woollard, Brian Foster, Dana Freeborn, Emily Law, Mike Cayanan, Luca Cinquini, Heather Kincaid
– Projects, Sponsors, Collaborators: Planetary Data System, Early Detection Research Network, Climate Data Exchange, Virtual Pediatric Intensive Care Unit, NASA SMAP Mission, NASA OCO-2 Mission, NASA NPP Sounder PEATE, NASA ACOS Mission, Earth System Grid Federation
Alright, I’ll shut up now
• Any questions?
• THANK YOU!
– [email protected]
– @chrismattmann on Twitter