16:00-OODT-Solr

Having Your Cake and
Eating It Too
With Apache OODT and Apache Solr
Andrew F. Hart
Paul M. Ramirez
About Myself…
• Software Engineer
– NASA Jet Propulsion Laboratory
– “Data Management”
• Committer:
– OODT, SIS, Gora, Streams (Incubating)
• Mentor: Streams (Incubating)
What We’ll Cover
• Overview of OODT & Solr Projects
• Strategies for Combining OODT and
Solr
• Detailed Deployment/Config. Example
• Where to Learn More & Participate
Apache OODT
• Object Oriented Data Technology
• Origin in NASA mission data systems
• Components for
– Information integration
– Data cataloging and archiving
– Configurable workflow processing
Apache OODT
• OODT @ Apache
– Incubation: 2010, Graduation: 2011
– 29 Committers
– Latest Release: 0.5 (Dec. 26, 2012)
Apache OODT
• Karoo Array Telescope (KAT-7)
Apache OODT
• Virtual Pediatric Intensive Care Unit
Apache OODT
• Regional Climate Model Evaluation
System
Apache OODT
• Commonalities between systems
– Lots of data
– Defined processing steps / algorithms
• Archives important (… search
important)
Apache OODT
• Strengths of OODT for the above use
cases
– Loosely coupled components
– Standard protocols, well-defined interfaces
– Highly configurable
– Vetted, reliable code
Apache Solr
• Search + Web Services
– Powerful features
– Flexible formats
– Highly configurable
Apache Solr
• The White House
Apache Solr
• Netflix
Apache Solr
• NASA Planetary Data System
OODT & Solr
• Why use these projects together?
• Archives often need search capability
• Similarities / Compatibilities
– XML-based configuration
– Environment (Java, Tomcat)
Example Integration
“Standard” Data Archive Pipeline
Example Integration
“Standard” Data Archive Pipeline + Search
OODT Products
• Typically 1-1 with Files
• Each uniquely identifiable (GUID)
• Support for higher-level “ProductType”
– A way to define collections
OODT Metadata
• Annotations for products
• Key:{Val|Multival}
• Common across all OODT components
• Two general classes:
– System
– User
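To picture the Key:{Val|Multival} shape, here is a minimal Java sketch. The class SketchMetadata and its method names are hypothetical stand-ins made up for illustration, not the OODT metadata API, and the sample keys and values are invented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a product's metadata; NOT the OODT Metadata class.
class SketchMetadata {
    // Each key holds one or more string values, in insertion order.
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Append a value under a key; repeated calls make the key multi-valued.
    void add(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // First value for a key, or null if the key is absent.
    String first(String key) {
        List<String> v = values.get(key);
        return (v == null || v.isEmpty()) ? null : v.get(0);
    }

    // All values for a key (empty list if the key is absent).
    List<String> all(String key) {
        List<String> v = values.get(key);
        return v == null ? Collections.<String>emptyList() : v;
    }

    public static void main(String[] args) {
        SketchMetadata md = new SketchMetadata();
        md.add("ProductType", "urn:some:ProductType"); // single-valued key
        md.add("Keyword", "climate");                   // multi-valued key
        md.add("Keyword", "model");
        System.out.println(md.first("ProductType"));
        System.out.println(md.all("Keyword"));
    }
}
```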
OODT Metadata
• System Metadata
– Added automatically by OODT Components
– Used to track state
– Used to encode relationships between data
OODT Metadata
• User Metadata
– Specified as “policy”
– Can be product-level or ProductType-level
– Used to extract & persist information from
files as they are ingested (become
products)
OODT Metadata
• Metadata (Policy) Example
(external)
Solr Schema
• XML document
• Define what will be indexed (“Fields”)
• Provide high-level context hints
– Data type, behavior, pre-processing
• Extremely flexible, extensible
Solr Schema
• Solr Schema Example
(external)
Making the Connection
• SolrIndexer Tool
– Part of the File Manager component tools
– Map OODT Metadata to Solr Fields
– Create Solr documents from OODT
products
– Note: only talking about metadata
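The mapping step can be sketched roughly as follows. This is an illustration of the idea only, not the SolrIndexer source: the class MetadataFieldMapper, the metadata keys, and the Solr field names are all assumptions, and plain maps stand in for both the OODT metadata and the Solr document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: renames product metadata keys to Solr field names.
class MetadataFieldMapper {
    // Copy values of mapped keys into a field -> values map; unmapped keys are dropped.
    static Map<String, List<String>> toSolrFields(
            Map<String, List<String>> metadata, Map<String, String> keyToField) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> m : keyToField.entrySet()) {
            List<String> vals = metadata.get(m.getKey());
            if (vals != null && !vals.isEmpty()) {
                // Two keys mapped to the same field are merged into one multi-valued field.
                doc.computeIfAbsent(m.getValue(), f -> new ArrayList<>()).addAll(vals);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("ProductName", Arrays.asList("file1.nc"));     // assumed key
        metadata.put("Keyword", Arrays.asList("climate", "model")); // assumed key
        metadata.put("InternalState", Arrays.asList("RECEIVED"));   // not mapped, so skipped

        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("ProductName", "product_name"); // assumed Solr field names
        mapping.put("Keyword", "keyword");

        System.out.println(toSolrFields(metadata, mapping));
    }
}
```

Only metadata is copied; the product files themselves never enter the index, which matches the note above.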
SolrIndexer Tool
• Package: org.apache.oodt.cas.filemgr.tools
• Available since the 0.4 release
• Recommended: use 0.5+, which added some
stability improvements
• Several modes of operation
SolrIndexer Tool
SolrIndexerTool
• Invocation Examples: Ingest all
products from the specified File
Manager instance
java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
-Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
org.apache.oodt.cas.filemgr.tools.SolrIndexer \
--all \
--fmUrl http://localhost:9000 \
--solrUrl http://localhost:8080/solr
SolrIndexerTool
• Invocation Examples: Ingest all
products from the specified
ProductType(s)
java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
-Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
org.apache.oodt.cas.filemgr.tools.SolrIndexer \
--types urn:some:ProductType \
--fmUrl http://localhost:9000 \
--solrUrl http://localhost:8080/solr
SolrIndexerTool
• Invocation Examples: Ingest a single
product by its unique product id
java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
-Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
org.apache.oodt.cas.filemgr.tools.SolrIndexer \
--product 19bcb4b8-7999-11e1-b581-8b771498975d \
[--delete] \
--fmUrl http://localhost:9000 \
--solrUrl http://localhost:8080/solr
SolrIndexerTool
• Invocation Examples: Force
optimization of the Solr index
java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
-Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
org.apache.oodt.cas.filemgr.tools.SolrIndexer \
--optimize \
--solrUrl http://localhost:8080/solr
Indexer.properties
• Configuration file for the SolrIndexer
• Specify mapping between OODT
product metadata and Solr fields
• Additional “pre-processing” features
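Since the example file is external to these slides, here is a hedged Java sketch of how a mapping section of such a properties file could be read. The "map." key prefix and the key and field names are assumptions made for illustration; consult the indexer.properties shipped with the File Manager for the actual key names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical reader for metadata-to-field mappings in a properties file.
class IndexerMappingSketch {
    // ASSUMPTION: mapping entries look like "map.<OODT key>=<Solr field>".
    static final String PREFIX = "map.";

    // Parse properties text and return OODT key -> Solr field, sorted by key.
    static Map<String, String> loadMappings(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> mappings = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(PREFIX)) {
                mappings.put(name.substring(PREFIX.length()),
                             props.getProperty(name).trim());
            }
        }
        return mappings;
    }

    public static void main(String[] args) {
        String text = "# illustrative content only\n"
                    + "map.ProductName=product_name\n"
                    + "map.Keyword=keyword\n"
                    + "other.setting=ignored\n";
        System.out.println(loadMappings(text));
    }
}
```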
Indexer.properties
• Example Indexer.properties file
(external)
Use Case I
• Building a searchable data archive
• “Long-term” / “Lights-out” archive
• Products & metadata immutable
• Many NASA mission data systems use
this model
• Want to make it easily searchable
Use Case I
“Standard” Data Archive Pipeline + Search
Use Case II
• Building an interactively editable,
searchable data archive
• Data and metadata mutable
• Want to dynamically select product(s)
to edit based on metadata
Use Case II
Interactively Editable Data Archive Pipeline + Search
Use Case II
Interactively Editable Data Archive Pipeline + Search
Solr catalog out of sync!
Synchronization
• Two ways (at least) to solve this:
A. Modify the OODT Curator Services
B. Treat OODT Curator Services as “black
box” and write “wrapper” service to
invoke Curator Services AND update Solr
(via scripted call to SolrIndexer, for
example)
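A scripted call to SolrIndexer for a single edited product could be assembled along these lines. The options mirror the single-product invocation shown earlier; the class SolrIndexerCommand and the sample paths and URLs are placeholders, and the sketch only builds and prints the command rather than running it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that builds the SolrIndexer command line for one product.
class SolrIndexerCommand {
    static List<String> forProduct(String configPath, String libDir,
                                   String productId, String fmUrl, String solrUrl) {
        return Arrays.asList(
            "java",
            "-DSOLR_INDEXER_CONFIG=" + configPath,
            "-Djava.ext.dirs=" + libDir,
            "org.apache.oodt.cas.filemgr.tools.SolrIndexer",
            "--product", productId,
            "--fmUrl", fmUrl,
            "--solrUrl", solrUrl);
    }

    public static void main(String[] args) {
        List<String> cmd = forProduct(
            "/path/to/indexer.properties",
            "/path/to/cas/filemgr/lib/",
            "19bcb4b8-7999-11e1-b581-8b771498975d",
            "http://localhost:9000",
            "http://localhost:8080/solr");
        ProcessBuilder pb = new ProcessBuilder(cmd).inheritIO();
        System.out.println(String.join(" ", pb.command()));
        // To actually re-index the product after a Curator update:
        // int exitCode = pb.start().waitFor();
    }
}
```

Passing the arguments as a list avoids shell quoting problems when product ids or paths contain unusual characters.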
Modify Curator Services
• Services implemented in JAX-RS
• /curator/src/main/java/org/apache/oodt/cas/curation/service
• [curator_url]/services/metadata/update
• Options:
– Utilize Solr Java API
– Wrap call to OODT SolrIndexer tool
Use Case II-A
Modified Curator Services to Simultaneously update Solr
Example
• Interactive event
tagging
Wrap Curator Services
• Curator Service/API is “black box”
• Develop custom service that:
– Issues POST request to Curator service
– Updates Solr index via, e.g.:
• Utilize Solr Java API
• Wrap call to OODT SolrIndexer tool
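A rough sketch of such a wrapper, using the JDK 11+ HTTP client rather than the Solr Java API, might look like this. Several things here are assumptions: the form parameter names sent to the Curator service are placeholders (its real contract is not shown in these slides), the Solr unique key is assumed to be named "id", and the Solr side uses a JSON atomic "set" update posted to the update handler, which requires a Solr version and schema that support atomic updates.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper: update the Curator service first, then the Solr index.
class CuratorSolrWrapper {
    // POST to [curator_url]/services/metadata/update.
    // ASSUMPTION: the form parameter names below are placeholders.
    static HttpRequest curatorUpdate(String curatorUrl, String productId,
                                     String key, String value) {
        String form = "id=" + enc(productId) + "&" + enc(key) + "=" + enc(value);
        return HttpRequest.newBuilder(URI.create(curatorUrl + "/services/metadata/update"))
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString(form))
            .build();
    }

    // JSON body for a Solr atomic update that sets one field on one document.
    // ASSUMPTION: the unique key field is named "id".
    static String solrAtomicUpdateJson(String productId, String field, String value) {
        return "[{\"id\":\"" + esc(productId) + "\",\""
             + esc(field) + "\":{\"set\":\"" + esc(value) + "\"}}]";
    }

    // POST the JSON to Solr's update handler and commit immediately.
    static HttpRequest solrUpdate(String solrUrl, String json) {
        return HttpRequest.newBuilder(URI.create(solrUrl + "/update?commit=true"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();
    }

    // Send both requests in order; Solr is left untouched if the Curator update fails.
    static boolean updateBoth(HttpClient client, HttpRequest curatorReq, HttpRequest solrReq)
            throws IOException, InterruptedException {
        HttpResponse<String> c = client.send(curatorReq, HttpResponse.BodyHandlers.ofString());
        if (c.statusCode() / 100 != 2) {
            return false;
        }
        HttpResponse<String> s = client.send(solrReq, HttpResponse.BodyHandlers.ofString());
        return s.statusCode() / 100 == 2;
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Minimal JSON string escaping (backslash and double quote only).
    private static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        HttpRequest c = curatorUpdate("http://localhost:8080/curator", "abc-123", "EventTag", "storm");
        HttpRequest s = solrUpdate("http://localhost:8080/solr",
                                   solrAtomicUpdateJson("abc-123", "event_tag", "storm"));
        System.out.println(c.method() + " " + c.uri());
        System.out.println(s.method() + " " + s.uri());
        // To send: updateBoth(HttpClient.newHttpClient(), c, s);
    }
}
```

Updating the archive first and the index second means a failure can leave Solr stale but never ahead of the File Manager catalog; a later re-index of the product restores consistency.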
Use Case II-B
Wrapping OODT Curation Services with Custom UI & Services
Example
Lessons
• Solr complements OODT File Manager
• RESTful interfaces (Solr + OODT
Curator) allow for great flexibility in
designing services and UI
• “Best” approach depends on situation
Next Steps
• Develop “SolrCatalog” for OODT File
Manager?
– Pros: Reduction in “moving parts”
– Cons: Restrictive?
• Implement Use Case II-A as optional
mode for Curator web service layer
Learning More
• Solr
– http://lucene.apache.org/solr
• [email protected]
• OODT
– http://oodt.apache.org
• https://cwiki.apache.org/confluence/display/OODT/Home
• [email protected]
Thanks!
• Questions?