Development of Services in the Fedora Service Framework
by Gert Schmeltz Pedersen [email protected]
Danmarks Tekniske Universitet / Technical University of Denmark
Danmarks Tekniske Videncenter / Technical Knowledge Center of Denmark
DORSDL Workshop, 21 September 2006
Development of Services in the Fedora Service Framework
• Contents
– The Fedora Service Framework
– The Fedora Generic Search Service
– Considerations about a Peer-to-Peer Service for Fedora
– Conclusion
The Fedora Service Framework
Flexible Extensible Digital Object Repository Architecture
• Powerful digital object model
• Extensible metadata management
• Expressive inter-object relationships
• Services are stand-alone web applications that run independently of the Fedora repository
• Two main benefits to the service framework approach:
– allows new functionality to be added as atomic, modular services that can interact with Fedora repositories, yet not be part of the repository,
– makes co-development of new services for Fedora easier, since each service can be independently developed and plugged into the framework.
The Fedora Service Framework
• Fedora Object XML (FOXML) is a simple XML format that directly expresses the Fedora digital object model
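To make the FOXML idea concrete, here is a minimal Python sketch that reads a hand-written, heavily simplified FOXML-like record and lists its datastreams. The PID `demo:1`, the label, and the datastream content are hypothetical, and the record omits most of what a real FOXML object carries; it is meant only to show how the object model maps onto XML elements.

```python
import xml.etree.ElementTree as ET

# FOXML namespace; the record below is a simplified illustration, not a complete object.
FOXML_NS = {"foxml": "info:fedora/fedora-system:def/foxml#"}

# Hypothetical sample record (PID, label, and content are made up for this sketch).
SAMPLE_FOXML = """\
<foxml:digitalObject xmlns:foxml="info:fedora/fedora-system:def/foxml#" PID="demo:1">
  <foxml:objectProperties>
    <foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE="Sample object"/>
  </foxml:objectProperties>
  <foxml:datastream ID="DC" CONTROL_GROUP="X" STATE="A">
    <foxml:datastreamVersion ID="DC.0" MIMETYPE="text/xml" LABEL="Dublin Core">
      <foxml:xmlContent>
        <title>Information retrieval</title>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
  </foxml:datastream>
</foxml:digitalObject>
"""


def summarize(foxml_text):
    """Return the object's PID and, per datastream, its ID and version MIME types."""
    root = ET.fromstring(foxml_text)
    datastreams = []
    for ds in root.findall("foxml:datastream", FOXML_NS):
        mime_types = [
            version.get("MIMETYPE")
            for version in ds.findall("foxml:datastreamVersion", FOXML_NS)
        ]
        datastreams.append((ds.get("ID"), mime_types))
    return {"pid": root.get("PID"), "datastreams": datastreams}


summary = summarize(SAMPLE_FOXML)
print(summary)
```

An indexing service could walk a record in the same way to decide which datastreams to index.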
Development of Services in the Fedora Service Framework
• The Fedora Generic Search Service
– Background
  • The DEF-XWS project
  • Zebra at work
  • Lucene in action
– Approach and requirements
– Current prototype (fedoragsearch)
– Architectural snapshots
– Configuration and customization
– Further work
– The work is funded by DEFF, Denmark's Electronic Research Library.
Background - DEF-XWS Eprints
[Architecture diagram not reproduced. Labels shown: Open Archives Initiative Data Providers; OAI-PMH; OAI Harvester; OAI Manager; MySQL; Export (full set, subset); Zebra servers; Z39.50; Web UI w/Z39.50 (two instances); Librarian; ingest; Full text retrieval; Eprint Service Provider (SOAP/REST); Web UI w/SOAP (Java); Web UI w/REST (PHP); AppXYZ w/SOAP (Perl). Users: DEF Portal user, InfoNet user, DEF-XWS Eprints users, AppXYZ user.]
Background - DEF-XWS Eprints
• Purpose achieved
– Fedora hands-on and experience
– web services hands-on and experience
• DEF-XWS Eprints available from web services
– http://defxws.cvt.dk:8082/fedora/access/soap?wsdl
– http://defxws.cvt.dk:8082/fedora/accessDEF-XWS/soap?wsdl
• and to applications combining many web services
• Lesson
– Do not override field search, provide generic search service instead ...
Zebra at work
Features
• Zebra is provided as open source by Index Data.
• Written in portable C, so it runs on most Unix-like systems as well as Windows.
• Modules zebraidx and zebrasrv
• Searching supports a combination of boolean queries, relevance ranking, truncation, masking, full regular expression matching and "approximate matching" (e.g. spelling mistakes).
• Z39.50 protocol support, recently also SRW/SRU and CQL
• Configurable to understand many input formats: SGML, XML, ISO2709 (MARC), raw text.
• Arbitrarily complex records.
• Robust updating: records can be added and deleted "on the fly".
• Very large databases: logical files can be automatically partitioned over multiple disks.
"Lucene in Action"
Figure 1.5 A typical application integration with Lucene (diagram not reproduced; labels shown: Document, Field)
Example query: dc.title:"Information retrieval" AND dc.creator:Staples
Query syntax: http://lucene.apache.org/java/docs/queryparsersyntax.html
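The fielded query on this slide can be assembled programmatically. The following Python sketch is only an illustration: the helper names `field_clause` and `and_query` are hypothetical, and it handles phrase quoting only, not the full set of Lucene special characters.

```python
def field_clause(field, value):
    """Build one field:value clause, quoting the value as a phrase if it contains whitespace."""
    if any(ch.isspace() for ch in value):
        # Escape backslashes and double quotes so the phrase stays well-formed.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}:"{escaped}"'
    return f"{field}:{value}"


def and_query(clauses):
    """Join (field, value) pairs into a single query with the AND operator."""
    return " AND ".join(field_clause(field, value) for field, value in clauses)


query = and_query([
    ("dc.title", "Information retrieval"),
    ("dc.creator", "Staples"),
])
print(query)  # dc.title:"Information retrieval" AND dc.creator:Staples
```

Single-word values are passed through unchanged, so a caller who needs wildcard or range syntax can still supply it directly.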
Approach and Requirements
• Do iterations of requirements analysis and prototype development
• allow various indexing-and-search engines to be configured or plugged in, initially Lucene and Zebra
• implement as a webapp within the Fedora Service Framework
• allow indexing of, and search in, all information in FOXML records for FedoraObjects, including full texts in datastreams and disseminator results
• define interface for a set of operations, provide REST and SOAP access
• basic operations:
– updateIndex - indexing the contents of the Fedora repository
– gfindObjects - search similar to Fedora findObjects
• secondary operations:
– browseIndex - browsing terms in a given index
– getRepositoryInfo - describing the properties of a repository
– getIndexInfo - describing the properties of an index
• allow multiple repositories to be indexed in one and the same index
• allow multiple indexes to be generated from one repository
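As a rough illustration of REST access to the five operations, the Python sketch below builds request URLs. The operation names come from the list above; the base URL `http://localhost:8080/fedoragsearch/rest` and the parameter names `operation` and `query` are assumptions made for this sketch, not details stated on the slide.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Operation names as listed on the slide.
OPERATIONS = {
    "updateIndex",
    "gfindObjects",
    "browseIndex",
    "getRepositoryInfo",
    "getIndexInfo",
}


def rest_url(base, operation, **params):
    """Build a REST request URL; parameter names are assumptions, not a documented API."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    query_string = urlencode({"operation": operation, **params})
    return f"{base}?{query_string}"


# Hypothetical host, port, and path.
url = rest_url(
    "http://localhost:8080/fedoragsearch/rest",
    "gfindObjects",
    query='dc.title:"Information retrieval"',
)
print(url)

# Round trip: decode the query string again to check the encoding.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

Rejecting unknown operation names up front keeps typos from turning into confusing server-side errors.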
Current prototype - updateIndex
Current prototype - gfindObjects
Current prototype - gfindObjects
Current prototype - browseIndex
Current prototype - getRepositoryInfo
Current prototype - getIndexInfo
Architectural snapshots - fedoragsearch - basic
• Contents
– Lucene
– Zebra
– fedoragsearch
  • REST demo
  • architecture
  • installation and configuration
  • further customizations
Architectural snapshots - indexing - many-to-many
Configuration and customization
Configuration examples:
fedoragsearch.properties
- soapBase = http://HOSTPORT/fedoragsearch/services
- repositoryNames = REPOSNAMES
- indexNames = INDEXNAMES
- mimeTypes = MIMETYPES
INDEXNAME/index.properties
- operationsImpl = dk.defxws.fgslucene.OperationsImpl
- defaultQueryFields = dc.description dc.title
REPOSNAME/repository.properties
- soapBase = http://FEDORAHOSTPORT/fedora/services
- fedoraObjectDir = FEDORAOBJECTDIR
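To show how such property files might be consumed, here is a minimal Python sketch that parses `key = value` lines and splits the space-separated `defaultQueryFields` value. The parser `parse_properties` is a hypothetical helper that covers only the simple form shown here, not the full Java properties syntax (line continuations, `:` separators, escapes).

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue  # Not a key = value line; ignored in this sketch.
        props[key.strip()] = value.strip()
    return props


# Values as shown in the index.properties example.
INDEX_PROPERTIES = """\
# INDEXNAME/index.properties
operationsImpl = dk.defxws.fgslucene.OperationsImpl
defaultQueryFields = dc.description dc.title
"""

props = parse_properties(INDEX_PROPERTIES)
default_fields = props["defaultQueryFields"].split()
print(props["operationsImpl"])
print(default_fields)
```

Because `operationsImpl` names the implementing class, swapping the indexing engine is, in this reading, a matter of changing one property value per index.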
Customization examples:
demoFoxmlToLucene.xslt