Development of Services in the Fedora Service Framework

by Gert Schmeltz Pedersen [email protected]

Danmarks Tekniske Universitet / Technical University of Denmark
Danmarks Tekniske Videncenter / Technical Knowledge Center of Denmark

DORSDL Workshop, 21 September 2006

Development of Services in the Fedora Service Framework

• Contents
– The Fedora Service Framework
– The Fedora Generic Search Service
– Considerations about a Peer-to-Peer Service for Fedora
– Conclusion

The Fedora Service Framework

Flexible Extensible Digital Object Repository Architecture

Powerful digital object model

Extensible metadata management

Expressive inter-object relationships

• Services are stand-alone web applications that run independently of the Fedora repository
• Two main benefits to the service framework approach:
– allows new functionality to be added as atomic, modular services that can interact with Fedora repositories, yet not be part of the repository,
– makes co-development of new services for Fedora easier since each service can be independently developed and plugged into the framework.

The Fedora Service Framework

Fedora Object XML (FOXML) is a simple XML format that directly expresses the Fedora digital object model
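
To illustrate how directly FOXML exposes the object model, the following minimal sketch reads a record and lists its PID and datastreams. The sample record is hypothetical and heavily trimmed (a real FOXML record carries more properties and content); only the standard-library XML parser is used.

```python
import xml.etree.ElementTree as ET

# Namespace used by FOXML documents.
FOXML_NS = {"foxml": "info:fedora/fedora-system:def/foxml#"}

# Hypothetical, heavily trimmed FOXML record (illustration only).
SAMPLE_FOXML = """\
<foxml:digitalObject PID="demo:1"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#">
  <foxml:objectProperties>
    <foxml:property NAME="info:fedora/fedora-system:def/model#label"
                    VALUE="Sample object"/>
  </foxml:objectProperties>
  <foxml:datastream ID="DC" STATE="A" CONTROL_GROUP="X">
    <foxml:datastreamVersion ID="DC.0" MIMETYPE="text/xml" LABEL="Dublin Core">
      <foxml:xmlContent/>
    </foxml:datastreamVersion>
  </foxml:datastream>
  <foxml:datastream ID="PDF" STATE="A" CONTROL_GROUP="M">
    <foxml:datastreamVersion ID="PDF.0" MIMETYPE="application/pdf" LABEL="Full text"/>
  </foxml:datastream>
</foxml:digitalObject>
"""


def summarize(foxml_text):
    """Return the object's PID and a list of (datastream ID, MIME type) pairs."""
    root = ET.fromstring(foxml_text)
    datastreams = []
    for ds in root.findall("foxml:datastream", FOXML_NS):
        # Only the first listed version of each datastream is inspected here.
        version = ds.find("foxml:datastreamVersion", FOXML_NS)
        mime = version.get("MIMETYPE") if version is not None else None
        datastreams.append((ds.get("ID"), mime))
    return {"pid": root.get("PID"), "datastreams": datastreams}


if __name__ == "__main__":
    print(summarize(SAMPLE_FOXML))
```

The object, its properties and its datastreams each map to one XML element, which is what makes the format convenient as input to an indexing service.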

Development of Services in the Fedora Service Framework

• The Fedora Generic Search Service
– Background
  • The DEF-XWS project
  • Zebra at work
  • Lucene in action
– Approach and requirements
– Current prototype (fedoragsearch)
– Architectural snapshots
– Configuration and customization
– Further work
– The work is funded by DEFF, Denmark's Electronic Research Library.

Background - DEF-XWS Eprints

[Figure: DEF-XWS Eprints architecture. Components shown: Open Archives Initiative data providers; OAI-PMH; OAI Harvester and OAI Manager with MySQL; export of full set and subset to Zebra servers; web UIs with Z39.50; librarian ingest; Eprint Service Provider with full text retrieval (Zebra server) and SOAP/REST access; clients: web UI with SOAP (Java), web UI with REST (PHP), AppXYZ with SOAP (Perl). Users: DEF Portal user, InfoNet user, DEF-XWS Eprints users, AppXYZ user.]

Background - DEF-XWS Eprints

• Purpose achieved
– Fedora hands-on and experience
– web services hands-on and experience
• DEF-XWS Eprints available from web services
  http://defxws.cvt.dk:8082/fedora/access/soap?wsdl
  http://defxws.cvt.dk:8082/fedora/accessDEF-XWS/soap?wsdl
  and to applications combining many web services
• Lesson
– Do not override field search, provide generic search service instead ...

Zebra at work

Features

• Zebra is provided as open source by Index Data.
• Written in portable C, so it runs on most Unix-like systems as well as Windows.
• Modules zebraidx and zebrasrv
• Searching supports a combination of boolean queries, relevance-ranking, truncation, masking, full regular expression matching and "approximate matching" (e.g. spelling mistakes).
• Z39.50 protocol support, recently also SRW/SRU and CQL
• Configurable to understand many input formats: SGML, XML, ISO2709 (MARC), raw text.
• Arbitrarily complex records.
• Robust updating: records can be added and deleted “on the fly”.
• Very large databases: logical files can be automatically partitioned over multiple disks.

"Lucene in Action"

Example query: dc.title:"Information retrieval" AND dc.creator:Staples

Query syntax: http://lucene.apache.org/java/docs/queryparsersyntax.html

[Figure 1.5: A typical application integration with Lucene. Labels shown: Document, Field.]
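
The Document/Field idea behind the example query can be sketched with a toy in-memory index. This is not Lucene and does not use its API; it only mimics the semantics of a fielded phrase combined with AND. The class name and the three sample records are hypothetical.

```python
from collections import defaultdict


class TinyFieldIndex:
    """Toy inverted index keyed by (field, term); an illustration, not Lucene."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> set of document ids
        self.tokens = {}                  # (document id, field) -> token list

    def add(self, doc_id, fields):
        """Index one document given as a mapping of field name to text."""
        for field, text in fields.items():
            toks = text.lower().split()
            self.tokens[(doc_id, field)] = toks
            for tok in toks:
                self.postings[(field, tok)].add(doc_id)

    def term(self, field, word):
        """Documents whose field contains the single word."""
        return set(self.postings.get((field, word.lower()), set()))

    def phrase(self, field, phrase):
        """Documents whose field contains the words adjacent and in order."""
        words = phrase.lower().split()
        if not words:
            return set()
        candidates = set.intersection(*(self.term(field, w) for w in words))
        hits = set()
        for doc_id in candidates:
            toks = self.tokens[(doc_id, field)]
            windows = range(len(toks) - len(words) + 1)
            if any(toks[i:i + len(words)] == words for i in windows):
                hits.add(doc_id)
        return hits


index = TinyFieldIndex()
# Hypothetical sample records.
index.add("demo:1", {"dc.title": "Introduction to information retrieval",
                     "dc.creator": "Staples"})
index.add("demo:2", {"dc.title": "Information retrieval",
                     "dc.creator": "Another Author"})
index.add("demo:3", {"dc.title": "Retrieval of information",
                     "dc.creator": "Staples"})

# dc.title:"Information retrieval" AND dc.creator:Staples
matches = (index.phrase("dc.title", "Information retrieval")
           & index.term("dc.creator", "Staples"))

if __name__ == "__main__":
    print(sorted(matches))
```

Only the first record satisfies both clauses: the second has a different creator, and the third contains both title words but not as a phrase.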

Approach and Requirements

• Do iterations of requirements analysis and prototype development
• allow various indexing-and-search engines to be configured or plugged in, initially Lucene and Zebra
• implement as a webapp within the Fedora Service Framework
• allow indexing of, and search in, all information in FOXML records for FedoraObjects, including full texts in datastreams and disseminator results
• define interface for a set of operations, provide REST and SOAP access
• basic operations:
– updateIndex - indexing the contents of the Fedora repository
– gfindObjects - search similar to Fedora findObjects
• secondary operations:
– browseIndex - browsing terms in a given index
– getRepositoryInfo - describing the properties of a repository
– getIndexInfo - describing the properties of an index
• allow multiple repositories to be indexed in one and the same index
• allow multiple indexes to be generated from one repository
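
As a rough sketch of how a client might address the five operations over REST, the helper below builds request URLs. The operation names come from the list above; the base URL, the `operation` and `query` parameter names, and the helper itself are assumptions made for illustration and may not match the prototype's actual interface.

```python
from urllib.parse import urlencode

# Operation names as listed on the slide.
OPERATIONS = {
    "updateIndex",
    "gfindObjects",
    "browseIndex",
    "getRepositoryInfo",
    "getIndexInfo",
}


def build_request_url(base_url, operation, **params):
    """Build a REST-style URL for one operation (parameter names are assumed)."""
    if operation not in OPERATIONS:
        raise ValueError("unknown operation: " + operation)
    query = {"operation": operation}
    query.update(params)
    return base_url + "?" + urlencode(query)


if __name__ == "__main__":
    # Hypothetical host, port and path.
    base = "http://localhost:8080/fedoragsearch/rest"
    print(build_request_url(base, "gfindObjects",
                            query='dc.title:"Information retrieval"'))
    print(build_request_url(base, "getIndexInfo"))
```

Keeping the operation set explicit lets a client reject misspelled operation names before any request is sent.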

Current prototype - updateIndex

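[Figure: updateIndex example. The FOXML record of a FedoraObject (labels shown: FO_TO_PDFDOC, "Advanced FO Sample", "from Apache FOP Distribution", "Apache Group") passes through a transformation that produces the index document (Advanced FO Sample … Apache Group).]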
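
The transformation step can be approximated in a few lines: take the Dublin Core elements embedded in a FOXML record and flatten them into named index fields. This is a sketch only; the prototype uses an XSLT stylesheet for this step, and the PID, the assignment of the slide's sample strings to dc:title and dc:creator, and the `dc.`-prefixed field naming are assumptions.

```python
import xml.etree.ElementTree as ET

NS = {"oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/"}
DC_PREFIX = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical trimmed record; PID and field assignment are assumptions.
SAMPLE_FOXML = """\
<foxml:digitalObject PID="demo:example"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#">
  <foxml:datastream ID="DC" STATE="A" CONTROL_GROUP="X">
    <foxml:datastreamVersion ID="DC.0" MIMETYPE="text/xml" LABEL="Dublin Core">
      <foxml:xmlContent>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Advanced FO Sample</dc:title>
          <dc:creator>Apache Group</dc:creator>
        </oai_dc:dc>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
  </foxml:datastream>
</foxml:digitalObject>
"""


def foxml_to_index_fields(foxml_text):
    """Flatten the PID and Dublin Core elements into {field name: [values]}."""
    root = ET.fromstring(foxml_text)
    fields = {"PID": [root.get("PID")]}
    for element in root.iterfind(".//oai_dc:dc/*", NS):
        # Keep only elements in the Dublin Core namespace that carry text.
        if element.tag.startswith(DC_PREFIX) and element.text:
            name = "dc." + element.tag[len(DC_PREFIX):]
            fields.setdefault(name, []).append(element.text.strip())
    return fields


if __name__ == "__main__":
    print(foxml_to_index_fields(SAMPLE_FOXML))
```

Values are kept in lists because Dublin Core elements may repeat within one record.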

Current prototype - gfindObjects

Current prototype - gfindObjects

Current prototype - browseIndex

Current prototype - getRepositoryInfo

Current prototype - getIndexInfo

Architectural snapshots - fedoragsearch - basic

• Contents
– Lucene
– Zebra
– fedoragsearch
  • REST demo
  • architecture
  • installation and configuration
  • further customizations

Architectural snapshots - indexing - many-to-many

Configuration and customization

Configuration examples:

fedoragsearch.properties

- soapBase = http://HOSTPORT/fedoragsearch/services
- repositoryNames = REPOSNAMES
- indexNames = INDEXNAMES
- mimeTypes = MIMETYPES

INDEXNAME/index.properties

- operationsImpl = dk.defxws.fgslucene.OperationsImpl
- defaultQueryFields = dc.description dc.title

REPOSNAME/repository.properties

- soapBase = http://FEDORAHOSTPORT/fedora/services
- fedoraObjectDir = FEDORAOBJECTDIR
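
For readers who want to inspect such files from a script, here is a minimal sketch of reading `key = value` lines. It handles only a simplified subset of the Java properties format (no line continuations, no `:` separator), the helper names are hypothetical, and the upper-case placeholders from the slide are kept exactly as shown.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line
        props[key.strip()] = value.strip()
    return props


def as_list(value):
    """Split a whitespace-separated property value into its items."""
    return value.split()


# Values copied from the slide; upper-case words are placeholders.
INDEX_PROPERTIES = """\
operationsImpl = dk.defxws.fgslucene.OperationsImpl
defaultQueryFields = dc.description dc.title
"""

if __name__ == "__main__":
    props = parse_properties(INDEX_PROPERTIES)
    print(props["operationsImpl"])
    print(as_list(props["defaultQueryFields"]))
```

Because values are split only at the first `=`, URLs such as the `soapBase` entries pass through unchanged.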

Customization examples:

demoFoxmlToLucene.xslt

demoGfindObjectsToHtml.xslt