Slide 1
1st European Workshop on the use of information object Repository Systems in Digital Libraries (DORSDL), in conjunction with ECDL2006

Typing OpenDLib Repository Service: Strengths of an Information Object Type Language

Leonardo Candela, Donatella Castelli, Paolo Manghi, Pasquale Pagano
Centro Nazionale delle Ricerche, Pisa, Italy

DORSDL Workshop - 21st of September, 2006

Slide 3: DB Systems: realizing a DB Application
[Diagram: an Application accesses the DBMS through its System Interface, which exposes a Typed Data Model (Type Language).]

Slide 4: DB Systems: type definition
[Diagram: the types Managers and Projects are defined through the System Interface, in the Type Language of the DBMS.]

Slide 5: DB Systems: storage creation
[Diagram: the DBMS creates one store per type: M for Managers and P for Projects.]

Slide 6: DB Systems: Application Usage
[Diagram: an application Component on Managers and Projects uses the two types through the System Interface; the DBMS keeps their objects in the stores M and P.]

Slide 7: DB Systems
[Diagram: as in slide 6, with the stores now populated: data D1 in M and data D2 in P.]

Slide 8: DB Systems: additions
[Diagram: a new type, Budgets, and a new Component on Budgets are added next to the Component on Managers and Projects; the DBMS keeps the Budgets data D3 in a new store B.]

Slide 9: Relational DB System
[Diagram: the same application on a Relational DBMS; the Relational Model (SQL schema) defines the tables Managers, Projects, and Budgets, stored as TabM (D1), TabP (D2), and TabB (D3).]

Slide 10: Typed Data Models: advantages
Application development and maintenance:
- Functionality and content are kept independent from each other
- Type correctness: components must be type-conformant
- Modularity
- Reuse: component-wise and data-wise
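The Managers/Projects/Budgets scenario of slides 4 to 9 can be made concrete with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the slides name only the three tables and the two components, so every column, sample row (e.g. the manager "Rossi"), and function name below is hypothetical. It illustrates the slide 10 points: each component depends only on the tables (types) it uses, Budgets is added later without touching the first component, and the DBMS rejects rows that do not conform to the schema.

```python
from __future__ import annotations

import sqlite3

# Hypothetical schema: the slides only name the tables Managers, Projects and
# Budgets; all column names and sample rows below are illustrative assumptions.
SCHEMA = """
CREATE TABLE Managers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Projects (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    manager_id INTEGER NOT NULL REFERENCES Managers(id)
);
"""

# Added later (slide 8): a new type, used by a new component.
ADDITION = """
CREATE TABLE Budgets (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES Projects(id),
    amount REAL NOT NULL CHECK (amount >= 0)
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    return conn


def projects_by_manager(conn: sqlite3.Connection, manager: str) -> list[str]:
    """Component on Managers and Projects: depends on those two tables only."""
    rows = conn.execute(
        "SELECT p.title FROM Projects p JOIN Managers m ON p.manager_id = m.id "
        "WHERE m.name = ? ORDER BY p.title",
        (manager,),
    )
    return [title for (title,) in rows]


def total_budget(conn: sqlite3.Connection, project: str) -> float:
    """Component on Budgets: reuses the existing Projects table."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(b.amount), 0) FROM Budgets b "
        "JOIN Projects p ON b.project_id = p.id WHERE p.title = ?",
        (project,),
    ).fetchone()
    return float(total)


if __name__ == "__main__":
    conn = make_db()
    conn.execute("INSERT INTO Managers VALUES (1, 'Rossi')")
    conn.executemany("INSERT INTO Projects VALUES (?, ?, ?)",
                     [(1, "DL-Core", 1), (2, "Harvester", 1)])
    print(projects_by_manager(conn, "Rossi"))  # ['DL-Core', 'Harvester']

    conn.executescript(ADDITION)  # the first component is left untouched
    conn.executemany("INSERT INTO Budgets (project_id, amount) VALUES (?, ?)",
                     [(1, 100.0), (1, 50.0)])
    print(total_budget(conn, "DL-Core"))  # 150.0

    try:  # type conformance: a Project must reference an existing Manager
        conn.execute("INSERT INTO Projects VALUES (3, 'Orphan', 99)")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```

SQLite's foreign-key checks are off by default and must be switched on per connection, which is why make_db issues the PRAGMA before creating the tables.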
Slide 11: Typed Data Models: advantages
Type-driven physical storage:
- Data integrity: data can be handled only according to their associated structure
- Type information can be exploited to optimize storage space and access time

Slide 12: DL Applications and (Type-less) Repository Services
DL Applications are built on top of Repository Services. Repository Services:
- concentrate on the physical management of information objects
- are based on a type-less Information Object Model
- offer a set of primitives to:
  - manage an Information Space of information objects: add, delete, update, search
  - manage metadata records: efficient storage (XML), indexing, mapping, harvesting, publishing, etc.
- offer extra features: behaviors, communities, users…
Historical reasons: originally, DLs were flat catalogues of file-metadata pairs, or of metadata only.

Slide 13: Gaps of Type-less Repository Services
DL Applications must "encode":
- the notion of a typed collection of information objects, i.e. a collection of objects with the same structure (Prototypes address this problem: K. Saidis et al., ECDL2006)
- the notion of methods (functionality) of the objects of a given typed collection
As a consequence:
- applications are hard to write, maintain, reuse, and extend
- there is no type safety or data integrity: the objects in the store are not aware of their type
All the advantages of DB Systems are lost…

Slide 14: Gaps of Type-less Repository Services
[Diagram: the Application's components work directly on Objects through the repository primitives and must themselves encode data, typed collections, and methods; the System Interface of the Repository Service exposes only an Information Object Model of untyped Objects.]

Slide 15: Things have changed…
- DL Applications are becoming common
- DL-specific issues arise, regarding both information spaces and functionalities
- Systematic approaches are needed, to maximize reuse and minimize effort
- Systems should offer DL-specific, customizable, and optimized functionalities to DL Application designers and developers

Slide 16: Things have changed… Information Space
Towards richer Information Object Models:
- Collections of complex objects: metadata fields (MF), files, relations, and behaviors
- Structured objects: objects composed of other objects, e.g. photo albums
- Dynamic objects: dynamically created content
- Object features: provenance and preservation
- Users-objects relationship: copyrights, access rights, authentication, etc.
- Others…

Slide 17: Things have changed… Functionalities
Towards system primitives:
- User profiling
- User recommendations
- Object versioning
- OAI-PMH harvesting
- Virtual object collection management
- Others…

Slide 18: Our goal
Design and develop a Typed Repository Service along the lines of DB Systems:
- Typed Information Object Model (OO), the counterpart of the Relational Model
- Type algebra, the counterpart of the Relational Algebra
- Collection = <Type, set of information objects>, the counterpart of Table = <Structure, set of records>

Slide 19: Typed Repository Service
- A Type defines a set of objects with the same structure and the operations (methods) that can be applied to them
- A Collection is a named set of objects conforming to the Type assigned to the Collection
- A Repository Service Instance is a set of Collections
- A Repository Service "exposes" to Application components all the Collections defined in its active Instance
- Applications can manage, search, and manipulate the objects of a Collection through the methods (functionalities) exposed by its Type

Slide 20: Typed Repository Service
[Diagram: Application components (a Component on Articles and a Component on Notes and Refs) access the Typed Repository Service through its System Interface, which exposes a Typed Information Object Model (Type algebra). Each Collection pairs a Type (Articles, Notes, Refs) with a store (A, N, R) holding its objects (DO1, DO2, DO3).]

Slide 21: DL Type Algebra
A Type is characterized by:
- a (possibly empty) set of type properties, i.e. attributes that depend on the features of the Type
- a (possibly empty) set of Metadata Fields (MF) describing all objects of the Type, to be defined by the DL Designer
A Collection of a given Type offers the primitives (methods) to:
- search objects by type properties
- search objects by MF
- add objects to and delete objects from the Collection

Slide 22: DL Type Algebra
Coll ::= Name = Type, Coll
       | Name = Virtual(Q, Name)
       | ε
Type ::= Raw(MF, FileFormats, behaviors)
       | Relation(MF, Type1, Type2, [1:1|1:n|n:n])
       | Aggregation(MF, Type)
       | Union(Name1, …, Namen)
       | RawView(MF, FileFormats, behaviors)
       | Name
       | Others

Slide 23: Raw type: "ground" objects
Raw(MF, file formats, behaviors(in, out))
Object methods:
- Update MF
- Upload manifestation / change link
- Update behaviors
Class methods:
- Search by MF
- Full-text search

Slide 24: Relation type: "association" objects
Relation(MF, T1, T2, [1:1|1:n|n:n])
Object methods:
- Update MF
- Update the two related objects
- Get the two related objects
Class methods:
- Add and delete Relation objects
- Search by MF
- Search the objects related to a given object

Slide 25: Aggregation type
An Aggregation relies on a hidden Relation type, whose MF are MF' plus an ordering number.
[Diagram: A(MF) aggregates objects of type B.]
A = Aggregation(MF, MF' + ordering, B)
Object methods:
- Add objects of B to, and remove them from, the aggregation
- Get the aggregated objects
- Search through the aggregated objects, by MF or by ordering
Class methods:
- Add and delete Aggregation objects
- Search by MF

Slide 26: Example: Annotations to Articles
Articles = Raw(<Title, Author, Year>, PDF)
Notes = Raw(<Date, Text, Author>)
Anns = Relation(Articles, Notes, [n:n])
Applications can:
- add and delete article, note, and annotation objects
- given an article object A, reach its notes through Anns.getRelated(A)
- search all notes inserted in a given period through Notes.search("Date between x and y")
The store can:
- create specific indices for each MF format
- create a full-text index for PDFs
- find the best way to compress PDFs and the available MF formats

Slide 27: Towards DL Systems
- MF mappings: managed by administrators
- Behavior management: managed by administrators
  - Consequences for storage optimization?
  - Limited to file manipulators, or more than that, e.g. Web Services?
- OAI-PMH publishing, harvesting, and aggregation
- Store distribution and organization
- Object navigation: include objects as values of Metadata Fields?
- Query language?

Slide 28: Towards DL Systems
[Diagram: the architecture of slide 20 (Component on Articles, Component on Notes and Refs; Collections Articles, Notes, and Refs with stores A, N, R holding DO1, DO2, DO3), with the Typed Repository Service extended by OAI-PMH publishing, harvesting, and aggregation, MF mappings, and Behaviors.]

Slide 29: OpenDLib Repository Service
- Rich document model: DoMDL
- Repository Service tailored to DoMDL
- Repository Services can be configured to handle objects that respect a specific subset of DoMDL, thanks to T-DoMDL
- Export of DoMDL information objects

Slide 30: Light T-DoMDL
Coll ::= Name = Vs, Coll
       | Name = Virtual(Q, Name), Coll
       | ε
Vs   ::= Version(A) | A
A    ::= Aggregation(T1, …, Tn)
T    ::= Raw[file formats] | A

Slide 31: Conclusions and future issues
- Motivation: Digital Libraries call for Systems
- Experiment: implementing T-DoMDL in the OpenDLib Repository Service
- Next steps (Repository Development): support the full type algebra; explore query languages and storage optimization
- Future: towards fully-fledged DL Systems: preservation; OAI-PMH harvesting and publishing; user rights management (Collections); more…
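The DL type algebra of slides 21 to 26 can be sketched in Python for the two constructors used by the Annotations example, Raw and Relation. This is a toy in-memory model, not OpenDLib code: the names Repository, RawCollection, RelationCollection, and define are hypothetical; relation-level MF, behaviors, and file-format checks are left out; and a Python predicate stands in for the query string "Date between x and y". Each Collection enforces its Type on insertion: a Raw object must carry exactly the declared metadata fields, and a Relation object must link objects of the declared Collections within the declared cardinality.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from itertools import count
from typing import Any, Callable


@dataclass(frozen=True)
class Raw:
    """Raw(MF, FileFormats): "ground" objects described by metadata fields."""
    mf: tuple[str, ...]
    file_formats: tuple[str, ...] = ()  # declared only; not checked in this sketch


@dataclass(frozen=True)
class Relation:
    """Relation(T1, T2, cardinality), as in the Anns example (no relation MF)."""
    left: str                 # name of the Collection holding T1 objects
    right: str                # name of the Collection holding T2 objects
    cardinality: str = "n:n"  # "1:1", "1:n" or "n:n"


class RawCollection:
    """A named set of objects that all conform to one Raw type."""
    def __init__(self, name: str, type_: Raw, ids: count) -> None:
        self.name, self.type, self._ids = name, type_, ids
        self.objects: dict[int, dict[str, Any]] = {}

    def add(self, **mf: Any) -> int:
        # Type check: an object must carry exactly the MF declared by its type.
        if set(mf) != set(self.type.mf):
            raise TypeError(f"{self.name} expects MF {self.type.mf}, got {tuple(mf)}")
        oid = next(self._ids)
        self.objects[oid] = dict(mf)
        return oid

    def delete(self, oid: int) -> None:
        del self.objects[oid]

    def search(self, pred: Callable[[dict[str, Any]], bool]) -> list[int]:
        # "Search by MF"; a Python predicate stands in for a query string.
        return [oid for oid, mf in self.objects.items() if pred(mf)]


class RelationCollection:
    """A named set of association objects between two other Collections."""
    def __init__(self, name: str, type_: Relation, repo: Repository) -> None:
        self.name, self.type, self.repo = name, type_, repo
        self.links: list[tuple[int, int]] = []

    def add(self, src: int, dst: int) -> None:
        t, colls = self.type, self.repo.collections
        if src not in colls[t.left].objects or dst not in colls[t.right].objects:
            raise TypeError(f"{self.name} relates {t.left} objects to {t.right} objects")
        # 1:1 and 1:n: a target is related to at most one source;
        # 1:1 additionally allows at most one target per source.
        if t.cardinality in ("1:1", "1:n") and any(d == dst for _, d in self.links):
            raise ValueError(f"{t.cardinality}: object {dst} is already related")
        if t.cardinality == "1:1" and any(s == src for s, _ in self.links):
            raise ValueError(f"1:1: object {src} is already related")
        self.links.append((src, dst))

    def delete(self, src: int, dst: int) -> None:
        self.links.remove((src, dst))

    def get_related(self, src: int) -> list[int]:
        # Plays the role of Anns.getRelated(A) on the slides.
        return [d for s, d in self.links if s == src]


class Repository:
    """A Repository Service Instance: a set of named, typed Collections."""
    def __init__(self) -> None:
        self.collections: dict[str, Any] = {}
        self._ids = count(1)  # repository-wide object identifiers

    def define(self, name: str, type_: Raw | Relation) -> Any:
        if isinstance(type_, Raw):
            coll: Any = RawCollection(name, type_, self._ids)
        elif isinstance(type_, Relation):
            coll = RelationCollection(name, type_, self)
        else:  # Aggregation, Union, RawView, ... are not covered by this sketch
            raise NotImplementedError(type(type_).__name__)
        self.collections[name] = coll
        return coll


if __name__ == "__main__":
    # The Annotations-to-Articles example of slide 26, with made-up objects.
    repo = Repository()
    articles = repo.define("Articles", Raw(("Title", "Author", "Year"), ("PDF",)))
    notes = repo.define("Notes", Raw(("Date", "Text", "Author")))
    anns = repo.define("Anns", Relation("Articles", "Notes", "n:n"))
    a = articles.add(Title="Sample article", Author="X", Year=2006)
    n1 = notes.add(Date=date(2006, 9, 1), Text="first note", Author="Y")
    n2 = notes.add(Date=date(2006, 10, 5), Text="second note", Author="Z")
    anns.add(a, n1)
    anns.add(a, n2)
    print(anns.get_related(a))  # [2, 3]
    print(notes.search(lambda mf: date(2006, 9, 1) <= mf["Date"] <= date(2006, 9, 30)))  # [2]
```

Reaching an article's notes through get_related and filtering notes by date mirror the two application operations listed on slide 26; the store-side optimizations listed there (MF indices, full-text indexing, compression) are outside the scope of this sketch.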
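The Light T-DoMDL grammar of slide 30 can also be read as a small set of typing rules. The following Python checker is a hypothetical sketch: the class and function names are assumptions, the trailing empty alternative of Coll is read as ε (so a list of declarations, possibly empty, models Coll), and requiring the source of a Virtual Collection to be a declared Collection is an extra assumption. A declaration is accepted only if it is a Version of an Aggregation or a plain Aggregation, whose parts are Raw types or nested Aggregations.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical abstract syntax for Light T-DoMDL: one class per constructor of
# the slide's grammar; field names are assumptions.


@dataclass(frozen=True)
class Raw:
    file_formats: tuple[str, ...]  # Raw[file formats]


@dataclass(frozen=True)
class Aggregation:
    parts: tuple  # T1, ..., Tn (empty aggregations are accepted here)


@dataclass(frozen=True)
class Version:
    body: object  # well formed only if it is an Aggregation


@dataclass(frozen=True)
class Virtual:
    query: str   # Q, left uninterpreted
    source: str  # Name of the Collection the virtual Collection is drawn from


def is_t(t: object) -> bool:
    """T ::= Raw[file formats] | A"""
    return isinstance(t, Raw) or is_a(t)


def is_a(t: object) -> bool:
    """A ::= Aggregation(T1, ..., Tn)"""
    return isinstance(t, Aggregation) and all(is_t(p) for p in t.parts)


def is_vs(t: object) -> bool:
    """Vs ::= Version(A) | A"""
    return (isinstance(t, Version) and is_a(t.body)) or is_a(t)


def check_collections(decls: list[tuple[str, object]]) -> list[str]:
    """Coll ::= Name = Vs, Coll | Name = Virtual(Q, Name), Coll | (empty).

    Returns a list of problems; an empty list means the declarations conform.
    """
    defined = {name for name, _ in decls}
    problems = []
    for name, body in decls:
        if isinstance(body, Virtual):
            if body.source not in defined:  # assumption: sources must be declared
                problems.append(f"{name}: unknown source {body.source!r}")
        elif not is_vs(body):
            problems.append(f"{name}: not a Light T-DoMDL type")
    return problems


if __name__ == "__main__":
    paper = Aggregation((Raw(("PDF",)), Raw(("XML", "HTML"))))
    decls = [("Papers", Version(paper)), ("Selected", Virtual("Q", "Papers"))]
    print(check_collections(decls))  # []
    print(check_collections([("Files", Raw(("PDF",)))]))  # ['Files: not a Light T-DoMDL type']
```

Note how the grammar rules out a bare Raw type as a Collection type: at the top level, every Light T-DoMDL object is an (optionally versioned) Aggregation.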