1st European Workshop on the use of information object Repository Systems in Digital Libraries (DORSDL), in conjunction with ECDL2006
Typing OpenDLib Repository Service: Strengths of an Information Object Type Language
Leonardo Candela, Donatella Castelli
Paolo Manghi, Pasquale Pagano
Consiglio Nazionale delle Ricerche, Pisa, Italy
DORSDL Workshop - 21st of September, 2006
DB Systems: realizing a DB Application
[Diagram layers: Application / System Interface / Typed Data Model (Type Language) / DBMS]
DB Systems: type definition
[Diagram layers: Application / System Interface / Typed Data Model (Type Language), with the types Managers and Projects / DBMS]
DB Systems: storage creation
[Diagram layers: Application / System Interface / Typed Data Model (Type Language), with the types Managers and Projects / DBMS, with the stores M and P]
DB Systems: Application Usage
[Diagram layers: Application, containing a Component on Managers and Projects / System Interface / Typed Data Model (Type Language), with the types Managers and Projects / DBMS, with the stores M and P]
DB Systems
[Diagram layers: Application, containing a Component on Managers and Projects / System Interface / Typed Data Model (Type Language), with the types Managers and Projects / DBMS, with D1 (M) and D2 (P)]
DB Systems: additions
[Diagram layers: Application, containing a Component on Managers and Projects and a Component on Budgets / System Interface / Typed Data Model (Type Language), with the types Managers, Projects, and Budgets / DBMS, with D1 (M), D2 (P), and D3 (B)]
Relational DB System
[Diagram layers: Application, containing a Component on Managers and Projects and a Component on Budgets / System Interface / Relational Model (SQL schema), with the tables Managers, Projects, and Budgets / Relational DBMS, with D1 (TabM), D2 (TabP), and D3 (TabB)]
Typed Data Models: advantages
Application development and maintenance
Functionality and content are kept independent of each other
Type correctness: components must be type-conformant
Modularity
Reuse: component-wise and data-wise
Typed Data Models: advantages
Type-driven physical storage
Data integrity: data can be handled only according to their associated structure
Type information can be exploited to optimize storage space and access time
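The storage argument can be made concrete with a minimal Python sketch. It assumes a hypothetical Projects type with three fixed-width fields: because the structure is known from the type, every record can be packed into a fixed-size binary row with the standard `struct` module, whereas a type-less store has to keep self-describing records (JSON here) that repeat every field name. Fixed-size rows also allow a record to be located directly by offset.

```python
import json
import struct

# Hypothetical type declaration: (field name, struct format code).
# '<' means little-endian with no padding; 'i' = 4-byte int, 'd' = 8-byte float.
PROJECT_TYPE = [("id", "i"), ("manager_id", "i"), ("budget", "d")]
ROW_FORMAT = "<" + "".join(code for _, code in PROJECT_TYPE)
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # derived from the type alone


def pack_typed(record: dict) -> bytes:
    """Store only the values; the field names are kept once, in the type."""
    return struct.pack(ROW_FORMAT, *(record[name] for name, _ in PROJECT_TYPE))


def unpack_typed(row: bytes) -> dict:
    values = struct.unpack(ROW_FORMAT, row)
    return {name: value for (name, _), value in zip(PROJECT_TYPE, values)}


def read_row(buffer: bytes, index: int) -> dict:
    """Fixed-size rows can be located by offset, without scanning earlier rows."""
    start = index * ROW_SIZE
    return unpack_typed(buffer[start:start + ROW_SIZE])


def pack_typeless(record: dict) -> bytes:
    """A type-less store keeps self-describing records that repeat field names."""
    return json.dumps(record).encode("utf-8")


rows = [{"id": 1, "manager_id": 42, "budget": 125000.0},
        {"id": 2, "manager_id": 43, "budget": 80000.0}]
buffer = b"".join(pack_typed(r) for r in rows)
```

A real store would go further (compression, column layouts), but the principle is the same: the type tells the store what it can omit and where each value lives.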
DL Applications and (Type-less) Repository Services
DL Applications are built on top of Repository Services
Repository Services concentrate on the physical management of information objects
Based on a Type-less Information Object Model
Offer a set of primitives to
Manage an Information Space of information objects: add, delete, update, search
Manage metadata records: efficient storage (XML), indexing, mapping, harvesting, publishing, etc.
Extra features: behaviors, communities, users…
Historical reasons:
Originally, DLs were flat catalogues of file-metadata pairs, or of metadata only
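To make the type-less model concrete, here is a minimal Python sketch of such a Repository Service. The class and method names are hypothetical. Objects are plain metadata dictionaries, so the add, delete, update, and search primitives have nothing to check structure against, and the application alone decides which objects form a "collection".

```python
import itertools
from typing import Any, Callable, Dict, List

Metadata = Dict[str, Any]


class TypelessRepository:
    """Hypothetical type-less store: every object is an untyped metadata record."""

    def __init__(self) -> None:
        self._objects: Dict[int, Metadata] = {}
        self._ids = itertools.count(1)

    def add(self, metadata: Metadata) -> int:
        oid = next(self._ids)
        self._objects[oid] = dict(metadata)  # accepted as-is: no structure is enforced
        return oid

    def delete(self, oid: int) -> None:
        del self._objects[oid]

    def update(self, oid: int, **changes: Any) -> None:
        self._objects[oid].update(changes)  # any key, any value

    def search(self, predicate: Callable[[Metadata], bool]) -> List[int]:
        return [oid for oid, md in self._objects.items() if predicate(md)]


repo = TypelessRepository()
article = repo.add({"Title": "Typing OpenDLib", "Year": 2006})
odd = repo.add({"Titel": 2006})  # misspelled field and wrong value type: still accepted
# The application itself must encode what an "article" is, e.g. "has a Title":
articles = repo.search(lambda md: "Title" in md)
```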
Gaps of Type-less Repository Services
DL Applications must “encode”:
The notion of typed collection of information objects, seen as a collection of objects with the same structure (Prototypes address this problem, K. Saidis et al., ECDL2006)
The notion of methods (functionality) of the objects of a given typed collection
The objects in the store are not aware of their type
Applications are hard to write, maintain, reuse, and extend
No type safety or data integrity
All the advantages of DB Systems are lost…
Gaps of Type-less Repository Services
[Diagram layers: Application, containing a Component on Objects and Rep primitives plus a Component that must itself encode data, Typed Collections, and methods / System Interface / Information Object Model, with Objects / Repository Service]
Things have changed…
DL Applications are becoming common
DL-specific issues arise, regarding both information spaces and functionalities
Need for systematic approaches, to maximize reuse and minimize effort
Systems supporting DL-specific, customizable, and optimized functionalities for DL Application designers and developers
Things have changed…
Information Space
Towards richer Information Object Models
Collections of complex objects: metadata fields (MF), files, relations, and behaviors
Structured objects: objects composed of other objects, e.g. photo albums
Dynamic objects: dynamically created content
Object features: provenance and preservation
User-object relationships: copyrights, access rights, authentication, etc.
Others…
Things have changed…
Functionalities
Towards system primitives
User profiling
User recommendations
Object Versioning
OAI-PMH Harvesting
Virtual Object Collection management
Others…
Our goal
Design and develop a Typed Repository Service, along the lines of DB Systems:
Typed Information Object Model (OO)  |  Relational Model
Type algebra  |  Relational Algebra
Collection <Type, Set of information objects>  |  Table <Structure, Set of Records>
Typed Repository Service
A Type defines a set of objects with the same structure and the operations (methods) that can be applied to them
A Collection is a named set of objects defined according to the Type assigned to the Collection
A Repository Service Instance is a set of Collections
A Repository Service “exposes” all Collections defined in its active Instance to Application components
Applications can manage, search, and manipulate the objects of a Collection through the methods (functionalities) exposed by the corresponding Type
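These definitions can be sketched in Python as follows. This is an illustration under assumptions, not OpenDLib code: `Type`, `Collection`, and `RepositoryInstance` are hypothetical names, a type's structure is reduced to metadata field names with expected Python types, and its methods are plain functions. An object that does not conform to the Collection's type is rejected, and Applications reach objects only through the Collections the Instance exposes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Obj = Dict[str, Any]


@dataclass
class Type:
    """Structure shared by the objects of the type, plus the methods applicable to them."""
    name: str
    structure: Dict[str, type]  # field name -> expected Python type
    methods: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def conforms(self, obj: Obj) -> bool:
        return obj.keys() == self.structure.keys() and all(
            isinstance(obj[key], expected) for key, expected in self.structure.items())


@dataclass
class Collection:
    """A named set of objects, all defined according to the Collection's type."""
    name: str
    item_type: Type
    objects: List[Obj] = field(default_factory=list)

    def add(self, obj: Obj) -> None:
        if not self.item_type.conforms(obj):
            raise TypeError(f"{obj!r} does not conform to type {self.item_type.name}")
        self.objects.append(obj)

    def apply(self, method: str, obj: Obj, *args: Any) -> Any:
        """Only the methods exposed by the Collection's type can be applied."""
        return self.item_type.methods[method](obj, *args)


class RepositoryInstance:
    """A Repository Service Instance: the set of Collections exposed to Applications."""

    def __init__(self, *collections: Collection) -> None:
        self._collections = {c.name: c for c in collections}

    def exposed(self) -> List[str]:
        return sorted(self._collections)

    def __getitem__(self, name: str) -> Collection:
        return self._collections[name]


article = Type("Article", {"Title": str, "Year": int},
               {"age": lambda obj, current_year: current_year - obj["Year"]})
instance = RepositoryInstance(Collection("Articles", article))
instance["Articles"].add({"Title": "Typing OpenDLib", "Year": 2006})
```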
Typed Repository Service
[Diagram layers: Application, containing a Component on Articles and a Component on Notes and Refs / System Interface / Typed Information Object Model (Type algebra), with the Type/Collection pairs Articles, Notes, and Refs / Typed Repository Service, with DO1 (A), DO2 (N), and DO3 (R)]
DL Type Algebra
A Type is characterized by:
A (possibly empty) set of type properties, i.e. attributes that depend on the Type features
A (possibly empty) set of Metadata Fields (MF) describing all objects of the Type, to be defined by the DL Designer
A Collection of a given Type offers the primitives (methods) to
Search objects according to type properties
Search objects according to the MF
Add objects to and delete objects from the Collection
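A minimal Python sketch of these Collection primitives, assuming each object carries two separate maps: type properties fixed by the Type (here a hypothetical `format` property) and the Metadata Fields chosen by the DL Designer. All names are illustrative. Because both sets of names are known from the Type, a search on a name the Type does not define can be rejected instead of silently returning nothing.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, List


@dataclass
class InfoObject:
    oid: str
    properties: Dict[str, Any]  # type properties (assumed example: file format)
    mf: Dict[str, Any]          # Metadata Fields defined by the DL Designer


@dataclass
class TypedCollection:
    property_names: FrozenSet[str]
    mf_names: FrozenSet[str]
    objects: Dict[str, InfoObject] = field(default_factory=dict)

    def add(self, obj: InfoObject) -> None:
        if set(obj.properties) != self.property_names or set(obj.mf) != self.mf_names:
            raise TypeError(f"object {obj.oid} does not match the Collection's Type")
        self.objects[obj.oid] = obj

    def delete(self, oid: str) -> None:
        del self.objects[oid]

    def search_by_property(self, name: str, value: Any) -> List[str]:
        if name not in self.property_names:
            raise KeyError(f"{name} is not a property of this Type")
        return sorted(o.oid for o in self.objects.values() if o.properties[name] == value)

    def search_by_mf(self, name: str, value: Any) -> List[str]:
        if name not in self.mf_names:
            raise KeyError(f"{name} is not a Metadata Field of this Type")
        return sorted(o.oid for o in self.objects.values() if o.mf[name] == value)


articles = TypedCollection(frozenset({"format"}), frozenset({"Title", "Year"}))
articles.add(InfoObject("a1", {"format": "PDF"}, {"Title": "T1", "Year": 2006}))
articles.add(InfoObject("a2", {"format": "HTML"}, {"Title": "T2", "Year": 2005}))
```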
DL Type Algebra
Coll ::= Name = Type, Coll
| Name = Virtual(Q, Name), Coll
|
Type ::= Raw(MF, FileFormats, behaviors)
| Relation(MF, Type1, Type2, [1:1|1:n|n:n])
| Aggregation(MF, Type)
| Union(Name_1, …, Name_n)
| RawView(MF, FileFormats, behaviors)
| Name
| Others
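One way to read this grammar is as an algebraic data type. The Python sketch below is an assumption-laden encoding, not the authors' implementation: MF, file formats, and behaviors are reduced to tuples of names, a Virtual query `Q` is kept as an opaque string, `Others` is omitted, and the algebra's `Union` is a class unrelated to `typing.Union`. Because a schema is plain data, it can be checked before any object is stored; the hypothetical helper `undefined_names` reports Collection Names that are referenced but never defined.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple
from typing import Union as TypingUnion

Names = Tuple[str, ...]


@dataclass(frozen=True)
class Raw:
    mf: Names
    file_formats: Names = ()
    behaviors: Names = ()


@dataclass(frozen=True)
class RawView:
    mf: Names
    file_formats: Names = ()
    behaviors: Names = ()


@dataclass(frozen=True)
class Name:
    name: str  # reference to another Collection


@dataclass(frozen=True)
class Relation:
    mf: Names
    type1: "TypeExpr"
    type2: "TypeExpr"
    cardinality: str

    def __post_init__(self) -> None:
        if self.cardinality not in ("1:1", "1:n", "n:n"):
            raise ValueError(f"unknown cardinality {self.cardinality!r}")


@dataclass(frozen=True)
class Aggregation:
    mf: Names
    member: "TypeExpr"


@dataclass(frozen=True)
class Union:
    names: Names  # Union(Name_1, ..., Name_n)


@dataclass(frozen=True)
class Virtual:
    query: str   # Q, kept opaque in this sketch
    source: str  # Name of the Collection being queried


TypeExpr = TypingUnion[Raw, RawView, Name, Relation, Aggregation, Union]


def undefined_names(schema: Dict[str, TypingUnion[TypeExpr, Virtual]]) -> List[str]:
    """Return the Collection Names referenced in the schema but not defined in it."""
    missing = set()

    def visit(t) -> None:
        if isinstance(t, Name):
            refs: Names = (t.name,)
        elif isinstance(t, Union):
            refs = t.names
        elif isinstance(t, Virtual):
            refs = (t.source,)
        else:
            refs = ()
        missing.update(r for r in refs if r not in schema)
        if isinstance(t, Relation):
            visit(t.type1)
            visit(t.type2)
        elif isinstance(t, Aggregation):
            visit(t.member)

    for definition in schema.values():
        visit(definition)
    return sorted(missing)


schema = {
    "Articles": Raw(("Title", "Author", "Year"), ("PDF",)),
    "Notes": Raw(("Date", "Text", "Author")),
    "Anns": Relation((), Name("Articles"), Name("Notes"), "n:n"),
    "Recent": Virtual("Year > 2005", "Articles"),
    "Items": Union(("Articles", "Books")),
}
```

Here `undefined_names(schema)` flags `Books`, which the `Items` union references but no Collection defines.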
Raw type: “ground” objects
Raw(MF, file formats, behaviors(in, out))
Object methods
Update MF
Upload manifestation/change link
Update behaviors
Class methods
Search by MF
Search by full text
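A minimal Python sketch of a Raw Collection under stated assumptions: manifestations are already-extracted plain text tagged with a file format, a behavior is modeled as a function from manifestation text to an output string (standing in for behaviors(in, out)), and all class and method names are hypothetical. Object methods act on one object; class methods act on the whole Collection.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Behavior = Callable[[str], str]  # assumed: manifestation text in, output text out


@dataclass(frozen=True)
class RawType:
    mf_names: Tuple[str, ...]
    file_formats: Tuple[str, ...]


@dataclass
class RawObject:
    of_type: RawType
    mf: Dict[str, str] = field(default_factory=dict)
    manifestation: Tuple[str, str] = ("", "")  # (file format, extracted text)
    behaviors: Dict[str, Behavior] = field(default_factory=dict)

    # Object methods
    def update_mf(self, **values: str) -> None:
        unknown = set(values) - set(self.of_type.mf_names)
        if unknown:
            raise KeyError(f"not Metadata Fields of this type: {sorted(unknown)}")
        self.mf.update(values)

    def upload_manifestation(self, file_format: str, text: str) -> None:
        if file_format not in self.of_type.file_formats:
            raise ValueError(f"file format {file_format} not allowed by the type")
        self.manifestation = (file_format, text)

    def update_behavior(self, name: str, behavior: Behavior) -> None:
        self.behaviors[name] = behavior

    def run(self, name: str) -> str:
        return self.behaviors[name](self.manifestation[1])


class RawCollection:
    def __init__(self, of_type: RawType) -> None:
        self.of_type = of_type
        self.objects: List[RawObject] = []

    def new(self, **mf: str) -> RawObject:
        obj = RawObject(self.of_type)
        obj.update_mf(**mf)
        self.objects.append(obj)
        return obj

    # Class methods
    def search_by_mf(self, name: str, value: str) -> List[RawObject]:
        return [o for o in self.objects if o.mf.get(name) == value]

    def search_full_text(self, word: str) -> List[RawObject]:
        word = word.lower()
        return [o for o in self.objects if word in o.manifestation[1].lower().split()]


articles = RawCollection(RawType(("Title", "Author"), ("PDF",)))
doc = articles.new(Title="Typing OpenDLib Repository Service", Author="Candela")
doc.upload_manifestation("PDF", "Typed repository services for digital libraries")
doc.update_behavior("word_count", lambda text: str(len(text.split())))
```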
Relation Type: “association” objects
Relation(MF, T1, T2, [1:1|1:n|n:n])
Object methods
Update MF
Update the two related objects
Get the two related objects
Class methods
Add and Delete Relation Objects
Search by MF
Search objects related to a given object
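A minimal Python sketch of a Relation Collection, assuming objects are identified by hypothetical string ids and that 1:n means each T1 object may relate to many T2 objects while each T2 object relates to at most one T1 object. The cardinality is enforced when a relation object is added or relinked, something a type-less store has no basis to do.

```python
from typing import Any, Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (T1 object id, T2 object id)


class RelationCollection:
    """Sketch of Relation(MF, T1, T2, [1:1|1:n|n:n]); each pair carries its own MF."""

    def __init__(self, cardinality: str) -> None:
        if cardinality not in ("1:1", "1:n", "n:n"):
            raise ValueError(f"unknown cardinality {cardinality!r}")
        self.cardinality = cardinality
        self._pairs: Dict[Pair, Dict[str, Any]] = {}

    def _partners(self, obj: str, side: int) -> Set[str]:
        """Objects currently related to obj, where obj sits at position `side` of a pair."""
        return {pair[1 - side] for pair in self._pairs if pair[side] == obj}

    # Class methods
    def add(self, left: str, right: str, **mf: Any) -> None:
        if self.cardinality in ("1:1", "1:n") and self._partners(right, 1) - {left}:
            raise ValueError(f"{right} is already related to another T1 object")
        if self.cardinality == "1:1" and self._partners(left, 0) - {right}:
            raise ValueError(f"{left} is already related to another T2 object")
        self._pairs[(left, right)] = dict(mf)

    def delete(self, left: str, right: str) -> None:
        del self._pairs[(left, right)]

    def search_by_mf(self, name: str, value: Any) -> List[Pair]:
        return sorted(p for p, mf in self._pairs.items() if mf.get(name) == value)

    def related_to(self, obj: str) -> List[str]:
        return sorted(self._partners(obj, 0) | self._partners(obj, 1))

    # Object methods (a relation object is addressed by its pair)
    def update_mf(self, pair: Pair, **mf: Any) -> None:
        self._pairs[pair].update(mf)

    def relink(self, pair: Pair, new_left: str, new_right: str) -> None:
        """Update the two related objects, keeping the MF; roll back on violation."""
        mf = self._pairs.pop(pair)
        try:
            self.add(new_left, new_right, **mf)
        except ValueError:
            self._pairs[pair] = mf
            raise


leads = RelationCollection("1:n")
leads.add("m1", "p1", role="lead")
leads.add("m1", "p2")
```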
Aggregation Type
A = Aggregation(MF, MF’ + ordering, B)
[Diagram: objects of type A(MF) linked to objects of type B through a hidden Relation Type used by the Aggregation, carrying MF’ and an ordering number]
Object methods
Add objects of B to, or remove them from, the aggregation
Get aggregated objects
Search through aggregated objects: by MF or by ordering
Class methods
Add and Delete Aggregation Objects
Search by MF
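A minimal Python sketch of an Aggregation Collection, assuming the hidden Relation is stored as a list of member entries, each holding the aggregated object's id, its ordering number, and its MF’ values. The names and the photo-album data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Member:
    """One entry of the hidden Relation: aggregated object, ordering number, MF'."""
    oid: str
    order: int
    mf: Dict[str, Any]


@dataclass
class AggregationObject:
    mf: Dict[str, Any]
    members: List[Member] = field(default_factory=list)

    # Object methods
    def add(self, oid: str, order: int, **member_mf: Any) -> None:
        if any(m.oid == oid for m in self.members):
            raise ValueError(f"{oid} is already aggregated")
        self.members.append(Member(oid, order, dict(member_mf)))
        self.members.sort(key=lambda m: m.order)  # keep members in ordering-number order

    def remove(self, oid: str) -> None:
        self.members = [m for m in self.members if m.oid != oid]

    def aggregated(self) -> List[str]:
        return [m.oid for m in self.members]

    def search_by_order(self, low: int, high: int) -> List[str]:
        return [m.oid for m in self.members if low <= m.order <= high]

    def search_by_mf(self, name: str, value: Any) -> List[str]:
        return [m.oid for m in self.members if m.mf.get(name) == value]


class AggregationCollection:
    def __init__(self) -> None:
        self._objects: Dict[str, AggregationObject] = {}

    # Class methods
    def add(self, aid: str, **mf: Any) -> AggregationObject:
        self._objects[aid] = AggregationObject(dict(mf))
        return self._objects[aid]

    def delete(self, aid: str) -> None:
        del self._objects[aid]

    def search_by_mf(self, name: str, value: Any) -> List[str]:
        return sorted(a for a, o in self._objects.items() if o.mf.get(name) == value)


albums = AggregationCollection()
album = albums.add("album1", Title="Workshop photos")
album.add("photo2", 2, caption="Talk")
album.add("photo1", 1, caption="Arrival")
```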
Example: Annotations to Articles
Articles = Raw(<Title, Author, Year>, PDF)
Notes = Raw(<Date, Text, Author>)
Anns = Relation(Articles, Notes, [n:n])
Applications can
Add and delete article, note, and annotation objects
Given an article object A, reach its notes through Anns.getRelated(A)
Search all notes inserted in a given period through Notes.search(“Date between x and y”)
The store can
Create specific indices for each MF format
Create a full-text index for PDFs
Find the best way to compress PDFs and the available MF formats
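The example can be followed end to end with a small in-memory Python sketch. It is hypothetical throughout: `RawColl` and `RelationColl` stand in for Raw and Relation Collections, `get_related` mirrors `Anns.getRelated`, PDF manifestations and deletion are omitted for brevity, and the note data is invented. It also shows one of the store-side optimizations listed above: since the type declares the MF in advance, the store can keep a sorted index per Metadata Field and answer a date-range query by binary search.

```python
import bisect
from datetime import date
from typing import Any, Dict, List, Set, Tuple


class RawColl:
    """Raw Collection keeping one sorted index per Metadata Field."""

    def __init__(self, mf_names: List[str]) -> None:
        self.mf_names = tuple(mf_names)
        self.objects: Dict[str, Dict[str, Any]] = {}
        # MF name -> (sorted values, object ids in the same order)
        self.indices: Dict[str, Tuple[List[Any], List[str]]] = {
            name: ([], []) for name in self.mf_names}

    def add(self, oid: str, **mf: Any) -> None:
        if set(mf) != set(self.mf_names):
            raise TypeError(f"{oid}: expected exactly the MF {self.mf_names}")
        self.objects[oid] = mf
        for name, value in mf.items():
            keys, oids = self.indices[name]
            pos = bisect.bisect_right(keys, value)
            keys.insert(pos, value)
            oids.insert(pos, oid)

    def search_between(self, name: str, low: Any, high: Any) -> List[str]:
        """Objects whose MF `name` lies in [low, high], found by binary search."""
        keys, oids = self.indices[name]
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return sorted(oids[start:end])


class RelationColl:
    """n:n Relation between two Raw Collections."""

    def __init__(self, left: RawColl, right: RawColl) -> None:
        self.left, self.right = left, right
        self.pairs: Set[Tuple[str, str]] = set()

    def add(self, left_id: str, right_id: str) -> None:
        if left_id not in self.left.objects or right_id not in self.right.objects:
            raise KeyError((left_id, right_id))
        self.pairs.add((left_id, right_id))

    def get_related(self, left_id: str) -> List[str]:
        return sorted(n for (a, n) in self.pairs if a == left_id)


articles = RawColl(["Title", "Author", "Year"])
notes = RawColl(["Date", "Text", "Author"])
anns = RelationColl(articles, notes)

articles.add("a1", Title="Typing OpenDLib Repository Service", Author="Candela", Year=2006)
notes.add("n1", Date=date(2006, 9, 21), Text="Compare with Prototypes", Author="reader1")
notes.add("n2", Date=date(2006, 10, 2), Text="Check T-DoMDL", Author="reader2")
anns.add("a1", "n1")
anns.add("a1", "n2")
```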
Towards DL Systems
MF mappings: managed by administrators
Behavior management: managed by administrators
Consequences for storage optimization?
Limited to file manipulators, or extended to e.g. Web Services?
OAI-PMH publishing, harvesting, and aggregation
Store distribution and organization
Object navigation
Include objects as values for Metadata Fields?
Query language?
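MF mappings are what make OAI-PMH publishing possible, since OAI-PMH requires every repository to disseminate at least unqualified Dublin Core (the `oai_dc` format). A minimal Python sketch, assuming a hypothetical administrator-defined mapping from the Articles MF of the earlier example to Dublin Core elements; the `xsi:schemaLocation` attribute of a complete `oai_dc` record is left out for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping, maintained by an administrator: Articles MF -> DC element.
ARTICLES_TO_DC = {"Title": "title", "Author": "creator", "Year": "date"}


def to_oai_dc(mf: Dict[str, Any], mapping: Dict[str, str]) -> str:
    """Render one object's MF as an oai_dc record, skipping unmapped or missing fields."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for field_name, dc_element in mapping.items():
        if field_name in mf:
            ET.SubElement(root, f"{{{DC_NS}}}{dc_element}").text = str(mf[field_name])
    return ET.tostring(root, encoding="unicode")


record = to_oai_dc({"Title": "Typing OpenDLib Repository Service",
                    "Author": "Candela", "Year": 2006}, ARTICLES_TO_DC)
```

Keeping the mapping as data, separate from the typed Collection, lets an administrator change how objects are published without touching the Collection's type.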
Towards DL Systems
[Diagram layers: Application, containing a Component on Articles and a Component on Notes and Refs / System Interface / Typed Information Object Model (Type algebra), with the Collections Articles, Notes, and Refs, plus MF mappings, Behaviors, and OAI-PMH publishing, harvesting, and aggregating / Typed Repository Service, with DO1 (A), DO2 (N), and DO3 (R)]
OpenDLib Repository Service
Rich Document Model: DoMDL
Repository Service tailored to DoMDL
Repository Services
Can be configured, thanks to T-DoMDL, to handle objects that conform to a specific subset of DoMDL
Export DoMDL information objects
Light T-DoMDL
Coll ::= Name = Vs, Coll
| Name = Virtual(Q, Name), Coll
|
Vs ::= Version(A) | A
A ::= Aggregation(T_1, …, T_n)
T ::= Raw[file formats] | A
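Because Light T-DoMDL is small, its type productions can be checked by a short recursive-descent recognizer. The Python sketch below covers only Vs, A, and T (not the Coll list), assumes that an Aggregation has at least one member and a Raw type at least one file format, and uses a hypothetical concrete syntax read directly off the grammar. It makes one consequence of the grammar explicit: a Collection's type is always an Aggregation, possibly wrapped in Version, so a bare Raw type is rejected at the top level.

```python
import re
from typing import List, Optional, Tuple

KEYWORDS = {"Version", "Aggregation", "Raw"}


class LightTDoMDLParser:
    """Recursive-descent recognizer for Vs ::= Version(A) | A,
    A ::= Aggregation(T1, ..., Tn), T ::= Raw[file formats] | A."""

    def __init__(self, text: str) -> None:
        # Identifiers, punctuation, or any other single non-space character (an error).
        self.tokens = re.findall(r"[A-Za-z][A-Za-z0-9_]*|[()\[\],]|\S", text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token: str) -> None:
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def parse(self) -> Tuple:
        tree = self.vs()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing token {self.peek()!r}")
        return tree

    def vs(self) -> Tuple:  # Vs ::= Version(A) | A
        if self.peek() == "Version":
            self.expect("Version")
            self.expect("(")
            inner = self.aggregation()
            self.expect(")")
            return ("Version", inner)
        return self.aggregation()

    def aggregation(self) -> Tuple:  # A ::= Aggregation(T1, ..., Tn)
        self.expect("Aggregation")
        self.expect("(")
        members = self.comma_list(self.t)
        self.expect(")")
        return ("Aggregation", members)

    def t(self) -> Tuple:  # T ::= Raw[file formats] | A
        if self.peek() == "Raw":
            self.expect("Raw")
            self.expect("[")
            formats = self.comma_list(self.file_format)
            self.expect("]")
            return ("Raw", formats)
        return self.aggregation()

    def file_format(self) -> str:
        token = self.peek()
        if token is None or not token[0].isalpha() or token in KEYWORDS:
            raise SyntaxError(f"expected a file format, found {token!r}")
        self.pos += 1
        return token

    def comma_list(self, item) -> List:
        items = [item()]
        while self.peek() == ",":
            self.expect(",")
            items.append(item())
        return items
```

For instance, `Version(Aggregation(Raw[PDF], Aggregation(Raw[JPEG, PNG])))` is accepted, while `Raw[PDF]` and `Aggregation(Version(...))` are rejected, since Version may only wrap the top-level Aggregation.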
Conclusions and future issues
Motivation: Digital Libraries call for Systems
Experiment: implementing T-DoMDL in the OpenDLib Repository Service
Next steps:
Support the full type algebra
Explore query languages and storage optimization
Experiment Repository Development
Future: towards fully-fledged DL Systems
Preservation
OAI-PMH Harvesting and publishing
User Rights Management (Collections)
More…