1st European Workshop on the Use of Digital Object Repository Systems in Digital Libraries (DORSDL), in conjunction with ECDL2006
Typing OpenDLib Repository Service: Strengths of an Information Object Type Language
Leonardo Candela, Donatella Castelli, Paolo Manghi, Pasquale Pagano
Centro Nazionale delle Ricerche, Pisa, Italy
DORSDL Workshop - 21st of September, 2006
DB Systems: realizing a DB Application
[Diagram: Application; System Interface; Typed Data Model (Type Language); DBMS]
DB Systems: type definition
[Diagram: Application with the types Managers and Projects; System Interface; Typed Data Model (Type Language); DBMS]
DB Systems: storage creation
[Diagram: Application with the types Projects and Managers; System Interface; Typed Data Model (Type Language); DBMS holding M and P]
DB Systems: Application Usage
[Diagram: Application with a Component on Managers and Projects; types Projects and Managers; System Interface; Typed Data Model (Type Language); DBMS holding M and P]
DB Systems
[Diagram: Application with a Component on Managers and Projects; types Projects and Managers; System Interface; Typed Data Model (Type Language); DBMS holding D1, M, D2, P]
DB Systems: additions
[Diagram: Application with a Component on Budgets and a Component on Managers and Projects; types Projects, Budgets, and Managers; System Interface; Typed Data Model (Type Language); DBMS holding D1, M, D2, P, D3, B]
Relational DB System
[Diagram: Application with a Component on Managers and Projects and a Component on Budgets; tables Managers, Projects, and Budgets; System Interface; Relational Model (SQL schema); Relational DBMS holding D1, TabM, D2, TabP, D3, TabB]
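As a rough illustration of the relational case, the three tables could be declared through an SQL schema along the following lines. This is only a sketch using Python's built-in sqlite3 module. The slides name the tables Managers, Projects, and Budgets; every column, the sample rows, and the helper insert_is_rejected are assumptions made for the example.

```python
import sqlite3

# In-memory database standing in for the relational DBMS in the diagram.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Table names follow the slide; every column below is an assumption.
conn.executescript("""
CREATE TABLE Managers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE Projects (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    manager_id INTEGER NOT NULL REFERENCES Managers(id)
);
CREATE TABLE Budgets (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES Projects(id),
    amount     REAL NOT NULL
);
""")

conn.execute("INSERT INTO Managers (id, name) VALUES (1, 'manager-1')")
conn.execute("INSERT INTO Projects (id, title, manager_id) VALUES (1, 'project-1', 1)")


def insert_is_rejected(sql, params=()):
    """Return True when the DBMS refuses a row that violates the schema."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False
```

With such a schema the integrity checks live in the DBMS rather than in each component: a project pointing at a non-existent manager, or a budget without an amount, is refused by the store itself.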
Typed Data Models: advantages

Application development and maintenance
- Functionality and content are kept independent from each other
- Type correctness: components must be type-conformant
- Modularity
- Reuse: component-wise and data-wise
Typed Data Models: advantages

Type-driven physical storage
- Data integrity: data can be handled only according to their associated structure
- Type information can be exploited to optimize storage space and access time
DL Applications and (Type-less) Repository Services
- DL Applications are built exploiting Repository Services
- Repository Services concentrate on the physical management of information objects
- Based on a Type-less Information Object Model
- Offer a set of primitives to
  - Manage an Information Space of information objects: add, delete, update, search
  - Manage metadata records: efficient storing (XML), indexing, mapping, harvesting, publishing, etc.
  - Extra features: behaviors, communities, users…
- Historical reasons:
  - Originally DLs were flat catalogues of file-metadata pairs, or of metadata only
Gaps of Type-less Repository Services

- DL Applications must “encode”
  - The notion of typed collection of information objects, seen as a collection of objects with the same structure; Prototypes address this problem (K. Saidis et al., ECDL2006)
  - The notion of methods (functionality) of the objects of a given typed collection
- Applications are hard to write, maintain, reuse, and extend
- No type safety and data integrity
  - The objects in the store are not aware of their type
  - All the advantages of DB Systems are lost…
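To make the gap concrete, here is a minimal sketch of what "encoding" a typed collection on top of a type-less store can look like. TypelessRepository, its methods, and the "kind" field are all hypothetical names invented for the illustration; they do not describe any specific Repository Service.

```python
class TypelessRepository:
    """Hypothetical type-less store: objects are opaque metadata dictionaries."""

    def __init__(self):
        self._objects = {}
        self._next_id = 1

    def add(self, metadata):
        oid = self._next_id
        self._next_id += 1
        self._objects[oid] = dict(metadata)
        return oid

    def get(self, oid):
        return self._objects[oid]

    def search(self, predicate):
        return [oid for oid, md in self._objects.items() if predicate(md)]


repo = TypelessRepository()

# The application encodes collection membership in an ad hoc "kind" field.
repo.add({"kind": "article", "Title": "t1", "Year": 2006})
repo.add({"kind": "note", "Text": "n1"})
# A misspelled field is accepted: the store has no type to check it against.
repo.add({"kind": "article", "Titel": "t2"})

articles = repo.search(lambda md: md.get("kind") == "article")
untitled = [oid for oid in articles if "Title" not in repo.get(oid)]
```

Every component has to repeat the "kind" convention and its own structure checks, which is the maintenance and integrity cost the slide points at.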
Gaps of Type-less Repository Services
[Diagram: Application with a Component on Objects and Rep primitives and a Component: data, Typed Collections and methods; System Interface; Information Object Model; Objects; Repository Service]
Things have changed…
DL Applications are becoming common
- DL-specific issues arise, regarding both information spaces and functionalities
- Need for systematic approaches, in order to maximize reuse and minimize effort
- Systems offering DL-specific, customizable, and optimized functionalities to DL Application designers and developers
Things have changed…
Information Space

Towards richer Information Object Models
- Collections of Complex objects: MF, files, relations, and behaviors
- Structured Objects: objects as a compound of other objects, e.g. photo albums
- Dynamic Objects: dynamically created content
- Object features: provenance and preservation
- Users-Objects relationship: copyrights, access rights, authentication, etc.
- Others…
Things have changed…
Functionalities

Towards system primitives
- User profiling
- User recommendations
- Object Versioning
- OAI-PMH Harvesting
- Virtual Object Collection management
- Others…
Our goal

Design and develop a Typed Repository Service, along the lines of DB Systems

Typed Information Object Model (OO)             Relational Model
Type algebra                                    Relational Algebra
Collection <Type, Set of information objects>   Table <Structure, Set of Records>
Typed Repository Service





- A Type defines a set of objects with the same structure and the operations (methods) that can be applied to them
- A Collection is a named set of objects defined according to the type assigned to the Collection
- A Repository Service Instance is a set of Collections
- A Repository Service “exposes” to Application components all Collections defined in its active Instance
- Applications can manage, search, and manipulate the objects of a Collection according to the methods (functionalities) exposed by the corresponding type
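The definitions above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the OpenDLib design: ObjectType, Collection, RepositoryInstance and their methods are hypothetical names, and a type is reduced here to a fixed set of metadata fields.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectType:
    """A type: the structure shared by all objects of a Collection."""
    name: str
    metadata_fields: tuple

    def conforms(self, obj):
        # An object conforms when it carries exactly the declared fields.
        return set(obj) == set(self.metadata_fields)


@dataclass
class Collection:
    """A named set of objects, all conforming to the Collection's type."""
    name: str
    object_type: ObjectType
    objects: list = field(default_factory=list)

    def add(self, obj):
        if not self.object_type.conforms(obj):
            raise TypeError(f"object does not conform to type {self.object_type.name}")
        self.objects.append(dict(obj))

    def search(self, **criteria):
        return [o for o in self.objects
                if all(o.get(k) == v for k, v in criteria.items())]


class RepositoryInstance:
    """A Repository Service Instance: a set of Collections."""

    def __init__(self):
        self._collections = {}

    def define(self, name, object_type):
        self._collections[name] = Collection(name, object_type)
        return self._collections[name]

    def exposed(self):
        # Names of the Collections visible to Application components.
        return sorted(self._collections)


instance = RepositoryInstance()
articles = instance.define("Articles", ObjectType("Article", ("Title", "Author", "Year")))
articles.add({"Title": "t1", "Author": "a1", "Year": 2006})
```

The point of the sketch is that the conformance check sits in the Collection, so an application component cannot store an object whose structure differs from the declared type.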
Typed Repository Service
[Diagram: Application with a Component on Notes and Refs and a Component on Articles; Collections Articles, Notes, and Refs; System Interface; Typed Information Object Model (Type algebra); Typed Repository Service holding DO1, A, DO2, N, DO3, R; legend: Type, Collection]
DL Type Algebra

A Type is characterized by:
- A (possibly empty) set of type properties, i.e. attributes that depend on the Type features
- A (possibly empty) set of Metadata Fields (MF) describing all objects of the Type, to be defined by the DL Designer

A Collection of a given Type offers the primitives (methods) to
- Search objects according to type properties
- Search objects according to the MF
- Add and Delete objects into and from the Collection
DL Type Algebra
Coll ::= Name = Type, Coll
| Name = Virtual(Q, Name)
|
Type ::= Raw(MF, FileFormats, behaviors)
| Relation(MF, Type1, Type2, [1:1|1:n|n:n])
| Aggregation(MF, Type)
| Union(Name1,…,Namen)
| RawView(MF, FileFormats, behaviors)
| Name
| Others
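One possible way to read this grammar is as an abstract syntax tree. The sketch below models a subset of it (Raw, Relation, Aggregation, Union, and Name; RawView and Virtual are left out for brevity) with Python dataclasses. All class and function names are assumptions chosen for the example, as is the check that every referenced Name is defined in the set of Collections.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Raw:
    mf: tuple
    file_formats: tuple = ()
    behaviors: tuple = ()


@dataclass(frozen=True)
class Name:
    ref: str  # reference to a Collection defined elsewhere


@dataclass(frozen=True)
class Relation:
    mf: tuple
    left: object
    right: object
    cardinality: str  # "1:1", "1:n" or "n:n"


@dataclass(frozen=True)
class Aggregation:
    mf: tuple
    part: object


@dataclass(frozen=True)
class UnionOf:
    names: tuple


def referenced_names(t):
    """Collect every Collection name a type expression refers to."""
    if isinstance(t, Name):
        return {t.ref}
    if isinstance(t, Relation):
        return referenced_names(t.left) | referenced_names(t.right)
    if isinstance(t, Aggregation):
        return referenced_names(t.part)
    if isinstance(t, UnionOf):
        return set(t.names)
    return set()  # Raw refers to no other Collection


def undefined_names(colls):
    """Names used in type expressions but not defined as Collections."""
    used = set()
    for t in colls.values():
        used |= referenced_names(t)
    return used - set(colls)


colls = {
    "Articles": Raw(("Title", "Author", "Year"), ("PDF",)),
    "Notes": Raw(("Date", "Text", "Author")),
    "Anns": Relation((), Name("Articles"), Name("Notes"), "n:n"),
}
```

A store could run a check like undefined_names before accepting a set of Collection definitions, much as a DBMS validates a schema.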
Raw type: “ground” objects

Raw(MF, file formats, behaviors(in, out))
- Object methods
  - Update MF
  - Upload manifestation/change link
  - Update behaviors
- Class methods
  - Search by MF
  - Search by full-text

Relation Type: “association” objects

Relation(MF, T1, T2, [1:1 | 1:n | n:n])
- Object methods
  - Update MF
  - Update the two related objects
  - Get the two related objects
- Class methods
  - Add and Delete Relation Objects
  - Search by MF
  - Search objects related to a given object
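A minimal sketch of how a Relation collection might enforce its cardinality and answer "objects related to a given object". RelationCollection and its methods are hypothetical names. The reading of 1:n as "one T1 object to many T2 objects" is an assumption of this example, not something the slides state.

```python
class RelationCollection:
    """Hypothetical collection of relation objects between T1 and T2 objects."""

    def __init__(self, cardinality):
        if cardinality not in ("1:1", "1:n", "n:n"):
            raise ValueError(f"unknown cardinality {cardinality!r}")
        self.cardinality = cardinality
        self.pairs = []  # entries are (left, right, mf)

    def add(self, left, right, mf=None):
        lefts_of_right = [l for l, r, _ in self.pairs if r == right]
        rights_of_left = [r for l, r, _ in self.pairs if l == left]
        # 1:1 and 1:n: a T2 object is related to at most one T1 object.
        if self.cardinality in ("1:1", "1:n") and lefts_of_right:
            raise ValueError(f"{right!r} is already related")
        # 1:1 only: a T1 object is also related to at most one T2 object.
        if self.cardinality == "1:1" and rights_of_left:
            raise ValueError(f"{left!r} is already related")
        self.pairs.append((left, right, dict(mf or {})))

    def delete(self, left, right):
        self.pairs = [p for p in self.pairs if (p[0], p[1]) != (left, right)]

    def get_related(self, obj):
        """Objects related to obj, whichever side of the relation obj is on."""
        return ([r for l, r, _ in self.pairs if l == obj]
                + [l for l, r, _ in self.pairs if r == obj])
```

Because the cardinality is part of the type, the check happens once in the store instead of in every application component that creates relation objects.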

Aggregation Type
A = Aggregation(MF, MF’ + ordering, B)
[Diagram: A(MF) linked to B through a hidden Relation Type used by Aggregation, carrying MF’ and an ordering number]
- Object methods
  - Add, remove object of B from aggregation
  - Get aggregated objects
  - Search through aggregated objects: by MF or by ordering
- Class methods
  - Add and Delete Aggregation Objects
  - Search by MF
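The object methods of an aggregation can be sketched as follows, with the hidden relation kept as a list of (ordering, part, MF') entries. AggregationObject and its method names are hypothetical, and searching by ordering only (not by MF) is a simplification made for the example.

```python
class AggregationObject:
    """Hypothetical aggregation object A that groups objects of type B."""

    def __init__(self, mf):
        self.mf = dict(mf)
        # The hidden relation: (ordering number, aggregated object, MF').
        self._entries = []

    def add_part(self, part, ordering, mf_prime=None):
        self._entries.append((ordering, part, dict(mf_prime or {})))
        # Keep the entries sorted by their ordering number.
        self._entries.sort(key=lambda entry: entry[0])

    def remove_part(self, part):
        self._entries = [e for e in self._entries if e[1] != part]

    def parts(self):
        """Aggregated objects, in ordering-number order."""
        return [part for _, part, _ in self._entries]

    def parts_at(self, ordering):
        """Aggregated objects stored under a given ordering number."""
        return [part for o, part, _ in self._entries if o == ordering]


album = AggregationObject({"Title": "album-1"})
album.add_part("photo-b", ordering=2)
album.add_part("photo-a", ordering=1)
```

A photo album, the example of a structured object given earlier in the talk, fits this shape: the album is the aggregation object and the ordering number fixes the sequence of its photos.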
Example: Annotations to Articles
Articles = Raw(<Title, Author, Year>, PDF)
Notes = Raw(<Date, Text, Author>)
Anns = Relation(Articles, Notes, [n:n])

Applications can
- Add and delete article, note, and annotation objects
- Given an article object A, reach its notes through Anns.getRelated(A)
- Search all notes inserted in a given period through Notes.search(“Date between x and y”)

The store can
- Create specific indices for each MF format
- Create a full-text index for PDFs
- Find the best way to compress PDF and the available MF formats
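The two application calls in this example can be mimicked with plain Python data. The sketch assumes in-memory dictionaries for the Articles and Notes collections and a list of pairs for Anns. The functions get_related and search_notes_between stand in for Anns.getRelated(A) and Notes.search(“Date between x and y”), and all identifiers and sample values are invented.

```python
from datetime import date

# Articles = Raw(<Title, Author, Year>, PDF); the PDF itself is omitted here.
articles = {"A1": {"Title": "t1", "Author": "author-1", "Year": 2006}}

# Notes = Raw(<Date, Text, Author>)
notes = {
    "N1": {"Date": date(2006, 9, 21), "Text": "first note", "Author": "reader-1"},
    "N2": {"Date": date(2006, 10, 2), "Text": "second note", "Author": "reader-2"},
}

# Anns = Relation(Articles, Notes, [n:n]), kept as (article id, note id) pairs.
anns = [("A1", "N1"), ("A1", "N2")]


def get_related(article_id):
    """Stand-in for Anns.getRelated(A): the notes attached to an article."""
    return [note_id for art_id, note_id in anns if art_id == article_id]


def search_notes_between(start, end):
    """Stand-in for Notes.search: notes whose Date lies in [start, end]."""
    return sorted(note_id for note_id, mf in notes.items()
                  if start <= mf["Date"] <= end)
```

Since Date is declared as a Metadata Field of Notes, a typed store could build a date index for it up front, which is the kind of optimization listed under "The store can".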
Towards DL Systems

- MF mappings: managed by the administrators
  - Consequences for storage optimization?
- Behavior management: managed by administrators
  - Limited to file manipulators or more than that, i.e. Web Services?
- OAI-PMH publishing, harvesting, and aggregation
- Store distribution and organization
- Object navigation
  - Include objects as values for Metadata Fields?
  - Query language?
Towards DL Systems
[Diagram: Application with a Component on Articles and a Component on Notes and Refs; OAI-PMH publishing, harvesting, aggregating; MF mappings; Behaviors; Collections Articles, Notes, and Refs; System Interface; Typed Information Object Model (Type algebra); Typed Repository Service holding DO1, A, DO2, N, DO3, R]
OpenDLib Repository Service
- Rich Document Model: DoMDL
- Repository Service tailored to DoMDL
- Repository Services
  - Can be configured to handle objects that respect a specific subset of DoMDL, thanks to T-DoMDL
  - Export DoMDL information objects
Light T-DoMDL
Coll ::= Name = Vs, Coll
| Name = Virtual(Q,Name), Coll
|
Vs ::= Version(A) | A
A ::= Aggregation(T1,…,Tn)
T ::= Raw[file formats] | A
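The three productions Vs, A, and T can be checked mechanically. The sketch below encodes them as Python dataclasses with one recognizer per non-terminal. The class and function names are assumptions, and the Coll and Virtual productions are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Raw:
    file_formats: tuple


@dataclass(frozen=True)
class Aggregation:
    parts: tuple  # each part should itself be a T


@dataclass(frozen=True)
class Version:
    of: object  # should be an A


def is_t(node):
    """T ::= Raw[file formats] | A"""
    return isinstance(node, Raw) or is_a(node)


def is_a(node):
    """A ::= Aggregation(T1,…,Tn)"""
    return isinstance(node, Aggregation) and all(is_t(p) for p in node.parts)


def is_vs(node):
    """Vs ::= Version(A) | A"""
    if isinstance(node, Version):
        return is_a(node.of)
    return is_a(node)


# A versioned aggregation of a PDF and a nested aggregation of images.
document = Version(Aggregation((Raw(("pdf",)), Aggregation((Raw(("jpeg",)),)))))
```

Under this reading a Collection can never hold a bare Raw object or a Version of one: both have to be wrapped in an Aggregation first.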
Conclusions and future issues

- Motivation: Digital Libraries call for Systems
- Experiment
  - Implementing T-DoMDL in OpenDLib Repository Service
- Next steps
  - Support full type algebra
  - Exploring query languages and storage optimization
  - Experiment Repository Development
- Future: towards fully-fledged DL Systems
  - Preservation
  - OAI-PMH Harvesting and publishing
  - User Rights Management (Collections)
  - More…