DL-Application for the University Archive Jena
Ulrike Krönert, Mathias Hegner
FSU Jena
Overview
Project Goals
Creation and Use of the Archive
Archive Data Model
Archive Class Library
Current State of the Project
Outlook
Project Goals
Archive is part of UrMEL
Presentation of all FSU archive files
on the Internet?
Digitization of files for the Weimar
Classics Foundation
Workflow test, September to November 2000
Write software for creating,
exploiting, and searching the archive
Creation and Use of the Archive
Digitization of (selected) files
Description of the files by metadata, so that
files become searchable
Scientific exploitation (1): Creating
documents
Scientific exploitation (2): Creating
dossiers on themes
Recording the Archive
36,000,000 pages, 600 dpi, >100 years
– 360 Tbytes of lossless compressed data
– 12 Tbytes of highly compressed data
Digitization and microfilming in one step
Description of the file by hand
Automated page loader
– <holding name><file number><page
number>.<extension>
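The naming pattern and the storage figures above can be illustrated with a small sketch. The slides give neither field widths nor separators, so the zero-padded widths (6 digits for the file number, 4 for the page number), the class name `PageImageName`, and the holding name "BA" are assumptions made for illustration only. The per-page sizes follow from the slide's own totals, assuming decimal units.

```java
import java.util.Locale;

// Hypothetical helper; field widths are assumptions, not taken from the slides.
final class PageImageName {
    // Builds <holding name><file number><page number>.<extension>.
    static String build(String holding, int fileNumber, int pageNumber, String extension) {
        return String.format(Locale.ROOT, "%s%06d%04d.%s",
                holding, fileNumber, pageNumber, extension);
    }

    // Average size per page in Mbytes (decimal: 1 Tbyte = 1,000,000 Mbytes).
    static double mbytesPerPage(double totalTbytes, long pages) {
        return totalTbytes * 1_000_000.0 / pages;
    }

    public static void main(String[] args) {
        System.out.println(build("BA", 123, 7, "tif"));
        System.out.println(mbytesPerPage(360, 36_000_000L)); // lossless compressed
        System.out.println(mbytesPerPage(12, 36_000_000L));  // highly compressed
    }
}
```

Under these assumptions a lossless page image averages 10 Mbytes and a highly compressed one about a third of a Mbyte.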
Exploiting the Archive in Two
Steps
Step 1: Describing documents, selecting
pages into documents
Result: A file, until then a heap of pages,
becomes a folder of documents
Step 2: Summarizing documents, single
pages or even whole files
Result: Dossiers on selected themes
Data Model
Files containing documents
Documents containing pages
Pages containing the images (parts)
Dossiers containing files,
documents, pages
Generalized text search index for
files, documents, dossiers
Note part for each object
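The containment rules above could be modelled roughly as follows. This is a minimal in-memory sketch, not the project's class library: all class and field names are hypothetical, and rejecting nested dossiers is an assumption drawn from the slide listing only files, documents, and pages as dossier members.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; the slides state the containment rules only.
abstract class ArchiveItem {
    final String title;
    String note = "";   // "Note part for each object"
    ArchiveItem(String title) { this.title = title; }
}

final class PageItem extends ArchiveItem {
    final List<String> imageParts = new ArrayList<>();   // e.g. image file names
    PageItem(String title) { super(title); }
}

final class DocumentItem extends ArchiveItem {
    final List<PageItem> pages = new ArrayList<>();
    DocumentItem(String title) { super(title); }
}

final class FileItem extends ArchiveItem {
    final List<DocumentItem> documents = new ArrayList<>();
    FileItem(String title) { super(title); }
}

final class DossierItem extends ArchiveItem {
    // A dossier may reference files, documents, and pages alike.
    final List<ArchiveItem> members = new ArrayList<>();
    DossierItem(String title) { super(title); }

    void add(ArchiveItem item) {
        // Assumption: dossiers do not contain other dossiers.
        if (item instanceof DossierItem) {
            throw new IllegalArgumentException("dossiers hold files, documents, or pages");
        }
        members.add(item);
    }
}
```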
File Data Model
Necessary attributes: archive name,
holding name, file number
Additional attributes: origin, period,
size
Parts: file title, keywords, contents
note, remark
Text index for all attributes and parts
Folder: documents in file
Document Data Model
Attributes: page numbers, date,
document type, author
Parts: reference, remark, co-authors
Text index for all attributes and parts
Folder: pages in document
Help text index containing file pid
(for faster search)
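One plausible reading of the help text index is that each document's index entry also carries the pid of its parent file, so a search restricted to one file does not have to open that file's folder first. The sketch below illustrates that idea in memory; the class `DocumentIndexSketch`, its methods, and the pid values are hypothetical and do not reflect the digital library's actual index API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class DocumentIndexSketch {
    // One entry per document: searchable text plus the parent file's pid.
    private static final class Entry {
        final String documentPid;
        final String filePid;
        final String text;
        Entry(String documentPid, String filePid, String text) {
            this.documentPid = documentPid;
            this.filePid = filePid;
            this.text = text;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Indexes all attributes and parts of a document as one lower-cased text.
    void add(String documentPid, String filePid, String... attributesAndParts) {
        String text = String.join(" ", attributesAndParts).toLowerCase(Locale.ROOT);
        entries.add(new Entry(documentPid, filePid, text));
    }

    // Finds documents of one file that contain the term, using only the index.
    List<String> searchInFile(String filePid, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.filePid.equals(filePid) && e.text.contains(needle)) {
                hits.add(e.documentPid);
            }
        }
        return hits;
    }
}
```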
Page and Dossier Data Models
Page
– Page number as an attribute
– JPEG and TIFF parts
– Help text index: document pid
Dossier
– Dossier title as an attribute
– Text index containing dossier title
– Folder: objects in dossier
Class Library for Archive
Archive object
Object collection (*)
Browser session (*)
Servlet (FsuArchiv)
Help classes: access rights, archive
user
Archive Objects
Object type (file, document, page,
dossier)
Constructors for persistent and new
objects, destructor
open/close, add/delete
get/set attributes/parts
isFolder, hasParent, getIndexClass
get parent/ items/ dossiers
Object Collections
add object(s)
delete object(s)
sort objects (e.g. by page numbers)
manage collection pointer
get archive object(s)
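The collection operations listed above might look like the following sketch. It is a generic in-memory stand-in with hypothetical names; how the real class library stores objects or represents the collection pointer is not stated in the slides.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ObjectCollectionSketch<T> {
    private final List<T> objects = new ArrayList<>();
    private int pointer = 0;   // index of the current object

    void add(T object) { objects.add(object); }

    void addAll(List<T> more) { objects.addAll(more); }

    boolean delete(T object) {
        boolean removed = objects.remove(object);
        // Keep the pointer inside the valid range after a removal.
        if (pointer >= objects.size()) {
            pointer = Math.max(0, objects.size() - 1);
        }
        return removed;
    }

    // Sorts the collection (e.g. by page number) and resets the pointer.
    void sort(Comparator<? super T> order) {
        objects.sort(order);
        pointer = 0;
    }

    // Returns the object at the pointer, or null for an empty collection.
    T current() { return objects.isEmpty() ? null : objects.get(pointer); }

    // Advances the pointer; returns false when already at the last object.
    boolean next() {
        if (pointer + 1 >= objects.size()) {
            return false;
        }
        pointer++;
        return true;
    }

    List<T> getAll() { return new ArrayList<>(objects); }
}
```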
Browser Sessions
User management by DL (*)
login/auto logout (after an idle timeout)
manage access rights
get/set session properties, e.g.
session number
manage object collections in a
collections stack
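A session with auto logout and a stack of object collections could be sketched as below. The names are hypothetical, collections are reduced to lists of pids, and the clock is passed in explicitly so the timeout logic can be exercised without waiting; none of this is taken from the actual library.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class BrowserSessionSketch {
    private final int sessionNumber;
    private final long timeoutMillis;
    private long lastAccessMillis;
    // Most recently pushed collection is on top.
    private final Deque<List<String>> collections = new ArrayDeque<>();

    BrowserSessionSketch(int sessionNumber, long timeoutMillis, long nowMillis) {
        this.sessionNumber = sessionNumber;
        this.timeoutMillis = timeoutMillis;
        this.lastAccessMillis = nowMillis;
    }

    int getSessionNumber() { return sessionNumber; }

    boolean isExpired(long nowMillis) {
        return nowMillis - lastAccessMillis > timeoutMillis;
    }

    // Records activity; returns false (auto logout) if the session had expired.
    boolean touch(long nowMillis) {
        if (isExpired(nowMillis)) {
            collections.clear();
            return false;
        }
        lastAccessMillis = nowMillis;
        return true;
    }

    void pushCollection(List<String> pids) { collections.push(pids); }

    // Returns the top collection, or null if the stack is empty.
    List<String> currentCollection() { return collections.peek(); }

    // Removes and returns the top collection, or null if the stack is empty.
    List<String> popCollection() { return collections.poll(); }
}
```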
Servlet Class (FsuArchiv)
query: parametric/text/combined
show object attributes and metaparts
update object/ create new parts
select digitized page into document
add to/ remove from dossier
delete object
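The three query kinds can be illustrated with composable predicates. This is only an in-memory sketch under assumed names (`QuerySketch`, `Item`); the servlet's real queries run against the digital library and its text index, whose API is not shown in the slides.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

final class QuerySketch {
    // Minimal stand-in for an archive object: attributes plus indexed text.
    static final class Item {
        final Map<String, String> attributes;
        final String text;
        Item(Map<String, String> attributes, String text) {
            this.attributes = attributes;
            this.text = text;
        }
    }

    // Parametric query: exact match on one attribute.
    static Predicate<Item> parametric(String name, String value) {
        return item -> value.equals(item.attributes.get(name));
    }

    // Text query: case-insensitive substring match on the indexed text.
    static Predicate<Item> text(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return item -> item.text.toLowerCase(Locale.ROOT).contains(needle);
    }

    // Combined query: both conditions must hold.
    static Predicate<Item> combined(String name, String value, String term) {
        return parametric(name, value).and(text(term));
    }
}
```

Expressing each query kind as a predicate keeps the combined case a simple conjunction of the other two.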
Outlook
using SSL, encrypting passwords, etc.
fine-grained access rights for files
and dossiers
manage payment for archive use
multimedia layout with animations,
images, etc.
Ergonomics of the HTML pages