DL-Application for the University Archive Jena

Ulrike Krönert, Mathias Hegner
FSU Jena

Overview
• Project Goals
• Creation and Use of the Archive
• Archive Data Model
• Archive Class Library
• Current State of the Project
• Outlook

Project Goals
• The archive is part of UrMEL
• Presentation of all FSU archive files on the Internet?
• Digitization of files for the Weimar Classics Foundation
• Workflow test from September to November 2000
• Write software for creating, exploiting, and searching the archive

Creation and Use of the Archive

• Digitization of (selected) files
• Description of the files by metadata, so that files become searchable
• Scientific exploitation (1): creating documents
• Scientific exploitation (2): creating dossiers on themes

Recording the Archive

• 36,000,000 pages, 600 dpi, >100 years
  – 360 Tbyte of losslessly compressed data
  – 12 Tbyte of highly compressed data
• Digitization and microfilming in one step
• Description of the file by hand
• Automated page loader
  – <holding name><file number><page number>.<extension>
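
The naming scheme above concatenates three fields without separators, so a page loader needs some convention to split them again. The following is a minimal sketch under assumed conventions: a holding name made of letters, a four-digit file number, and a four-digit page number. The slides do not state the real field widths, and the function names and the sample holding name `BA` are hypothetical.

```python
import re

# Assumed layout: letters for the holding, then 4-digit file and page numbers.
# These widths are illustrative; the slides do not specify them.
PAGE_NAME = re.compile(
    r"^(?P<holding>[A-Za-z]+)(?P<file>\d{4})(?P<page>\d{4})\.(?P<ext>\w+)$"
)


def build_page_name(holding: str, file_no: int, page_no: int, ext: str) -> str:
    """Compose <holding name><file number><page number>.<extension>."""
    return f"{holding}{file_no:04d}{page_no:04d}.{ext}"


def parse_page_name(name: str) -> dict:
    """Split a page file name back into its fields, or raise ValueError."""
    match = PAGE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a page file name: {name!r}")
    return {
        "holding": match["holding"],
        "file_no": int(match["file"]),
        "page_no": int(match["page"]),
        "ext": match["ext"],
    }
```

With fixed widths the loader can attach each scanned image to its file and page without any manual input, which is presumably what makes the loading step automatable.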

Exploiting the Archive in Two Steps

• Step 1: Describing documents, selecting pages into documents
• Result: a file as a heap of pages becomes a folder of documents
• Step 2: Summarizing documents, single pages, or even whole files
• Result: dossiers on selected themes
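
The two steps can be pictured with plain Python containers. This is only an illustrative sketch: the function names, the use of inclusive page ranges, and the sample titles are assumptions, not part of the project's software.

```python
def select_into_documents(page_numbers, ranges):
    """Step 1: turn a file (a heap of pages) into a folder of documents.

    `ranges` maps a document title to an inclusive (first, last) page range.
    """
    pages = set(page_numbers)
    documents = {}
    for title, (first, last) in ranges.items():
        documents[title] = [p for p in range(first, last + 1) if p in pages]
    return documents


def build_dossier(theme, *items):
    """Step 2: collect documents, single pages, or whole files under a theme."""
    return {"theme": theme, "items": list(items)}


file_pages = [1, 2, 3, 4, 5]
folder = select_into_documents(file_pages, {"Letter": (1, 2), "Minutes": (3, 5)})
dossier = build_dossier("Appointments", ("document", "Letter"), ("page", 4))
```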

Data Model
• Files containing documents
• Documents containing pages
• Pages containing the images (parts)
• Dossiers containing files, documents, pages
• Generalized text search index for files, documents, dossiers
• Note part for each object
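
A generalized text index of this kind can be pictured as one inverted index shared by all searchable object types. The sketch below is a toy stand-in, not the digital library's actual index: the class name `TextIndex`, whitespace tokenization, and the `(object type, pid)` posting format are assumptions.

```python
from collections import defaultdict


class TextIndex:
    """Toy inverted index shared by files, documents, and dossiers."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, object_type, pid, text):
        """Index every whitespace-separated word of `text` for one object."""
        for token in text.lower().split():
            self._postings[token].add((object_type, pid))

    def search(self, query):
        """Return objects whose indexed text contains every query word."""
        tokens = query.lower().split()
        if not tokens:
            return set()
        hits = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            hits &= self._postings.get(token, set())
        return hits
```

Because each posting carries the object type, one query can return files, documents, and dossiers together.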

File Data Model
• Necessary attributes: archive name, holding name, file number
• Additional attributes: origin, period, size
• Parts: file title, keywords, comprise note, remark
• Text index for all attributes and parts
• Folder: documents in file
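
As a rough illustration, the file model could be written as a record with required and optional fields. The class name `ArchiveFile`, the field types, and the sample values are assumptions; only the attribute names come from the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArchiveFile:
    # Necessary attributes
    archive_name: str
    holding_name: str
    file_number: int
    # Additional attributes (types are assumed)
    origin: Optional[str] = None
    period: Optional[str] = None
    size: Optional[int] = None
    # Parts such as title, keywords, remark, keyed by part name
    parts: dict = field(default_factory=dict)
    # Folder: the documents contained in this file
    documents: list = field(default_factory=list)

    def __post_init__(self):
        if not self.archive_name or not self.holding_name:
            raise ValueError("archive name and holding name are required")

    def index_text(self) -> str:
        """Concatenate all attributes and parts for the text index."""
        values = [self.archive_name, self.holding_name, self.file_number,
                  self.origin, self.period, self.size, *self.parts.values()]
        return " ".join(str(v) for v in values if v is not None)
```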

Document Data Model
• Attributes: page numbers, date, document type, author
• Parts: reference, remark, co-authors
• Text index for all attributes and parts
• Folder: pages in document
• Auxiliary text index containing the file pid (for faster search)
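
Storing the file pid alongside each document lets a search be restricted to one file without opening the parent folder. A minimal sketch of that idea, with a hypothetical `DocumentEntry` record and made-up sample data:

```python
from dataclasses import dataclass


@dataclass
class DocumentEntry:
    pid: str
    file_pid: str       # copied into the index so no parent lookup is needed
    doc_type: str
    author: str
    first_page: int
    last_page: int


def documents_in_file(index, file_pid):
    """Filter the document index by the stored file pid."""
    return [d.pid for d in index if d.file_pid == file_pid]


index = [
    DocumentEntry("d1", "f1", "letter", "A. Author", 1, 2),
    DocumentEntry("d2", "f1", "minutes", "B. Author", 3, 5),
    DocumentEntry("d3", "f2", "letter", "A. Author", 1, 1),
]
```

The same pattern would apply to pages, whose auxiliary index holds the document pid.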

Page and Dossier Data Models

• Page
  – Page number as an attribute
  – JPEG and TIFF parts
  – Auxiliary text index: document pid
• Dossier
  – Dossier title as an attribute
  – Text index containing dossier title
  – Folder: objects in dossier

Class Library for Archive
• Archive object
• Object collection (*)
• Browser session (*)
• Servlet (FsuArchiv)
• Helper classes: access rights, archive user

Archive Objects
• Object type (file, document, page, dossier)
• Constructors for persistent and new objects, destructor
• open/close, add/delete
• get/set attributes/parts
• isFolder, hasParent, getIndexClass
• get parent/items/dossiers
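
Part of this interface can be sketched in a few lines. The sketch covers only the object type, `isFolder`, `hasParent`, and `add`, written here with Python naming. The containment rules are read off the data model slides, and everything else (class layout, `pid=None` for new objects) is an assumption rather than the library's real design.

```python
from enum import Enum


class ObjectType(Enum):
    FILE = "file"
    DOCUMENT = "document"
    PAGE = "page"
    DOSSIER = "dossier"


# Assumed containment rules: documents live in files, pages in documents.
PARENT_OF = {
    ObjectType.DOCUMENT: ObjectType.FILE,
    ObjectType.PAGE: ObjectType.DOCUMENT,
}


class ArchiveObject:
    def __init__(self, object_type, pid=None):
        self.object_type = object_type
        self.pid = pid            # None marks a new, not yet persistent object
        self.attributes = {}
        self.items = []

    def is_folder(self):
        # Pages hold image parts only; every other type contains objects.
        return self.object_type is not ObjectType.PAGE

    def has_parent(self):
        return self.object_type in PARENT_OF

    def add(self, child):
        if not self.is_folder():
            raise TypeError("pages cannot contain other objects")
        self.items.append(child)
```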

Object Collections
• add object(s)
• delete object(s)
• sort objects (e.g. by page numbers)
• manage collection pointer
• get archive object(s)
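
A collection with a pointer behaves much like a list with a cursor. The following sketch assumes that sorting resets the pointer and that deleting clamps it to the remaining range; the slides do not say how the real class handles either case.

```python
class ObjectCollection:
    """Ordered collection with a movable pointer (cursor)."""

    def __init__(self, objects=()):
        self._objects = list(objects)
        self._pointer = 0

    def add(self, *objects):
        self._objects.extend(objects)

    def delete(self, obj):
        self._objects.remove(obj)
        # Keep the pointer inside the (possibly shorter) list.
        self._pointer = min(self._pointer, max(len(self._objects) - 1, 0))

    def sort(self, key):
        self._objects.sort(key=key)
        self._pointer = 0

    def current(self):
        return self._objects[self._pointer] if self._objects else None

    def next(self):
        """Advance the pointer if possible and return the current object."""
        if self._pointer < len(self._objects) - 1:
            self._pointer += 1
        return self.current()
```

Sorting by page number would then be `pages.sort(key=lambda p: p["page_no"])` for dict-shaped page records.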

Browser Sessions
• User management by DL (*)
• login/automatic logout (after an idle period)
• manage access rights
• get/set session properties, e.g. session number
• manage object collections in a collection stack
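
The automatic logout and the collection stack can be sketched as follows. The 600-second default, the injectable clock, and all method names are assumptions made for illustration; user management itself is left to the digital library, as the slide states.

```python
import time


class BrowserSession:
    def __init__(self, session_number, user, idle_timeout=600.0,
                 clock=time.monotonic):
        self.session_number = session_number
        self.user = user
        self._idle_timeout = idle_timeout   # seconds; the value is an assumption
        self._clock = clock
        self._last_seen = clock()
        self._collections = []              # stack of object collections

    def touch(self):
        """Record activity; call this on every request."""
        self._last_seen = self._clock()

    def is_expired(self):
        """True once the idle period has passed without activity."""
        return self._clock() - self._last_seen > self._idle_timeout

    def push_collection(self, collection):
        self._collections.append(collection)

    def pop_collection(self):
        return self._collections.pop() if self._collections else None
```

A stack fits browsing naturally: each drill-down (file, then document, then page) pushes a new result list, and going back pops it.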

Servlet Class (FsuArchiv)
• query: parametric/text/combined
• show object attributes and metaparts
• update object/create new parts
• select digitized page into document
• add to/remove from dossier
• delete object
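
The three query kinds differ only in which request parameters are filled in. Below is a minimal sketch of how a request handler might tell them apart and evaluate them over in-memory records. The parameter name `text`, the `title` field, and both function names are hypothetical, and the real FsuArchiv servlet presumably delegates the search to the digital library.

```python
def classify_query(params):
    """Return 'parametric', 'text', or 'combined' for a request."""
    has_text = bool(params.get("text", "").strip())
    has_attrs = any(v for k, v in params.items() if k != "text")
    if has_text and has_attrs:
        return "combined"
    if has_text:
        return "text"
    if has_attrs:
        return "parametric"
    raise ValueError("empty query")


def run_query(objects, params):
    """Filter dict-shaped objects by attribute equality and text containment."""
    classify_query(params)  # rejects empty queries
    text = params.get("text", "").strip().lower()
    attrs = {k: v for k, v in params.items() if k != "text" and v}
    return [
        o for o in objects
        if all(o.get(k) == v for k, v in attrs.items())
        and text in o.get("title", "").lower()
    ]
```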

Outlook
• using SSL, encrypting passwords, etc.
• fine-grained access rights for files and dossiers
• manage payment for archive use
• multimedia layout with animations, images, etc.
• ergonomics of the HTML pages
