DigiTool 3.0 - Ingest and Loading


Ingest and Loading
DigiTool Version 3.0
Ingest Agenda
Ingest Overview and Introduction
Ingest activity steps
Transformers
Task Chains
Upload of Files
Ingest Management
DigiTool Modules
[Diagram: Deposit; Single & Bulk; Search & Index; Dispatcher & Viewers; Approval; Web Services]
Ingest Module
- Two main functions:
  - Creation and submission of new ingest activities, both bulk and individual
  - Monitoring of ingest status (scheduled, running, success, etc.)
- Ingest activities can be initiated directly from the Ingest application, through pre-defined templates for manual or automatic ingest started in the Deposit module, or potentially by an FTP feed.
Ingest Main Functions
Ingest Architecture
- One loader, multiple transformers
- Transformer: takes objects and/or metadata as input and transforms them into the repository's digital entity representation.
- Ingest activity: a workflow that combines a specific transformation process with optional background tasks, followed by the generic loader.
- All loads (including batch loads) are processed as individual digital entities, so that loading errors can be controlled per entity.
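The following minimal Python sketch illustrates the one-loader, multiple-transformers split. It is a hypothetical model, not DigiTool code. `DigitalEntity`, `file_stream_transformer`, `run_ingest_activity` and the `load` callback are illustrative names, and background tasks are simplified to functions that take one entity and return one entity. The sketch shows per-entity loading: when one entity fails, the error is recorded against that entity and the rest of the batch is still loaded.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class DigitalEntity:
    """Hypothetical stand-in for a repository digital entity."""
    entity_id: str
    metadata: dict = field(default_factory=dict)


Transformer = Callable[[Iterable[str]], List[DigitalEntity]]
Task = Callable[[DigitalEntity], DigitalEntity]


def file_stream_transformer(paths: Iterable[str]) -> List[DigitalEntity]:
    """Illustrative transformer: one entity per input file, no relationships."""
    return [DigitalEntity(f"de-{i}", {"file": p}) for i, p in enumerate(paths, 1)]


def run_ingest_activity(inputs: Iterable[str], transformer: Transformer,
                        tasks: List[Task], load: Callable[[DigitalEntity], None]
                        ) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Transform the input, run the task chain, then hand each entity to one generic loader.

    Entities are processed one at a time, so an error is recorded against the
    entity that caused it and the remaining entities are still loaded.
    """
    loaded, failed = [], []
    for entity in transformer(inputs):
        try:
            for task in tasks:
                entity = task(entity)
            load(entity)
            loaded.append(entity.entity_id)
        except Exception as exc:
            failed.append((entity.entity_id, str(exc)))
    return loaded, failed


# Usage: a toy in-memory "repository" whose loader rejects .tmp files.
repository = {}

def load(entity: DigitalEntity) -> None:
    if entity.metadata["file"].endswith(".tmp"):
        raise ValueError("unsupported file")
    repository[entity.entity_id] = entity

def add_label(entity: DigitalEntity) -> DigitalEntity:
    entity.metadata["label"] = entity.metadata["file"].upper()
    return entity

loaded, failed = run_ingest_activity(["a.tif", "b.tmp", "c.pdf"],
                                     file_stream_transformer, [add_label], load)
```

To use a different transformer (MARC, Dublin Core, CSV, METS), you would change only the `transformer` argument. The loader stays the same.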
Example: Template-Based Transformer
[Diagram: Input → Transformer and pre-ingest tasks → Output: multiple Digital Entities → Load to Repository]
Ingest Activity
A typical workflow for submitting an ingest activity:
1. Enter the activity name and the scheduled run time, select the transformer type, and choose the background tasks to run as part of the ingest activity.
2. Select and order the background tasks into a task chain.
3. Select a Digital Entity template and select or verify the background task parameters.
4. Point to the location of the files, or upload them.
5. Submit.
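The five steps above can be modelled as a single object that is filled in step by step and validated on submission. The Python sketch below is hypothetical. `IngestActivity`, its field names, and the transformer identifier `"marc_xml"` are illustrative and do not come from DigiTool. Treating the template and the file list as mandatory is also an assumption of the sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class IngestActivity:
    # Step 1: name, schedule, transformer type and selected background tasks
    name: str
    transformer: str
    run_at: Optional[datetime] = None  # None stands for "as soon as possible"
    tasks: List[str] = field(default_factory=list)
    # Step 3: Digital Entity template and task parameters
    template: Optional[str] = None
    task_params: dict = field(default_factory=dict)
    # Step 4: local paths or URLs of the files to ingest
    files: List[str] = field(default_factory=list)
    submitted: bool = False

    def order_tasks(self, ordered: List[str]) -> None:
        """Step 2: arrange the selected tasks into a task chain."""
        if sorted(ordered) != sorted(self.tasks):
            raise ValueError("task chain must contain exactly the selected tasks")
        self.tasks = list(ordered)

    def submit(self) -> None:
        """Step 5: refuse to submit while required fields are still empty."""
        required = (("name", self.name), ("transformer", self.transformer),
                    ("template", self.template), ("files", self.files))
        missing = [label for label, value in required if not value]
        if missing:
            raise ValueError(f"missing: {', '.join(missing)}")
        self.submitted = True


activity = IngestActivity(name="Map collection", transformer="marc_xml",
                          tasks=["Technical Metadata Extraction", "Thumbnail Creation"])
activity.order_tasks(["Thumbnail Creation", "Technical Metadata Extraction"])  # Step 2
activity.template = "marc_simple_entity_with_stream.xml"                       # Step 3
activity.task_params = {"Thumbnail Creation": {"height": 150}}
activity.files = ["records.xml", "map001.tif"]                                 # Step 4
activity.submit()                                                              # Step 5
```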
Step 1 – Ingest Types - Transformers
1. File stream(s) that will be loaded with no relationships
2. File stream(s) that will become part of one parent record
3. File stream(s) utilizing the DigiTool file name convention
4. MARC XML file and associated file stream(s)
5. Dublin Core XML file and associated file stream(s)
6. Comma separated value (.csv) file
7. METS XML file and associated file stream(s)
8. Exported DigiTool repository elements for ingest/re-ingest
Step 1 – Ingest Types - Transformers
1. File stream(s) that will be loaded with no relationships
Treats every uploaded file as a separate entity with no relationships. Each resulting record is ingested into the repository on its own.
2. File stream(s) that will become part of one parent record
Used to create relationships among the ingested files. An additional "parent" record is added that allows navigation between the loaded files. Each file still receives its own individual record, but users can also view the "parent" record, which points to all of the loaded streams.
3. File stream(s) utilizing the DigiTool file name convention
Takes file streams named according to the DigiTool file name convention and, based on these file names, automatically creates a hierarchical METS file for loading into the repository.
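The first two transformers differ only in whether a parent record is created. The Python sketch below is hypothetical. The `Record` class and the `r`-prefixed identifiers are illustrative and are not DigiTool's data model. The third transformer is not modelled because the slides do not spell out the DigiTool file name convention.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    record_id: str
    file: Optional[str] = None
    parent: Optional[str] = None
    children: List[str] = field(default_factory=list)


def no_relationships(files: List[str]) -> List[Record]:
    """Transformer 1: one independent record per file."""
    return [Record(f"r{i}", file=f) for i, f in enumerate(files, 1)]


def one_parent(files: List[str]) -> List[Record]:
    """Transformer 2: one record per file, plus a parent record pointing to all of them."""
    parent = Record("r0")
    children = []
    for i, f in enumerate(files, 1):
        child = Record(f"r{i}", file=f, parent=parent.record_id)
        parent.children.append(child.record_id)
        children.append(child)
    return [parent] + children


records = one_parent(["page1.jpg", "page2.jpg"])
```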
Step 1 – Ingest Types - Transformers
4. MARC XML file and associated file stream(s)
Takes a standard MARCXML file as input and loads each metadata record as a separate entity. The MARCXML file may contain links to local or remote file streams through metadata tag placeholders, which associate each file stream with its MARC record.
5. Dublin Core XML file and associated file stream(s)
Takes a standard DCXML file as input and loads each metadata record as a separate entity. The DCXML file may contain links to local or remote file streams through metadata tag placeholders, which associate each file stream with its DC record.
6. Comma separated value (.csv) file
Takes a standard .csv file together with the appropriate mapping information and loads each row as a separate record. File streams may also be uploaded as part of this transformer’s workflow.
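The CSV transformer loads "each row as a separate record" using "mapping information". The Python sketch below shows that idea with the standard `csv` module. The mapping format (CSV header to target field name) and field names such as `dc:title` and `stream` are assumptions for illustration. The slides do not describe DigiTool's actual mapping format.

```python
import csv
import io
from typing import Dict, List


def csv_to_records(csv_text: str, mapping: Dict[str, str]) -> List[Dict[str, str]]:
    """Load each CSV row as a separate record.

    `mapping` maps a CSV column header to a target metadata field; unmapped
    columns are ignored, and empty cells are left out of the record.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {target: row[column]
                  for column, target in mapping.items()
                  if row.get(column)}
        records.append(record)
    return records


sample = (
    "Title,Creator,File\n"
    "Harbour at dawn,A. Smith,img001.tif\n"
    "Market day,B. Jones,\n"
)
mapping = {"Title": "dc:title", "Creator": "dc:creator", "File": "stream"}
records = csv_to_records(sample, mapping)
```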
Step 1 – Ingest Types - Transformers
7. METS XML file and associated file stream(s)
Takes a METS XML file as input and decomposes it into single atomic units for ingest. The XML file may contain links to local or remote file streams. These are stored in the repository with all structural relationships defined, so that the compound object is recomposed upon delivery.
8. Exported DigiTool repository elements for ingest/re-ingest
Takes digital entities that are already in the repository-recognized format and ingests or re-ingests them into the repository.
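The decomposition step of the METS transformer can be illustrated with Python's standard `xml.etree.ElementTree`. This sketch assumes a simple METS document in the standard METS and XLink namespaces. It splits the file section into per-file units and records the structural map links that would allow the object to be recomposed. It is not DigiTool's transformer, and the function name `decompose_mets` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

NS = {"mets": "http://www.loc.gov/METS/", "xlink": "http://www.w3.org/1999/xlink"}
HREF = "{http://www.w3.org/1999/xlink}href"


def decompose_mets(xml_text: str) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Split a METS document into atomic file units plus the structure linking them.

    Returns (files, structure): `files` maps each file ID to its FLocat href;
    `structure` lists (div LABEL, FILEID) pairs taken from the structMap.
    """
    root = ET.fromstring(xml_text.strip())
    files = {}
    for f in root.iterfind(".//mets:fileSec//mets:file", NS):
        loc = f.find("mets:FLocat", NS)
        files[f.get("ID")] = loc.get(HREF) if loc is not None else ""
    structure = []
    for div in root.iterfind(".//mets:structMap//mets:div", NS):
        for fptr in div.findall("mets:fptr", NS):
            structure.append((div.get("LABEL", ""), fptr.get("FILEID")))
    return files, structure


sample = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="master">
      <mets:file ID="F1"><mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/p1.tif"/></mets:file>
      <mets:file ID="F2"><mets:FLocat LOCTYPE="URL" xlink:href="p2.tif"/></mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div LABEL="Book">
      <mets:div LABEL="Page 1"><mets:fptr FILEID="F1"/></mets:div>
      <mets:div LABEL="Page 2"><mets:fptr FILEID="F2"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""
files, structure = decompose_mets(sample)
```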
Step 1 – Ingest Schedule and Assignment
Scheduling and assignment are a required part of every ingest activity. Scheduling options include:
- As soon as possible
- A specified date and time
With the appropriate privileges, the activity can be assigned to other staff users of the same Admin Unit. By default, it is assigned to the logged-in staff user.
Please note: Only the staff user to whom an ingest activity is assigned (the “assigned to” user) can activate that activity.
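The assignment rules above can be expressed as a few checks. The Python sketch below is hypothetical. `StaffUser`, `can_assign_others` (a stand-in for "the appropriate privileges"), `schedule`, `activate` and the Admin Unit codes are illustrative names, not DigiTool's permission model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class StaffUser:
    username: str
    admin_unit: str
    can_assign_others: bool = False  # stands in for "the appropriate privileges"


@dataclass
class ScheduledActivity:
    name: str
    assigned_to: StaffUser
    run_at: Optional[datetime] = None  # None = "as soon as possible"


def schedule(name: str, creator: StaffUser, run_at: Optional[datetime] = None,
             assignee: Optional[StaffUser] = None) -> ScheduledActivity:
    """Assign to the logged-in user by default; others only with privilege and the same Admin Unit."""
    if assignee is None or assignee.username == creator.username:
        return ScheduledActivity(name, creator, run_at)
    if not creator.can_assign_others:
        raise PermissionError("missing privilege to assign to other staff users")
    if assignee.admin_unit != creator.admin_unit:
        raise PermissionError("assignee must belong to the same Admin Unit")
    return ScheduledActivity(name, assignee, run_at)


def activate(activity: ScheduledActivity, user: StaffUser) -> str:
    """Only the assigned staff user may activate the activity."""
    if user.username != activity.assigned_to.username:
        raise PermissionError("only the assigned staff user can activate this activity")
    return "activated"
```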
Step 1 – What is a task?
A task is an action performed on the “transformed” digital entities and/or file streams before the entire set of formed entities is ingested into the repository.
Step 1 – Task Chain Initiation
- Template based: server-side templates representing a variety of pre-defined task chain combinations.
- New task chain: allows a custom task chain to be defined and ordered in Step 2 of the ingest activity.
- User-defined task chain: a task chain that the user defined and saved in a previous session. Any task chain can be saved as a user-defined task chain.
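The three ways of initiating a task chain can be sketched as follows. This is a hypothetical Python model. `TEMPLATE_CHAINS`, its chain names, `UserProfile` and `initiate_chain` are illustrative, and the real server-side templates are not listed in these slides. The task names are taken from the list of available tasks.

```python
from typing import Dict, List, Optional

# Hypothetical server-side template chains; the chain names are illustrative.
TEMPLATE_CHAINS: Dict[str, List[str]] = {
    "images": ["Thumbnail Creation", "Technical Metadata Extraction"],
    "pdf": ["PDF Full Text Extraction", "Technical Metadata Extraction"],
}


class UserProfile:
    """Hypothetical per-user store for saved (user-defined) task chains."""

    def __init__(self) -> None:
        self.saved_chains: Dict[str, List[str]] = {}

    def save_chain(self, name: str, chain: List[str]) -> None:
        self.saved_chains[name] = list(chain)  # store a copy


def initiate_chain(mode: str, name: Optional[str] = None,
                   profile: Optional[UserProfile] = None) -> List[str]:
    """Return the starting task chain for each of the three initiation modes."""
    if mode == "template":
        return list(TEMPLATE_CHAINS[name])
    if mode == "new":
        return []  # to be filled in and ordered in Step 2
    if mode == "user-defined":
        if profile is None:
            raise ValueError("a user profile is required for user-defined chains")
        return list(profile.saved_chains[name])
    raise ValueError(f"unknown initiation mode: {mode!r}")
```

Returning copies means that editing a chain for one ingest activity does not change the saved template or the user's saved chain.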
Available Task Chains
1. Empty Chain
2. Technical Metadata Extraction
3. Add Metadata
4. Control Section Attribute Assignment
5. Full Text Extraction
6. PDF Full Text Extraction
7. Add History Event
8. Tiff to JP2000 Converter
9. Remote Stream Download
10. Thumbnail Creation
Available Task Chains
Empty Chain - No task chain will be applied.
Technical Metadata Extraction - For recognized file
stream(s), technical metadata will be extracted and
mapped into standard technical metadata.
Add Metadata - Allows a single metadata record to be linked or copied and applied to all file streams in the ingest activity.
Control Section Attribute Assignment - Allows digital entity control section information to be defined item by item. The values are then applied to all digital entities in the ingest activity.
Full Text Extraction - For recognized file stream(s), full
text will be extracted as the source object’s manifestation.
Available Task Chains
PDF Full Text Extraction - For PDF file streams, full text will be extracted as the source object’s manifestation.
Add History Event - Adds change-history metadata entries to the file streams of an ingest activity.
Tiff to JP2000 Converter - Takes TIFF images and creates a JPEG2000 manifestation of each source image.
Remote Stream Download - Defines how URL streams are stored: either copied locally or left at the remote location.
Thumbnail Creation - For recognized file streams, a thumbnail image will be created as the source object’s manifestation.
Step 2 – Task Chain Definition and Order
- Allows the staff user to pick and order the available tasks for the ingest activity.
- Task order matters for certain chains, e.g. Thumbnail Creation and Full Text Extraction before Technical Metadata Extraction.
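The ordering rule can be checked before an ingest activity is submitted. The Python sketch below encodes only the example given above. The constraint table `MUST_FOLLOW` and the function `check_chain_order` are hypothetical. DigiTool's own ordering rules are not described here.

```python
from typing import Dict, List, Set

# Hypothetical constraint table: each task maps to the tasks that must come before it.
# Only the example from the slide is encoded.
MUST_FOLLOW: Dict[str, Set[str]] = {
    "Technical Metadata Extraction": {"Thumbnail Creation", "Full Text Extraction"},
}


def check_chain_order(chain: List[str]) -> List[str]:
    """Return human-readable ordering problems; an empty list means the order is acceptable."""
    position = {task: i for i, task in enumerate(chain)}
    problems = []
    for task, predecessors in MUST_FOLLOW.items():
        if task not in position:
            continue
        for before in sorted(predecessors):
            if before in position and position[before] > position[task]:
                problems.append(f"{before} should run before {task}")
    return problems
```

Tasks that are missing from the chain are not reported. The check only flags tasks that are present in the wrong order.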
Step 3 – Template and Task Chain Parameters
Choose a Digital Entity template:
e.g. marc_simple_entity_with_stream.xml when using the MARC transformer to load file streams together with the MARC records.
NOTE: The available Digital Entity templates depend on the transformer chosen in Step 1.
Set task parameters, e.g.:
- thumbnail height and width
- text language encoding for full-text indexing
- metadata (MD) insertion
- etc.
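Step 3 can be sketched in Python as two checks: the template must suit the chosen transformer, and task parameters start from defaults that the user may override. The template registry, the transformer key `"marc_xml"`, the parameter names and the default values are all hypothetical. Only `marc_simple_entity_with_stream.xml` is named in the slides.

```python
from typing import Dict, List

# Hypothetical registry of templates per transformer; only this template name
# comes from the slide, the transformer key is illustrative.
TEMPLATES_BY_TRANSFORMER: Dict[str, List[str]] = {
    "marc_xml": ["marc_simple_entity_with_stream.xml"],
}

# Hypothetical default task parameters (values are illustrative).
DEFAULT_PARAMS: Dict[str, dict] = {
    "Thumbnail Creation": {"height": 120, "width": 120},
    "Full Text Extraction": {"encoding": "UTF-8"},
}


def prepare_step3(transformer: str, template: str,
                  overrides: Dict[str, dict]) -> Dict[str, dict]:
    """Check the template against the transformer, then merge parameter overrides into defaults."""
    if template not in TEMPLATES_BY_TRANSFORMER.get(transformer, []):
        raise ValueError(f"{template} is not a template for the {transformer} transformer")
    params = {task: dict(values) for task, values in DEFAULT_PARAMS.items()}  # copy defaults
    for task, values in overrides.items():
        params.setdefault(task, {}).update(values)
    return params
```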
METS transformer - METS to D.E.
[Diagram: the METS transformer maps the parts of a METS file (METS Header; dmd & amd sections, with DL content if necessary; Structural Map; Behavior/Struct Link; File Section, with URL editing) onto the parts of a Digital Entity (Control Section; descriptive/technical/rights MD Section; Preservation MD; File structure, created for each file in the File Section; Linking)]
Step 4 – Local Files
- Choose files for upload (ActiveX plug-in required):
  - easy to use
  - icon/thumbnail preview during upload
- Send to server
- Preview/manage files
Step 4 – Remote Files (URL)
- Choose files for upload/linkage:
  - URLs can be entered one by one, or in batch from a text list
  - “Download now” option: store the URL file locally, or link to the remote location
- Preview/manage files
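The batch URL list can be sketched as a small parser that turns a text list into entries carrying the "Download now" choice. The Python sketch below is hypothetical. `RemoteStream`, `parse_url_list`, skipping `#` comment lines, and accepting only http(s) URLs are all assumptions, not documented DigiTool behaviour.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass
class RemoteStream:
    url: str
    store_locally: bool  # True: "Download now" (copy locally); False: keep as a remote link


def parse_url_list(text: str, download_now: bool) -> List[RemoteStream]:
    """Turn a batch text list (one URL per line) into remote stream entries.

    Blank lines and lines starting with '#' are skipped (an assumed convention);
    anything that is not an http(s) URL is rejected before ingest.
    """
    streams = []
    for lineno, line in enumerate(text.splitlines(), 1):
        url = line.strip()
        if not url or url.startswith("#"):
            continue
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"line {lineno}: not an http(s) URL: {url!r}")
        streams.append(RemoteStream(url, download_now))
    return streams
```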
Ingest Activity – File upload
Ingest folders
- Not scheduled: ingest activities ready for activation that are not scheduled.
- Scheduled: ingest activities set to run at a specified date and time.
- Running: ingest activities that are currently running.
- Success: ingest activities that loaded successfully.
- Failed: ingest activities that did not load successfully.
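The five folders can be modelled as activity states. The slide lists the folders but not the allowed moves between them. The transition table in this Python sketch is therefore an assumption (for example, that Success and Failed are final), and `Folder`, `move` and `group_by_folder` are illustrative names.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class Folder(Enum):
    NOT_SCHEDULED = "Not scheduled"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    SUCCESS = "Success"
    FAILED = "Failed"


# Assumed transitions; the slides list the folders, not the moves between them.
TRANSITIONS = {
    Folder.NOT_SCHEDULED: {Folder.SCHEDULED, Folder.RUNNING},
    Folder.SCHEDULED: {Folder.RUNNING, Folder.NOT_SCHEDULED},
    Folder.RUNNING: {Folder.SUCCESS, Folder.FAILED},
    Folder.SUCCESS: set(),
    Folder.FAILED: set(),
}


def move(current: Folder, target: Folder) -> Folder:
    """Move an activity to another folder if the assumed transition table allows it."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def group_by_folder(activities: Iterable[Tuple[str, Folder]]) -> Dict[str, List[str]]:
    """Group (activity name, folder) pairs under their folder labels."""
    folders: Dict[str, List[str]] = {f.value: [] for f in Folder}
    for name, folder in activities:
        folders[folder.value].append(name)
    return folders
```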
Ingest Management
- Edit, delete and activate
- Monitoring log files:
  - Task list: shows all background tasks performed
  - Task log: full step-by-step log file for each ingest step
  - Task summary: overview of the major steps of the ingest process, e.g. Pre-transformer, Transformer, Ingest
Additional Functions
- Upload files first, before defining the tasks and settings of the ingest activity (for mass file upload).
- Pre-transformer: transforms file streams and/or metadata into the ingest-ready format so that a transformer can be initiated. Currently, METS Zip input from Deposit is the only pre-transformer.
- Save task chains to the personal user profile for future use.
Thank you!
www.exlibrisgroup.com