Code4Lib 2011 Bloomington IN, February 7, 2011 Creating a New JHOVE2 Format Module Sheila Morrissey Portico.


Code4Lib 2011
Bloomington IN, February 7, 2011
Creating a New JHOVE2 Format
Module
Sheila Morrissey
Portico
The preservation problem
Managing the gap between what you were given
and what you need
– That gap is only manageable if it is quantifiable
– Characterization tells you what you have, as a stable
starting point for iterative preservation planning
and action
Characterization
Preservation
action
Preservation
planning
Adapted from A. Brown, “Developing Practical Approaches to Active Preservation,” IJDC 2:1 (June 2007): 3-11.
“What? So what?”
Characterization is the automated determination
of the intrinsic and extrinsic properties of a
formatted object
– Identification
– Feature extraction
– Validation
– Assessment
Determining the presumptive format of a
digital object based on suggestive extrinsic
hints and intrinsic signatures
Reporting the intrinsic properties of an
object significant for classification,
analysis, and planning
Supported formats
JHOVE2 can identify (by DROID) many more
formats than it can validate (by modules)
– PRONOM registry documents over 550 “formats”
http://www.nationalarchives.gov.uk/PRONOM
Supported formats
– ICC color profile (ICC.1:2004-10)
– JPEG 2000: JP2 (ISO/IEC 15444-1), JPX (ISO/IEC 15444-2)
– PDF: PDF 1.0 – 1.7, ISO 32000-1, PDF/A-1 (ISO 19005-1), PDF/X-1 (ISO 15930-1), -1a (ISO 15930-4), -2 (ISO 15930-5), -3 (ISO 15930-6)
– SGML
– Shapefile: Main, Index, dBASE, …
– TIFF: TIFF 4 – 6, Class B, F, G, P, R, Y, TIFF/EP (ISO 12234-2), TIFF/IT (ISO 12639), GeoTIFF, Exif (JEITA CP-3451), DNG
– UTF-8: ASCII (ANSI X3.4)
– WAVE: BWF (EBU N22-1997)
– XML
– Zip
Contributed format modules
From Wegener Institute (http://www.awi-potsdam.de)
– netCDF
– Grib
From Bibliothèque nationale de France (BnF) (http://www.bnf.fr/fr/acc/x.accueil.html)
– arc
– gzip
YOU!!!
– ???
Characterization strategy
1. Identify format (if not previously identified)
2. Dispatch to appropriate format module
   a) Extract format features and validate
      – If a nested source unit is found, process recursively…
   b) Validate format profiles (if registered)
3. Assess
4. If unitary source unit, calculate message digests (optional)
5. If an aggregate source unit, try to identify aggregate format, and if successful, process recursively…
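The recursive shape of the strategy above can be sketched in a few lines of Java. This is a minimal illustration only: `Unit`, `identify` and `characterize` are hypothetical stand-ins, not the JHOVE2 `Source` API, and identification here is a toy lookup by file extension rather than DROID signature matching.

```java
import java.util.ArrayList;
import java.util.List;

final class StrategySketch {
    /** Hypothetical stand-in for a source unit (file or directory). */
    static final class Unit {
        final String name;
        final List<Unit> children = new ArrayList<>();
        String format = "unknown";

        Unit(String name) { this.name = name; }

        Unit add(Unit child) { children.add(child); return this; }
    }

    /** Step 1: toy identification by extension; an assumption, not DROID. */
    static String identify(Unit unit) {
        if (!unit.children.isEmpty()) return "directory";
        int dot = unit.name.lastIndexOf('.');
        return dot < 0 ? "unknown" : unit.name.substring(dot + 1).toLowerCase();
    }

    /** Identify a unit, then process nested units recursively; returns units visited. */
    static int characterize(Unit unit, List<String> report) {
        unit.format = identify(unit);
        report.add(unit.name + " -> " + unit.format);
        int visited = 1;
        for (Unit child : unit.children) {
            visited += characterize(child, report);
        }
        return visited;
    }

    public static void main(String[] args) {
        Unit dir = new Unit("directory/")
            .add(new Unit("abc.shp"))
            .add(new Unit("xyz.pdf"));
        List<String> report = new ArrayList<>();
        characterize(dir, report);
        report.forEach(System.out::println);
    }
}
```

Feature extraction, validation, assessment and digests would hang off the same recursion; they are left out to keep the control flow visible.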
Characterization strategy
directory/
abc.shp
abc.shx
abc.dbf
abc.tif
xyz.pdf
Characterization strategy
directory/
  abc.shp (Main)
  abc.shx (Index)
  abc.dbf (dBASE)
  abc.tif (GeoTIFF)
  xyz.pdf (PDF)
Characterization strategy
directory/
  Shapefile clump
    abc.shp (Main)
    abc.shx (Index)
    abc.dbf (dBASE)
  abc.tif (GeoTIFF)
  xyz.pdf (PDF)
Characterization strategy
directory/
  “GIS object” clump
    Shapefile clump
      abc.shp (Main)
      abc.shx (Index)
      abc.dbf (dBASE)
    abc.tif (GeoTIFF)
  xyz.pdf (PDF)
API design idioms
Separation of concerns
– Annotation and reflection
confluence.ucop.edu/display/JHOVE2Info/Background+Papers
Inversion of control (IOC) / dependency
injection
– Martin Fowler
martinfowler.com/articles/injection.html
– Spring framework
www.springsource.org/
Separation of concerns
“Let POJOs be POJOs”
– Focus on modeling the format itself
“Let the code write itself”
– Reportables “know” how to expose their
properties for display
– Reference documentation generated from the
code
Annotation and Reflection:
Reportable properties
Each reportable property is represented by a field and
accessor and mutator methods
The accessor method must be marked with the
@ReportableProperty annotation
public class MyReportable
    implements Reportable
{
    protected String myProperty;
    @ReportableProperty(order=1, value="description", ref="reference")
    public String getMyProperty() {
        return this.myProperty;
    }
    public void setMyProperty(String property) {
        this.myProperty = property;
    }
}
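To make the "reportables know how to expose their properties" idea concrete, here is a minimal, self-contained sketch of how annotated accessors can be discovered by reflection. The `@Prop` annotation and the `report` method are hypothetical stand-ins written for this example; they are not the JHOVE2 `@ReportableProperty` annotation or the framework's displayer code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

final class ReflectionSketch {
    /** Hypothetical stand-in for a reportable-property annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Prop {
        int order() default 1;
        String value() default "Not available.";
    }

    /** A plain object that only models its own data. */
    public static class MyReportable {
        @Prop(order = 1, value = "A sample string property")
        public String getMyProperty() { return "example"; }

        @Prop(order = 2, value = "A sample numeric property")
        public int getCount() { return 3; }

        // Not annotated, so it is never reported.
        public String getInternalState() { return "hidden"; }
    }

    /** Collects the values of annotated accessors, keyed by declared order. */
    static Map<Integer, String> report(Object reportable) {
        Map<Integer, String> out = new TreeMap<>();
        for (Method m : reportable.getClass().getMethods()) {
            Prop p = m.getAnnotation(Prop.class);
            if (p == null) continue;
            try {
                // Strip the "get" prefix to derive a display name.
                String name = m.getName().substring(3);
                out.put(p.order(), name + "=" + m.invoke(reportable));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        report(new MyReportable()).values().forEach(System.out::println);
    }
}
```

The reportable class contains no display logic at all; the generic `report` method does the work, which is the separation of concerns the slides describe.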
Dependency injection
All JHOVE2 function is embodied in pluggable modules
– Flexible customization
  • Re-sequencing of pre-existing modules
– Easy extensibility
  • Additional format modules and profiles
  • Additional aggregate identifiers
  • Additional displayers
  • New behaviors
    – RenderabilityModule
JHOVE2 framework
Embodiment of a characterization strategy as a
configurable sequence of command-invoked modules
public void characterize(Source source, Input input)
    throws IOException, JHOVE2Exception
{
    source.getTimerInfo().setStartTime();
    /* Update summary counts of source units, by type. */
    this.sourceCounter.incrementSourceCounter(source);
    /* Run each configured command in sequence, timing each one. */
    for (Command command : this.commands) {
        TimerInfo time2 = command.getTimerInfo();
        time2.resetStartTime();
        try {
            command.execute(this, source, input);
        }
        finally {
            time2.setEndTime();
        }
    }
    source.getTimerInfo().setEndTime();
}
Characterization
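A minimal sketch of why a "configurable sequence of commands" makes re-sequencing cheap: the loop never changes, only the list handed to it. The `Command` interface below is a hypothetical simplification for illustration; the real JHOVE2 command takes the framework, a `Source` and an `Input`, and in JHOVE2 the list is wired up in Spring configuration rather than in code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CommandSequenceSketch {
    /** Hypothetical stand-in for a characterization command. */
    interface Command {
        void execute(String source, List<String> log);
    }

    /** Runs each configured command in order against one source unit. */
    static List<String> characterize(String source, List<Command> commands) {
        List<String> log = new ArrayList<>();
        for (Command command : commands) {
            command.execute(source, log);
        }
        return log;
    }

    public static void main(String[] args) {
        Command identify = (src, log) -> log.add("identify " + src);
        Command digest = (src, log) -> log.add("digest " + src);

        // Same loop, two different strategies: only the configured order differs.
        System.out.println(characterize("abc.csv", Arrays.asList(identify, digest)));
        System.out.println(characterize("abc.csv", Arrays.asList(digest, identify)));
    }
}
```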
Creating a New Format Module:
What are the deliverables?
• Source code
• Configuration files
• Sample (test) files
• Documents
Format Module Artifacts:
Source Code
• Module classes
– Module (extends org.jhove2.module.format.BaseFormatModule)
– Profiles (extend org.jhove2.module.format.AbstractFormatProfile) as required by format
– Supporting classes expressing format content model as required by format
• Test classes
– JUnit test(s)
Format Module Artifacts
Configuration Files
• Spring IOC Bean XML configuration files
  – For Module
  – For unit test as needed
  – For Assessment criteria
• Messages properties file additions if needed
• Properties files
  – Displayer
  – Units of measure
  – Module-specific
Format Module Artifacts:
Sample (Test) Files
–Sample files used in unit test
• Valid files
• Invalid files to exercise validity constraints
Format Module Artifacts:
Documentation
• Module Specification Document
See examples on the JHOVE2 wiki “Modules Documents” page
<https://bitbucket.org/jhove2/main/wiki/Module>
Format Module Artifacts List
New CSV Format Module
Source code
src/main/java/org/jhove2/module/format/csv/CsvModule.java
src/test/java/org/jhove2/module/format/csv/CsvModuleTest.java
Configuration files
Spring
config/spring/module/format/csv/jhove2-csv-config.xml
config/spring/module/assess/jhove2-ruleset-csv-config.xml
src/test/resources/config/module/format/csv/test-config.xml
Messages
config/messages/jhove2_message.properties (update, not new)
Display
config/properties/module/displayer/org/jhove2/module/format/csv/CsvModule_displayer.properties
config/properties/module/units/org/jhove2/module/format/csv/CsvModule_unit.properties (optional)
Module-specific properties files
config/properties/module/format/csv/csv.properties (optional, implementation-determined)
Test File(s)
src/test/resources/examples/csv/goodFile.csv
src/test/resources/examples/csv/badFile01.csv
src/test/resources/examples/csv/badFile02.csv
….
Documentation
CSV Module specification document: Jhove2 wiki
Format Module Artifacts:
The Good News
• Generate module and profile from interfaces and base
classes via inheritance
– Classes reflect format’s own content model: cross-cutting “JHOVE2”
concerns handled via annotation (persistence, serialization,
generation of JHOVE2 identifiers for reportable properties)
• Template for Spring XML Module configuration files
• Utilities to generate
– Displayer properties files
– Units of measure properties files
– XML assessment configuration file
• Utilities for specification document
– Script to generate tabular content for specification document
– Macro to import utility-generated tabular content
Format Module:
Research and Analysis
• Format Definition (org.jhove2.core.format.Format)
  – Names
  – Type (format/family)
  – Ambiguity (ambiguous/unambiguous)
  – Identifiers
  – Specifications
  – Validity (comprehensive/selective)
  – Profiles (none)
• Significant (Reportable) properties (org.jhove2.module.format.csv.CsvModule)
Format Definition:
CSV Names
• JHOVE2 canonical name
– Comma Separated Values
• Format aliases
– CSV
– DSV
Might already be defined in config/spring/module/format/jhove2-otherFormats-config.xml
Format Definition :
CSV Formal Identifiers
• JHOVE2 identifier (see org.jhove2.core.I8R$Namespace)
  – [JHOVE2] http://jhove2.org/terms/format/csv
• PRONOM (PUID) identifier (used by DROID)
  – [PUID] x-fmt/18
• MIME type identifier
  – [MIME] text/csv
• RFC identifier
  – [RFC] RFC 4180
• Other identifiers in other namespaces (see org.jhove2.core.I8R$Namespace)
Might already be defined in config/spring/module/format/jhove2-otherFormats-config.xml
If you are not using DROID, then you MUST have the identifier(s) from the namespace of your identification tool
Format Definition :
CSV Formal Identifiers in Spring
<!-- Comma Separated Values JHOVE2 identifier bean -->
<!-- (canonical identifier in JHOVE2 namespace) -->
<!-- Single constructor arg defaults to JHOVE2 namespace -->
<bean id="CommaSeparatedValuesIdentifier" class="org.jhove2.core.I8R"
      scope="singleton">
    <constructor-arg type="java.lang.String"
                     value="http://jhove2.org/terms/format/csv"/>
</bean>
<!-- Comma Separated Values PUID identifier bean -->
<!-- (canonical identifier in PRONOM namespace (used by DROID identifier tool)) -->
<bean id="CommaSeparatedValuesPUID1" class="org.jhove2.core.I8R"
      scope="singleton">
    <constructor-arg type="java.lang.String" value="x-fmt/18"/>
    <constructor-arg type="org.jhove2.core.I8R$Namespace"
                     value="PUID"/>
</bean>
Format Definition :
CSV Formal Identifiers in Spring
<!-- Comma Separated Values MIME type aliasIdentifier bean -->
<bean id="CommaSeparatedValuesMIMEType" class="org.jhove2.core.I8R"
      scope="singleton">
    <constructor-arg type="java.lang.String" value="text/csv"/>
    <constructor-arg type="org.jhove2.core.I8R$Namespace" value="MIME"/>
</bean>
<!-- Comma Separated Values RFC aliasIdentifier bean -->
<bean id="CommaSeparatedValuesRFC4180" class="org.jhove2.core.I8R"
      scope="singleton">
    <constructor-arg type="java.lang.String" value="RFC 4180"/>
    <constructor-arg type="org.jhove2.core.I8R$Namespace" value="RFC"/>
</bean>
Format Definition :
CSV Specifications
• For CSV, many variants
• Closest document to a format spec is RFC
– RFC 4180 (http://www.ietf.org/rfc/rfc4180.txt)
Format Definition :
CSV Specification in Spring
<bean id="CsvSpec" class="org.jhove2.core.Document" scope="singleton">
    <constructor-arg type="java.lang.String"
                     value="RFC 4180 Common Format and MIME Type for CSV Files"/>
    <constructor-arg type="org.jhove2.core.Document$Type" value="Specification"/>
    <constructor-arg type="org.jhove2.core.Document$Intention" value="Authoritative"/>
    <property name="author" value="Y. Shafranovich"/>
    <property name="date" value="October 2005"/>
    <property name="identifiers">
        <list value-type="org.jhove2.core.I8R">
            <ref bean="CsvSpecificationURI"/>
        </list>
    </property>
    <property name="publisher" value="The Internet Engineering Task Force (IETF)"/>
</bean>
<!-- CSV RFC specification URI bean -->
<bean id="CsvSpecificationURI" class="org.jhove2.core.I8R" scope="singleton">
    <constructor-arg type="java.lang.String"
                     value="http://www.ietf.org/rfc/rfc4180.txt"/>
    <constructor-arg type="org.jhove2.core.I8R$Namespace" value="URI"/>
</bean>
Format Definition :
CSV Format Bean Definition in Spring
<!--
Bean for the JHOVE2 Comma Separated Values Format Bean -->
<bean id="CommaSeparatedValuesFormat" class="org.jhove2.core.format.Format"
scope="singleton">
<constructor-arg type="java.lang.String" value="Comma Separated Values"/>
<constructor-arg ref="CommaSeparatedValuesIdentifier"/>
<constructor-arg type="org.jhove2.core.format.Format$Type"
value="Format"/>
<constructor-arg type="org.jhove2.core.format.Format$Ambiguity"
value="Unambiguous"/>
<property name="aliasIdentifiers">
<set value-type="org.jhove2.core.I8R">
<ref bean="CommaSeparatedValuesIdentifier"/>
<ref bean="CommaSeparatedValuesPUID1"/>
<ref bean="CommaSeparatedValuesMIMEType"/>
<ref bean="CommaSeparatedValuesRFC4180"/>
</set>
</property>
<property name="aliasNames">
<set>
<value>CSV</value>
<value>DSV</value>
</set>
</property>
<property name="specifications">
<list value-type="org.jhove2.core.Document">
<ref bean="CsvSpec"/>
</list>
</property>
</bean>
Format Module:
Format Module Recipe
• Create package
• Place in inheritance hierarchy
• Enforce persistence requirements
• Populate static (non-user-configurable) fields
• Implement 2-argument constructor
• Create module’s Spring Bean
• Define reportable properties and associated methods
• Annotate reportable properties accessors
• Configure Message properties file
• Override parse() method
• Implement Validator interface methods
Format Module:
Create Package
• Package
– org.jhove2.module.format.csv
Format Module:
Inheritance Hierarchy
• Inheritance
– Extends
org.jhove2.module.format.BaseFormatModule
– Implements
org.jhove2.module.format.Validator
Format Module:
Persistence requirements
• Module must be annotated with the BerkeleyDBJE
@Persistent annotation
• Module must have a 0-argument constructor
• Module should not contain any non-static nested (inner)
classes
Annotate Reportable Properties
• Module field type must be
– “simple” Java type or
– Persistent type or
– Have a
com.sleepycat.persist.model.PersistentProxy
implementation created for it in package
org.jhove2.persist.berkeleydpl.proxies
Format Module:
Persistence requirements
import com.sleepycat.persist.model.Persistent;
// Berkeley DB JE annotation
@Persistent
public class CsvModule
extends BaseFormatModule
implements Validator
{
/**
* No-arg constructor required by persistence layer
*/
@SuppressWarnings("unused")
private CsvModule() {
this(null, null);
}
…
Format Module:
Non-configurable fields
@Persistent
public class CsvModule
extends BaseFormatModule
implements Validator
{
    /** CSV module version identifier. */
    public static final String VERSION = "n.n.n";
    /** CSV module release date. */
    public static final String RELEASE = "yyyy-mm-dd";
    /** CSV module rights statement. */
    public static final String RIGHTS = "Copyright YYYY by "
        + "Copyright holder name "
        + "Available under the terms of the BSD license.";
    /** Module validation coverage. */
    public static final Coverage COVERAGE = Coverage.Inclusive;
    /** CSV validation status. */
    protected Validity validity;
Format Module:
Two-argument Constructor
/**
* @param format
* @param formatModuleAccessor
*/
public CsvModule(Format format, FormatModuleAccessor
formatModuleAccessor) {
super(VERSION, RELEASE, RIGHTS, format,
formatModuleAccessor);
this.validity = Validity.Undetermined;
}
…
Format Module:
Spring Bean
<bean id="CSVModule" class="org.jhove2.module.format.csv.CsvModule"
      scope="prototype">
    <constructor-arg ref="CommaSeparatedValuesFormat"/>
    <!-- persistence manager bean ref; same for all format modules -->
    <constructor-arg ref="FormatModuleAccessor"/>
    <property name="developers">
        <list value-type="org.jhove2.core.Agent">
            <ref bean="CSVAgent"/>
        </list>
    </property>
</bean>
<!-- Module author bean -->
<bean id="CSVAgent" class="org.jhove2.core.Agent" scope="singleton">
    <constructor-arg type="java.lang.String" value="CSV Author Name"/>
    <!-- Personal or Corporate -->
    <constructor-arg type="org.jhove2.core.Agent$Type"
                     value="Personal"/>
    <property name="URI" value="http://www.csvagent.org/"/>
</bean>
Format Module:
Reportable Properties:CSV Base Definition
file = [header CRLF] record *(CRLF record) [CRLF]
header = name *(COMMA name)
record = field *(COMMA field)
name = field
field = (escaped / non-escaped)
escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
non-escaped = *TEXTDATA
COMMA = %x2C
CR = %x0D ;as per section 6.1 of RFC 2234 [2]
DQUOTE = %x22 ;as per section 6.1 of RFC 2234 [2]
LF = %x0A ;as per section 6.1 of RFC 2234 [2]
CRLF = CR LF ;as per section 6.1 of RFC 2234 [2]
TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
From RFC 4180 (http://www.ietf.org/rfc/rfc4180.txt)
Format Module:
Reportable Properties: CSV Complications
• Delimiter character might be “;” instead of “,”
• EOL might be “\n” instead of “\r\n”
• EOL might be embedded in contents of field
• Different implementations escape the escape character differently
  – “” vs. \”
• Last record in file might not have EOL
• All records might not have same number of fields
• Some implementations trim leading/trailing whitespace in escaped fields
• Some implementations allow characters other than ASCII-printable characters
• No syntactic way to detect if first record is “header” record
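Several of the complications above can be detected with a simple scan that tracks whether the current position is inside a double-quoted field. The sketch below is one possible heuristic, not the approach of any actual JHOVE2 module: the class and method names are hypothetical, the candidate delimiters are limited to comma, semicolon and tab, and only the `""` escaping convention is handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CsvSniffSketch {
    /** Guesses the delimiter by counting candidates that occur outside quoted fields. */
    static char guessDelimiter(String text) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char c : new char[] {',', ';', '\t'}) counts.put(c, 0);
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;            // a doubled "" toggles twice: no net change
            } else if (!inQuotes && counts.containsKey(c)) {
                counts.merge(c, 1, Integer::sum);
            }
        }
        char best = ',';                          // comma wins ties, including "none found"
        int bestCount = -1;
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Reports the first EOL found outside quotes: "CRLF", "LF", "CR" or "none". */
    static String detectEol(String text) {
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes && c == '\r') {
                boolean lfNext = i + 1 < text.length() && text.charAt(i + 1) == '\n';
                return lfNext ? "CRLF" : "CR";
            } else if (!inQuotes && c == '\n') {
                return "LF";
            }
        }
        return "none";
    }

    /** True when the text ends with an EOL character, i.e. the last record is terminated. */
    static boolean lastRecordHasEol(String text) {
        return text.endsWith("\n") || text.endsWith("\r");
    }

    public static void main(String[] args) {
        String sample = "a;b;c\n1;2;3";
        System.out.println(guessDelimiter(sample));
        System.out.println(detectEol(sample));
        System.out.println(lastRecordHasEol(sample));
    }
}
```

A frequency count like this cannot resolve the last complication in the list: whether the first record is a header still has to come from configuration or from the user.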
Format Module:
CSV Reportable Properties
• Delimiter character
• EOL character(s)
• Escape character
• Escape character sequence within field
• Number of records
• Number of fields
  – First record
  – Max
  – Min
  – Per record
• Field names from header row
• Count of records with embedded EOL
• Count of records with embedded escape characters
• Count of records with leading/trailing whitespace in escaped fields
• Does last record in file have EOL?
• Does file contain characters other than ASCII-printable ones?
Format Module:
CSV Reportable Properties
• Add significant properties as protected fields to
module class
– Might need to create ancillary @Persistent class to
reflect model of format
– Class should extend
org.jhove2.core.reportable.AbstractReportable
• Create public accessors for those fields
• Annotate accessors with
@ReportableProperty annotation
Format Module:
Reportable Properties: Fields
// Add significant properties as protected fields
protected String delimiterCharacter;
protected String eolString;
protected String escapeCharacter;
protected String escapeCharacterSequenceWithinField;
protected int recordCount;
protected int fieldCountFirstRecord;
protected int fieldCountMax;
protected int fieldCountMin;
protected List<Integer> fieldsPerRecord;
protected List<String> fieldNames;
protected int recordsWithEmbeddedEolCount;
protected int recordsWithEmbeddedEscapeCharCount;
protected int recordsWithUntrimmedWhitespaceCount;
protected boolean eolInLastRecord;
protected boolean containsNonAsciiPrintableChars;
Format Module:
Reportable Properties: Accessors
// Create public accessors for reportable properties fields
public String getDelimiterCharacter() {...}
public String getEolString() {...}
public String getEscapeCharacter() {...}
public String getEscapeCharacterSequenceWithinField() {...}
public int getRecordCount() {...}
public int getFieldCountFirstRecord() {...}
public int getFieldCountMax() {...}
public int getFieldCountMin() {...}
public List<Integer> getFieldsPerRecord() {...}
public List<String> getFieldNames() {...}
public int getRecordsWithEmbeddedEolCount() {...}
public int getRecordsWithEmbeddedEscapeCharCount() {...}
public int getRecordsWithUntrimmedWhitespaceCount() {...}
public boolean isEolInLastRecord() {...}
public boolean isContainsNonAsciiPrintableChars() {...}
Format Module:
Reportable Properties: Annotation
public @interface ReportableProperty {
/** Default description and reference value. */
public static final String DEFAULT = "Not available.";
/**
* Property type: raw or descriptive. A raw property reports itself in the exact form that was found
* in the source unit; a descriptive property reports itself in a more human-readable form.
*/
public enum PropertyType {Default, Raw, Descriptive}
/**
* Ordinal position of this property relative to all properties directly defined in a class.
*/
public int order() default 1;
/**
* Property reference, a citation to an external source document that defines the property.
*/
public String ref() default DEFAULT;
/** Property type: raw or descriptive. */
public PropertyType type() default PropertyType.Default;
/** Property description. */
public String value() default DEFAULT;
}
Format Module:
Reportable Properties: Annotation
@ReportableProperty(
    order=10,
    value="Character used to delimit fields in record.",
    ref="RFC 4180, Section 2, paragraph 4")
public String getDelimiterCharacter() {
    return delimiterCharacter;
}
Format Module:
Reportable Message Properties
import org.jhove2.core.Message;
…
// (Reportable) Message properties
protected Message
delimiterCharNotFoundMessage;
Format Module:
Configure Message Properties File
#
###########################################################################
# Message templates for class org.jhove2.module.format.csv.CsvModule
###########################################################################
#
org.jhove2.module.format.csv.CsvModule.DelimitorCharacterNotFoundMessage=No occurrence of delimiter character {0} found in source
#
Added to file config/messages/jhove2_messages.properties
Format Module:
Message Creation
Object[] messageArgs = new Object[]{csvDelimiterChar};
delimiterCharNotFoundMessage = new Message(
    Severity.WARNING,
    Context.OBJECT,
    "org.jhove2.module.format.csv.CsvModule.DelimitorCharacterNotFoundMessage",
    messageArgs,
    jhove2.getConfigInfo());
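The relationship between the properties-file template, its `{0}` placeholder and the argument array can be illustrated with the JDK alone. This sketch uses `java.util.Properties` and `java.text.MessageFormat`; whether the JHOVE2 `Message` class resolves templates this way internally is an assumption, and `MessageTemplateSketch` with its methods is a hypothetical name for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.text.MessageFormat;
import java.util.Properties;

final class MessageTemplateSketch {
    /** Message key as spelled in the properties file shown above. */
    static final String KEY =
        "org.jhove2.module.format.csv.CsvModule.DelimitorCharacterNotFoundMessage";

    /** Loads templates from properties-file text (in-memory here, a file in practice). */
    static Properties load(String text) {
        Properties templates = new Properties();
        try {
            templates.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return templates;
    }

    /** Substitutes the arguments into the template; falls back to the key if it is missing. */
    static String format(Properties templates, String key, Object... args) {
        String template = templates.getProperty(key);
        return template == null ? key : MessageFormat.format(template, args);
    }

    public static void main(String[] args) {
        Properties templates = load(
            KEY + "=No occurrence of delimiter character {0} found in source\n");
        System.out.println(format(templates, KEY, ";"));
    }
}
```

The key passed in code must match the key in the properties file character for character, including any unusual spelling, or the lookup will fail.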
Format Module:
Override Parse() method
/**
 * Parse a source unit.
 * @param jhove2 JHOVE2 framework
 * @param source Source unit
 * @param input CSV source input
 * @return Number of bytes consumed
 * @throws EOFException
 * @throws IOException
 * @throws JHOVE2Exception
 */
@Override
public long parse(JHOVE2 jhove2, Source source, Input input)
    throws IOException, JHOVE2Exception
{
    // where the real work happens
    //   parse the Source (take care of those CSV complications!!)
    //   populate reportable properties
    //   construct any Error, Warning, or Info messages
    return 0;
}
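As an illustration of the "real work" a parse() body might do, the following self-contained sketch scans CSV text with a small state machine and fills in a few of the reportable properties listed earlier. It is an assumption-laden simplification: it works on a `String` rather than a JHOVE2 `Input`, `Stats` and `parse` are hypothetical names, only the `""` escape convention is handled, and a blank line is counted as a record with one empty field.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvParseSketch {
    /** Hypothetical holder mirroring a few of the reportable properties. */
    static final class Stats {
        int recordCount;
        int fieldCountMin = Integer.MAX_VALUE;   // stays MAX_VALUE when there are no records
        int fieldCountMax;
        int recordsWithEmbeddedEolCount;
        final List<Integer> fieldsPerRecord = new ArrayList<>();
    }

    static Stats parse(String text, char delimiter) {
        Stats s = new Stats();
        boolean inQuotes = false;      // inside an escaped (double-quoted) field
        boolean embeddedEol = false;   // current record has an EOL inside quotes
        boolean recordOpen = false;    // something was read since the last record ended
        int fields = 1;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    boolean doubled = i + 1 < text.length() && text.charAt(i + 1) == '"';
                    if (doubled) i++;            // "" is a literal quote, stay in the field
                    else inQuotes = false;
                } else if (c == '\r' || c == '\n') {
                    embeddedEol = true;
                }
            } else if (c == '"') {
                inQuotes = true;
                recordOpen = true;
            } else if (c == delimiter) {
                fields++;
                recordOpen = true;
            } else if (c == '\r' || c == '\n') {
                // Treat CRLF as a single record terminator.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') i++;
                endRecord(s, fields, embeddedEol);
                fields = 1;
                embeddedEol = false;
                recordOpen = false;
            } else {
                recordOpen = true;
            }
        }
        if (recordOpen) endRecord(s, fields, embeddedEol);   // last record without EOL
        return s;
    }

    private static void endRecord(Stats s, int fields, boolean embeddedEol) {
        s.recordCount++;
        s.fieldsPerRecord.add(fields);
        s.fieldCountMin = Math.min(s.fieldCountMin, fields);
        s.fieldCountMax = Math.max(s.fieldCountMax, fields);
        if (embeddedEol) s.recordsWithEmbeddedEolCount++;
    }

    public static void main(String[] args) {
        Stats s = parse("a,b,c\r\n\"x\r\ny\",2\r\nlast,\"q\"\"q\"", ',');
        System.out.println(s.recordCount + " records, fields per record " + s.fieldsPerRecord);
    }
}
```

In a real module the counters would be the module's protected fields, and conditions such as unequal field counts would also produce Message objects.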
Format Module:
Override Parse() method
Some Implementation Choices:
• Write from scratch
  – TIFF
  – WAV
  – UTF-8
  – ICC
• Wrap existing Java library
  – XML
  – Beware of persistence traps: inner classes, non-persistable fields
• Wrap existing non-Java library
  – SGML
  – Beware of performance hits (shell out) or memory leaks (JNI)
Format Module:
Implement Validator methods
/* (non-Javadoc)
* @see org.jhove2.module.format.Validator#getCoverage()
*/
@Override
public Coverage getCoverage() {
return COVERAGE;
}
/* (non-Javadoc)
* @see org.jhove2.module.format.Validator#isValid()
*/
@Override
public Validity isValid() {
return this.validity;
}
Format Module:
Implement Validator methods
/* (non-Javadoc)
 * @see org.jhove2.module.format.Validator#validate(org.jhove2.core.JHOVE2,
 *      org.jhove2.core.source.Source, org.jhove2.core.io.Input)
 */
@Override
public Validity validate(JHOVE2 jhove2, Source source, Input input)
    throws JHOVE2Exception {
    // parse() might already have set validity; if not, test
    // reportable field values and set it
    if (this.validity.equals(Validity.Undetermined)) {
        //...
    }
    return this.validity;
}
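One way the elided body of validate() could derive validity from already-populated reportable properties is sketched below. The two rules used (every record has the same number of fields; no characters outside ASCII-printable) are an assumption about what "valid CSV" might mean for this module, loosely based on RFC 4180, and the `Validity` enum here is a local stand-in rather than the JHOVE2 type.

```java
import java.util.Arrays;
import java.util.List;

final class CsvValiditySketch {
    /** Local stand-in for a three-state validity value. */
    enum Validity { True, False, Undetermined }

    /**
     * Keeps a decision already made during parsing; otherwise decides from
     * the reportable property values.
     */
    static Validity validate(Validity current,
                             List<Integer> fieldsPerRecord,
                             boolean containsNonAsciiPrintableChars) {
        if (current != Validity.Undetermined) return current;   // parse() already decided
        if (fieldsPerRecord.isEmpty()) return Validity.Undetermined;
        int expected = fieldsPerRecord.get(0);
        for (int count : fieldsPerRecord) {
            if (count != expected) return Validity.False;        // ragged records
        }
        return containsNonAsciiPrintableChars ? Validity.False : Validity.True;
    }

    public static void main(String[] args) {
        System.out.println(validate(Validity.Undetermined, Arrays.asList(3, 3, 3), false));
        System.out.println(validate(Validity.Undetermined, Arrays.asList(3, 2, 3), false));
    }
}
```

Institution-specific expectations (a particular delimiter, a particular EOL) are better expressed as assessment criteria than as validity rules, since they are local policy rather than properties of the format.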
Format Module:
Unit Test
• JUnit 4
• Important to include both good and bad
sample files
Format Module:
Unit Test
package org.jhove2.module.format.csv;
import static org.junit.Assert.*;
import org.junit.Before;
import org.junit.Test;
public class CsvModuleTest {
@Before
public void setUp() throws Exception {
}
@Test
public void testValidate() {
fail("Not yet implemented");
}
@Test
public void testParse() {
fail("Not yet implemented");
}
}
Format Module:
Unit Test: Where it Goes
Unit tests:
src/test/java/org/jhove2/module/format/csv
Sample (test) files
src/test/resources/examples/csv
Spring beans for unit tests:
src/test/resources/config/module/format/csv
– Update Spring configuration file filepaths-config.xml with
base path of your sample file
<bean id="csvDirBasePath" class="java.lang.String" >
<constructor-arg type="java.lang.String"
value="examples/csv/"/>
</bean>
Format Module Artifacts:
What’s Left?
• Source code
• Configuration files
• Sample (test) files
• Documents
Format Module Artifacts
Configuration Files
• Spring IOC Bean XML configuration files,
• For Module
• For unit test as needed
• For assessment
• Messages properties file additions if needed
• Properties files
– Displayer
– Units of measure
– Module-specific
Format Module:
CSV Assessment Criteria
• Delimiter character=?
• EOL character(s)=?
• Escape character =?
• Escape character sequence =?
• All records have same number of columns?
• Contains no escaped fields with untrimmed whitespace?
• Contains no characters other than ASCII-printable?
• Contains no fields with embedded EOL?
See Richard Anderson’s workshop this afternoon!!!!
Configuration Files:
“We’ve got an app for that!”
• Displayer
– jhove2_dpfg.cmd (Windows)
– jhove2_dpfg.sh (Unix)
• Units of measure
– jhove2_upfg.cmd (Windows)
– jhove2_upfg.sh (Unix)
Configuration Files:
Displayer Properties
USAGE:
jhove2_dpfg.cmd <fully-qualified-classname> <output-directory-path>
Configuration Files:
Displayer Properties
Example:
jhove2_dpfg.cmd org.jhove2.module.format.csv.CsvModule c:\props
Command line output:
Succesfully created displayer property file for class org.jhove2.module.format.csv.CsvModule
File can be found at c:\props\org\jhove2\module\format\csv\CsvModule_displayer.properties
Configuration Files:
Editable File
# _displayer.properties
# The visibility directives control the display of the properties identified by URI
# The directives can be: Always, IfFalse, IfNegative, IfNonNegative, IfNonPositive,
#   IfNonZero, IfPositive, IfTrue, IfZero, Never
# A property is not displayed if its value is not consistent with the directive.
# Negative means ...,-2,-1; NonNegative means 0,1,2...
# Positive means 1,2,3,...; NonPositive means ...,-2,-1,0
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/DelimiterCharacter Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/EolString Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/EscapeCharacter Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/EscapeCharacterSequenceWithinField Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountFirstRecord Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountMax Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountMin Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldNames Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldsPerRecord Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordCount Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithEmbeddedEolCount Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithEmbeddedEscapeCharCount Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithUntrimmedWhitespaceCount Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/isContainsNonAsciiPrintableChars Always | Never | IfTrue | IfFalse
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/isEolInLastRecord Always | Never | IfTrue | IfFalse
Configuration Files:
Editable File
# _displayer.properties
# The visibility directives control the display of the properties identified by URI
# The directives can be: Always, IfFalse, IfNegative, IfNonNegative, IfNonPositive,
#   IfNonZero, IfPositive, IfTrue, IfZero, Never
# A property is not displayed if its value is not consistent with the directive.
# Negative means ...,-2,-1; NonNegative means 0,1,2...
# Positive means 1,2,3,...; NonPositive means ...,-2,-1,0
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/DelimiterCharacter Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountFirstRecord Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/isContainsNonAsciiPrintableChars Always | Never | IfTrue | IfFalse
Configuration Files:
Editable File
# _displayer.properties
# The visibility directives control the display of the properties identified by URI
# The directives can be: Always, IfFalse, IfNegative, IfNonNegative, IfNonPositive,
#   IfNonZero, IfPositive, IfTrue, IfZero, Never
# A property is not displayed if its value is not consistent with the directive.
# Negative means ...,-2,-1; NonNegative means 0,1,2...
# Positive means 1,2,3,...; NonPositive means ...,-2,-1,0
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/DelimiterCharacter Always
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountFirstRecord IfPositive
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/isContainsNonAsciiPrintableChars IfTrue
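The semantics spelled out in the file's comments (Negative means ...,-2,-1; NonNegative means 0,1,2...; and so on) can be captured in a small function. This is a sketch of the rule as the comments describe it, not the JHOVE2 displayer implementation; `DisplayDirectiveSketch` is a hypothetical name, and hiding a non-numeric value under a numeric directive is an assumption based on "not displayed if its value is not consistent with the directive".

```java
final class DisplayDirectiveSketch {
    /** Directive names as listed in the displayer properties file. */
    enum Directive {
        Always, IfFalse, IfNegative, IfNonNegative, IfNonPositive,
        IfNonZero, IfPositive, IfTrue, IfZero, Never
    }

    /** Decides whether a property value is shown under the given directive. */
    static boolean isDisplayed(Directive directive, Object value) {
        switch (directive) {
            case Always:  return true;
            case Never:   return false;
            case IfTrue:  return Boolean.TRUE.equals(value);
            case IfFalse: return Boolean.FALSE.equals(value);
            default:      break;                 // numeric directives handled below
        }
        if (!(value instanceof Number)) return false;   // assumption: inconsistent type hides it
        double n = ((Number) value).doubleValue();
        switch (directive) {
            case IfNegative:    return n < 0;
            case IfNonNegative: return n >= 0;
            case IfNonPositive: return n <= 0;
            case IfNonZero:     return n != 0;
            case IfPositive:    return n > 0;
            case IfZero:        return n == 0;
            default:            return false;
        }
    }

    public static void main(String[] args) {
        // FieldCountFirstRecord configured as IfPositive, as in the edited file above.
        System.out.println(isDisplayed(Directive.IfPositive, 4));
        System.out.println(isDisplayed(Directive.IfPositive, 0));
    }
}
```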
Configuration Files:
Units of Measure Properties
USAGE:
jhove2_upfg.cmd <fully-qualified-classname> <output-directory-path>
Configuration Files:
Units of Measure Properties
Example:
jhove2_upfg.cmd org.jhove2.module.format.csv.CsvModule c:\props
Command line output:
Succesfully created unit property file for class org.jhove2.module.format.csv.CsvModule
File can be found at c:\props\org\jhove2\module\format\csv\CsvModule_unit.properties
Configuration Files:
Editable File
# Units of measure properties
# Note: These unit of measure labels are descriptive only; changing the label
# does NOT change the determination of the underlying property value.
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordCount
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithUntrimmedWhitespaceCount
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithEmbeddedEscapeCharCount
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountMax
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountMin
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountFirstRecord
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithEmbeddedEolCount
Configuration Files:
Editable File
# Units of measure properties
# Note: These unit of measure labels are descriptive only; changing the label
# does NOT change the determination of the underlying property value.
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithEmbeddedEolCount record
Format Module Artifacts:
What’s Left?
• Source code
• Configuration files
• Sample (test) files
• Documents
  – Format Module Specification Document
    • “We’ve got an app for (part of) that!”
Documentation :
Specification Sections
1. Introduction
2. Identification
3. References
4. Terminology and Conventions
5. Validity
6. Format Profiles
7. Reportable Properties
8. Configuration
9. Implementation Notes
Documentation :
Minimal template edit
1. Introduction
2. Identification
3. References
4. Terminology and Conventions
5. Validity
6. Format Profiles
7. Reportable Properties
8. Configuration
9. Implementation Notes
Documentation :
Sections from Tabular Data
1. Introduction
2. Identification
3. References
4. Terminology and Conventions
5. Validity
6. Format Profiles
7. Reportable Properties
8. Configuration
9. Implementation Notes
Documentation :
Write “By Hand”
1. Introduction
2. Identification
3. References
4. Terminology and Conventions
5. Validity
6. Format Profiles
7. Reportable Properties
8. Configuration
9. Implementation Notes
Documentation
Module Specification Recipe
• Create module specification from Word Template
• Generate tabular information (reportable
properties)
• Use Word macro to format tabular information
for pasting into module specification
• Complete other sections
• Add specification document to JHOVE2 wiki
Documentation :
Create Tabular Data
• Generate tabular information (reportable
properties) for format module specification
– jhove2_doc.cmd (Windows)
– jhove2_doc.sh (Unix)
Documentation :
Create Tabular Data
USAGE:
jhove2_doc.cmd <fully-qualified-classname> <output-directory-path>
Documentation :
Create Tabular Data
• Outputs
– CsvModule_id.txt
• (Section 2: Identification)
– CsvModule_ref.txt
• (Section 3: References)
– CsvModule_Reportable_properties.txt
• (Section 7: Reportable properties)
Documentation :
Format tabular data with Macro
• Open the output file in WordPad or Notepad and save it (so that it has MS line endings)
• Follow instructions in Macro file to create
formatted text
• Copy and paste in Specification document
Documentation :
Create Tabular Data
In generated file:
Property DelimiterCharacter
Identifier http://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/DelimiterCharacter
Type java.lang.String
Description Character used to delimit fields in record.
Reference RFC 4180, Section 2, paragraph 4
Documentation :
Create Tabular Data
DelimiterCharacter Property
Identifier   http://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/DelimiterCharacter
Type         java.lang.String
Description  Character used to delimit fields in record.
Reference    RFC 4180, Section 2, paragraph 4
Documentation
Module Specification Recipe
• Create module specification from Word Template
• Generate tabular information (reportable
properties)
• Use Word macro to format tabular information
for pasting into module specification
• Complete other sections
• Add specification document to JHOVE2 wiki
Questions?
http://jhove2.org
CDL
Portico
Advisory Board
Stephen Abrams
Patricia Cruse
John Kunze
Isaac Rabinovitch
Marisa Strong
Perry Willett
John Meyer
Sheila Morrissey
Stanford University
With help from
Richard Anderson
Tom Cramer
Hannah Frost
Walter Henry
Nancy Hoebelheinrich
Keith Johnson
Evan Owens
Deutsche Nationalbibliothek
Dspace / MIT
Ex Libris
Fedora Commons / Rutgers
Florida Center for Library Automation
Harvard University
Koninklijke Bibliotheek
National Archives (UK)
National Archives (US)
National Library of Australia
National Library of New Zealand
Bibliothèque nationale de France (BnF)
Planets / Universität zu Köln
Tessella
Library of Congress
Martha Anderson
Justin Littman