Archival Identity Description and Control Social Networks and Archival Context and a National Archival Cooperative Program.


Archival Identity
Description and Control
Social Networks and Archival Context
and a
National Archival Cooperative Program
Archives
• Archives are responsible for the curation of
the documents (records) that people generate
while living and working
• The documents are evidence of human
activity
• As such, archival records are the cornerstone
of our cultural heritage
• The sine qua non of understanding human
history
Archival Description
• Central to the description and control of
records is establishing provenance
• Who created the records and in what context?
• Records are largely unintelligible without
knowing who created them and in what
context
Archival Description
• Thus archivists describe the creators and the
context in which they worked and lived
• Their names, of course, but facts about them
too: when and where they were active, what
they did, where, when, and with whom they
did it
• This is archival authority control: establishing
identities and relating these identities to
other identities and to records
Transforming the Descriptive Methods
• To date, identity description has been interleaved
with record description
• Social Networks and Archival Context (NEH and
Mellon) is demonstrating that separating identity
descriptions from record descriptions and
interrelating them
– Enables providing union access to distributed
resources by and about the identity
– Enables providing access to the social (professional
and family) and intellectual networks within which
people (identities) lived and worked
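The separation described above can be sketched as a small data model. This is only an illustration, not SNAC's actual schema: the class names, field names, and the sample records are all hypothetical. Identities are described once, record descriptions point at them by identifier, and both union access and a simple co-occurrence network fall out of those links.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Identity:
    """A person, family, or organization, described once and apart from any record."""
    identity_id: str
    name: str


@dataclass
class RecordDescription:
    """A description of archival records that refers to identities by id."""
    record_id: str
    title: str
    repository: str
    creator_ids: list = field(default_factory=list)     # records "by" the identity
    referenced_ids: list = field(default_factory=list)  # records "about" the identity


def union_access(records, identity_id):
    """Return every record by or about one identity, whatever repository holds it."""
    return [r for r in records
            if identity_id in r.creator_ids or identity_id in r.referenced_ids]


def social_network(records):
    """Link identities that appear together in the same record description."""
    edges = defaultdict(set)
    for r in records:
        ids = set(r.creator_ids) | set(r.referenced_ids)
        for a, b in combinations(sorted(ids), 2):
            edges[a].add(b)
            edges[b].add(a)
    return dict(edges)


# Made-up sample data, for illustration only.
identities = {i.identity_id: i for i in (
    Identity("a", "Person A"), Identity("b", "Person B"), Identity("c", "Person C"))}
records = [
    RecordDescription("r1", "Correspondence of A", "Repository X",
                      creator_ids=["a"], referenced_ids=["b"]),
    RecordDescription("r2", "Papers of B", "Repository Y",
                      creator_ids=["b"], referenced_ids=["a", "c"]),
]

for record in union_access(records, "a"):
    print(record.repository, "-", record.title)
print(social_network(records)["a"])
```

Co-occurrence in a description is a crude stand-in for a real relationship; it is used here only because it needs no data beyond the links themselves.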
Social Networks and Archival Context
Funding and Timeline
• SNAC Phase I
– National Endowment for the Humanities
– May 2010-April 2012
• SNAC Phase II
– Andrew W. Mellon Foundation
– May 2012-April 2014
• University of Virginia (lead); University of California, Berkeley;
and the California Digital Library
Daniel V. Pitti § Institute for Advanced
Technology in the Humanities § University
of Virginia
The Source Data
• WorldCat Archival Descriptions: 2M+
• EAD-encoded finding aids (guides to archival records)
– 150K
– Primarily from U.S. sources, but also U.K. and France
– Hundreds of repositories
• Archival authority records (360K)
– U.S. National Archives and Records Administration
– New York State Archives
– Smithsonian Institution
– British Library
– France: Archives nationales, BnF, and CCfr
VIAF
• Virtual International Authority File (16M+
cluster records)
– Contributed from around the world by national
libraries and others
– Used for matching
Methods and Processing
• Extract names and descriptions of people from record descriptions
(EAD and MARC) and assemble in EAC-CPF (archival) authority
records
– Extracting both creators and referenced CPF names
• Match EAC-CPF records against one another and against existing
authority records (ULAN and VIAF); merge records for the same
entity
– Enhance EAC-CPF by normalizing entries and adding alternative entries and
titles of publications (VIAF)
– Key challenge: two or more people with the same name; two or
more names for the same person
• Create a prototype historical resource and access system
– Historical data and social-professional-intellectual networks
– Links to archive, library, and museum resources (by and about)
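The match-and-merge step and its key challenge can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the matching logic SNAC uses: the functions `normalize_name`, `dates_compatible`, and `merge_candidates`, the dictionary layout, and the sample entries are all hypothetical. Names are reduced to a comparison key, and life dates keep two different people with the same name from being merged.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Reduce a name entry to a key: no accents, case, punctuation, or word order."""
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", unaccented.lower())
    return " ".join(sorted(cleaned.split()))


def dates_compatible(a, b):
    """Dates conflict only when both entries give a year and the years differ."""
    for key in ("born", "died"):
        if a.get(key) and b.get(key) and a[key] != b[key]:
            return False
    return True


def merge_candidates(entries):
    """Group entries whose name keys agree and whose dates do not conflict."""
    by_key = defaultdict(list)
    for entry in entries:
        by_key[normalize_name(entry["name"])].append(entry)

    clusters = []
    for group in by_key.values():
        group_clusters = []
        for entry in group:
            for cluster in group_clusters:
                if all(dates_compatible(entry, other) for other in cluster):
                    cluster.append(entry)
                    break
            else:
                # No existing cluster fits: same name, but a different person.
                group_clusters.append([entry])
        clusters.extend(group_clusters)
    return clusters


# Made-up entries: two forms of one name, plus a later namesake.
entries = [
    {"name": "Smith, John", "born": 1801},
    {"name": "John Smith", "born": 1801, "died": 1870},
    {"name": "Smith, John", "born": 1920},
]
clusters = merge_candidates(entries)
for cluster in clusters:
    print([e["name"] for e in cluster])
```

A key like this only reconciles variant forms of the same name. Genuinely different names for one person (a married name, a pseudonym) would need the alternative entries that an existing authority record supplies.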
Refining and Developing
• All of the processes in SNAC I are being revised in SNAC II
– Lessons learned, but also required for building incrementally
• Extraction: 4.5M+ EAC-CPF from the WorldCat MARC
descriptions
• Other records extracted from the Joseph Henry
correspondence (30K); New York State Archives;
Smithsonian expedition field books …
• Processing revision for EAD extraction almost completed
• Revision of matching and merging almost completed
• User studies that will lead to revision of public interfaces
underway
National Archival Authorities
Cooperative
An Emerging National Cooperative
• Transforming a research project into a program
• Building a National Archival Authorities
Infrastructure (IMLS)
• A blueprint for a National Archival Authorities
Cooperative to be published in the fall of 2013
• The core objective: gather archival identity
descriptions in one place, cooperatively
maintained by the community (broadly defined),
with additional assistance from the end-user
community
The Emerging Blueprint
• The National Archives and Records Administration (NARA) will host the
administration of the cooperative, including business and governance
• The technical infrastructure would be developed and hosted outside of
NARA, though in close collaboration with NARA
• Phased approach
– An initial set of partners (government, academic, others?)
– Begin with limited though substantial objectives: data from SNAC, ability to
batch ingest/export; reference implementation of an EAC-CPF editor
– Use the process to work through the issues, to explore and further develop
the business, governance, social, and technological infrastructure
• January 2013 meeting at NARA endorsed this plan
• Immediate next step: cooperatively develop the blueprint with NARA and
government repositories, academic institutions …
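As a rough picture of what batch export of identity descriptions might involve, the sketch below serializes a name and dates into a skeletal EAC-CPF fragment using Python's standard library. The helper names `build_eac_cpf` and `export_record` and the sample values are hypothetical. The output is deliberately incomplete: it omits the `control` element and other parts a schema-valid EAC-CPF record requires.

```python
import xml.etree.ElementTree as ET

EAC_NS = "urn:isbn:1-931666-33-4"  # namespace of the EAC-CPF 2010 schema
ET.register_namespace("", EAC_NS)


def q(tag):
    """Qualify a tag name with the EAC-CPF namespace."""
    return f"{{{EAC_NS}}}{tag}"


def build_eac_cpf(entity_type, name, from_date=None, to_date=None):
    """Build a skeletal eac-cpf element (identity plus optional existence dates)."""
    root = ET.Element(q("eac-cpf"))
    cpf = ET.SubElement(root, q("cpfDescription"))

    identity = ET.SubElement(cpf, q("identity"))
    ET.SubElement(identity, q("entityType")).text = entity_type
    name_entry = ET.SubElement(identity, q("nameEntry"))
    ET.SubElement(name_entry, q("part")).text = name

    if from_date or to_date:
        description = ET.SubElement(cpf, q("description"))
        exist_dates = ET.SubElement(description, q("existDates"))
        date_range = ET.SubElement(exist_dates, q("dateRange"))
        if from_date:
            ET.SubElement(date_range, q("fromDate")).text = from_date
        if to_date:
            ET.SubElement(date_range, q("toDate")).text = to_date
    return root


def export_record(entity_type, name, from_date=None, to_date=None):
    """Serialize one skeletal record to an XML string."""
    root = build_eac_cpf(entity_type, name, from_date, to_date)
    return ET.tostring(root, encoding="unicode")


# Made-up identity, for illustration only.
xml_text = export_record("person", "Example, Person", "1801", "1870")
print(xml_text)
```

A batch export would loop over many such records; ingest would run the reverse, parsing each file and reading the same elements back.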
Demonstration
• http://socialarchive.iath.virginia.edu/xtf/search
• Or …
• RDFa owl:sameAs
• HTML 5 microdata in chron list
• RDF of the social graph: &mode=xml2owl [experimental]
• Thanks Ed Summers!
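To show what an owl:sameAs assertion amounts to, here is a minimal sketch that writes such links as N-Triples, one statement per line. It does not reproduce the prototype's RDFa output. The function `same_as_triples` and both example.org URIs are hypothetical placeholders; only the owl:sameAs property URI is real.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(local_uri, external_uris):
    """State that a local identity and each external authority URI denote the same entity."""
    return [f"<{local_uri}> <{OWL_SAME_AS}> <{external}> ."
            for external in external_uris]


# Placeholder URIs: a local identity page and a matched external authority record.
triples = same_as_triples(
    "http://example.org/identity/123",
    ["http://example.org/authority/456"],
)
print("\n".join(triples))
```

Each line links the locally described identity to a record for the same entity elsewhere, which is how a match against an external authority file can be exposed as linked data.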
For More Information
• http://socialarchive.iath.virginia.edu/ (Project
website)
• http://socialarchive.iath.virginia.edu/xtf/search (public prototype)