Scientific Data Curation in Government Agencies
Teaching Agency Data Creators How to Develop an OAIS-Compliant Digital Curation System
Lorraine L. Richards, William C. Regli, Adam Townes, YuanYuan Feng
Drexel University, College of Computing and Informatics
Introduction
For many U.S. federal agencies, scientific data management
activities support the immediate research needs of agency
scientists but support neither the recently mandated large-scale
data sharing and reuse requirements [3,4,5] nor the long-term
preservation of the data. As a result, agencies are scrambling
to learn how to curate their scientific data sets without sacrificing
current mission-oriented research activities. This poster examines a
case study of the Federal Aviation Administration’s William J.
Hughes Technical Center (WJHTC), which contracted with the Drexel
University project team to develop requirements and build capacity
for a digital curation and preservation system that will meet OAIS
Reference Model recommendations for such a system. Specifically,
this poster presents findings related to teaching non-Archives and
Records Management personnel how to develop a “big data” digital
curation and preservation system.
Problem Statement
The WJHTC is an organization that uses “big data” information
resources in the course of large-scale scientific research. While the
WJHTC has not previously been engaged in data curation as a
routine activity, it now requires a trustworthy repository for its
scientific research data, in order to meet government mandates and
to engage in data sharing for future mission-critical projects.
Research and Development Activities
The Drexel University project team is performing activities such as:
• Completion of a data inventory;
• Development of a domain ontology and metadata taxonomy;
• Recommendation of a design for the ingest and tagging
mechanisms to auto-generate metadata tags;
• Research into potential standards for the policies and rules for
data sets and access controls; and
• Analysis of scientific research workflows and task analysis.
Educational Goals:
The development of the organizational knowledge and
capabilities needed to issue a request for proposal or additional
statement of work for a contractor to implement, build, and
maintain a digital curation repository that is compliant with the
recommendations of the OAIS Reference Model for the WJHTC
and its current and future users.
Methods
To support both the implementation and educational goals, the
project team chose to engage in action research. It stressed mutual
cooperation between the WJHTC scientists and the Drexel curation
and cyberinfrastructure experts.
Action Research:
“…an emergent inquiry process in which applied behavioural
science knowledge is integrated with existing organizational
knowledge and applied to solve real organizational problems.
It is simultaneously concerned with bringing about change in
organizations, in developing self-help competencies in
organizational members and adding to scientific knowledge.
Finally, it is an evolving process that is undertaken in a spirit
of collaboration and co-inquiry” [6, 439].
Figure: Simulation Workstation for Human Factors Simulation [1, 4] and Simulation Data
showing “Mean and standard deviation of elbow and wrist angles, and elevations of
the arm during test scenarios” [1, 12].
Some Findings
• Constantly focus on current use to sustain the project and
maintain interest. (See [1].)
• The value-add must continually be evaluated and applied to
the organization’s/departments’/individuals’ key objectives,
linking curation and preservation goals to the ongoing data use
priorities.
• Tie the current curation project directly to other key, strategic
projects within the organization, e.g., the UAS (unmanned
aircraft systems, or drone) project, the NextGen (Next Generation)
project, or SWIM (System-Wide Information Management). This can
require continually reformulating progress reports and
education throughout the project, as the organization’s
priorities change.
• To communicate effectively, the teacher must be willing to be
the student. Building trust requires reciprocity.
• Education is not so much “iterative” as “holographic.” One
iterates through the entire process over and over, providing
more detail to the overall “story board.”
• Continued focus on the value-add of curation.
• Continued focus on what steps must be followed.
• Continued focus on the “big picture” “final” solution/service and
how individual project steps fit into the big picture.
• Examining, documenting, and validating detailed workflows
provides a common language with which to speak.
• Ideas need to be presented in concrete form, using examples
specific to the domain background of the receiving party.
• Academic or preservation-oriented abstractions are not welcome.
• Existing information ontologies and taxonomies can be used to
gain persuasive power and speed up the metadata
development.
• Use organization-specific or IT-oriented language rather than
preservation terms, which often lead to confusion and lessen
impact.
• Although the DPCCM was used and initially presented to FAA
personnel, we found that they responded more positively and
with greater acceptance when these findings were “translated”
into the language of NASA’s “Technology Readiness Levels,” with
which they are already familiar.
References
[1] Higgins, J. Stephens et al. 2012. Human Factors Evaluation of Pointing
Devices Used by Air Traffic Controllers: Changes in Physical Workload and
Behavior. Atlantic City, NJ: Department of Transportation. Available at
http://hf.tc.faa.gov/technotes/dot-faa-tc-12-63.pdf
[2] Shani, A.B. and Pasmore, W.A. 2010. “Organization Inquiry: Towards a Model
of the Action Research Process,” in D. Coghlan and A.B. Shani (eds.)
Fundamentals of Organization Development, Vol 1. London: SAGE, pp. 249-260.
[3] Blue Ribbon Task Force on the Sustainability of Digital Preservation and
Access. 2010. Sustainable Economics for a Digital Planet: Ensuring Long-Term
Access to Digital Information. San Diego: SDSC. Available at
http://brtf.sdsc.edu/biblio/BRTF_Final_Report.pdf
[4] Office of Management and Budget. 2013. Open Data Policy – Managing
Information as an Asset. Washington, D.C.: Executive Office of the President.
Available at
http://www.whitehouse.gov/sites/default/files/omb/memoranda/2013/m-1313.pdf
[5] Office of Science and Technology Policy (OSTP). 2013a. Increasing Access to
the Results of Federally Funded Scientific Research. Washington, D.C.: Executive
Office of the President. Available at
http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
[6] Office of Science and Technology Policy (OSTP). 2013b. Science and
Technology Priorities for the FY 2015 Budget. Washington, D.C.: Executive Office
of the President. Available at
http://www.whitehouse.gov/sites/default/files/omb/memoranda/2013/m-1316.pdf