ARL Survey on E-science and Data Support: Initial Findings

Wendy Pradt Lougee
E-Science Working Group
October 14, 2009
ARL E-Science
 2006 Task Force inquiry about library activity
 2007 Task Force Report recommendations
o Education, awareness
o Workforce development
o Relationships with relevant organizations
o Infrastructure development (CNI)
o Policy development, new publishing genre
 2008 Forum, e-research presentations
 2009 Working Group survey of membership
E-Science Survey
 Status of institutional planning
 Campus structures
 Infrastructure development
 Status of library planning & engagement
 Involvement in campus planning
 Library services, infrastructure, capacity (staff)
 Pressure points, areas of interest
Institutional structure/organization
 52 respondents
 Institutional infrastructure in place or planned 75%
 Most institutions have hybrid of institution-wide and
unit planning & infrastructure (59%)
 Only 10% are pursuing institution-wide approach
 Institution-wide approaches: include IT, Library,
faculty/researchers, Office of Research
 Unit-based focus: grounded in science/medicine
units
“Organizational structure is too strong a word. There is
periodic interaction and base-touching between science
departments, ITS and sometimes the Libraries that
considers infrastructure needs…”
Institutional infrastructure
Of those institutions with focused infrastructure
(N=41):
 50% report designated unit to provide data
curation support
 40% have conducted assessment of data
resources and needs
 53% use combo of central/distributed data
centers; 44% only distributed centers
 Lack of awareness about digital lab notebook
support
Decentralized themes
“Most science and engineering departments, labs and
centers…have some infrastructure to support high
performance computing, or provide software tools to
process/visualize research data. But none…are clearly
documented on a single webpage or other place where
researchers can easily locate.”
“Currently there is no one central group or effort that focuses on overall
planning, but a collection of overlapping initiatives and activities –
this is largely because the university is highly decentralized and others in
the institution do not think in terms of e-science but in terms of
research supported by cyberinfrastructure.”
Centralized strategies
“A cyberinfrastructure task force is in the planning stages, and it will
report to the President of the University.”
“Two groups exist: Cyberinfrastructure Council and Knowledge
Management Committee. The Council is most involved in the
high performance computing, data centers, other computing and
network issues. The Knowledge Management Committee is
more oriented to the content of escience and data curation…”
“An eScience Institute was formed in January 2008 by the Vice
President for Technology and the Provost, in partnership with
key Deans and a group of highly distinguished research faculty…”
External funding
 16 institutions engaged in DataNet proposal
development, 15 involved the library
 13 libraries involved in other e-science grants
(NIH, NSF, Mellon and Gates Foundations)
“Investigating Data Curation Profiles Across Multiple Disciplines (Explores
who is willing to share what data with whom)…Awarded
by IMLS to Purdue, Libraries PI.”
“NSF Office of Cyberinfrastructure award to [Cornell’s] Mann
Library: III-CXT: Promoting the curation of research data through
library-laboratory collaboration.”
Library eScience Support
 Of libraries with institutional activity, 72% of
respondents reported library involvement.
 Organization: group or group/department/individual lead,
e.g., E-Research Working Group, Data Curation Working
Group, e-Data Archiving Group, Science Data Services
Team, Data Executive Group, E-Research Team
 86% of libraries offering services collaborate with
other units (e.g., IT, colleges/departments,
centers, Office of Research)
Data Assessment/audits
 Washington
http://www.washington.edu/lst/research_development/research_projects/LSTsurvey
 U Oregon
http://libweb.uoregon.edu/faculty/SciDataAudit.html
 Purdue/UIUC
http://www.datacurationprofiles.org
 Wisconsin
http://digital.library.wisc.edu/1793/34859 and
http://digital.library.wisc.edu/1793/21443
Library service portfolio
 Finding, using available infrastructure
 8 libraries maintain web site on services
 Finding relevant data, developing data
management plans, rights management
 8 libraries offer training in data management
 Metadata and archiving consultation/support
 Most (86%) rely on discipline librarians, many
(69%) also have data librarians
Library Technology Infrastructure
 Institutional repositories
 Domain-specific repositories
 Virtual community support (e.g., VIVO)
 Short-term storage, partner in campus storage
solutions
 Publishing infrastructure
 GIS and social science data services, tools
Library Staff & Staff Development
 62% reassigning existing staff
 42% have hired, and 39% are planning to hire,
e-science expertise
 62 positions detailed:
 Two named chairs
 70% had library/info science degree
 32% had disciplinary degree (10% PhD only)
 Staff development:
 Conference support, in-house presentations, course
support
 7 institutions collaborating with iSchools
Selected Case Studies
Wisconsin
 Research Data Management Study Group (2008)
 Proposed pilot: jointly funded and managed with
research partners DoIT and Library
 Easily accessed, maintained storage and backup
for data; projects, address consultation needs
 Libraries/DoIT digital curation service (2009)
 Assess institutional models
 User-based data management applications
integrated with storage/retrieval system
 Develop digital curation processes & procedures,
data management assistance
University of Washington
 Institution level
 Study of campus needs: no clear consensus on
technology priorities. Areas of convergence: data
management, shared expertise, computing power &
network access, data collection & analysis,
communication & collaboration
 eScience Institute formed 2008, interdisciplinary and
institution-wide coordinating body; Library interfaces on
planning and referral on data curation
 Library:
 Informal, evolving structure involving metadata unit,
research services unit, and health sciences libraries.
 Planned Libraries integrated data services unit
Purdue
 Institutional level
 No one central group, but collection of overlapping
initiatives
 Planned task force on data management (VP Research,
IT, Libraries, Provost office, colleges/schools)
 Libraries Distributed Data Curation Center (D2C2)
pursues curation issues of organizing, facilitating
access to, archiving for and preserving research data
and data sets in complex environments. Brings
together project teams to apply for grants and
promote collaboration on advancing solutions for data
management.
Cornell University
 DISCOVER Research Service Group
 Partnership: domain scientists, Center for Advanced
Computing, Library, Fedora Commons; Sponsored by VP
Research
 Facilitates collaboration, fosters cross-disciplinary analysis of
data using data mining and visualization tools, supports
development of cyberinfrastructure
 Library Data Working Group white paper:
http://hdl.handle.net/1813/10903
 Library maintains Research Data Management &
Publishing Support site
 Library’s DataStar project: supports collaboration,
short term storage, & data sharing during research
Johns Hopkins
 Institute for Data Intensive Engineering and Science (IDIES),
most visible umbrella organization for eScience
http://idies.jhu.edu
 Coalesces data-intensive science efforts
 Brings together scholars from School of Arts/Sciences,
Engineering, Sheridan Libraries to form interdisciplinary
teams.
 Facilitates development of tools and methods
 DataNet award: Data Conservancy Project
addressing creation, implementation, and sustained
management of an integrated and comprehensive
data curation strategy across initial disciplinary base
of astronomy, biodiversity, earth sciences, and social
sciences.
UCSD
 Blueprint for the Digital University (2009)
recommendations: colocation facilities, centralized
disk storage, digital curation & data services, CI
network, “condo clusters,” expertise (labor pool).
 Move toward centralized data service (new facility).
Service sponsored by UC Office of the President, available
to UC researchers
 Collaboration between SD Supercomputer Center and
Libraries.
 Partner in Chronopolis, national center for the
management, long-term preservation, and
promulgation of national digital assets
http://chronopolis.sdsc.edu/
Pressure Points for ARL Libraries
 Organizational:
 Low recognition of importance of e-science support
 Turf issues
 Complexity of structures
 Resources:
 Staff with relevant expertise
 Technology infrastructure
 Budget constraints
Information Exchange Interests
 Initiatives at member institutions
 Share organizational models, position
descriptions
 Assessments of researcher needs, environmental
scans
 Support for digital humanities (models, programs)
 Data curation technologies, data preservation
Next steps
 Leverage the survey to increase info exchange:
 Occasional paper including cases
 Repeat survey in 2 years
 Briefing paper by and for VPs for Research
 “Institute” for e-science teams (discipline/data
librarians and professionals)
 Build on Reinventing Science Librarianship Forum