Slides - OASPA


Data!
Philip E. Bourne Ph.D.
Associate Director for Data Science
National Institutes of Health
Some Context: NIH Data Science History
6/12
2/14
3/14
• Findings:
• Sharing data & software through catalogs
• Support methods and applications development
• Need more training
• Need campus-wide IT strategy
• Hire CSIO
• Continued support throughout the lifecycle
My Bias
• Still a scientist
• A funder who still thinks like a PI
• Not yet attuned to the federal system
• Big supporter of OA via PLOS and others
Data – A Few Observations …
• We talk about the promise of big data, but we don’t even know the value of little data (i.e., could “Big Data” be the new “AI”?)
• Good data is expensive in terms of time and money
• Looking at data retroactively is really expensive
• Good data begets trust; trust begets community; community is God
• The way we currently support scientific data is not sustainable
• There is currently no workable business model for scientific data
Data – A Few NIH Observations …
1. We have little idea how much we spend on data – estimated over $1bn per year
2. We have even less idea how much we should be spending

Point 2 is part of a culture clash between the more observational history of biomedicine and the new analytical approach to discovery
ADDS Mission Statement
To foster an ecosystem that enables biomedical research to be conducted as a digital enterprise that enhances health, lengthens life and reduces illness and disability
What Problems Are We Trying to Solve?
Possible Solutions
• Sustainability – 50% business model
• Efficiency – sharing best practices in longitudinal clinical studies
• Collaboration – identification of collaborators at the point of data collection, not publication
• Reproducibility – data accessible with publication
• Integration – phenotype homogenization
• Accessibility – clinical trials registration
• Quality – sharing CDEs across institutes
• Training – keeping trainees in the ecosystem
The Data Ecosystem
Community
Policy
Infrastructure
• Sustainable business model
• Collaboration
• Training
Raw Materials to Seed the Ecosystem
• NIH mandate & support
• ADDS team of 8 people
• Intramural participation of over 100 team members across ICs
• Funding through BD2K:
– ~$30M in FY14
– ~$80M in FY15
– ....
Example Communities
– NIH
• 20/27 ICs
– Agencies
• NSF
• DOE
• DARPA
• NIST
– Private sector
• PhRMA
• Google
• Amazon
– Organizations
• PCORI
• RDA, ELIXIR
• CCC
• CATS
• FASEB, ISCB
• Biophysical Society
• Sloan Foundation
• Moore Foundation
– Government
• OSTP
• HHS HDI
• ONC
• CDC
• FDA
Example Policies
– Clinical data harmonization
– Data citation
– Machine-readable data sharing plans on all grants
– New review models, audiences etc.
• Open review
• Micro funding
• Standing data committees to explore best practices
• Crowd sourcing
Example Infrastructure: The Commons
[Figure: The Commons diagram. Section labels: The Why, The How, The End Game. Other labels: Data Sharing Plans; Scientific Discovery; Knowledge; The Long Tail; NIH Awardees; Government; Private Sector; Rest of Academia; Core Facilities/HS Centers; BD2K Centers; Clinical/Patient; Data; Software; Index; Standards; The Commons; Data Discovery Index; Usability; Quality; Security/Privacy; Metrics/Standards; Sustainable Storage; Cloud, Research Objects, Business Models.]
What Does the Commons Enable?
• Dropbox-like storage
• The opportunity to apply quality metrics
• Bring compute to the data
• A place to collaborate
• A place to discover
http://100plus.com/wp-content/uploads/Data-Commons-3-1024x825.png
One Possible Commons Business Model
HPC, Institution …
[Adapted from George Komatsoulis]
Pilots Around A Virtuous Cycle
Expect a Funding Call
Training & Diversity
• Training & Diversity Goals:
– Develop a sufficient cadre of diverse researchers skilled in the science of Big Data
– Elevate general competencies in data usage and analysis across the biomedical research workforce
– Combat the Google bus
• How:
– Traditional training grants
– Work with ICs on a needs assessment
– Standards for course descriptions with EU
– Work with institutions on raising awareness
– Partner with minority institutions
– Virtual/physical training center(s)?
What Can Open Access Publishers Do?
• Work with NIH on supporting data citation
• Experiment with the idea of micropublication
• Other?
NIH…
[email protected]
Turning Discovery Into Health