Serving the Incubation of Services


Data Sharing in Zooarchaeology: Challenges and Promises
Sarah Whitcher Kansa
The Alexandria Archive Institute
Unless otherwise indicated, this work is licensed under a Creative Commons
Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>
Why are we publishing data as part of this project?
• What is data publishing?
• Why is it good for you?
• Started in 2007
• Publishes archaeological data (open access / open data)
• Archiving by California Digital Library
• Prioritizes access & reuse
• Data reuse is hard
• Needs documentation (esp. methods), ideally with data creator
• “Standards” applied, recorded in different ways
• Expand on this initial study
“Traditional” Data Sharing
Method: Publications
• Pros
– Control (over content shared)
– Rewards (promotion, etc.)
• Cons
– Difficult to reuse
– Incomplete data
– Slow dissemination
Method: Conference Papers
• Pros
– Control (over content shared)
– Public statement of work
– Quick dissemination
• Cons
– No control over reuse
– Citation problems
Method: Email
• Pros
– Control (over content shared and who it’s shared with)
– Low time-commitment to prepare for sharing
– Direct contact with author (for questions, trouble-shooting)
• Cons
– One-to-one, hidden communication channel
– Lacks documentation
– Can’t pass it on for reuse
– How to cite?
– No means to validate
Web-Based Data Editing & Publishing
• Pros
– Only one-time effort at outset
– Control (over content shared)
– Public statement of work (even early on)
– Direct contact with author (trouble-shoot)
– Adds critical documentation (methods)
– Linked to print publications
– Clear citation (also impact)
– Others can validate
– Rewards (increasing)
– More research opportunities (collab., linked data, etc.)
– Responsible stewardship of your data!
• Cons
– Requires TIME (at outset)
– No control over reuse
– Exposure
“Data sharing as publishing” model
… but need concrete examples of reuse!
- NEH projects
- EOL project
EOL Computable Data Challenge (Ben Arbuckle, Sarah W. Kansa, Eric Kansa)
Anatolia Zooarchaeology Case Study: Aims
• Collaborative research paper(s)
– Drawing on integrated datasets
– Linked to published data
– Example of research potential of data publishing
– Eventually fill in the gaps (spatial and temporal)
• Data publications
– Lasting outcome, not just one-time integration
– Edited, verified data
– Linked data for future research opportunities
EOL Computable Data Challenge
1. 14 different sites
2. 34+ zooarchaeologists
3. Decoding, cleanup, metadata documentation
4. 220,000+ specimens
5. 450 entities linked to 143 EOL concepts
6. Collaborative analysis
7. Parsed out to you because so large
Data are challenging!
1. Decoding takes 10x longer
2. More work needed modeling research methods (esp. sampling)
3. Requires lots of back-and-forth with data authors.
4. Tension between modeling needs and familiarity with tools (Excel).
Archiving is not enough! NEED data editing!
“Distal epiphysis unfused”
http://opencontext.org/vocabularies/open-context-zooarch/zoo-0058
Variants in the source datasets:
• uf. dist., f. prox.
• d. uf.
• 30
• Distal epiph. unfused
• Distal end unf.
• dist. unfused
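The kind of editing this slide calls for can be sketched in a few lines of Python. This is only an illustration, not Open Context's actual workflow: the variant spellings and the URI come from the slide, while the names `FUSION_VARIANTS`, `normalize`, and `reconcile_fusion` are hypothetical. In practice a bare code such as `30` only has meaning within the project that defined it, so a real lookup would likely be kept per project.

```python
import re

# Concept URI as shown on the slide.
DISTAL_UNFUSED = "http://opencontext.org/vocabularies/open-context-zooarch/zoo-0058"

# Hypothetical lookup table built from the variants on the slide.
# Note: "uf. dist., f. prox." also records a fused proximal end; only the
# distal-unfused part is captured here, as on the slide.
FUSION_VARIANTS = {
    "uf. dist., f. prox.": DISTAL_UNFUSED,
    "d. uf.": DISTAL_UNFUSED,
    "30": DISTAL_UNFUSED,
    "distal epiph. unfused": DISTAL_UNFUSED,
    "distal end unf.": DISTAL_UNFUSED,
    "dist. unfused": DISTAL_UNFUSED,
}


def normalize(value):
    """Lower-case and collapse whitespace so trivial differences do not matter."""
    return re.sub(r"\s+", " ", str(value)).strip().lower()


def reconcile_fusion(value):
    """Return the shared concept URI for a raw entry, or None if unrecognized."""
    return FUSION_VARIANTS.get(normalize(value))
```

Entries that come back as `None` are the ones that would need the back-and-forth with data authors.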
Data Documentation Practices
“I use an Excel spreadsheet…which I … inherited from my research
advisers. …my dissertation advisor was still recording data for each
specimen on paper when I was in graduate school so that's what I
started …then quickly, I was like, "This is ridiculous."… I just started
using an Excel spreadsheet that has sort of slowly gotten bigger and
bigger over time with more variables or columns…I've added …color
coding…I also use…a very sort of primitive numerical coding system,
again, that I inherited from my research advisers…So, this little book
that goes with me of codes which is sort of odd, but …we all know
that a 14 is a sheep.” (CCU13)
A long way to go before we get usable, intelligible data
Open Context Entity Reconciliation
• Authors / Editors relate project-specific terminologies to global terminologies
• Many project-specific terms related to global terminologies
• Editorial work-flow helps annotate data for interoperability
Project-Specific Property    EOL Link (Global Terminology)
Cervidae                     http://eol.org/pages/7685/ (Cervidae)
Cervid                       http://eol.org/pages/7685/ (Cervidae)
Cervinae                     http://eol.org/pages/2851334/ (Cervinae)
Cervus / Dama                http://eol.org/pages/2851334/ (Cervinae)
Cervus sp.                   http://eol.org/pages/34545/ (Cervus)
Red deer                     http://eol.org/pages/328649/ (Cervus elaphus)
Cervus elaphus               http://eol.org/pages/328649/ (Cervus elaphus)
C. elaphus                   http://eol.org/pages/328649/ (Cervus elaphus)
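To show why this reconciliation matters for analysis, here is a minimal Python sketch that tallies specimens by EOL concept instead of by project-specific label. The term-to-URI pairs are those in the table; the function name `count_by_concept` and the sample identifications are made up for illustration and are not data from the project.

```python
from collections import Counter

# Project-specific term -> EOL page, as listed in the table.
EOL_LINKS = {
    "Cervidae": "http://eol.org/pages/7685/",
    "Cervid": "http://eol.org/pages/7685/",
    "Cervinae": "http://eol.org/pages/2851334/",
    "Cervus / Dama": "http://eol.org/pages/2851334/",
    "Cervus sp.": "http://eol.org/pages/34545/",
    "Red deer": "http://eol.org/pages/328649/",
    "Cervus elaphus": "http://eol.org/pages/328649/",
    "C. elaphus": "http://eol.org/pages/328649/",
}


def count_by_concept(terms):
    """Tally specimens per EOL concept; collect unmapped terms for editors."""
    counts = Counter()
    unmapped = set()
    for term in terms:
        uri = EOL_LINKS.get(term)
        if uri is None:
            unmapped.add(term)
        else:
            counts[uri] += 1
    return counts, unmapped


# Made-up identifications, as they might appear across several projects.
sample = ["Red deer", "C. elaphus", "Cervus elaphus", "Cervid", "Cervus sp.", "Bos"]
counts, unmapped = count_by_concept(sample)
```

In this sketch the three spellings of red deer collapse into one count under the Cervus elaphus page, and the unmapped term is set aside for editorial review instead of being silently dropped.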
“Ovis aries”
http://eol.org/pages/311906/
Variants in the source datasets:
• Sheep
• Schaf
• Code: 16
• Domestic sheep
• O. aries
• Code: 70
• Ovis aries
• Code: 15
• Code: 14
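Numeric codes are harder to decode than labels because the same concept gets a different number in each project's codebook. A minimal Python sketch of per-project decoding follows. The labels, codes, and URI are from the slide, but the project names (`project_a` and so on), the assignment of each code to a project, and the function `decode_taxon` are assumptions made purely for illustration.

```python
# EOL page for Ovis aries, as shown on the slide.
OVIS_ARIES = "http://eol.org/pages/311906/"

# Text labels from the slide, compared case-insensitively.
LABELS = {"sheep", "schaf", "domestic sheep", "o. aries", "ovis aries"}

# Hypothetical codebooks: which project used which code is an assumption.
CODEBOOKS = {
    "project_a": {"14": OVIS_ARIES},
    "project_b": {"15": OVIS_ARIES},
    "project_c": {"16": OVIS_ARIES},
    "project_d": {"70": OVIS_ARIES},
}


def decode_taxon(project, value):
    """Resolve a label or a project-specific code to a concept URI, else None."""
    key = str(value).strip()
    if key.lower() in LABELS:
        return OVIS_ARIES
    # A numeric code is only meaningful inside the project that defined it.
    return CODEBOOKS.get(project, {}).get(key)
```

For example, `decode_taxon("project_a", 14)` resolves to the Ovis aries page, while the same code under `project_b` returns `None`, which is why the codebook has to travel with the data.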
Why is linked data important for this project?
- Foundation for future work, much of which we can’t even imagine.
- Disambiguates terms at the outset, allowing for future informed uses of the data.
- Growing movement that allows data to be part of the web (not just on the web).
Questions for this project (and in collaboration with DIPIR):
- How was the data reuse experience for you?
- Your thoughts on data publication
- Feedback on EOL concepts