Networked Biodiversity Data and Credibility: Citizen Science and Occurrence Data in CalFlora Nancy Van House SIMS, UC Berkeley www.sims.berkeley.edu/~vanhouse.


Networked Biodiversity Data
and Credibility:
Citizen Science and Occurrence Data
in CalFlora
Nancy Van House
SIMS, UC Berkeley
www.sims.berkeley.edu/~vanhouse
Argument
 Networked info >> ready access to unpublished
information
 Information from outside own epistemic community
 Accessed by people from outside own epistemic
community
 Issues of trust and credibility
 Of info
 Of sources
 Of users
 This paper: empirical study of a user-designed, state-level biodiversity digital library:
 How have consortium of users and producers of info addressed
problems of networked data?
 Practical development of knowledge spaces
Biodiversity Data
 Broad range of datasets: biological,
geographical, meteorological, geological…
 Many, varied data producers and users
 Created, used for different purposes
 Large quantities of data that vary in
 Precision and accuracy
 Methods of data collection, description, storage
 Politically, economically, sensitive data
 Old data particularly valuable
 Change over time
 How things used to be before…
Biodiversity Data, Epistemic
Communities, Knowledge
Spaces
 Boundary crossings
 Scientific specialties
 Planners, governmental agency decisionmakers
 Plant enthusiasts
 Resource-extraction industries
 Environmental activists
 Boundary objects (Star)
 Users
 Designers
 Managers
 Technologists
Botanical Occurrence Data
 Specimen in hand (herbarium)
 Report of sighting, no specimen
 Literature – published, scientific
 Literature – e.g. flora: ‘this species occurs in
Marin County…’
 List - e.g., all species observed on the Bootjack
Trail in Tamalpais State Park, Nov. 4, 2002, by
Joe Smith
 List – “Common plants of Tamalpais State Park”
Observers in the Field as
Source of Fine-Grained Data
 How far north does California Sun Cup grow?
 Is Desert Sand Verbena still growing in Los Angeles county?
(The last observation was recorded in 1935)
 Does Five-finger Fern grow in Ventura county?
(There are no direct observations, but it has been reported
from surrounding counties)
 Does anyone know about that new patch of invasive Artichoke
Thistle growing on my local hillside?
(Early alerts to new infestations while they are small & are
easier to eradicate than well established ones)
 What local biodiversity will be lost if the city decides to allow
that new housing development on the edge of town?
(You can record the plants that are growing there now as a
record for history)
Sources of Data
 Academic researchers
 Professionals in government agencies,
environmental organizations
 Park rangers, forest service botanists, professionals
in environmental groups…
 Consultants
 Much government research/planning done by
consultants
 Land developers, resource extraction industries hire
them to prepare environmental impact documents
 Native plant enthusiasts
 “Expert amateurs” -- “citizen science”
 California Native Plant Society
 People belong to multiple communities and hold multiple roles
Risks
 Unreliable info
 Erroneous info
 Undetected duplication > belief that a species is
prevalent >> not preserving a population of a rare
species
 Chasing after erroneous reported sighting of a rare
species
 Confusing naturally-occurring and cultivated
populations
 Accurate but not credible info
 Discounting significant sighting as amateur’s error
 Inappropriate use of info
 Private landowners destroying specimens of a rare
plant to avoid legal limits on land development
 Collectors (over-)collecting specimens of rare or
CalFlora
http://www.calflora.org
 Comprehensive web-accessible database of plant
distribution information for California
 Independent non-profit organization
 Designed/managed by people from botanical
community, not librarians or technologists
 Contributors and users: a coalition of public and
private organizations and individuals
 To exist, has to be responsive to users’ needs,
concerns, practices and negotiate differences across
epistemic and organizational boundaries
 In conjunction with UC Berkeley Digital Library
(http://elib.cs.berkeley.edu)
CalFlora Priorities
 Focus on people; put technology in the
back seat
 Pay attention to how the world works for
the people who produce and use
information
 Honor existing traditions of data exchange
CalFlora Components
 Occurrence Database
 Synonyms Database
 Photos Database
CalFlora Occurrence
Database
 > 850,000 geo-referenced reports of
observations
 Specimens in collections
 Reports from literature
 Reports from field
 Checklists
 Sources
 19 institutions
 Recently began accepting reports from registered
contributors via Internet
Changing Emphases
in Occurrence Data
 Existing data - emphasis on
 Unusual taxa
 Interesting locations, or where observers happened to
be
 Surprisingly small #s
 Most Calif species distributions based on <100 obs
 Data collection methods
 Some emphasize rare taxa, underestimate common
 Some emphasize common taxa, underestimate rare
 New emphasis on common plants
 Preserving species requires preserving community
 Better understanding of current distributions
CalFlora Occurrence
Database: Significance
 Most comprehensive source by far (for Calif)
 Data from many sources ‘synoptically present’
 Adding data from the public:
 “When you have 5 million little trail lists for the whole
state of Calif…all of a sudden you have a real density of
observations [that] would be meaningful.”
 Common as well as rare taxa
 Reasonably easy to use
 Data downloadable, manipulable
 Updated quickly
 Remote access via Internet
CalFlora Tensions
 Dangers, benefits of info about rare taxa
 Controversy over photos, location info– benefits
outweigh dangers?
 Data Quality
 Accuracy, (undetectable) duplication
 Inclusiveness of observations vs. selectivity, quality
 Trusting users
 Benefits vs. dangers of wide access to information
 Users’ abilities to understand info, use appropriately
vs. guidance from CalFlora, e.g. re quality
Tensions, cont.
 Quality, precision of mapping
 County level too gross; Not too specific for rarities
 Who bears the cost?
 If free, no one has incentive to support it
 Fee may discourage frivolous use
 State: if they charge for their data, even $1, they can
deny people access
 Archiving
 Deletion of modalities? Track data back to source,
definitions, conditions of collection
 Stability of electronic media
 Stability of independent organization
Tensions, cont.
 Between technologists and information
creators and users
 Techies not understanding the complex social,
organizational, epistemic issues around
creation, maintenance, curation, use of digital
libraries
 Discussed elsewhere –
 http://www.sims.berkeley.edu/~vanhouse/p84vanhouse.pdf
Assessing Trustability of Data
from Expert Amateurs
How (Some) Experts Assess
Occurrence Reports
 The evidence:
 Type of report (specimen, field observation,
list)
 Type of search (casual, directed)
 The source:
 Personal knowledge of contributor’s expertise
 Examination of other contributions, same
person
 Annotations by trusted others
 Ancillary conditions:
 Likelihood of that species appearing at that
time, habitat, geographical location
Current Practice
 Know the individuals:
 “If they are active in CNPS [Calif Native Plant
Society], the people in CNPS know one
another…That’s where you get to that really personal
level of quality control and assurance and data
reliability.”
 “We have a collection of the usual suspects.”
 “When I started my job I went to lots of meetings but I
know everybody now.”
 Review the observations one by one
 “That’s why we have a fairly large concern about any
sort of automated library like CalFlora. No one is
looking at those kind of things.”
How CalFlora Presents
Occurrence Data
 Links to data source(s) – personal and
institutional
 Compliance with institutional source’s
requirements
 Fuzzed locations
 Links to institutional source’s caveats, explanations
 Publicly-contributed observations
 Info about observer
 Info about observation
 Annotations by experts
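A minimal sketch of what "fuzzed locations" could mean in practice, assuming a simple grid-snapping approach. The slides do not say how CalFlora fuzzes coordinates; the function name `fuzz_location` and the 0.1-degree cell size are hypothetical choices for illustration only.

```python
import math


def fuzz_location(lat, lon, cell_deg=0.1):
    """Replace a precise coordinate with the center of its grid cell.

    Hypothetical illustration: every point inside the same
    cell_deg x cell_deg cell is reported at the same place, so the
    exact site of a rare plant is not disclosed.
    """
    def snap(value):
        # Index of the cell containing the value, then that cell's midpoint.
        return (math.floor(value / cell_deg) + 0.5) * cell_deg

    return round(snap(lat), 6), round(snap(lon), 6)


# Two nearby observations collapse to one public location.
a = fuzz_location(37.9123, -122.5987)
b = fuzz_location(37.9480, -122.5510)
```

A coarser `cell_deg` hides more but makes the record less useful for mapping, which is the precision tension the slides raise for rarities.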
Data from the public -- How to
identify ‘expert amateurs’?
 May be expert in
 Particular place
 Know the common flora
 Know when something unusual shows up (not
nec’ly what it is)
 Particular taxon
 Know this taxon and its species and subspecies
(but not necessarily others)
 Wide range of common taxa
 But not unusual ones
Contributor Registration
 Biography, credentials (free text)
 Expertise/interests (free text)
 Affiliation
 Contact info/web site
 Vows
 “I will submit only my own observations of wild plants.
I realize that this system is only for first-hand reports
about plants, native and introduced, that are growing
without deliberate planting or cultivation.”
 “I will…make sure I have the correct scientific
name…I will submit uncertain identifications only if I
believe them to be very important and time sensitive,
and will label such reports ‘uncertain.’”
Contributor Registration (cont)
 Experience level (self-assessment)
 I am a professional biologist/botanist, or have
professional training in botany.
 Although I do not have formal credentials, I am
recognized as a peer by professional botanists.
 Although I do not consider myself to have
professional-level knowledge, I am quite experienced
in the use of keys and descriptions, and/or have
expertise with the plants for which I will be submitting
observations.
 I do not have extensive experience or background in
botany, but I am confident that I can accurately
identify the plants for which I will be submitting
observations.
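The registration fields listed above can be sketched as a simple record. This is an illustration only, not CalFlora's actual schema: the class and field names, the `name` field, and the rule that vows must be accepted before submitting are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class ExperienceLevel(Enum):
    """The four self-assessed levels from the registration form (names are hypothetical)."""
    PROFESSIONAL = 1         # professional biologist/botanist or trained in botany
    PEER_RECOGNIZED = 2      # no formal credentials, recognized as a peer
    EXPERIENCED = 3          # experienced with keys and descriptions
    CONFIDENT_NONEXPERT = 4  # limited background, confident for own plants


@dataclass
class Contributor:
    name: str                 # assumed field, not listed on the slide
    biography: str            # free text, includes credentials
    expertise: str            # free text: expertise/interests
    affiliation: str
    contact: str              # contact info or web site
    experience: ExperienceLevel
    vows_accepted: bool = False

    def may_submit(self):
        # Assumption: contributions are accepted only after the vows are taken.
        return self.vows_accepted


joe = Contributor(
    name="Joe Smith",
    biography="Long-time trail volunteer",
    expertise="Common flora of Tamalpais State Park",
    affiliation="California Native Plant Society",
    contact="(none given)",
    experience=ExperienceLevel.EXPERIENCED,
)
```

Keeping biography and expertise as free text, as the form does, leaves the judgment of credibility to the human reader rather than to the system.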
Occurrence Report
 Species identification, habitat, location, date
 Method of identification
 “I recognize …from prior determinations and
experience”
 “I compared this plant with herbarium specimens”
 “I keyed this plant in a botanical reference”
 “I compared … with published taxonomic
descriptions”
 “An expert reviewed and confirmed this identification”
 Certainty of identification
 “I am confident of this identification, and submit this
as a positive observation.”
 “I am not certain of this identification but believe it to
Observation Contribution
Process
 Data entered
 Photo appears (if available) – i.e., “Are you
sure?”
 If new county record, notice appears
 i.e., “Are you sure?”
 Lists who will be notified – record likely to be
reviewed
 If new county record, notice sent to county agricultural
officials
 If listed as rare species, notice sent to appropriate state
agency
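The notification rules in this process can be sketched as a small function. This is a hedged illustration under stated assumptions: the function name, the data structures, and the sample species lists are hypothetical and are not taken from CalFlora's implementation.

```python
def review_submission(species, county, county_records, rare_species):
    """Return (prompts shown to the contributor, notices sent to reviewers).

    county_records: dict mapping county name -> set of species already
        recorded there (hypothetical structure).
    rare_species: set of species listed as rare (hypothetical list).
    """
    prompts = []
    notices = []

    # A species with no prior record in the county is a new county record.
    if species not in county_records.get(county, set()):
        prompts.append("New county record. Are you sure?")
        notices.append("county agricultural officials")

    # Listed rare species are reported to the appropriate state agency.
    if species in rare_species:
        notices.append("state agency")

    return prompts, notices


# Sample data for illustration only.
records = {"Ventura": {"California Sun Cup"}}
rare = {"Desert Sand Verbena"}

known = review_submission("California Sun Cup", "Ventura", records, rare)
new = review_submission("Five-finger Fern", "Ventura", records, rare)
rare_new = review_submission("Desert Sand Verbena", "Los Angeles", records, rare)
```

Showing the contributor who will be notified, before the record is saved, is what turns the notice into an "Are you sure?" check: the record is likely to be reviewed by someone who knows the area.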
Annotations
 Herbarium practice: experts annotate
records with corrections, comments
 CalFlora: registered experts can annotate
photos and occurrence records
 Annotation by an expert raises the credibility
of a record.
 Actually – how often?
 Annotation history viewable
Current Developments:
CalFlora Meeting Tomorrow
 Invited wide range of interested parties to
come discuss future of CalFlora
 Services
 Funding
 Seeking to create an engaged user group
 Seeking to create a community around
CalFlora
 Attendees: many people no one seems to
already know
Knowledge Spaces
“Knowledge is not simply local, it is located....It has place
and creates a space….
“Knowledge spaces have a wide diversity of components:
people, skills, local knowledge and equipment … linked
by social strategies and technical devices …
“To move knowledge from the local site and moment of its
production and application to other places and times,
knowledge producers deploy a variety of social
strategies and technical devices for creating the
equivalences and connections between otherwise
heterogeneous and isolated knowledges….
“Knowledge spaces acquire their … seemingly
unchallengeable naturalness thru the suppression and
denial of work involved in their construction.”
--David Turnbull, Masons, Tricksters, and Cartographers, pp.
19-20
CalFlora and Local Knowledge
 Not as opposed to scientific, but intimate, specific
 In biodiversity:
 Baseline data
 early warning of subtle changes
 How to collect, report, evaluate?
 CalFlora: retain the modalities
 Retain link to observer, info about observer
CalFlora as a Knowledge Space
 Links layers of data, knowledge
 Allows user flexibility in moving local knowledge, combining, filtering
different kinds of data, different sources, making linkages,
equivalences
 Seeks to preserve the work and multiple voices behind the data
 Seeks to create a knowledge space, epistemic community by
making linkages among CalFlora users and contributors
 Moving
 from small-scale and personal
 to large-scale and impersonal
 To large-scale and personal?
Conclusion
 Trust as always a critical issue in knowledge
 Networking as
 Foregrounding taken-for-granted practices
 Making new practices possible
 Creating new knowledge spaces
 Making linkages and equivalences across different kinds of knowledge
 Empowering users to make own linkages, assessments for different
purposes
 Information systems as sociotechnical networks
 Often invisible to the participants who see them as merely technical
 Using concepts of knowledge spaces, epistemic cultures to help
understand and contribute to system design and use