Networked Biodiversity Data and Credibility: Citizen Science and Occurrence Data in CalFlora Nancy Van House SIMS, UC Berkeley www.sims.berkeley.edu/~vanhouse.
Argument
Networked info >> ready access to unpublished information
Information from outside one’s own epistemic community
Accessed by people from outside their own epistemic community
Issues of trust and credibility
Of info
Of sources
Of users
This paper: empirical study of a user-designed, state-level biodiversity digital library:
How has a consortium of users and producers of info addressed the problems of networked data?
Practical development of knowledge spaces
Biodiversity Data
Broad range of datasets: biological,
geographical, meteorological, geological…
Many, varied data producers and users
Created, used for different purposes
Large quantities of data that vary in
Precision and accuracy
Methods of data collection, description, storage
Politically and economically sensitive data
Old data particularly valuable
Change over time
How things used to be before…
Biodiversity Data, Epistemic Communities, Knowledge Spaces
Boundary crossings
Scientific specialties
Planners, governmental agency decisionmakers
Plant enthusiasts
Resource-extraction industries
Environmental activists
Boundary objects (Star)
Users
Designers
Managers
Technologists
Botanical Occurrence Data
Specimen in hand (herbarium)
Report of sighting, no specimen
Literature – published, scientific
Literature – e.g., a flora: ‘This species occurs in Marin County…’
List – e.g., all species observed on the Bootjack Trail in Tamalpais State Park, Nov. 4, 2002, by Joe Smith
List – “Common plants of Tamalpais State Park”
Observers in the Field as Source of Fine-Grained Data
How far north does California Sun Cup grow?
Is Desert Sand Verbena still growing in Los Angeles county?
(The last observation was recorded in 1935)
Does Five-finger Fern grow in Ventura county?
(There are no direct observations, but it has been reported
from surrounding counties)
Does anyone know about that new patch of invasive Artichoke
Thistle growing on my local hillside?
(Early alerts to new infestations, which are easier to eradicate while they are small than once they are well established)
What local biodiversity will be lost if the city decides to allow
that new housing development on the edge of town?
(You can record the plants that are growing there now as a
record for history)
Sources of Data
Academic researchers
Professionals in government agencies, environmental organizations
Park rangers, forest service botanists, professionals in environmental groups…
Consultants
Much government research/planning done by consultants
Land developers, resource extraction industries hire them to prepare environmental impact documents
Native plant enthusiasts
“Expert amateurs” -- “citizen science”
California Native Plant Society
People belong to multiple communities and play multiple roles
Risks
Unreliable info
Erroneous info
Undetected duplication >> belief that a species is prevalent >> failure to preserve a population of a rare species
Chasing after an erroneously reported sighting of a rare species
Confusing naturally occurring and cultivated populations
Accurate but not credible info
Discounting significant sighting as amateur’s error
Inappropriate use of info
Private landowners destroying specimens of a rare plant to avoid legal limits on land development
Collectors (over-)collecting specimens of rare or
CalFlora
http://www.calflora.org
Comprehensive web-accessible database of plant
distribution information for California
Independent non-profit organization
Designed/managed by people from the botanical community, not librarians or technologists
Contributors and users: a coalition of public and private organizations and individuals
To exist, it has to respond to users’ needs, concerns, and practices, and negotiate differences across epistemic and organizational boundaries
In conjunction with UC Berkeley Digital Library
(http://elib.cs.berkeley.edu)
CalFlora Priorities
Focus on people; put technology in the back seat
Pay attention to how the world works for the people who produce and use information
Honor existing traditions of data exchange
CalFlora Components
Occurrence Database
Synonyms Database
Photos Database
CalFlora Occurrence
Database
More than 850,000 geo-referenced reports of observations
Specimens in collections
Reports from literature
Reports from field
Checklists
Sources
19 institutions
Recently began accepting reports from registered contributors via the Internet
Changing Emphases
in Occurrence Data
Existing data - emphasis on
Unusual taxa
Interesting locations, or where observers happened to be
Surprisingly small numbers of observations
Most Calif species distributions based on <100 observations
Data collection methods
Some emphasize rare taxa, underestimate common
Some emphasize common taxa, underestimate rare
New emphasis on common plants
Preserving species requires preserving their communities
Better understanding of current distributions
CalFlora Occurrence Database: Significance
Most comprehensive source by far (for Calif)
Data from many sources ‘synoptically present’
Adding data from the public:
“When you have 5 million little trail lists for the whole
state of Calif…all of a sudden you have a real density of
observations [that] would be meaningful.”
Common as well as rare taxa
Reasonably easy to use
Data downloadable, manipulable
Updated quickly
Remote access via Internet
CalFlora Tensions
Dangers, benefits of info about rare taxa
Controversy over photos, location info – do benefits outweigh dangers?
Data Quality
Accuracy, (undetectable) duplication
Inclusiveness of observations vs. selectivity, quality
Trusting users
Benefits vs. dangers of wide access to information
Users’ abilities to understand info and use it appropriately vs. guidance from CalFlora, e.g., regarding quality
Tensions, cont.
Quality, precision of mapping
County level too coarse; yet mapping must not be too specific for rarities
Who bears the cost?
If free, no one has incentive to support it
Fee may discourage frivolous use
State: if they charge for their data, even $1, they can deny people access
Archiving
Deletion of modalities? Track data back to source, definitions, conditions of collection
Stability of electronic media
Stability of independent organization
Tensions, Cont.
Between technologists and information creators and users
Techies not understanding the complex social, organizational, epistemic issues around creation, maintenance, curation, use of digital libraries
Discussed elsewhere –
http://www.sims.berkeley.edu/~vanhouse/p84vanhouse.pdf
Assessing Trustability of Data from Expert Amateurs
How (Some) Experts Assess Occurrence Reports
The evidence:
Type of report (specimen, field observation, list)
Type of search (casual, directed)
The source:
Personal knowledge of contributor’s expertise
Examination of other contributions by the same person
Annotations by trusted others
Ancillary conditions:
Likelihood of that species appearing at that time, habitat, geographical location
Current Practice
Know the individuals:
“If they are active in CNPS [Calif Native Plant
Society], the people in CNPS know each
other…That’s where you get to that really personal
level of quality control and assurance and data
reliability.”
“We have a collection of the usual suspects.”
“When I started my job I went to lots of meetings but I
know everybody now.”
Review the observations one by one
“That’s why we have a fairly large concern about any
sort of automated library like CalFlora. No one is
looking at those kind of things.”
How CalFlora Presents
Occurrence Data
Links to data source(s) – personal and institutional
Compliance with institutional source’s requirements
Fuzzed locations
Links to institutional source’s caveats, explanations
Publicly-contributed observations
Info about observer
Info about observation
Annotations by experts
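The presentation does not say how CalFlora fuzzes locations. One common approach, assumed here purely for illustration, is to snap each coordinate to a coarse grid before display so that a rare population cannot be pinpointed. The function name `fuzz_location` and the 0.1-degree default cell size are hypothetical.

```python
import math


def fuzz_location(lat: float, lon: float, cell_deg: float = 0.1) -> tuple[float, float]:
    """Snap a point to the centre of its grid cell (hypothetical fuzzing rule).

    Every point inside the same cell_deg x cell_deg cell maps to the same
    output, so the published coordinate reveals only the cell, not the site.
    """
    if cell_deg <= 0:
        raise ValueError("cell_deg must be positive")
    # math.floor (not int) keeps cell boundaries consistent for negative longitudes
    cell_lat = math.floor(lat / cell_deg) * cell_deg + cell_deg / 2
    cell_lon = math.floor(lon / cell_deg) * cell_deg + cell_deg / 2
    # Round away floating-point noise from the multiplication
    return round(cell_lat, 6), round(cell_lon, 6)


# Two nearby points in Marin County (illustrative coordinates) share one fuzzed point.
print(fuzz_location(37.9235, -122.5965))
print(fuzz_location(37.9101, -122.5102))
```

A fixed grid returns the same point on every request, so repeated queries cannot be averaged back toward the true site, as they could be with fresh random noise. The appropriate cell size would presumably depend on the taxon and on the source institution’s requirements.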
Data from the public -- How to
identify ‘expert amateurs’?
May be expert in
Particular place
Know the common flora
Know when something unusual shows up (but not necessarily what it is)
Particular taxon
Know this taxon and its species and subspecies (but not necessarily others)
Wide range of common taxa
But not unusual ones
Contributor Registration
Biography, credentials (free text)
Expertise/interests (free text)
Affiliation
Contact info/web site
Vows
“I will submit only my own observations of wild plants.
I realize that this system is only for first-hand reports
about plants, native and introduced, that are growing
without deliberate planting or cultivation.”
“I will…make sure I have the correct scientific
name…I will submit uncertain identifications only if I
believe them to be very important and time sensitive,
and will label such reports ‘uncertain.’”
Contributor Registration (cont)
Experience level (self-assessment)
I am a professional biologist/botanist, or have
professional training in botany.
Although I do not have formal credentials, I am
recognized as a peer by professional botanists.
Although I do not consider myself to have
professional-level knowledge, I am quite experienced
in the use of keys and descriptions, and/or have
expertise with the plants for which I will be submitting
observations.
I do not have extensive experience or background in
botany, but I am confident that I can accurately
identify the plants for which I will be submitting
observations.
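The registration form can be read as a simple record attached to every contribution. A minimal sketch, assuming a Python representation; the class and field names (`Contributor`, `ExperienceLevel`, `accepted_vows`, `may_submit`) are hypothetical, and the four levels paraphrase the self-assessment options above.

```python
from dataclasses import dataclass
from enum import IntEnum


class ExperienceLevel(IntEnum):
    """Self-assessed experience; a higher value means more formal standing (hypothetical encoding)."""
    CONFIDENT_FOR_OWN_PLANTS = 1  # no extensive background, confident for plants submitted
    EXPERIENCED_WITH_KEYS = 2     # experienced with keys/descriptions, or with these plants
    RECOGNIZED_PEER = 3           # no formal credentials, recognized as a peer by professionals
    PROFESSIONAL = 4              # professional biologist/botanist, or professional training


@dataclass
class Contributor:
    name: str
    biography: str          # free text: biography, credentials
    expertise: str          # free text: expertise/interests
    affiliation: str
    contact: str            # contact info or web site
    experience: ExperienceLevel
    accepted_vows: bool = False  # agreed to submit only own observations of wild plants, correctly named

    def may_submit(self) -> bool:
        # Hypothetical rule: reports are accepted only after the vows are agreed to.
        return self.accepted_vows


amateur = Contributor("A. Example", "Retired teacher", "Marin County flora",
                      "Local CNPS chapter", "a@example.org",
                      ExperienceLevel.EXPERIENCED_WITH_KEYS, accepted_vows=True)
print(amateur.may_submit())
print(amateur.experience >= ExperienceLevel.RECOGNIZED_PEER)
```

Because the level is self-assessed, a reader would treat it as one more piece of context about the observer, alongside the free-text biography and any expert annotations, rather than as a verified credential.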
Occurrence Report
Species identification, habitat, location, date
Method of identification
“I recognize …from prior determinations and
experience”
“I compared this plant with herbarium specimens”
“I keyed this plant in a botanical reference”
“I compared … with published taxonomic
descriptions”
“An expert reviewed and confirmed this identification”
Certainty of identification
“I am confident of this identification, and submit this
as a positive observation.”
“I am not certain of this identification but believe it to
Observation Contribution Process
Data entered
Photo appears (if available) – i.e., “Are you sure?”
If new county record, notice appears – i.e., “Are you sure?”
Lists who will be notified – record likely to be reviewed
If new county record, notice sent to county agricultural officials
If listed as rare species, notice sent to appropriate state agency
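The contribution steps amount to a small routing rule: show “Are you sure?” prompts, then decide who is notified. A minimal sketch under stated assumptions: the lookup tables `KNOWN_COUNTY_RECORDS`, `SPECIES_WITH_PHOTOS` and `RARE_SPECIES`, the recipient labels, and the function `process_report` are hypothetical stand-ins, not CalFlora’s actual code or data.

```python
from dataclasses import dataclass, field

# Illustrative placeholder data only, not real CalFlora records or listings.
KNOWN_COUNTY_RECORDS = {
    "Marin": {"Five-finger Fern"},
    "Ventura": set(),  # mirrors the earlier example: no direct Five-finger Fern observations
}
SPECIES_WITH_PHOTOS = {"Five-finger Fern"}
RARE_SPECIES = {"Hypothetical rare taxon"}


@dataclass
class OccurrenceReport:
    species: str
    county: str
    method: str     # method of identification (recorded, not used in routing here)
    certain: bool   # False means the report is labeled "uncertain"


@dataclass
class Outcome:
    prompts: list = field(default_factory=list)  # "Are you sure?" checks shown before submission
    notify: list = field(default_factory=list)   # parties listed to the contributor and notified


def process_report(report: OccurrenceReport) -> Outcome:
    """Hypothetical routing of one report through the contribution steps."""
    out = Outcome()
    if report.species in SPECIES_WITH_PHOTOS:
        out.prompts.append("show reference photo")
    if report.species not in KNOWN_COUNTY_RECORDS.get(report.county, set()):
        out.prompts.append("confirm new county record")
        out.notify.append(f"{report.county} County agricultural officials")
    if report.species in RARE_SPECIES:
        out.notify.append("state rare-plant agency")
    return out


fern = OccurrenceReport("Five-finger Fern", "Ventura", "keyed in a botanical reference", True)
result = process_report(fern)
print(result.prompts)
print(result.notify)
```

Listing the recipients before submission, as the slide describes, tells the contributor that the record is likely to be reviewed, which is itself a quality check.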
Annotations
Herbarium practice: experts annotate records with corrections, comments
CalFlora: registered experts can annotate photos and occurrence records
Annotation by an expert raises the credibility of a record.
Actually – how often?
Annotation history viewable
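Herbarium-style annotation can be modeled as an append-only history attached to each record, so later annotations never overwrite earlier ones and the full history stays viewable. A minimal sketch; `Annotation`, `Record`, and the rule that any annotation by a registered expert marks the record as expert-reviewed are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Annotation:
    author: str
    is_registered_expert: bool
    when: date
    text: str  # correction or comment, e.g. a revised determination


@dataclass
class Record:
    record_id: str
    species: str
    history: list = field(default_factory=list)  # append-only; entries are never edited

    def annotate(self, note: Annotation) -> None:
        self.history.append(note)

    def expert_reviewed(self) -> bool:
        # Hypothetical credibility signal: at least one registered expert has annotated.
        return any(a.is_registered_expert for a in self.history)

    def show_history(self) -> list:
        return [f"{a.when.isoformat()} {a.author}: {a.text}" for a in self.history]


rec = Record("obs-001", "Artichoke Thistle")
rec.annotate(Annotation("volunteer", False, date(2002, 11, 4), "Seen again this week"))
print(rec.expert_reviewed())
rec.annotate(Annotation("county botanist", True, date(2002, 11, 6), "Determination confirmed"))
print(rec.expert_reviewed())
print(rec.show_history())
```

Because annotations are only appended, the slide’s question “Actually – how often?” could be answered by counting records whose history includes an expert annotation.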
Current Developments:
CalFlora Meeting Tomorrow
Invited a wide range of interested parties to discuss the future of CalFlora
Services
Funding
Seeking to create an engaged user group
Seeking to create a community around CalFlora
Attendees: many people no one seems to already know
Knowledge Spaces
“Knowledge is not simply local, it is located....It has place
and creates a space….
“Knowledge spaces have a wide diversity of components:
people, skills, local knowledge and equipment … linked
by social strategies and technical devices …
“To move knowledge from the local site and moment of its
production and application to other places and times,
knowledge producers deploy a variety of social
strategies and technical devices for creating the
equivalences and connections between otherwise
heterogeneous and isolated knowledges….
“Knowledge spaces acquire their … seemingly
unchallengeable naturalness through the suppression and
denial of work involved in their construction.”
--David Turnbull, Masons, Tricksters and Cartographers, pp. 19-20
CalFlora and Local Knowledge
Not as opposed to scientific, but intimate, specific
In biodiversity:
Baseline data
Early warning of subtle changes
How to collect, report, evaluate?
CalFlora: retain the modalities
Retain link to observer, info about observer
CalFlora as a Knowledge Space
Links layers of data, knowledge
Allows user flexibility in moving local knowledge, combining and filtering different kinds of data and different sources, making linkages and equivalences
Seeks to preserve the work and multiple voices behind the data
Seeks to create a knowledge space, epistemic community by making linkages among CalFlora users and contributors
Moving
from small-scale and personal
to large-scale and impersonal
to large-scale and personal?
Conclusion
Trust is always a critical issue in knowledge
Networking as
Foregrounding taken-for-granted practices
Making new practices possible
Creating new knowledge spaces
Making linkages and equivalences across different kinds of knowledge
Empowering users to make their own linkages, assessments for different purposes
Information systems as sociotechnical networks
Often invisible to the participants who see them as merely technical
Using concepts of knowledge spaces, epistemic cultures to help
understand and contribute to system design and use