Networked Biodiversity Data and Credibility: Citizen Science and Occurrence Data in CalFlora
Nancy Van House, SIMS, UC Berkeley
www.sims.berkeley.edu/~vanhouse
Argument
- Networked information → ready access to unpublished information
  - Information from outside one's own epistemic community
  - Accessed by people from outside one's own epistemic community
- Issues of trust and credibility
  - Of information
  - Of sources
  - Of users
- This paper: an empirical study of a user-designed, state-level biodiversity digital library
  - How has a consortium of users and producers of information addressed the problems of networked data?
  - Practical development of knowledge spaces

Biodiversity Data
- Broad range of datasets: biological, geographical, meteorological, geological…
- Many, varied data producers and users
- Created and used for different purposes
- Large quantities of data that vary in
  - Precision and accuracy
  - Methods of data collection, description, storage
- Politically and economically sensitive data
- Old data particularly valuable
  - Change over time
  - How things used to be before…

Biodiversity Data, Epistemic Communities, Knowledge Spaces
- Boundary crossings
  - Scientific specialties
  - Planners, governmental agency decision makers
  - Plant enthusiasts
  - Resource-extraction industries
  - Environmental activists
- Boundary objects (Star)
- Users, designers, managers, technologists

Botanical Occurrence Data
- Specimen in hand (herbarium)
- Report of sighting, no specimen
- Literature: published, scientific
- Literature: e.g., a flora: ‘This species occurs in Marin County…’
- List: e.g., all species observed on the Bootjack Trail in Tamalpais State Park, Nov. 4, 2002, by Joe Smith
- List: “Common plants of Tamalpais State Park”

Observers in the Field as Source of Fine-Grained Data
- How far north does California Sun Cup grow?
- Is Desert Sand Verbena still growing in Los Angeles County? (The last observation was recorded in 1935)
- Does Five-finger Fern grow in Ventura County?
  (There are no direct observations, but it has been reported from surrounding counties)
- Does anyone know about that new patch of invasive Artichoke Thistle growing on my local hillside?
  (Early alerts to new infestations: while they are small, they are easier to eradicate than well-established ones)
- What local biodiversity will be lost if the city decides to allow that new housing development on the edge of town?
  (You can record the plants that are growing there now as a record for history)

Sources of Data
- Academic researchers
- Professionals in government agencies and environmental organizations
  - Park rangers, Forest Service botanists, professionals in environmental groups…
- Consultants
  - Much government research and planning is done by consultants
  - Land developers and resource-extraction industries hire them to prepare environmental impact documents
- Native plant enthusiasts
  - “Expert amateurs”: “citizen science”
  - California Native Plant Society
- People belong to multiple communities and play multiple roles

Risks
- Unreliable information
- Erroneous information
  - Undetected duplication → belief that a species is prevalent → failure to preserve a population of a rare species
  - Chasing after an erroneously reported sighting of a rare species
  - Confusing naturally occurring and cultivated populations
- Accurate but not credible information
  - Discounting a significant sighting as an amateur’s error
- Inappropriate use of information
  - Private landowners destroying specimens of a rare plant to avoid legal limits on land development
  - Collectors (over-)collecting specimens of rare or …

CalFlora (http://www.calflora.org)
- Comprehensive web-accessible database of plant distribution information for California
- Independent non-profit organization
- Designed and managed by people from the botanical community, not librarians or technologists
- Contributors and users: a coalition of public and private organizations and individuals
- To exist, it has to respond to users’ needs, concerns, and practices, and negotiate differences across epistemic and organizational boundaries
- In conjunction with the UC Berkeley Digital Library (http://elib.cs.berkeley.edu)

CalFlora Priorities
- Focus on people; put technology in the back seat
- Pay attention to how the world works for the people who produce and use information
- Honor existing traditions of data exchange

CalFlora Components
- Occurrence Database
- Synonyms Database
- Photos Database

CalFlora Occurrence Database
- More than 850,000 geo-referenced reports of observations
  - Specimens in collections
  - Reports from literature
  - Reports from the field
  - Checklists
- Sources: 19 institutions
- Recently began accepting reports from registered contributors via the Internet

Changing Emphases in Occurrence Data
- Existing data: emphasis on
  - Unusual taxa
  - Interesting locations, or places where observers happened to be
- Surprisingly small numbers
  - Most California species distributions are based on fewer than 100 observations
- Data collection methods
  - Some emphasize rare taxa and underestimate common ones
  - Some emphasize common taxa and underestimate rare ones
- New emphasis on common plants
  - Preserving species requires preserving community
  - Better understanding of current distributions

CalFlora Occurrence Database: Significance
- Most comprehensive source by far (for California)
- Data from many sources ‘synoptically present’
- Adding data from the public: “When you have 5 million little trail lists for the whole state of Calif…all of a sudden you have a real density of observations [that] would be meaningful.”
- Common as well as rare taxa
- Reasonably easy to use
- Data downloadable, manipulable
- Updated quickly
- Remote access via the Internet

CalFlora Tensions
- Dangers and benefits of information about rare taxa
  - Controversy over photos and location information: do the benefits outweigh the dangers?
- Data quality
  - Accuracy, (undetectable) duplication
  - Inclusiveness of observations vs. selectivity, quality
- Trusting users
  - Benefits vs. dangers of wide access to information
  - Users’ abilities to understand information and use it appropriately vs. guidance from CalFlora, e.g., regarding quality

Tensions, cont.
- Quality and precision of mapping
  - County level too coarse
  - But not too specific for rare taxa
- Who bears the cost?
  - If free, no one has an incentive to support it
  - A fee may discourage frivolous use
  - State: if they charge for their data, even $1, they can deny people access
- Archiving
  - Deletion of modalities?
  - Tracking data back to source, definitions, conditions of collection
  - Stability of electronic media
  - Stability of an independent organization

Tensions, cont.
- Between technologists and information creators and users
  - Techies not understanding the complex social, organizational, and epistemic issues around the creation, maintenance, curation, and use of digital libraries
  - Discussed elsewhere: http://www.sims.berkeley.edu/~vanhouse/p84vanhouse.pdf

Assessing Trustability of Data from Expert Amateurs

How (Some) Experts Assess Occurrence Reports
- The evidence
  - Type of report (specimen, field observation, list)
  - Type of search (casual, directed)
- The source
  - Personal knowledge of the contributor’s expertise
  - Examination of other contributions by the same person
  - Annotations by trusted others
- Ancillary conditions
  - Likelihood of that species appearing at that time, habitat, geographical location

Current Practice
- Know the individuals
  - “If they are active in CNPS [California Native Plant Society], the people in CNPS know [one] another…That’s where you get to that really personal level of quality control and assurance and data reliability.”
  - “We have a collection of the usual suspects.”
  - “When I started my job I went to lots of meetings but I know everybody now.”
- Review the observations one by one
  - “That’s why we have a fairly large concern about any sort of automated library like CalFlora. No one is looking at those kind of things.”

How CalFlora Presents Occurrence Data
- Links to data source(s): personal and institutional
- Compliance with the institutional source’s requirements
  - Fuzzed locations
  - Links to the institutional source’s caveats, explanations
- Publicly contributed observations
  - Information about the observer
  - Information about the observation
  - Annotations by experts

Data from the Public: How to Identify ‘Expert Amateurs’?
- May be expert in
  - A particular place
    - Know the common flora
    - Know when something unusual shows up (but not necessarily what it is)
  - A particular taxon
    - Know this taxon and its species and subspecies (but not necessarily others)
  - A wide range of common taxa
    - But not unusual ones

Contributor Registration
- Biography, credentials (free text)
- Expertise/interests (free text)
- Affiliation
- Contact info/web site
- Vows
  - “I will submit only my own observations of wild plants. I realize that this system is only for first-hand reports about plants, native and introduced, that are growing without deliberate planting or cultivation.”
  - “I will…make sure I have the correct scientific name…I will submit uncertain identifications only if I believe them to be very important and time sensitive, and will label such reports ‘uncertain.’”

Contributor Registration (cont.)
- Experience level (self-assessment)
  - I am a professional biologist/botanist, or have professional training in botany.
  - Although I do not have formal credentials, I am recognized as a peer by professional botanists.
  - Although I do not consider myself to have professional-level knowledge, I am quite experienced in the use of keys and descriptions, and/or have expertise with the plants for which I will be submitting observations.
  - I do not have extensive experience or background in botany, but I am confident that I can accurately identify the plants for which I will be submitting observations.
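The registration fields and the four self-assessed experience levels can be read as a small data model. The Python sketch below is hypothetical, not CalFlora's schema: the class and function names, the ordinal ranking of the four statements, and the rule that registration is refused without the vows are all assumptions. It illustrates one consequence of retaining the link between each record and its observer: a user can filter records by the contributor's self-assessed level, choosing their own threshold.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class ExperienceLevel(IntEnum):
    """Hypothetical ordinal encoding of the four self-assessment statements.

    Higher values mean more formal botanical standing; this ordering is an
    assumption of the sketch, not part of the registration form.
    """
    CONFIDENT_FOR_OWN_PLANTS = 1  # limited background, confident for own submissions
    EXPERIENCED_WITH_KEYS = 2     # experienced with keys and descriptions
    RECOGNIZED_AS_PEER = 3        # no formal credentials, recognized by professionals
    PROFESSIONAL = 4              # professional biologist/botanist or trained in botany


@dataclass(frozen=True)
class Contributor:
    name: str
    biography: str         # free text: biography, credentials
    expertise: str         # free text: expertise/interests
    affiliation: str
    contact: str           # contact info or web site
    experience: ExperienceLevel
    accepted_vows: bool    # first-hand reports of wild plants; correct names


@dataclass(frozen=True)
class Observation:
    species: str
    county: str
    contributor: Contributor  # link to the observer is kept with the record


def register(contributor: Contributor) -> Contributor:
    """Accept a registration (assumption: only if the vows were accepted)."""
    if not contributor.accepted_vows:
        raise ValueError(f"{contributor.name} has not accepted the contributor vows")
    return contributor


def filter_by_experience(
    observations: List[Observation], minimum: ExperienceLevel
) -> List[Observation]:
    """Keep observations whose contributor self-assessed at or above `minimum`."""
    return [o for o in observations if o.contributor.experience >= minimum]


if __name__ == "__main__":
    # Hypothetical placeholder contributors and records.
    pro = register(Contributor("A", "", "", "", "", ExperienceLevel.PROFESSIONAL, True))
    novice = register(
        Contributor("B", "", "", "", "", ExperienceLevel.CONFIDENT_FOR_OWN_PLANTS, True)
    )
    records = [
        Observation("Species A", "Marin", pro),
        Observation("Species B", "Ventura", novice),
    ]
    for obs in filter_by_experience(records, ExperienceLevel.EXPERIENCED_WITH_KEYS):
        print(obs.species, obs.county, obs.contributor.name)
```

With the two placeholder records, only contributor A's record passes the `EXPERIENCED_WITH_KEYS` threshold. An ordinal `IntEnum` makes the threshold a single comparison; a real system might instead treat the four statements as unordered categories.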
Occurrence Report
- Species identification, habitat, location, date
- Method of identification
  - “I recognize …from prior determinations and experience”
  - “I compared this plant with herbarium specimens”
  - “I keyed this plant in a botanical reference”
  - “I compared … with published taxonomic descriptions”
  - “An expert reviewed and confirmed this identification”
- Certainty of identification
  - “I am confident of this identification, and submit this as a positive observation.”
  - “I am not certain of this identification but believe it to…”

Observation Contribution Process
- Data entered
- Photo appears (if available), i.e., “Are you sure?”
- If new county record, notice appears, i.e., “Are you sure?”
  - Lists who will be notified: record likely to be reviewed
- If new county record, notice sent to county agricultural officials
- If listed as rare species, notice sent to appropriate state agency

Annotations
- Herbarium practice: experts annotate records with corrections, comments
- CalFlora: registered experts can annotate photos and occurrence records
- Annotation by an expert raises the credibility of a record. Actually, how often?
- Annotation history viewable

Current Developments: CalFlora Meeting Tomorrow
- Invited a wide range of interested parties to discuss the future of CalFlora
  - Services
  - Funding
- Seeking to create an engaged user group
- Seeking to create a community around CalFlora
- Attendees: many people no one seems to already know

Knowledge Spaces
“Knowledge is not simply local, it is located.... It has place and creates a space….
“Knowledge spaces have a wide diversity of components: people, skills, local knowledge and equipment … linked by social strategies and technical devices …
“To move knowledge from the local site and moment of its production and application to other places and times, knowledge producers deploy a variety of social strategies and technical devices for creating the equivalences and connections between otherwise heterogeneous and isolated knowledges….
“Knowledge spaces acquire their … seemingly unchallengeable naturalness through the suppression and denial of work involved in their construction.”
-- David Turnbull, Masons, Tricksters, and Cartographers, pp. 19-20

CalFlora and Local Knowledge
- Not as opposed to scientific, but intimate, specific
- In biodiversity
  - Baseline data
  - Early warning of subtle changes
- How to collect, report, evaluate?
- CalFlora: retain the modalities
  - Retain the link to the observer and information about the observer

CalFlora as a Knowledge Space
- Links layers of data and knowledge
- Allows user flexibility in moving local knowledge: combining and filtering different kinds of data and different sources, making linkages and equivalences
- Seeks to preserve the work and multiple voices behind the data
- Seeks to create a knowledge space and an epistemic community by making linkages among CalFlora users and contributors
- Moving from small-scale and personal to large-scale and impersonal
  - To large-scale and personal?

Conclusion
- Trust, as always, a critical issue in knowledge
- Networking as
  - Foregrounding taken-for-granted practices
  - Making new practices possible
  - Creating new knowledge spaces
  - Making linkages and equivalences across different kinds of knowledge
  - Empowering users to make their own linkages and assessments for different purposes
- Information systems as sociotechnical networks
  - Often invisible to the participants, who see them as merely technical
- Using the concepts of knowledge spaces and epistemic cultures to help understand and contribute to system design and use
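Returning to the Observation Contribution Process slide: its checks reduce to a few simple rules. The Python sketch below is a hypothetical reading of that slide, not CalFlora's code. The names, the representation of known county records as (species, county) pairs, the photo lookup, and the recipient labels are all assumptions. It returns both the confirmation prompts the contributor sees and the parties who would receive a notice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Report:
    species: str
    county: str


@dataclass
class ContributionResult:
    prompts: List[str] = field(default_factory=list)        # shown to the contributor
    notifications: List[str] = field(default_factory=list)  # who receives a notice


def check_contribution(
    report: Report,
    county_records: Set[Tuple[str, str]],  # hypothetical: known (species, county) pairs
    photos: Dict[str, str],                # hypothetical: species -> reference photo id
    rare_species: Set[str],                # hypothetical: species on a rare-plant list
) -> ContributionResult:
    """Apply contribution-time checks (a hypothetical reading of the slide)."""
    result = ContributionResult()
    is_new_county_record = (report.species, report.county) not in county_records

    # Work out who will be notified first, so the prompt can list them.
    if is_new_county_record:
        result.notifications.append(f"{report.county} County agricultural officials")
    if report.species in rare_species:
        result.notifications.append("state agency for rare species")

    # A reference photo, if available, acts as an implicit "Are you sure?".
    if report.species in photos:
        result.prompts.append(f"Compare with photo {photos[report.species]}: are you sure?")

    # A new county record gets an explicit confirmation listing the recipients.
    if is_new_county_record:
        result.prompts.append(
            "New county record: are you sure? Will notify: " + ", ".join(result.notifications)
        )
    return result


if __name__ == "__main__":
    # Hypothetical placeholder data.
    known = {("Species A", "Marin")}
    outcome = check_contribution(
        Report("Species A", "Ventura"), known, {"Species A": "photo-1"}, set()
    )
    print(outcome.prompts)
    print(outcome.notifications)
```

Keeping prompts separate from notifications mirrors the slide's two audiences: the contributor, who is asked "Are you sure?", and the officials and agencies, who may review the record.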