Herbarium Data & Visualization
Dynamic Information Visualization Tool
Paul White
Advisor: Dr. Dennis Groth

Outline
  Background
  Goal
  Methods
  Demonstration
  Results
  Discussion

Informatics and Biological Information
  Bioinformatics: a merging of biological databases using information technology in order to combine and leverage the information that lies within, and thus gain new insights about the biological world.
  Information technology is useful for developing tools and techniques for storing, handling, and communicating data
  Biological data from research is reaching massive proportions:
    NCBI (GenBank): genome and protein data
    Landsat (NASA’s Mission to Planet Earth): ecosystem data
    Biological collections: specimen data (herbaria)

Herbarium Defined
  Natural History and Biological Sciences
  IUB campus Deam Herbarium
  Biological Collections
  Botanical collections like an herbarium
  Is a collection of dried plant specimens
  Research voucher that identifies and names the specimen
  Mounted, accessioned specimens that document the flora of Indiana’s geographical region
  Can be thought of as a library of type specimens
  Systematics and Taxonomy
  Classification of living things

How and why do herbaria get used?
  IU Herbarium houses the collections of Charles C. Deam, on which the Flora of Indiana is based.
  Some purposes of herbaria
    Verification of type specimens
    Support of biological research
    Herbaria at interface of domains
      Comparative & evolutionary analyses
      Population & ecological analyses
      Between organisms: ecological data
      Within organism: genome data
    To support a published flora
    Resource for education at every level
      K-16 students
      Educators
      Scientists
      Government agencies

What is a flora?
  A formal catalog of plants found in a region
  Taxonomy
    Keys
    Controlled Vocabulary and Glossary
  Distribution
    Space (geospatial)
    Time (historical)
  Descriptions
    Botanical
    Ecological

Deam Herbarium Represented as a Digital Library
  Need for digital access to the herbarium’s collection of plant specimens for biodiversity research purposes
  Digital Library services need to support distributed users through the full range of the information lifecycle of collaborative knowledge work

Enhancing information access
  Textual data that includes 3587 recognized Indiana vascular plants and their presence in 92 Indiana counties
  Based on 140,000 actual plant occurrences consisting of core data (species name, name of collector, county, date) collected from many sources
  Original locations are standardized to county name; analysis is provided as to ID reliability and location accuracy
  Primary data providers are Deam Herbarium, Friesner Herbarium, BONAP, Missouri Botanical Garden (Kay Yatskievych)

Sharing data through well-established data standards
  GBIF (Global Biodiversity Information Facility): to design, implement, coordinate, and promote the compilation of the world’s biodiversity data.
  Standardize the nomenclature in taxonomy.
  Darwin Core vocabularies are applied consistently, so that participating databases can be interoperable.

Goal
  Searchable interface that gives a hierarchical taxonomic overview
  A geographical means of visually representing the data

Process
  Herbarium data
    Incoming data from databases
    Other collaborative source data
  Perl script used to create conversions
  Self-describing file format: XML
  XPath to query the data
  Java applet for user interface

Method: Storing the Data
  Develop a universal method that provides flexibility
  Have metadata: a way to develop a common language.
  New data elements: taxonomic data, location data, specimen data.
  Attributes for the elements chosen appropriately, based on the Darwin Core vocabulary.
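
The Process steps above (flat herbarium records converted to self-describing XML, then queried with XPath from Java) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's code: the original conversion was done with Perl scripts, whereas this sketch uses Java throughout. The class name `FloraXmlSketch`, the two-field (family, species) row layout, the `<doc>` root element, and the query itself are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: class, method, and element names are assumptions.
class FloraXmlSketch {

    // Escape the characters that are special in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Group (family, species) rows by family and emit one <family> element per group.
    static String toXml(List<String[]> rows) {
        Map<String, List<String>> byFamily = new LinkedHashMap<>();
        for (String[] row : rows) {
            byFamily.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        StringBuilder xml = new StringBuilder("<doc>\n");
        for (Map.Entry<String, List<String>> e : byFamily.entrySet()) {
            xml.append("  <family>\n");
            xml.append("    <familyname>").append(escape(e.getKey())).append("</familyname>\n");
            for (String species : e.getValue()) {
                xml.append("    <species>").append(escape(species)).append("</species>\n");
            }
            xml.append("  </family>\n");
        }
        return xml.append("</doc>\n").toString();
    }

    // Return the species listed under one family, using an XPath query.
    static List<String> speciesOf(String xml, String family) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The family name is spliced into the expression; assumes it contains no single quote.
        String expr = "/doc/family[familyname='" + family + "']/species";
        NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<String[]> rows = List.of(
                new String[] {"ACORACEAE", "Acorus americanus"},
                new String[] {"ACORACEAE", "Acorus calamus"},
                new String[] {"AGAVACEAE", "Yucca filamentosa"});
        String xml = toXml(rows);
        System.out.println(xml);
        System.out.println(speciesOf(xml, "ACORACEAE"));
    }
}
```

Generating the XML once and querying it afterwards keeps the storage step separate from the lookup step; a fuller version would validate its input and avoid building XPath expressions by string concatenation.
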

XML format adopted as a standard
  On-the-fly: Perl scripts convert the biological data into files in self-describing formats
  Interoperability: syntax can vary to suit the research scientist's needs
  The outcome is a series of XML document types that can be used in a modular and extensible manner, and in which the results of a distributed query can be returned

Snapshot
  Input
    ACERACEAE   Acer campestre
    ACERACEAE   Acer negundo
    ACERACEAE   Acer negundo var. negundo
    ACERACEAE   Acer negundo var. texanum
    ACERACEAE   Acer negundo var. violaceum
    ACERACEAE   Acer nigrum
    ACERACEAE   Acer platanoides
    ACERACEAE   Acer rubrum
    ACERACEAE   Acer rubrum var. drummondii
    ACERACEAE   Acer rubrum var. rubrum
    ACERACEAE   Acer rubrum var. trilobum
    ACERACEAE   Acer saccharum var. saccharum
    ACERACEAE   Acer saccharum var. schneckii
    ACORACEAE   Acorus americanus
    ACORACEAE   Acorus calamus
    AGAVACEAE   Manfreda virginica
    AGAVACEAE   Yucca filamentosa
  Output
    <family>
      <familyname>ACERACEAE</familyname>
      <species>Acer campestre</species>
      <species>Acer negundo</species>
      <species>Acer negundo var. negundo</species>
      <species>Acer negundo var. texanum</species>
      <species>Acer negundo var. violaceum</species>
      <species>Acer nigrum</species>
      <species>Acer platanoides</species>
      <species>Acer rubrum</species>
      <species>Acer rubrum var. drummondii</species>
      <species>Acer rubrum var. rubrum</species>
      <species>Acer rubrum var. trilobum</species>
      <species>Acer saccharum var. saccharum</species>
      <species>Acer saccharum var. schneckii</species>
    </family>
    <family>
      <familyname>ACORACEAE</familyname>
      <species>Acorus americanus</species>
      <species>Acorus calamus</species>
    </family>
    <family>
      <familyname>AGAVACEAE</familyname>
      <species>Manfreda virginica</species>
      <species>Yucca filamentosa</species>
    </family>

Method: Handling the Data
  Why pre-compute the various views of the static data?
    Underlying information representation
    Control the user's default set of outputs
    Helps the XPath functionality by shortening the wait time for the user
  Query the data stored in the conversion files by using XPath to get a searchable interface.
  XPath can be thought of as a query language like SQL. In this case, it extracts information from an XML document.
  It is not written in XML syntax, but resembles a path in a directory listing or a URL.
  For example, the following XPath expression:
    /doc/family/countyname[../species/name='"+item+"']

Method: Communicating the Data
  The principal objective of any visual representation of the data is to convey a message.
  Most visual tools involve some sort of comparison.
    Compare an item to another item
    Compare data relationships
  Provide an interactive visualization tool that:
    Integrates geographical information with specimen data on vascular plants of Indiana
  Develop a shape file for the state of Indiana that is represented by counties
    Lawrence,47,221,417,221,450,219,453,218,456,218,462,175,462,175,417,221,417
  Use XPath in a Java program: to extract data, write an XPath expression indicating what information is wanted from an XML document and ask the XPath engine to fetch it.
    /doc/family/countyname[../species/name='"+item+"']
  Java applet for the creation of the graphical interface
    Fault tolerant after startup
    Fast to run, slow to start

Demonstration

Results
  Discovery: What do we do with what we have?
  New information about relationships we had not seen before
  Helps in analyzing patterns in plant distribution and species diversity.
  Plants found only in Northern or Southern regions.
  Protecting genetic range and genetic diversity of species
  Collaborative Knowledge Work
    Information creation and dissemination
    Access and presentation
    Collaboration

Discussion
  Shows regions or areas which have the same characteristics
  Density vs.
  Distribution
  Fact Sheet
  Have a comprehensive database of Indiana plant occurrence data, online
  Scientists studying distributions must examine many dispersed, specialized data sets

Indiana Botanical Information System
  Native plant of Indiana
  Not threatened
  Name is current
  Distribution:
  Lobelia cardinalis L.
  Cardinal Flower
  Spec. Pl. (1753) 930
  Description: Perennial herb to 1.5 m, with milky sap. Ovary inferior with 2 locules. Calyx fused to form hypanthium with 5 sepal lobes. Petals alternate with sepals. Corolla slit to base on back and deeply cleft, with lower lip of 3 petals, and 2 upper petals of equal length. Stamens fused with bristles on lower 2 anthers.
  Map legend: Recorded in county / Not known in county
  Source: Curator Eric Knox
  Direct inquiries and comments to the Deam Herbarium ([email protected]).

References
  Processing XML with Java, by Elliotte Rusty Harold
  XPath Essentials, by Andrew Watt
  Perl Cookbook, by Tom Christiansen and Nathan Torkington

Acknowledgements
  I wish to thank Dr. Sun Kim and Dr. Mehmet Dalkilic for their words of encouragement.
  My advisor, Dr. Dennis Groth, for taking the time to comment on and provide insight into this project.
  Curator and professor Eric Knox, for his enthusiasm and recent effort to launch the Deam Herbarium into the digital age.