
Herbarium Data & Visualization
Dynamic Information Visualization Tool
Paul White
Advisor: Dr. Dennis Groth
Outline






Background
Goal
Methods
Demonstration
Results
Discussion
Informatics and Biological Information

Bioinformatics: a merging of biological databases
using information technology in order to combine
and leverage the information that lies within them,
resulting in new insights about the biological world.


Information technology is useful for developing tools
and techniques for storing, handling, and
communicating data
Biological data from research is reaching massive
proportions:
NCBI (GenBank): genome and protein data
Landsat (NASA’s Mission to Planet Earth): ecosystem
data
Biological collections: specimen data (herbaria)

Herbarium Defined

Natural History and Biological Sciences


IUB campus Deam Herbarium




Biological Collections
Botanical collections such as a herbarium
A herbarium is a collection of dried plant specimens
Each specimen is a research voucher that identifies and
names the specimen
Mounted, accessioned specimens that document the
flora of Indiana’s geographical region
Can be thought of as a library of type specimens

Systematics and Taxonomy
 Classification of living things
How and why do herbaria get used?


IU Herbarium houses the collections of
Charles C. Deam on which the Flora of
Indiana is based.
Some purposes of herbaria


Verification of type specimens
Support of biological research



Herbaria at interface of domains



Comparative & evolutionary analyses
Population & ecological analyses
Between organisms-ecological data
Within organism-genome data
To support a published flora

Resource for education at every level




K-16 students
Educators
Scientists
Government agencies
What is a flora?

A formal catalog of
plants found in a region




Taxonomy
Keys
Controlled Vocabulary
and Glossary
Distribution



Space (geospatial)
Time (historical)
Descriptions


Botanical
Ecological
Deam Herbarium Represented
as a Digital Library



Need for digital access to the herbarium’s collection of plant specimens for
biodiversity research purposes
Digital library services need to support distributed users through the full range of
the information lifecycle of collaborative knowledge work
Enhancing information access





Textual data that includes 3587 recognized Indiana vascular plants and their
presence in 92 Indiana counties
Based on 140,000 actual plant occurrences consisting of core data (species name,
name of collector, county, date) collected from many sources
Original locations are standardized to county name; analysis is provided as to ID
reliability and location accuracy
Primary data providers are the Deam Herbarium, Friesner Herbarium, BONAP, and
Missouri Botanical Garden (Kay Yatskievych)
Sharing data through well-established data standards.


GBIF (Global Biodiversity Information Facility): to design, implement, coordinate, and promote the compilation of the world’s biodiversity data, and to standardize the
nomenclature in taxonomy.
Darwin Core vocabularies are applied consistently, so that participating
databases can be interoperable.
Goal


Searchable interface that gives a hierarchical taxonomic overview
A geographical means of visually representing the data
Process
Herbarium data
Incoming data from databases
Other collaborative source data
Perl script used to create conversions
Self-describing file format (XML)
XPath to query the data
Java applet for user interface
Method: Storing the Data

Develop a universal method that provides flexibility


Have metadata: a way to develop a common language. New
data elements: taxonomic data, location data, specimen data.
Attributes for the elements are based on the Darwin
Core vocabulary.
XML format adopted as a standard



On-the-fly: Perl scripts convert the biological data into conversion files
that are in a self-describing file format
Interoperability: syntax can vary to suit the research scientist's needs
The outcome is a series of XML document types that can be used in a
modular and extensible manner, and in which the results of a distributed
query can be returned
Snapshot
Input
ACERACEAE    Acer campestre
ACERACEAE    Acer negundo
ACERACEAE    Acer negundo var. negundo
ACERACEAE    Acer negundo var. texanum
ACERACEAE    Acer negundo var. violaceum
ACERACEAE    Acer nigrum
ACERACEAE    Acer platanoides
ACERACEAE    Acer rubrum
ACERACEAE    Acer rubrum var. drummondii
ACERACEAE    Acer rubrum var. rubrum
ACERACEAE    Acer rubrum var. trilobum
ACERACEAE    Acer saccharum var. saccharum
ACERACEAE    Acer saccharum var. schneckii
ACORACEAE    Acorus americanus
ACORACEAE    Acorus calamus
AGAVACEAE    Manfreda virginica
AGAVACEAE    Yucca filamentosa
Output
<family>
<familyname>ACERACEAE</familyname>
<species>Acer campestre</species>
<species>Acer negundo</species>
<species>Acer negundo var. negundo</species>
<species>Acer negundo var. texanum</species>
<species>Acer negundo var. violaceum</species>
<species>Acer nigrum</species>
<species>Acer platanoides</species>
<species>Acer rubrum</species>
<species>Acer rubrum var. drummondii</species>
<species>Acer rubrum var. rubrum</species>
<species>Acer rubrum var. trilobum</species>
<species>Acer saccharum var. saccharum</species>
<species>Acer saccharum var. schneckii</species>
</family>
<family>
<familyname>ACORACEAE</familyname>
<species>Acorus americanus</species>
<species>Acorus calamus</species>
</family>
<family>
<familyname>AGAVACEAE</familyname>
<species>Manfreda virginica</species>
<species>Yucca filamentosa</species>
</family>
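The Input/Output snapshot above can be sketched as a small grouping step. The project itself used a Perl script for this conversion; the Java version below is only an illustrative stand-in, not the author's code. It assumes each input row is tab-separated ("FAMILY<tab>species name"), which the slides do not state, and the class and method names (FamilyXmlSketch, toXml, escape) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FamilyXmlSketch {

    /** Escapes the characters that may not appear raw in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Groups rows of the assumed form "FAMILY<tab>species name" by family and
     * emits one <family> element per family, in first-seen order.
     */
    static String toXml(List<String> rows) {
        Map<String, List<String>> byFamily = new LinkedHashMap<>();
        for (String row : rows) {
            String[] parts = row.split("\t", 2);
            if (parts.length < 2) {
                continue; // skip rows that do not have both columns
            }
            byFamily.computeIfAbsent(parts[0].trim(), k -> new ArrayList<>())
                    .add(parts[1].trim());
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String>> e : byFamily.entrySet()) {
            out.append("<family>\n");
            out.append("<familyname>").append(escape(e.getKey())).append("</familyname>\n");
            for (String species : e.getValue()) {
                out.append("<species>").append(escape(species)).append("</species>\n");
            }
            out.append("</family>\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Rows taken from the snapshot above.
        List<String> rows = Arrays.asList(
            "ACORACEAE\tAcorus americanus",
            "ACORACEAE\tAcorus calamus",
            "AGAVACEAE\tManfreda virginica",
            "AGAVACEAE\tYucca filamentosa");
        System.out.print(toXml(rows));
    }
}
```

Using an insertion-ordered map keeps the families in the order they first appear in the input, which matches the ordering of the Output snapshot.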
Method: Handling the Data


Why pre-compute the various views of the static data?

Underlying information representation

Control the user's default set of outputs

Pre-computing shortens the user's wait time when
XPath queries are run
Query the data stored in the conversion files by using XPath to get
a searchable interface.

XPath can be thought of as a query language like SQL. In this
case, it extracts information from an XML document.

XPath expressions are not written in XML syntax; they resemble a
path in a directory listing or a URL.
 For example, the following XPath expression:
/doc/family/countyname[../species/name='"+item+"']
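As a minimal sketch of how such an expression might be evaluated, the Java below uses the standard javax.xml.xpath API. The sample XML layout is hypothetical: it is inferred from the expression itself (a doc root, family records with species/name and countyname children) and differs from the Output snapshot shown earlier. The species-to-county pairings are placeholder values, not real occurrence records, and the names CountyQuerySketch and countiesFor are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CountyQuerySketch {

    // Hypothetical document layout and placeholder data (not real records).
    static final String SAMPLE =
          "<doc>"
        + "<family><familyname>ACERACEAE</familyname>"
        + "<species><name>Acer nigrum</name></species>"
        + "<countyname>Lawrence</countyname><countyname>Monroe</countyname></family>"
        + "<family><familyname>ACORACEAE</familyname>"
        + "<species><name>Acorus calamus</name></species>"
        + "<countyname>Lake</countyname></family>"
        + "</doc>";

    /** Returns the text of every countyname node matched for the given species name. */
    static List<String> countiesFor(String xml, String item) {
        // Same string concatenation as on the slide. A name containing an
        // apostrophe would break the expression and needs separate handling.
        String expr = "/doc/family/countyname[../species/name='" + item + "']";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> counties = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                counties.add(nodes.item(i).getTextContent());
            }
            return counties;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countiesFor(SAMPLE, "Acer nigrum"));
    }
}
```

The predicate in square brackets steps up from each countyname to its parent family (..) and keeps the node only if that family has a species/name equal to the requested item.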
Method: Communicating the Data

The principal objective of any visual representation of the data is to convey a
message. Most visual tools involve some sort of comparison.



compare an item to another item
compare data relationships
Provide an interactive visualization tool that:

Integrates geographical information with specimen data on vascular plants
of Indiana



Develop a shape file for the state of Indiana that is represented by counties
 Lawrence,47,221,417,221,450,219,453,218,456,218,462,175,462,175,417,221,417
Use XPath in a Java program: to extract data, write an XPath expression indicating what
information is wanted from an XML document and ask the XPath engine to fetch it.
 /doc/family/countyname[../species/name='"+item+"']
Java applet for the creation of the graphical interface
 Fault tolerant after startup
 Fast to run, slow to start
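The county shape line shown above (the Lawrence entry) could be read into a drawable outline roughly as follows. This is a sketch under assumptions the slides do not confirm: the first field is taken as the county name, the second (47) as a county number, and the remaining values as x,y pixel pairs. The class name CountyShapeSketch and its fields are hypothetical.

```java
import java.awt.Polygon;

class CountyShapeSketch {
    final String name;     // county name (first field)
    final int id;          // assumed to be a county number (second field)
    final Polygon outline; // remaining fields, read as x,y pairs

    CountyShapeSketch(String name, int id, Polygon outline) {
        this.name = name;
        this.id = id;
        this.outline = outline;
    }

    /** Parses one line of the form "name,id,x1,y1,x2,y2,...". */
    static CountyShapeSketch parse(String line) {
        String[] f = line.trim().split(",");
        if (f.length < 4 || (f.length - 2) % 2 != 0) {
            throw new IllegalArgumentException("Expected name,id and x,y pairs: " + line);
        }
        Polygon p = new Polygon();
        for (int i = 2; i < f.length; i += 2) {
            p.addPoint(Integer.parseInt(f[i].trim()), Integer.parseInt(f[i + 1].trim()));
        }
        return new CountyShapeSketch(f[0].trim(), Integer.parseInt(f[1].trim()), p);
    }

    public static void main(String[] args) {
        CountyShapeSketch c = parse(
            "Lawrence,47,221,417,221,450,219,453,218,456,218,462,175,462,175,417,221,417");
        System.out.println(c.name + " has " + c.outline.npoints + " outline points");
        // Hit-testing a pixel, e.g. to find which county the user clicked on.
        System.out.println("contains (200,440): " + c.outline.contains(200, 440));
    }
}
```

A java.awt.Polygon can be filled directly by a Graphics object, so an interface could color each county according to whether the XPath query returned it for the selected species.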
Demonstration
Results


Discovery: What do we do with what we have?

New information about relationships we had not seen before

Helps in analyzing patterns in plant distribution and species
diversity. Plants found only in Northern or Southern regions.

Protecting genetic range and genetic diversity of species
Collaborative Knowledge Work

Information creation and dissemination

Access and presentation

Collaboration
Discussion





Shows regions or areas which have the same characteristics
Density vs. Distribution
Fact Sheet
Have a comprehensive database of Indiana plant occurrence
data, online
Scientists studying distributions must examine many dispersed,
specialized data sets
Indiana Botanical Information System
Native plant of Indiana
Not threatened
Name is current
Distribution:
Lobelia cardinalis L.
Cardinal Flower
Spec. Pl. (1753) 930
Description:
Perennial herb to 1.5 m, with milky sap.
Ovary inferior with 2 locules.
Calyx fused to form hypanthium with 5 sepal lobes.
Petals alternate with sepals.
Corolla slit to base on back and deeply cleft, with lower lip
of 3 petals, and 2 upper petals of equal length.
Stamens fused with bristles on lower 2 anthers.
Distribution map legend: Recorded in county / Not known in county
Source: Curator Eric Knox
Direct inquiries and comments to the Deam Herbarium.
References

Processing XML with Java by Elliotte Rusty Harold
XPath Essentials by Andrew Watt

Perl Cookbook by Tom Christiansen and Nathan Torkington

Acknowledgements



I wish to thank:
Dr. Sun Kim and Dr. Mehmet Dalkilic for their words of
encouragement
My advisor, Dr. Dennis Groth, for taking the time to comment on and
provide insight into this project
Curator and professor Eric Knox for his enthusiasm and recent effort to
launch the Deam Herbarium into the digital age