HYDROSEEK and HYDROTAGGER
A Search Engine for Hydrologists
GIS in Water Resources Lecture
M. Piasecki
November, 2007
11/6/2015
Department of Civil, Architectural & Environmental Engineering
Lecture
• Demo of HydroSeek
• What are the search criteria?
• Functionality of the Engine Interface
• Data Sources
• Common Sources
• Common Problems (Completeness, Syntax, Semantics)
• Ontologies
• Ontology details
• Concept-to-data variable tagging
• Architecture
• Flow Chart
• Technologies used
• Demo of HydroTagger
• Why the Tagging?
• Technologies
www.HydroSeek.org
HIS Goals
• Hydrologic Data Access System: better access to a large volume of high quality hydrologic data
• Support for Observatories: synthesizing hydrologic data for a region
• Advancement of Hydrologic Science: data modeling and advanced analysis
• Hydrologic Education: better data in the classroom, basin-focused teaching
Objective
• Search multiple heterogeneous data sources simultaneously, regardless of semantic or structural differences between them
What we are doing now …..
[Diagram: each data source (NWIS, NAWQA, NAM-12, NARR) is queried separately, each with its own request/return exchanges]
What we would like to do …..
[Diagram: a single generic request goes to a Semantic Mediator, which issues GetValues calls to NWIS, NAWQA, NARR, and HODM]
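The mediator idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HydroSeek implementation: the source names come from the slide, but the adapter functions, the native variable names, and the returned numbers are hypothetical placeholders standing in for real service calls.

```python
from typing import Callable, Dict, List, Tuple

# (source, native variable name, value). Values here are invented placeholders.
Observation = Tuple[str, str, float]
Adapter = Callable[[str], List[Observation]]


def make_adapter(source: str, vocabulary: Dict[str, str], fake_value: float) -> Adapter:
    """Build a hypothetical adapter that hides one source's native vocabulary."""
    def get_values(concept: str) -> List[Observation]:
        native = vocabulary.get(concept)
        if native is None:
            return []  # this source has nothing tagged with the concept
        return [(source, native, fake_value)]
    return get_values


class SemanticMediator:
    """Fans one generic request out to every registered source adapter."""

    def __init__(self) -> None:
        self._adapters: List[Adapter] = []

    def register(self, adapter: Adapter) -> None:
        self._adapters.append(adapter)

    def get_values(self, concept: str) -> List[Observation]:
        results: List[Observation] = []
        for adapter in self._adapters:
            results.extend(adapter(concept))
        return results


mediator = SemanticMediator()
mediator.register(make_adapter("NWIS", {"streamflow": "Discharge"}, 1.0))
mediator.register(make_adapter("NAWQA", {"nitrogen": "Total Kjeldahl Nitrogen"}, 2.0))
mediator.register(make_adapter("NARR", {"precipitation": "Precip rate"}, 3.0))

print(mediator.get_values("streamflow"))
```

The point of the design is that the caller only ever names a concept; which sources answer, and under what native name, is decided inside the mediator.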
Data sources…
USGS
EPA
CIMS
TCEQ
NADP
Spatial Coverage
• STORET has 758 sites in Texas; TCEQ has 8,407.
• STORET has 47,602 sites in Florida; NWIS has 27,906.
• NWIS has 121,545 sites in Minnesota; STORET has 22,260.
Data Availability
Temporal Coverage
[Figure: temporal coverage of nitrogen data; periods labeled 1957-1977, 1977-2003, 2003-2007]
Interface Problem
• NWIS: ~175 form elements on a single page
• STORET + NWIS + TCEQ + CIMS = ??? A drop-down menu of length ∞
• String search across the parameter list? What about synonyms? E.g. ‘Elevation, water surface’ vs. ‘stage height’
Completeness Problem: Metadata Catalog
• Better query performance
• Freedom
• Fewer errors
Availability of geographic identifiers for stations in EPA STORET:
Total number of sites                  274,918
Sites with geographic coordinates      274,435
Sites with State/County information    273,113
Sites with Hydrologic Unit Codes       128,646
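To make the completeness gap concrete, the counts above can be turned into coverage percentages. The short Python sketch below uses only the numbers from the slide; the function and variable names are illustrative.

```python
total_sites = 274_918
sites_with_identifier = {
    "geographic coordinates": 274_435,
    "State/County information": 273_113,
    "Hydrologic Unit Codes": 128_646,
}


def coverage_percent(count: int, total: int) -> float:
    """Share of sites carrying a given identifier, as a percentage."""
    return round(100.0 * count / total, 1)


coverage = {
    name: coverage_percent(count, total_sites)
    for name, count in sites_with_identifier.items()
}
for name, pct in coverage.items():
    print(f"{name}: {pct}% of sites")
```

Coordinates and State/County information are nearly complete (above 99%), while fewer than half of the sites (about 46.8%) carry a Hydrologic Unit Code.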
Heterogeneity Problem
• Syntax: e.g. date & time formats, Gregorian versus Julian
• Data format/structure: e.g. XML, HTML, tab/tilde/comma-separated text, gzipped tarballs…
• Semantics: more on this next
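As a small illustration of the syntax problem, the Python sketch below normalizes a few date spellings to ISO 8601. It assumes that "Julian" refers to the year-plus-day-of-year form (for example 2006225), which is a common usage in environmental data files; the list of candidate formats is hypothetical and would need to match what each repository actually emits.

```python
from datetime import datetime

# Candidate formats are assumptions for illustration only.
# "%Y%j" is a four-digit year followed by the day of the year.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%Y%j"]


def to_iso_date(text: str) -> str:
    """Try each known format in turn and return the date as YYYY-MM-DD."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"unrecognized date format: {text!r}")


for raw in ["2006-08-13", "08/13/2006", "2006225"]:
    print(raw, "->", to_iso_date(raw))
```

All three inputs describe the same day; once they share one representation, records from different sources can be compared or merged on time.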
Issues with Semantics
• Hyponymy: parameters “Groundwater level”, “Stream stage”, “Reservoir level” versus “Water level”
• Pseudo-hyponymy due to lack of metadata: parameter “Manganese, 6N hydrochloric acid extracted, recoverable, dry weight, milligrams per kilogram” versus “Manganese, milligrams per kilogram”
• Synonymy: ‘Total Kjeldahl Nitrogen’ vs. ‘Ammonia+Organic Nitrogen’
Search Strategy
Search → Fine tune → Retrieve
rather than
Search → Retrieve
to avoid the ‘high precision, low recall’ and ‘low precision, high recall’ problems.
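For readers less familiar with the two terms: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that were retrieved. The Python sketch below computes both for two toy searches. The relevant set reuses the water-level parameter names from the semantics slide; the "irrelevant" names are invented for the example.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one search result set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"Groundwater level", "Stream stage", "Reservoir level"}

# A narrow string match: everything returned is right, but most is missed.
narrow = {"Stream stage"}
# A broad match: nothing is missed, but half of the results are noise.
broad = relevant | {"Stage of growth", "Sea level pressure", "Noise level"}

print(precision_recall(narrow, relevant))
print(precision_recall(broad, relevant))
```

The fine-tune step sits between these extremes: start broad, then let the user narrow the result set before retrieving data.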
Layered Ontology Model
[Figure: ontology layers labeled Core, Navigation, Compound]
Knowledge Base
• OWL Ontologies
  ‘Escherichia coli’ = ‘E. coli’
  ‘E. coli’ is-a ‘Indicator Organism’
  ‘Copper’ is-a ‘Micronutrient’
  ‘Copper’ isMeasuredIn ‘Medium’
  ‘Medium’ = {Water, Soil…}
  ‘Micronutrient’ is-a ‘Nutrient’
• Supports classification of search results
• Entities in the ontology are associated with measured variables in a relational database
• Helps solve semantic heterogeneity issues between data repositories
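The relations listed on this slide are enough to show how an ontology widens a search. The Python sketch below is a plain-dictionary stand-in for the OWL ontology, not how HydroSeek stores it; the relations are the ones from the slide, and the function names are illustrative.

```python
from typing import Dict, List

# Synonyms map alternative labels to one canonical label.
SYNONYMS: Dict[str, str] = {"E. coli": "Escherichia coli"}

# is-a relations, child -> parent, taken from the slide.
IS_A: Dict[str, str] = {
    "Escherichia coli": "Indicator Organism",
    "Copper": "Micronutrient",
    "Micronutrient": "Nutrient",
}


def canonical(term: str) -> str:
    """Resolve a synonym to its canonical label."""
    return SYNONYMS.get(term, term)


def ancestors(term: str) -> List[str]:
    """Follow is-a links upward from a term."""
    chain: List[str] = []
    current = canonical(term)
    while current in IS_A:
        current = IS_A[current]
        chain.append(current)
    return chain


def narrower_terms(concept: str) -> List[str]:
    """All terms that are, directly or transitively, a kind of `concept`."""
    return sorted(term for term in IS_A if concept in ancestors(term))


print(ancestors("Copper"))
print(narrower_terms("Nutrient"))
print(ancestors("E. coli"))
```

A search for "Nutrient" can then be expanded to include variables tagged with "Copper", and a search for "E. coli" finds data stored under "Escherichia coli".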
Point Observations Information Model
http://www.cuahsi.org/his/webservices.html
Hierarchy: Data Source → Network → Sites → Variables → Values
Example: USGS → Streamflow gages → Neuse River near Clayton, NC → Discharge, stage (daily or instantaneous) → 206 cfs, 13 August 2006
Web service calls: GetSites, GetSiteInfo, GetVariables, GetVariableInfo, GetValues
Value record: {Value, Time, Qualifier, Offset}
• A data source operates an observation network
• A network is a set of observation sites
• A site is a point location where one or more variables are measured
• A variable is a property describing the flow or quality of water
• A value is an observation of a variable at a particular time
• A qualifier is a symbol that provides additional information about the value
• An offset allows specification of measurements at various depths in water
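The hierarchy described in these definitions maps naturally onto nested records. The Python sketch below is one possible in-memory rendering, populated with the example from the slide; the class layout and the `units` field are assumptions made for illustration and do not reproduce the WaterML schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Value:
    value: float
    time: date
    qualifier: Optional[str] = None   # extra information about the value
    offset: Optional[float] = None    # e.g. measurement depth in water


@dataclass
class Variable:
    name: str
    units: str                        # assumed field, taken from "206 cfs"
    values: List[Value] = field(default_factory=list)


@dataclass
class Site:
    name: str
    variables: List[Variable] = field(default_factory=list)


@dataclass
class Network:
    name: str
    sites: List[Site] = field(default_factory=list)


@dataclass
class DataSource:
    name: str
    networks: List[Network] = field(default_factory=list)


discharge = Variable("Discharge", "cfs", [Value(206.0, date(2006, 8, 13))])
site = Site("Neuse River near Clayton, NC", [discharge])
usgs = DataSource("USGS", [Network("Streamflow gages", [site])])

first = usgs.networks[0].sites[0].variables[0].values[0]
print(first.value, discharge.units, first.time.isoformat())
```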
Hydroseek Webservices
• Most Hydroseek functions are available as web services (SOAP)
• Support for queries using Global Change Master Directory (GCMD) keywords
• Supports output in Geography Markup Language (GML) as well as WaterML
[Diagram labels: Microsoft Server; San Diego Supercomputer Center Server; Drexel Server; VirtualEarth Map; Native Services; WaterOneFlow services for EPA STORET, USGS Daily, USGS Realtime, CIMS, TCEQ, and HydroSeek]
GetStations
[Screenshots: GetStations request, parameterized by BoundingBox, and its response]
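What a bounding-box station query does can be shown without calling the service. The Python sketch below filters an in-memory station list; the station identifiers and coordinates are invented, and the function is a hypothetical local stand-in, not the GetStations signature. It also ignores boxes that cross the antimeridian.

```python
from typing import Dict, List, NamedTuple, Tuple


class BoundingBox(NamedTuple):
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        """True if the point lies inside the box, edges included."""
        return self.west <= lon <= self.east and self.south <= lat <= self.north


def stations_in_box(stations: Dict[str, Tuple[float, float]], box: BoundingBox) -> List[str]:
    """Return the ids of stations whose (lon, lat) falls inside the box."""
    return sorted(sid for sid, (lon, lat) in stations.items() if box.contains(lon, lat))


# Invented station ids and coordinates, for illustration only.
stations = {"A": (-78.5, 35.6), "B": (-97.7, 30.3), "C": (-78.9, 36.0)}
box = BoundingBox(west=-80.0, south=35.0, east=-78.0, north=37.0)
print(stations_in_box(stations, box))
```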
GetStationsByHU
[Screenshots: GetStationsByHU request, parameterized by HUC_Code, and its response]
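Hydrologic Unit Codes are hierarchical: each additional pair of digits narrows the unit, so a lookup by hydrologic unit amounts to a prefix match. The Python sketch below illustrates this on an in-memory table; the station ids and the codes assigned to them are placeholders, and the function is a hypothetical local stand-in rather than the GetStationsByHU signature.

```python
from typing import Dict, List


def stations_by_hu(stations: Dict[str, str], huc_code: str) -> List[str]:
    """Return ids of stations whose 8-digit HUC starts with `huc_code`."""
    if not huc_code.isdigit() or len(huc_code) not in (2, 4, 6, 8):
        raise ValueError("HUC code must be 2, 4, 6, or 8 digits")
    return sorted(sid for sid, huc in stations.items() if huc.startswith(huc_code))


# Placeholder station ids and codes, for illustration only.
stations = {"A": "03020201", "B": "03020202", "C": "12090205"}
print(stations_by_hu(stations, "0302"))
print(stations_by_hu(stations, "03020201"))
```

Note that this only works for sites that carry a HUC at all, which ties back to the completeness problem discussed earlier.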
GetStationCatalogueFiltered
Request
Response
GetStationCatalogue
Request
Response
Architecture Outline
• Allows searching multiple heterogeneous data sources simultaneously regardless of semantic or structural differences between them
• Modular & extensible
Inside the CUAHSI HOD Module
The Database-Ontology Link
www.HydroTagger.org
HydroSeek ODM needed an upgrade, i.e. additional tables:
1) FrequentUpDates_Table
2) MappingsApproved_Table
How does the Tagging work?
Step 1
Users need to register on the website before they can use the HydroTagger.
When registering, select the testbed site you are affiliated with. Each testbed site needs ONE administrator, who can then admit additional users for that specific testbed site.
Please send an email identifying the designated tagger site administrator so we can promote that person to the role.
How does the Tagging work?
WATERS Network
Information System
Step 2
The “Sniffer” jumps into action and trawls through the testbed sites to find and identify new variable names (once a week, currently every Sunday night).
It does so by using the regular web services published through the WSDL (no “hacking”!).
It returns i) data update information and ii) the variable names used, and compares these to those used by HydroSeek.
How does the Tagging work?
Test-Bed    VarName       Site exists?  VarName exists?  Content          Action
CCBay       DOConcSuf     Y             Y                new data         update Cat (Time)
CCBay       DOConcBot     Y             N                new variable     place in TaggerBin => DO
CCBay       DOConcMid     N             Y                new data         update Cat (Site+Time)
SRBHOS      DO_Water      Y             Y                new data         update Cat (Time)
Minnehaha   TempSurf      Y             N                new variable     place in TaggerBin => Temp
Minnehaha   StreamDOCon   Y             N                new variable     place in TaggerBin => DO
SantaFe     WaterDOCon    Y             N                new variable     place in TaggerBin => DO
SantaFe     GoldConc      Y             N                new var/no conc  place in TaggerBin => ??
Step 3
The Tagger now updates the HydroSeek catalogue (an amalgamation of all 10 testbed catalogues) with the newly found data entries.
If it finds a new variable name (introduced during the data loading process using the DataLoader), it puts it into a table and offers it up to the HydroTagger GUI for semantic tagging.
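The decision table on this slide reduces to two yes/no checks per sniffed record. The Python sketch below encodes that logic; it is an illustration of the rule, not the Tagger's code. The function name is hypothetical, and the case where neither the site nor the variable name is known does not appear in the table, so routing it to the TaggerBin is an assumption.

```python
def tagger_action(site_exists: bool, varname_known: bool) -> str:
    """Decide what to do with one (site, variable name) pair found by the Sniffer."""
    if not varname_known:
        # New variable name: hand it to the HydroTagger GUI for semantic tagging.
        # Assumption: this also applies when the site is new as well.
        return "place in TaggerBin"
    if site_exists:
        return "update Cat (Time)"
    return "update Cat (Site+Time)"


# Rows from the slide: (test-bed, variable name, site exists?, variable name known?)
rows = [
    ("CCBay", "DOConcSuf", True, True),
    ("CCBay", "DOConcBot", True, False),
    ("CCBay", "DOConcMid", False, True),
    ("SRBHOS", "DO_Water", True, True),
]
for testbed, varname, site_exists, varname_known in rows:
    print(testbed, varname, "->", tagger_action(site_exists, varname_known))
```

Which concept a binned variable ends up tagged with (DO, Temp, or none yet) is the human step done in the HydroTagger GUI and is deliberately left out of the sketch.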
Thank you… Questions?