Environmental Data Exchange Network (EDEN)


MCC
Outline

EDEN Project Overview
InfoSleuth in a microsecond
The Ontology in InfoSleuth
Value Mapping and the Environmental Data Registry
Virtual demo
Microelectronics and Computer Technology Corporation
Environmental Data Exchange Network

The challenge:
• Acquisition, use and dissemination of environmental information is of increasing strategic importance to EPA, DOD, DOE, and EEA

EDEN is an application of MCC's InfoSleuth technology
• Employs intelligent agent technology over the Internet to conduct concept-based searches of heterogeneous, distributed information

The EDEN Project demonstrates how organizations can save time and money:
• Provides easy access over an intranet or the Internet
• Enables users to access information from multiple sources
• Simplifies the exchange and sharing of data
• Reduces the reporting burden
• Brings information together for presentation and analysis
Common Set of Requirements

Reduce the reporting burden the parties impose on each other

Share the best available and most timely information

Enable users to access information from multiple sources

Coordinate only the common vocabulary, not the end use of information resources: focus on the inputs, with each participant individually interpreting and communicating outputs
Pilot Databases

CERCLIS-3: EPA Superfund (Oracle, VA)

ITT: EPA Remediation Technology (MS-Access, TX)

HazDat: EPA Hazardous Substances (Sybase, GA)

ERPIMS: Air Force Env. Restoration (Oracle, TX)

EEA: Basel Convention (MS-Access, TX)

IRDMIS: Army Installation Restoration (Oracle, MD)

DOE INEEL (Oracle, ID)

DOE ORNL (Oracle, TN)
InfoSleuth

System of “competent” agents for dynamic, scalable (SQL-based) access to heterogeneous, distributed information sources

Ontology-based information management

Advertise-discover paradigm supported by brokering over semantic constraints
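A toy broker can make the advertise-discover paradigm concrete: resource agents advertise the ontology class, slots, and value constraints they cover, and a query discovers only the agents whose advertisements are compatible with it. This is a minimal sketch under stated assumptions, not InfoSleuth's broker (which reasons over advertisements in LDL++); `Advertisement`, `Broker`, the set-valued constraint format, and the agent names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Advertisement:
    """What a resource agent claims it can answer (hypothetical format)."""
    agent: str
    ontology_class: str
    slots: set
    # Semantic constraints: slot -> set of values the resource holds.
    constraints: dict = field(default_factory=dict)


class Broker:
    def __init__(self):
        self.ads = []

    def advertise(self, ad):
        self.ads.append(ad)

    def discover(self, ontology_class, slots, bindings):
        """Return agents whose advertisement can satisfy the request.

        slots: slots the query needs; bindings: slot -> required value.
        """
        needed = set(slots) | set(bindings)
        matches = []
        for ad in self.ads:
            if ad.ontology_class != ontology_class or not needed <= ad.slots:
                continue
            # Skip agents whose constraints rule out a requested value.
            if any(slot in ad.constraints and value not in ad.constraints[slot]
                   for slot, value in bindings.items()):
                continue
            matches.append(ad.agent)
        return matches


# Hypothetical agents for two of the pilot databases.
broker = Broker()
broker.advertise(Advertisement("erpims_agent", "Eden_Site", {"name", "state"},
                               {"state": {"TX"}}))
broker.advertise(Advertisement("ineel_agent", "Eden_Site", {"name", "state"},
                               {"state": {"ID"}}))
print(broker.discover("Eden_Site", {"name"}, {"state": "TX"}))  # ['erpims_agent']
```

Matching on semantic constraints, not just on class names, is what lets the broker prune resources that cannot contribute to a particular query.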
InfoSleuth System

Java-based agents

Knowledge Query and Manipulation Language (KQML) message layer provides the speech-act agent interface

Agent conversation shell provides structure for KQML messages

Open Knowledge Base Connectivity (OKBC) language provides the semantic communication layer

Brokering reasoning provided by Logical Data Language, LDL++
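KQML messages are s-expressions: a performative such as `ask-all` or `tell` followed by keyword parameters. The sketch below only formats such a message; `:sender`, `:receiver`, `:reply-with`, `:language`, `:ontology`, and `:content` are standard KQML parameters, but the agent names, the ontology name `EDEN`, and the use of `ask-all` to carry SQL are assumptions for illustration, not InfoSleuth's actual message traffic.

```python
def kqml(performative, **params):
    """Render a KQML message as an s-expression.

    Keyword names map to KQML parameters (reply_with -> :reply-with);
    the :content value is emitted as a quoted, escaped string.
    """
    parts = [performative]
    for key, value in params.items():
        if key == "content":
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            value = f'"{escaped}"'
        parts.append(f":{key.replace('_', '-')} {value}")
    return "(" + " ".join(parts) + ")"


# Hypothetical query from the user agent to the multi-resource query agent.
msg = kqml("ask-all",
           sender="user-agent",
           receiver="multiquery-agent",
           reply_with="q1",
           language="SQL",
           ontology="EDEN",
           content="SELECT name FROM site WHERE state = 'Texas'")
print(msg)
```

The `:ontology` parameter is what ties a message to the shared vocabulary; the conversation shell's job is to pair each reply with its `:reply-with` tag.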
More InfoSleuth Agents

JDBC resource agents translate between the application domain ontology and database schemata

Multi-resource query agent uses either LDL++ or Oracle to support query decomposition and result recomposition

Value mapper translates to/from canonical value domains

Text agent supports ontology-based queries

Task execution agent manages a CLIPS rule base for task planning and subscription maintenance

Sentinel and deviation detection agents cooperate to detect complex event patterns
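Query decomposition and result recomposition can be illustrated over a vertically fragmented class: the requested slots are split across resources, each subquery also projects the shared key, and the partial results are joined back together. This is a minimal sketch assuming an in-memory hash join in place of the LDL++ or Oracle machinery the agent actually uses; `FRAGMENTS`, the resource names, and the rows are hypothetical.

```python
# Hypothetical vertical fragments of one ontology class (an EDEN site),
# split across two resources that share the key slot site_id.
FRAGMENTS = {
    "resource_a": {"slots": {"site_id", "name", "state"},
                   "rows": [{"site_id": "1", "name": "Site A", "state": "TX"},
                            {"site_id": "2", "name": "Site B", "state": "GA"}]},
    "resource_b": {"slots": {"site_id", "contaminant"},
                   "rows": [{"site_id": "1", "contaminant": "Benzene"}]},
}


def decompose(slots, key, fragments):
    """Map each requested slot to the first fragment that has it.

    Every subquery also projects the key so results can be rejoined.
    """
    plan = {}
    for slot in slots:
        for name, frag in fragments.items():
            if slot in frag["slots"]:
                plan.setdefault(name, {key}).add(slot)
                break
        else:
            raise KeyError(f"no resource provides slot {slot!r}")
    return plan


def run_subquery(frag, slots):
    """Stand-in for a resource agent: project the fragment's rows."""
    return [{s: row[s] for s in slots} for row in frag["rows"]]


def recompose(partials, key):
    """Inner-join the partial results on the shared key."""
    merged = None
    for rows in partials:
        by_key = {row[key]: row for row in rows}
        if merged is None:
            merged = by_key
        else:
            merged = {k: {**merged[k], **by_key[k]} for k in merged if k in by_key}
    return list(merged.values()) if merged else []


plan = decompose(["name", "contaminant"], "site_id", FRAGMENTS)
result = recompose([run_subquery(FRAGMENTS[r], s) for r, s in plan.items()],
                   "site_id")
print(result)
```

Site 2 drops out because resource_b has no row for it; an outer join would keep it with a missing contaminant, which is a policy choice the query agent has to make.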
Basic InfoSleuth Application Recipe

6 cups ontology
3 cups resource agent configuration
1-3 cups user interface development
Lightly brown the multi-resource query agent
Pour in other agents out of the box
Stir
Serve
...
add or remove resource agents as desired
add other functionality with more configuration effort
A Distributed Query
[Figure: a distributed query among user viewer applets, the user agent, broker agents, the ontology agent, the multiquery agent, resource agents (with mapping info), the valuemap agent, text and task agents, and the underlying resources; labels include "SQL" and "Refined Data".]
Purpose of the Ontology in InfoSleuth

To describe the domain with minimal ambiguity
• the structure defines the domain
• documentation strings

To be the integration hub for the DB schemas
• query relaxation through the taxonomy
• vertical fragmentation
• multi-resource path expressions

To provide the language in which queries are posed and results are expressed
• value mapping
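One plausible reading of query relaxation through the taxonomy is: when no resource covers the requested class, generalize to the nearest superclass that some resource does cover, and flag the answer as broader than requested. The sketch below assumes that reading; apart from `Observed_Contamination`, the class names and the `SUPERCLASS` table are hypothetical.

```python
# Hypothetical fragment of the EDEN class taxonomy: class -> superclass.
SUPERCLASS = {
    "Groundwater_Contamination": "Observed_Contamination",
    "Soil_Contamination": "Observed_Contamination",
    "Observed_Contamination": "Contamination",
}


def relax(cls, answerable):
    """Walk up the taxonomy from cls to the first class that can be answered.

    Returns (class_used, relaxed_flag), or (None, False) if nothing matches.
    """
    current = cls
    while current is not None:
        if answerable(current):
            return current, current != cls
        current = SUPERCLASS.get(current)
    return None, False


covered = {"Observed_Contamination"}  # classes advertised by resource agents
print(relax("Groundwater_Contamination", covered.__contains__))
# ('Observed_Contamination', True)
```

Returning the flag matters: a relaxed answer may include soil as well as groundwater contamination, so the user interface should say so.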
Expressing the Ontology

OKBC (Open Knowledge Base Connectivity): a standard for knowledge representation

Classes, slots, facets:
; Observed_Contamination is a class with two template slots
(class Observed_Contamination)
; analysis_method takes string values
(template-slot-of analysis_method Observed_Contamination)
(template-facet-value :VALUE-TYPE analysis_method Observed_Contamination :STRING)
; site takes instances of another ontology class, Eden_Site
(template-slot-of site Observed_Contamination)
(template-facet-value :VALUE-TYPE site Observed_Contamination Eden_Site)

Subclass and instance-of links
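To show what the `:VALUE-TYPE` facets buy, the OKBC statements for `Observed_Contamination` can be mirrored as plain Python data and used to check a candidate instance. This is an illustration, not the OKBC API; the subclass `Superfund_Site`, the frame name `site_42`, and the sample analysis method are hypothetical.

```python
# The OKBC statements for Observed_Contamination mirrored as plain data.
# Superfund_Site is a hypothetical subclass of Eden_Site.
SUPERCLASS = {"Superfund_Site": "Eden_Site"}
TEMPLATE_SLOTS = {
    "Observed_Contamination": {"analysis_method": ":STRING",
                               "site": "Eden_Site"},
}


def is_a(cls, target):
    """Follow subclass links upward from cls looking for target."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUPERCLASS.get(cls)
    return False


def check_instance(cls, values, instance_of):
    """Return the slots whose values violate a :VALUE-TYPE facet.

    instance_of maps frame names (e.g. site frames) to their direct class.
    """
    bad = []
    for slot, value in values.items():
        value_type = TEMPLATE_SLOTS.get(cls, {}).get(slot)
        if value_type is None:
            bad.append(slot)  # not a template slot of this class
        elif value_type == ":STRING":
            if not isinstance(value, str):
                bad.append(slot)
        elif not is_a(instance_of.get(value), value_type):
            bad.append(slot)
    return bad


frames = {"site_42": "Superfund_Site"}  # hypothetical instance-of link
print(check_instance("Observed_Contamination",
                     {"analysis_method": "GC/MS", "site": "site_42"}, frames))  # []
```

Because `is_a` follows subclass links, a site frame whose class is a subclass of `Eden_Site` satisfies the facet.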
Ontology Features

[Figure: Value Mapping Modelling. Quantity, Unit Of Measure, and canonical unit; example: a Person has a height, which is a Distance whose unit is Meter or Foot; a data-type link to STRING also appears.]
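The quantity model implies a simple conversion step: values arrive as strings in some unit and are normalized to the quantity's canonical unit before comparison or merging. A minimal sketch, assuming Meter is the canonical unit for Distance (the figure does not say which unit is canonical); `UNITS` and the function names are hypothetical. The foot-to-meter factor of 0.3048 is exact, and `Decimal` avoids binary rounding error.

```python
from decimal import Decimal

# Hypothetical unit table for the Distance quantity. Meter is assumed to be
# the canonical unit; 1 foot is exactly 0.3048 meters.
UNITS = {
    "Distance": ("Meter", {"Meter": Decimal("1"), "Foot": Decimal("0.3048")}),
}


def to_canonical(quantity, value, unit):
    """Convert a STRING-typed value in `unit` to the canonical unit."""
    canonical, factors = UNITS[quantity]
    return Decimal(value) * factors[unit], canonical


def from_canonical(quantity, value, unit):
    """Convert a canonical-unit value back into `unit` for display."""
    _, factors = UNITS[quantity]
    return Decimal(value) / factors[unit]


print(to_canonical("Distance", "6", "Foot"))  # (Decimal('1.8288'), 'Meter')
```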
Value mapping requirements

Translate terms in queries
• Allow users to choose a coding scheme for querying
• Query each database in terms of its own coding scheme

Translate results of queries
• Facilitate merging of data from different sources
• Display results according to user preference
Value mapping and the ontology

A class has one or more slots

Each slot has a conceptual domain name

Each slot has a preferred value domain

Resource agents must advertise in the preferred value domain
• possibly translating to/from a different value domain

Users may query and view data in a different value domain
• the user agent handles translation to/from the preferred value domain
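The two translation steps compose: the user agent maps the user's value domain to the preferred one, and the resource agent maps the preferred domain to its own. A minimal sketch under stated assumptions: the mapping tables are hypothetical, and the "state code" values are assumed to be FIPS numeric codes, which the source does not specify.

```python
# Preferred value domain for each conceptual domain (per the ontology).
PREFERRED = {"state": "state name"}

# Hypothetical mapping tables: (conceptual domain, value domain) ->
# {value in that domain: value in the preferred domain}.
# The "state code" values are assumed to be FIPS numeric codes.
TO_PREFERRED = {
    ("state", "state abbr"): {"TX": "Texas", "GA": "Georgia"},
    ("state", "state code"): {"48": "Texas", "13": "Georgia"},
}


def to_preferred(cd, vd, value):
    """Translate a value from value domain vd into the preferred domain."""
    if vd == PREFERRED[cd]:
        return value
    return TO_PREFERRED[(cd, vd)][value]


def from_preferred(cd, vd, value):
    """Translate a preferred-domain value into value domain vd."""
    if vd == PREFERRED[cd]:
        return value
    inverse = {pref: v for v, pref in TO_PREFERRED[(cd, vd)].items()}
    return inverse[value]


def translate(cd, from_vd, to_vd, value):
    """User domain -> preferred domain (user agent), then
    preferred domain -> resource domain (resource agent)."""
    return from_preferred(cd, to_vd, to_preferred(cd, from_vd, value))


print(translate("state", "state abbr", "state code", "TX"))  # 48
```

Pivoting every translation through the preferred domain means each coding scheme needs only one mapping table, rather than one table per pair of schemes.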
EDR contents
[Figure: EDR entities: Conceptual domain, Value meaning, CD_VM_Assoc, Value domain, Permissible value]
We use a specialized resource agent (the map agent) to access the EDR.
Additions to EDR

Downloaded files of permissible values for CAS number and chemical name (Merck index) from the EPA site

Assigned value meanings

Created value domains for CAS code, CAS padded, and ycode; loaded permissible values

Added 3 extra chemical names because the Merck index file was incomplete
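Loading permissible values for a padded CAS domain requires converting between the dashed form and a fixed-width digit string. The sketch assumes "CAS padded" means the digits without dashes, left-padded with zeros to 9 characters, as the pair 1332-21-4 / 001332214 elsewhere in this deck suggests; the exact format of the "CAS code" domain is not given here, so it is not modeled. Function names are hypothetical.

```python
import re

# Dashed CAS registry number: 2-7 digits, 2 digits, 1 check digit.
CAS_RE = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def cas_to_padded(cas, width=9):
    """'1332-21-4' -> '001332214': drop the dashes, left-pad with zeros."""
    if not CAS_RE.fullmatch(cas):
        raise ValueError(f"not a dashed CAS number: {cas!r}")
    digits = cas.replace("-", "")
    if len(digits) > width:
        raise ValueError(f"{cas!r} does not fit in {width} digits")
    return digits.zfill(width)


def padded_to_cas(padded):
    """'001332214' -> '1332-21-4': strip leading zeros, restore the dashes."""
    digits = padded.lstrip("0")
    if not digits.isdigit() or len(digits) < 5:
        raise ValueError(f"not a padded CAS number: {padded!r}")
    return f"{digits[:-3]}-{digits[-3:-1]}-{digits[-1]}"


print(cas_to_padded("1332-21-4"))  # 001332214
print(padded_to_cas("000071432"))  # 71-43-2
```

The width of 9 is a parameter because it comes from the example data, not from any CAS rule.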
Linking EDR to EDEN ontology
Conceptual domain   CD_ID  Value domain   VD_ID  Preferred domain  PD_ID
state               8      state abbr     210    state name        6
state               8      state code     219    state name        6
chemical substance  123    chemical name  357    cas number        430
chemical substance  123    cas padded     901    cas number        430
chemical substance  123    cas code       902    cas number        430
chemical substance  123    ycode          903    cas number        430
View of the EDR
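CREATE VIEW edr_map (conceptual_domain, coding_scheme,
                     preferred_value, actual_value) AS
SELECT emc.conceptual_domain, emc.value_domain,
       pref.pv_nm, act.pv_nm
FROM edr_map_class emc, cd_vm_assoc a,
     permissible_value pref, permissible_value act
-- value meanings associated with the slot's conceptual domain
WHERE a.cd_id = emc.cd_id
  -- the actual and preferred values share one value meaning
  AND a.vm_id = act.vm_id AND a.vm_id = pref.vm_id
  -- the actual value comes from the resource's coding scheme
  AND emc.vd_id = act.vd_id
  -- the preferred value comes from the preferred domain
  AND emc.pd_id = pref.vd_id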
Conceptual domain   Coding scheme  Preferred value  Actual value
chemical substance  chemical name  001332214        Arsenic
Query Processing
[Figure: query processing in nine numbered steps among the User Agent, Map Agent, Query Agent, Resource Agent, and DBMS; steps 1-4 are labeled Query, step 5 Query/Result (at the DBMS), and steps 6-9 Result.]
Query translation
SELECT name FROM site WHERE state = 'Texas'
translated to
SELECT name FROM site WHERE state = 'TX'
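The query translation step can be sketched as rewriting each constant in an equality predicate through the resource's value map. This is a simplified illustration, assuming the query is already available as structured predicates rather than parsed from SQL text; `STATE_NAME_TO_ABBR` and `translate_query` are hypothetical.

```python
# Hypothetical value map for one resource: preferred state names -> the
# resource's own coding scheme (two-letter abbreviations).
STATE_NAME_TO_ABBR = {"Texas": "TX", "Georgia": "GA"}


def translate_query(table, columns, where, value_maps):
    """Build a resource-specific query from equality predicates.

    Each constant in `where` is rewritten through value_maps[column] when a
    map exists. Values are returned as parameters for '?' placeholders
    instead of being spliced into the SQL text.
    """
    conditions, params = [], []
    for column, value in where.items():
        mapping = value_maps.get(column)
        params.append(mapping[value] if mapping is not None else value)
        conditions.append(f"{column} = ?")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


print(translate_query("site", ["name"], {"state": "Texas"},
                      {"state": STATE_NAME_TO_ABBR}))
# ('SELECT name FROM site WHERE state = ?', ['TX'])
```

Binding translated values as parameters sidesteps quoting problems, and an unmapped value raises `KeyError` rather than silently querying for a term the database does not use.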
Result translation
State  Chemical
TX     1332-21-4
Translated to
State  Chemical
Texas  Arsenic
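Result translation runs the other way, column by column over the returned rows. A minimal sketch, assuming hypothetical reverse maps in `DISPLAY`; values with no mapping pass through unchanged and are reported, since a silent pass-through would hide exactly the no-match cases that remain open issues.

```python
# Hypothetical reverse maps from one resource's coding schemes to the
# user's display domains (state abbreviation -> name, CAS -> chemical name).
DISPLAY = {
    "State": {"TX": "Texas", "GA": "Georgia"},
    "Chemical": {"71-43-2": "Benzene"},
}


def translate_rows(rows, display):
    """Translate mapped columns; pass unmapped values through and report them."""
    translated, misses = [], []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            mapping = display.get(column)
            if mapping is not None and value in mapping:
                new_row[column] = mapping[value]
            else:
                new_row[column] = value
                if mapping is not None:
                    misses.append((column, value))
        translated.append(new_row)
    return translated, misses


rows = [{"State": "TX", "Chemical": "71-43-2"},
        {"State": "GA", "Chemical": "1332-21-4"}]
translated, misses = translate_rows(rows, DISPLAY)
print(translated)
# [{'State': 'Texas', 'Chemical': 'Benzene'}, {'State': 'Georgia', 'Chemical': '1332-21-4'}]
print(misses)  # [('Chemical', '1332-21-4')]
```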
EDR lookup
SELECT preferred_value
FROM edr_map
WHERE actual_value = 'Benzene'
  AND coding_scheme = 'chemical_name'
  AND conceptual_domain = 'chemical_substance'
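The EDR view and lookup can be exercised end to end with an in-memory SQLite database. This is a sketch with illustrative rows only: the domain IDs follow the EDR linking table, but the value meaning IDs and the padded CAS form chosen as benzene's preferred value are assumptions, and the underscore spellings match the lookup query rather than any confirmed EDR contents.

```python
import sqlite3

SCHEMA = """
CREATE TABLE edr_map_class (conceptual_domain TEXT, cd_id INTEGER,
    value_domain TEXT, vd_id INTEGER, preferred_domain TEXT, pd_id INTEGER);
CREATE TABLE cd_vm_assoc (cd_id INTEGER, vm_id INTEGER);
CREATE TABLE permissible_value (vd_id INTEGER, vm_id INTEGER, pv_nm TEXT);

CREATE VIEW edr_map (conceptual_domain, coding_scheme,
                     preferred_value, actual_value) AS
SELECT emc.conceptual_domain, emc.value_domain, pref.pv_nm, act.pv_nm
FROM edr_map_class emc, cd_vm_assoc a,
     permissible_value pref, permissible_value act
WHERE a.cd_id = emc.cd_id
  AND a.vm_id = act.vm_id AND a.vm_id = pref.vm_id
  AND emc.vd_id = act.vd_id AND emc.pd_id = pref.vd_id;
"""


def make_edr():
    """Build a tiny in-memory EDR with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO edr_map_class VALUES (?, ?, ?, ?, ?, ?)", [
        ("chemical_substance", 123, "chemical_name", 357, "cas_number", 430),
        ("state", 8, "state_abbr", 210, "state_name", 6),
    ])
    # Value meaning 1 = benzene, 2 = Texas (hypothetical vm_ids).
    conn.executemany("INSERT INTO cd_vm_assoc VALUES (?, ?)", [(123, 1), (8, 2)])
    conn.executemany("INSERT INTO permissible_value VALUES (?, ?, ?)", [
        (357, 1, "Benzene"), (430, 1, "000071432"),
        (210, 2, "TX"), (6, 2, "Texas"),
    ])
    return conn


def lookup_preferred(conn, actual, coding_scheme, conceptual_domain):
    """The EDR lookup query with bound parameters; None when unmatched."""
    row = conn.execute(
        "SELECT preferred_value FROM edr_map "
        "WHERE actual_value = ? AND coding_scheme = ? AND conceptual_domain = ?",
        (actual, coding_scheme, conceptual_domain)).fetchone()
    return row[0] if row else None


conn = make_edr()
print(lookup_preferred(conn, "Benzene", "chemical_name", "chemical_substance"))
# 000071432
```

SQLite's default text comparison is case-sensitive, so a lookup of `'tx'` returns `None`: the case-difference problem appears as soon as real data is loaded.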
Outstanding issues

No match in EDR for database value
• differences in case (‘Texas’, ‘TEXAS’)
• CAS number format (dashes, leading zeros)
• word order (‘n-Propyl benzene’, ‘Benzene, n-Propyl’)
• bad data

Functional mapping needed

Approximate string matching
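Several of these issues can be attacked before falling back to manual review: case folding, undoing inverted names, CAS check-digit validation to flag bad data, and approximate matching against the permissible values. The sketch swaps in Python's `difflib` for approximate matching; it is not a method from the EDEN work, and the 0.8 cutoff and function names are hypothetical. The CAS check is the standard one: weight the digits before the check digit 1, 2, 3, ... from the right, and the sum mod 10 must equal the check digit.

```python
import difflib
import re


def normalize_name(name):
    """Case-fold, collapse whitespace, and undo 'Benzene, n-Propyl' inversion.

    Only a comma followed by a space is treated as an inversion, so
    locants such as '1,2-dichloroethane' are left alone.
    """
    name = re.sub(r"\s+", " ", name.strip().casefold())
    head, sep, tail = name.partition(", ")
    return f"{tail} {head}" if sep else name


def cas_check_digit_ok(cas):
    """Flag some bad data: verify a dashed CAS number's check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas.strip())
    if not match:
        return False
    body = match.group(1) + match.group(2)
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


def best_match(value, permissible, cutoff=0.8):
    """Return the closest permissible value after normalization, or None."""
    by_key = {normalize_name(p): p for p in permissible}
    hits = difflib.get_close_matches(normalize_name(value), list(by_key),
                                     n=1, cutoff=cutoff)
    return by_key[hits[0]] if hits else None


print(best_match("TEXAS", ["Texas", "Georgia"]))  # Texas
print(best_match("Benzene, n-Propyl", ["n-Propyl benzene", "Toluene"]))
print(cas_check_digit_ok("71-43-2"), cas_check_digit_ok("71-43-3"))  # True False
```

Fuzzy matches should be proposed to a curator rather than applied automatically, since similar spellings can denote different substances.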
DEMO