Supporting Creativity in Science: Cooperative Knowledge Acquisition & Knowledge Refinement Systems
Derek Sleeman
Department of Computing Science
The University
ABERDEEN
AB24 3FX
Tel: +44 (0)1224 272296
WWW: http://www.csd.abdn.ac.uk
Acknowledgements:
EPSRC support for the AKT Consortium
Students: Eugenio Alberdi, David Corsar, Andy Aiken, Mark Winter
OVERVIEW of TALK
I: Context: Advanced Knowledge Technologies (AKT) Consortium
II: Co-operative Knowledge Acquisition & Knowledge Refinement Systems
III: ReTAX system
IV: The REFINER++ System
Questions / Discussion
I: AKT’s CHALLENGES
• Knowledge Acquisition
• Knowledge Maintenance
• Knowledge Publishing
• Knowledge Modelling
• Life Cycle, Integration Issues & Testbeds
• Knowledge Retrieval
• Knowledge Reuse
II: Co-operative KA & Knowledge Refinement Systems
Knowledge-based systems inevitably require a sizeable amount of domain knowledge. This can be acquired from:
• domain experts (KA)
• detailed examples (using ML techniques), etc.
However, for complex tasks these KBs are inevitably:
• incomplete, in which case further knowledge acquisition is needed;
• inconsistent, in which case the KB needs to be refined;
• also, background knowledge is likely to be incomplete, thus requiring an expert to act as an oracle.
Hence the need for Co-operative (Problem Solving) Knowledge Acquisition & Knowledge Refinement Systems.
II: Co-operative KA & Knowledge Refinement Systems
• KRUST (Classical KB; Classification) (Susan Craw)
• STALKER (Efficient Truth-Maintenance-based system; Classification) (Leo Carbonara)
• REFINER / Refiner++ / R5 (Case-base; Classification) (Sunil Sharma; Mark Winter; Andy Aiken)
• RETAX (Revision of Taxonomies) (Eugenio Alberdi; David Corsar)
• CRIMSON (Refinement of Constraints) (Mark Winter)
• TIGON (Time Series Data / Causal Model; Diagnosis) (Fraser Mitchell)
• SALT+ (Rules & Constraints; Propose & Revise) (Piero Leo)
References: see WWW: http://www.csd.abdn.ac.uk
II: Co-operative KA & Knowledge Refinement Systems
• KRUST & STALKER: Wine Adviser
• REFINER+: Attendance at Medical Clinics & Stock control
• CRIMSON/ConRef: Stock control
• RETAX: Botanical Taxonomies
• TIGON: Turbines (Fault Detection & Diagnosis)
• SALT+: Elevators/Lifts
III: RETAX+
The heuristics in RETAX are based on a study of how botanists react to one or more rogue items.
There are 2 (principal) rules which determine whether a taxonomy is well formed:
• each child node must be more specialized than its parent;
• each of a node’s siblings must be unique.
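The two well-formedness rules can be made concrete with a minimal Python sketch. It is not RETAX's implementation: it assumes each node is described by a mapping from attribute name to a frozenset of permitted values, reads "more specialized" as "every value set is a subset of the parent's and at least one is strictly smaller", and uses hypothetical names (`Node`, `more_specialized`, `violations`).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical representation: attribute name -> frozenset of permitted values.
    label: str
    attrs: dict
    children: list = field(default_factory=list)


def more_specialized(child: Node, parent: Node) -> bool:
    """Rule 1, as assumed here: every child value set lies inside the parent's
    (a missing attribute inherits the parent's set) and at least one is strictly smaller."""
    pairs = [(child.attrs.get(a, vals), vals) for a, vals in parent.attrs.items()]
    return all(c <= p for c, p in pairs) and any(c < p for c, p in pairs)


def violations(node: Node) -> list:
    """Report every node below `node` that breaks either well-formedness rule."""
    problems = []
    for i, child in enumerate(node.children):
        if not more_specialized(child, node):
            problems.append(f"{child.label} is not more specialized than {node.label}")
        # Rule 2: a child must differ from every earlier sibling.
        for earlier in node.children[:i]:
            if earlier.attrs == child.attrs:
                problems.append(f"siblings {earlier.label} and {child.label} are identical")
        problems.extend(violations(child))
    return problems
```

For example, with toy attributes `motor` and `size`, a `car` node whose value sets are narrower than `vehicle`'s passes rule 1, while a second child with exactly the same sets as `car` is reported under rule 2.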
RETAX was used to replicate the revision of a major botanical taxonomy that had been done “manually” in Aberdeen’s Botany Department in the 90s.
References:
Middleton & Wilcox (1990), Edinburgh Journal of Botany {revision of taxonomy for Pernettya / Gaultheria}.
Alberdi & Sleeman (1997), AI Journal, pp. 257–279.
Alberdi, Sleeman & Korpi (1999), Cognitive Science Journal.
[Figure: example vehicle taxonomy. Each node is described by the attributes Label, Wheels, Size (ordered set: low, medium, large, high), Motor (yes/no), EnginePower, Parent and Depth. Hierarchy: Vehicle (root) → Train, Car, Cycle, Lorry; Car → Sports Car, Salon Car; Cycle → Bicycle, Motorbike; Lorry → Large Lorry, Small Van; Small Van → Smaller Van.]
RETAX+
Let’s refer to a new object/node as N, the existing hierarchy/tree as T, and the potential parent node as P. The possible operations are then:
• Is T well formed? (If not, report the nodes which violate the rules.) {E.g., if sibling nodes N1 & N2 are equal, then merge the 2 nodes.}
• Is N already in T?
• Assuming T is well formed, to which parent node, P, can N be attached without T being rearranged or N being modified? (The answer could be none.)
• What changes have to be made to N to make it a “legal” child of node P?
• What changes have to be made to T so that N can be a child of P?
• Combinations of the last 2 operations
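Two of these operations ("Is N already in T?" and "to which P can N be attached unchanged?") can be sketched in Python under a simple assumption: T is a hypothetical flat dictionary mapping each label to its attribute sets and parent label, and a "legal" child is one that is strictly narrower than P and not identical to one of P's existing children. The function names are illustrative only, not RETAX+'s.

```python
def strictly_narrower(n: dict, p: dict) -> bool:
    """Assumed test: each of N's value sets lies inside P's, at least one strictly."""
    pairs = [(n.get(a, vals), vals) for a, vals in p.items()]
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def already_in(tree: dict, n: dict) -> list:
    """'Is N already in T?': labels of nodes whose description equals N's.
    `tree` is hypothetical: label -> (attribute sets, parent label or None)."""
    return [label for label, (attrs, _) in tree.items() if attrs == n]


def legal_parents(tree: dict, n: dict) -> list:
    """Nodes P to which N can be attached with neither T nor N changed:
    N must be strictly narrower than P and must not duplicate one of P's
    existing children. The result may be empty ('the answer could be none')."""
    result = []
    for label, (attrs, _) in tree.items():
        children = [a for a, parent in tree.values() if parent == label]
        if strictly_narrower(n, attrs) and n not in children:
            result.append(label)
    return result
```

With toy `vehicle`, `car` and `cycle` nodes, a non-motorised, low-size description can be attached under either `vehicle` or `cycle`; the sketch returns every candidate rather than picking the deepest one, which is left as a design decision.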
ReTAX
[Figure: part of the Ericaceae taxonomy. Genera: Arctostaphylos, Arbutus, Pernettya, Leucothoe, Gaultheria, Agauria, Andromeda. Example species: A. uva-ursi, A. unedo, P. tasmanica, G. oppositifolia, G. rupestris, G. antipoda, A. polifolia.]
ReTAX
- Historical: in Bentham & Hooker’s (1876*) classification, the main differences detected between the Pernettya & Gaultheria genera were the type of fruit and the succulence of the calyx.
*G. Bentham & J.D. Hooker (1876). Genera Plantarum, Vol. II, Part 2. (Publ: Reeves & Co, London)
- Subsequent botanical investigations in the 20th century challenged this analysis, but did not suggest any further distinguishing features for the 2 genera; hence the 2 genera were combined (Middleton & Wilcox, 1990).
ReTAX
Simulation (Simplified)
- The descriptions of several species of the Pernettya & Gaultheria genera were replaced by others with revised features (descriptors), which affect the definitions of the parent nodes (P & G).
- When the parent nodes (Pernettya & Gaultheria) are found to be the same, the system checks a set of other features (a further facility of ReTAX) to see whether they are distinctive; when no differences are found, the 2 nodes (P & G) are collapsed.
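The collapse step can be sketched as follows. This is a hypothetical Python illustration, not ReTAX code: each node is a dict with `label`, `features` and `children`, `further` lists the extra features consulted once the main descriptions coincide, and the merged node keeps the first node's label purely for convenience.

```python
from typing import Optional


def collapse_if_indistinct(a: dict, b: dict, further: list) -> Optional[dict]:
    """Merge two sibling nodes only if nothing, main or further, separates them."""
    keys = set(a["features"]) | set(b["features"])
    main = keys - set(further)
    # Only nodes whose main descriptions have become identical are candidates.
    if any(a["features"].get(f) != b["features"].get(f) for f in main):
        return None
    # Any further feature that separates them means both nodes are kept.
    if any(a["features"].get(f) != b["features"].get(f) for f in further):
        return None
    # No distinguishing feature: one node holding both sets of children
    # (keeping a's label is an arbitrary choice in this sketch).
    return {"label": a["label"], "features": dict(a["features"]),
            "children": a["children"] + b["children"]}
```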
RETAX+: Current / Future activities
• Use with other experts to help them formulate / refine taxonomies (eg other aspects of botany, microbiology)
• Use RETAX+, or a variant, to formulate / refine ontologies (eg medical terminologies). This has resulted in the Protégé RepairTAB, which detects inconsistencies in OWL ontologies & gives advice about removing them. (Lam, Sleeman, Pan & Vasconcelos (2008), Journal on Data Semantics)
IV: REFINER++ System
• The Refiner++ algorithm
  - Sample dataset
• Interaction with experts
• Current / future work
The Sample Dataset
Case  Age  DBP  Associated Disease  Category
1     50   90   D1                  A
2     56   90   D2                  A
3     52   101  D3                  A
4     50   95   D3                  B
5     56   97   D3                  B
6     -    89   D5                  A
7     52   97   D3                  A
The Refiner++ Algorithm
• Each case is assigned to a category
• Category descriptions are inferred from the case values
• When a case matches a category to which the expert did not assign it, this is an inconsistency
• While inconsistencies exist:
  - a selection of disambiguation strategies is suggested
  - the user chooses a strategy to be performed
  - the list of inconsistencies is re-evaluated
• The refined dataset is now consistent
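The control loop above can be sketched in Python as a function parameterised by four callables. These hooks (`find_inconsistencies`, `suggest`, `choose`, `apply`) are hypothetical stand-ins for Refiner++'s components, with `choose` playing the expert's role; the `max_rounds` guard is an addition of this sketch so that it cannot loop forever.

```python
def refine(dataset, find_inconsistencies, suggest, choose, apply, max_rounds=1000):
    """Sketch of the refine loop: repeat until no inconsistencies remain."""
    for _ in range(max_rounds):
        problems = find_inconsistencies(dataset)
        if not problems:
            return dataset                    # the refined dataset is now consistent
        options = suggest(dataset, problems)  # a selection of disambiguation strategies
        chosen = choose(options)              # the user chooses one strategy
        dataset = apply(dataset, chosen)      # the list is re-evaluated next round
    raise RuntimeError("no consistent dataset reached within max_rounds")
```

As a toy usage, treating negative values as "inconsistent" and always choosing to shelve the first offending case drives `{1: 5, 2: -1, 3: -4}` down to `{1: 5}`.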
Generating Descriptions
Generalise each field:
• Numeric: range from lowest to highest
• String: set of all unique items
• Taxon: nearest common parent
• Boolean: set of all unique items from the set {‘true’, ‘false’, ‘any’}
Combine these to get the category description.
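A minimal Python sketch of this generalisation step, applied to the sample dataset, follows. It assumes missing values (case 6's Age) are simply skipped, treats Boolean fields like string fields, and takes a hypothetical `parent` map for taxon fields; none of the function names come from Refiner++ itself.

```python
def generalise_numeric(values):
    """Numeric field: range from the lowest to the highest value present."""
    present = [v for v in values if v is not None]  # assumption: skip missing values
    return (min(present), max(present))


def generalise_set(values):
    """String (and, in this sketch, Boolean) field: the set of all unique items."""
    return frozenset(v for v in values if v is not None)


def nearest_common_parent(values, parent):
    """Taxon field: nearest node in the hypothetical `parent` map that is an
    ancestor of (or equal to) every value; assumes a single-rooted taxonomy."""
    def lineage(x):
        chain = [x]
        while x in parent:
            x = parent[x]
            chain.append(x)
        return chain

    present = [v for v in values if v is not None]
    common = lineage(present[0])
    for v in present[1:]:
        ancestors = set(lineage(v))
        common = [a for a in common if a in ancestors]
    return common[0]


# The sample dataset: (case, Age, DBP, Associated Disease, Category); None = missing.
CASES = [
    (1, 50, 90, "D1", "A"),
    (2, 56, 90, "D2", "A"),
    (3, 52, 101, "D3", "A"),
    (4, 50, 95, "D3", "B"),
    (5, 56, 97, "D3", "B"),
    (6, None, 89, "D5", "A"),
    (7, 52, 97, "D3", "A"),
]


def describe(cases):
    """Combine the generalised fields into one description per category."""
    grouped = {}
    for _, age, dbp, disease, category in cases:
        grouped.setdefault(category, []).append((age, dbp, disease))
    descriptions = {}
    for category, rows in grouped.items():
        ages, dbps, diseases = zip(*rows)
        descriptions[category] = {
            "Age": generalise_numeric(ages),
            "DBP": generalise_numeric(dbps),
            "Disease": generalise_set(diseases),
        }
    return descriptions
```

`describe(CASES)` gives Age 50–56, DBP 89–101 and diseases {D1, D2, D3, D5} for A, and Age 50–56, DBP 95–97 and {D3} for B, which agrees with the category descriptions for this dataset.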
Category Descriptions
Category  Age      DBP       Disease
A         50 – 56  89 – 101  All (D1, D2, D3, D5)
B         50 – 56  95 – 97   D3
There are inconsistencies:
• Cases 4 and 5 match category A
• Case 7 matches category B
We need to remove the overlap.
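Checking for inconsistencies amounts to testing each case against every category it was not assigned to. The Python sketch below hard-codes the two category descriptions and the seven sample cases; treating a missing value as compatible with any category is an assumption, and the names are illustrative.

```python
# Category descriptions for the sample dataset; "All" is written out as the
# diseases actually present in category A's cases (D1, D2, D3, D5).
DESCRIPTIONS = {
    "A": {"Age": (50, 56), "DBP": (89, 101), "Disease": {"D1", "D2", "D3", "D5"}},
    "B": {"Age": (50, 56), "DBP": (95, 97), "Disease": {"D3"}},
}

# case -> (field values, category assigned by the expert); None = missing value.
CASES = {
    1: ({"Age": 50, "DBP": 90, "Disease": "D1"}, "A"),
    2: ({"Age": 56, "DBP": 90, "Disease": "D2"}, "A"),
    3: ({"Age": 52, "DBP": 101, "Disease": "D3"}, "A"),
    4: ({"Age": 50, "DBP": 95, "Disease": "D3"}, "B"),
    5: ({"Age": 56, "DBP": 97, "Disease": "D3"}, "B"),
    6: ({"Age": None, "DBP": 89, "Disease": "D5"}, "A"),
    7: ({"Age": 52, "DBP": 97, "Disease": "D3"}, "A"),
}


def matches(values, description):
    """True if every known value falls inside the description's range or set."""
    for name, allowed in description.items():
        value = values.get(name)
        if value is None:
            continue  # assumption: a missing value does not exclude a category
        if isinstance(allowed, tuple):
            low, high = allowed
            if not low <= value <= high:
                return False
        elif value not in allowed:
            return False
    return True


def inconsistencies(cases, descriptions):
    """(case, category) pairs where a case matches a category it was not assigned."""
    return [
        (case_id, category)
        for case_id, (values, assigned) in cases.items()
        for category, description in descriptions.items()
        if category != assigned and matches(values, description)
    ]
```

`inconsistencies(CASES, DESCRIPTIONS)` returns `[(4, 'A'), (5, 'A'), (7, 'B')]`, i.e. cases 4 and 5 match A and case 7 matches B.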
Disambiguation Strategies
• Change values for certain cases
• Remove values from a category (eg, create a disjunction)
• Reclassify a case
• Make a case match an additional category
• Shelve a problem case
• Add a new field
[Figure: Refiner++, categories C1, C2 and C3]
Strategies for this problem
• Change the value of DBP in case 7 to 90
• Change the value of DBP in case 5 to 95
• Reclassify case 7 to category B
• Add case 7 to category B
• Shelve case 7
• Change the value of Disease in cases 3 and 7 from D3 to another value
• Reclassify cases 4 and 5 to category A
• Add cases 4 and 5 to category A
• Shelve cases 4 and 5
• Add a new field
Strategy Ordering
Typically, many strategies are suggested, so we need heuristics to order them:
• Order by the number of times suggested; prefer strategies which are suggested many times
• Order by the number of cases affected; prefer strategies which affect fewer cases
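The slide does not say how the two heuristics are combined. One possible reading, sketched in Python, sorts primarily by how often a strategy was suggested (descending) and breaks ties by the number of cases affected (ascending); the `Strategy` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    description: str
    times_suggested: int  # how many inconsistencies produced this suggestion
    cases_affected: int   # how many cases applying it would alter


def order_strategies(strategies):
    """Most often suggested first; ties broken by fewest cases affected.
    (Combining the two heuristics this way is an assumption.)"""
    return sorted(strategies, key=lambda s: (-s.times_suggested, s.cases_affected))
```

Because Python's `sorted` is stable, strategies that tie on both keys keep their original suggestion order.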
The Refiner++ Main Screen
Scalability
Measured the time taken to perform validation on randomly generated datasets with varying numbers of cases, fields and categories.
For most datasets, the time taken is under 1 second.
Use of REFINER++ by Experts*
Refiner++ has been used with various experts, including:
• Pain Control Expert (Anaesthesiology)
• Child psychologist
• High Dependency Unit (HDU) Physician
* KCAP-2003 paper (Aiken & Sleeman)
Pain Control
• Pre-existing Access dataset on epidural patients
• Many cases, lots of fields / descriptors
• Refiner++ imported the data (almost) perfectly
• The expert categorised cases based on the length of the epidural (in days)
• REFINER++ took only a few seconds to create the category descriptions and validate them
But…
Pain Control
• Hundreds of inconsistencies found
• Hundreds of strategies suggested
  - almost all of which were ‘change value’
• Why did it not work better?
  - Subjective nature of the subject domain
  - Categories were contiguous
Child Psychology
The session was a series of anecdotes and outlines of specific cases.
Three types of cases were identified:
• Severely autistic
• Mildly autistic
• Difficulties with language development
Child Psychology
The expert stated that autistic children usually had the following characteristics:
• Problems with language and verbal communication
• Problems with social interaction
• Obsessive behaviour
These characteristics were abstracted by the knowledge engineers and subsequently confirmed with the expert.
The expert showed no inclination to use REFINER++, but a case set was created by the knowledge engineers.
HDU
• Task posed by the domain expert: when to move high dependency unit (HDU) patients to a general ward or to the intensive care unit (ICU), or to leave them in the HDU.
• Used Refiner++ with three datasets, one for each condition (cardiac, neuro & respiratory).
• The expert did not use the system, but did dictate the descriptors & the sets of cases to the knowledge engineers, who typed this information into REFINER.
• Refiner++ found that 2 of the datasets were consistent, and identified inconsistencies in the third.
Inconsistent Dataset
Case  HR   RR  AVPU  Sat O2  Category
1     105  27  1     94      Higher
2     120  35  2     88      Higher
3     140  45  3     80      Higher
4     105  28  1     94      Same
5     90   22  1     95      Same
6     80   18  1     96      Lower
7     70   15  1     98      Lower
Category Descriptions
Category  HR       RR     AVPU  Sat O2
Higher    105-140  27-45  1-3   80-94
Same      90-105   22-28  1     94-95
Lower     70-80    15-18  1     96-98
• There are inconsistencies:
  - Case 1 matches category SAME
  - Case 4 matches category HIGHER
• We need to remove the overlap
• Refiner++ suggested lower and upper ‘danger zones’ for each field
Future Work: Use with Domain Experts
• Make the system’s GUI more intuitive (some changes already made)
• Ask the expert to come to the session with a document which summarizes the main features of the dataset they wish to discuss. (In the session, ask them to highlight the principal concepts.)
• For each domain expert contacted, record an AVI session of a simple but related domain (eg simple childhood diseases before approaching a paediatrician) (demo)
Current Work (ICU domain)
• Developed a statistically based system which, given a case description, returns the likelihood of that case belonging to each of the predefined categories (R5: Andy Aiken)
• Acquired a data set of patients’ physiological parameters from an ICU DB, and are having clinicians assign patients, day by day & hour by hour, to a 5-point severity score (developed in conjunction with Glasgow Royal Infirmary)
• Using R5 with the above data set to assign new patient reports to a severity class (practically important, as the descriptors include clinical interventions, which “standard” scales don’t)
• Identify & analyse (explain) anomalous / unusual cases (or segments of cases)
VI: Dimensional Analysis ??
• Outline issue
• Pointer to TR
• Pointer to WWW systems / sources
Questions/Comments
V: (Causal) Explanations for Anomalous Medical Cases
• Discuss ICU context
• Experiment to detect anomalous cases / sections of cases
• Outline a typical investigation
V: Seeking to Explain an Anomalous Observation
EXPECTED: An injection of X will cause the heart (organ O) to increase its contraction rate within T seconds.
SUPPOSE that does not happen; then here are some of the investigations which might be performed:
a) Is the injection being given effectively?
b) If so, check whether drug X is being transported to organ O:
   i) Is the transport path physically / biochemically blocked?
   ii) Is the transport mechanism inhibited or slowed down?
c) If the drug is actually arriving at organ O & the concentration is OK, then investigate:
   i) Is the drug mechanism within the organ being blocked?
   ii) Is the organ for some reason unable to respond in the usual way (eg weakened heart muscle)?
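The order of investigation above is essentially a diagnostic checklist that stops at the first problem found. The Python sketch below encodes it that way; the check keys and the `findings` dictionary (True meaning "this problem was found", False meaning "ruled out") are hypothetical, and the questions are taken from the list.

```python
# Each check: (hypothetical key, question from the investigation list).
CHECKS = [
    ("injection_ineffective", "Is the injection being given effectively?"),
    ("path_blocked", "Is the transport path physically / biochemically blocked?"),
    ("transport_inhibited", "Is the transport mechanism inhibited or slowed down?"),
    ("drug_not_arriving", "Is the drug arriving at organ O at an adequate concentration?"),
    ("mechanism_blocked", "Is the drug mechanism within the organ being blocked?"),
    ("organ_unresponsive", "Is the organ for some reason unable to respond in the usual way?"),
]


def next_step(findings):
    """Walk the checks in order: stop at the first problem found (a candidate
    explanation) or at the first check not yet carried out."""
    for key, question in CHECKS:
        if key not in findings:
            return ("investigate", question)
        if findings[key]:
            return ("explanation", question)
    return ("unexplained", None)
```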