Supporting Creativity in Science: Cooperative Knowledge Acquisition & Knowledge Refinement Systems
Derek Sleeman
Department of Computing Science
The University
ABERDEEN AB24 3FX
Tel: +44 (0)1224 272296
Email: [email protected]
WWW: http://www.csd.abdn.ac.uk
Acknowledgements: EPSRC support for the AKT Consortium
Students: Eugenio Alberdi, David Corsar, Andy Aiken, Mark Winter
OVERVIEW of TALK
I: Context: Advanced Knowledge Technologies (AKT) Consortium
II: Co-operative Knowledge Acquisition & Knowledge Refinement Systems.
III: ReTAX system
IV: The REFINER++ System
Questions / Discussion
I: AKT’s CHALLENGES
Knowledge Acquisition
Knowledge Maintenance
Knowledge Publishing
Knowledge Modelling
Knowledge Reuse
Life Cycle, Integration Issues & Testbeds
Knowledge Retrieval
Knowledge-Based systems inevitably require a sizeable amount of domain knowledge. This can be acquired from:
• domain experts (KA)
• detailed examples (using ML techniques) etc
However, for complex tasks these KBs are inevitably
• incomplete, when further Knowledge Acquisition is needed;
• inconsistent, when the KB needs to be refined.
• also it is likely that background knowledge will be incomplete, thus requiring an expert to act as an oracle.
Hence the need for: Co-operative (Problem Solving) Knowledge Acquisition & Knowledge Refinement Systems
II: Co-operative KA & Knowledge Refinement Systems
KRUST (Classical KB; Classification) (Susan Craw)
STALKER (Efficient Truth Maintenance based system; Classification) (Leo Carbonara)
REFINER/Refiner++ / R5 (Case-base; Classification) (Sunil Sharma;Mark Winter; Andy Aiken)
RETAX (Revision of Taxonomies) (Eugenio Alberdi; David Corsar)
CRIMSON (Refinement of Constraints) (Mark Winter)
TIGON Time Series Data/Causal Model (Diagnosis) (Fraser Mitchell)
SALT+ Rules & Constraints; Propose & Revise (Piero Leo)
References: see WWW: http://www.csd.abdn.ac.uk
II: Co-operative KA & Knowledge Refinement Systems
KRUST & STALKER Wine Adviser
REFINER+ Attendance at Medical Clinics & Stock control
CRIMSON/ConRef Stock control
RETAX Botanical Taxonomies
TIGON Turbines (Fault Detection & Diagnosis)
SALT+ Elevators/Lifts
III: RETAX+
The heuristics in RETAX are based on a study to determine how Botanists reacted to a rogue item(s).
There are 2 (principal) rules which determine whether a taxonomy is well formed:
• each child node must be more specialized than its parent
• each of a node’s siblings must be unique.
Retax was used to replicate the revision of a major botanical taxonomy done “manually” in Aberdeen’s Botany dept in the 90s.
References:
Middleton & Wilcox (1990) Edinburgh Journal of Botany {revision of taxonomy for Pernettya / Gaultheria}
Alberdi & Sleeman (1997) AI Journal, p257-279.
Alberdi, Sleeman & Korpi (1999) Cognitive Science Journal
Field         Type           Values
Label         string         ANY
Wheels        integer-range  (2 - 8)
Size          ordered-set 4  (low medium large high)
Motor         ordered-set 2  (yes no)
Engine-Power  integer-range  (0 20)
Parent        string         ANY
Depth         integer-range  (0 3)

Label        Wheels  Size                     Motor     Engine-Power  Parent     Depth
vehicle      2 - 8   (low medium large high)  (yes no)  0 - 20        root       0
train        6 - 8   (medium large)           (yes)     15 - 20       vehicle    1
car          3 - 6   (low medium high)        (yes)     2 - 10        vehicle    1
cycle        2 - 3   (low)                    (yes no)  0 - 3         vehicle    1
lorry        4 - 8   (medium high large)      (yes)     5 - 20        vehicle    1
sports-car   4       (low)                    (yes)     5 - 10        car        2
salon-car    4       (medium)                 (yes)     3 - 5         car        2
bicycle      2       (low)                    (no)      0             cycle      2
motor-cycle  2       (low)                    (yes)     1 - 3         cycle      2
large-lorry  4 - 8   (large)                  (yes)     6 - 20        lorry      2
small-van    4       (medium)                 (yes)     5 - 10        lorry      2
smaller-van  4       (medium)                 (yes)     6             small-van  3
Vehicle
  Train
  Car
    Sports Car
    Salon Car
  Cycle
    Bicycle
    Motorbike
  Lorry
    Large Lorry
    Small Van
      Smaller Van
RETAX+
Let’s refer to a new object/node as N, the existing hierarchy/tree as T, and the potential parent node as P. Then possible operations are:
• Is T well formed? (If not report nodes which violate the rules.)
{E.G., If Sibling nodes N1 & N2 are equal, then merge the 2 nodes.}
• Is N already in T?
• Assuming T is well-formed, to which parent node, P, can N be attached without causing T to be rearranged or N modified? (Answer could be none)
• What changes have to be made to N to make it a “legal” child of node P?
• What changes have to be made to T so that N can be a child of P?
• Combinations of the last 2 operations
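The first operation (is T well formed?) can be sketched as follows. This is a minimal illustration, not the RETAX+ implementation: it assumes that "more specialized" means every attribute of the child lies within the parent's attribute and the two descriptions are not identical, and it uses a subset of the vehicle table. The `Node` class, the helper names, and the `estate-car` and `tricycle` nodes are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

Range = Tuple[int, int]


@dataclass(frozen=True)
class Node:
    wheels: Range
    size: FrozenSet[str]
    motor: FrozenSet[str]
    power: Range
    parent: Optional[str]


def fs(*items: str) -> FrozenSet[str]:
    return frozenset(items)


def within(child: Range, parent: Range) -> bool:
    """True if the child's range lies inside the parent's range."""
    return parent[0] <= child[0] and child[1] <= parent[1]


def description(n: Node):
    return (n.wheels, n.size, n.motor, n.power)


def more_specialised(child: Node, parent: Node) -> bool:
    # Assumed reading of rule 1: contained in the parent, and not equal to it.
    contained = (within(child.wheels, parent.wheels)
                 and child.size <= parent.size
                 and child.motor <= parent.motor
                 and within(child.power, parent.power))
    return contained and description(child) != description(parent)


def violations(tree: Dict[str, Node]) -> List[str]:
    problems = []
    # Rule 1: each child must be more specialised than its parent.
    for name, node in tree.items():
        if node.parent is not None and not more_specialised(node, tree[node.parent]):
            problems.append(f"{name} is not more specialised than {node.parent}")
    # Rule 2: siblings must be unique.
    names = sorted(tree)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            same_parent = tree[a].parent is not None and tree[a].parent == tree[b].parent
            if same_parent and description(tree[a]) == description(tree[b]):
                problems.append(f"siblings {a} and {b} are identical")
    return problems


T = {
    "vehicle": Node((2, 8), fs("low", "medium", "large", "high"), fs("yes", "no"), (0, 20), None),
    "car": Node((3, 6), fs("low", "medium", "high"), fs("yes"), (2, 10), "vehicle"),
    "cycle": Node((2, 3), fs("low"), fs("yes", "no"), (0, 3), "vehicle"),
    "sports-car": Node((4, 4), fs("low"), fs("yes"), (5, 10), "car"),
    "salon-car": Node((4, 4), fs("medium"), fs("yes"), (3, 5), "car"),
    "bicycle": Node((2, 2), fs("low"), fs("no"), (0, 0), "cycle"),
    "motor-cycle": Node((2, 2), fs("low"), fs("yes"), (1, 3), "cycle"),
}
well_formed_report = violations(T)

# Hypothetical rogue nodes: a duplicate of salon-car and a motorless child of car.
broken = dict(T)
broken["estate-car"] = Node((4, 4), fs("medium"), fs("yes"), (3, 5), "car")
broken["tricycle"] = Node((3, 3), fs("low"), fs("no"), (0, 0), "car")
broken_report = violations(broken)
```

On the subset above the report is empty; the two rogue nodes each trigger one of the two rules, which is the point at which a merge or a revision of N or T would be proposed.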
ReTAX
Ericaceae
  Arctostaphylos
    A. uva-ursi
  Arbutus
    A. unedo
  Pernettya
    P. tasmanica
  Leucothoe
  Gaultheria
    G. oppositifolia
    G. rupestris
    G. antipoda
  Agauria
  Andromeda
    A. polifolia
ReTAX
- Historical: In Bentham & Hooker’s (1876*) classification the main differences detected between the Pernettya & Gaultheria genera were the type of fruit and the succulence of the calyx.
*G Bentham & JD Hooker (1876). Genera Plantarum, Vol II, Part2. (Publ: Reeves & Co, London)
- Subsequent botanical investigations in the 20th Century challenged this analysis, but did not suggest any further distinguishing features for the 2 genera; hence the 2 genera were combined, (Middleton & Wilcox, 1990).
ReTAX
Simulation (Simplified)
- The descriptions of several species of the Pernettya & Gaultheria genera were replaced by others with revised features (descriptors), which affect the definitions of the parent nodes (P + G)
- When the parent nodes (Pernettya & Gaultheria) are found to be the same, the system checks a set of other features (a further facility of ReTAX) to see if they are distinctive; when no differences are found, the 2 nodes (P + G) are collapsed
RETAX+: Current / Future activities
• Use with other experts to help them formulate / refine taxonomies (eg other aspects of botany, microbiology)
• Use RETAX+, or a variant, to formulate / refine ontologies (eg medical terminologies). This has resulted in the Protégé RepairTAB, which detects inconsistencies in OWL ontologies & gives advice about removing them. (Lam, Sleeman, Pan, & Vasconcelos (2008) Journal of Data Semantics)
IV: REFINER++ System
• The Refiner++ algorithm
Sample dataset
• Interaction with experts
• Current / future work
The Sample Dataset
Case  Age  DBP  Associated Disease  Category
1     50   90   D1                  A
2     56   90   D2                  A
3     52   101  D3                  A
4     50   95   D3                  B
5     56   97   D3                  B
6     -    89   D5                  A
7     52   97   D3                  A
The Refiner++ Algorithm
• Each case is assigned to a category
• Category descriptions are inferred from the case values
• When a case matches a category to which the expert did not assign it, this is an inconsistency
• While inconsistencies exist…
  - A selection of disambiguation strategies is suggested
  - The user chooses a strategy to be performed
  - The list of inconsistencies is re-evaluated
• The refined dataset is now consistent
Generating Descriptions
Generalise each field
• Numeric: range from lowest to highest
• String: set of all unique items
• Taxon: nearest common parent
• Boolean: set of all unique items from the set {‘true’, ‘false’, ‘any’}
Combine to get category description
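The four generalisation rules above can be sketched as small functions. This is an illustrative sketch rather than the Refiner++ code: the function names are hypothetical, missing numeric values are assumed to be skipped, and the taxon example borrows part of the vehicle hierarchy from the RETAX+ slides as a stand-in taxonomy.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def generalise_numeric(values: Iterable[Optional[float]]) -> Tuple[float, float]:
    """Range from lowest to highest; missing values (None) are skipped (assumption)."""
    present = [v for v in values if v is not None]
    return (min(present), max(present))


def generalise_string(values: Iterable[str]) -> Set[str]:
    """Set of all unique items."""
    return set(values)


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """The node itself followed by its ancestors, nearest first."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def generalise_taxon(values: Iterable[str], parent: Dict[str, str]) -> str:
    """Nearest node that is an ancestor of (or equal to) every value."""
    values = list(values)
    common = ancestors(values[0], parent)
    for v in values[1:]:
        others = set(ancestors(v, parent))
        common = [a for a in common if a in others]
    return common[0]


def generalise_boolean(values: Iterable[str]) -> Set[str]:
    """Set of unique items drawn from {'true', 'false', 'any'}."""
    allowed = {"true", "false", "any"}
    items = set(values)
    if not items <= allowed:
        raise ValueError(f"unexpected boolean values: {items - allowed}")
    return items


# Child -> parent links for part of the vehicle hierarchy (used only as an example).
PARENT = {
    "car": "vehicle", "cycle": "vehicle",
    "sports-car": "car", "salon-car": "car",
    "bicycle": "cycle", "motor-cycle": "cycle",
}

age_range = generalise_numeric([50, 56, 52, None, 52])
diseases = generalise_string(["D1", "D2", "D3", "D5", "D3"])
taxon_same_branch = generalise_taxon(["sports-car", "salon-car"], PARENT)
taxon_cross_branch = generalise_taxon(["sports-car", "bicycle"], PARENT)
flags = generalise_boolean(["true", "any", "true"])
```

A category description is then just the collection of these per-field generalisations, one per field.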
Category Descriptions
Category Age DBP Disease
A 50 – 56 89 – 101 All
B 50 – 56 95 – 97 D3
There are inconsistencies:
Cases 4 and 5 match A
Case 7 matches B
We need to remove the overlap
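The descriptions and inconsistencies above can be reproduced from the sample dataset with a short sketch. It is not the Refiner++ implementation: the function names are hypothetical, and a missing value (case 6's Age) is assumed to match any range.

```python
from typing import Dict, List, Optional, Tuple

# case id -> (Age, DBP, Associated Disease, Category); None marks a missing value.
Case = Tuple[Optional[int], Optional[int], str, str]
CASES: Dict[int, Case] = {
    1: (50, 90, "D1", "A"),
    2: (56, 90, "D2", "A"),
    3: (52, 101, "D3", "A"),
    4: (50, 95, "D3", "B"),
    5: (56, 97, "D3", "B"),
    6: (None, 89, "D5", "A"),
    7: (52, 97, "D3", "A"),
}


def describe(cases: Dict[int, Case]) -> Dict[str, dict]:
    """Infer one description per category from the cases assigned to it."""
    by_category: Dict[str, List[Case]] = {}
    for case in cases.values():
        by_category.setdefault(case[3], []).append(case)
    descriptions = {}
    for category, rows in by_category.items():
        ages = [r[0] for r in rows if r[0] is not None]
        dbps = [r[1] for r in rows if r[1] is not None]
        descriptions[category] = {
            "age": (min(ages), max(ages)),
            "dbp": (min(dbps), max(dbps)),
            "disease": {r[2] for r in rows},
        }
    return descriptions


def in_range(value: Optional[int], bounds: Tuple[int, int]) -> bool:
    # Assumption: a missing value is compatible with any range.
    return value is None or bounds[0] <= value <= bounds[1]


def matches(case: Case, d: dict) -> bool:
    age, dbp, disease, _ = case
    return in_range(age, d["age"]) and in_range(dbp, d["dbp"]) and disease in d["disease"]


def inconsistencies(cases: Dict[int, Case], descriptions: Dict[str, dict]) -> List[Tuple[int, str]]:
    """(case id, category) pairs where a case matches a category it was not assigned to."""
    return [
        (case_id, category)
        for case_id, case in sorted(cases.items())
        for category in sorted(descriptions)
        if category != case[3] and matches(case, descriptions[category])
    ]


descriptions = describe(CASES)
found = inconsistencies(CASES, descriptions)
```

Here `found` lists cases 4 and 5 against category A and case 7 against category B, which is the overlap the disambiguation strategies then have to remove.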
Disambiguation Strategies
• Change values for certain cases
• Remove values from a category (eg, create a disjunction)
• Reclassify a case
• Make a case match an additional category
• Shelve a problem case
• Add a new field
Refiner++
[Diagram: categories C1, C2 and C3]
Strategies for this problem
• Change value of DBP in case 7 to 90
• Change value of DBP in case 5 to 95
• Reclassify case 7 to category B
• Add case 7 to category B
• Shelve case 7
• Change value of Disease in cases 3 and 7 to D3
• Reclassify cases 4 and 5 to category A
• Add cases 4 and 5 to category A
• Shelve cases 4 and 5
• Add a new field
Strategy Ordering
Typically, many strategies are suggested
We need heuristics to order them
• Ordered by number of times suggested; prefer strategies which are suggested many times
• Ordered by number of cases affected; prefer strategies which affect fewer cases
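One way to combine the two heuristics is a single sort key, sketched below. The combination (suggestion count first, cases affected as the tie-breaker) is an assumption, since the slides list the heuristics separately; the `Strategy` type is hypothetical and the `times_suggested` counts are made up for illustration.

```python
from typing import List, NamedTuple


class Strategy(NamedTuple):
    description: str
    times_suggested: int  # illustrative counts, not taken from the talk
    cases_affected: int


def order_strategies(strategies: List[Strategy]) -> List[Strategy]:
    """Most frequently suggested first; among equals, fewest cases affected first."""
    return sorted(strategies, key=lambda s: (-s.times_suggested, s.cases_affected))


suggested = [
    Strategy("Reclassify cases 4 and 5 to category A", 2, 2),
    Strategy("Change value of DBP in case 7 to 90", 1, 1),
    Strategy("Reclassify case 7 to category B", 2, 1),
    Strategy("Change value of Disease in cases 3 and 7 to D3", 1, 2),
]
ranked = [s.description for s in order_strategies(suggested)]
```

Because Python's `sorted` is stable, strategies that tie on both criteria keep the order in which they were suggested.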
The Refiner++ Main Screen
Scalability
Measured the time taken to perform validation on randomly-generated datasets with varying numbers of cases, fields and categories
For most datasets, time taken is under 1 second
Use of REFINER++ by Experts*
Refiner++ has been used with various experts including:
• Pain Control Expert (Anaesthesiology)
• Child psychologist
• High Dependency Unit (HDU) Physician
* KCAP-2003 paper (Aiken & Sleeman)
Pain Control
• Pre-existing Access dataset on epidural patients
• Many cases, lots of fields / descriptors
• Refiner++ imported the data (almost) perfectly
• Expert categorised cases based on the length of the epidural (in days)
• REFINER++ took only a few seconds to create category descriptions and validate
But…
Pain Control
• Hundreds of inconsistencies found
• Hundreds of strategies suggested
Almost all of which were ‘change value’
• Why did it not work better?
Subjective nature of the subject domain.
Categories were contiguous
Child Psychology
The session was a series of anecdotes and outlines of specific cases
Three types of cases were identified:
• Severely autistic
• Mildly autistic
• Difficulties with language development
Child Psychology
The expert stated that autistic children usually had the following characteristics:
• Problems with language and verbal communication
• Problems with social interaction
• Obsessive behaviour
These characteristics were abstracted by the knowledge engineers and subsequently confirmed with the expert
The expert showed no inclination to use REFINER++, but a case set was created by the knowledge engineers
HDU
• Task posed by domain expert: when to move high dependency unit (HDU) patients to a general ward or the intensive care unit (ICU), or leave them in the HDU.
• Used Refiner++ with three datasets, one for each condition (cardiac, neuro & respiratory)
• Expert did not use the system but did dictate the descriptors & the sets of cases to the knowledge engineers, who typed this information into REFINER++.
• Refiner++ found 2 categories were consistent; & in the third identified inconsistencies
Inconsistent Dataset
Case  HR   RR  AVPU  Sat O2  Cat.
1     105  27  1     94      Higher
2     120  35  2     88      Higher
3     140  45  3     80      Higher
4     105  28  1     94      Same
5     90   22  1     95      Same
6     80   18  1     96      Lower
7     70   15  1     98      Lower
Category Descriptions
Category  HR       RR     AVPU  Sat O2
higher    105-140  27-45  1-3   80-94
same      90-105   22-38  1     94-95
lower     70-80    15-18  1     96-98
• There are inconsistencies: Case 1 matches Category SAME; Case 4 matches Category HIGHER
• We need to remove the overlap
• Refiner++ suggested lower and upper ‘danger zones’ for each field
Future Work: Use with Domain Experts
• Make the system’s GUI more intuitive (some changes already made)
• Ask expert to come along to the session with a document which summarizes the main features of the dataset they wish to discuss. (In session ask them to highlight principal concepts)
• For each domain expert contacted, record an AVI session of a simple but related domain (eg simple childhood diseases before approaching a paediatrician) (demo)
Current Work (ICU domain)
• Developed system which is statistically based, so given a case description it returns the likelihood of that case belonging to one of the predefined categories (R5: Andy Aiken)
• Acquired data set of patients’ physiological parameters from an ICU DB, and have clinicians assign patients on a day-by-day & hour-by-hour basis to a 5-point severity score. (Developed in conjunction with Glasgow Royal Infirmary)
• Using R5 with the above data set to assign new patient reports to a severity class. (Practically important as the descriptors include clinical interventions which “standard” scales don’t.)
• Identify & analyse (explain) anomalous / unusual cases (segments of cases)
VI: Dimensional Analysis ??
• Outline issue
• Pointer to TR
• Pointer to WWW systems / sources
Questions/Comments
V: (Causal) Explanations for Anomalous Medical cases
• Discuss ICU context
• Experiment to detect Anomalous cases / sections of cases
• Outline a typical investigation
V: Seeking to Explain an anomalous Observation
EXPECTED: An injection of X will cause the heart (Organ, O) to increase its contraction rate within T seconds.
SUPPOSE that does not happen, then here are some of the investigations which might be performed:
a) Is the injection being given effectively?
b) IF so, then check whether the drug X is being transported to Organ O:
   i) Is the transport path physically / bio-chemically blocked?
   ii) Is the transport mechanism inhibited / slowed down?
c) IF the drug is actually arriving at Organ O & the concentration is OK, then investigate:
   i) Is the drug mechanism within the organ being blocked?
   ii) Is the organ for some reason unable to respond in the usual way (eg weakened heart muscle)?
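The staged investigation above can be read as a small decision procedure. The sketch below is only an illustration of that reading: the function name and its two parameters are hypothetical, and `None` is used to mean "not yet checked".

```python
from typing import List, Optional


def next_investigations(injection_effective: Optional[bool] = None,
                        drug_arrives_ok: Optional[bool] = None) -> List[str]:
    """Return the questions to investigate next, given what is known so far."""
    if injection_effective is None:
        return ["Is the injection being given effectively?"]
    if not injection_effective:
        # The anomaly is accounted for at step (a); nothing further to ask here.
        return []
    if drug_arrives_ok is None:
        return ["Is drug X being transported to organ O at an adequate concentration?"]
    if not drug_arrives_ok:
        # Step (b): the fault lies somewhere along the transport route.
        return [
            "Is the transport path physically / bio-chemically blocked?",
            "Is the transport mechanism inhibited / slowed down?",
        ]
    # Step (c): the drug arrives, so look inside the organ.
    return [
        "Is the drug mechanism within the organ being blocked?",
        "Is the organ unable to respond in the usual way?",
    ]


first_step = next_investigations()
transport_step = next_investigations(True, False)
organ_step = next_investigations(True, True)
```

Each call narrows the search to the next stage of the causal chain, which mirrors the order in which the checks are listed.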