Supporting Creativity in Science: Cooperative Knowledge Acquisition & Knowledge Refinement Systems

Derek Sleeman

Department of Computing Science

The University

ABERDEEN AB24 3FX

Tel: +44 (0)1224 272296

Email: [email protected]

WWW: http://www.csd.abdn.ac.uk

Acknowledgements: EPSRC support for the AKT Consortium

Students: Eugenio Alberdi, David Corsar, Andy Aiken, Mark Winter

OVERVIEW of TALK

I: Context: Advanced Knowledge Technologies (AKT) Consortium

II: Co-operative Knowledge Acquisition & Knowledge Refinement Systems

III: ReTAX system

IV: The REFINER++ System

Questions / Discussion

I: AKT’s CHALLENGES

Knowledge Acquisition

Knowledge Maintenance

Knowledge Publishing

Knowledge Modelling

Knowledge Reuse

Life Cycle, Integration Issues & Testbeds

Knowledge Retrieval

Knowledge-Based systems inevitably require a sizeable amount of domain knowledge. This can be acquired from:

• domain experts (KA)

• detailed examples (using ML techniques) etc

However, for complex tasks these KBs are inevitably

• incomplete, when further Knowledge Acquisition is needed;

• inconsistent, when the KB needs to be refined;

• also it is likely that background knowledge will be incomplete, thus requiring an expert to act as an oracle.

Hence the need for: Co-operative (Problem Solving) Knowledge Acquisition & Knowledge Refinement Systems

II: Co-operative KA & Knowledge Refinement Systems

KRUST (Classical KB; Classification) (Susan Craw)

STALKER (Efficient Truth Maintenance based system; Classification) (Leo Carbonara)

REFINER / Refiner++ / R5 (Case-base; Classification) (Sunil Sharma; Mark Winter; Andy Aiken)

RETAX (Revision of Taxonomies) (Eugenio Alberdi; David Corsar)

CRIMSON (Refinement of Constraints) (Mark Winter)

TIGON (Time Series Data/Causal Model; Diagnosis) (Fraser Mitchell)

SALT+ (Rules & Constraints; Propose & Revise) (Piero Leo)

References see - WWW: http://www.csd.abdn.ac.uk

II: Co-operative KA & Knowledge Refinement Systems

KRUST & STALKER    Wine Adviser
REFINER+           Attendance at Medical Clinics & Stock control
CRIMSON/ConRef     Stock control
RETAX              Botanical Taxonomies
TIGON              Turbines (Fault Detection & Diagnosis)
SALT+              Elevators/Lifts


III: RETAX+

The heuristics in RETAX are based on a study of how botanists reacted to one or more rogue items.

There are 2 (principal) rules which determine whether a taxonomy is well formed:

• each child node must be more specialized than its parent

• each of a node’s siblings must be unique.

Retax was used to replicate the revision of a major botanical taxonomy done “manually” in Aberdeen’s Botany dept in the 90s.

References:

Middleton & Wilcox (1990) Edinburgh Journal of Botany {revision of taxonomy for Pernettya / Gaultheria}

Alberdi & Sleeman (1997) AI Journal, p257-279.

Alberdi, Sleeman & Korpi (1999) Cognitive Science Journal

Label        Wheels         Size                      Motor          Engine-Power   Parent     Depth
string       integer-range  ordered-set 4             ordered-set 2  integer-range  string     integer-range
ANY          (2 - 8)        (low medium large high)   (yes no)       (0 20)         ANY        (0 3)

vehicle      2 - 8          (low medium large high)   (yes no)       0 - 20         root       0
train        6 - 8          (medium large)            (yes)          15 - 20        vehicle    1
car          3 - 6          (low medium high)
cycle        2 - 3          (low)                     (yes no)       0 - 3          vehicle    1
lorry        4 - 8          (medium high large)
sports-car   4              (low)                     (yes)          5 - 10         car        2
salon-car    4              (medium)                  (yes)          3 - 5          car        2
bicycle      2              (low)                     (no)           0              cycle      2
motor-cycle  2              (low)                     (yes)          1 - 3          cycle      2
large-lorry  4 - 8          (large)                   (yes)          6 - 20         lorry      2
small-van    4              (medium)                  (yes)          5 - 10         lorry      2
smaller-van  4              (medium)                  (yes)          6              small-van  3

Vehicle
    Train
    Car
        Sports Car
        Salon Car
    Cycle
        Bicycle
        Motorbike
    Lorry
        Large Lorry
        Small Van
            Smaller Van

RETAX+

Let’s refer to a new object/node as N, the existing hierarchy/tree as T, and the potential parent node as P. Then possible operations are:

• Is T well formed? (If not report nodes which violate the rules.)

{E.G., If Sibling nodes N1 & N2 are equal, then merge the 2 nodes.}

• Is N already in T?

• Assuming T is well-formed, to which parent node, P, can N be attached without causing T to be rearranged or N modified? (Answer could be none)

• What changes have to be made to N to make it a “legal” child of node P?

• What changes have to be made to T so that N can be a child of P?

• Combinations of the last 2 operations
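
The first and third operations above can be sketched in Python on a fragment of the vehicle table. This is only an illustrative sketch, not the RETAX implementation: the dictionary layout and the helper names (`subsumes`, `is_more_specialized`, `violations`, `candidate_parents`) are assumptions, integer ranges are modelled as (low, high) tuples, and ordered sets are modelled as plain Python sets.

```python
# Hypothetical encoding of part of the vehicle taxonomy: each node maps a
# feature name to an integer range (lo, hi) or to a set of allowed values.
TREE = {
    "vehicle": {"parent": None, "wheels": (2, 8),
                "size": {"low", "medium", "large", "high"},
                "motor": {"yes", "no"}, "power": (0, 20)},
    "train": {"parent": "vehicle", "wheels": (6, 8), "size": {"medium", "large"},
              "motor": {"yes"}, "power": (15, 20)},
    "cycle": {"parent": "vehicle", "wheels": (2, 3), "size": {"low"},
              "motor": {"yes", "no"}, "power": (0, 3)},
    "bicycle": {"parent": "cycle", "wheels": (2, 2), "size": {"low"},
                "motor": {"no"}, "power": (0, 0)},
    "motor-cycle": {"parent": "cycle", "wheels": (2, 2), "size": {"low"},
                    "motor": {"yes"}, "power": (1, 3)},
}
FEATURES = ("wheels", "size", "motor", "power")


def subsumes(general, specific):
    """True if the value `specific` lies within the value `general`."""
    if isinstance(general, tuple):
        return general[0] <= specific[0] and specific[1] <= general[1]
    return specific <= general  # subset test for sets


def is_more_specialized(child, parent):
    """Rule 1: child lies within parent on every feature, strictly on one."""
    within = all(subsumes(parent[f], child[f]) for f in FEATURES)
    return within and any(child[f] != parent[f] for f in FEATURES)


def violations(tree):
    """Return the problems that stop `tree` from being well formed."""
    problems = []
    for name, node in tree.items():
        parent = node["parent"]
        if parent is not None and not is_more_specialized(node, tree[parent]):
            problems.append((name, "not more specialized than " + parent))
    names = list(tree)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            same_parent = tree[a]["parent"] == tree[b]["parent"]
            if same_parent and all(tree[a][f] == tree[b][f] for f in FEATURES):
                problems.append((a, "identical to sibling " + b))  # rule 2
    return problems


def candidate_parents(tree, new_node):
    """Most specific nodes under which new_node fits with no change to T or N."""
    fits = [n for n, node in tree.items() if is_more_specialized(new_node, node)]
    # Discard a node when one of its own children also fits.
    return [n for n in fits if not any(tree[m]["parent"] == n for m in fits)]


# A hypothetical new object N, not taken from the slides.
tricycle = {"wheels": (3, 3), "size": {"low"}, "motor": {"no"}, "power": (0, 0)}
print(violations(TREE))                   # []
print(candidate_parents(TREE, tricycle))  # ['cycle']
```

An empty result from `candidate_parents` corresponds to the "none" answer mentioned above, which is where the remaining operations (modifying N, T, or both) would come into play.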

ReTAX

Ericaceae
    Arctostaphylos: A. uva-ursi
    Arbutus: A. unedo
    Pernettya: P. tasmanica
    Leucothoe
    Gaultheria: G. oppositifolia, G. rupestris, G. antipoda
    Agauria
    Andromeda: A. polifolia

ReTAX

- Historical: In Bentham & Hooker’s (1876*) classification the main differences detected between the Pernettya & Gaultheria genera were the type of fruit and the succulence of the calyx.

*G Bentham & JD Hooker (1876). Genera Plantarum, Vol II, Part 2. (Publ: Reeves & Co, London)

- Subsequent botanical investigations in the 20th Century challenged this analysis, but did not suggest any further distinguishing features for the 2 genera; hence the 2 genera were combined (Middleton & Wilcox, 1990).

ReTAX

Simulation (Simplified)

- The descriptions of several species of the Pernettya & Gaultheria genera were replaced by others with revised features (descriptors), which affect the definitions of the parent nodes (P + G)

- When the parent nodes (Pernettya & Gaultheria) are found to be the same, the system checks a set of other features (a further facility of ReTAX) to see if they are distinctive; when no differences are found, the 2 nodes (P + G) are collapsed
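
The collapse step of the simplified simulation can be sketched as follows. The function name `should_collapse` and all feature values are invented placeholders, not the real botanical descriptors. The sketch only shows the order of the checks: compare the two parent nodes on their defining features first, then on a further set of features, and collapse them only if no difference is found.

```python
def should_collapse(node_a, node_b, defining, extra):
    """Return True when no defining or extra feature separates the two nodes."""
    if any(node_a[f] != node_b[f] for f in defining):
        return False  # the nodes are still distinguishable, so keep both
    # Defining features coincide: consult the further set of features.
    return all(node_a[f] == node_b[f] for f in extra)


# Placeholder descriptors (the values "v1".."v3" are invented for this sketch).
pernettya = {"fruit": "v1", "calyx": "v2", "other_feature": "v3"}
gaultheria = {"fruit": "v1", "calyx": "v2", "other_feature": "v3"}

collapse = should_collapse(pernettya, gaultheria,
                           defining=("fruit", "calyx"), extra=("other_feature",))
print("collapse P + G" if collapse else "keep P and G separate")
```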

RETAX+: Current / Future activities

• Use with other experts to help them formulate / refine taxonomies (eg other aspects of botany, microbiology)

• Use RETAX+, or a variant, to formulate / refine ontologies (eg medical terminologies). This has resulted in the Protégé RepairTAB, which detects inconsistencies in OWL ontologies & gives advice about removing them. (Lam, Sleeman, Pan, & Vasconcelos (2008) Journal of Data Semantics)

IV: REFINER++ System

• The Refiner++ algorithm

Sample dataset

• Interaction with experts

• Current / future work

The Sample Dataset

Case  Age  DBP  Associated Disease  Category
1     50   90   D1                  A
2     56   90   D2                  A
3     52   101  D3                  A
4     50   95   D3                  B
5     56   97   D3                  B
6     -    89   D5                  A
7     52   97   D3                  A

The Refiner++ Algorithm

• Each case is assigned to a category

• Category descriptions are inferred from the case values

• When a case matches a category to which the expert did not assign it, this is an inconsistency

• While inconsistencies exist…

    - A selection of disambiguation strategies is suggested

    - The user chooses a strategy to be performed

    - The list of inconsistencies is re-evaluated

• The refined dataset is now consistent

Generating Descriptions

Generalise each field

• Numeric: range from lowest to highest

• String: set of all unique items

• Taxon: nearest common parent

• Boolean: set of all unique items from the set {‘true’, ‘false’, ‘any’}

Combine to get category description
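
As a minimal sketch of the numeric and string generalisation rules, the following Python applies them to the sample dataset. The names `generalise` and `describe` are hypothetical, and skipping missing values (the Age of case 6) is an assumption. Taxon and Boolean fields are not covered.

```python
# Sample dataset from the talk; None marks the missing Age of case 6.
CASES = {
    1: {"age": 50, "dbp": 90, "disease": "D1", "category": "A"},
    2: {"age": 56, "dbp": 90, "disease": "D2", "category": "A"},
    3: {"age": 52, "dbp": 101, "disease": "D3", "category": "A"},
    4: {"age": 50, "dbp": 95, "disease": "D3", "category": "B"},
    5: {"age": 56, "dbp": 97, "disease": "D3", "category": "B"},
    6: {"age": None, "dbp": 89, "disease": "D5", "category": "A"},
    7: {"age": 52, "dbp": 97, "disease": "D3", "category": "A"},
}
FIELDS = ("age", "dbp", "disease")


def generalise(values):
    """Numeric field: (lowest, highest) range; string field: set of unique items."""
    present = [v for v in values if v is not None]  # assumption: skip missing values
    if all(isinstance(v, (int, float)) for v in present):
        return (min(present), max(present))
    return set(present)


def describe(cases):
    """Group the cases by category and generalise each field within a group."""
    by_category = {}
    for case in cases.values():
        by_category.setdefault(case["category"], []).append(case)
    return {
        category: {f: generalise([c[f] for c in members]) for f in FIELDS}
        for category, members in by_category.items()
    }


descriptions = describe(CASES)
print(descriptions["A"])
print(descriptions["B"])
```

Under these assumptions category A comes out as Age 50 to 56, DBP 89 to 101 and every disease seen in the dataset, and category B as Age 50 to 56, DBP 95 to 97 and D3 only.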

Category Descriptions

Category  Age      DBP       Disease
A         50 – 56  89 – 101  All
B         50 – 56  95 – 97   D3

There are inconsistencies:

Cases 4 and 5 match A

Case 7 matches B

We need to remove the overlap
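
The inconsistency check can be sketched in Python as a test of every case against the descriptions of the categories it was not assigned to. The names `matches` and `inconsistencies` are hypothetical, "All" is represented by the set of diseases seen in the data, and treating a missing value as compatible with any range is an assumption.

```python
# Category descriptions and cases as given in the talk.
DESCRIPTIONS = {
    "A": {"age": (50, 56), "dbp": (89, 101), "disease": {"D1", "D2", "D3", "D5"}},
    "B": {"age": (50, 56), "dbp": (95, 97), "disease": {"D3"}},
}
CASES = {
    1: {"age": 50, "dbp": 90, "disease": "D1", "category": "A"},
    2: {"age": 56, "dbp": 90, "disease": "D2", "category": "A"},
    3: {"age": 52, "dbp": 101, "disease": "D3", "category": "A"},
    4: {"age": 50, "dbp": 95, "disease": "D3", "category": "B"},
    5: {"age": 56, "dbp": 97, "disease": "D3", "category": "B"},
    6: {"age": None, "dbp": 89, "disease": "D5", "category": "A"},
    7: {"age": 52, "dbp": 97, "disease": "D3", "category": "A"},
}


def matches(case, description):
    """True if every field of the case falls inside the category description."""
    for field, allowed in description.items():
        value = case[field]
        if value is None:
            continue  # assumption: a missing value does not rule out a match
        if isinstance(allowed, tuple):
            if not allowed[0] <= value <= allowed[1]:
                return False
        elif value not in allowed:
            return False
    return True


def inconsistencies(cases, descriptions):
    """List (case id, category) pairs where a case matches a foreign category."""
    return [
        (case_id, category)
        for case_id, case in cases.items()
        for category, description in descriptions.items()
        if category != case["category"] and matches(case, description)
    ]


print(inconsistencies(CASES, DESCRIPTIONS))  # [(4, 'A'), (5, 'A'), (7, 'B')]
```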

Disambiguation Strategies

• Change values for certain cases

• Remove values from a category (eg, create a disjunction)

• Reclassify a case

• Make a case match an additional category

• Shelve a problem case

• Add a new field

Refiner++

[Figure: diagram of categories C1, C2 and C3]

Strategies for this problem

• Change value of DBP in case 7 to 90

• Change value of DBP in case 5 to 95

• Reclassify case 7 to category B

• Add case 7 to category B

• Shelve case 7

• Change value of Disease in cases 3 and 7 to D3

• Reclassify cases 4 and 5 to category A

• Add cases 4 and 5 to category A

• Shelve cases 4 and 5

• Add a new field

Strategy Ordering

Typically, many strategies are suggested

We need heuristics to order them

• Ordered by number of times suggested; prefer strategies which are suggested many times

• Ordered by number of cases affected; prefer strategies which affect fewer cases
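
A possible way to apply these two heuristics is a single sort, sketched below in Python. Combining them into one sort key (suggestion count first, cases affected as tie-breaker) is an assumption, since the talk lists them separately. The `Strategy` class and the counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    description: str
    times_suggested: int  # how often the strategy was proposed
    cases_affected: int   # how many cases it would modify


# Illustrative values only: the counts are invented for this sketch.
strategies = [
    Strategy("Add a new field", 3, 7),
    Strategy("Reclassify case 7 to category B", 1, 1),
    Strategy("Change value of Disease in cases 3 and 7 to D3", 1, 2),
    Strategy("Shelve cases 4 and 5", 2, 2),
]


def order(candidates):
    """Most frequently suggested first; fewer affected cases breaks ties."""
    return sorted(candidates, key=lambda s: (-s.times_suggested, s.cases_affected))


for strategy in order(strategies):
    print(strategy.times_suggested, strategy.cases_affected, strategy.description)
```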

The Refiner++ Main Screen

Scalability

Measured the time taken to perform validation on randomly-generated datasets with varying numbers of cases, fields and categories

For most datasets, time taken is under 1 second

Use of REFINER++ by Experts*

Refiner++ has been used with various experts including:

• Pain Control Expert (Anaesthesiology)

• Child psychologist

• High Dependency Unit (HDU) Physician

* KCAP-2003 paper (Aiken & Sleeman)

Pain Control

• Pre-existing Access dataset on epidural patients

• Many cases, lots of fields / descriptors

• Refiner++ imported the data (almost) perfectly

• Expert categorised cases based on the length of the epidural (in days)

• REFINER++ took only a few seconds to create category descriptions and validate

But…

Pain Control

• Hundreds of inconsistencies found

• Hundreds of strategies suggested

Almost all of which were ‘change value’

• Why did it not work better?

Subjective nature of the subject domain.

Categories were contiguous

Child Psychology

The session was a series of anecdotes and outlines of specific cases

Three types of cases were identified:

• Severely autistic

• Mildly autistic

• Difficulties with language development

Child Psychology

The expert stated that autistic children usually had the following characteristics:

• Problems with language and verbal communication

• Problems with social interaction

• Obsessive behaviour

These characteristics were abstracted by the knowledge engineers and subsequently confirmed with the expert

The expert showed no inclination to use REFINER++, but a case set was created by the knowledge engineers

HDU

• Task posed by domain expert: when to move high dependency unit (HDU) patients to a general ward, or the intensive care unit (ICU), or leave them in the HDU.

• Used Refiner++ with three datasets, one for each condition (cardiac, neuro & respiratory)

• Expert did not use the system but did dictate the descriptors & the sets of cases to the knowledge engineers, who typed this information into REFINER.

• Refiner++ found that 2 of the datasets were consistent, and identified inconsistencies in the third

Inconsistent Dataset

Case  HR   RR  AVPU  Sat O2  Cat.
1     105  27  1     94      Higher
2     120  35  2     88      Higher
3     140  45  3     80      Higher
4     105  28  1     94      Same
5     90   22  1     95      Same
6     80   18  1     96      Lower
7     70   15  1     98      Lower

Category Descriptions

Category  HR       RR     AVPU  Sat O2
higher    105-140  27-45  1-3   80-94
same      90-105   22-28  1     94-95
lower     70-80    15-18  1     96-98

• There are inconsistencies:

    Case 1 matches Category SAME

    Case 4 matches Category HIGHER

• We need to remove the overlap

• Refiner++ suggested lower and upper ‘danger zones’ for each field

Future Work: Use with Domain Experts

• Make the system’s GUI more intuitive (some changes already made)

• Ask expert to come along to the session with a document which summarizes the main features of the dataset they wish to discuss. (In session ask them to highlight principal concepts)

• For each domain expert contacted, record an AVI session of a simple but related domain (eg simple childhood diseases before approaching a paediatrician) (demo)

Current Work (ICU domain)

• Developed system which is statistically based, so given a case description it returns the likelihood of that case belonging to one of the predefined categories (R5: Andy Aiken)

• Acquired a data set of patients’ physiological parameters from an ICU DB, and had clinicians assign patients, on a day-by-day & hour-by-hour basis, to a 5-point severity score. (Developed in conjunction with Glasgow Royal Infirmary)

• Using R5 with the above data set to assign new patient reports to a severity class. (Practically important as the descriptors include clinical interventions which “standard” scales don’t.)

• Identify & analyse (explain) anomalous / unusual cases (segments of cases)

VI: Dimensional Analysis ??

• Outline issue

• Pointer to TR

• Pointer to WWW systems / sources

Questions/Comments

V: (Causal) Explanations for Anomalous Medical cases

• Discuss ICU context

• Experiment to detect anomalous cases / sections of cases

• Outline a typical investigation

V: Seeking to Explain an anomalous Observation

EXPECTED: An injection of X will cause the heart (Organ, O) to increase its contraction rate within T seconds.

SUPPOSE that does not happen; then here are some of the investigations which might be performed:

a) Is the injection being given effectively?

b) IF so, then check whether the drug X is being transported to Organ O:

    a) Is the transport path physically / bio-chemically blocked?

    b) Is the transport mechanism inhibited / slowed down?

c) IF the drug is actually arriving at Organ O & the conc is OK, then investigate:

    a) Is the drug mechanism within the organ being blocked?

    b) Is the organ for some reason unable to respond in the usual way (eg weakened heart muscle)?
