slides for knowledge extraction presentation

Evaluating Challenge Levels of Medical Images by Measuring Expressed Domain Concepts Xuan Guo 1

Upload: xuan-guo

Post on 09-Aug-2015


TRANSCRIPT

Evaluating Challenge Levels of Medical Images by Measuring Expressed Domain Concepts

Xuan Guo

1

Challenge Levels of Medical Image Inspection

• Perceptual inspection – involves visual feature saliency, experts’ inspection strategies, their habits, etc.

• Conceptual reasoning – requires domain knowledge, clinical experience, etc.

• The challenge level of an image is what physicians perceive, and it can be measured by their performance.

2

Why is it important to study challenge levels of medical image inspections?

• Help organize medical image cases for physicians according to their challenge levels

– Closer to physicians’ mental model.

• Help design medical education systems, as another application

– Medical images of different challenge levels -> different levels of expertise

– Provide tasks based on performance

3

Starting from the Same Corpus

4

Starting from the Same Corpus

5

Computational Linguistics

• Computational manipulation of natural language -> we call it “NLP”

• Usually done with the “Natural Language Toolkit (NLTK)”

– Semantic hierarchical structure -> we use “WordNet” for distance & similarity

6

WordNet: An Example

• Fragment of WordNet Concept Hierarchy: nodes correspond to synsets; edges indicate the hypernym/hyponym relation, i.e. the relation between superordinate and subordinate concepts.
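The distance and similarity idea behind such a hierarchy can be sketched in a few lines. The example below is a minimal sketch, assuming a tiny hand-made hierarchy (the `HYPERNYM` table and its words are hypothetical, not real WordNet data) and one common path-based score, 1 / (1 + shortest path length).

```python
from collections import deque

# Hypothetical toy hierarchy (child -> parent); NOT real WordNet data.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
}


def neighbours(node):
    """Nodes one hypernym/hyponym edge away from `node`."""
    linked = {child for child, parent in HYPERNYM.items() if parent == node}
    if node in HYPERNYM:
        linked.add(HYPERNYM[node])
    return linked


def path_length(a, b):
    """Number of edges on the shortest path between a and b (breadth-first search)."""
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours(node) - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None  # the two nodes are not connected


def path_similarity(a, b):
    """Path-based similarity: 1 / (1 + shortest path length)."""
    dist = path_length(a, b)
    return None if dist is None else 1.0 / (1.0 + dist)
```

With this toy data, "dog" and "wolf" share the direct parent "canine" and are therefore closer than "dog" and "cat", which only meet at "carnivore".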

7

8

Computational Linguistics

• WordNet lacks medical concepts

• Therefore, we use UMLS

– Medical knowledge-base (concepts and relations)

9

My Work

• Linguistic Data Preprocessing

– (UMLS) MetaMap for mapping medical texts (in natural language) into medical concepts

– Used for detecting medical concepts (thus filtering out function words)

• Measuring “Challenge Levels” by defining:

– Lexical Consistency (among physicians)

– Conceptual Relatedness (to correct diagnosis)

• Clustering results

– Verify the usefulness of (1) verbal narratives, and of (2) my proposed metrics (consistency and relatedness).

10

Linguistic Data Preprocessing

• An application of a domain ontology (UMLS)

• UMLS helps analyze low-level raw data collected from verbal descriptions

• Extracting domain knowledge that is conveyed by medical terms

11

MetaMap: Detecting Multi-word Expression within Narratives (1/2)

12

MetaMap: Detecting Multi-word Expression within Narratives (2/2)
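To illustrate what detecting multi-word expressions means in practice, here is a minimal sketch that uses greedy longest-match lookup against a small term list. It is not MetaMap's actual algorithm; the `LEXICON` entries and the `detect_concepts` helper are hypothetical stand-ins for a UMLS-backed lookup.

```python
# Hypothetical mini-lexicon of medical terms; a real system would query UMLS.
LEXICON = {
    ("atopic", "dermatitis"),
    ("verruca", "vulgaris"),
    ("psoriasis",),
    ("erythematous",),
    ("cheek",),
}
MAX_LEN = max(len(term) for term in LEXICON)


def detect_concepts(text):
    """Return lexicon terms found in `text`, preferring the longest match."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in LEXICON:
                found.append(" ".join(candidate))
                i += n
                break
        else:
            i += 1  # no term starts here (e.g. a function word); skip it
    return found


narrative = "There is an erythematous rash on the cheek, possibly atopic dermatitis."
concepts = detect_concepts(narrative)
```

Words that are not in the term list, including function words such as "there" and "the", are simply dropped, which mirrors the filtering role described for the preprocessing step.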

13

Lexical Consistency

• Are physicians the same or similar in their use of medical concepts?

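Subj.   erythematous   cheek   …   psoriasis
1       ×              ×       …   ×
2       ×              −       …   ×
…       …              …       …   …
16      ×              ×       …   −

\[ \frac{1}{3}\left[ \frac{(1,1,1)\cdot(1,0,1)}{6} + \frac{(1,1,1)\cdot(1,1,0)}{6} + \frac{(1,0,1)\cdot(1,1,0)}{4} \right] = \frac{11}{36} \]

Subj.   erythematous   cheek   …   psoriasis
1       ×              −       …   −
2       ×              −       …   −
…       …              …       …   …
16      ×              −       …   −

\[ \frac{1}{3}\left[ \frac{(1,0,0)\cdot(1,0,0)}{1} + \frac{(1,0,0)\cdot(1,0,0)}{1} + \frac{(1,0,0)\cdot(1,0,0)}{1} \right] = \frac{36}{36} \]

\[ \text{cosine similarity} = \frac{A \cdot B}{\|A\|\,\|B\|} \]

Each vector lists subjects 1, 2 and 16 over the columns (erythematous, cheek, psoriasis), with × = 1 and − = 0. As printed, the denominators 6, 6 and 4 equal \( \|A\|^2\|B\|^2 \); with the denominator \( \|A\|\,\|B\| \) of the formula above they would be \( \sqrt{6} \), \( \sqrt{6} \) and 2.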
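The averaging over subject pairs can be sketched as follows. This is a minimal sketch, assuming that lexical consistency is the mean cosine similarity over all pairs of subjects' binary concept vectors; the function names are hypothetical. It uses the standard normalisation ‖A‖‖B‖, so its value for the first table (about 0.711) differs from the 11/36 printed above, while the second table gives 1 either way.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Standard cosine similarity A.B / (|A| |B|); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexical_consistency(rows):
    """Mean cosine similarity over every pair of subjects' concept vectors."""
    pairs = list(combinations(rows, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Rows for subjects 1, 2 and 16 over (erythematous, cheek, psoriasis); x = 1, - = 0.
mixed = [(1, 1, 1), (1, 0, 1), (1, 1, 0)]    # physicians use different concepts
uniform = [(1, 0, 0), (1, 0, 0), (1, 0, 0)]  # physicians use the same concept

score_mixed = lexical_consistency(mixed)
score_uniform = lexical_consistency(uniform)
```

A higher score means the physicians' vocabularies overlap more.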

14

Lexical Consistency: An Example

15

Lexical Consistency – An Easy Case

Common warts; Verruca vulgaris

16

Lexical Consistency – A Difficult Case

Impetigo; Atopic dermatitis; Zinc deficiency syndrome; Kawasaki's; Candida; Infected with strep or staph; Slap cheek syndrome; …; Acrodermatitis enteropathica

17

Lexical Consistency - Rationale

• The physicians we recruited, who have a high level of expertise, are unlikely to all make the same misdiagnosis.

– For an easy image case, physicians could arrive at the correct diagnosis by observing different clues.

– For a difficult image case, physicians may arrive at different incorrect diagnoses by noticing different tricky clues.

18

Conceptual Relatedness

• Path-based Algorithms

• Definition-based Algorithms

– Definition of each disease → compare vector similarity
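The definition-based idea can be sketched with bag-of-words vectors. This is a minimal sketch under stated assumptions: the one-line `DEFINITIONS` below are made-up toy text for illustration (not clinical or UMLS definitions), and plain word counts stand in for whatever vector representation the actual algorithm uses.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy definitions for illustration only; not clinical text.
DEFINITIONS = {
    "psoriasis": "chronic skin disease with red scaly patches",
    "atopic dermatitis": "chronic skin disease with red itchy rash",
    "impetigo": "contagious bacterial infection causing crusted sores",
}


def definition_vector(concept):
    """Bag-of-words count vector built from the concept's definition."""
    return Counter(DEFINITIONS[concept].lower().split())


def relatedness(c1, c2):
    """Cosine similarity between the definition vectors of two concepts."""
    v1, v2 = definition_vector(c1), definition_vector(c2)
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    norm = sqrt(sum(n * n for n in v1.values())) * sqrt(sum(n * n for n in v2.values()))
    return dot / norm if norm else 0.0


close = relatedness("psoriasis", "atopic dermatitis")  # five shared words
far = relatedness("psoriasis", "impetigo")             # no shared words
```

In this toy setup a diagnosis whose definition shares more words with the correct diagnosis scores higher. A practical version would likely also remove stop words such as "with" before comparing.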

19

Definition-based Algorithms: An Example

20

Conceptual Relatedness: An Example

highly-related diagnoses

incorrect diagnoses

somewhat related descriptions

21

Conceptual Relatedness - Rationale

• The more of a physician's speech is relevant to the correct diagnosis, the more knowledge they have on the topic.

– To judge the expertise of physicians in training.

22

Lexical Consistency Score, SC

23

Top X Relatedness Scores, SRX

24

Ground Truth & Results

Highly-correlated or moderately-correlated

25

Next Steps

• Combine with other types of data (the image features themselves and eye-movement patterns) for image retrieval applications.

• Implement a multimodal interactive image retrieval system.

26