The Neighborhood Auditing Tool
James Geller, Yehoshua Perl, C. Paul Morrey
Participating Student Developers
Dayanand Sagar, Kushal Chopra, Sandeep Ramachandran, Anisa Vishnani, Aditi Dekhane, Kandarp Shah, Rajesh Gupta, Suraj Pal Singh, Saurabh Patel,
Kartik Gopal, Yakup Kav, Rahul Bhave, Sirish Motati, Pratik Shah, Saurabh Singhi, Sirish Motati Reddy, Sandeep Pasuparthy, Ramya Gokanakonda
Overview
Goals of an Auditor’s Tool for the UMLS
Principles of Auditing with Neighborhoods
The Idea of a Hybrid Display
Current State of the NAT: Serving the Auditor
Feature Presentation
Live Audit Session
Planned State of the NAT: Guiding the Auditor
Conclusions and Future Work
Auditing the UMLS
The UMLS consists of over 100 terminologies.
It is natural that inconsistencies will appear.
Over 1.5 million concepts and over 7 million terms.
Two-level structure consisting of the Semantic Network and the Metathesaurus.
How We Did It Before the NAT: Paper Form
CPT: C1081844 Antonospora locustae
SRC: NCBI
STY: T004 T009 Fungus + Invertebrate
DEF:
SYN: Antonospora locustae | Nosema locustae
PAR: Antonospora {STY: Invertebrate}
CHD:
Previous Work on Auditing
H. Gu, Y. Perl, J. Geller, M. Halper, L. Liu, and J.J. Cimino. Representing the UMLS as an Object-oriented Database: Modeling Issues and Advantages. J Am Med Inform Assoc, 7(1):66-80, 2000.
J. Geller, H. Gu, Y. Perl, and M. Halper. Semantic refinement and error correction in large terminological knowledge bases. Data & Knowledge Engineering, 45(1):1-32, 2003.
Y. Chen, Y. Perl, J. Geller, and J.J. Cimino. Analysis of a study of the users, uses, and future agenda of the UMLS. J Am Med Inform Assoc, 14(2):221-231, 2007.
H. Gu, G. Hripcsak, Y. Chen, C.P. Morrey, G. Elhanan, J.J. Cimino, J. Geller, and Y. Perl. Evaluation of a UMLS auditing process of semantic type assignments. In J.M. Teich, J. Suermondt, and G. Hripcsak, editors, Proc AMIA Symp, pages 294-298, Chicago IL, Nov. 2007.
Auditing Results Paper Form
(C1081844) Antonospora locustae
STY: Fungus + Invertebrate
No errors
Semantic Type Error: Fungus
Semantic Type Error: Invertebrate
Ambiguity
Add Semantic Type ______________________
Other error _____________________________
Comments _____________________________
______________________________________
Goals of an Auditor’s Tool for the UMLS
Display relevant information to the auditor.
Do not overwhelm the auditor with too much information.
Help the auditor focus on areas most likely to contain errors.
Neighborhood display of reviewed concepts
Algorithms suggest likely erroneous concepts
Principles of Auditing with Neighborhoods
Several years of experience: Auditing is to a large degree a “local” activity.
Concepts have two kinds of knowledge elements:
Textual Knowledge Elements: Preferred term, CUI, synonyms, LUI, definition, sources, semantic types
CONtextual Knowledge Elements: Neighbors
Neighborhoods
Focus concept: The concept presently under review
Immediate Neighborhood: The set of concepts reachable from the focus concept by stepping one relationship (up, down, lateral, etc.)
Extended neighborhood: Includes parents of parents (grandparents), children of children (grandchildren) and siblings. No lateral chains.
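The neighborhood definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NAT's implementation: the `PARENTS` and `LATERAL` dictionaries, the function names, and the toy concept names are all hypothetical, and the extended neighborhood is read here as the immediate neighborhood plus grandparents, grandchildren, and siblings.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: child -> set of parents (PAR/CHD relationships).
PARENTS = {
    "focus": {"parent1", "parent2"},
    "sibling": {"parent1"},
    "parent1": {"grandparent"},
    "child": {"focus"},
    "grandchild": {"child"},
}
# Hypothetical lateral (non-hierarchical) relationships.
LATERAL = {"focus": {"related"}, "related": {"far_related"}}


def invert(parents):
    """Build the parent -> children map from the child -> parents map."""
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    return children


CHILDREN = invert(PARENTS)


def immediate_neighborhood(focus):
    """Concepts one relationship step away: parents, children, lateral."""
    return (PARENTS.get(focus, set())
            | CHILDREN.get(focus, set())
            | LATERAL.get(focus, set()))


def extended_neighborhood(focus):
    """Immediate neighborhood plus grandparents, grandchildren and siblings.

    Lateral relationships are followed for one step only (no lateral chains).
    """
    parents = PARENTS.get(focus, set())
    children = CHILDREN.get(focus, set())
    grandparents = set().union(*(PARENTS.get(p, set()) for p in parents))
    grandchildren = set().union(*(CHILDREN.get(c, set()) for c in children))
    siblings = set().union(*(CHILDREN.get(p, set()) for p in parents)) - {focus}
    return immediate_neighborhood(focus) | grandparents | grandchildren | siblings
```

In this toy data, `far_related` is two lateral steps from the focus concept, so it stays outside both neighborhoods.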
Immediate Neighborhood
[Figure: immediate neighborhood diagram. Concept labels shown: Microsporidia, Unclassified; Microsporidia <protozoa>; Dictyocoela; Edhazardia; Fibrillanosema; Microsporidium; Kabatana; Oligosporidium. Other labels shown: Cellular aspects of; Microbiological; Pathogenicity Aspects; virologic.]
Extended Neighborhood
[Figure: extended neighborhood diagram with regions labeled GRANDPARENTS, PARENTS, FOCUS CONCEPT, SIBLINGS (SIB), CHILDREN, GRANDCHILDREN, and RELATIONSHIPS. One label reads "Erroneous concept".
Concept labels shown: Microsporidia, Unclassified; Microsporidia <protozoa>; fungus; PHYLUM MICROSPORA; Protozoa; Sporozeoa; Microsporea; Dictyocoela; Edhazardia; Fibrillanosema; Microsporidium; Kabatana; Oligosporidium; Dictyocoela berillonum; Dictyocoela cavimanum; Dictyocoela dehayesum; Dictyocoela duebenum; Dictyocoela grammarellum; Dictyocoela muelleri; Dictyocoela sp.L11; Edhazardia aedis; Fibrillanosema crangonycis; Kabatana takedai; Microsporidium 57864; Microsporidium africanum; Microsporidium ceylonensis; Microsporidium cypselurus; Microsporidium prosopium; Microsporidium seriolae; Oligosporidium occidentalis.
Other labels shown: Cellular aspects of; Microbiological; Pathogenicity Aspects; virologic.]
Up-Extended and Down-Extended Neighborhood
An up-extended neighborhood includes grandparents and the immediate neighborhood.
A down-extended neighborhood includes grandchildren and the immediate neighborhood.
Give the auditor all s/he needs, but not more.
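As a rough sketch of the two one-sided variants, the snippet below adds only grandparents or only grandchildren to the immediate neighborhood. The toy `PARENTS` map and all function names are assumptions made for illustration, and lateral relationships are left out to keep it short.

```python
# Hypothetical toy hierarchy: child -> set of parents.
PARENTS = {
    "focus": {"parent"},
    "parent": {"grandparent"},
    "child": {"focus"},
    "grandchild": {"child"},
}


def parents_of(concept):
    return PARENTS.get(concept, set())


def children_of(concept):
    return {c for c, ps in PARENTS.items() if concept in ps}


def immediate(focus):
    # Lateral relationships are omitted in this sketch.
    return parents_of(focus) | children_of(focus)


def up_extended(focus):
    """Immediate neighborhood plus grandparents."""
    grandparents = {g for p in parents_of(focus) for g in parents_of(p)}
    return immediate(focus) | grandparents


def down_extended(focus):
    """Immediate neighborhood plus grandchildren."""
    grandchildren = {g for c in children_of(focus) for g in children_of(c)}
    return immediate(focus) | grandchildren
```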
Semantic Type Neighborhood
If we provide the semantic types for every concept, those also form a neighborhood.
It is important to retain the information about which semantic types belong to which concepts.
References about Neighborhood
M.S. Tuttle, D.D. Sherertz, N.E. Olson, M.S. Erlbaum, W.D. Sperzel, and L.F. Fuller, et al. Using META-1, the first version of the UMLS Metathesaurus. In Proc 14th Annu Symp Comput Appl Med Care, pages 131-135, Washington, D.C., 1990.
S.J. Nelson, M.S. Tuttle, W.G. Cole, D.D. Sherertz, W.D. Sperzel, M.S. Erlbaum, L.L. Fuller, and N.E. Olson. From meaning to term: semantic locality in the UMLS Metathesaurus. In Proc Annu Symp Comput Appl Med Care, pages 209-213, Washington, D.C., 1991.
J.J. Cimino, H. Min, and Y. Perl. Consistency across the hierarchies of the UMLS Semantic Network and Metathesaurus. J Biomed Inform, 36(6):450-461, 2003.
Desirable Information Beyond Neighborhoods
Concept definition for Focus Concept
Concept sources for Focus Concept
Assigned Semantic Types of concepts
Definitions of relevant Semantic Types
Global view of the Semantic Network
Indented (better for wide branches)
Graphical (better for almost everything else); we set the standard on this.
The Idea of a Hybrid Display
Diagrams are wonderful – as long as they fit on one screen.
Indented text is wonderful – as long as there are no or very few multiple parents.
But the UMLS does not fit onto one screen and there are many cases of multiple parents.
What Makes a Diagram Wonderful?
You can follow parent/child paths with your eyes.
You can get a feeling for everything a concept is connected to with one look.
You can see multiple parents and paths with one look.
You can see global features (short and bushy versus tall and sparse, or (gasp) tall and bushy).
What Makes Indented Text Wonderful?
Indentation expresses parenthood elegantly.
There are no lines crossing.
You don’t need a layout algorithm.
There is a linear order in which to study text.
The Idea of a Hybrid Display (cont.)
Keep the best features of text and the best features of diagrams.
Maintain relative positions between the focus concept and its children, parents, etc.
Eliminate clutter of arrows.
A Hybrid Diagram/Form Display of a Neighborhood
[Figure: hybrid display layout with panels labeled Parents, Focus Concept, Synonyms, Relationships, and Children.]
Important Auditing Principles
If a concept C has a combination of semantic types assigned, and very few other concepts C1…Cn (n < 6) have that same combination assigned, then C and C1…Cn are suspicious concepts.
We call this “a small intersection.”
Group-based auditing: Audit sets of similar concepts.
Y. Chen, H. Gu, Y. Perl, J. Geller, and M. Halper. Structural group auditing of a UMLS semantic type’s extent. J Biomed Inform, 2007. Accepted for publication.
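A minimal sketch of the small-intersection rule, assuming each concept's semantic types are available as a set. The function name, the `max_others` parameter (5, mirroring n < 6), and all concepts other than Antonospora locustae are made up for illustration. Requiring at least two types per combination is also an assumption.

```python
from collections import defaultdict


def small_intersections(assignments, max_others=5):
    """Return the type combinations shared by only a few concepts.

    assignments maps a concept name to its set of semantic types.
    """
    groups = defaultdict(set)
    for concept, types in assignments.items():
        if len(types) >= 2:  # assumption: a combination means two or more types
            groups[frozenset(types)].add(concept)
    # A group is small when each member has at most max_others companions.
    return {combo: members for combo, members in groups.items()
            if len(members) - 1 <= max_others}


assignments = {"Antonospora locustae": {"Fungus", "Invertebrate"}}
# Seven made-up concepts sharing one combination: too many to be "small".
for i in range(7):
    assignments[f"concept_{i}"] = {"Type A", "Type B"}
# A made-up concept with a single type does not form an intersection.
assignments["concept_single"] = {"Type A"}

suspicious = small_intersections(assignments)
```

With this toy data, only the Fungus + Invertebrate combination is reported, with Antonospora locustae as its single member.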
Current State of the NAT: Serving the Auditor
The Neighborhood Auditing Tool has been implemented to fully support display of neighborhoods.
Navigation to “adjacent neighborhoods” is easy.
Additional features listed before have been implemented.
Demonstration of NAT Features
Neighborhood
Relationships
Siblings
Grandparents and grandchildren
Synonyms
Focus concept definition
Focus concept sources
Semantic Type display
Semantic Type definition
Semantic Network (indented)
Semantic Network (diagram)
Display Options
Navigation
Search
Viewing History
UMLS version
offline version
Audit Example
An algorithm determined that the concept Antonospora locustae was likely assigned incorrect semantic types.
We follow an auditor’s review of this concept using the data from 2007AA.
offline version
Preliminary Evaluation Study with NAT
Compare paper-based auditing and NAT-based auditing.
Counterbalanced groups.
Recall improves with NAT use. Auditors seem willing to investigate more concepts.
Precision stays the same. Auditors’ mental process does not improve (?).
Planned State of the NAT: Guiding the Auditor by Finding (i.e. Computing) Audit Sets
As noted before, errors are likely in small intersections.
Planned new version of the NAT will compute and display small intersections.
Errors are clearly visible in small groups of supposedly similar concepts.
Planned new version of the NAT will compute small groups of supposedly similar concepts.
Finding Successively Smaller Groups of Concepts
Finding Audit sets by selecting:
1. Concepts with the same semantic type.
2. Concepts satisfying 1 that also have the same root.
3. Concepts satisfying 1 and 2 that also have the same relationships.
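The three selection steps can be read as grouping by successively longer keys. The sketch below illustrates that reading only; the `Concept` record, its field names, and the sample data are hypothetical, and `root` stands for whatever root concept the auditing method assigns to a concept.

```python
from collections import defaultdict
from typing import NamedTuple


class Concept(NamedTuple):
    name: str
    semantic_types: frozenset
    root: str
    relationships: frozenset


def group_by(concepts, key):
    """Collect concept names into groups that share the same key."""
    groups = defaultdict(list)
    for c in concepts:
        groups[key(c)].append(c.name)
    return dict(groups)


def audit_sets(concepts):
    """Return the groupings for steps 1, 2 and 3, each finer than the last."""
    step1 = group_by(concepts, lambda c: c.semantic_types)
    step2 = group_by(concepts, lambda c: (c.semantic_types, c.root))
    step3 = group_by(
        concepts, lambda c: (c.semantic_types, c.root, c.relationships))
    return step1, step2, step3


# Made-up sample: one semantic type, two roots, two relationship patterns.
T = frozenset({"Type A"})
sample = [
    Concept("c1", T, "root1", frozenset({"r1"})),
    Concept("c2", T, "root1", frozenset({"r1"})),
    Concept("c3", T, "root1", frozenset({"r2"})),
    Concept("c4", T, "root2", frozenset({"r1"})),
]
step1, step2, step3 = audit_sets(sample)
```

Here the single group of four concepts from step 1 splits into two groups at step 2 and three groups at step 3, so each step hands the auditor smaller sets of supposedly similar concepts.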
[Figure: EXTENT OF A SEMANTIC TYPE, divided into Area A, Area B, Area C, Area D, and Area E (labeled A to E). Legend: concept, PAR/CHD relationship, other relationship, area. Other relationships shown: r1, r2, r3, r'3, r4.]
Audit Set Examples
Example A: A selection of concepts in the intersection of Manufactured Object + Organization under the root School (environment).
Example B: All concepts that are in a non-chemical intersection with an extent size less than five.
Possible Auditor’s Recommendations (see the Auditing Results Paper Form)
Mark concept as reviewed and correct.
Mark semantic types that should be removed.
Mark semantic types that should be added.
Mark other kinds of errors.
Attach notes to a reviewed concept.
Conclusions and Future Work
Preliminary study showed that people are more successful finding errors with NAT than with paper sources.
Recall improved with the NAT, precision did not.
NAT seems to nicely complement use of the UMLSKS.
Conclusions and Future Work (cont.)
This year, work with more human subjects to quantify these observations.
Integration of algorithms for finding audit sets with NAT.
By extent size
Using roots and relationship patterns within extents.
Preliminary Evaluation Study
Auditor  Errors             Recall             Precision          F
         with NAT  w/o NAT  with NAT  w/o NAT  with NAT  w/o NAT  with NAT  w/o NAT
1        57        45       0.97      0.82     0.53      0.51     0.86      0.63
2        22        20       0.43      0.35     0.55      0.55     0.48      0.43
3        39        34       0.64      0.58     0.46      0.53     0.54      0.55
4        56        44       0.55      0.54     0.30      0.34     0.39      0.42
Avg.     44        36       0.65      0.57     0.46      0.48     0.57      0.51
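The slides do not say how the F column was computed. Assuming it is the balanced F-measure (the harmonic mean of precision and recall), the sketch below reproduces Auditor 2's with-NAT value from that auditor's recall (0.43) and precision (0.55). The function name is illustrative, and not every table entry need follow this formula.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Auditor 2, with NAT: precision 0.55, recall 0.43.
print(round(f_measure(0.55, 0.43), 2))  # prints 0.48
```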
Improved Recall
The auditor finds it easy to search for more errors in the neighborhood of the suspicious concept.
With better recall and the same precision you still find more errors.
Auditing Demonstration
The concept Antonospora locustae was selected for audit by an algorithm that found it was the only concept assigned to the intersection Fungus + Invertebrate in the UMLS 2007AA.
NAT Features Demonstration
Neighborhood