![Page 1: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/1.jpg)
Semi-Automatic Indexing of Full Text Biomedical Articles
Washington D.C. October 25, 2005
Clifford W. Gay
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA
![Page 2: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/2.jpg)
Acknowledgments
• Alan R. Aronson, PhD.
• Mehmet Kayaalp, M.D., PhD.
![Page 3: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/3.jpg)
Outline
• Introduction
  – The System: Medical Text Indexer (MTI)
  – The Data: Online biomedical journals
  – The Task: Emulate Medline indexing using full text
• Results
  – Observations on PubMed Central articles
  – Model selection results
  – Recent work
![Page 4: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/4.jpg)
Introduction – The System: Medical Text Indexer (MTI)
![Page 5: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/5.jpg)
Why Semi-Automatic Indexing?
• U.S. National Library of Medicine indexes 5000 journal titles
  – Supports over 60 million PubMed searches each month
  – Has 130 indexers
  – Indexed 570,000 articles in 2004
• Will need to index 1,000,000 articles very soon
• Automated support is helping to meet this demand
  – MTI was used on 26% of articles in 2004
• More about MTI
  – Aronson AR, Mork JG, Gay CW, Humphrey SM, Rogers WJ. The NLM Indexing Initiative's Medical Text Indexer. Medinfo. 2004; 11(Pt 1): 268-72. PMID: 15360816
![Page 6: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/6.jpg)
Medical Text Indexer (MTI)
[Figure: MTI processing diagram. Input: Title + Abstract et al. Components: MetaMap (UMLS Concepts), Trigram Phrase Matching (Phrasex, Phrases), PubMed Related Citations (Rel. Cits.), Restrict to MeSH, Extract MeSH, MeSH Headings, Postprocessing. Output: Ordered list of MeSH Terms.]
![Page 7: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/7.jpg)
DCMS with MTI Suggestions
![Page 8: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/8.jpg)
Introduction – The Data: Online biomedical journals
![Page 9: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/9.jpg)
Why Full Text?
• Medical Text Indexer uses article title and abstract
• However
  – Human indexers are taught not to use the abstract
  – Author's complete intent may not be in the abstract
  – Check tags may only appear in a table or methods section
• If MTI indexes from full text articles it may
  – Find central concepts missing from the abstract
  – Identify terms when the article has no abstract
  – More accurately select check tags
  – Be in better compliance with indexing policy
![Page 10: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/10.jpg)
Test Collection Selection
• Available online from PubMed Central
• Consistent XML format
  – Identifies title, abstract, sections, tables, figures, references, etc.
• 500 articles from 17 diverse biomedical journals
• Did not use:
  – References
  – Graphics
  – Math
![Page 11: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/11.jpg)
Test Collection
• 5 Clinical journals (165):
  – Breast Cancer Research (11)
  – Journal of Clinical Microbiology (80)
• 3 Organization based journals (28):
  – Journal of the American Medical Informatics Association (10)
  – Proceedings of the National Academy of Sciences (11)
• 9 Journals in other categories:
  – Pharmacology (65); Biochemistry (65); Plants (46); Molecular Biology (45); Learning (30); Hospitals (22)
![Page 12: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/12.jpg)
Introduction – The Task: Emulate Medline indexing using full text
![Page 13: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/13.jpg)
Indexing Task
[Figure: MTI processing diagram, repeated from the Medical Text Indexer (MTI) slide. Input: Title + Abstract et al. Components: MetaMap (UMLS Concepts), Trigram Phrase Matching (Phrasex, Phrases), PubMed Related Citations (Rel. Cits.), Restrict to MeSH, Extract MeSH, MeSH Headings, Postprocessing. Output: Ordered list of MeSH Terms.]
![Page 14: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/14.jpg)
Example Article

Medline Indexing
• beta-Lactamases/*genetics/*metabolism
• Enterobacteriaceae/drug effects/*enzymology/genetics
• Plasmids/*genetics
• Genes, Bacterial/genetics
• Genotype
• Kinetics
• Microbial Sensitivity Tests
• Molecular Sequence Data
• Research Support, Non-U.S. Gov't

MTI Indexing
[Legend for term source: MMI; REL; MMI & REL]
• beta-Lactamases
• Plasmids
• Enterobacteriaceae
• beta-Lactam Resistance
• Conjugation, Genetic
• Cephalosporin Resistance
• Cefotaxime
• Nucleotide Sequences
• Molecular Sequence Data
• Cephalosporins
• Chromosomes, Bacterial
• DNA, Bacterial
• DNA Transposable Elements
• Escherichia coli
• Genes, Bacterial
• Cloning, Molecular
• Klebsiella pneumoniae
• Amino Acid Sequence
• Microbial Sensitivity Tests
• Cephalothin
• Proteus mirabilis
• Erwinia
• Salmonella typhimurium
• Enterobacteriaceae Infections
• Lactams

Recall = 0.67, Precision = 0.24, F2 measure = 0.492
![Page 15: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/15.jpg)
Evaluation
• F2 Measure
  – Weighted harmonic mean of Recall and Precision
  – Weights Recall twice as important as Precision
  – Values: 0.0 to 1.0
• Computed for each article and averaged
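The evaluation measure can be sketched in a few lines of Python. This assumes the standard F-beta definition with beta = 2, F = (1 + beta^2) * P * R / (beta^2 * P + R), which reproduces the example article's reported F2 of 0.492. The function names and the counts used in the check (6 matching headings out of 9 Medline headings and 25 MTI terms) are illustrative assumptions read off the example slide, not values stated by the author.

```python
def f_beta(precision, recall, beta=2.0):
    """Weighted harmonic mean of precision and recall (beta=2 favors recall)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


def score_article(suggested, reference):
    """Return (precision, recall, F2) for one article's suggested vs. reference terms."""
    suggested, reference = set(suggested), set(reference)
    hits = len(suggested & reference)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall, f_beta(precision, recall)


def average_f2(articles):
    """Compute F2 per article, then average over the collection."""
    scores = [score_article(s, r)[2] for s, r in articles]
    return sum(scores) / len(scores)


# Assumed counts for the example article: 9 reference headings,
# 25 suggested terms, 6 of them correct (placeholder term names).
reference = [f"ref{i}" for i in range(9)]
suggested = reference[:6] + [f"extra{i}" for i in range(19)]
p, r, f2 = score_article(suggested, reference)
print(round(p, 2), round(r, 2), round(f2, 3))
```

Because F2 is computed per article and then averaged, the collection-level average is not the same as the F2 of the averaged recall and precision.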
![Page 16: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/16.jpg)
Results – Observations on PubMed Central articles
![Page 17: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/17.jpg)
Section Header Classes
• Semantically equivalent section headers
• MATERIALS AND METHODS class:
  – Materials and Method(s)
  – Method(s)
  – Scoring Methods
  – Experimental Procedures
  – Other Methods Tested
• CAPTIONS class:
  – the titles and captions from tables and figures
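A minimal sketch of how section headers might be grouped into classes, assuming simple pattern rules. The rules, the function name `section_class`, and the lookup table are hypothetical; only the class names and the example header variants come from the slides, and the talk does not say how the grouping was actually implemented.

```python
import re

# Hypothetical rule: any header mentioning "method(s)" or
# "experimental procedures" falls in the MATERIALS AND METHODS class.
METHODS_PATTERN = re.compile(
    r"\bmethods?\b|\bexperimental procedures\b", re.IGNORECASE
)

# Hypothetical exact-match table for other common headers.
SIMPLE_CLASSES = {
    "abstract": "ABSTRACT",
    "introduction": "INTRODUCTION",
    "background": "BACKGROUND",
    "results": "RESULTS",
    "discussion": "DISCUSSION",
    "conclusions": "CONCLUSIONS",
    "abbreviations": "ABBREVIATIONS",
}


def section_class(header):
    """Map a raw section header (or None) to a section class name."""
    if header is None or not header.strip():
        return "NO HEADER"
    text = header.strip()
    if METHODS_PATTERN.search(text):
        return "MATERIALS AND METHODS"
    # Anything unrecognized goes to the catch-all class.
    return SIMPLE_CLASSES.get(text.lower(), "OTHER")


for h in ["Materials and Methods", "Scoring Methods",
          "Experimental Procedures", None, "Bacterial strains."]:
    print(repr(h), "->", section_class(h))
```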
![Page 18: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/18.jpg)
Section Class Performance

| Section Class | Average F2 |
|---|---|
| CAPTIONS | 0.3175 |
| ABSTRACT | 0.2960 |
| INTRODUCTION | 0.2869 |
| RESULTS | 0.2790 |
| DISCUSSION | 0.2734 |
| NO HEADER | 0.2574 |
| … | … |
| CONCLUSIONS | 0.1961 |
| ABBREVIATIONS | 0.1304 |
![Page 19: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/19.jpg)
Results – Model selection results
![Page 20: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/20.jpg)
Experiments
• Varied MTI components used
  – MetaMap Indexing (MMI)
  – Related Citations (REL)
• Varied section classes processed
  – Used model selection
  – Used binary weighting for sections
• A model is
  – A selection of section classes and
  – The text in those sections
  – That represents the article
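One way to picture a model with binary section weights is as a set of section classes: a section's text is either included (weight 1) or left out (weight 0). The sketch below pairs that with a simple greedy forward search. The search strategy, the function names, and the toy scoring numbers are all assumptions for illustration; the slides say only that model selection was used, not which procedure.

```python
def build_text(article, model):
    """Concatenate the text of sections whose class is in the model.

    article: list of (section_class, text) pairs.
    model:   set of section classes given weight 1; all others get 0.
    """
    return "\n".join(text for cls, text in article if cls in model)


def greedy_select(classes, evaluate, start=frozenset()):
    """Add any section class that improves the score until none does.

    evaluate: callable taking a frozenset of classes and returning a score
              (in practice this would be the collection-wide average F2).
    """
    best_model = frozenset(start)
    best_score = evaluate(best_model)
    improved = True
    while improved:
        improved = False
        for cls in sorted(set(classes) - best_model):
            candidate = best_model | {cls}
            score = evaluate(candidate)
            if score > best_score:
                best_model, best_score, improved = candidate, score, True
    return best_model, best_score


# Toy scorer with invented numbers (NOT results from the talk).
TOY_GAIN = {"CAPTIONS": 0.02, "RESULTS": 0.01, "ABBREVIATIONS": -0.03}

def toy_score(model):
    return 0.40 + sum(TOY_GAIN[c] for c in model)

model, score = greedy_select(TOY_GAIN, toy_score)
article = [("CAPTIONS", "Table 1. MICs of beta-lactams."),
           ("ABBREVIATIONS", "IEF, isoelectric focusing")]
print(sorted(model))
print(build_text(article, model))
```

A greedy search evaluates far fewer candidate models than exhaustive enumeration, at the cost of possibly missing combinations of classes that only help together.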
![Page 21: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/21.jpg)
Production Baseline
[Figure: model diagram. Text: Title+Abstract. Indexing paths: MMI, REL.]
F2 = 0.457
![Page 22: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/22.jpg)
Naive Mode
[Figure: model diagram. Text: Title+Abstract, Materials and Methods, Results and Discussion, No Header (All Section Classes). Indexing paths: MMI, REL.]
F2 = 0.453 (-0.9%)
![Page 23: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/23.jpg)
MetaMap Indexing Mode
[Figure: model diagram. Text: Title+Abstract, Introduction, Results, Discussion, Other, No Header, Captions. Indexing paths: MMI, REL.]
F2 = 0.373 (-18.4%)
![Page 24: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/24.jpg)
Augmented Mode
[Figure: model diagram. Text: Title+Abstract, Introduction, Results, Discussion, Other, No Header, Captions. Indexing paths: MMI, REL.]
F2 = 0.475 (+3.9%)
![Page 25: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/25.jpg)
Refined Augmented Mode
[Figure: model diagram. Text: Title+Abstract, Captions, Results, Background. Indexing paths: MMI, REL.]
F2 = 0.485 (+6.1%)
![Page 26: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/26.jpg)
Full MTI Mode
[Figure: model diagram, labeled "MMI model". Text: Title+Abstract, Introduction, Results, Discussion, Other, No Header, Captions. Indexing paths: MMI, REL.]
F2 = 0.488 (+6.8%)
![Page 27: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/27.jpg)
Refined Full MTI
[Figure: model diagram. Text: Title+Abstract, Results, Results and Discussion, No Header, Captions, Conclusions. Indexing paths: MMI, REL.]
F2 = 0.491 (+7.4%)
![Page 28: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/28.jpg)
MTI Performance Summary

| Indexing Model | Recall | Precision | Avg. F2 |
|---|---|---|---|
| Production Baseline (Ti, Ab) | 0.53 | 0.32 | 0.457 |
| Naive Mode (full text) | 0.57 | 0.27 | 0.453 |
| Augmented Mode (MMI + REL (Ti, Ab)) | 0.59 | 0.29 | 0.475 |
| Augmented Mode (refined) | 0.60 | 0.30 | 0.485 |
| Full MTI (MMI + REL common sections) | 0.60 | 0.30 | 0.488 |
| Full MTI (refined) | 0.60 | 0.31 | 0.491 |
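The percentage changes quoted on the individual model slides appear to be relative differences in average F2 against the production baseline of 0.457. A short check under that assumption (the helper name `relative_change_percent` is illustrative):

```python
BASELINE_F2 = 0.457

# Average F2 values as reported on the model slides.
MODEL_F2 = {
    "Naive Mode": 0.453,
    "MetaMap Indexing Mode": 0.373,
    "Augmented Mode": 0.475,
    "Augmented Mode (refined)": 0.485,
    "Full MTI": 0.488,
    "Full MTI (refined)": 0.491,
}


def relative_change_percent(baseline, value):
    """Relative change of value against baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


for name, f2 in MODEL_F2.items():
    print(f"{name}: {relative_change_percent(BASELINE_F2, f2):+.1f}%")
```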
![Page 29: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/29.jpg)
Results – Recent work
![Page 30: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/30.jpg)
Improvement Potential
• With current model
  – No cut off at 25 terms yields maximum recall of 0.79
  – If all good terms prioritized correctly, F2 = 0.64
• Improvement over baseline: 7% with the current model; 40% if all good terms were prioritized correctly
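The effect of the 25-term cutoff on recall can be illustrated with a small sketch: recall is measured once on the truncated ranked list and once on the full list, and the full-list value is the ceiling that better ranking could approach. The function name and the toy terms are assumptions for illustration only.

```python
def recall_at_cutoff(ranked_terms, reference_terms, cutoff=None):
    """Recall of a ranked term list, optionally truncated to the top `cutoff`."""
    reference = set(reference_terms)
    if not reference:
        return 0.0
    kept = ranked_terms if cutoff is None else ranked_terms[:cutoff]
    return len(set(kept) & reference) / len(reference)


# Toy data (placeholder terms, not from the test collection).
ranked = ["a", "x", "b", "y", "c"]
reference = ["a", "b", "c", "d"]
print(recall_at_cutoff(ranked, reference, cutoff=3))  # top 3 only
print(recall_at_cutoff(ranked, reference))            # no cutoff
```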
![Page 31: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/31.jpg)
Increase REL Citations
• MTI currently uses 10 Related Citations
• Optimal number for full text articles is 15
• Best model confirmed for this setting
• Additional improvement in F2 = 0.01
![Page 32: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/32.jpg)
Summarization
• Selecting important text before MTI processing
• Using Yeh, Ke, Yang, Meng approach
  – Combines Latent Semantic Analysis and Salton's Text Relationship Map
• Start with current model
• Document representation includes
  – Bag of words
  – MetaMap identified concepts
![Page 33: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/33.jpg)
NLM Indexing Initiative
Clifford W. Gay
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA
Contact: [email protected]@nlm.nih.gov
Web: ii.nlm.nih.gov/fulltext.shtml
![Page 34: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/34.jpg)
NONE Sections
• Most appear in articles that have no abstract (20/23)
• Some are errors
  – 4 have "Introduction" header in publisher version
  – 2 appear within other sections with headers
• Many contain the primary text of the article
  – Comments, Editorials, Letters (11/23)
![Page 35: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/35.jpg)
Other Sections
• Other section class has 525 sections (16%)
• Non-standard article organization
  – Common in Review articles
• Example: β-Lactamases of Kluyvera ascorbata, Probable Progenitors of Some Plasmid-Encoded CTX-M Types
  – Bacterial strains.
  – Antimicrobial agents and susceptibility testing.
  – Kinetic and IEF analyses.
  – Genetic characterization of blaKLUA.
  – Genetic environment of blaKLUA-1.
  – Arguments for mobilization of chromosomal blaKLUA gene.
![Page 36: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/36.jpg)
Ranking Function
• Made ranking function for Related Citations more like MetaMap Indexing
• Resulted in a more inclusive model
  – Materials and Methods
  – Introduction
• F2 measure = 0.4865
![Page 37: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/37.jpg)
Tuning Path Weight
• Ratio of weights between the two indexing paths
  – MetaMap Indexing: 7
  – Related Citations: 2
• No improvement possible
![Page 38: Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications](https://reader035.vdocuments.site/reader035/viewer/2022062517/56649e985503460f94b9b5fd/html5/thumbnails/38.jpg)
Partial Weight for Singleton Headers
• OTHER section class
  – Header is unique
  – Contain content terms
• Gave section class weight between 0 and 1
  – Some recall improvement
  – No collection wide improvement in F2