Cartic Ramakrishnan's Dissertation Defense
Extracting, representing and mining Semantic Metadata from text: Facilitating Knowledge Discovery in Biomedicine
Cartic Ramakrishnan
Advisor: Dr. Amit Sheth
Committee Members: Dr. Michael Raymer, Dr. Guozhu Dong, Dr. Thaddeus Tarpey, Dr. Vasant Honavar, Dr. Shaojun Wang
2
Overview
• Define Knowledge Discovery
• Contributions
  – Text Mining
  – Knowledge Discovery
• Past work
• Future Work
3
What Knowledge Discovery is NOT
Search
– Keyword-in, document-out
– Keywords are fully specified features of the expected outcome
– Searching for prospective mining sites
Mining
– Know where to look
– Underspecified characteristics of what is sought are available
– Patterns
4
What is knowledge discovery?
“knowledge discovery is more like sifting through a warehouse filled with small gears, levers, etc., none of which is particularly valuable by itself. After appropriate assembly, however, a Rolex watch emerges from the disparate parts.” – James Caruther
“discovery is often described as more opportunistic search in a less well-defined space, leading to a psychological element of surprise” – James Buchanan
Opportunistic search over an ill-defined space leading to surprising but useful emergent knowledge
5
Element of surprise – Swanson’s discoveries
[Figure: Swanson’s discoveries: Magnesium and Migraine linked through the PubMed literature via intermediate concepts such as Stress, Spreading Cortical Depression and Calcium Channel Blockers]
Associations were discovered using keyword searches followed by manual analysis of text to establish possibly relevant relationships
11 possible associations found
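Swanson’s open-discovery pattern is commonly summarized as the ABC model: terms B that co-occur with a starting concept A, and terms C that co-occur with some B but never directly with A, become candidate hidden links. The following is a minimal Python sketch of that pattern over a hypothetical toy corpus in which each abstract is reduced to a set of terms; it illustrates the idea only and is not the manual analysis described above.

```python
from collections import defaultdict

def build_index(documents):
    """Map each term to the set of document ids that mention it."""
    index = defaultdict(set)
    for doc_id, terms in documents.items():
        for term in terms:
            index[term].add(doc_id)
    return index

def cooccurring(index, term):
    """Terms that share at least one document with `term`."""
    docs = index.get(term, set())
    return {t for t, ds in index.items() if t != term and ds & docs}

def abc_candidates(documents, a_term):
    """Return {C: sorted B-terms} for C terms linked to A only through B terms."""
    index = build_index(documents)
    b_terms = cooccurring(index, a_term)
    candidates = defaultdict(set)
    for b in b_terms:
        for c in cooccurring(index, b):
            # b_terms holds every term that co-occurs with A, so excluding it
            # keeps only C terms that never appear together with A.
            if c != a_term and c not in b_terms:
                candidates[c].add(b)
    return {c: sorted(bs) for c, bs in candidates.items()}

# Hypothetical toy "abstracts", each reduced to the terms it mentions.
docs = {
    1: {"migraine", "stress"},
    2: {"migraine", "spreading cortical depression"},
    3: {"stress", "magnesium"},
    4: {"spreading cortical depression", "magnesium"},
    5: {"migraine", "calcium channel blockers"},
}
print(abc_candidates(docs, "migraine"))
```

On this toy data the call returns `{'magnesium': ['spreading cortical depression', 'stress']}`: magnesium never co-occurs with migraine, but is reached through two intermediate terms.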
6
Knowledge Discovery in AI – The Robot Scientist
Planned search over a well-defined (axiomatic) space leading to knowledge discovery.
Knowledge discovery by humans is done in non-axiomatic ill-defined spaces over multi-modal data.
Scientific literature is an ill-defined and loosely structured source of data used in scientific investigations.
Assigning structure and interpretation to text (semantics): Syntax → Structure → Semantics
7
Knowledge Discovery = Extraction + Heuristic Aggregation
[Figure: a graph connecting Leonardo Da Vinci; The Da Vinci Code; The Louvre; Victor Hugo; The Vitruvian Man; Santa Maria delle Grazie; Et in Arcadia Ego; Holy Blood, Holy Grail; Harry Potter; The Last Supper; Nicolas Poussin; Priory of Sion; The Hunchback of Notre Dame; The Mona Lisa; and Nicolas Flamel through relationships such as painted_by, member_of, written_by, mentioned_in, displayed_at and cryptic_motto_of]
Undiscovered Public Knowledge
8
Information Extraction & Text Mining
This MEK dependency was observed in BRAF mutant cells regardless of tissue lineage, and correlated with both downregulation of cyclin D1 protein expression and the induction of G1 arrest.
* MEK dependency ISA Dependency_on_an_Organic_chemical
* BRAF mutant cells ISA Cell_type
* downregulation of cyclin D1 protein expression ISA Biological_process
* tissue lineage ISA Biological_concept
* induction of G1 arrest ISA Biological_process
Information Extraction = segmentation + classification + association + mining
Text mining = entity identification + named relationship extraction + discovering association chains …
[Figure: the sentence processed by segmentation, classification and named relationship extraction, yielding:]
MEK dependency – observed in – BRAF mutant cells
MEK dependency – correlated with – downregulation of cyclin D1 protein expression
MEK dependency – correlated with – induction of G1 arrest
9
Overview
• Define Knowledge Discovery
• Contributions
  – Text Mining
    • Compound Entities
    • Complex relationships
  – Knowledge Discovery
    • Subgraph discovery
• Past work
• Future Work
10
Knowledge Discovery over text
[Figure: Text → extraction of semantics from text (assigning interpretation to text) → semantic metadata in the form of semi-structured data, which supports semantic-metadata-guided knowledge exploration (triple-based semantic search, semantic browser) and knowledge discovery (subgraph discovery)]
Ontology-enabled Information Extraction
Cartic Ramakrishnan, Krys Kochut, Amit P. Sheth: A Framework for Schema-Driven Relationship Discovery from Unstructured Text. International Semantic Web Conference 2006: 583-596
12
Comparison with standard IE
Standard IE
– Simple entities
– Typically supervised, pattern-based
– Output not structured
  • simple tags
– Type-specific patterns to train
– No focus on semantics
Our approach
– Compound and modified entities
– Unsupervised
– Output structured to support knowledge discovery
– Not restricted to specific entity types
– Assigns semantic interpretations to sentences
[U.S. ORG] general [David Petraeus PER ] heads for [Baghdad LOC ] .
Token      POS  Chunk  Tag
---------------------------------
U.S.       NNP  I-NP   I-ORG
general    NN   I-NP   O
David      NNP  I-NP   B-PER
Petraeus   NNP  I-NP   I-PER
heads      VBZ  I-VP   O
for        IN   I-PP   O
Baghdad    NNP  I-NP   I-LOC
.          .    O      O
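The “simple tags” output of standard IE becomes the bracketed entities of the example only after grouping. A minimal sketch, assuming an IOB-style scheme in which an entity starts at a B- tag or at an I- tag whose type differs from the entity currently open, and O closes any open entity:

```python
def iob_to_entities(tokens, tags):
    """Group IOB-tagged tokens into (entity text, entity type) spans."""
    entities = []
    current_tokens, current_type = [], None

    def close():
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag == "O":
            close()
            current_tokens, current_type = [], None
            continue
        prefix, ent_type = tag.split("-", 1)
        if prefix == "B" or ent_type != current_type:
            close()                      # a new entity begins here
            current_tokens, current_type = [token], ent_type
        else:
            current_tokens.append(token)  # I- tag continuing the same type
    close()
    return entities

tokens = ["U.S.", "general", "David", "Petraeus", "heads", "for", "Baghdad", "."]
tags = ["I-ORG", "O", "B-PER", "I-PER", "O", "O", "I-LOC", "O"]
print(iob_to_entities(tokens, tags))
# [('U.S.', 'ORG'), ('David Petraeus', 'PER'), ('Baghdad', 'LOC')]
```

The result is still a flat list of typed mentions, with no relationships between them, which is the limitation the comparison above points out.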
13
Information Extraction via Ontology assisted text mining – Relationship extraction
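[Figure: UMLS Semantic Network classes (Biologically active substance, Lipid, Disease or Syndrome) connected by relationships such as affects, causes and complicates; the MeSH terms Fish Oils and Raynaud’s Disease are instance_of these classes, and the relationship between them (???) is to be discovered from PubMed. Document counts shown: 9284 documents, 4733 documents and 5 documents.]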
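To illustrate how the schema constrains extraction: once two MeSH terms are mapped to UMLS classes, only the relationships defined between those classes need to be looked for. The mini-schema below is hypothetical; it reuses the class and relationship names from the slide, but which relationship links which class pair is an illustrative assumption, not UMLS content.

```python
# Hypothetical schema fragment: (domain class, range class) -> relationship names.
SCHEMA = {
    ("Biologically active substance", "Disease or Syndrome"): {"affects", "causes"},
    ("Lipid", "Disease or Syndrome"): {"affects", "causes"},
    ("Disease or Syndrome", "Disease or Syndrome"): {"complicates"},
}

# Hypothetical instance_of links from MeSH terms to schema classes.
INSTANCE_OF = {
    "Fish Oils": {"Lipid", "Biologically active substance"},
    "Raynaud's Disease": {"Disease or Syndrome"},
}

def candidate_relationships(subject, obj):
    """Relationship names the schema allows between any class of subject and of obj."""
    names = set()
    for s_cls in INSTANCE_OF.get(subject, ()):
        for o_cls in INSTANCE_OF.get(obj, ()):
            names |= SCHEMA.get((s_cls, o_cls), set())
    return sorted(names)

print(candidate_relationships("Fish Oils", "Raynaud's Disease"))
# ['affects', 'causes']
```

These candidate names, together with their verb synonyms, would then be what sentence-level extraction looks for in documents mentioning both terms.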
14
Background knowledge and Data used
UMLS – A high-level schema of the biomedical domain
– 136 classes and 49 relationships
– Synonyms of all relationships obtained using variant lookup (tools from NLM)
– 49 relationships + their synonyms = ~350 verbs
MeSH
– 22,000+ topics organized as a forest of 16 trees
– Used to query PubMed
PubMed
– Over 16 million abstracts
– Abstracts annotated with one or more MeSH terms
15
Method – Parse Sentences in PubMed
SS-Tagger (University of Tokyo)
SS-Parser (University of Tokyo)
(TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ exogenous) ) (NN stimulation) ) (PP (IN by) (NP (NN estrogen) ) ) ) (VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia) ) (PP (IN of) (NP (DT the) (NN endometrium) ) ) ) ) ) )
• Entities (MeSH terms) in sentences occur in modified forms
  • “adenomatous” modifies “hyperplasia”
  • “An excessive endogenous or exogenous stimulation” modifies “estrogen”
• Entities can also occur as composites of 2 or more other entities
  • “adenomatous hyperplasia” and “endometrium” occur as “adenomatous hyperplasia of the endometrium”
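A minimal sketch of rules over a constituency parse, assuming the bracketed format shown above. The three rules (“X by Y” means X modifies Y, “X of Y” forms a composite entity, adjective + noun forms a modified entity) are hypothetical simplifications written to reproduce the readings listed on this slide, not the dissertation’s rule set.

```python
import re

PARSE = ("(TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) "
         "(JJ exogenous) ) (NN stimulation) ) (PP (IN by) (NP (NN estrogen) ) ) ) "
         "(VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia) ) "
         "(PP (IN of) (NP (DT the) (NN endometrium) ) ) ) ) ) )")

def parse_tree(text):
    """Parse a bracketed constituency parse into (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                       # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:                      # a leaf word
                children.append(tokens[pos])
                pos += 1
        pos += 1                       # skip ")"
        return (label, children)

    return node()

def words(tree):
    """Leaf words under a node joined by spaces, skipping determiners (DT)."""
    if isinstance(tree, str):
        return tree
    return " ".join(words(c) for c in tree[1]
                    if not (isinstance(c, tuple) and c[0] == "DT"))

def noun_phrases(tree):
    """Yield every NP node, outer phrases before the phrases nested inside them."""
    if isinstance(tree, str):
        return
    if tree[0] == "NP":
        yield tree
    for child in tree[1]:
        yield from noun_phrases(child)

def analyze(tree):
    """Apply the three illustrative rules to each NP and return the findings."""
    findings = []
    for _, children in noun_phrases(tree):
        labels = [c[0] for c in children]
        if labels == ["NP", "PP"] and [c[0] for c in children[1][1]] == ["IN", "NP"]:
            prep, inner_np = children[1][1]
            if words(prep) == "of":        # "X of Y" -> composite entity of X and Y
                findings.append(("composite", words(children[0]), words(inner_np)))
            elif words(prep) == "by":      # "X by Y" -> X modifies entity Y
                findings.append(("modifies", words(children[0]), words(inner_np)))
        elif labels == ["JJ", "NN"]:       # adjective + noun -> modified entity
            findings.append(("modifies", words(children[0]), words(children[1])))
    return findings

for finding in analyze(parse_tree(PARSE)):
    print(finding)
```

On this sentence the rules produce the modifier reading for estrogen, the composite “adenomatous hyperplasia” + “endometrium”, and “adenomatous” as a modifier of “hyperplasia”; determiners are dropped from the printed phrases.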
16
Method – Identify entities and relationships in Parse Tree
[Figure: the parse tree of the sentence above (TOP → S → NP VP, with VBZ “induces” heading the VP), with modifiers, modified entities and composite entities highlighted; MeSH IDs D004967, D006965 and D004717 and UMLS ID T147 are attached to the recognized entities]
17
Representation – Resulting RDF
[Figure: RDF graph (legend: modifiers, modified entities, composite entities) in which modified_entity1 (hasPart estrogen, hasModifier “An excessive endogenous or exogenous stimulation”) induces composite_entity1, whose parts (hasPart) are modified_entity2 (adenomatous hyperplasia) and endometrium; edges are labeled hasModifier, hasPart and induces]
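The graph above can be written out as RDF triples. The sketch below serializes one reading of it to N-Triples with plain Python; the namespace http://example.org/ and the exact edge assignment (which node each hasPart and hasModifier edge connects) are assumptions reconstructed from the sentence, not the dissertation’s actual output.

```python
BASE = "http://example.org/"   # hypothetical namespace, not taken from the dissertation

def iri(name):
    return f"<{BASE}{name}>"

def literal(text):
    """N-Triples plain literal with backslashes and double quotes escaped."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples; objects already quoted stay literals."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else iri(o)
        lines.append(f"{iri(s)} {iri(p)} {obj} .")
    return "\n".join(lines)

# One assumed reading of the graph for "An excessive endogenous or exogenous
# stimulation by estrogen induces adenomatous hyperplasia of the endometrium".
TRIPLES = [
    ("modified_entity1", "hasModifier", literal("An excessive endogenous or exogenous stimulation")),
    ("modified_entity1", "hasPart", "estrogen"),
    ("modified_entity1", "induces", "composite_entity1"),
    ("composite_entity1", "hasPart", "modified_entity2"),
    ("composite_entity1", "hasPart", "endometrium"),
    ("modified_entity2", "hasModifier", literal("adenomatous")),
    ("modified_entity2", "hasPart", "hyperplasia"),
]

print(to_ntriples(TRIPLES))
```

Plain strings keep the sketch dependency-free; an RDF library would normally handle escaping and namespaces.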
18
Preliminary Results
Swanson’s discoveries – Associations between Migraine and Magnesium [Hearst99]
• stress is associated with migraines
• stress can lead to loss of magnesium
• calcium channel blockers prevent some migraines
• magnesium is a natural calcium channel blocker
• spreading cortical depression (SCD) is implicated in some migraines
• high levels of magnesium inhibit SCD
• migraine patients have high platelet aggregability
• magnesium can suppress platelet aggregability
Data sets were generated by using these entities (marked in red on the original slide) as Boolean keyword queries against PubMed
Bidirectional breadth-first search was used to find paths in the resulting RDF
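A minimal bidirectional breadth-first search, assuming the RDF graph has been loaded into an adjacency map and that edges may be followed in either direction. The toy graph reuses terms from the list above, but its edges are hypothetical.

```python
from collections import deque

def bidirectional_bfs(graph, start, goal):
    """Shortest path (by edge count) between start and goal, or None.

    `graph` maps each node to an iterable of neighbours. The two searches
    alternate, each expanding one full BFS level, and stop when they meet.
    """
    if start == goal:
        return [start]
    parents_fwd, parents_bwd = {start: None}, {goal: None}
    frontier_fwd, frontier_bwd = deque([start]), deque([goal])

    def expand(frontier, parents, other_parents):
        for _ in range(len(frontier)):        # one full level
            node = frontier.popleft()
            for nbr in graph.get(node, ()):
                if nbr not in parents:
                    parents[nbr] = node
                    if nbr in other_parents:
                        return nbr            # the two searches met here
                    frontier.append(nbr)
        return None

    while frontier_fwd and frontier_bwd:
        meet = expand(frontier_fwd, parents_fwd, parents_bwd)
        if meet is None:
            meet = expand(frontier_bwd, parents_bwd, parents_fwd)
        if meet is not None:
            path, node = [], meet
            while node is not None:           # walk back to start
                path.append(node)
                node = parents_fwd[node]
            path.reverse()
            node = parents_bwd[meet]
            while node is not None:           # walk on to goal
                path.append(node)
                node = parents_bwd[node]
            return path
    return None

graph = {
    "migraine": ["stress", "spreading cortical depression"],
    "stress": ["migraine", "magnesium loss"],
    "magnesium loss": ["stress", "magnesium"],
    "spreading cortical depression": ["migraine", "magnesium"],
    "magnesium": ["magnesium loss", "spreading cortical depression"],
}
print(bidirectional_bfs(graph, "migraine", "magnesium"))
# ['migraine', 'spreading cortical depression', 'magnesium']
```

Growing from both ends keeps the explored region much smaller than a single search from one end when the graph branches heavily.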
19
Paths between Migraine and Magnesium
Paths are considered interesting if they contain one or more named relationships other than hasPart or hasModifier
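The interestingness criterion reduces to a test over a path’s edge labels. A sketch, assuming a path is a list of (subject, predicate, object) triples; the example paths and node names are hypothetical:

```python
STRUCTURAL = {"hasPart", "hasModifier"}

def named_relationships(path):
    """Edge labels on a path that are not the structural hasPart/hasModifier links."""
    return [p for _, p, _ in path if p not in STRUCTURAL]

def is_interesting(path):
    """A path is interesting if it contains at least one named relationship."""
    return bool(named_relationships(path))

structural_only = [("migraine", "hasPart", "x1"), ("x1", "hasModifier", "magnesium")]
with_named = [("migraine", "caused_by", "x2"), ("x2", "hasPart", "magnesium")]
print(is_interesting(structural_only), is_interesting(with_named))
# False True
```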
20
An example of such a path
[Figure: a path between migraine (D008881) and magnesium (D008274) passing through platelet (D001792), collagen (D003094) and the intermediate entities me_3142 (“by a primary abnormality of platelet behavior”) and me_2286 (“13% and 17% ADP and collagen induced platelet aggregation”), with edges labeled caused_by, hasPart and stimulated]
Conclusion
Rules over parse trees are able to extract structure from sentences
Our definitions of compound and modified entities are critical for identifying both implicit and explicit relationships
Swanson’s discovery can be automated if recall can be improved. What hurts recall?
Interesting Observations from this preliminary work
22
Observations – Sentence characteristics and Parsing
Sentence characteristics
– Not all sentences have such “neat” parses
  • Long, convoluted sentences (some >100 words) result in low recall
– Not all are semantically that straightforward
  • Even seemingly simple ones are tricky to represent in computer-readable form
Parser issues
– The constituency parses used give information about
  • Nested phrasal containment
  • Our algorithm leverages this: entities are found as contiguous phrases
– Dearth of features: parsers that provide richer features are needed
(TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ exogenous) ) (NN stimulation) ) (PP (IN by) (NP (NN estrogen) ) ) ) (VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia) ) (PP (IN of) (NP (DT the) (NN endometrium) ) ) ) ) ) )
23
Observations – Complex entities with nesting and overlapping structure
Complex Entities
– Chevy Chase, Chevy Chase bank building on 5th and 3rd
– Sentential forms of entities are often quite complex
• (e.g. Reactive oxygen intermediate-dependent NF-kappaB activation)
– Structurally and Semantically complex nested entities
• Human Immunodeficiency Virus Type-2 Enhancer Activity: [[[Human Immunodeficiency Virus]_disease Type-2]_disease Enhancer Activity]_biological_process
• CD28 surface receptor: [[CD28]_protein_molecule surface receptor]_protein_family_or_group
24
Possible Strategies
General strategy for dealing with complex sentences
– Identify and extract complex entities across a given corpus
– Replace occurrences of all complex entities with single-word placeholders
– Re-parse the sentence to extract relationships
Tactic used
– Use a feature-rich parse, such as a dependency parse, to segment sentences into Subject-Predicate-Object
– Subjects and Objects represent compound entities
– Use corpus statistics to predict constituents of compound entities
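The placeholder step of the general strategy can be sketched as longest-first replacement. This assumes the complex entities have already been identified; the placeholder format `ENTITYn` and the example sentences are hypothetical.

```python
import re

def replace_compound_entities(sentence, entities):
    """Replace known multi-word entities with single-token placeholders.

    Longer entities are tried first, so an entity nested inside a longer one
    does not break up the enclosing entity. Returns (new sentence, mapping).
    """
    mapping = {}
    for i, entity in enumerate(sorted(entities, key=len, reverse=True)):
        placeholder = f"ENTITY{i}"
        pattern = r"\b" + re.escape(entity) + r"\b"
        new_sentence, count = re.subn(pattern, placeholder, sentence)
        if count:
            mapping[placeholder] = entity
            sentence = new_sentence
    return sentence, mapping

known = ["NF-kappaB", "Reactive oxygen intermediate-dependent NF-kappaB activation"]
print(replace_compound_entities(
    "Reactive oxygen intermediate-dependent NF-kappaB activation is blocked by antioxidants.",
    known))
# ('ENTITY0 is blocked by antioxidants.', {'ENTITY0': 'Reactive oxygen intermediate-dependent NF-kappaB activation'})
```

The rewritten sentence is short and flat enough to re-parse, and the mapping restores the full entity in the extracted triple.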
Unsupervised Joint Extraction of Compound Entities and Relationships
Cartic Ramakrishnan, Pablo N. Mendes, Shaojun Wang and Amit P. Sheth: "Unsupervised Discovery of Compound Entities for Relationship Extraction", EKAW 2008 - 16th International Conference on Knowledge Engineering and Knowledge Management Knowledge Patterns
26
Joint Extraction approach
Dependency parse – Stanford Parser
[Figure: dependency parse with arcs drawn from governor to dependent]
amod = adjectival modifier
nsubjpass = nominal subject in passive voice
27
Stanford Dependency Hierarchy
28
Hierarchy used to generalize the rules
Small set of rules over dependency types dealing with
– modifiers (amod, nn, etc.)
– subjects and objects (nsubj, nsubjpass, etc.)
Since dependency types are arranged in a hierarchy
– We use this hierarchy to generalize the more specific rules
– There are only 4 rules in our current implementation
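Generalizing a rule through the hierarchy amounts to an ancestor test. The sketch below hard-codes a small excerpt of the Stanford typed-dependency hierarchy (child to parent, as described in the Stanford Dependencies manual; check it against the parser version in use). The three role rules are hypothetical and are not the 4 rules of the implementation.

```python
# Partial excerpt of the Stanford dependency hierarchy: child -> parent.
PARENT = {
    "arg": "dep", "mod": "dep",
    "subj": "arg", "comp": "arg",
    "obj": "comp", "dobj": "obj", "iobj": "obj", "pobj": "obj",
    "nsubj": "subj", "nsubjpass": "nsubj", "csubj": "subj",
    "amod": "mod", "nn": "mod", "advmod": "mod", "prep": "mod",
}

def is_a(dep_type, ancestor):
    """True if dep_type equals ancestor or lies below it in the hierarchy."""
    while dep_type is not None:
        if dep_type == ancestor:
            return True
        dep_type = PARENT.get(dep_type)
    return False

# Hypothetical generalized rules: each role names the most general type it covers.
RULES = {"modifier": "mod", "subject": "subj", "object": "obj"}

def role_of(dep_type):
    """Return the first role whose general type subsumes dep_type, else None."""
    for role, general_type in RULES.items():
        if is_a(dep_type, general_type):
            return role
    return None

print(role_of("nsubjpass"), role_of("amod"), role_of("dobj"))
# subject modifier object
```

Writing each rule against a general type (mod, subj, obj) lets one rule cover every more specific dependency below it.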
Carroll, J., G. Minnen and E. Briscoe (1999) 'Corpus annotation for parser evaluation'. In Proceedings of the EACL-99 Post-Conference Workshop on Linguistically Interpreted Corpora, Bergen, Norway. 35-41. Also in Proceedings of the ATALA Workshop on Corpus Annotés pour la Syntaxe - Treebanks, Paris, France. 13-20.
29
Algorithm
[Figure: a dependency parse annotated with the relationship head, the subject head and two object heads]
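A sketch of anchoring a triple on the relationship head, assuming basic Stanford-style dependencies written as (relation, governor, dependent) with `word-index` tokens. The head-expansion rule (follow amod and nn dependents) and the example dependencies are assumptions for illustration; prepositional objects, conjunctions and the hierarchy-based generalization are left out.

```python
MODIFIER_TYPES = {"amod", "nn"}
SUBJECT_TYPES = {"nsubj", "nsubjpass"}
OBJECT_TYPES = {"dobj"}

def position(token):
    """Tokens look like 'word-3'; return the sentence position."""
    return int(token.rsplit("-", 1)[1])

def word(token):
    return token.rsplit("-", 1)[0]

def expand(head, deps):
    """The head plus all modifier dependents (transitively), in sentence order."""
    members, stack = {head}, [head]
    while stack:
        gov = stack.pop()
        for rel, g, d in deps:
            if g == gov and rel in MODIFIER_TYPES and d not in members:
                members.add(d)
                stack.append(d)
    return " ".join(word(t) for t in sorted(members, key=position))

def extract_triples(deps):
    """Subject-predicate-object triples, one per (subject head, object head) pair."""
    triples = []
    heads = {g for rel, g, _ in deps if rel in SUBJECT_TYPES}   # relationship heads
    for verb in sorted(heads, key=position):
        subjects = [d for rel, g, d in deps if g == verb and rel in SUBJECT_TYPES]
        objects = [d for rel, g, d in deps if g == verb and rel in OBJECT_TYPES]
        for subj in subjects:
            for obj in objects:
                triples.append((expand(subj, deps), word(verb), expand(obj, deps)))
    return triples

DEPS = [
    ("amod", "stimulation-3", "Excessive-1"),
    ("nn", "stimulation-3", "estrogen-2"),
    ("nsubj", "induces-4", "stimulation-3"),
    ("amod", "hyperplasia-6", "adenomatous-5"),
    ("dobj", "induces-4", "hyperplasia-6"),
]
print(extract_triples(DEPS))
# [('Excessive estrogen stimulation', 'induces', 'adenomatous hyperplasia')]
```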
30
Preliminary results
31
Extracted Triples
32
Analysis of compound entities
Errors – some sources
– With only 4 rules, some compound entities end up composed of other compound entities connected by
  • prepositions
  • punctuation
– Verbs misinterpreted as nouns by the parser wind up as part of entities
A fix – corpus statistics
– Mutual information
• Human Immunodeficiency Virus Type-2 Enhancer Activity: [[[Human Immunodeficiency Virus]_disease Type-2]_disease Enhancer Activity]_biological_process
33
Predicting the constituents of compound entities
Given compound entities
– Predict which token subsequences are entities
– Identify their semantic type in UMLS
Central idea in constituent prediction
– A token sequence is likely to form an entity if that sequence occurs often across a given corpus
– But mere co-occurrence does not work
  • Ileal Neoplasm vs. Neoplasm of the Ileum
– Instead we use dependency co-occurrence
34
What is Mutual information?
A measure for discovering interesting word collocations
– The information that two random variables share: it measures how much knowing one of these variables reduces our uncertainty about the other
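For reference, the standard definition for two random variables (the second equality is the “reduction in uncertainty” reading above), and the pointwise form commonly used to score a single word pair in collocation work:

```latex
I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} = H(X) - H(X \mid Y)

\operatorname{PMI}(w_i, w_j) = \log\frac{p(w_i, w_j)}{p(w_i)\,p(w_j)}
```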
35
Dependency-based mutual information
Collecting dependencies
– We parse 800,000 sentences using the Stanford parser
– Index all dependencies using a Lucene index
Advantage
– Captures long-range dependencies between words
– Adjacency not required
[Formula terms: rel(wi, wj), rel(wj, wi), rel(wi, *), rel(*, wi)]
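The slide lists the counts but not the exact estimator. One plausible sketch, assuming probabilities are estimated as fractions of all indexed dependencies and the dependency type is ignored; the toy dependencies are hypothetical:

```python
import math
from collections import Counter

def dependency_pmi(dependencies):
    """Return pmi(w1, w2) estimated from (rel, governor, dependent) triples.

    Assumed estimates (not taken from the dissertation):
      p(w1, w2) = [#rel(w1, w2) + #rel(w2, w1)] / N
      p(w)      = [#rel(w, *)   + #rel(*, w)]   / N
    where N is the number of indexed dependencies.
    """
    n = len(dependencies)
    pair_counts, word_counts = Counter(), Counter()
    for _rel, gov, dep in dependencies:
        pair_counts[frozenset((gov, dep))] += 1   # direction-insensitive pair
        word_counts[gov] += 1                     # rel(w, *)
        word_counts[dep] += 1                     # rel(*, w)

    def pmi(w1, w2):
        joint = pair_counts[frozenset((w1, w2))]
        if joint == 0:
            return float("-inf")                  # never linked by a dependency
        return math.log(joint * n / (word_counts[w1] * word_counts[w2]))

    return pmi

toy = [
    ("amod", "neoplasm", "ileal"),
    ("amod", "neoplasm", "ileal"),
    ("prep_of", "neoplasm", "ileum"),
    ("amod", "cell", "ileal"),
    ("nsubj", "grows", "neoplasm"),
]
pmi = dependency_pmi(toy)
print(pmi("neoplasm", "ileal"))
```

Because pairs are counted through dependencies rather than adjacency, “ileal neoplasm” and “neoplasm of the ileum” contribute to different word pairs even though both contain similar tokens.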
36
Predicting constituents
Greedy mutual-information-based word grouping is used to predict constituents
– Given a sequence of tokens as input
– Compute dependency-based mutual information for all token pairs
– Starting at each token in turn, attach to it all other tokens that increase the average mutual information of the token group so far
Variants of this algorithm
– Compute the average dependency-based mutual information across the corpus and use that as the threshold
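The greedy grouping, sketched with the corpus-average threshold variant, where the threshold serves as the score of a one-token group. The MI values in the example are invented for illustration; in practice they would come from dependency-based MI as described on the previous slide.

```python
from itertools import combinations

def average_pair_mi(group, tokens, mi):
    """Mean MI over all unordered token pairs in `group` (token positions)."""
    pairs = list(combinations(group, 2))
    return sum(mi(tokens[a], tokens[b]) for a, b in pairs) / len(pairs)

def predict_constituents(tokens, mi, threshold):
    """Starting at each token, attach every other token that raises the group's average MI."""
    groups = set()
    for start in range(len(tokens)):
        group, score = [start], threshold        # a singleton scores the threshold
        for other in range(len(tokens)):
            if other == start:
                continue
            new_score = average_pair_mi(group + [other], tokens, mi)
            if new_score > score:
                group, score = group + [other], new_score
        if len(group) > 1:
            groups.add(tuple(sorted(group)))
    return [" ".join(tokens[i] for i in g) for g in sorted(groups)]

# Invented pairwise MI values for the HIV example; unlisted pairs score 0.
TABLE = {frozenset(p): v for p, v in [
    (("Human", "Immunodeficiency"), 5.0),
    (("Human", "Virus"), 5.0),
    (("Immunodeficiency", "Virus"), 6.0),
    (("Enhancer", "Activity"), 4.0),
]}

def toy_mi(a, b):
    return TABLE.get(frozenset((a, b)), 0.0)

tokens = ["Human", "Immunodeficiency", "Virus", "Type-2", "Enhancer", "Activity"]
print(predict_constituents(tokens, toy_mi, threshold=1.0))
# ['Human Immunodeficiency Virus', 'Enhancer Activity']
```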
37
Results
Results from the BioInfer corpus
• Compound entities found
• Constituent entities predicted
• Triples found
Results from the OMIM corpus
• Compound entities found
• Triples found
38
Evaluations
Problems with automatic evaluations using a gold standard
– Meant for specific types of entities
– Mark entity mentions, not compound concepts
Manual evaluations
– Expensive
– Require a domain expert
Our solution
– Build a tool to compare S-P-O triples with the original sentence
– Allow the evaluator to assess “correctness”
39
Evaluation
Manual Evaluation
– Test if the RDF conveys the same “meaning” as the sentence
– Juxtapose the triple with the sentence
– Allow the user to assess correctness/incorrectness of the subject, object and triple
40
Demo of Evaluation tool
Dataset & result characteristics
– OMIM: 90,000 sentences obtained from 1248 records returned for the keyword “renal”
– 328 Mb of RDF generated
  • Containing 155K triples
  • 4045 entities are related to UMLS classes
  • 126 of the 136 classes in UMLS are instantiated
– Demo
41
Evaluation conducted using this tool
Evaluation on OMIM RDF
– 1938 triple-sentence comparisons
– 2 evaluators (not domain experts)
– The system is currently in use by domain experts at
  • CCHMC (2 experts)
  • Awaiting results
42
Results of Manual Evaluation
43
Applications
Triple-based semantic search
Semantic Browser
Semantic Metadata Guided Knowledge Explorations and Discovery
45
Supporting Knowledge Discovery
[Figure: the estrogen / adenomatous hyperplasia / endometrium RDF graph shown earlier under “Representation – Resulting RDF”, with its hasModifier, hasPart and induces edges]
46
Discovering complex connection patterns
Discovering informative subgraphs (Harry Potter)
– Given a pair of end-points (entities)
– Produce a subgraph with relationships connecting them such that
  • the subgraph is small enough to be visualized
  • and it contains relevant “interesting” connections
We defined an interestingness measure based on the ontology schema
– In the future, biomedical domain scientists will control this with the help of a browsable ontology
– Our interestingness measure takes into account
  • specificity of the relationships and entity classes involved
  • rarity of relationships, etc.
Cartic Ramakrishnan, William H. Milnor, Matthew Perry, Amit P. Sheth: Discovering informative connection subgraphs in multi-relational graphs. SIGKDD Explorations 7(2): 56-63 (2005)
47
Schema-driven edge weight assignment
[Figure: schema-driven edge weights. Schema: class company with subclasses Entertainment company, Manufacturing company, Oil company, Automotive company, Electronics company and Sporting goods company. Instances: Ford Motors and Cartic’s Company. Edges carry weights such as 1.0, 0.67, 0.33 and <0.5.]
48
Heuristics used to bias edge weights
Two factors influencing interestingness
49
Algorithm
• Bidirectional lock-step growth from S and T
• Choice of next node based on interestingness measure
• Stop when there are enough connections between the frontiers
• This is treated as the candidate graph
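A sketch of the lock-step growth, assuming interestingness scores are already available as edge weights and modeling “enough connections” as a count of edges joining the two explored regions; the parameter names `enough` and `max_steps` and the toy graph are hypothetical.

```python
import heapq

def candidate_graph(graph, s, t, enough=2, max_steps=1000):
    """Grow regions around s and t alternately; return their union once
    `enough` edges join them. graph: {node: {neighbour: weight}}, undirected."""
    regions = [{s}, {t}]
    heaps = [[], []]
    for side, root in enumerate((s, t)):
        for nbr, weight in graph[root].items():
            heapq.heappush(heaps[side], (-weight, nbr))   # max-heap via negation

    def crossing_edges():
        """Number of edges joining the two explored regions."""
        return sum(1 for u in regions[0] for v in graph[u] if v in regions[1])

    side = 0
    for _ in range(max_steps):
        if crossing_edges() >= enough:
            break
        heap, mine, theirs = heaps[side], regions[side], regions[1 - side]
        while heap and (heap[0][1] in mine or heap[0][1] in theirs):
            heapq.heappop(heap)                           # drop stale frontier entries
        if heap:
            _, node = heapq.heappop(heap)                 # most interesting next node
            mine.add(node)
            for nbr, weight in graph[node].items():
                if nbr not in mine and nbr not in theirs:
                    heapq.heappush(heap, (-weight, nbr))
        elif not heaps[1 - side]:
            break                                         # neither side can grow
        side = 1 - side                                   # lock-step alternation
    return regions[0] | regions[1]

toy = {
    "S": {"a": 0.9, "b": 0.2},
    "a": {"S": 0.9, "c": 0.8},
    "b": {"S": 0.2, "T": 0.3},
    "c": {"a": 0.8, "T": 0.7},
    "T": {"c": 0.7, "b": 0.3},
}
print(sorted(candidate_graph(toy, "S", "T", enough=1)))
# ['S', 'T', 'a', 'c']
```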
50
Algorithm
Model the candidate graph as an electrical circuit
– S is the source and T the sink
– Edge weights are treated as conductance values
– Using Ohm’s and Kirchhoff’s laws
  • find maximum current flow paths through the candidate graph from S to T
– At each step, add this path to the output graph to be displayed; repeat this process until a predefined number of nodes is reached
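A simplified sketch of the circuit model: node voltages are found by Gauss-Seidel iteration with V(S) = 1 and V(T) = 0 (at convergence every other node satisfies Kirchhoff’s current law), and a path is then built greedily by following, from S, the edge carrying the most current (Ohm’s law: conductance × voltage drop). The greedy path choice is an assumed stand-in, not necessarily the exact path-selection procedure used in the work; the toy graph is hypothetical.

```python
def node_voltages(graph, s, t, iterations=500):
    """Voltages with V(s)=1, V(t)=0; graph: {node: {neighbour: conductance}}."""
    v = {node: 0.0 for node in graph}
    v[s] = 1.0
    for _ in range(iterations):
        for node, nbrs in graph.items():
            if node in (s, t):
                continue
            total = sum(nbrs.values())
            if total > 0:
                # Net current into the node is zero when its voltage is the
                # conductance-weighted average of its neighbours' voltages.
                v[node] = sum(g * v[n] for n, g in nbrs.items()) / total
    return v

def max_current_path(graph, s, t):
    """From s, repeatedly follow the edge with the largest current toward lower voltage."""
    v = node_voltages(graph, s, t)
    path, node = [s], s
    while node != t:
        downhill = {n: g * (v[node] - v[n]) for n, g in graph[node].items() if v[n] < v[node]}
        if not downhill:
            return None
        node = max(downhill, key=downhill.get)
        path.append(node)
    return path

circuit = {
    "S": {"a": 1.0, "b": 0.5},
    "a": {"S": 1.0, "T": 1.0},
    "b": {"S": 0.5, "T": 0.5},
    "T": {"a": 1.0, "b": 0.5},
}
print(max_current_path(circuit, "S", "T"))
# ['S', 'a', 'T']
```

Repeating the path search and adding each result to the output graph until the node budget is reached gives the display subgraph described above.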
51
Results
52
Overview
• Define Knowledge Discovery
• Contributions
  – Text Mining
  – Knowledge Discovery
• Past work
• Future Work
53
Past work
Automatic hierarchy creation from text
– Input: text
– Output: topic hierarchy
Ranking complex relationships in RDF graphs
– Ranking paths in RDF graphs
Conflict-of-interest detection
– Identifying COI in the peer review process
54
Other major papers influencing my work
Position papers
– Three types of semantics
– Survey of Semantic Web technologies
– Relationship Web
55
Overview
• Define Knowledge Discovery
• Contributions
  – Text Mining
  – Knowledge Discovery
• Past work
  – Taxonomy construction
  – Ranking complex relationships in RDF graphs
  – Conflict of Interest detection
• Current & Future Work
56
Hypothesis Driven retrieval of Scientific Literature
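[Figure: a complex query, a hypothesis graph over Migraine, Stress, Patient, Magnesium and Calcium Channel Blockers with relationships such as affects, isa and inhibit, is issued against PubMed, and supporting document sets are retrieved]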
Keyword query: Migraine[MH] + Magnesium[MH]
57
Strength of a connection
Associating a confidence value with extracted relationships
– Information loss in the extraction process
– Temporal aspects
– Bibliometrics
  • Venue impact
  • Author expertise
– Multiple schemas
58
Mechanistic Models
59
Publications
• Conference Publications:
– “Ontology Learning via Unsupervised Joint Entity and Relationship Extraction” Manuscript under preparation WWW2009
– “Semantic Search over Biomedical Literature” Manuscript under preparation WWW2009
– “Joint Extraction of Compound Entities and Relationships to support Semantic Browsing over Biomedical Literature” Cartic Ramakrishnan, Pablo N. Mendes, Rodrigo A.T.S da Gama, Guilherme C. N. Ferreira & Amit P. Sheth Under Review
– "Unsupervised Discovery of Compound Entities for Relationship Extraction" Cartic Ramakrishnan, Pablo N. Mendes, Shaojun Wang and Amit P. Sheth EKAW 2008 - 16th International Conference on Knowledge Engineering and Knowledge Management Knowledge Patterns
– Cartic Ramakrishnan, Krys Kochut, Amit P. Sheth: A Framework for Schema-Driven Relationship Discovery from Unstructured Text. International Semantic Web Conference 2006: 583-596
– Boanerges Aleman-Meza, Meenakshi Nagarajan, Cartic Ramakrishnan, Amit Sheth, Budak Arpinar, Li Ding, Pranam Kolari, Anupam Joshi, and Tim Finin, "Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection", WWW 2006: 407-416 Edinburgh, Scotland, May 2006.
– C. Ramakrishnan et al. Dairy producer's Assistant: A Web-based Expert System for Dairy Producers. In 39th Annual Southeastern Regional ACM Conference, Athens, GA, Mar 16-17, 2001.
– A.L. Bahuman, J. Li, W.D. Potter, C. Ramakrishnan, J.N. Rushton. KublAI: An Intelligent Agent for Microsoft's Age of Kings. Extended abstract of a work in progress. The Artificial Intelligence Center, University of Georgia, Athens, GA. In 39th Annual Southeastern Regional ACM Conference, Athens, GA, Mar 16-17, 2001.
60
Publications
• Journal Publications:
– C. Ramakrishnan, W. H. Milnor, M. Perry and A. P. Sheth "Discovering Informative Connection Subgraphs in Multi-relational Graphs". In SIGKDD Explorations 7(2): 56-63 (2005)
– Boanerges Aleman-Meza, Christian Halaschek-Wiener, Ismailcem Budak Arpinar, Cartic Ramakrishnan, Amit P. Sheth: "Ranking Complex Relationships on the Semantic Web." IEEE Internet Computing 9(3): 37-44 (2005)
– V. Kashyap, C. Ramakrishnan, C. Thomas and A. Sheth "TaxaMiner: An Experimental Framework for Automated Taxonomy Bootstrapping." In International Journal of Web and Grid Services, Special Issue on Semantic Web and Mining Reasoning, September 2005.
– A. Sheth, B. Aleman-Meza, I. B. Arpinar, C. Halaschek, C. Ramakrishnan, C. Bertram, Y. Warke, K. Anyanwu, D. Avant, F. S. Arpinar, and K. Kochut. "Semantic Association Identification and Knowledge Discovery for National Security Applications." In Journal of Database Management, 16(1), 33-53, Jan-March 2005, Eds: L. Zhou and W. Kim.
– Amit P. Sheth, Cartic Ramakrishnan, Christopher Thomas: "Semantics for the Semantic Web: The Implicit, the Formal and the Powerful." In Int. J. Semantic Web Inf. Syst. 1(1): 1-18 (2005).
– Amit P. Sheth, Cartic Ramakrishnan: "Semantic (Web) Technology In Action: Ontology Driven Information Systems for Search, Integration and Analysis." In IEEE Data Eng. Bull. 26(4): 40-48 (2003).
61
Publications
• Workshop Publications
– Matthew Perry, Maciej Janik, Cartic Ramakrishnan, Conrad Ibañez, Ismailcem Budak Arpinar, Amit P. Sheth: Peer-to-Peer Discovery of Semantic Associations. P2PKM 2005
• Book Chapters
– B. Arpinar, A. Sheth, C. Ramakrishnan, L. Usery, M. Azami & M. Kwan, Geospatial Ontology Development and Semantic Analytics. In Handbook of Geographic Information Science, Eds: J. P. Wilson and A. S. Fotheringham, Blackwell Publishing.
– “Semantics for the Semantic Web: The Implicit, the Formal, and the Powerful” A. Sheth, C. Ramakrishnan, C. Thomas (Chapter 2.8 in Online and Distance Learning: Concepts, Methodologies, Tools and Applications, L. Tomei, Ed., Information Science Reference (an imprint of IGI Global), 2008).
• Posters
– V. Kashyap, C. Ramakrishnan, T.C. Rindflesch Towards (Semi-)automatic Generation of Bio-medical ontologies. In AMIA 2003 Annual Symposium on Biomedical and Health Informatics
– Cartic Ramakrishnan, Pablo N. Mendes and Amit P. Sheth “Ontology-driven data capture, representation, retrieval, and mining - Facilitating Knowledge Discovery in Biomedicine” Ohio Collaborative Conference on Bioinformatics (OCCBIO) 2008
• Tutorials
– Text Analytics for Semantic Computing - the good, the bad and the ugly
Instructors: Cartic Ramakrishnan, Meenakshi Nagarajan and Amit Sheth
62
Experiences
Teaching & Mentoring
– TA for Semantic Web (Fall 2003) and Semantic Enterprise (Fall 2004) @UGA
  • Helped Dr. Sheth with course design and grading
  • Guided graduate students in open-ended course projects
– Mentored interns @ Kno.e.sis Summer 2007 and Summer 2008
Collaborations & Internships
– Interned @
  • NLM, NIH Summer 2002, Summer 2004 – Dr. Vipul Kashyap
  • IBM Almaden Summer 2006 – Dr. Tanveer Syeda-Mahmood
– Research collaborations with
  • CCHMC – Dr. Bruce Aronow
  • AFRL – Dr. Victor Chan
Grant Writing
– NSF CISE grant December 2006 with Dr. Sheth & Dr. Dong
– NSF Cyber-discovery Initiative December 2007 with Dr. Sheth & Dr. Bruce Aronow
63
Acknowledgements
Special Thanks to Pablo N. Mendes, Christopher Thomas, Meena Nagarajan, Karthik Gomadam, Ajith Ranabahu, Dr. Shaojun Wang, Dr. Raymer
Thanks to all other Kno.e.sis members and our Summer 2008 interns Rodrigo Gama, Guilherme De Napoli, Kamal Baid, Hemant Purohit
Special thanks to Dr. Amit Sheth and my committee members
64
On a lighter note