ontology learning from swedish text

Upload: jdbothma

Post on 03-Apr-2018


  • 7/29/2019 Ontology Learning From Swedish Text

    1/29

    Ontology learning from Swedish text

    Jan Bothma

    Supervisor: Eva Blomqvist


    What is the problem?

    Computer support implemented manually

    Problem Goals State of the art Contribution Recap


    Berners-Lee, Tim, et al. (May 1, 2001). "The Semantic Web". Scientific American.


    What is the problem?

    Computer support implemented manually

    Encode semantics

    Encoding semantics is time consuming

      Domain expert knowledge

      Encoding knowledge a challenge in itself

      Keeping current


    One approach: Ontology learning

    Ontology learning from text

      NLP, Computational Linguistics, Information Extraction, Data mining, etc.

    Ontology learning from Swedish text

    Problem is to encode domain semantics from Swedish text


    Motivation and goals

    Investigate ontology learning (OL) from Swedish text

      Identify relevant tools

      Target further research

    Prototype system for OL from Swedish

      Much research in each subtask

      How does it fit together?


    Motivation and goals

    Investigate OL from Swedish text

    Prototype system for OL from Swedish

    Specifically

      Preprocess natural language Swedish corpus

      Extract evidence for important

        Concepts

        Subclass relations

        Non-taxonomic relations

      Open domain

      Semi-supervised


    Relevant state of the art

    Problem Goals State of the art

    Preprocess Concepts

    Taxonomy Relations OL Systems

    Contribution Recap


    Relevant state of the art

    Wilson Wong, Wei Liu, and Mohammed Bennamoun. 2012. Ontology learning from text: A look back and into the future. ACM Comput. Surv. 44, 4, Article 20 (September 2012), 36 pages.


    Relevant state of the art

    Part of Speech tagging

    Compound splitting

    Stemming/Lemmatisation

    Syntactic analysis

      Chunking

      Syntactic dependencies

    Word sense disambiguation

    Named entity recognition

    Coreference resolution


    Relevant state of the art

    Parsers

      Stanford

      TnT

      HunPOS

      MaltParser

      Saldo

    Systems

      GATE

      NLTK

      Korp

      Corpus Workbench


    Relevant state of the art

    Lexico-syntactic Patterns

    TF-IDF

    C-Value/NC-Value

    PageRank, graph-based

    Markov Logic Network-based syntactic parse


    Relevant state of the art

    General taxonomies (e.g. WordNet)

    Lexico-syntactic Patterns

    Agglomerative Clustering

    Distributional similarity

    Formal Concept Analysis

    Markov Logic Network-based syntactic parse


    Relevant state of the art

    Association rule mining

    Lexico-syntactic patterns

    Graph theory

    Markov Logic Network-based syntactic parse

    Compound splitting/paraphrasing


    Relevant state of the art

    Text2Onto

    OntoUSP

    OntoCMaps

    OntoGain

    Cross-language (Hjelm, Volk)


    Key idea and contribution


    Prototype ontology learning from Swedish text


    Preprocessing

    CRP tillhör en grupp proteiner, s k pentraxiner, och utgör en del av det akuta inflammationssvaret vid t ex akuta infektionssjukdomar.

    (CRP belongs to a group of proteins, so-called pentraxins, and forms part of the acute inflammatory response in, for example, acute infectious diseases.)

    Just kidding!


    Key idea and contribution

    Korp

      POS

      Lemmatisation

      Semantic dependencies

    GATE

      Traverse annotations programmatically

      Pattern match over annotations (JAPE)

    Problem Goals State of the art Contribution

    Preprocess

    Concepts Taxonomy Relations Evaluation

    Recap


    Key idea and contribution

    Noun phrase linguistic filter

      Adj* Noun+

    C-value

      Reward log(|candidate|)

      Reward frequency of candidate

      Penalise occurrence as substring

      Reward independence from containing candidates
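
The Adj* Noun+ filter can be sketched as a regular expression over part-of-speech tags. This is a minimal illustration, not the prototype's code: the tag names `JJ` and `NN`, the function name `noun_phrase_candidates`, and the tagged example are assumptions, since the slide does not say which tagset or API was used.

```python
import re

# Assumed tag names: "JJ" for adjectives and "NN" for nouns. The tagset
# actually used by the prototype is not stated on the slide.
ADJECTIVE_TAGS = {"JJ"}
NOUN_TAGS = {"NN"}


def noun_phrase_candidates(tagged_tokens):
    """Return the token runs that match Adj* Noun+ as tuples of words."""
    # Encode every token as one character so the filter becomes a plain regex.
    encoded = "".join(
        "A" if tag in ADJECTIVE_TAGS else "N" if tag in NOUN_TAGS else "x"
        for _, tag in tagged_tokens
    )
    return [
        tuple(word for word, _ in tagged_tokens[m.start():m.end()])
        for m in re.finditer(r"A*N+", encoded)
    ]


# Hypothetical tagging of a fragment of a Swedish medical sentence.
sentence = [
    ("det", "DT"), ("akuta", "JJ"), ("inflammationssvaret", "NN"),
    ("vid", "PP"), ("akuta", "JJ"), ("infektionssjukdomar", "NN"),
]
candidates = noun_phrase_candidates(sentence)
```

Only the longest run at each position is returned; nested sub-phrases would have to be enumerated separately if they are also wanted as candidates.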


    Term                        Confidence
    kardiovaskulär sjukdom      1
    läkartidningen nr           0.8797681
    mm hg                       0.6492587
    et al                       0.6396211
    fysisk aktivitet            0.4494908
    typ 2-diabetes              0.413715
    kardiovaskulära händelse    0.32784185
    hög blodtryck               0.32001647
    potentiell bindning         0.31713346
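
The four C-value criteria on this slide correspond to the commonly published C-value formula, which the sketch below follows. Two things are assumptions rather than facts from the slides: the frequencies are invented for illustration, and scaling by the top score (suggested by the best term having confidence 1) is a guess at how the confidence column was produced.

```python
import math


def is_nested(short, long):
    """True if `short` occurs as a contiguous word sequence inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_values(freq):
    """Score candidates (tuples of words) given their corpus frequencies."""
    scores = {}
    for term, f in freq.items():
        containers = [t for t in freq if len(t) > len(term) and is_nested(term, t)]
        length_weight = math.log2(len(term))  # reward log(|candidate|)
        if containers:
            # Penalise occurrence as a substring. Dividing by the number of
            # distinct containers rewards independence from any one of them.
            nested_mean = sum(freq[t] for t in containers) / len(containers)
            scores[term] = length_weight * (f - nested_mean)
        else:
            scores[term] = length_weight * f  # reward frequency of candidate
    return scores


def normalise(scores):
    """Scale scores so the best candidate gets confidence 1 (assumed step)."""
    top = max(scores.values())
    if top <= 0:
        return {term: 0.0 for term in scores}
    return {term: score / top for term, score in scores.items()}


# Hypothetical frequencies, for illustration only.
freq = {
    ("fysisk", "aktivitet"): 8,
    ("regelbunden", "fysisk", "aktivitet"): 2,
    ("typ", "2-diabetes"): 4,
}
scores = c_values(freq)
confidence = normalise(scores)
```

With a base-2 logarithm, single-word candidates score zero, which fits a filter aimed at multi-word terms.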


    Key idea and contribution

    Hierarchical agglomerative clustering

      Similarity measure

        Reward common phrase head

        Reward common words

        Penalise disjoint words

      Head = last noun in NP


    Blodtryck

      DiastoliskBlodtryck

      SystoliskBlodtryck

      HögBlodtryck
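
A minimal sketch of clustering with a similarity measure of this shape. The weights `head_weight` and `penalty`, the average-linkage rule, and the stopping `threshold` are all assumptions, since the slide gives only the three criteria. The sketch also assumes each term ends in its head noun and returns a flat cut of the hierarchy rather than the full tree.

```python
from itertools import combinations


def similarity(a, b, head_weight=2.0, penalty=0.5):
    """Compare two terms given as tuples of words."""
    common = set(a) & set(b)
    disjoint = set(a) ^ set(b)
    score = len(common) - penalty * len(disjoint)  # reward common, penalise disjoint
    if a[-1] == b[-1]:  # head = last noun in the noun phrase
        score += head_weight  # reward common phrase head
    return score


def cluster(terms, threshold=1.0):
    """Merge the most similar clusters until no pair reaches `threshold`."""
    clusters = [[term] for term in terms]

    def linkage(pair):
        # Average pairwise similarity between the two clusters.
        x, y = clusters[pair[0]], clusters[pair[1]]
        return sum(similarity(a, b) for a in x for b in y) / (len(x) * len(y))

    while len(clusters) > 1:
        best = max(combinations(range(len(clusters)), 2), key=linkage)
        if linkage(best) < threshold:
            break
        i, j = best  # combinations guarantees i < j
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


terms = [
    ("diastolisk", "blodtryck"),
    ("systolisk", "blodtryck"),
    ("hög", "blodtryck"),
    ("fysisk", "aktivitet"),
]
groups = cluster(terms)
```

Here the three terms headed by "blodtryck" end up in one cluster, which could then be placed under a Blodtryck concept as in the example above.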


    Key idea and contribution

    Hierarchical agglomerative clustering

    Lexico-syntactic pattern

      Super such as sub1, sub2, and/or sub3

      Super såsom sub1, sub2, och/eller sub3


    Breda och jourtunga specialiteter, såsom allmänkirurgi och familjemedicin,

    (Broad and on-call-heavy specialties, such as general surgery and family medicine,)

    JourtungaSpecialitet

      Familjemedicin

      Allmänkirurgi
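
The "Super såsom sub1, sub2, och/eller sub3" pattern can be approximated with a regular expression over raw text. This is a simplified sketch: it handles single-word terms only and does no lemmatisation or noun-phrase chunking, whereas the slides describe JAPE pattern matching over annotations in GATE. The names `TERM`, `PATTERN`, `SEPARATOR` and `subclass_evidence` are hypothetical.

```python
import re

# A single word that is not itself a conjunction.
TERM = r"(?!(?:och|eller)\b)\w+"

# Super [,] såsom sub1[, sub2 ...][[,] och|eller subN]
PATTERN = re.compile(
    rf"({TERM}),?\s+såsom\s+"
    rf"({TERM}(?:,\s*{TERM})*(?:,?\s+(?:och|eller)\s+{TERM})?)"
)
SEPARATOR = re.compile(r",?\s+(?:och|eller)\s+|,\s*")


def subclass_evidence(text):
    """Return (superclass word, [subclass words]) pairs found in `text`."""
    return [
        (m.group(1), SEPARATOR.split(m.group(2)))
        for m in PATTERN.finditer(text)
    ]


text = "Breda och jourtunga specialiteter, såsom allmänkirurgi och familjemedicin,"
evidence = subclass_evidence(text)
```

Getting from the matched word "specialiteter" to a concept such as JourtungaSpecialitet would additionally need the noun-phrase boundaries and lemmas from preprocessing.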


    Key idea and contribution


    Syntactic dependencies

    Ultraljud visade trombos av vena portae

    (Ultrasound showed thrombosis of the portal vein)

      root: visade

      subject: Ultraljud

      object: trombos
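
A sketch of how a non-taxonomic relation could be read off such a dependency parse: take the root verb and pair its subject with its object. The token representation (dicts with `word`, `head`, `rel`), the label names, and the attachments of the last three tokens are assumptions for illustration. The slide labels only the root, subject and object, and the parser's real output format is not shown.

```python
def subject_verb_object(tokens):
    """Return (subject, verb, object) triples for each root token."""
    triples = []
    for index, token in enumerate(tokens):
        if token["rel"] != "root":
            continue
        dependents = [t for t in tokens if t["head"] == index]
        subjects = [t["word"] for t in dependents if t["rel"] == "subject"]
        objects = [t["word"] for t in dependents if t["rel"] == "object"]
        triples.extend((s, token["word"], o) for s in subjects for o in objects)
    return triples


# Hypothetical parse of the example sentence; "head" is the index of the
# governing token, and "other" stands in for labels not shown on the slide.
parse = [
    {"word": "Ultraljud", "head": 1, "rel": "subject"},
    {"word": "visade", "head": None, "rel": "root"},
    {"word": "trombos", "head": 1, "rel": "object"},
    {"word": "av", "head": 2, "rel": "other"},
    {"word": "vena", "head": 3, "rel": "other"},
    {"word": "portae", "head": 4, "rel": "other"},
]
relations = subject_verb_object(parse)
```

The resulting triple links the two noun candidates through the verb, which is the kind of evidence a relation between concepts could be built on.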


    Evaluation


    Broad medical domain

      1381 articles

      12 + 8 hours

    MeSH C14 - Hjärt-kärlsjukdomar (cardiovascular diseases)

      312 articles

      3 + 1 hours

    Candidates

      Thousands of terms, hundreds of relations

    Useful evidence

      Hundreds of terms, dozens of relations

      Ranking helps


    Future work

    Ranking relations Thorough evaluation Provenance

    User interaction, e.g. Corpus and rationale Ontology

    Extensibility Evidence combination

    Methods Cross-language

    Ontology Consistency and Reasoning


    Recap

    Implementing computer support is manual

    Encode semantics

    Semi-supervised Ontology Learning
