
Automatic Report Generation from Ontologies: the MIAKT Approach

Kalina Bontcheva, Yorick Wilks
Department of Computer Science
University of Sheffield

Rationale

• Natural language generation (NLG) takes structured data from a knowledge base or ontology as input and produces natural language text

• It can be applied to provide automatic documentation of ontologies or to generate textual reports from formal knowledge

• Generated texts can be kept up to date automatically, so that they reflect changes in the ontology

The MIAKT Project

• Medical Imaging and Advanced Knowledge Technologies

• Breast cancer

• Triple assessment process
  – Oncologist – clinical assessment
  – Histopathologist – cytology
  – One or more radiologists – X-ray mammograms, MRI scans
  – Surgeon
  – Sometimes a radiographer

• Types of images – Mammograms, MRI scans, ultrasound…

The MIAKT Demonstrator

Semantic Image Annotation

The Domain Ontology

Generation Service Input

Generation Service Output

Generation Architecture

Removing Repeating Triples

• Based on the ontology – inverse properties

• <daml:ObjectProperty rdf:about="file:/...#involved_in_ta">
    <daml:inverseOf rdf:resource="file:/...#involve_patient"/>
  …

• involved_in_ta(01401_patient, ta-soton-1069) and involve_patient(ta-soton-1069, 01401_patient) state the same fact, so only one of them needs to be verbalised

• More complex reasoning would be required to detect facts that are entailed by facts already stated
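The inverse-property filtering above can be sketched as follows. This is a minimal illustration, not the MIAKT implementation. It assumes triples are plain `(subject, property, object)` tuples and that the `daml:inverseOf` declarations have already been read into a dictionary. The function name `remove_inverse_duplicates` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

def remove_inverse_duplicates(triples: List[Triple], inverse_of: Dict[str, str]) -> List[Triple]:
    """Drop exact repeats and triples already stated through an inverse property."""
    # inverseOf is symmetric: if p is the inverse of q, then q is the inverse of p.
    inverses = dict(inverse_of)
    for p, q in inverse_of.items():
        inverses.setdefault(q, p)
    kept: List[Triple] = []
    seen: Set[Triple] = set()
    for s, p, o in triples:
        inv = inverses.get(p)
        # Skip the triple if it, or its mirror image via the inverse property, was seen.
        if (s, p, o) in seen or (inv is not None and (o, inv, s) in seen):
            continue
        seen.add((s, p, o))
        kept.append((s, p, o))
    return kept

triples = [
    ("01401_patient", "involved_in_ta", "ta-soton-1069"),
    ("ta-soton-1069", "involve_patient", "01401_patient"),
]
print(remove_inverse_duplicates(triples, {"involved_in_ta": "involve_patient"}))
# [('01401_patient', 'involved_in_ta', 'ta-soton-1069')]
```

This only catches facts that are literally mirrored by an inverse declaration. Detecting facts entailed by other facts would need a reasoner, as the slide notes.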

Discourse Planning

• Schemas capture regular patterns in the domain and can be applied recursively

• Describe-Patient ->
    Patient-Attributes, Describe-Procedures

• Patient-Attributes ->
    [attribute(Patient, Attribute)],
    Patient-Attributes *
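One way to read these schemas is as recursive rules that consume facts and return them in text order. The Python sketch below follows that reading; it is not the MIAKT planner. The fact format `(relation, arg1, arg2)` and the relation names `"attribute"` and `"procedure"` are assumptions. The body of `describe_procedures` is a hypothetical stand-in, because the slide names that schema but does not define it.

```python
from typing import List, Tuple

Fact = Tuple[str, str, str]  # (relation, arg1, arg2)

def patient_attributes(patient: str, facts: List[Fact]) -> List[Fact]:
    """Patient-Attributes -> [attribute(Patient, Attribute)], Patient-Attributes*"""
    for i, fact in enumerate(facts):
        if fact[0] == "attribute" and fact[1] == patient:
            # Consume one attribute fact, then re-apply the schema to the remaining facts.
            return [fact] + patient_attributes(patient, facts[:i] + facts[i + 1:])
    return []  # no attribute facts left: the recursion stops

def describe_procedures(patient: str, facts: List[Fact]) -> List[Fact]:
    # Hypothetical stand-in: simply collects the patient's procedure facts.
    return [f for f in facts if f[0] == "procedure" and f[1] == patient]

def describe_patient(patient: str, facts: List[Fact]) -> List[Fact]:
    """Describe-Patient -> Patient-Attributes, Describe-Procedures"""
    return patient_attributes(patient, facts) + describe_procedures(patient, facts)

facts = [
    ("procedure", "01401_patient", "ta-soton-1069"),
    ("attribute", "01401_patient", "age"),  # hypothetical attribute facts
    ("attribute", "01401_patient", "sex"),
]
plan = describe_patient("01401_patient", facts)
print(plan)  # attributes first, then the procedure, regardless of input order
```

The schema, not the input order, fixes the order of the resulting text plan.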

The Property Hierarchy

• Special linguistically motivated properties were introduced to make the NLG modules more generic:
  – active-action (e.g. involve_patient)
  – passive-action (e.g. involved_in_ta)
  – attribute (e.g. has-age, has-size)
  – part-whole (e.g. consists-of)

• All properties in the ontology were made sub-properties of one of these four

• This is a more lightweight approach than using a complete linguistic ontology such as GUM (Generalised Upper Model)
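Because every domain property sits below one of the four linguistic properties, a generator can find a property's linguistic class by walking up the sub-property chain. The sketch below assumes the `rdfs:subPropertyOf` links have been loaded into a child-to-parent dictionary. The intermediate properties `has-margin` and `has-mass-feature` are hypothetical and exist only to show a two-step chain.

```python
from typing import Dict, Set

LINGUISTIC_PROPERTIES = {"active-action", "passive-action", "attribute", "part-whole"}

# Hypothetical child -> parent links mirroring rdfs:subPropertyOf in the ontology.
SUB_PROPERTY_OF: Dict[str, str] = {
    "involve_patient": "active-action",
    "involved_in_ta": "passive-action",
    "has-age": "attribute",
    "has-size": "attribute",
    "has-margin": "has-mass-feature",   # hypothetical
    "has-mass-feature": "attribute",    # hypothetical
    "consists-of": "part-whole",
}

def linguistic_class(prop: str, sub_property_of: Dict[str, str] = SUB_PROPERTY_OF) -> str:
    """Walk up the sub-property chain until one of the four linguistic properties is reached."""
    visited: Set[str] = set()
    while prop not in LINGUISTIC_PROPERTIES:
        if prop in visited or prop not in sub_property_of:
            raise ValueError(f"{prop!r} is not under any linguistic property")
        visited.add(prop)
        prop = sub_property_of[prop]
    return prop

print(linguistic_class("has-margin"))      # attribute
print(linguistic_class("involved_in_ta"))  # passive-action
```

The realisation modules then only need to handle four cases, whatever the domain ontology contains.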

Ontology-Based Aggregation

• Attribute and part-whole properties with the same first argument are joined, to produce more coherent sentences

• ATTR(Abnormality: 01401, Mass: 01401_mass)
  ATTR(Abnormality: 01401, Margin: i_m_microlob)
  ATTR(Abnormality: 01401, Shape: i_shape_round)
  ATTR(Abnormality: 01401, Diagnose: i_pr_malig)

• Without aggregation: The abnormality has a mass. The abnormality has a microlobulated margin. The abnormality has a round shape. The abnormality has a probably malignant assessment.

• With aggregation: The abnormality has a mass, a microlobulated margin, a round shape, and a probably …
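The grouping step can be sketched as follows: facts that share a relation and a first argument are collected, then realised as one coordinated sentence. This is a simplified illustration, not the MIAKT aggregation module, and it rests on several assumptions. The typed arguments (e.g. "Abnormality: 01401") are reduced to instance IDs. The `lexicon` entries are hypothetical and were chosen to match the wording in the example above. The article "a" is hard-coded rather than chosen per noun phrase.

```python
from typing import Dict, List, Tuple

Fact = Tuple[str, str, str]  # (relation, first argument, second argument)

def aggregate(facts: List[Fact]) -> Dict[Tuple[str, str], List[str]]:
    """Group facts sharing the same relation and first argument, keeping input order."""
    groups: Dict[Tuple[str, str], List[str]] = {}
    for rel, arg1, arg2 in facts:
        groups.setdefault((rel, arg1), []).append(arg2)
    return groups

def conjoin(phrases: List[str]) -> str:
    """Join phrases as 'a', 'a and b', or 'a, b, and c' (serial comma)."""
    if len(phrases) <= 2:
        return " and ".join(phrases)
    return ", ".join(phrases[:-1]) + ", and " + phrases[-1]

def realise_attrs(subject_np: str, values: List[str], lexicon: Dict[str, str]) -> str:
    # Assumes every value takes the indefinite article "a".
    return f"{subject_np} has {conjoin(['a ' + lexicon[v] for v in values])}."

facts = [
    ("ATTR", "01401", "01401_mass"),
    ("ATTR", "01401", "i_m_microlob"),
    ("ATTR", "01401", "i_shape_round"),
    ("ATTR", "01401", "i_pr_malig"),
]
lexicon = {  # hypothetical lexicon entries
    "01401_mass": "mass",
    "i_m_microlob": "microlobulated margin",
    "i_shape_round": "round shape",
    "i_pr_malig": "probably malignant assessment",
}
sentences = [realise_attrs("The abnormality", values, lexicon)
             for values in aggregate(facts).values()]
print(sentences[0])
```

Part-whole facts would form their own groups under the same key scheme, but they would need a different verb phrase from "has".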

Surface Realisation

• The input is an RDF statement and the concept that is going to be the subject of the sentence: ATTR(Abnormality: 01401, Mass: 01401_mass) + Abnormality: 01401

• ATTR and PART_OF relations are already handled by an existing realiser (HYLITE), which treats the RDF as a graph and finds a path through it, starting from the focused concept

• Active and passive action properties are mapped to semantic roles such as OBJ (object), PTNT (patient) and AGNT (agent)

• AGNT(Mammography: 01402, PRODUCE_RESULT)
  OBJ(PRODUCE_RESULT, Med_Image: 01402_left_cc)
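The role mapping and the focused-concept input can be illustrated together. An active-action fact becomes AGNT and OBJ roles around an event node, and the focused concept decides whether the clause comes out in the active or the passive voice. This is a hypothetical sketch, not HYLITE's algorithm. The function names, the lexicon entries and the verb strings are all assumptions, and passive-action properties (PTNT) are not covered.

```python
from typing import Dict, List, Tuple

Role = Tuple[str, str, str]  # (role, first argument, second argument)

def active_action_roles(subj: str, event: str, obj: str) -> List[Role]:
    """Map an active-action fact to AGNT/OBJ roles around an event node."""
    return [("AGNT", subj, event), ("OBJ", event, obj)]

def realise_event(roles: List[Role], focus: str, lexicon: Dict[str, str],
                  active_verb: str, passive_verb: str) -> str:
    agent = next(a for r, a, _ in roles if r == "AGNT")
    obj = next(b for r, _, b in roles if r == "OBJ")
    if focus == agent:  # focused concept is the agent: active voice
        return f"The {lexicon[agent]} {active_verb} the {lexicon[obj]}."
    if focus == obj:    # focused concept is the object: passive voice
        return f"The {lexicon[obj]} {passive_verb} by the {lexicon[agent]}."
    raise ValueError(f"focus {focus!r} does not take part in this event")

roles = active_action_roles("01402", "PRODUCE_RESULT", "01402_left_cc")
lexicon = {"01402": "mammography", "01402_left_cc": "left CC image"}  # hypothetical
print(realise_event(roles, "01402", lexicon, "produced", "was produced"))
print(realise_event(roles, "01402_left_cc", lexicon, "produced", "was produced"))
```

Keeping roles separate from surface strings lets the same facts be verbalised from either participant's point of view.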

Domain Portability

• Availability of lexical resources for the domain, e.g. UMLS and the SPECIALIST lexicon, or a lexicalised ontology

• The classification of the properties into the four linguistic ones – possible to do semi-automatically if there are good naming conventions

• The four linguistic properties may need to be extended with others if the domain requires it

• The main effort will be in the text-structuring patterns, which require a significant understanding of the system to modify

• Machine learning to induce text patterns from labelled examples
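Semi-automatic classification from naming conventions could look like the sketch below. Rules suggest a linguistic class where a property name is informative, and anything unmatched is returned as `None` for manual review. The rules and the verb list `ACTION_VERBS` are hypothetical. They were tuned to the example property names on the slides, not taken from MIAKT.

```python
import re
from typing import Optional

# Hypothetical verb stems; in practice these could come from a domain lexicon.
ACTION_VERBS = {"involve", "produce", "perform"}

def past_participle(verb: str) -> str:
    # Regular verbs only: "involve" -> "involved", "perform" -> "performed".
    return verb + "d" if verb.endswith("e") else verb + "ed"

PARTICIPLES = {past_participle(v) for v in ACTION_VERBS}

def suggest_class(prop: str) -> Optional[str]:
    """Suggest a linguistic super-property from the property name, or None for manual review."""
    tokens = re.split(r"[-_]", prop.lower())
    if tokens[:2] in (["consists", "of"], ["part", "of"], ["has", "part"]):
        return "part-whole"
    if tokens[0] == "has":
        return "attribute"
    if tokens[0] in PARTICIPLES:
        return "passive-action"
    if tokens[0] in ACTION_VERBS:
        return "active-action"
    return None  # naming convention gives no clue: leave it to a human

for name in ["involve_patient", "involved_in_ta", "has-age", "consists-of", "margin_type"]:
    print(name, "->", suggest_class(name))
```

The part-whole rule is checked before the generic "has" rule so that a name like "has-part" is not misread as an attribute.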

Conclusion

• Presented an approach for automatic generation of texts from ontologies

• MIAKT exploits information from the ontology in order to filter out repetitive information and group together similar facts

• The main contribution is showing how NLG tools can be designed to be easily customised by non-specialists through GUI tools

• New application: sekt.semanticweb.org

Further Info

• http://www.aktors.org/miakt/

• http://www.dcs.shef.ac.uk/~kalina/papers.html

• http://sekt.semanticweb.org

The MIAKT lexicon

• Currently contains 320+ terms lexicalising:
  – 76 concepts
  – 153 instances in the MIAKT ontology

• Created manually from:
  – BI-RADS and NHS documents
  – Online papers and Medline abstracts, used to verify and enrich the term entries with synonyms
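A lexicon like this can be modelled as entries keyed by ontology concept or instance. Each entry has a preferred term for generation and a list of synonyms, plus a reverse index from any surface form back to the ontology. This is a hypothetical data structure, not the MIAKT lexicon format. The class names are invented. The `MRI_scan` entry and its synonym are illustrative only and are not taken from the MIAKT lexicon.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class LexEntry:
    uri: str   # ontology concept or instance being lexicalised
    kind: str  # "concept" or "instance"
    term: str  # preferred term used in generated text
    synonyms: List[str] = field(default_factory=list)

class Lexicon:
    def __init__(self, entries: List[LexEntry]) -> None:
        self.by_uri: Dict[str, LexEntry] = {e.uri: e for e in entries}
        # Reverse index from every surface form (lower-cased) to its ontology URI.
        self.by_term: Dict[str, str] = {}
        for e in entries:
            for t in [e.term, *e.synonyms]:
                self.by_term[t.lower()] = e.uri

    def term_for(self, uri: str) -> str:
        """Preferred wording for generation."""
        return self.by_uri[uri].term

    def uri_for(self, text: str) -> Optional[str]:
        """Map a term or synonym back to the ontology, e.g. for annotation."""
        return self.by_term.get(text.lower())

lex = Lexicon([
    LexEntry("i_m_microlob", "instance", "microlobulated margin"),
    LexEntry("MRI_scan", "concept", "MRI scan", ["MR scan"]),  # illustrative entry
])
print(lex.term_for("i_m_microlob"))  # microlobulated margin
print(lex.uri_for("mr scan"))        # MRI_scan
```

Generation uses only the preferred term. The synonyms matter when text is mapped back onto the ontology.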