deli coling 2002 - w8: nlp & xml - sept. 1st, 2002 cascading xsl filters for content selection...

22
DELi COLING 2002 - W8: NLP & XML - Sept. 1st, 2002 Cascading XSL filters for content selection in multilingual document generation G. Barrutieta, J. Abaitua & J. Díaz 1 Cascading XSL filters for content selection in multilingual document generation G. Barrutieta, J. Abaitua & J. Díaz (DELi) COLING 2002 W8: NLP XML Sept. 1st, 2002

Post on 20-Dec-2015

216 views

Category:

Documents


0 download

TRANSCRIPT

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 1

Cascading XSL filters for content selection in multilingual

document generation

G. Barrutieta, J. Abaitua & J. Díaz

(DELi)COLING 2002

W8: NLP XML

Sept. 1st, 2002

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 2

Introduction – System overview

Course material(multilingual parallel corpus)

User aspects

xml-dtd

Document generation

Document view

COURSE GENERATOR

Generation engine

html-xml-dtd-xsl-javascript

Select content and format in an“intelligent” way.

Inputs

Web browser

......

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 3

Introduction - Corpus

• Multilingual parallel corpus or master document– Gross-grained RST to represent the gross-grained discourse

structure.– XML-DTD to represent digitally the gross-grained RST.

• Text > Data > In between tags• Discourse structure > Metadata > XML tags

– Gross-grained RST provides the framework for an isomorphic multilingual corpus.

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 4

<RST> <RST-S> <PREPARATION> <S> What is knowledge management? </S> </PREPARATION> </RST-S> <RST-N> <S> Knowledge, in a business context, is the organizational memory, which people know collectively and individually </S> <S> Management is the judicious use of means to accomplish an end </S> <S> Knowledge management is the combination of those concepts, KM = knowledge + management </S> </RST-N></RST>

Introduction: Multilingual parallel corpus with gross-grained RST in XML

EN ES EU

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 5

<RST> <RST-S> <PREPARATION> <S> What is knowledge management? </S> </PREPARATION> </RST-S> <RST-N> <S> Knowledge, in a business context, is the organizational memory, which people know collectively and individually </S> <S> Management is the judicious use of means to accomplish an end </S> <S> Knowledge management is the combination of those concepts, KM = knowledge + management </S> </RST-N></RST>

<RST> <RST-S> <PREPARATION> <S> ¿Qué es gestión del conocimiento? </S> </PREPARATION> </RST-S> <RST-N> <S> Conocimiento, en el contexto de los negocios, es la memoria de la organización, lo que la gente sabe colectiva e individualmente  </S> <S> Gestión es el uso juicioso de recursos para alcanzar un fin </S> <S> Gestión del conocimiento es la combinación de esos dos conceptos, GC = gestión + conocimiento </S> </RST-N></RST>

EN ES EU

Introduction: Multilingual parallel corpus with gross-grained RST in XML

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 6

<RST> <RST-S> <PREPARATION> <S> What is knowledge management? </S> </PREPARATION> </RST-S> <RST-N> <S> Knowledge, in a business context, is the organizational memory, which people know collectively and individually </S> <S> Management is the judicious use of means to accomplish an end </S> <S> Knowledge management is the combination of those concepts, KM = knowledge + management </S> </RST-N></RST>

<RST> <RST-S> <PREPARATION> <S> ¿Qué es gestión del conocimiento? </S> </PREPARATION> </RST-S> <RST-N> <S> Conocimiento, en el contexto de los negocios, es la memoria de la organización, lo que la gente sabe colectiva e individualmente  </S> <S> Gestión es el uso juicioso de recursos para alcanzar un fin </S> <S> Gestión del conocimiento es la combinación de esos dos conceptos, GC = gestión + conocimiento </S> </RST-N></RST>

<RST> <RST-S> <PREPARATION> <S> Zer da ezagutzaren kudeaketa? </S> </PREPARATION> </RST-S> <RST-N> <S> Kudeaketa, negozioetan, erakundearen memoria da, jendeak bakarka eta taldeka dakiena </S> <S> Kudeaketak erabideen erabilera zuzena du helburu </S> <S> Ezagutzaren kudeaketa bi kontzeptu hauen nahasketa da, EK = ezagutza + kudeaketa </S> </RST-N></RST>

EN ES EU

Introduction: Multilingual parallel corpus with gross-grained RST in XML

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 7

Introduction – User Aspects

Specific User Aspects Discrete values

Subject Language processors

Moment in time Before the course / Period 1 / Period 2 / … / After the course (review)

Languages EN/ ES/ EU

General User Aspects Discrete values Level of expertise Null / Basic / Medium / High

Reason to read To get an idea / To get deep into it

Background Not related to the subject / Related to the subject

Opinion or motivation Against / Without an opinion or motivation / In favour

Time available A little bit of time / Quite some time / Enough time

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 8

CSA – Parallel selection

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 9

CSA – Horizontal filtering

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 10

CSA – Vertical filtering

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 11

CSA – Vertical filteringLevel of expertise

If level_expertise = “null” or level_expertise = “basic”Then no relation-satellite is discarded; If level_expertise = “medium” or level_expertise = “high”Then discard example, exercise, background and preparation relation-satellites;

Rationale for the rule: Any user with a null or basic level of expertise on the selected subject will need all the information available to understand the text. Alternatively, a user with a medium or high level of expertise will not require examples, exercises, background, preparation and similar relation-satellites.

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 12

CSA – Vertical filteringReason to read

If reason_to_read = “to get an idea”Then discard exercise and elaboration (all the types of elaboration: textual elaboration, link elaboration and image elaboration) relation-satellites; If reason_to_read = “to get deep into it”Then no relation-satellite is discarded;

Rationale: Any user wishing to broaden his knowledge in the selected subject will need additional information. Conversely, a user with the intention of just getting an idea does not need any exercise, elaboration, or similar relation-satellites, which often require a more active role on the part of the user.

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 13

CSA – Vertical filteringProfessional background

If job_studies = “not related subject” Then no relation-satellite is discarded; If job_studies = “related subject” Then discard background and preparation relation-satellites;

Rationale: Any user whose professional background is not related to the subject will need all the additional supporting text to understand its meaning. Conversely, if the user is related to the selected subject, we may assume that background, preparation and similar relation-satellites will be unnecessary.

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 14

CSA – Vertical filteringOpinion or motivation

If opinion_motivation = “against” or opinion_motivation = “without an opinion or motivation” Then no relation-satellite is discarded; If opinion_motivation = “in favour” Then discard motivate, antithesis, concession and justify relation-satellite;

Rationale: A motivated or favourable user will not require additional motivation and, therefore, the motivate, antithesis, concession, justification, and similar relation-satellites will be disregarded, since they play a role in changing the opinion of the user to be in favour of the course material.

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 15

CSA – Vertical filteringTime available

If time_available = “a little bit of time” Then discard all the relation-satellites; If time_available = “quite some time”Then discard exercise relation-satellite; If time_available = “enough time”Then no relation-satellite is discarded;

Rationale: Time availability is a crucial user aspect. If the user is in a rush or has little time, the system has to provide only the most elementary information. In such case only nuclei will be generated. If the user has a bit more time, but not much, exercises are not offered, since they are usually quite time consuming and they require an active participation of the user. Finally, if the user has plenty of time, all the additional information is delivered.

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 16

CSA – Vertical filteringComments

• The order of application of the filters is irrelevant, each filter acts upon certain parts of the text independently.

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 17

Implementation

Javascript implementation

XSL implementation

objData.loadXML(sResult);

objStyle.load(sXSL1);

sResult=objData.transformNode(objStyle);

<xsl:template match="BACKGROUND">

<xsl:copy>

<xsl:apply-templates/>

</xsl:copy>

</xsl:template>

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 18

Experimentation

• The main objective of the experimentation is to validate the hypothesis expressed in the filtering rules letting people judge the generated document and also the actual filtering mechanism of the CSA.

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 19

Demo

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 20

Demo

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 21

Conclusions

• Increase the size of the corpus:– As long as this is done following the same DTD and

RST model, the algorithm will not have to change at all.

• Augment the user model:– New user aspect requires only a new filter– New values for an existing user aspect requires a change

in the corresponding filter

• Therefore none of this modifications increase the complexity of the system and are not difficult to implement.

DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002

Cascading XSL filters for content selection in multilingual document generationG. Barrutieta, J. Abaitua & J. Díaz 22

• Questions

• Comments• Further information

• Suggestions

Thank you for your attention.

This research work has been partly supported by the Basque

Goverment