wag: sentence generation manual - wagsoft homepageonce the generation module is loaded, you can...

Mick O'Donnell

Department of Linguistics,

University of Sydney,

Australia, 2006

email: [email protected]

June 1995

WAG:

Sentence Generation

Manual

ii

Contents

Contents ii

Chapter 1 The WAG Sentence Generator ....................................1

1. Introduction 12. Sentence Generation and Multi -Sentential Text Generation 13. Loading the Sentence Generator 24. How to Use the Sentence Generator 2

Chapter 2 Writing Sentence Specifications..................................3

1. Semantic Specifications 32. Roles of the Speech-Act 43. Ideational Specification 4

3.1 Generating Sentences Directly from the Knowledge-Base 53.2 Ideation Specified within Sentence Specifications 6

4. Textual Specification 64.1 Theme 64.2 Relevant-Entities 74.3 Recoverable-Entities 74.4 Shared-Entities 7

5. Additional Fields of the Input Specification 75.1 Constraint 75.2 Preselect 85.3 Prefer 8

6. Simpli fying the Semantic Specification 86.1 Semantic Defaulting 86.2 Macros 9

7. The Speaker and Hearer Roles 10

Chapter 3 Running Sentence Specifications...............................11

1. Running Sentence Specifications 112. Evaluating Sentence Specifications in a Buffer 113. The Generator Menu 114. Setting Generator Preferences 12

4.1. Display Mode 124.2. Selection Mode 134.3. Control Strategy 134.4. Modes of Use 13

5. Setting Feature Defaults 145.1 Feature Preferences 14

6. Speech Output 147. Debugging Lexification 148. The Generator Interface 15

8.1. The Generation Process 168.2. The ‘Actions” of the Stepper 168.3. Setting Stepper Breakpoints 168.4. Stepping Through the Generation 17

WAG Generation Manual iii

8.5. Other Buttons 178.6. Viewing Sentence Structure 18

10. Random Sentence generation 1811. Using the Coder to Drive Generation 18

Chapter 4 Architecture of the WAG Sentence Generator...........19

1. Theoretical Issues 191.1. Control Strategies 191.2. Deterministic vs. Non-Deterministic Generation 20

2. WAG's Generation Process 222.1. The General Algorithm 222.2. Initial Processing of the Input 222.3. Lexico-Grammatical Construction 232.4. Lexical Selection 282.5. Text and Speech Output 29

Chapter 5 Comparison With Penman.........................................30

1. Similarities to Penman 302. Differences from Penman 303. Summary 32

Appendix A Example Semantic Forms..........................................33

1. A Simple Utterance 332. Changing Tense 333. Progressive Aspect 344. Referring To Entities 34

4.1 Identifiabilit y 344.2 Recoverabilit y 354.3 Speaker and Hearer Roles 35

5. Changing the Theme 366. Varying The Speech Act 37

6.1 eli cit-polarity 376.2 eli cit-content 376.3 Proposing in response to an eli citation 386.4 Imperative (negotiate-action) 386.5 Other Responding Moves 38Not yet working: 386.6 Salutary Moves 39

7. Controlli ng Modality 398. Varying the Process Type 40

8.1 Material Processes 408.2 Verbal Processes 408.3 Mental Processes 408.4 Relational Processes 41

9. Types of Circumstances & Qualiti es 4110. Clause Complexes 4111. Grammatical Metaphor 4212. Content Selection 43

Bibliography ..................................................................................44

1

Chapter 1

The WAG Sentence Generator

1. Introduction

The WAG system includes a single-sentence generation program, which allows theuser to specify the semantics of a sentence, and have the program generate a sentenceexpressing the semantics. This manual describes various aspects of this sentencegenerator, including the input specification language, how to use the WAG interface tocontrol generation, and how to view the results of generation. An overview of thegeneration algorithm is also provided, and a large number of example sentencesdemonstrating how various syntactic forms can be achieved.

2. Sentence Generation and Multi-Sentential Text Generation

The aim of a typical text generation system is to produce a text which satisfies someset of pre-stated goals. Such systems are provided with a knowledge base -- whichcontains information to be expressed -- and a set of goals. The system then organisesthis information into sentence-length chunks, realises these chunks as sentences, andprints or speaks the text. Figure 1 shows a typical application of a text generationsystem: a weather satellit e beams down weather information to a receiver dish, whichpasses the information to a computer. The computer draws upon this knowledge base(and perhaps other sources) to generate a weather report.

WAG Generation Manual 2

Weather Data from Satel ite

Today's WeatherToday's weather should be mild, with occasional showers in the late afternoon........................................ ....................................................... .......................................................

Figure 1: A Weather Report Generation System

Ideation Base

Discourse Goals

Content Selection

Text Organisation

Speech/Text

Sentence Generation

Figure 2: From Knowledge-base to Text/Speech

A possible architecture for this text generation process is shown in figure 2. Thethree stages (which can be inter-mixed) of this architecture are:

1) Content Selection: Determining which of the facts in the ideational (knowledge)base need to be expressed to best achieve the discourse goals.

2) Text Organisation: Splitti ng the selected content into segments realisable assingle sentences (semantic specifications), and ordering these segments into asequence which best achieves the discourse goals. Different discourse goals mayresult in different orderings.

3) Sentence Generation: realising these segments as sentences.

The WAG system handles only the last of these stages. However, users who wish tobuild their own multi -sentential text generation systems can use the WAG system to


handle sentence generation. The users system need only supply semantic specificationsof sentences, and leave the details of syntactic generation and lexical selection to WAG.

This manual will i ntroduce the WAG Sentence Generation sub-system, and detail it suse. Section 2 discusses how to write sentence specifications for the generator. Section 3details how to run these specifications. Section 4 details some of the internal workingsof the generator, and section 5 compares the WAG sentence generator with the well -known sentence generator, Penman (Mann 1983; Mann & Matthiessen 1985). TheAppendix provides many examples of sentences, showing how to generate specificforms.


Chapter 2

Writing Sentence Specifications

0. How to Use the Sentence Generator

The generator may already be part of your distribution. If not, type (load-generator)into your li sp listener.

To produce sentences, you can either use an existing resource model (the Dialogresource model is best for this purpose), or write your own. If you are just learningWAG, it is best to use the Dialog resource model. The rest of this section providesbackground information about writing sentence-specifications, and the Appendixprovides many examples, showing how to produce particular variations.

If you are new to Lisp environments, you can make a sentence-specification run byeither:

i) cut/paste the sentence-sepcification into your li sp listener (the lisp prompt),followed by a carriage return; or

ii ) placing the cursor after the last parenthesis of the form and pressing the eval-li spkey (Apple-e on a Mac, control-x e in many Sun-based lisps).

More advanced users may wish to link the sentence-realiser into a multi -sententialtext-planner. For this purpose, you would need to use the say-example function (see fileProcesses/Generator/say.lisp). Its arguments are described in Appendix B below.

1. Semantic Specifications

Once the Generation module is loaded, you can generate sentences by evaluatingsemantic specifications.1 A semantic specification is a specification of the semantics ofa single utterance. It is basically the specification of a speech-act, including the speech-function (elicit, inform, greet, etc.), the ideational content, modality, polarity, etc., andtextual information (e.g., relevance, recoverabilit y, themacity etc.).

Figure 3 shows a sample semantic specification, from which the generator wouldproduce: I’ d li ke information on some panel beaters. The distinct contributions of thethree meta-functions are separated by the grey boxes.

1WAG can also be set to generate without semantic constraint, by making random or default lexico-

grammatical selections, or by allowing a human to make these decisions (see sections X & Y).


Interactional Specification

Ideational Specification

Textual Specification

( say di al og- 5 : i s ( : and i ni t i at e pr opose) : speaker ( Cal l er : i s mal e : number 1) : Hear er ( Oper at or : i s f emal e : number 1) : pr oposi t i on ( P5 : i s l i ke : senser Cal l er : phenomenon ( i nf o : i s ( : and i nf or mat i on gener i c- t hi ng) : mat t er ( pb : i s panel - beat er : number 2) ) : pol ar i t y ( pol 5 : i s posi t i ve) : modal i t y ( mod5 : i s ( : and vol i t i onal condi t i onal ) ) ) : t heme Cal l er : r el evant - ent i t i es ( P5 i nf o pol 5 Cal l er pb) : r ecover abl e- ent i t i es ( Speaker Cal l er ) : shar ed- ent i t i es ni l )

Figure 3: The Semantic Specification for"I'd li ke information on some panel beaters."

say is the name of the lisp function which evaluates the semantic specification, andcalls the generation process.

dialog-5 is the name of this particular speech-act -- each speech-act is given a uniqueidentifier, its unit-id.

The :is field specifies the features of the unit. This is used both for the speech-act asa whole, and for any unit in the ideational content. In this example, the speech-act isprovided with a feature-specification (:and initiate propose). The proposition isprovided with a single ideational feature: li ke. The feature-specification can be a singlefeature, or a logical combination of features (using any combination of :and, :or or:not). One does not need to specify features which are systemically implied, e.g.,specifying propose is equivalent to specifying (:and move speech-act negotiatorypropose).

2. Roles of the Speech-Act

Most of the colon-marked fields in figure 3 specify the roles of the units, and theirfill er. For example, the following specifies that the Speaker role is fill ed by an entitywith unit-id Caller, which is of type male.

:speaker (Caller :is male)

We can specify the fill er of a role in two ways:

a) Unit-Id Only: We can refer to the unit using just an identifier, e.g., :SpeakerCaller. If an entity with this name has already been defined, then the Speakerrole will point to this entity. If no entity of this name has been defined, then anew entity is defined and inserted into the knowledge-base.

b) Unit-Definition: If the role-fill er has not been introduced before, we can definethe entity within the role-fill er slot. For instance, :speaker (Caller :is male)


defines an instance Caller, declares the instance to be of type male, and makesthe Speaker role point to this entity. A unit-definition has the structure:

(<unit-id>:is <feature-specification>:role 1 <unit-specification 1>

:role 2 <unit-specification 2>

..... )

Units can also be defined separately from the ‘say’ f orm, for instance, by pre-loadinga knowledge-base (see section 2.2.1 below).

The possible roles of the speech-act are:

• Proposition: the ideational content of the speech-act -- a unit-specification;

• Speaker: the unit-specification of the speaking entity;

• Hearer : the unit-specification of the hearing entity;

• Required: for eliciting moves, the unit-id of the wh- element. This is a pointerto the element of the ideational content which is being elicited;

• Elicited: for proposing moves in response to an elicitation, indicates whichelement corresponds to the Required element in the elicitation. Fragmentaryresponses may include just the elicited element.

3. Specifying Content: Ideational Specification

One fundamental difference between WAG's input language, and that of Penman,involves the relation between sentence specifications and the knowledge-base (KB). Inboth systems, the KB is used to represent the world we are expressing, the entities ofinterest, the processes they partake in, and the relations between these participants andprocesses.

In Penman, the ideational component of a sentence plan is not part of the KB, butrather a re-expression of the knowledge in a form closer to language. SPLs areconstructed with reference to the KB, but there is no necessary correspondence betweenthe form of the knowledge and the form of the SPL. This is important for Penman, sinceSPLs are designed to work with a variety of different knowledge-base systems. Penmanusers supply a function to construct SPLs from the knowledge-base. SPLs may also beconstructed by hand, without any knowledge-base being attached to the system at all .

3.1 Generating Sentences Directly from the Knowledge-BaseThe WAG sentence generator, on the other hand, is designed to be used hand-in-hand

with its own KRS, so the two are more highly integrated. The standard form of asentence-specification does not itself contain a specification of ideational-structure,rather it contains a pointer into the knowledge-base -- the fill er of the :proposition roleis usually the unit-id of an entity already defined in the knowledge-base.2 The otherfields of the sentence-specification are used to tailor the expression of the indicatedknowledge.

2As we will see below, we can actuall y supply an ideational specification in the :proposition slot, but this

should be seen as a short-hand form, allowing assertion of knowledge into the knowledge-base at the same time asspecifying a sentence.


To summarise, a Penman-based text-generator needs to build an ideational structurere-representing the content of the KB, which the realisation component then operateson, rather than the KB itself, while a WAG-based text-generator just includes in thesentence-plan a pointer into the KB, and the realisation component then refers directlyto the KB to generate a sentence.

To demonstrate WAG's approach, we show below the generation of some sentencesin two stages -- firstly, assertion of knowledge into the KB, and then the expression ofindicated sections of this KB. The following asserts some knowledge about John andMary, about how Mary left a party because John arrived at the party. tell is a li sp macroform used to assert knowledge into the KB.

; Participants (tell John :is male :name "John") (tell Mary :is female :name "Mary") (tell Party :is spatial)

;Processes (tell arrival :is motion-termination :Actor John :Destination Party)

(tell leaving :is motion-initiation :Actor Mary :Origin Party)

;relation (tell causation :is causative-perspective :head arrival :dependent leaving)

Now we are ready to express this knowledge. The following sentence-specificationindicates that the speaker is proposing information, and that the head of this informationis the leaving process. It also indicates which of the entities in the KB are relevant forexpression (and are thus included if possible), and which are identifiable in context (andcan thus be referred to by name). The generation process, using this specification,produces the sentence: Mary left because John arr ived.

(say gramm-met1 :is propose :proposition leaving :relevant-entities (John Mary arrival leaving causation) :identifiable-entities (John Mary))

=> Mary left because John arrived.

As we stated, this approach to sentence specification does not require the sentence-specification to include any ideational-specification, except for a pointer into the KB.The realisation operates directly on the KB, rather than on an embedded ideationalspecification.

Different sentence-specifications can indicate different expressions of the sameinformation, including more or less detail , changing the speech-act, or changing thetextual status of various entities. The expression can also be altered by selecting adifferent entity as the head of the utterance. For instance, the following sentence-specification uses the cause relation as the head, producing a substantially differentsentence:


(say gramm-met2 :is propose :proposition causation :relevant-entities (John Mary arrival leaving causation) :identifiable-entities (John Mary))

=> John's arrival caused Mary to leave .

For details about how to assert information into the KB, see The WAG KRL Manual.

3.2 Ideation Specified within Sentence SpecificationsSometimes it is more convenient to specify ideational content within the sentence

specification, as in Penman's SPLs. WAG allows this form of expression also: if thefill er of the :proposition field is an ideational specification rather than a unit-id, then thespecification is asserted into the KB, and generation proceeds from there. This approachwas exempli fied in figure 3 above.

The syntax for this ideational specification is as follows (see The WAG KRL Manualfor fuller explanation).

Syntax: (<instance-id> &rest <Keys >)

<instance-id> - the id of an instance - if it exists, the info will be added to theexisting instance, else a new instance is created. If the instance-id is "_", then WAG-KRL will provide a unique instance-id of its own, e.g., (_ :actor Fred)

<Keys> any of the following:

:is <le> a description of the features the item has. <le> is a logical expression offeatures e.g., human, (:and human male), (:not human), (:or process quality).

:<role> <unit-id> e.g. :location Sydney - sets the fill er of the role to the specifiedunit. If the role already exists for this unit, the new role fill er is unified with theexisting one. For instance, if we assertedf the speaker of a say form to be:speaker (S1 :is female), then we can use S1 as a role fill er in the proposition,e.g., :actor S1.

:<role> <unit-description> Rather than proving an unit-id, we can provide aspecification of the fill er in more full -form. Basically, we replace the unit-idwith a full specification, e.g.,

(say Example1 :proposition (p1 :is material-process :actor (fred :is human :name "Fred") :actee (_ :name "Mary")))

:<role> <string/number> e.g. :name "John" - creates an instance for the string ornumber (see special values in the KRL Manual), and makes that instance thefill er of the role. If an instance with that value already exists, it is used ratherthan creating a new instance.

:<role> (:and unit-id1 unit-id2 ...) e.g. :actor "(:and John Mary Paul) - for roleswhich allow multiple-role-fill ers, creates a set-unit which groups theseinstances. They will be generated as a conjunction, e.g., John, Mary and Paul..

:<role> <variable-reference> e.g. :actor * focus* .Designer - sets the role-fill er tothe instance pointed to by the reference. References can only be a variable-reference (see section on reference forms in the KRL Manual).


:<role-chain> <instance-id/tell -form/special/var iable-reference> - In any of theabove forms, the role specification can be replaced by a role-chain, e.g.,:actor.location.name "Sydney"

:constraints <constraint> allows the use of the non-macro-mode logic (seethe KRL manual) e.g., (tell p1 :constraints (:or (:type Actor male) (:fill Actee female)))

4. Controlli ng Expression: Textual Specification

The sentence-specification includes several fields which specify various textualstatuses of the entities in the knowledge-base:

4.1 ThemeThis field specifies the unit-id of the ideational entity which is thematic in the

sentence. If a participant in a process, it will t ypically be made Subject of the sentence.If the Theme plays a circumstantial role in the proposition, it is usually realised as asentence initial adjunct. WAG's treatment of Theme needs to be extended to handle thefull range of thematic phenomena.

4.2 Relevant-EntitiesThis field contains a list of the ideational entities which are in the relevance space

(see chapter 5 of my thesis), and are thus selected for expression. In the example infigure 3, five entities are nominated as relevant:

:relevant-entities (P5 info pol5 Caller pb)

This field is not necessary when an explicit ideational specification is included in the‘say’ f orm. In such cases, the generator assumes that all the entities included within thespecification are relevant, and no others.

However, when the :proposition slot contains only a pointer into the knowledge-base, the :relevance field specifies which elements of the KB to express. See chapter 5of my thesis for an example using the relevance space to select out successive chunks ofa KB (there called a macro-ideational structure).

4.3 Recoverable-EntitiesThis field contains a list of the ideational entities which are recoverable from context,

whether from the prior text, or from the immediate interactional context (e.g., thespeaker and hearer). See chapter 5 of my thesis for detail .

4.4 Shared-EntitiesThis field contains a list of the ideational entities which the speaker wishes to

indicate as known by the listener, e.g., by using definite reference. See chapter 5 fordetails.


5. Additional Fields of the Input Specification

Some additional fields are allowed in the semantic specification, extending theexpressive power of the input language.

5.1 ConstraintThis :constraint field allows the user to assert structural information which cannot be

expressed by simply specifying features or roles of elements of the speech-act. Forinstance, tense/aspect is specified in the WAG system by specifying the relativeordering of three points of time (following Reichenbach 1947):

Speaking-Time: when the utterance is made;

Event-Time: when the event takes place;

Reference-Time: A reference point adopted by the speaker.

We can provide a :constraint field in the semantic specification to express theserelations:

(say utterance-1 :is (:and initiate propose) :proposition P1 :speaker Caller :constraint (and (< Proposition.Event-time Reference-time) (= Speaking-time Reference-time)) )

5.2 PreselectThis field allows the user to ‘preselect’ f eatures of the lexico-grammatical structure.

Some grammatical decisions may not be semantically constrained, and this field allowsthe user to specify which feature to choose, e.g.,

:preselect ((p3 indefinite-pronom-group))

The first element of a preselection specification (p3 in this example) is the unit-id ofan ideational unit, the second is a feature-specification which the grammatical realisateof the ideational unit must have. This feature-specification must not conflict with therest of the constraints on that unit, meaning that it must be compatible with the usuallexico-grammatical preselections, and also with the feature’s selection-constraint.

5.3 PreferWhile :preselect specifies features that particular grammatical units must have,

:prefer allows the user to specify feature defaults, e.g.,

:prefer (passive)

During the generation process, there is often more than one feature in a systemappropriate to express the semantic specification. This is true when no feature in thesystem is preselected, and the selection-constraints on more than one feature are met. Inthese cases, an arbitrary choice needs to be made. By placing a feature in the :preferfield, the user can cause the preferred feature to be chosen in such cases.

Feature preferences can also be set globally using the * feature-preferences* variable.This variable is also used for semantic defaulting, as discussed just below.


6. Simpli fying the Semantic Specification

Hovy (1993) points out that as the input specification language gets more powerful,the amount of information required in the input specification gets larger and morecomplex. The Penman system uses a couple of methods to avoid the growingcomplexity of the input specification. These have been adapted in WAG as follows.

6.1 Semantic DefaultingWhen the input-specification leaves particular semantic systems unresolved, Penman

chooses a feature on a default basis. For instance, the following features are the defaultwhen not stated in the input specification: Speech-function: statement; Tense: simple-present; Polarity: positive; Modality: none.

WAG also uses feature defaults. A variable * feature-preferences* is defined, whichholds a list of the default (or preferred) features. Before generation begins, the processorgoes through each unit of the semantic specification and ensures that, for those systemswith no preselected choice, the default feature is selected, if its selection-constraint ismet.

Defaulting is necessary since the WAG system uses a deterministic generationstrategy -- each grammatical choice must be resolvable as it is met (see section 4.1.3below). Grammatical choices will not be resolvable unless the semantic decisions theydepend on have already been resolved. WAG thus forces those semantic decisionswhich have not been resolved by the input specification.

Below is shown the Say form from figure 3, this time in a reduced form relying ondefaults:

(say dialog-5 :speaker Caller :proposition (P5 :is like :senser Caller :phenomenon (info :is (:and information generic-thing)) :matter (pb :is panel-beater :number 2)) :modality (mod5 :is (:and volitional conditional))))

6.2 MacrosPenman allows the user to define macros -- short forms in the input specification

which expand out to more extensive forms. For instance...:tense present-continuous

...in an input specification is replaced with the following before processing begins:

:speech-act-id (?sa / Speech-act :speaking-time-id (?st / time :time-in-relation-to-speaking-time-id ?st :time-in-relation-id (?st ?et ?st) ?et :precede-q (?st ?et) notprecedes)) :event-time (?et / time :precede-q (?et ?st) notprecedes))

WAG does not use macros. To serve the same function, we can add features to thenetworks which represent complex specifications. For instance, a system has been addedto the speech-act network, which includes features such as present-continuous, past-


perfect, etc., each feature being associated with realisations which assert the necessarystructural constraint (see figure 4). These features can then be included in the feature-field of the sentence specification, acting as a short-form for the associated structuralconstraint.

simple-present(:and (= Speaking-Time Reference-Time) (<= Reference-Time Proposition.Event-Time ))

past-perfect

(:and (< Reference-Time Speaking-Time ) (< Proposition.Event-Time Reference-Time))

present-perfect

speech-act

Figure 4: Adding Features as a Form of Macro

7. The Speaker and Hearer Roles

The Speaker and Hearer fields are presently used for two purposes:

• Pronominalisation: The Speaker and Hearer roles are used to test ifpronominalisation is appropriate: if the fill ers of these roles are also part of theproposition being expressed, then pronominalisation is called for, e.g., I, you,etc.

• Voice Selection: WAG checks the gender feature of the Speaker to determinewhich voice to use in Macintosh’s text-to-speech system.

Note that the attributes of the Speaker and Hearer do not need to be re-defined foreach sentence. We can pre-define the speech-participants as entities in the knowledge-base. Each speech-act specification thence only needs to refer to the unit-id of thespeaker and hearer.

In theory, the Speaker and Hearer fields are available for user-modelli ng purposes:lexico-grammatical choices can be constrained by reference to attributes specified in theSpeaker and Hearer roles (cf. Paris 1993; Bateman & Paris 1989b; Hovy 1988a). Sincethe fill ers of the Speaker and Hearer roles are ideational units, they can be extensivelyspecified, including their place of origin, social class, social roles, etc. Relationsbetween the speaker and hearer could also be specified, for instance, parent/child, ordoctor/patient relations. Lexico-grammatical decisions can be made by reference to thisinformation: tailoring the language to the speaker’s and hearer’s descriptions. This hasnot, however, been done at present: while the implementation is set up to handle thistailoring, the resources have not yet been appropriately constrained.


Chapter 3

Generation Interfaces: Macintosh

1. Running Sentence Specifications

Once written, semantic specifications can be run in a number of ways. This sectiondescribes these. The three main ways, described below, are:

• Evaluating Sentence Specifications in a Buffer;

• Using the Window-Based step-through generation Interface;

• Using the Coder to step through the generation, with the user making eachchoice.

2. Evaluating Sentence Specifications in a Buffer

Once you have written a semantic specification, you can evaluate it. As with all Lispforms, you evaluate it by placing the cursor after the last parenthesis, and then chooseEval Selection from the Eval menu (alternatively, type Cmd-e).

You can also evaluate all sentence specifications within an open file buffer byselecting Eval Buffer from the Eval menu (alternatively, type Cmd-h).

You can evaluate an unopened file of sentence specifications using the Load Fileoption from the Eval menu.


3. The Generator Menu

The generation menu offers some other options which may be useful during sentencegeneration. See figure 5.

Figure 5: The Generation Menu

• Debug Lexification: Brings up an interface which allows you to step throughthe lexification process for each leaf of the grammatical structure. Helps you tolocate where the process went wrong. See below.

• Re-Display Results: Displays the last sentence again. Use in conjunction withchanged display modes (see Preferences above) to show alternative views of thesentence.

• Show Realisations of Top: Prints out the realisations associated with the top-level of the grammatical structure.

• Show Realisations of Current: Prints out the realisations associated with thecurrent grammatical unit, assuming generation broke at some point beforecompletion.

Graph Current Gram Unit: Produces a graph of the current sentence structure.

Graph Current Speech-Act: Produces a graph of the current speech-act beingexpressed.

Show Current Gram Unit: Brings up the Resource Explorer card for sentence.

Show Current Speech-Act: Brings up the Resource Explorer card for thecurrent speech-act being expressed.

Load Example: Loads in an example form [Not Currently Working].

Re-Generate Current Speech-Act: Re-generates the current speech-act, mostuseful after some grammatical changes have been made.


4. Sett ing Generator PreferencesChoose Preferences... from the Generator menu to change some options in the

generation process. See figure 6.

Clear KB before running Say

Print Time Taken in Says

Print each word as Generated

Control Strategy Target-Driven

Selection Mode Last

Display Mode Text-Only

OK Cancel

Figure 6: The generation Preferences Dialog

4.1. Display ModeControls how the generated sentence is displayed:

a) Text Only: Only the final sentence is display.

b) Continuous Text: If a series of sentence-specifications are evaluated, thegenerator displays these as a single paragraph. A new paragraph is startedwhenever the fill er of the speaker role changes.

c) Function Structure: prints the top-level function-structure of the sentence,and the features of this unit.

d) Long Structure: prints the full function-structure and feature structure of thesentence.

e) Internal: prints the MCL-internal representation of the sentence.

f) Speech: if the Macintosh Speech manager is installed, the sentence isuttered. If the gender of the speaker is supplied in the sentence-specification,then an appropriate voice is selected.

4.2. Selection ModeControls the order in which features are tested:

• First: Features are tested in the order in which they appear in the systemdefinition, with the exception that explicitl y stated feature-preferences (seeabove) are ordered first.

• Last: Features are tested in the reverse of the order in which they appear in thesystem definition, with the exception that explicitl y stated feature-preferences(see above) are ordered first. This ordering is the default, since usuallySystemicists place those features with no realisation last, and thus selecting thismode results in the simplest structures.

• Random: Features are tested in a random order.


4.3. Control StrategyThis choice controls whether or not the generation of grammatical structure is

constrained by the semantic:

• Target-Dr iven: Use the Feature Selection Constraints to constrain the choice ofeach feature, and in lexical selection, use the semantic referent as a constraint.

• None: No semantic constraint is used -- the first suitable feature in each systemis selected, as ordered in regards to the selection mode discussed above.

4.4. Modes of UseCheck or uncheck the check-box to switch between different generation modes:

• Print Each Word as Generated: If on, each word is printed as it is selected(incremental display). Otherwise, we produce nothing until the entire sentencestructure is completed.

• Print Time Taken: If on, the program prints how much time the generation ofthe sentence took.

• Clear KB before Running Say: If on, the program clears the knowledge basebefore each ‘say is evaluated. This is useful i f you are changing the ideationalcontent of the say-form, in a way which is contradictory. If you are generatingfrom a KB (see section 2.2) then this mode will not work.

5. Sett ing Feature Defaults

5.1 Feature PreferencesSetting feature defaults (either semantic or grammatical) at present requires you to

edit a file. Open the load file for the grammar you are using, and look for the :feature-preferences field. Add the feature you want to this li st, abd evaluate it.

Undefaulted Systems List: XXXX

6. Speech Output

If the Macintosh Speech manager is installed on your machine, then you can haveWAG speak the generated text. To do this, you need to select Preferences... from theGeneration menu, then switch Display Mode to Speech.

Two other menu items are used for speech. Look under the General menu:

• Speech Voice: lets you pick the current speech voice.

• Speak Selection: the currently highlighted selection will be spoken.


7. The Generator Interface

WAG includes a stepping interface for the generator. This interface allows you toview each step in the generation of a sentence, and see the structure as it is built up.

Once you have evaluated a sentence specification, you can select GenerationInterface from the Generator menu. You will be presented with a window, as shown infigure 7.

GoSingle Step Break Reset Graph Speech Act Graph Sentence

Set Breakpoints Load Speech Act Force Choice

Current System: CLAUSE-ADVERBIAL-SYS

no-adverbial-…adverbial-modi…

Current Feature: ADVERBIAL-MODIFIER RealisationsRequire Adv-Qualif Type Adv-Qualif adverbOrder Pred Adv-Qualif

Semantic ConstraintExists Referent.Process-Quality Relevant Referent.Process-Quality Same Referent.Process-Quality Adv-Qualif.Referent

Present Unit

TOP

Selected Features

fullclause-simplexclause

Last Action: Advance to Next Feature

Result: T

Next Action Test current feature against preselections

Systems to Enter

clause-adverbial-…primary-circ-sysbeneficiary-choiceobject-insertionprocess-typeprogressiveperfectvoicedependence

Grammatical Structure

tensesubjectpred(verb)

Figure 7: The Generation Interface

7.1. The Generation ProcessWAG uses a similar generation algorithm to that used in Penman. Generation is

basically driven by the grammar system network. The program steps through thisnetwork from the left (choosing first between clause, group, and word), referring toboth the preselections on the unit, and to the feature-selection constraints on eachfeature, to see which feature to choose in each system. See Section 4.2.4 below for moredescription of this process. See O'Donnell -Thesis, chapter 6, for more information onfeature-selection constraints.

7.2. The ‘Actions” of the StepperGeneration proceeds via a series of actions -- subtasks in the generation process.

These include, for instance, test-semantic-constraint, assert-realisations, test-preselections, order-constituents, select-lexeme, etc.

Towards the bottom of the interface is a display which shows which action was justperformed, what result was recorded (the sequence of actions depends on the result ofthe prior action), and what the next action will be.


You can set Stepper Breakpoints (see below) so that the program only stops beforecertain actions. I generally stop only on test-semantic-constraints, which means there isonly one break per feature in the traversal.

7.3. Sett ing Stepper BreakpointsIt is often desirable to let the generation process run, and only stop at certain places.

Pressing the Set Breakpoint button will bring up a dialog which allows you to controlthe various breakpoints (see figure 8). Two sorts of breaking are possible:

• Action Breaking: generation will break whenever the actions in the right-handcolumn are reached.

• Feature Breaking: generation will break whenever the features in the right-hand column are reached.

To switch between these types of breaking, click down on the Break Type fieldwhich says "Action" or "Feature" -- this is a popup menu.

Double-clicking on an item in either column will move it to the other column.

Break Type: Action

Don't Break On Break On

enter-next-systemtry-next-featuretest-preselectionstest-semantic-constraintsassert-realisationsadd-new-systemsorder-the-constituentsrealise-constituentsrealise-next-constituenttest-if-lexical-unitprint-results

Done

Double-Click on item to Move it

to the other list.

Figure 8: The Set Breakpoints Dialog

7.4. Stepping Through the GenerationFour buttons control the progress of the generation.

GO: The generator proceeds until the next break-point is reached.

Single-Step: The generator performs the next action only.

Break: Generation is stopped as if a break-point was reached. Press Go or Single-Step to proceed.

Reset: Resets the interface, clearing all generated structure, and ready for startingagain. Note that the current speech-act is not cleared, so Go will start thegeneration again.


7.5. Other ButtonsSome other buttons are available:

Force Choice: Allows the user to go against the default feature order, selecting afeature of lower priority. The feature may fail to be selected if it goes againsteither the preselections or its feature-selection constraint fails.

Graph Speech Act: Click here to display a graph of the current speech act.

Graph Sentence: Click here to display a graph of the sentence structure so far.

Load Speech Act: Allows the user to load in a new speech-act from the memory-resident examples (defined using the Examples facilit y). (TEMPORARILLYNOT FUNCTIONAL).

7.6. Viewing Sentence StructureIn the bottom-left-hand corner of the window is a display of the grammatical

structure of the current unit as it is built up. As each feature is selected, its realisationrules are asserted, and the accumulated structure is shown here.

Alternatively, press the Graph Sentence button to see a graph of the sentence as sofar generated. Use the graph menu (click on any node of the graph) to explore thestructure using the Resource Explorer.

Alternatively, double-click on the Present-Unit identifier, shown in the top-right-hand corner of the interface. This will t ake you directly to the Resource Explorer cardfor this unit.

8. Debugging Lexification

Sometimes the generation process results in a sentence other than the one you want.Often, this is because an unexpected lexical choice was made. To help discover wherethe lexification process went wrong, WAG includes a lexification debugger, whichallows you to step through the lexification process, so you can see where the problemarises.

To open the Debugger, select Debug Lexification from the Generation menu. Awindow similar to that in figure 9.


Lex-ID: UNIT323 Sem-id: M3Lex-IDsunit323unit324unit330unit364unit351

Lex Featurespositive-modalnonreduced-m…

Inflect Features

Candidateswill-auxwould-auxmust-auxmight-auxcan-aux1can-aux2could-aux2could-aux1may-aux

Sem Featuresabilitynonvolitionalnonconditionalmodal-qualityqualityideational-unitroot

Lexifiers

Step Current:

Selection:

Message:

Figure 9: Debugging the Lexification Process

Lex-IDs: The Lex-id box contains a list of all l exical items contained in the lastgenerated sentence. Click on one of these and the interface will display informationabout this lexical item, namely:

Lex-Id: The unit-identifier of the word-rank grammatical-unit.

Sem-Id: The unit-identifier of the referent of the item.

Lex Features: The lexical features which the generation process required for thelexical item.

Inflect Features: The inflectional features which the generation process requiredfor the lexical item.

Candidates: A list of the lexical-items which are syntactically appropriate, beforesemantic filtering.

Sem Features: The semantic features of the referent, ordered in terms ofdecreasing delicacy.

Lexifiers: The lexical items which express the currently selected semantic feature.

The lexification proceeds as follows: the program takes each semantic feature in turn,and looks up the lexical items which include that feature. This set is intersected with theset of grammatical candidates, and the intersection is displayed in the Lexifiers field.Each of these are unified in turn against the full semantic constraint, to see if all of thesemantic features of the lexeme are compatible with sem-id. if so, the item is selected.Otherwise, the next lexifier is tried. If no lexifiers of this feature are appropriate, thenthe next semantic feature is tried, until some appropriate lexical item is found.

Use the Step button to step through this process. To reset and start again, click on theitem in the Lex-Ids field.

10. Random Sentence generation

[To be Described]


11. Using the Coder to Dr ive Generation

The WAG Coder can be used as an interface to generation, allowing you to stepthrough the feature selection of a unit, making all the choices.

This interface is ideal for testing sub-parts of the grammar. The Coder interfaceallows you to try alternative selections in systems, so that you can test a range ofstructures which may be of interest.

See the WAG Coder Manual for instructions in its loading and use.

• Load the grammar from which you want to generate.

• Specify the Coder Start feature to be grammatical-unit (or whatever the root ofyour grammar network is).

• Step through feature selection. At any point, press the Generate button, and theCoder will default the remaining choices and generate a unit.

• In this manner, you can generate units of any kind, e.g., clauses, groups orwords.

Often we know the sentence we want, but not what features it has. We can use theCoder to try grammatical variants until we produce a sentence with the samegrammatical structure we are aiming at. We can then extend/modify the semanticconstraints to ensure these features are selected.


Chapter 4

Architecture of the WAG Sentence Generator

This section explores the inner workings of the WAG Sentence Generation System,firstly in terms of its theoretical architecture, and then in terms of the processingalgorithms.

1. Theoretical Issues

This section discusses some methodological issues in the construction of a sentencegeneration system.

1.1. Control StrategiesMost NLP can be viewed as a process of translating between strata: building a target

representation based on a source representation. Control strategies handle the mappingbetween any two representational levels. In a tri-stratal system, we need two controlstrategies (assuming a conduit architecture): one between micro-semantics and lexico-grammar; and one between lexico-grammar and graphology.

a) Between Micro-Semantics and Lexico-Grammar : To produce a lexico-grammatical representation from a semantic representation, we can use either a source-driven (data-directed), or a target-driven (goal-directed) strategy:

i) Source-Dr iven: each feature of the semantic specification has associated lexico-grammatical constraints (the lexico-grammatical consequences of the semanticfeature). To build a lexico-grammatical structure, we take each feature of thesemantic specification in turn, and apply its lexico-grammatical realisations.This is repeated for each unit in the structure. In this way we build up a lexico-grammatical structure.

ii ) Target-Dr iven: the inter-stratal mapping constraints are represented as semanticconstraints on lexico-grammatical features. To build a lexico-grammaticalstructure that encodes the semantic input, we traverse the lexico-grammaticalnetwork, choosing a feature in each system whose semantic constraint matchesthe semantic specification. Basically, we build a lexico-grammatical structure inthe grammar’s own terms, although the choices are constrained by thesemantics.

Most of the Systemic generation systems use the target-driven approach (e.g.,Penman, Proteus, Genesys). The following quote from Matthiessen (1985) demonstratesthis for the Penman system:

"In Nigel ... initiative comes from the grammar, the general control of what happenscomes from the entry conditions of the systems. It is not the case that the semantic


stratum has its own control, does its work and presents the results to the grammarfor realisation. Instead, it is controlled by the entry conditions of the systems."

Patten’s SLANG system is the exception: grammatical features are preselected as therealisations of the semantic features:

“Features at the semantic stratum may have realisation rules which preselectgrammatical features. Similarly, grammatical features may preselect features fromthe phonological/orthographic stratum.” (1988, p44).

We can also distinguish between resource-driven and representation-driven systems:a resource-driven system uses the resources to select the next rule or constraint to apply,while a representation-driven system uses the information in the representation tocontrol the structure-building. Penman uses a mixture of both -- firstly, it isrepresentation-driven to the extent that the unit-of-focus -- the element being expanded -- is chosen in reference to the lexico-grammatical representation: we start with the topelement (the clause), and successively expand elements down towards the leaves of thetree. This represents a top-down, depth-first generation strategy. However, within eachunit, the construction is resource-driven: the system network is used to control theconstruction of each unit’ s internal structure. The unit is constructed by a forward-traversal through the network, asserting the realisations of each feature selected. Insimpler terms, the node-selection strategy is representation-driven, and the rule-selection strategy is resource-driven.

The WAG generator follows the Penman tradition, using the lexico-grammar tocontrol the generation, expanding units in a top-down, depth-first manner. Each unit isconstructed as a result of a traversal of the system network. Section 4.2.4 below willdiscuss the strategy in more detail .

b) Between Lexico-grammar and Graphology: While Penman and WAG both use atarget-driven control strategy between semantic and lexico-grammar, in thegraphological construction, control is source-driven. The source, in this case, is thelexico-grammatical structure. The process finds the leaves of this structure (word-rankelements), and recovers the associated lexical-items. Using these items, and inflectionalfeatures, the appropriate graphological-forms are generated, and printed (withformatting, e.g., capitalisation, spacing, etc.). Graphological generation in the WAGsystem will be discussed more fully in section 4.2.6 below.

1.2. Deterministic vs. Non-Deterministic GenerationThis issue is most often discussed in relation to parsing -- whether the parser resolves

each choice before continuing (deterministic parsing), or whether it explores eachalternative (non-deterministic parsing).

These same possibiliti es apply for generation also. We may reach a point in thegeneration process where two alternative means of expressing the semantics both seemvalid. Often, each of the choices will l ead to appropriately generated sentences, thechoices representing alternative means of realising the meaning.3 In other cases,however, some choices may lead to a dead-end in the generation process -- noappropriate realisation is possible. This has been called a generation gap (Meteer 1990).

3In a full y-constrained system, all differences in form would be linked back to differences in meaning.

However, at present, it is diff icult to assign meaning differences to all form differences, e.g., the semanticdifference between “ I said that he was coming” and “ I said he was coming” . Such differences are defaulted in theWAG system.


Generation gaps occur because choices are often dependent on each other -- if wemake the wrong choice at one point, there may be no valid alternatives at a later choice-point. For instance, Meteer (1990, p63) gives an example of the generation of a processinvolving someone deciding something important. At one point in the generation, weface a choice between congruent realisation -- He decided -- or an incongruentrealisation -- He made a decision. Both choices seem equally valid. However, theincongruent choice allows the ‘ important’ characteristic to be expressed -- He made animportant decision -- while the congruent choice does not -- *He decided importantly.

When the decisions on which a particular choice depends are not made before thechoice is reached, then we have a determination problem -- the choice cannot beresolved. I will discuss below the two types of solution to this problem -- forcing adecision (deterministic generation), and following all alternatives (non-deterministicgeneration).

Non-Deterministic Generation: A non-deterministic generator doesn’ t make a definitedecision between alternatives, but either chooses one tentatively, or follows allalternatives simultaneously. The same strategies that are available for non-deterministicparsing are also available for generation:

• Simultaneous Generation: all options are carried forward at the same time.This option includes ‘chart generation’ , along the lines of chart parsing (cf.Haruno et al. 1993).

• Backtracking Generation: at each choice-point, an arbitrary decision is made.When a generation dead-end is reached, the generator backtracks to the lastchoice-point, makes a different choice, and proceeds from there. At one stage Imodified the WAG generator to allow backtracking. However, this generationwas very ineff icient due to the large size of the backtracking stack which neededto be saved. For this reason I have switched to deterministic generation4.

Deterministic Generation: In deterministic generation, the process resolves choices asthey are reached. A problem for this approach is that there is not always suff icientinformation to make the decision available.

Matthiessen (1988a) points out one problem-case for non-deterministic Systemicgeneration: “How is the situation to be avoided where a chooser is entered before all thehub associations needed are in place?” (p775). In terms of WAG, this problem is statedas follows: the generator wishes to test a feature selection-constraint which includes areference to the Referent role of some unit. However, the fill er of the Referent role hasnot yet been established. The establishment of the Referent role is performed in someother system, which has not yet been entered. The feature selection-constraint thuscannot be tested.

This kind of problem is common in writing Systemic grammars of reasonablecomplexity. For instance, when choosing between the features single-subject and plural-subject (concerning Subject-Finite agreement), the selection-constraints refer toSubject.Referent, but the Subject’s Referent role may not have been established yet. It isestablished in a simultaneous system, where the Subject is conflated with either theAgent, Medium or Beneficiary.

4A variation of this approach stores only the choice made at each decision point, and not the generation

environment. When generation fail s, the process goes back to the beginning of the generation and re-creates thestructure, varying only the last choice. This approach has the advantage of far less storage space requirements.However, the same structure-building work might be done over and over, meaning that this approach will be slowif any degree of backtracking occurs. The approach is appropriate if the number of backtracks is assumed to bevery small , e.g., the first path is li kely to succeed, but we allow for the possibilit y of failure.


Nigel and WAG have avoided such non-deterministic problems, by careful writing ofthe lexico-grammatical and interstratal resources. However, this is a case where theresources are being shaped by the needs of the process, a practice which should beavoided, if possible, since the resources lose their process-neutrality.

Matthiessen (1988a) proposed one solution which avoids the re-wiring of thegrammar. He proposes a least-commitment strategy -- whenever a grammatical choicecannot be resolved, then we should make no commitment, but rather postpone thedecision until a later point. There are potentially other grammatical decisions which canbe made without waiting for this one (e.g., simultaneous systems). The system is pushedto the end of the systems-to-be-resolved queue.

This is a good solution for some cases, since it doesn’ t require any change to theresources -- only the traversal algorithm is affected. However, the solution cannot beused in two situations:

1) Inter-Dependency: there may be cases where two decisions depend on eachother. Each decision cannot be resolved until the other decision is resolved.

2) Referent resolved in more delicate system: sometimes the Referent is resolvedin a more delicate system, rather than in a simultaneous system. No amount ofdelay will solve the problem.

We could write the resources to avoid these situations (while allowing cases whichcould be solved using least-commitment). Alternatively, we could introduce morecomplex processes which know how to obtain as needed the information required toresolve the choices. I will not discuss this further here, except to say that the concept of‘ look-ahead’ fr om parsing could perhaps be applied profitably.

2. WAG's Generation Process

This section describes the algorithms for sentence generation used in the WAGsystem. These algorithms are fairly identical to Penman’s at a gross level, but differ inthe way these steps are implemented. A list of the ways the WAG implementationimproves on the Penman system are given in section 5 below. Note that WAG doesn’ tinclude any code from Penman, it is a total re-write.


2.1. The General Algor ithmWAG’s sentence generation algorithm is shown in figure 10. Each of these steps will

be discussed below.Micro-Semantic Specification

Pre-Processing of Micro-Semantic Specification

Bui ld Lexico-Grammatical Structure

Select Appropriate Lexical Items

Bui ld Graphological String

Text

Figure 10: The Sentence Generation Algorithm

2.2. Initial Processing of the InputBefore generation begins, the input is processed. This processing involves three

steps:

1. Assertion of the Semantic Specification into the KRS: the input specificationis ‘parsed’ , analysing it in terms of the various roles and fields, and thisinformation is asserted into WAG’s knowledge-representation system.

2. Deriving Implied Structure: the program derives any additional structuralinformation it can from the partial specification. For instance,

• Deriving feature information from asserted roles;• Deriving additional roles from asserted features.

These steps are repeated for each element of the semantic specification, in a top-down manner, until all roles are processed.

3. Defaulting of Unspecified Choices: After the prior step, there will still besystems which are unresolved. Some of these systems are defaulted. Onlysystems containing features drawn upon in the interstratal mapping constraintsneed to be defaulted -- others can be left unspecified. In those systems which aredefaulted, features are chosen arbitrarily,5 except where the user has expressed apreference (see discussion on feature defaulting above).

The result of the input processing stage is what I term a fully-specified semanticform. ‘Fully-specified’ refers to the fact that -- in each unit of the semantic

5If no user-default is specified, the program takes the last feature in a system. This is because many systems

have a no-reali sation alternative, and Systemicists tend to place these features last. A more intelli gent programwould automaticall y discover the no-reali sation alternative.


representation -- the features which are relevant for lexico-grammatical processing havebeen specified. This is required for deterministic generation, as discussed above.

2.3. Lexico-Grammatical ConstructionThe goal of the Lexico-Grammatical Construction stage is to build a lexico-

grammatical structure which encodes the semantic input. Section 4.1.2 above comparedtwo different control strategies for lexico-grammatical construction: source-driven andtarget-driven. The WAG system, in common with most Systemic generators, is target-driven -- the construction is based on expanding the lexico-grammatical representation(constrained by the micro-semantics), rather than by realising the semanticrepresentation.

Figure 11 shows the basic algorithm behind lexico-grammatical construction inWAG. It defines a top-down, breadth-first, left-to-right construction process (seechapter 9 of my thesis for a description of these terms). In other words, we first buildthe structure of the top-most unit (the clause or clause-complex), and then build thestructure of each of the unit’ s constituents, and so on down to word-rank units. Thistype of generator can thus be called a ‘rank-descent’ generator.

Push clause unit onto empty stack

Pop next unit from stack

Build Current Unit's Immediate Structure

Push Consti tuents of Current Unit onto Stack

succeed

succeed

succeed

succeed

failEnd

Figure 11: WAG’s Lexico-Grammatical Construction Process

This algorithm uses a stack data-structure. A stack is a data-structure used for storingitems. It is basically a last-in, first-out queue. You ‘push’ an item onto the stack -- placean item at the front of the queue. You can push other items on top of this. You can also‘pop’ an item, meaning that you take the item from the top of the stack. See figure 12.


Uni t

top-of-stack

push pop

Stack

Uni t

Uni t

Figure 12: The Unit Stack

The stack is used to store the constituents-to-be-processed. The process starts offwith only one element on the stack -- the sentence unit. At this point, the information inthis unit is minimal, just a specification that the unit is a clause6, and a pointer to theReferent (semantic content) that this clause-unit is expressing.

Processing then begins: the top element is ‘popped’ off the stack, the system networkis traversed to build up its feature-list, and the realisation statements associated withthese features are applied, thus building the immediate structure of the unit.

When the element’s immediate structure is complete, we then need to complete thestructure of each of its constituents. So we push each of these constituents onto the Unit-Stack, and cycle back to the beginning of the process: pop the next unit off the stack,process this, and so on.

We continue popping and processing units until there are no units left to process.This occurs when all constituents of the sentence-tree have been fully specified. We thusgo on to the bubble labelled ‘End’ in figure 11. We are now ready to move onto the nextstage of the generation process -- lexical selection7.

Immediate-Structure Construction: I will now provide more detail about theimmediate-structure building stage of the generation process. Following sections willfocus on two aspects of this stage -- forward-traversal and constituency ordering.

The construction within each unit of the target is resource-driven -- controlled by thetraversal through the system network (from left to right). In each system, the programchooses a feature whose semantic constraints are compatible with the semantic input.The structural realisations of this feature are then asserted, and the process advances tothe next enterable system. When all enterable systems are processed at that rank, theunit is complete. Figure 13 shows the algorithm for generating the immediate structureof a unit. It is reasonably similar to the flowchart proposed by Matthiessen & Bateman(1991, p106), but has been developed separately.

6The feature clause leads on to both clause-simplex and clause-complex.

7The lexical selection process could be performed intermixed with the lexico-grammatical construction. If so,then the processor would then advance to graphological reali sation.


Set *waiting-systems* list to 'rank-system'

succeed

succeed

succeed

fai l

End

Enter Next System

Advance to Next Feature

Test Feature Against Preselections

Order Consti tuents

fai l Error

fai l

Test Feature's Semantic Constraints

Add New Systems to *waiting-systems*

Assert Feature's Realisations

succeed

succeed

fail

succeed

succeed

Figure 13: The Immediate-Structure Building Algorithm

A brief summary of each of these steps follows:

1. Set *Waiting-systems* list to ‘ rank-system’ : the variable *Waiting-systems*contains the list of systems which are waiting for processing, i.e., those systemswhose entry conditions are satisfied at the present point of traversal, but whichhave not yet been 'entered' (no feature has been selected as yet).

2. Enter next system: The next system from the *Waiting-systems* list isretrieved, becoming the *current-system*. At this point, the features of thesystem are ordered by various means.

a) Initial ordering: Internally, the features of a system are ordered as theyappear in the system definition. The user can set a variable * feature-selection-mode* which will change this default ordering in two ways:

• Reverse-order: the list of features is reversed before further processing.This is useful for sentence generation, since the last feature in a system isusually the one with no realisation attached.

• Random-order: the list of features is jumbled.

b) Preferential ordering: The user can specify a li st of ‘ preferred features’ --features that will be considered before any other features. These preferredfeatures are put at the front of the list resulting from (a) above.

3. Advance to next feature: the next feature from the system is selected. If thereare no more features in the system (no valid alternatives), then processing fails -- either the resources contain inconsistencies, or we have reached a generationgap.

4. Test feature against preselections: the feature is tested against thepreselections for this unit. If the feature is consistent with the preselections,processing continues, else the process returns to (3) to try another feature fromthe system.


5. Test feature’s semantic constraints: the feature’s selection constraint (seechapter 6 of my thesis on inter-stratal mapping) is tested, and if it is consistent,it is chosen, otherwise the process goes back to try another feature from thesystem.

6. Assert feature realisations: the realisations of the feature are asserted, with theexception for the :order and :partition rules, which are stored for laterapplication. These realisations are applied at the end of the traversal, when weknow which roles conflate, which are presumed, and which of the optional roleswere actually inserted. If the assertion of the realisations fails (the realisationsare inconsistent with grammatical information from other features), an error issignalled. This should not happen as Systemic grammars should be soconstructed in a way that any legal combination of features is realisable.

7. Add new systems: The program checks for any systems which becomeenterable with the addition of the chosen feature. These systems are added to the*Waiting-systems* list. Penman and WAG both keep a list, for each feature, ofthe entry-conditions which the feature appears in. Thus, after selecting a feature,we do not need to check all systems to see if they have become enterable, butonly those on this li st. Also, any systems which have already been entered areautomatically ignored.

8. Order constituents: When all systems have been processed, the sequence rules(order and partition) are processed, placing the units in their surface ordering.See section below for more details.

Forward-Traversal Algor ithms: I have outlined Systemic generation as forward-traversal through the system network. If there were no simultaneous systems in a systemnetwork, traversal would be a simple matter of selecting a series of features in a singlepath from root to leaf. However, networks allow simultaneous systems, so the order inwhich systems are processed is not totally determined. Simultaneous systems can beentered in any arbitrary order. This gives rise to two alternative strategies for networktraversal:

1) Depth-first: The systems which extend from the last selected feature areprocessed before simultaneous systems. Traversal follows one branch of thenetwork to the leaves before exploring others. Depth-first traversal is achievedby placing newly activated systems at the front of the *waiting-systems* list, sothat they will be processed first.

2) Breadth-first: Systems at the same systemic depth are processed before thesystems which depend on them. Breadth-first traversal is achieved by placingnewly activated systems at the end of the *waiting-systems* list, so that theywill be processed last. Systems which were already on the list representsimultaneous systems, and they will be processed first.

The choice between these strategies doesn’ t affect processing eff iciency, since allentered systems have to be processed anyway. The order of entry should not affect theresults of the generation process.

Sequencing of Constituents: This section describes the algorithm used to sequencegrammatical constituents in the WAG generator. It represents a very succinct methodfor sequencing units systemically. Sequencing is applied after the traversal is complete,since it is only at this point that we can be sure which of the elements marked asoptional are actually included, and also which functions conflate together.


The WAG formalism uses two sequence operators:

• order: indicates absolute ordering (adjacency), e.g., (:order A B C) indicatesfunction A immediately precedes function B, which immediately precedesfunction C.

• partition: indicates relative ordering, e.g., (:partition A B) indicates function Aprecedes function B, but not necessarily adjacently.

Two other aspects need to be discussed:

• Optionali ty: Elements of a sequence rule can be optional (need not actuallyoccur in the final structure). Optional elements are designated by beingparenthesised, e.g., (:order Subject Finite (Negator) ).

• Front & End: The sequencing of a unit in relation to the front and end of thegrammatical unit can be indicated by inclusion of pseudo-functions 'Front' and'End' in the sequence rule, e.g., (:order Front Subject ), (:order PunctuationEnd).

To exempli fy the processing, I will assume a clause which, after all systems areentered, has the following sequence rules:

Order : Punct ^ End;Pred ^ Object;Subject ^ Finite ^ (Negator)

Partition : (Modal) # (Perf) # (Prog) # (Pass) # Pred

1. Sequence Rule Preparation: The order and partition rules are standardised throughthree steps:

a) Removal of optional elements: the order/partition rules may contain optionalelements. Any optional element which is not present in the structure is removedfrom the rule. The sequence rules shown above simpli fy to those below:

Order : Punct ^ End;Pred ^ Object;Subject ^ Finite

Partition : Modal # Pred

b) Standardisation of Role-Labels: Each constituent may have multiple rolelabels (due to conflation). Different rules may refer to one constituent usingdifferent role labels, e.g., assuming that Finite and Modal are conflated, then thefinal two sequence rules in the set from above contain references to one unitusing different role-names. The order/partition rules are standardised so thatonly one role per role-bundle is used.

Order : Punct ^ End;Pred ^ Object;Subject ^ Finite

Partition : Finite # Pred

c) Spli tt ing into two-element rules: Each sequencing of more than two elements issplit i nto a number of binary sequencers. The rules of this example are all binaryafter the elimination of the optional elements, but this is not always the case. Forexample:


Finite # Prog # Pred => Finite # Prog; Prog # Pred

2. Processing of Sequence Rules

To merge the information contained in these sequence-rules, an ordering graph isused -- a data-structure which represents the ordering between any pair of elements. Thesequencing rules are applied to the graph one at a time, as shown in the followingexample.

a) Initial State: The units start out unordered in respect to each other, but orderedin respect to the front (FRONT) and end (END) of the unit, as demonstrated infigure 14(a). In these ordering graphs, a continuous line between roles indicatesadjacency, a line broken by a || indicates that other elements may intercede(partitioned).

b) Order Cycle: Each order rule is applied in turn. The successive effect on theorder graph is shown in Figure 14(b-d).

c) Partition Cycle: Each partition rule is then applied. Figure 14(e) shows theapplication of the one partition in this example.

d) Reading off t he Sequencing: After all the sequence rules are applied, we canread off the sequencing from the graph. In most cases, there is only one paththrough the graph. However, in some situations, sequence is not totallydetermined, and alternative orderings may be possible (for instance, the WAGgrammar does not totally determine the sequence of multiple Circumstances, ornominal Quali fiers). In such cases, the process just takes the first orderingalternative.

a) The initial state of the order graph

End

Subj

Pred

Fin

Punct

Object

Front

b) After applying Punct ^ End

Front EndPunct

Subj

Pred

Fin

Object

c) After applying Pred ^ Object

End

Subj

FinFront Punct

ObjectPred

d) After applying Subj ^ Fin

End

Subj

Front Punct

ObjectPred

Fin

e) After applying Fin # Pred

EndFront PunctObjectPredSubj Fin

Figure 14: Successive States of the Order Graph


2.4. Lexical SelectionIn Penman’s lexical selection algorithm, semantic filtering is applied first: the

semantics provides the set of candidate lexemes which express the ideational type of thereferent. These candidates are then filtered grammatically, and one of the remainingcandidates is chosen:

"Abstractly, there are two ways in which sets of candidate lexical items areconstrained and denotational appropriateness is the first kind of constraint applied.Then grammatical constraints -- such as the requirement that the lexical item be anen-participle -- are used to filter the set of denotationally appropriate terms."(Matthiessen 1985).

The WAG system filters on grammatical grounds first. As a general case, I believethat the Penman approach (semantic filtering first) is best. However, for a variety ofreasons, lexical selection in WAG is quickest when grammatical filtering is performedfirst. This is true particularly for closed-class lexical-items (pronouns, prepositions,conjunctives, verbal-auxili aries, etc.), of which most can be totally resolved throughgrammatical selection only. Even for open class items, grammatical filtering seems tobe quicker.

However, as the size of the lexicon grows, starting with semantic filtering will nodoubt prove more eff icient. it would also be useful i f we cold identify, a priori, whethera particular item was fill ed by an open-class or a closed class item. We could perform asemantics-first filtering for open-class items, and a grammar-first filtering for closed-class items. Unfortunately, it is diff icult to specify exactly what grammatical classes areopen- or closed-class, especially since it is our policy not to build any resourceinformation into the program itself.

The Penman system associates each lexeme with a single ideational feature (orconcept in Penman's terms). The WAG system overcomes this limitation, allowing eachlexeme to be associated with a set of ideational features. For instance, to specify thesemantics for the word "woman", Penman would need to create an ideational featurewoman, which inherits from both female and adult.8 In the WAG system, we canspecify that the lexeme's semantics is (:and female adult), avoiding the need to create anew concept for each combination of features.

2.5. Text and Speech OutputThe sentence generator can produce either text (a graphologically formatted

sentence), or speech output.

1. Text Output: The text output is derived from the lexico-grammatical structure(including lexical items) constructed during the prior stages. The mappingsbetween lexico-grammatical and graphological form are not stated declaratively-- they are encoded in the lexico-grammar-to-text procedure. These resourceswill eventually be declarativised. The graphological string is derived as follows:

a) the lexical items are extracted from the leaves of the lexico-grammaticaltree, in order of occurrence from left to right.

8It is not necessary to specify the feature human, since female in the Penman Upper Model inherits from

human.


b) an appropriate graphological form is generated for each lexeme, given itsinflection feature.

c) the left-most graphological form is capitalised.

d) Spacing: a space character is placed between each graphological form. Somepunctuation symbols modify this rule:

No space before: . , ; ? ! ' " (close quotes)

No Space after: ' " (open quotes)

2. Speech Output: If the speech-output option is selected, the WAG system speaksthe generated sentence using the Macintosh Speech Manager (a text-to-speechprogram). WAG will check the designated gender of the ‘Speaker’ role of theinput specification, and choose a voice appropriately.

Chapter 5

Comparison With Penman

In building the generation component of WAG, I have borrowed strongly fromPenman’s general architecture. However, I have attempted to correct many of the short-comings in the Penman system. Note that WAG uses none of Penman’s code.

1. Similar ities to Penman

The areas in which WAG has borrowed from Penman are:

1) Grammar-dr iven control: WAG uses Penman’s grammar-driven controlstrategy, in common with the majority of Systemic generators.

2) Traversal Algor ithm: The WAG traversal algorithm is similar to Penman's,although the choosing of features in a system is different in WAG - based onevaluating feature selection conditions, rather than traversing a chooser-tree.

2) Upper Modelli ng: Penman's Upper Model is the basis of WAG's ideationalrepresentation, although WAG's Upper Model is represented as a systemnetwork, rather than as a LOOM inheritance network. WAG has also adoptedPenman's means of handling domain knowledge - subsuming domain conceptsunder upper-model concepts rather than relating them directly to the lexico-grammar (see chapter 3 of thesis).

3) Resource Definition and Access: For reasons of resource-model compatibilit ybetween Penman and Nigel, WAG accepts system definitions in Penman'sformat, and can export to this format (although a modified format is preferredfor WAG). Many of WAG's resource-access functions (functions for accessingthe stored Systemic resources) are named identically to Penman's, although theinternal storage of the resources is different. This has been done for codecompatibilit y reasons.


2. Differences from Penman

WAG improves on Penman in several directions:

1. Input Specification: WAG’s semantic specification form is not dissimilar toPenman’s input form -- Sentence Plan Language (SPL) -- except for severalimprovements. In summary, these are:

a) Extended and linguistically-based speech-act network: the speech-actnetwork has been extended to handle a wider range of speech-acts. Thespeech-act categories rest on a firm theoretical basis in the Berry-Martintradition.

b) Treatment of proposition as par t of speech-act: The relation ofpropositional content to speech-act was rather ad-hoc in Penman, where thespeech-act was tacked on to the proposition to be expressed. Speech-functionseems to have been added on as an afterthought to an originally declarative-only system. In the WAG system, the relationship has been clarified, withthe propositional content being treated as a role of the speech-act to beexpressed.

c) Generation directly from KB: WAG allows sentence-specifications toinclude just a pointer into the KB, while Penman requires an ideationalstructure to be specified within each sentence-specification.

d) Designation of Wh-element in elicitations: it is diff icult to designate theelement which should be the wh-element of a question in Penman, the userneeds to directly specify the answer to an identification inquiry, a processwhich involves some knowledge of the internal working of the Penmansystem. WAG allows the user to designate the wh-element (the Requiredelement) in the input specification, in a simple, theoretically-based manner.

e) Extended textual specification: WAG has extended the range of textualspecification possible in the input form. This includes the recoverabilit y,identifiabilit y, and relevance of entities. To match these features in SPL, theuser needs to include inquiry preselections -- forced responses to Penman’sinquiries -- a rather low-level approach.

f) Complex ideational feature specifications: In Penman, each ideational unitcan have only a single feature (e.g., ship ), or at most a conjunction offeatures. WAG allows the user to specify the type of semantic unit using anylogical combination of features, using conjunction, disjunction or negation.

2. Interstratal Mapping: Penman's chooser-inquiry interface has provenproblematic for two reasons:

a) The chooser-inquiry interface is partially procedural, and thus not re-usablefor analysis. The WAG system has replaced the chooser-inquiry interfacewith a declarative mapping system (following Kasper's approach), used forboth analysis and generation.

b) When extending Penman's resources to generate new sentences, it is oftendiff icult to work out what input specification is necessary to get a particularlexico-grammatical form. The main reason for this is the need to work onfour levels of representation:

• Upper-model concepts and structures

• Inquiry specifications


• Chooser Trees

• The System Network

The WAG system simpli fies the mapping process by mapping directly fromgrammatical features to upper-model concepts. It is thus easier to discoverwhat input specification is needed to produce a particular lexico-grammaticalstructure.

3. Structure Building: WAG improves on Penman's structure building in severalways:

a) Conciseness: The Penman code for lexico-grammatical construction is quitelong and involved, having been developed by several programmers over tenyears. Some parts are diff icult to penetrate, even by those who maintain it.The WAG system has the advantage of being designed rather than evolved,and takes advantage of the progress made in Penman. It has also beenimplemented by a single programmer, so is more highly integrated.

b) Sequencing: The Penman formalism system uses four sequencing operators(order, partition, order-at-front, order-at-end). However, most ordering isactually derived from a set of default ordering rules. These default orderingsare not part of the Systemic formalism, but rather an ad-hoc extension.Penman's sequencing information is not suff icient for parsing, since theresources provide mostly default ordering, rather than all possible orderings.

The WAG system has extended the sequencing formalism to allow optionalelements in sequence rules. All ordering in the WAG grammar is donewithout the default ordering resource. The WAG grammar is thus suitablefor parsing as well as generation.

c) Proper handling of disjunctive preselections: Penman does not handlepreselections properly where the preselection includes some disjunction. TheWAG system corrects this problem.

d) Use of a generalised KRS: The Penman system has specialised code fordealing with realisation rules. All of WAG's processing is based on top ofWAG's KRS, which is used for asserting realisation statements duringgeneration or parsing, for asserting or testing feature selection conditions,and for asserting knowledge into the knowledge-base.

4. Lexical Selection: The Penman system associates each lexeme with a singleideational concept. The WAG system overcomes this limitation, allowing eachlexeme to be associated with a set of ideational features.

5. Lexical network into system network: In Halli day's Systemic grammar,lexical features are organised under the lexico-grammar system network - theword-rank sub-network. Penman does not follow this approach.9 For generationpurposes, the lexical features are not organised in terms of a network, andPenman cannot check on the inheritance relations between lexical features. Thefeatures are organised into an inheritance network only for lexical acquisition(see Penman Project 1989), and this information is organised in a Loominheritance network, rather than as part of the Nigel lexico-grammaticalnetwork. The WAG system incorporates the lexical features into the lexico-

9John Bateman has a version of Penman which partiall y corrects this problem.


grammatical network, under the word feature. This resource is used inprocessing to test compatibilit y of lexical features.

6. Single formalism for all l evels: The WAG system uses the same knowledgerepresentation system for all structural representation, including the internalrepresentation of the semantic input (speech-act and ideational content) andlexico-grammatical form. Penman has two systems - Loom is used to representknowledge, and Penman provides its own internal knowledge representationsystem for representing lexico-grammatical structures.

3. Summary

While the WAG generator has only been under development for a few years, and bya single author, in many aspects it meets, and in some ways surpasses, the functionalityand power of the Penman system, as discussed above. It is also easier to use, havingbeen designed to be part of a Linguist’s Workbench -- a tool aimed at linguists withoutprogramming skill s.

The main advantage of the Penman system over the WAG system is the extensivelinguistic resources available. While the WAG system can work with the Nigel grammarat least in the lexico-grammar, I have not yet connected Nigel to the micro-semantics, sosemantic generation using Nigel is not yet possible. The writing of appropriate featureselection-constraints is a task for future development.


Appendix A: Example Semantic Forms

Below I provide examples which demonstrate how particular grammatical forms canbe achieved. These all assume the Dialog resource model is loaded.

As stated in section 2.2 above, WAG has two generation modes:

• *Clear-KB-on-Say* = t : the knowledge base (KB)is cleared between each'say', allowing successive say-forms to refer to the same instances withoutcausing inconsistencies to arise if these instances are assigned differentstructure.

• *Clear-KB-on-Say* = nil : the KB is not cleared between successive say-forms. This allowing successive say-forms to express different subsets of theKB, or allows each evaluated say-form to add information to the KB.

This toggle can be set by selecting Preferences... from the Generation menu.Alternatively, evaluate the following lisp expression before evaluating the say-forms:

(setq *Clear-KB-on-Say* t)

The first set of examples will assume the first mode. Set the mode appropriately.

The examples from this section can be found in the file: Demos:GenerationDemos:Manual Examples .

1. A Simple Utterance

For a simple utterance, we need provide only the speech-act and a proposition:

(say example-1 :text "A man goes to a city." :is propose :proposition (P1 :is (:and motion-process origin-perspective) :actor (Mark :is (:and male adult)) :Destination (Sydney :is city)))

2. Changing Tense

We can add a tense-choice to the speech-act specification:(say example-2 :text "A man went to a city." :is (:and propose simple-past) :proposition (P1 :is (:and motion-process origin-perspective) :actor (Mark :is (:and male adult)) :Destination (Sydney :is city)))

(say example-3 :text "A man will have gone to a city." :is (:and propose future-perfect) :proposition (P1 :is (:and motion-process origin-perspective) :actor (Mark :is (:and male adult)) :Destination (Sydney :is city)))


The tense possibiliti es are:simple-past: I went.simple-present: I go.simple-future: I will go.past-perfect: I had gone.present-perfect: I have gone.future-perfect: I will have gone.

These tense features have no theoretical status, existing only to make semanticspecificatione easier, as discussed in section 6.2 above. Tense can be specified directlyusing the constraint field, which is useful when the event-times of processes are alreadydefined in the KB. For example:

(say example-4 :text "A man had gone to a city." :is propose :proposition (P1 :is (:and motion-process origin-perspective) :actor (Mark :is (:and male adult)) :Destination (Sydney :is city)) :constraint (:and (< Reference-Time Speaking-Time) (< Proposition.Event-Time Reference-Time)))

3. Progressive Aspect

To generate progressive aspect, include continuing-event in the speech-actspecification:(say example-5 :text "A man was going to a city." :is (:and propose simple-past continuing-event) :proposition (P1 :is (:and motion-process origin-perspective) :actor (Mark :is (:and male adult)) :Destination (Sydney :is city)))

4. Referr ing To Entities

Given certain additional information, WAG will automatically select a particular wayto refer to each semantic entity. The various means for controlli ng the this expressionare shown below.

4.1 Identifiabili tyWhen a participant is part of the shared knowledge between the speaker/hearer (or

the speaker believes such), then the speaker can refer to that entity using identifiable-reference, e.g.,

• Definite Deixis: The boy

• Naming: John (if the name is known)

Otherwise indefinite deixis (a boy, some boys) is used.

Identifiabilit y is marked by including the unit-id of the identifiable entity in the:identifiable-entities field of the say-form. Entities are assumed to be unidentifiableunless included in this field. The exception to this is that recoverable entities (seebelow) are assumed identifiable. Refer to chapter 5 of my thesis.


(say example-6 :text "The man went to the city." :is (:and propose simple-past) :proposition (P1 :is (:and motion-process origin-perspective) :actor (Mark :is (:and male adult)) :Destination (Sydney :is city)) :identifiable-entities (Sydney Mark))

If the name is known, it is used (for exceptions, see under relevance below).

(say example-7 :text "Mark went to Sydney." :is (:and propose simple-past) :proposition (P1 :is (:and motion-process origin-perspective) :actor (Mark :name "Mark") :Destination (Sydney :name "Sydney")) :identifiable-entities (Sydney Mark))

4.2 Recoverabili tyShared information which has already been introduced to the discourse, or is part of

the immediate environment (e.g. the speaker or hearer), is called recoverableinformation. Recoverable information can be referred to using pronouns. Other forms ofidentifiable reference are also appropriate (see chapter 5 of my thesis).

Recoverabilit y is marked by including the unit-id of recoverable entities in the:mentioned-entities field of the say-form:

(say example-8 :text "He went to here" :is (:and propose simple-past) :proposition (P1 :is (:and motion-process origin-perspective) :actor (Mark :is (:and male adult)) :Destination (Sydney :is city)) :mentioned-entities (Sydney Mark))

Note: He came here. would be the preferred generation. This is an issue for futurework.

4.3 Speaker and Hearer RolesIf a participant in the proposition has the same unit-id as the fill er of the speaker or

hearer role, then the participant will be lexicalised appropriately, e.g., "I", "my" etc.(say example-9 :text "I went to your city" :is (:and propose simple-past) :speaker (Mark :is (:and male adult)) :hearer (Mary :is (:and female adult)) :proposition (P1 :is (:and motion-process origin-perspective) :actor Mark :Destination (Sydney :is city :owner Mary)))


5. Changing the Theme

The user can specify which entity is to be the theme of the utterance, by including a:theme role in the say-form. Theme is by default the Agent (Actor, Senser, Sayer, etc.)of the process. See Chapter 5 of my thesis for more details.

Nominating a Medium (Actee, Phenomenon, Verbiage, etc.) as Theme will force apassive sentence. Nominating a Circumstance as Theme will front that Circumstance.

(say example-10 :text "Mary was sent by me to Sydney." :is (:and propose simple-past) :speaker (Mark :is (:and male adult)) :proposition (P1 :is sending-process :actor Mark :actee (Mary :name "Mary") :Destination (Sydney :name "Sydney")) :theme Mary :identifiable-entities (Sydney Mary))

To generate an Agentless passive, do not include the :actor role in the say-form, asin example 11 below (alternatively, see the discussion of relevance below). In suchcases (no Agent is contained in the expression), a passive form will result automatically,so the :theme field is not necessary.

(say example-11 :text "Mary was sent to Sydney." :is (:and propose simple-past) :proposition (P1 :is sending-process :actee (Mary :name "Mary") :Destination (Sydney :name "Sydney")) :theme Mary :identifiable-entities (Sydney Mary))

Specifying the head of a circumstantial role as theme will t hematicise that role, asshown in example 12:

(say example-12 :text "To your city, I sent John" :is (:and propose simple-past) :speaker (Mark :is (:and male adult)) :hearer (Mary :is (:and female adult)) :proposition (P1 :is sending-process :actor Mark :actee (John :name "John") :Destination (Sydney :is city :owner Mary)) :theme Sydney :identifiable-entities (John))

It might be desired to produce something of the form Sydney is where I sent John. Atpresent, WAG cannot handle themes being specified below the top level of theproposition. This sentence can however be achieved using a say-form using anidentifying-relation.


6. Varying The Speech Act

Chapter 4 of my thesis sets out the various speech-acts which are possible in theWAG system. The range of speech-acts are represented in figure 15.

el ici t propose support deny-knowledge contradict request-repeat

el ici t-polarity el ici t-content

SPEECH FUNCTION

OBJECT OF NEGOTIATION

action-negotiating information-negotiating

TURN MANA GEMENT

keep-turn pass-turn

negotiatory salutory

greet farewell thank

ini tiate respond

INITIATION

SPEECHACT TYPE

speech-act

Figure 15: The Speech-act Network

6.1 elicit-polar ity

(say example-13 :text "Is Mark going to Sydney?" :is (:and elicit-polarity continuing-event) :proposition (P1 :is (:and motion-process origin-perspective) :actor (Mark :name "Mark") :Destination (Sydney :name "Sydney")) :identifiable-entities (Sydney Mark))

6.2 elicit-contentThe Wh- element is specified by including its unit-id in a :required field:

(say example-14 :text "Where is Mark going?" :is (:and elicit-content continuing-event) :proposition (P1 :is (:and motion-process origin-perspective) :actor (Mark :name "Mark") :Destination (L1 :is 2d-spatial-object)) :identifiable-entities (Mark) :Required L1 :Theme L1)


(say example-15 :text "Who is going to Sydney?" :is (:and elicit-content continuing-event) :proposition (P1 :is (:and motion-process origin-perspective) :actor (X :is human) :Destination (Sydney :name "Sydney")) :identifiable-entities (Sydney) :Required X :Theme X)

6.3 Proposing in response to an elicitationFollowing an elicitation, the speaker need only supply the element which was

elicited. The say-form can specify which element was Required, using a :elicited field,containing the unit-id of the element that was required in the prior elicitation.

(say example-16 :text "Sydney." :is (:and propose required-only) :proposition (P1 :is (:and motion-process origin-perspective) :actor (John :is human) :Destination (Sydney :name "Sydney")) :identifiable-entities (Sydney) :Elicited Sydney)

6.4 Imperative (negotiate-action)The prior examples have all defaulted to negotiate-information (producing

interrogatives and declaratives). Example 17 shows an action-negotiating move.

(say example-17 :text "Go to Sydney!" :is (and propose negotiate-action) :hearer (Mark :is (:and male adult)) :proposition (P1 :is (:and motion-process origin-perspective) :Actor Mark :Destination (Sydney :name "Sydney")) :identifiable-entities (Sydney))

6.5 Other Responding Moves

(say example-18 :text "Sorry?" :is request-repeat)

Not yet working:(say example-19 :text "No." :is contradict)

(say example-20 :text "Yes." :is support)

(say example-21 :text "I don't know." :is deny-knowledge)


6.6 Salutary MovesSince the sentence generator has been designed to operate in a interactive dialogue

environment, salutary moves are also possible (e.g., "hello", "goodbye", "thank you").(say example-22 :text "Good Morning." :is temporal-greeting)

(say example-23 :text "Thank you." :is (:and initiate thank))

(say example-24 :text "You're welcome" :is (:and respond thank))

(say example-25 :text "Good bye." :is farewell)

7. Controlli ng Modali ty

Modality is stated as a role of the proposition. The role is fill ed by a modal-quality,which requires the specification from two systems: Voliti onality (voliti onal vs.nonvoliti onal), and Conditionality (conditional vs. nonconditional). Nonvoliti onalmodality has three sub-types: necessity, possibilit y and abilit y. The lexification of thesetypes is shown below (negated forms not shown):

nonconditional conditional

voli tional will would

necessity must might

nonvoli tional possibili ty can, may could, might

abili ty can could

Table 1: The Modal Semantics

This modal system is borrowed from Penman's Upper Model.

(say example-26 :text "Can I help you?" :is elicit-polarity :speaker (Operator :is human :number 1) :Hearer (Caller :is human) :proposition (P3 :is (:and help-action dispositive) :actor Operator :actee Caller :modality (M3 :is (:and ability nonconditional))) :theme Operator :relevant-entities (Operator Caller))


8. Varying the Process Type

Halli day classifies processes into material, mental, verbal, relational, behavioural andexistential. The Upper Model (and thus WAG) uses all of these categories except forbehavioural. This section will demonstrate the generation of various process types.Materials have already been demonstrated, so I will not show them again.

8.1 Material Processes

8.1.1 Simple Material

8.1.2 Ditransitive Processes

Active: mary gave a book to John.

Recipio-Passive: John was given a book by Mary.

Medio-Passive: A book was given by Mary to John.

8.2 Verbal ProcessesNote that there is at present no way to directly specify the tense of the projected

clause.

(say example-27 :text "John said Mark was going to Sydney." :is (:and propose simple-past) :proposition (P1 :is nonaddressee-oriented :Sayer (John :name "John") :Saying (P2 :is (:and motion-process origin-perspective) :actor (Mark :name "Mark") :Destination (Sydney :name "Sydney") :Constraint (:and (< Event-Time.Start *Speech-Act*.Reference-Time) (> Event-Time.End *Speech-Act*.Reference-Time)))) :identifiable-entities (Mark John Sydney))

(say example-28 :text "John told Mark to go to Sydney." :is (:and propose simple-past) :proposition (P1 :is addressee-oriented :Sayer (John :name "John") :Addressee (Mark :name "Mark") :Saying (P2 :is (:and motion-process origin-perspective) :actor Mark :Destination (Sydney :name "Sydney"))) :identifiable-entities (Mark John Sydney) :prefer (finite conjuncted))

8.3 Mental ProcessesI thought that John was coming.

I believed John to be coming.

I wanted to come.


8.4 Relational Processes

8.5.1 Possession

8.5.2 Att r ibution

8.5.3 Identity

9. Types of Circumstances & Quali ties

10. Clause Complexes

Most cases which Halli day calls a clause-complex, the WAG grammar models as aclause with clausal adjunct, e.g., the following sentence:

I will go when you go.

...consists of a main clause: I will go, and a circumstantial adjunct: when you go.

[GRAPH THIS]

(say example-29 :text "If Mark goes to Sydney, I will eat my hat." :is (:and propose simple-present) :speaker (Jim :is (:and male adult)) :proposition (P1 :is condition :Head (P1a :is eat :actor Jim :actee (h1 :is hat-object :owner Jim)) :Dependent (P1b :is origin-perspective :actor (Mark :name "Mark") :Destination (Sydney :is city :name "Sydney"))))


DIAGRAM of GRAMMATICAL STRUCTURE

11. Grammatical Metaphor

The following examples are stored in file Demos: Generation Demos: Gram-Metafor. These examples require *Clear-KB-on-Say* set to nil . See the instructions atthe beginning of this appendix.

; Stop the KB being reset on each say(setq *Clear-KB-on-Say* nil)(setq *Time-Says* nil)

;;; DECLARE THE KNOWLEDGE BASE;participants(progn (clear-worlds)

; Participants (tell John :is male :name "John") (tell Mary :is female :name "Mary") (tell Party :is spatial)

;Processes (tell arrival :is motion-termination :Actor John :Destination Party)

(tell leaving :is motion-initiation :Actor Mary :Origin Party)

;relations (tell causation :is causative-perspective :head arrival :dependent leaving)

(complete-the-structure 'causation))

(say gramm-met1 :text "Mary left because John arrived." :is (and propose simple-past) :proposition leaving :relevant-entities (John Mary arrival leaving causation) :identifiable-entities (John Mary))

(say gramm-met2 :text "John's arrival caused Mary to leave." :is (and propose simple-past) :proposition causation :relevant-entities (John Mary arrival leaving causation) :identifiable-entities (John Mary) :prefer (nominal-subject active-indirect-agent))


(say gramm-met3 :text "I saw the phoning" :is (:and initiate propose simple-past) :speaker (Caller :is male :number 1) :proposition (P1 :is (:and perception mental-active) :senser Caller :phenomenon (I1 :is phoning :actor (John :name "John") :polarity (P2 :is positive)) :identifiable-entities (John P1 phoning Mary))

12. Content Selection

Relevance


Appendix B

Useful Functions

1. Introduction

This appendix outlines the lisp functions which can be accessed to drive generationfrom other processes, for instance, in a multi -sentential gneration system.

• Say (macro)Description: This is the main form for generating sentences.

Arguments: name &rest Keys

where Keys can be any shown in Table 2 below.

Key Value Description

:isfeature-struct

Sets the the speech-act of the utterance tothe logical expression. Other aspects, suchas tense and aspect, can also be set throughhere. When absent, defaults to propose.

:speaker

unit-id orunit-definition

Either the unit-id of the entity in the KBwhich is the speaker, or a unit-definition(Optional). This role needs only beprovided if you need to refer to thespeaker in the proposition, or for voiceselection in text-to-speech.

:hearer

unit-id orunit-definition

Either the unit-id of the entity in the KBwhich is the hearer, or a unit-definition.(Optional). This role needs only beprovided if you need to refer to the hearerin the proposition, or for voice selection intext-to-speech.

:propositionunit-id or

unit-definition

Either a definition of the proposition, orthe unit-id of the semantic head of theproposition to be expressed. See chapter 2,section 3 above.

:relevant-roleslist of

(unit-id Role1 Role2...)

This list contains, for each entity in theproposition, the roles which are relevantfor expression. Only of use whengenerating from a pointer into the KB.


:themeunit-id

The ideational entity which the speakerwishes thematicised (for English, thismeans fronted position).

:identifiable-entities list of unit-id

List of entities which the speaker assumesknown to the hearer, thus allowing definitereference, e.g., the President.

:mentioned-entitieslist of unit-id

List of ideational entities which havealready been mentioned in the discourse,which allows pronominalisation to beused.

:elicited-entitieslist of unit-id

List of the entities which are being elicitedin an elicit-content move. Typically asingle element.

:preferlist of feature

List of features which will become thedefault during generation. These can beideational, speech-act, or grammaticalfeatures.

:fronted list of roles When the grammar inadequatelyconstrains the ordering of grammaticalroles, this li st is referred to in order toorder them.

:textstring

The text which the say-writer expects to begenerated. Not used in the generationprocess. Provided purely to remind uswhat it is supposed to do.

:commentstring

Any comments you choose to associatewith the form. This field can occur severaltimes, although all are ignored ingeneration.

Say-Example

Generating Directly.Your multisentential text generator might wish to maintain the the various discourse

history variables as part of its own workings. In that case, the say-example for may betoo verbose. Below we outline how to duplicate the effects of this function:

1. Maintaining Var iables

Var iable Use

* identifiable-entities* Maintain this li st as

*mentioned-entities

*elicited-entities*


*temp-preferred-features* Place any features you want as the default temporarill yon this li st. This can be used, for instance, to force aparticular referential expression out of the generator.For permanent defaults, push the element onto the*preferred-features* li st.

* fronted-units*

* relevant-roles*

*gen-display-mode*

*speech-act*

* top* When generation completes, this variable will hold apointer to the top of the grammatical structure. Notsettable by user.

(if *Clear-KB-on-Say*

(clear-worlds)

(in-world 'world1))

(setq *speech-act* Name)

(tell -1 name example-args)

(complete-the-structure name)

(setq *control-strategy* :target-driven)

(setq *top* (my-intern (append-strings (string Name) "-lxg")

(symbol-package Name)))

(generate-utterance * top*)))

2. Calli ng The Generator.

(defun say-example (Name example-args)

(let ((target-text (getf&remf example-args :text)))


Bibliography

Haruno, Masahiko, Yasuharu Den & Yuji Matsumoto 1993 “Bidirectional ChartGeneration Algorithm”, Proceeedings of the 4th European Workshop on NaturalLanguage Generation, Pisa, Italy.

Hovy, Eduard 1993 “On the Generator Input of the Future”, in Helmut Horacek &Michael Zock (eds.), New Concepts in Natural Language Generation: Planning,Realisation and Systems, London: Pinter, pp283-287.

Mann, Willi am C. & Christian Matthiessen 1985 “Demonstration of the Nigel TextGeneration Computer Program”, in Benson & Greaves (eds.), Systemic Perspectiveson Discourse, Volume 1. Norwood: Ablex, pp50-83.

Mann, Willi am C. 1983 “An Overview of the Penman Text Generation System”,USC/ISI Technical Report RR-84-127.

Matthiessen, Christian & John Bateman 1991 Text Generation and Systemic FunctionalLinguistics: Experiences from English and Japanese, London: Pinter Publishers.

Matthiessen, Christian 1985 "The systemic framework in text generation: Nigel", inJames Benson & Willi am Greaves (eds.) Systemic Perspectives on Discourse:Selected Theoretical Papers from the 9th International Systemic Workshop, Norwood,N.J.: Ablex.

Matthiessen, Christian 1988a Text-generation as a linguistic research task, UCLA Ph.D.Dissertation.

Meteer, M. 1990 The “Generation Gap”: the problem of Expressibilit y in Text Planning,Ph.D. Thesis, Computer and Information Science Department, University ofMassachusetts.

O'Donnell , Michael 1994 Sentence Analysis and Generation: A Systemic Perspective.Ph.D. Dissertation, Dept. of Linguistics, University of Sydney.

Paris, Cécile 1993 User Modelli ng in Text Generation, London & New York: Pinter.

Patten, Terry 1988 Systemic text generation as problem solving, Cambridge: CambridgeUniversity Press.

Reichenbach, H. 1947 Elements of Symbolic Logic, Macmillan.

Penman Project 1989 "The Nigel Manual", Penman System Documentation,USC/Information Sciences Institute.

wag: sentence generation manual - wagsoft homepageonce the generation module is loaded, you can...

Documents