gretel showcase

GrETEL: an introduction. Liesbeth Augustinus, Vincent Vandeghinste, Ineke Schuurman, Frank Van Eynde (CCL, KU Leuven)

Uploaded by liesbetha on 28 January 2015



DESCRIPTION

Introduction to GrETEL, a search engine for linguists to query treebanks by example

TRANSCRIPT

Page 1

GrETEL: an introduction

Liesbeth Augustinus

Vincent Vandeghinste, Ineke Schuurman, Frank Van Eynde

CCL, KU Leuven

Page 2

GrETEL

GrETEL (Greedy Extraction of Trees for Empirical Linguistics)

• A linguistic search engine that takes examples in natural language as input:

– you don’t need to know a formal query language

– you don’t need to be familiar with the annotation scheme of a specific treebank

Page 3

Nederbooms

GrETEL was created as part of the CLARIN-VL project “Nederbooms”.

• Originally, only the Lassy treebank (written Dutch, approximately 1 million words) was supported.

• Currently (version 1.2), the CGN treebank (spoken Dutch, also approximately 1 million words) is supported as well.

Page 4

GrETEL

In the present version of GrETEL, you first have to decide whether you want to search a corpus of spoken Dutch (CGN) or one of written Dutch (Lassy).

You can do this using the links on the left or those in the main text.

Page 5

[Screenshot: Nederbooms.png]

Page 6

Further options

Note that the left navigation bar (previous slide) also gives access to

• XPath search (formal query language)

• String search (regular text search plus regular expressions)

We recommend not using these until you are familiar with GrETEL, including the XPath options it offers (with a fall-back option!).

Page 7

Further options 2

Furthermore, the left navigation bar offers access to

• Manuals and further documentation (papers, slide shows, …)

• The Alpino parser, i.e., the parser used for the corpora as they appear in GrETEL

• Tree viewers, i.e., tools for visualizing syntactic tree structures

Page 8

How to …

In the next slides we will show, step by step, how GrETEL can be used.

Good luck!

Page 9

Step 1

• Enter a sentence (or part of a sentence) that is representative of the type of construction you are interested in.

– Note that this example plays a major role in what follows. If you are looking for een aantal mensen (‘several people’) as a subject, it should also function as the subject in your example sentence.

• Click on Submit

Page 10

Upper part, 2nd page

(See the next slides for an explanation of the matrix shown above.)

Page 11

Lower part, 2nd page

These are the guidelines for filling out the matrix. They explain the meaning of the various options (cf. slides 12 and 13).

Below them, you can submit your preferences (as specified in the matrix), return to the previous page to adapt the input sentence (back), or enter a new query (at this stage, this has the same result as going back).

Page 12

2nd page, step 1

The matrix asks you to state how similar the results should be to the input sentence:

The first option (pos, i.e., part of speech) yields similar constructions.

The last option (token) yields exactly the same construction.

Note, however, that for full sentences the chance of finding an identical sentence is small!

Page 13

2nd page, step 1 (bis)

The option extended pos allows you to make a more fine-grained selection, for example to distinguish between singular and plural nouns.

• Note that for pronouns, this option gives more or less the same result as token, since pronouns have very detailed tags (± 190) in both Lassy and CGN, almost one per token.

The option lemma searches for sentences containing the same word, but not necessarily the same form of that word.
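To make the matrix options more concrete, here is a minimal Python sketch of how each choice could turn one word of the example into an XPath node test. It assumes Alpino-style attribute names (`rel`, `pt`, `postag`, `lemma`, `word`) and a toy parse of mensen lachen (‘people laugh’); the names `Token`, `OPTION_TO_ATTRIBUTE`, `node_test` and `build_query` are hypothetical, and the queries GrETEL actually generates contain more constraints than this.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    """One word of the parsed example sentence (assumed Alpino-style attributes)."""
    rel: str     # dependency relation, e.g. "su" (subject) or "hd" (head)
    word: str    # surface form, used by the token option
    lemma: str   # dictionary form, used by the lemma option
    pt: str      # coarse part of speech, used by the pos option
    postag: str  # full detailed tag, used by the extended pos option


# Which attribute each matrix option constrains (hypothetical mapping).
OPTION_TO_ATTRIBUTE = {
    "pos": "pt",
    "extended pos": "postag",
    "lemma": "lemma",
    "token": "word",
}


def node_test(tok: Token, option: str) -> Optional[str]:
    """Turn one word plus its matrix choice into an XPath child-node test."""
    if option == "optional":
        return None  # optional words are left out of the query
    attribute = OPTION_TO_ATTRIBUTE[option]
    value = getattr(tok, attribute)
    return f'node[@rel="{tok.rel}" and @{attribute}="{value}"]'


def build_query(tokens: List[Token], options: List[str]) -> str:
    """Require all remaining node tests under one dominating node."""
    tests = [node_test(t, o) for t, o in zip(tokens, options)]
    tests = [t for t in tests if t is not None]
    if not tests:
        raise ValueError("every part is optional: there is nothing to search for")
    return "//node[" + " and ".join(tests) + "]"


example = [
    Token(rel="su", word="mensen", lemma="mens", pt="n", postag="N(soort,mv,basis)"),
    Token(rel="hd", word="lachen", lemma="lachen", pt="ww", postag="WW(pv,tgw,mv)"),
]
print(build_query(example, ["pos", "lemma"]))
# //node[node[@rel="su" and @pt="n"] and node[@rel="hd" and @lemma="lachen"]]
```

With pos for mensen and lemma for lachen, the query accepts any noun subject combined with any form of lachen; choosing token for both words would only accept the literal words. Marking every word as optional leaves nothing to search for, which corresponds to the error message mentioned on the next slide.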

Page 14

2nd page, step 1 (bis)

Note that you can mix the options, and that you can also mark parts of the example as optional (cf. slide 10).

• However, marking all parts as optional results in an error message. In that case, clicking ‘back’ returns you to the page where you can specify what you want.

Clicking ‘show parse tree’ displays the parse tree at the bottom of the page (cf. next slide).

Page 15

Bottom, 2nd page

This is the tree that was automatically created for your example sentence. It allows you to check whether the sentence was analyzed correctly. If not, you may want to adapt the input sentence slightly (using the back option).

Page 16

2nd page, step 2

After selecting the relevant parts in the matrix, you can specify whether

• word order should be respected, e.g., should the subject be in first position, as in the example sentence?

• the dominating node should be ignored. This is mainly relevant for the distinction between main clauses and subordinate clauses.

• extended pos tags should be split. This is recommended if you may want to adapt the XPath query later in the process.

These settings are available under Options (cf. slide 10).
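The effect of the first two options can be sketched in Python with the standard-library `xml.etree.ElementTree`. The trees below are toy two-word clauses in a simplified Alpino-like format (the attributes `cat`, `rel`, `pt`, `lemma` and `begin`, and the categories `smain`, `ssub` and `sv1`, are assumptions about that format), and `toy_tree` and `matches` are hypothetical helpers, not GrETEL code. `matches` looks for a noun subject plus a form of lachen, optionally requires the clause category of the example (main clause, `smain`), and optionally requires the subject to precede the verb.

```python
import xml.etree.ElementTree as ET


def toy_tree(clause_cat: str, subject_first: bool) -> str:
    """Build a toy clause 'mensen lachen' (simplified, assumed Alpino-like XML)."""
    su_begin, hd_begin = (0, 1) if subject_first else (1, 0)
    return (
        f'<alpino_ds><node cat="{clause_cat}" begin="0" end="2">'
        f'<node rel="su" pt="n" lemma="mens" word="mensen" begin="{su_begin}"/>'
        f'<node rel="hd" pt="ww" lemma="lachen" word="lachen" begin="{hd_begin}"/>'
        "</node></alpino_ds>"
    )


def matches(tree_xml: str, respect_order: bool = True,
            ignore_dominating_node: bool = False,
            example_cat: str = "smain") -> bool:
    """Hypothetical query: noun subject + head verb 'lachen' in one clause."""
    root = ET.fromstring(tree_xml)
    for clause in root.iter("node"):
        # Keeping the dominating node: the clause must have the example's category.
        if not ignore_dominating_node and clause.get("cat") != example_cat:
            continue
        subject = clause.find("node[@rel='su'][@pt='n']")
        verb = clause.find("node[@rel='hd'][@lemma='lachen']")
        if subject is None or verb is None:
            continue
        # Respecting word order: the subject must start before the verb.
        if respect_order and int(subject.get("begin")) > int(verb.get("begin")):
            continue
        return True
    return False


main_clause = toy_tree("smain", subject_first=True)   # mensen lachen
sub_clause = toy_tree("ssub", subject_first=True)     # (dat) mensen lachen
question = toy_tree("sv1", subject_first=False)       # lachen mensen?

for name, tree in [("main", main_clause), ("sub", sub_clause), ("question", question)]:
    print(name,
          matches(tree),
          matches(tree, ignore_dominating_node=True),
          matches(tree, respect_order=False, ignore_dominating_node=True))
```

With the default settings only the main clause matches. Ignoring the dominating node lets the subordinate clause match as well, and only when word order is no longer respected does the verb-first question match too.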

Page 17

Upper part, 3rd page

The trees show which parts were selected: they are marked in red in the left tree and shown in isolation in the right tree.

Page 18

Lower part, 3rd page

Page 19

Step 4 (optional)

Below the trees, you will see the XPath query that was generated automatically and reflects your choices. It will be used to select similar constructions in the corpus.

• You may want to adapt it, but you don’t have to! Note that you can always fall back to the original query (using the link just above the box with the query).

• See below for the effect of splitting up a complex tag, which makes it easier to adapt the query (cf. slide 16).
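Why splitting a complex tag helps can be sketched with a toy corpus and the standard-library `xml.etree.ElementTree`. A tag such as `N(soort,mv,basis)` can only be matched as a whole, whereas split-out features (assumed here to be stored as separate attributes such as `pt`, `ntype` and `getal`, as in Lassy-style XML) can be dropped one at a time. ElementTree supports only a subset of XPath, so the sketch writes `and` as chained predicates; the query strings are illustrative rather than GrETEL output, and the toy trees and their tags are assumed examples.

```python
import xml.etree.ElementTree as ET

# Two toy trees (simplified, assumed Lassy-style attributes):
# s1 'mensen lachen' has a plural subject, s2 'water stroomt' a singular one.
CORPUS = {
    "s1": '<alpino_ds><node cat="smain">'
          '<node rel="su" pt="n" ntype="soort" getal="mv" postag="N(soort,mv,basis)" word="mensen"/>'
          '<node rel="hd" pt="ww" lemma="lachen" word="lachen"/></node></alpino_ds>',
    "s2": '<alpino_ds><node cat="smain">'
          '<node rel="su" pt="n" ntype="soort" getal="ev" postag="N(soort,ev,basis,onz,stan)" word="water"/>'
          '<node rel="hd" pt="ww" lemma="stromen" word="stroomt"/></node></alpino_ds>',
}

# Unsplit: the complete complex tag has to match.
UNSPLIT = ".//node[@rel='su'][@postag='N(soort,mv,basis)']"
# Split: features as separate tests (graad omitted for brevity).
SPLIT = ".//node[@rel='su'][@pt='n'][@ntype='soort'][@getal='mv']"
# Adapted: delete the number test, so singular and plural nouns both match.
RELAXED = ".//node[@rel='su'][@pt='n'][@ntype='soort']"


def hits(query: str) -> list:
    """IDs of the toy sentences in which the query finds at least one node."""
    return [sid for sid, xml in CORPUS.items() if ET.fromstring(xml).findall(query)]


for label, query in [("unsplit", UNSPLIT), ("split", SPLIT), ("relaxed", RELAXED)]:
    print(label, hits(query))
```

The unsplit and split queries find the same sentence, but only the split form can be generalized by deleting a single test; with the unsplit tag you would have to rewrite the whole tag value.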

Page 20

Step 5

• Here you can select the parts of the corpus you want to use. If you only want to use part of the corpus, click on Treebank and then on the parts you are interested in.

• You can also specify whether you want the results to include some context.

At the bottom you can submit your choices, reset them (restoring the default state), go back one page, or submit a new query.

Page 21

Upper part, 4th page

At the top you see the (modified) XPath query, which you may want to download for further use. Below it are listings of the results; the right-hand one can be shown or hidden.

Page 22

Middle part, 4th page

In the middle of the page, the selected sentences are shown (cf. next slide).

Page 23

Lower part, 4th page

Page 24

Lower part, 4th page

This table contains all results:

• Clicking a SENTENCE ID shows the tree on this page (previous slide).

• Clicking on the right shows either a full-page tree or the XML format.

• At the top you can download all results, in either a printer-friendly or a ‘machine-friendly’ format.

Page 25

And now …

Give GrETEL a try!

Good luck!