More Xkwic and Tgrep. LING 5200 Computational Corpus Linguistics, Martha Palmer, March 2, 2006

Page 1: More Xkwic and Tgrep

More Xkwic and Tgrep

LING 5200 Computational Corpus Linguistics
Martha Palmer
March 2, 2006

Page 2: More Xkwic and Tgrep

LING 5200, 2006. Based on Kevin Cohen’s LING 5200

Resources

Laura is bugging me to make a CU Corpora page… Like this:

http://www.stanford.edu/dept/linguistics/corpora/cas-home.html

TGREP: http://www.stanford.edu/dept/linguistics/corpora/cas-tut-tgrep.html

Page 3: More Xkwic and Tgrep

Searching with pos tags and !

[word = "[tT]he" & !( pos = "DT" ) ]; wsj

[ !(word = "water" | pos = "NN")];
[ !(word = "water") & !( pos = "NN")];
[ word != "water" & pos != "NN" ];
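The three negated queries on this slide should select the same tokens, by De Morgan's law. A minimal Python sketch can check that on a toy example. It assumes a hypothetical list of (word, pos) pairs rather than a real CQP corpus, and it uses plain string equality where CQP would apply a regular expression:

```python
# Toy (word, pos) tokens; hypothetical data, not drawn from the WSJ corpus.
tokens = [("water", "NN"), ("water", "VB"), ("rain", "NN"), ("the", "DT")]

def q1(word, pos):
    # [ !(word = "water" | pos = "NN") ]
    return not (word == "water" or pos == "NN")

def q2(word, pos):
    # [ !(word = "water") & !(pos = "NN") ]
    return (not word == "water") and (not pos == "NN")

def q3(word, pos):
    # [ word != "water" & pos != "NN" ]
    return word != "water" and pos != "NN"

# One result list per query; all three lists should be identical.
matches = [[t for t in tokens if q(*t)] for q in (q1, q2, q3)]
print(matches[0])  # [('the', 'DT')]
```

Only the token that is neither the word "water" nor tagged NN survives all three filters.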

Page 4: More Xkwic and Tgrep

Operator precedence

The precedence of the (logical) operators is given by the following list: if operator x is listed before operator y, x takes precedence over y. Operators are evaluated left to right.

=, !=, !, &, |

[ ! word = "water" & ! pos = "NN" ];

disambiguates as

[ !(word = "water") & !( pos = "NN")];
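Python's comparison operators, `not`, `and`, and `or` happen to bind in the same relative order as the list above, so the disambiguation can be illustrated by analogy. This is a sketch in Python, not CQP itself; the function names are made up for the illustration:

```python
from itertools import product

def unparenthesized(word, pos):
    # Analogue of [ ! word = "water" & ! pos = "NN" ]
    return not word == "water" and not pos == "NN"

def parenthesized(word, pos):
    # Analogue of [ !(word = "water") & !(pos = "NN") ]
    return (not (word == "water")) and (not (pos == "NN"))

def wrong_reading(word, pos):
    # What the query would mean if the first ! scoped over the whole conjunction.
    return not (word == "water" and not pos == "NN")

cases = list(product(["water", "rain"], ["NN", "VB"]))
same = all(unparenthesized(w, p) == parenthesized(w, p) for w, p in cases)
differs = any(unparenthesized(w, p) != wrong_reading(w, p) for w, p in cases)
print(same, differs)  # True True
```

The unparenthesized form agrees with the explicit bracketing on every case, and disagrees with the wide-scope reading on at least one.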

Page 5: More Xkwic and Tgrep

Searching sequences with | and ?

"Bill" [pos = "NP"];

[pos = "NP"] [pos = "NP"] [pos = "NP"];

([pos = "NP"] [pos = "NP"]) | ([pos = "NP"] "of" [pos = "NP"]);

([pos = "NP"] "of"? [pos = "NP"]);

Note: First match applies

Page 6: More Xkwic and Tgrep

Corpus Position: wild cards and contexts

"give" []* "up";
"give" []{0,5} "up";
"give" []* "up" within 7;
"Clinton" expand to 5;
"Clinton" expand left to 5;
"Clinton" expand right to 5;
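To make the bounded wildcard concrete, here is a rough Python sketch of what a query like "give" []{0,5} "up" looks for: a first word followed by a second word with at most a fixed number of tokens in between. The function name and sample sentence are hypothetical, and returning only the shortest match per start position is an assumption about the matching strategy, not something taken from the slides:

```python
def find_with_gap(tokens, first, last, max_gap):
    """Return (start, end) index pairs where `first` is followed by `last`
    with at most `max_gap` tokens in between."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        # `last` may sit between 1 and max_gap + 1 positions after `first`.
        for j in range(i + 1, min(i + max_gap + 2, len(tokens))):
            if tokens[j] == last:
                hits.append((i, j))
                break  # keep only the shortest match from this start
    return hits

sentence = "they give it up and never give in or up".split()
print(find_with_gap(sentence, "give", "up", 5))  # [(1, 3), (6, 9)]
print(find_with_gap(sentence, "give", "up", 1))  # [(1, 3)]
```

Tightening the gap from 5 to 1 drops the second hit, where two tokens separate "give" and "up".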

Page 7: More Xkwic and Tgrep

Assignments and Intersect

Q1 = "rain";
Q2 = [pos="NN"];
intersect Q1 Q2;

Q1 = [pos = "JJ"] [pos = "NN"];
Q2 = "acid" "rain";
intersect Q1 Q2;

[word = "acid" & pos = "JJ"] [word = "rain" & pos = "NN"]

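One way to picture intersect is as a set intersection over match spans: a span survives only if both named queries matched exactly the same corpus positions. The Python sketch below works under that assumption, with a hypothetical toy corpus and a made-up helper `match_bigrams`, and compares the result with a single combined two-token query:

```python
# Toy corpus of (word, pos) pairs; hypothetical data for illustration only.
corpus = [("the", "DT"), ("acid", "JJ"), ("rain", "NN"), ("fell", "VBD"),
          ("heavy", "JJ"), ("rain", "NN"), ("acid", "NN"), ("rain", "NN")]

def match_bigrams(corpus, test_first, test_second):
    """Return the set of (start, end) spans of adjacent tokens passing both tests."""
    return {(i, i + 1)
            for i in range(len(corpus) - 1)
            if test_first(corpus[i]) and test_second(corpus[i + 1])}

# Q1 = [pos = "JJ"] [pos = "NN"];
q1 = match_bigrams(corpus, lambda t: t[1] == "JJ", lambda t: t[1] == "NN")
# Q2 = "acid" "rain";
q2 = match_bigrams(corpus, lambda t: t[0] == "acid", lambda t: t[0] == "rain")
# intersect Q1 Q2;
both = q1 & q2
# [word = "acid" & pos = "JJ"] [word = "rain" & pos = "NN"]
combined = match_bigrams(corpus,
                         lambda t: t == ("acid", "JJ"),
                         lambda t: t == ("rain", "NN"))
print(sorted(both), sorted(combined))  # [(1, 2)] [(1, 2)]
```

On this toy data the intersection and the combined query pick out the same single span.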
Page 8: More Xkwic and Tgrep

Structural restrictions

"give" []* "up" within s;

("gain" []* "profit") | ("profit" []* "gain") within 3 s;

("gain" []* "profit") | ("profit" []* "gain") within article;

"Clinton" expand left to 2 s;

Page 9: More Xkwic and Tgrep

Defining structural restrictions

Nounphrase = [pos = "DT"] [pos = "JJ"] [pos = "NN"];

Nounphrase;

[pos = "JJ"]

Go back to select

Page 10: More Xkwic and Tgrep

For fun

<s> [pos = "V.*"][pos = "PN.*"] </s>

<s> []* [pos = "V.*"][pos = "PN.*"] </s>

( [pos = "V.*"] [pos = "PN.*"]) within s

Not a question, not beginning of sentence…

Page 11: More Xkwic and Tgrep

less is more

less <filename>
cat ??/* | less

Switches

SPACE - next screenful
b - previous screenful
/<reg exp pattern> - search for pattern (e.g. /RNR)
?<reg exp pattern> - search backwards for pattern
q - quit

Page 12: More Xkwic and Tgrep

Searching for a word

tgrep Halloween

What happens? Why don’t you have to specify a file?

babel> grep tgrep .cshrc
# tgrep stuff
#setenv TGREP_CORPUS /corpora/treebank2/tbl_075/tgrepabl/brwn_cmb.crp
setenv TGREP_CORPUS /corpora/treebank2/tgrepabl/wsj_mrg.crp

Count results:

tgrep research | wc -l
cat ??/* | grep Halloween | wc -l

Page 13: More Xkwic and Tgrep

Tgrep Switches

-a Match on all patterns in a sentence
-w Return the whole sentence
-n Put the entire string on one line
-t Print only the terminals

Page 14: More Xkwic and Tgrep

Viewing it in sentential context

tgrep -wn Halloween | more

tgrep -wn research | more (20,865 hits)

Can also use less

Page 15: More Xkwic and Tgrep

Viewing it in sentential context

tgrep -wn research | more

Page 16: More Xkwic and Tgrep

Searching by POS

tgrep NNS | more

Another way to do your sanity check

Page 17: More Xkwic and Tgrep

See more data?

tgrep NNS | grep . | more

Page 18: More Xkwic and Tgrep

Sentential context (again)

tgrep -wn NNS | more

Page 19: More Xkwic and Tgrep

Searching by syntactic constituent

tgrep NP | more

Page 20: More Xkwic and Tgrep

Single-line outputs

tgrep -n NP | more

Page 21: More Xkwic and Tgrep

Viewing tree-like output

tgrep -w NP | head -20

Page 22: More Xkwic and Tgrep

Searching for relations between nodes

tgrep 'NP < CC' | head -16

Page 23: More Xkwic and Tgrep

tgrep –g (whole language)

A < B - A immediately dominates B
A > B - A is immediately dominated by B
A << B - A dominates B
A >> B - A is dominated by B
A . B - A immediately precedes B
A .. B - A precedes B
A <<, B - B is the leftmost descendant of A
A <<' B - B is the rightmost descendant of A
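The dominance relations can be made concrete with a small Python sketch over a hand-built tree. This is only an illustration of what the operators mean, not how tgrep is implemented: the nested-tuple tree encoding and all function names are assumptions, and the precedence relations (. and ..) are left out.

```python
# A tree is a nested tuple (label, child, child, ...); a bare string is a word.
def label(node):
    return node if isinstance(node, str) else node[0]

def children(node):
    return [] if isinstance(node, str) else list(node[1:])

def descendants(node):
    for child in children(node):
        yield child
        yield from descendants(child)

def immediately_dominates(a, b):   # A < B
    return any(label(c) == b for c in children(a))

def dominates(a, b):               # A << B
    return any(label(d) == b for d in descendants(a))

def leftmost_descendant(a, b):     # A <<, B
    node = a
    while children(node):
        node = children(node)[0]   # always follow the first child
        if label(node) == b:
            return True
    return False

def rightmost_descendant(a, b):    # A <<' B
    node = a
    while children(node):
        node = children(node)[-1]  # always follow the last child
        if label(node) == b:
            return True
    return False

# Hypothetical example sentence: "Clinton and Gore left"
tree = ("S",
        ("NP", ("NNP", "Clinton"), ("CC", "and"), ("NNP", "Gore")),
        ("VP", ("VBD", "left")))
np = tree[1]

print(immediately_dominates(np, "CC"))      # True
print(immediately_dominates(tree, "CC"))    # False
print(dominates(tree, "CC"))                # True
print(leftmost_descendant(tree, "Clinton")) # True
print(rightmost_descendant(tree, "Gore"))   # False
```

The reverse relations (A > B, A >> B) are the same checks with the two nodes swapped.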

Page 24: More Xkwic and Tgrep

Alternation

Node names can be ORed, e.g.

tgrep 'Clinton|Gore' | head

Page 25: More Xkwic and Tgrep

Character classes

Regular expressions

tgrep '/[Cc]hild/' | egrep . | head

Page 26: More Xkwic and Tgrep

Working towards that weird example…

tgrep '/[Pp]resident/' | head

Page 27: More Xkwic and Tgrep

Combining alternation and a regular expression

tgrep 'Clinton|Gore|/[Pp]resident/' | head
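As a rough analogue of mixing literal node names with a regular expression, the Python sketch below accepts a node name if it is exactly one of the literals or if the pattern occurs anywhere inside it. Treating the slash-delimited pattern as unanchored is an assumption about tgrep's behaviour, and the function name is made up:

```python
import re

LITERALS = {"Clinton", "Gore"}          # exact node names
PATTERN = re.compile(r"[Pp]resident")   # unanchored, like /[Pp]resident/

def node_matches(name):
    """True if `name` is one of the literals or contains the pattern."""
    return name in LITERALS or PATTERN.search(name) is not None

for name in ["Clinton", "Clintons", "President", "presidential", "Bush"]:
    print(name, node_matches(name))
```

Under these assumptions "Clintons" is rejected because the literal must match the whole name, while "presidential" is accepted because the pattern only has to occur somewhere in it.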

Page 28: More Xkwic and Tgrep

Searching for a transitive verb

tgrep -w 'VP << like < NP << DT' | more

Page 29: More Xkwic and Tgrep

Verbs + Particles

tgrep -w 'VP << kick' > kick

tgrep 'VP << /kick.*/ <2 PRT' kick

tgrep 'VP <1 VB <2 PRT' kick

tgrep -nw 'VP <1 /VB.*/ <2 PRT' kick

tgrep 'VP <1 (VB < kick) <2 PRT' kick

tgrep 'VP <1 (/VB.*/ < kick) <2 PRT' kick
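The last pattern, 'VP <1 (/VB.*/ < kick) <2 PRT', can be read as: a VP whose first child is a verb tag directly over the word "kick" and whose second child is a PRT. Here is a minimal Python sketch of that reading, using a hypothetical nested-tuple tree encoding and made-up example trees; it illustrates the pattern's meaning and is not tgrep's own matching code:

```python
import re

# A tree is a nested tuple (label, child, child, ...); a bare string is a word.
def label(node):
    return node if isinstance(node, str) else node[0]

def children(node):
    return [] if isinstance(node, str) else list(node[1:])

def nth_child(node, n):
    """Return the n-th child (1-based, as in <1 and <2), or None if absent."""
    kids = children(node)
    return kids[n - 1] if 1 <= n <= len(kids) else None

def kick_with_particle(vp):
    """Sketch of: VP <1 (/VB.*/ < kick) <2 PRT"""
    first, second = nth_child(vp, 1), nth_child(vp, 2)
    if label(vp) != "VP" or first is None or second is None:
        return False
    verb_tag_ok = re.search(r"VB.*", label(first)) is not None
    verb_is_kick = any(label(c) == "kick" for c in children(first))
    return verb_tag_ok and verb_is_kick and label(second) == "PRT"

with_particle = ("VP", ("VB", "kick"), ("PRT", ("RP", "off")),
                 ("NP", ("DT", "the"), ("NN", "season")))
inflected = ("VP", ("VBD", "kicked"), ("PRT", ("RP", "out")))
no_particle = ("VP", ("VB", "kick"), ("NP", ("DT", "the"), ("NN", "ball")))

print(kick_with_particle(with_particle))  # True
print(kick_with_particle(inflected))      # False
print(kick_with_particle(no_particle))    # False
```

The inflected example fails because the word is "kicked" rather than the literal "kick", which is why the earlier queries on the slide use /kick.*/ when any form of the verb is wanted.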