universal dependencies for inference

© 2017 Nuance Communications, Inc. All rights reserved.

Universal Dependencies for Inference Valeria de Paiva AI & NLU Lab, Sunnyvale (joint work with A. Rademaker and F. Chalub, IBM Research, Brazil) February, 2017

© 2017 Nuance Communications, Inc. All rights reserved. 2

Outline

•  The corpus SICK •  Universal Dependencies •  Processing SICK with

FreeLing & Parsey •  Analysis •  What’s Next?


•  Stands for Sentences Involved in Compositional Knowledge,

•  Result of 5-year European project COMPOSES(2011-2016) •  SICK data set consists of about 10,000 English sentence

pairs, from captions of images •  original sentences were normalized to remove unwanted

linguistic phenomena; sentences were expanded to obtain up to three new suitable for evaluation; all the sentences were paired to obtain the final data set.

•  Each sentence pair was annotated for (relatedness and) entailment by means of crowdsourcing techniques.

•  entailment annotation led to 5595 neutral pairs, 1424 contradiction pairs, and 2821 entailment pairs

•  All available from http://clic.cimec.unitn.it/composes/sick.html

The Corpus SICK


–  Simple, common sense like sentences: People are walking or One man in a big city is holding up a sign and begging for money.

–  After de-duplication of sentences we have 6076 sentences, with 512 unique verb lemmas, 269 unique adjectives, 148 unique adverbs and 1076 unique nouns.

–  Created from captions of pictures, SICK is about daily activities and entities; ones you can see in pictures taken by real people and which require common sense concepts: people, cats, dogs, running, climbing, eating, etc.

The SICK corpus


–  Full details, code and data included, can be found in our GitHub repository https://github.com/own-pt/rte-sick –  We run the sentences through FreeLing and use its tokenization to match the

UD dependencies, obtained using Parsey McFace. –  From the CoNLL representation of each sentence we obtain its bag-of-

concepts using the Princeton WordNet (PWN) mappings provided by FreeLing.

–  We use FreeLing’s Word Sense Disambiguation (WSD) -- based on Aguirre’s UKB state-of-the-art Personalized PageRank algo -- to pick a favorite sense.

–  We use PWN to SUMO mappings to obtain logical concepts.

Processing the sentences Parsey McFace, FreeLing


Available from http://nlp.cs.upc.edu/freeling/

FreeLing

–  Established project, led by Lluís Padró, UPC (Universitat Politècnica de Catalunya), since 2004. latest release FreeLing 4.0, May 2016

–  FreeLing is a C++ library providing language analysis functionalities: –  morphological analysis, named entity detection, PoS-tagging, parsing, Word Sense

Disambiguation, Semantic Role Labelling, etc. –  for a variety of languages: English, Spanish, Portuguese, Italian, French, German,

Russian, Catalan, Galician, Croatian, Slovene, etc. –  We have been using FreeLing since 2014

A multilingual, open source language analysis tool suite


From https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html

SyntaxNet

–  Project, led by Slav Petrov, Google Research, first release May 2016 –  SyntaxNet, is an open-source neural network framework implemented in TensorFlow

that provides a foundation for Natural Language Understanding (NLU) systems.

–  Parsey McParseface is an English parser trained by SyntaxNet and used to analyze English text, producing Stanford dependencies. Given a sentence as input, it tags each word with a part-of-speech (POS) tag that describes the word's syntactic function, and it determines the syntactic relationships between words in the sentence, represented in the dependency parse tree.

–  SyntaxNet applies neural networks to the ambiguity problem using beam search in the model.

A multilingual, open source language analysis tool suite, based on NNs


Globally Normalized Transition-Based Neural Networks is their paper

Parsey McParseface

–  Given some data from the Google supported Universal Dependencies project, you can train a parsing model on your own machine, they say.

–  On a standard benchmark (the 20 year old Penn Treebank), Parsey McParseface achieves 94% accuracy, beating their own previous state-of-the-art results. On Web text over 90% parse accuracy. (linguists trained for this task agree in 96-97% of the cases) –  Coupled this with Manning’s famous 97% accuracy on pos-tagging… –  But: Alice drove down the street in her car

The English face of SyntaxNet


“our work is still cut out for us: we would like to develop methods that can learn world knowledge and enable equal understanding of natural language across all languages and contexts”

Slav Petrov Research Engineer, Google Research


Processing of SICK

–  Want to do inference with SICK sentences –  Want sentences into 3 buckets:

–  Prop, Prop&Prop, notProp –  Not so easy to bucket them.. –  Need to find all negations, need to find all conjunctions

of propositions –  Examples: A woman is rock climbing , pausing and

calculating the route –  vs –  A man and a woman are walking through the woods

6076 sentences, but only 1814 unique lemmas, yay!


Processing of SICK

What the CoNLL representations look like? Similar to this

6076 sentences, but only 1814 unique lemmas, yay!


Processing of SICK

–  SICK sentences were “expanded” from a core set of sentences in visual corpora, via human constructed transformations such as adding passive or active voice, adding negations, adding adjectival modifiers, etc.

–  Idea was to simplify the linguistic structure, and to create comparisons of different linguistic phenomena.

–  They say: [caption corpora] contain sentences (as opposed to paragraphs) that describe the same picture or video and are thus near paraphrases, are lean on named entities and rich in generic terms.

–  They call this process “normalization”, they provide us with 11 normalization rules. –  Examples next slide

How did they construct their corpus?


Processing of SICK





How did they construct their corpus? (their LREC2014 paper)


Processing of SICK





How did they construct their corpus? More “normalization” rules


Processing of SICK 1.  There are still non-sentences like “A couple standing on the curb”, rule #10. 2.  There are 41 possessive dependencies (genitives like An animal is biting somebody 's hand) and 341 poss ones (for possessive pronouns) as in A man is holding a mask in *his* raised hand, rule #1. 3. There are still a few NE’s in the corpus (e.g “Seadoo”, a kind of jet ski) and ATVs (all terrain vehicles) rule #2 4. There are a few non “present continuous sentences” like (The speaker likes parrots) rule #3 5. Rule #4 ``Replace complex verb constructions into simpler ones” is difficult to judge. But examples similar to the one given (A man is attempting to surf down a hill made of sand è A man is surfing down a hill made of sand) happen in the corpus several times. We have 4 of ``trying to catch” and 37 xcomp dependency relations all together. 6. Rule #5 (Simplify verb phrases with modals and auxiliaries) is hard to check as modals are not marked in the dependencies, as far as I can see. 7. Rule #6 (Replace phrasal verbs with a synonym if verb and preposition are not adjacent) only helps a little. There are 324 particles, most of them corresponding to particle verbs that change meaning (e.g. A toddler is standing up ) gets one concept for standing and one for up. 8. Rule #7(Remove multiword expressions), we have 1148 noun-noun dependency relations and many are mwes. 9. Rule #8 (Remove dates and numbers) worked fairly well, we only have one temporal modifier (a sunny day) but 748 number dependencies and a need to tie this up to real numbers 10. Rule #9 (Turn subordinates into coordinates) also seems to have worked ok. 11. Rule #11(Remove indirect interrogative and parenthetical phrases). Well there are at least 4 real appositives, e.g Four middle eastern children , three girls and one boy , are climbing on the grotto with a pink interior.

The construction rules didn’t work too well…


Results of SICK

–  The curators of the corpus also made an effort to reduce the amount of “encyclopedic knowledge” about the world that is needed.

–  They say “To ensure the quality of the data set, all the sentences were checked for grammatical or lexical mistakes and disfluencies by a native English speaker.” –  Reasonably well, we expect? –  But the negations tell a different story. All (or almost all)

the expletives (539 of them) are negative, but negation is not marked. This wouldn’t be a problem, if these sentences made sense. Many do not.

How well did they do the job of simplifying the language?


Processing of SICK Meaning-preserving Transformations (from LREC2014 paper again)


Processing of SICK Meaning-altering Transformations


Results of SICK

According to stats we have 436 neg dependency relations. They really seem to correspond to negation using the adverb “not”. Clearly we are missing many negations (most if not all the 539 expletives are negative), but the determiners ‘no’ are not marked as negative. This we can correct semi-automatically. But there are many atrocious sentences that do not make sense at all: A person is ignoring the motocross bike that is lying on its side and there is no one is racing by; (maybe the third ``is” is a typo?) or There is no man wearing clothes that are covered with paint or is sitting outside in a busy area writing something. How can we measure how many bad sentences there are? We decided to investigate a reduced sub-corpus.

Negation in SICK-PARSEY


The image cannot be displayed. Your computer may not have enough memory to open the image, or the image may have been corrupted. Restart your computer, and then open the file again. If the red x still appears, you may have to delete the image and then insert it again.

3-4-5: a hand-checked sub-corpus

–  Only one conjunction: Paper and scissors both cut –  13 copulas (4 wrong of them are wrong: The man is training, The man is rock climbing, A woman is grating carrots, The woman is dicing garlic ) –  19 negations marked, but 26 neg expletives, 10 nobodies, 1

no person è should be 56 negations, needs work, we know. –  Compounds? Have 20 nns, some are real compounds: –  ping pong, golden retriever, sumo wrestlers, tiger cub, sumo

ringers, baby pandas, cartoon airplane –  Some are processing mistakes, gerund for the noun, lots (14

in 390) hamster singing, lion walking, panda climbing, etc. –  Also 9 cases of particle verbs (need to update PWN?)

390 sentences (385 nsubj+ 5 nsubjpass)

Problems pale in comparison to WSD


–  Mapping has many issues –  QAD (quick and dirty, Tetrault&Lee) –  Rewriting of CoNLL is necessary for

most sentences, future work…

Princeton WN to SUMO mapping

–  SUMO is intended as a foundation ontology for a variety of information processing systems.

–  First released in December 2000, it uses a version of the language SUO-KIF (first-order logic with bells&whistles) in a LISP-like syntax.

–  A mapping from WordNet synsets to SUMO which makes it special.

–  Many domain ontologies to extend it –  “the largest formal public ontology in

existence today.”

Why SUMO? Language is important

SUMO is open source, upper ontology


Princeton WN-SUMO mapping

text = People are walking 1 People people NOUN NNS _ 3 nsubj _NNS|07942152-n GroupOfPeople= 2 are be VERB VBP _ 3 aux _ VBP|02604760-v|Entity+ 3 walking walk VERB VBG _ 0 ROOT _ VBG|01904930-v|Walking= Kinds of mapping: 1.  Equality: the synset corresponds exactly to the concept (as in Walking= above) 2.  Containment: Concept+, the synset is a subconcept of the SUMO concept 3.  Negation: Concept[ (the negation of the concept) silent= not communicating 4.  A-box: shouldn’t be happening here, but it does. ok for Asian, German?

Why SUMO?

SUMO’s mappings: the theory


Examples

text = Men are sawing 1 Men man NOUN NNS _ 3 nsubj _ NNS|02472293-n|Hominid= 2 are be VERB VBP _ 3 aux _ VBP|02604760-v|Entity+ 3  sawing saw VERB VBG _ 0 ROOT _ VBG|01559590-v|Cutting+

“saw” is a very specific verb and has only one synset in PWN: 01559590-v saw | (cut with a saw; "saw wood for the fireplace") But SUMO has no exact match for this concept, so it maps to Cutting+, a kind of Cutting, the one done with a “saw”. Much worse is the situation with “Man” that occurs in 33 PWN synsets, the WSD alg decided for Hominid=, but Man= (equivalent mapping) would be much better.

Containment


Examples

text = Some people are silent 1 Some some DET DT _ 2 det _ DT|?|? 2 people people NOUN NNS _ 4 nsubj _NNS|07942152-nGroupOfPeople= 3 are be VERB VBP _ 4 cop _ VBP|02604760-v|Entity+ 4 silent silent ADJ JJ _ 0 ROOT _ JJ|00942163-a|LinguisticCommunication[

“silent” is an adjective in 6 synsets, mapped to LinguisticCommunication[, which is to be read as non-LinguisticCommunication. This seems the wrong WSD, but the correct one would give us Communication[ (negated subsuming mapping).

Negation


Examples

1 An a DET DT _ 3 det _ DT|?|? 2 American american ADJ JJ _ 3 amod _ JJ|02927303-a|LandArea@ 3 footballer footballer NOUN NN _ 5 nsubj _ NN|10101634-n|ProfessionalAthlete+ 4 is be VERB VBZ _ 5 aux _ VBZ|02604760-v|Entity+ 5 wearing wear VERB VBG _ 0 ROOT _ VBG|00075021-v|RecreationOrExercise+ 6 the the DET DT _ 10 det _ DT|?|? 7 red red ADJ JJ _ 10 amod _ JJ|00381097-a|Red= 8 and and CONJ CC _ 7 cc _ CC|?|? 9 white white ADJ JJ _ 7 conj _ JJ|00393105-a|White= 10 strip strip NOUN NN _ 5 dobj _ NN|09449510-n|SelfConnectedObject+ The mapping of ‘American’ to related to the continent America is a reasonable hook for future work

A-box


Analysis The numbers

Total SICK mappings: 38671 ? Most need to be discarded 29606 + most need to be made specific 16477 = 171 @ 67 [ Need to deal with negated, a-box concepts Need more bodily process (sleeping..), Need more baby-zoology (tiger, kitten not the same) But the killer is WSD


Conclusion QAD with FreeLing WSD does NOT work

Could repeat Petrov’s words. Could make a more careful corpus, checking its processing. Could rethink WSD. Have done so and included all the PWN senses, for the time being. It’s less wrong!! Considering how to make sure that ontologies pay for their upkeeping, without throwing away our Neural Networks gains.


Future Work Enhancing dependencies with friends

Want to use what we’ve learnt during the summer with the presentations of Schuster, Lewis, Reddy and Stanovsky.They all had rules that applied to dependencies to make them more `semantic’. We also have the AKR rules and the matcher rules to use and improve. At the moment, working on Stanovsky’s (and Fauke’s rules, they work for EN and DE) with Prateek Jain. Also working on the Entailment pairs of SICK. (note that simply concepts can give useful info2)


CODA We are the UD-Portuguese now!

There were two totally automatic treebanks in Portuguese in UDs. Zeman’s version started from Linguateca’s Bosque and did conversions. Google’s UD-BR used web data and no manual input. Started re-annotating Bosque in Oct 2016. We’re now the official UD-PT, as we automatically converted and manually checked (or are checking) Bosque. Paper submitted.

universal dependencies for inference

Education