Split and Rephrase: Better Evaluation and a Stronger Baseline
Roee Aharoni and Yoav Goldberg
NLP Lab, Bar Ilan University, Israel
ACL 2018
Motivation
• Processing long, complex sentences is hard!
• Children, people with reading disabilities, L2 learners…
• Sentence level NLP systems:
• Dependency Parsers (McDonald & Nivre, 2011)
• Neural Machine Translation (Koehn & Knowles, 2017)
• Can we automatically break a complex sentence into several simple ones while preserving its meaning?
The Split and Rephrase Task
• Narayan, Gardent, Cohen & Shimorina, EMNLP 2017
• Dataset, evaluation method, baseline models
• Task definition: complex sentence -> several simple sentences with the same meaning
• Requires (a) identifying independent semantic units and (b) rephrasing those units into single sentences
Input (complex sentence): Alan Bean joined NASA in 1963 where he became a member of the Apollo 12 mission along with Alfred Worden as back up pilot and David Scott as commander .
Output (simple sentences): Alan Bean served as a crew member of Apollo 12 . Alfred Worden was the backup pilot of Apollo 12 . Apollo 12 was commanded by David Scott . Alan Bean was selected by Nasa in 1963 .
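As a rough illustration of the task definition, one instance can be pictured as a record holding a complex sentence and its simple rewrites. The sketch below is only a way to hold the slide's example in code; the names `SplitRephraseExample` and `split_on_period_token` are hypothetical and do not come from the talk or the dataset tooling.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SplitRephraseExample:
    """One task instance: a complex sentence and its simple rewrites."""
    complex_sentence: str
    simple_sentences: Tuple[str, ...]


def split_on_period_token(text: str) -> Tuple[str, ...]:
    """Split whitespace-tokenized text after every standalone '.' token."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token == ".":
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without a final period token
        sentences.append(" ".join(current))
    return tuple(sentences)


# The example pair shown on the slide, kept in its tokenized form.
example = SplitRephraseExample(
    complex_sentence=(
        "Alan Bean joined NASA in 1963 where he became a member of the "
        "Apollo 12 mission along with Alfred Worden as back up pilot and "
        "David Scott as commander ."
    ),
    simple_sentences=split_on_period_token(
        "Alan Bean served as a crew member of Apollo 12 . "
        "Alfred Worden was the backup pilot of Apollo 12 . "
        "Apollo 12 was commanded by David Scott . "
        "Alan Bean was selected by Nasa in 1963 ."
    ),
)
```

Splitting on the standalone period token only works because the slide's text is already tokenized; raw text would need a proper sentence splitter.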
This Work
• We show that simple neural models seem to perform very well on the original benchmark due to memorization of the training set
• We propose a more challenging data split for the task to discourage memorization
• We perform automatic evaluation and error analysis on the new benchmark, showing that the task is still far from being solved
WebSplit Dataset Construction (Narayan et al. 2017)
Simple RDF Triples (facts from DBpedia)
<Alan_Bean | nationality | United_States>
<Alan_Bean | mission | Apollo_12>
<Alan_Bean | NASA selection | 1963>
Simple Sentences
Alan Bean is a US national.
Alan Bean was on the crew of Apollo 12.
Alan Bean was hired by NASA in 1963.
Sets of RDF triples
<Alan_Bean | nationality | United_States, Alan_Bean | mission | Apollo_12, Alan_Bean | NASA selection | 1963>
Complex Sentences
Alan Bean, born in the United States, was selected by NASA in 1963 and served as a crew member of Apollo 12.
Matching via RDFs: ~1M examples
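The matching step can be sketched roughly as follows. This is a simplified illustration of pairing by shared RDF triples, not the actual WebSplit pipeline of Narayan et al.: it assumes each complex sentence is keyed by its full triple set and each simple sentence by a single triple, and the function name `match_via_rdf` is hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]


def match_via_rdf(
    complex_by_triples: Dict[FrozenSet[Triple], List[str]],
    simple_by_triple: Dict[Triple, List[str]],
) -> List[Tuple[str, List[str]]]:
    """Pair each complex sentence with one simple sentence per triple."""
    pairs = []
    for triples, complex_sentences in complex_by_triples.items():
        ordered = sorted(triples)
        # Skip triple sets that cannot be fully covered by simple sentences.
        if any(t not in simple_by_triple for t in ordered):
            continue
        options = [simple_by_triple[t] for t in ordered]
        for complex_sentence in complex_sentences:
            # Every combination of verbalizations yields one example.
            for choice in product(*options):
                pairs.append((complex_sentence, list(choice)))
    return pairs


nationality = ("Alan_Bean", "nationality", "United_States")
mission = ("Alan_Bean", "mission", "Apollo_12")
selection = ("Alan_Bean", "NASA selection", "1963")

simple_by_triple = {
    nationality: ["Alan Bean is a US national."],
    mission: ["Alan Bean was on the crew of Apollo 12."],
    selection: ["Alan Bean was hired by NASA in 1963."],
}
complex_by_triples = {
    frozenset([nationality, mission, selection]): [
        "Alan Bean, born in the United States, was selected by NASA in "
        "1963 and served as a crew member of Apollo 12."
    ],
}

pairs = match_via_rdf(complex_by_triples, simple_by_triple)
```

With several verbalizations per triple, the number of combinations grows multiplicatively, so a modest set of triples can produce many training pairs.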
Preliminary Experiments
• ~1M training examples
• “Vanilla” LSTM seq2seq with attention
• Shared vocabulary between the encoder and the decoder
• Simple sentences predicted as a single sequence
• Evaluated using single-sentence, multi-reference BLEU as in Narayan et al. 2017
[Figure: seq2seq diagram; input "complex sentence", output "simple 1 simple 2 simple 3"]
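Two of the setup choices above (a shared vocabulary and a single target sequence) can be sketched in a few lines. This is a minimal illustration under assumed preprocessing, not the authors' code: whitespace tokenization, the `<unk>` symbol, and the helper names are all assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def to_single_sequence(simple_sentences: Iterable[str]) -> str:
    """Concatenate the simple sentences into one target string."""
    return " ".join(s.strip() for s in simple_sentences)


def build_shared_vocab(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Build one token-to-id map used for both source and target text."""
    vocab = {"<unk>": 0}
    for source, target in pairs:
        for token in source.split() + target.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids, falling back to <unk> for unseen tokens."""
    return [vocab.get(token, vocab["<unk>"]) for token in text.split()]


source = (
    "Alan Bean, born in the United States, was selected by NASA in 1963 "
    "and served as a crew member of Apollo 12."
)
target = to_single_sequence([
    "Alan Bean is a US national.",
    "Alan Bean was on the crew of Apollo 12.",
    "Alan Bean was hired by NASA in 1963.",
])
vocab = build_shared_vocab([(source, target)])
source_ids = encode(source, vocab)
target_ids = encode(target, vocab)
```

Because both sides share one map, a token such as "NASA" receives the same id in the input and in the output.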
Preliminary Results
• Our simple seq2seq baseline outperforms all but one of the baselines from Narayan et al. 2017
• Their best baselines used the RDF structures as additional information
• Does the simple seq2seq model really perform so well?
[Figure: bar chart, axis 0 to 80, comparing seq2seq (ours), hybrid, seq2seq, multi-seq2seq, split-multi and split-seq2seq; models grouped as Text Only vs. Text + RDFs]
BLEU can be Misleading
• In spite of the high BLEU scores, our neural models suffer from:
• Missing facts - appeared in the input but not in the output
• Unsupported facts - appeared in the output but not in the input
• Repeated facts - appeared several times in the output
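The three error types can be stated as simple set and count operations. The sketch below assumes the facts in the input and output have already been identified and written as triples; the slides do not say this is how the analysis was done, and the `occupation` fact is invented purely for illustration.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Fact = Tuple[str, str, str]


def fact_errors(
    input_facts: Iterable[Fact], output_facts: Iterable[Fact]
) -> Tuple[Set[Fact], Set[Fact], Set[Fact]]:
    """Return (missing, unsupported, repeated) facts for one prediction."""
    input_set = set(input_facts)
    output_counts = Counter(output_facts)
    output_set = set(output_counts)
    missing = input_set - output_set        # in the input, not in the output
    unsupported = output_set - input_set    # in the output, not in the input
    repeated = {f for f, n in output_counts.items() if n > 1}
    return missing, unsupported, repeated


nationality = ("Alan_Bean", "nationality", "United_States")
mission = ("Alan_Bean", "mission", "Apollo_12")
selection = ("Alan_Bean", "NASA selection", "1963")
invented = ("Alan_Bean", "occupation", "Test_pilot")  # toy unsupported fact

missing, unsupported, repeated = fact_errors(
    input_facts=[nationality, mission, selection],
    output_facts=[mission, mission, invented],
)
```

An output like this one can still share many n-grams with the references, which is one way a high BLEU score can coexist with all three error types.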
A Closer Look
• Visualizing the attention weights we find an unexpected pattern
• The network mainly attends to a single token instead of spreading the attention
• This token was usually a part of the first mentioned entity
• Consistent among different input examples
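One way to quantify this pattern, beyond inspecting heat maps, is to measure how much of the total attention mass lands on the most-attended source token. The sketch below is an assumed diagnostic, not the authors' analysis, and the attention matrix is a made-up toy, not model output.

```python
from typing import List, Sequence, Tuple


def dominant_source_token(
    attention: Sequence[Sequence[float]], source_tokens: Sequence[str]
) -> Tuple[str, float]:
    """Return the most-attended source token and its share of all attention.

    attention[t][s] is the weight that output step t puts on source token s.
    """
    totals: List[float] = [0.0] * len(source_tokens)
    for row in attention:
        for s, weight in enumerate(row):
            totals[s] += weight
    best = max(range(len(totals)), key=totals.__getitem__)
    return source_tokens[best], totals[best] / sum(totals)


# Toy weights: three output steps over three source tokens.
tokens = ["Alan", "Bean", "joined"]
toy_attention = [
    [0.9, 0.05, 0.05],
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
]
token, share = dominant_source_token(toy_attention, tokens)
```

A share close to 1 means nearly every decoding step looks at the same source token; a share near 1/len(source_tokens) means the attention is spread evenly.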
Testing for Over-Memorization
• At this stage we suspect that the network heavily memorizes entity-fact pairs
• We test this by presenting it with inputs consisting of repeated entities alone
• The network indeed generates facts it memorized about those specific entities
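The probe described above can be sketched as a small test harness. Everything here is illustrative: `toy_model` is a stand-in that returns a canned string rather than a trained network, and the helper names and repeat count are assumptions.

```python
from typing import Iterable, List


def make_entity_probe(entity: str, repeats: int = 5) -> str:
    """Build an input made only of the entity name, repeated."""
    return " ".join([entity] * repeats)


def memorized_facts(output: str, training_simple: Iterable[str]) -> List[str]:
    """List the training-set simple sentences that appear in the output."""
    return [sentence for sentence in training_simple if sentence in output]


def toy_model(source: str) -> str:
    # Stand-in for a trained seq2seq model; returns a fixed string so the
    # example runs without any learned weights.
    return "Alan Bean is a US national. Alan Bean was on the crew of Apollo 12."


training_simple = [
    "Alan Bean is a US national.",
    "Alan Bean was on the crew of Apollo 12.",
    "Alan Bean was hired by NASA in 1963.",
]
probe = make_entity_probe("Alan Bean", repeats=3)
found = memorized_facts(toy_model(probe), training_simple)
```

Since the probe contains no facts at all, any training sentence reproduced in the output must come from what the model stored about the entity.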
![Page 55: Roee Aharoni and Yoav Goldberg - ACL Member Portal › anthology › attachments › P18-2114.Presentation.pdfThe Split and Rephrase Task • Narayan, Gardent, Cohen & Shimorina, EMNLP](https://reader033.vdocuments.site/reader033/viewer/2022042400/5f0f42ad7e708231d4434796/html5/thumbnails/55.jpg)
Searching for the Cause: Dataset Artifacts
![Page 56: Roee Aharoni and Yoav Goldberg - ACL Member Portal › anthology › attachments › P18-2114.Presentation.pdfThe Split and Rephrase Task • Narayan, Gardent, Cohen & Shimorina, EMNLP](https://reader033.vdocuments.site/reader033/viewer/2022042400/5f0f42ad7e708231d4434796/html5/thumbnails/56.jpg)
Searching for the Cause: Dataset Artifacts• The original dataset included overlap between the training/development/test sets
![Page 57: Roee Aharoni and Yoav Goldberg - ACL Member Portal › anthology › attachments › P18-2114.Presentation.pdfThe Split and Rephrase Task • Narayan, Gardent, Cohen & Shimorina, EMNLP](https://reader033.vdocuments.site/reader033/viewer/2022042400/5f0f42ad7e708231d4434796/html5/thumbnails/57.jpg)
Searching for the Cause: Dataset Artifacts• The original dataset included overlap between the training/development/test sets
•When looking at the complex sentences side, there is no overlap
Train Complex
Dev Complex
Test Complex
source
![Page 58: Roee Aharoni and Yoav Goldberg - ACL Member Portal › anthology › attachments › P18-2114.Presentation.pdfThe Split and Rephrase Task • Narayan, Gardent, Cohen & Shimorina, EMNLP](https://reader033.vdocuments.site/reader033/viewer/2022042400/5f0f42ad7e708231d4434796/html5/thumbnails/58.jpg)
Searching for the Cause: Dataset Artifacts• The original dataset included overlap between the training/development/test sets
•When looking at the complex sentences side, there is no overlap
•On the other hand, most of the simple sentences did overlap (~90%)
Train Complex
Dev Complex
Test Complex
source Train Simple
Dev Simple
Test Simple
target
![Page 59: Roee Aharoni and Yoav Goldberg - ACL Member Portal › anthology › attachments › P18-2114.Presentation.pdfThe Split and Rephrase Task • Narayan, Gardent, Cohen & Shimorina, EMNLP](https://reader033.vdocuments.site/reader033/viewer/2022042400/5f0f42ad7e708231d4434796/html5/thumbnails/59.jpg)
Searching for the Cause: Dataset Artifacts
• The original dataset included overlap between the training/development/test sets
• When looking at the complex sentences side, there is no overlap
• On the other hand, most of the simple sentences did overlap (~90%)
• Makes memorization very effective - “leakage” from train on the target side
[Figure: source side (Train Complex, Dev Complex, Test Complex) and target side (Train Simple, Dev Simple, Test Simple)]
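The overlap described on this slide can be measured with a few lines of code. The following is a minimal sketch, not the authors' script: the function name `overlap_rate` and the toy sentences are invented for illustration, and sentences are compared by exact string match after stripping whitespace.

```python
def overlap_rate(train_sentences, heldout_sentences):
    """Fraction of unique held-out sentences that also appear in train."""
    train_set = {s.strip() for s in train_sentences}
    heldout_set = {s.strip() for s in heldout_sentences}
    if not heldout_set:
        return 0.0
    return len(heldout_set & train_set) / len(heldout_set)


# Toy data, invented for illustration only.
train_complex = ["A , which is in B , hosts C .", "E , born in D , plays for F ."]
dev_complex = ["G , who wrote H , is in B ."]
train_simple = ["A is in B .", "C was born in D .", "E plays for F ."]
dev_simple = ["A is in B .", "C was born in D .", "G wrote H ."]

# Source side: no dev sentence was seen in training.
source_overlap = overlap_rate(train_complex, dev_complex)
# Target side: two of the three dev sentences were seen in training.
target_overlap = overlap_rate(train_simple, dev_simple)
print(source_overlap, target_overlap)
```

Running the same check on the source side and the target side separately is what exposes the asymmetry: a split can look clean on its inputs while leaking on its outputs.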
![Page 65: Roee Aharoni and Yoav Goldberg - ACL Member Portal › anthology › attachments › P18-2114.Presentation.pdfThe Split and Rephrase Task • Narayan, Gardent, Cohen & Shimorina, EMNLP](https://reader033.vdocuments.site/reader033/viewer/2022042400/5f0f42ad7e708231d4434796/html5/thumbnails/65.jpg)
New Data Split
• To remedy this, we construct a new data split by using the RDF information:
  • Ensuring that all RDF relation types appear in the training set (enable generalization)
  • Ensuring that no RDF triple (fact) appears in two different sets (reduce memorization)
• The resulting dataset has no overlapping simple sentences
• Has more unknown symbols in dev/test - need better models!

| | Original Split | New Split |
|---|---|---|
| unique dev simple sentences in train | 90.9% | 0.09% |
| unique test simple sentences in train | 89.8% | 0% |
| % dev vocabulary in train | 97.2% | 63% |
| % test vocabulary in train | 96.3% | 61.7% |
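The two RDF constraints listed on this slide can be checked mechanically once each example is annotated with its triples. The sketch below only verifies a given split; it does not reproduce how the authors built theirs. The function name `check_split`, the data layout (one set of `(subject, relation, object)` tuples per example), and the toy triples are all assumptions made for illustration.

```python
def check_split(train, dev, test):
    """Return (shared_triples, unseen_relations) for a three-way split.

    Each split is a list of examples, and each example is a set of
    (subject, relation, object) tuples. Both returned sets are empty
    when the split satisfies the two constraints.
    """
    def triples(split):
        return {t for example in split for t in example}

    def relations(split):
        return {rel for (_, rel, _) in triples(split)}

    train_t, dev_t, test_t = triples(train), triples(dev), triples(test)
    # Constraint: no triple (fact) appears in two different sets.
    shared = (train_t & dev_t) | (train_t & test_t) | (dev_t & test_t)
    # Constraint: every relation type in dev/test is also seen in train.
    unseen = (relations(dev) | relations(test)) - relations(train)
    return shared, unseen


# Toy triples, invented for illustration only.
train = [{("A", "birthPlace", "B"), ("A", "occupation", "C")}]
dev = [{("D", "birthPlace", "E")}]
test = [{("F", "occupation", "G")}]
leaky_dev = [{("A", "birthPlace", "B")}]

clean_result = check_split(train, dev, test)
leaky_result = check_split(train, leaky_dev, test)
print(clean_result)
print(leaky_result)
```

In this toy case the clean split returns two empty sets, while the leaky one reports the fact shared between train and dev.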
![Page 70: Roee Aharoni and Yoav Goldberg - ACL Member Portal › anthology › attachments › P18-2114.Presentation.pdfThe Split and Rephrase Task • Narayan, Gardent, Cohen & Shimorina, EMNLP](https://reader033.vdocuments.site/reader033/viewer/2022042400/5f0f42ad7e708231d4434796/html5/thumbnails/70.jpg)
Copy Mechanism
• To help with the increase in unknown words in the harder split, we incorporate a copy mechanism
• Gu et al. 2016, See et al. 2017, Merity et al. 2017
• Uses a “copy switch” - feed-forward NN component with a sigmoid-activated scalar output
• Controls the interpolation of the softmax probabilities and the copy probabilities over the input tokens in each decoder step
[Figure: diagram labeled “copy switch”, “1 - copy switch”, “attention weights (copy)” and “softmax output”]
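The interpolation at a single decoder step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the copy switch weights the attention (copy) distribution and `1 - switch` weights the softmax output, that every source token has an id in the output vocabulary, and that the feed-forward component has already produced a scalar `switch_logit`. The names `mix_with_copy` and `switch_logit` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mix_with_copy(p_softmax, attention, source_ids, switch_logit):
    """Interpolate the softmax output with copy probabilities.

    p_softmax:    probabilities over the output vocabulary (sums to 1)
    attention:    attention weights over the source tokens (sums to 1)
    source_ids:   output-vocabulary id of each source token (assumed to exist)
    switch_logit: scalar pre-activation of the copy switch
    """
    switch = sigmoid(switch_logit)
    # Generation path, weighted by (1 - copy switch).
    mixed = [(1.0 - switch) * p for p in p_softmax]
    # Copy path: add each source token's attention mass, weighted by the switch.
    for weight, token_id in zip(attention, source_ids):
        mixed[token_id] += switch * weight
    return mixed


# Toy step: vocabulary of 4 tokens, 2 source tokens with ids 3 and 1.
mixed = mix_with_copy(
    p_softmax=[0.7, 0.2, 0.1, 0.0],
    attention=[0.5, 0.5],
    source_ids=[3, 1],
    switch_logit=0.0,  # sigmoid(0) = 0.5, an even mix
)
print(mixed)
```

Token 3 has zero probability under the softmax alone, yet receives mass through the copy path, which is how such a mechanism can emit words that are rare or unseen in training.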
![Page 74: Roee Aharoni and Yoav Goldberg - ACL Member Portal › anthology › attachments › P18-2114.Presentation.pdfThe Split and Rephrase Task • Narayan, Gardent, Cohen & Shimorina, EMNLP](https://reader033.vdocuments.site/reader033/viewer/2022042400/5f0f42ad7e708231d4434796/html5/thumbnails/74.jpg)
Results - New Split
• Baseline seq2seq models completely break (BLEU < 7) on the new split
• Copy mechanism helps to generalize
• Much lower than the original benchmark - memorization was crucial for the high BLEU
[Figure: bar chart of BLEU (axis 0 to 90) for the seq2seq and +copy models on the original split and the new split]
![Page 76: Roee Aharoni and Yoav Goldberg - ACL Member Portal › anthology › attachments › P18-2114.Presentation.pdfThe Split and Rephrase Task • Narayan, Gardent, Cohen & Shimorina, EMNLP](https://reader033.vdocuments.site/reader033/viewer/2022042400/5f0f42ad7e708231d4434796/html5/thumbnails/76.jpg)
Copying and Attention
[Figure: attention visualizations in two panels, No-Copy and With-Copy]
The copy-enhanced models spread the attention across the input tokens while improving results
![Page 80: Roee Aharoni and Yoav Goldberg - ACL Member Portal › anthology › attachments › P18-2114.Presentation.pdfThe Split and Rephrase Task • Narayan, Gardent, Cohen & Shimorina, EMNLP](https://reader033.vdocuments.site/reader033/viewer/2022042400/5f0f42ad7e708231d4434796/html5/thumbnails/80.jpg)
Error Analysis
• On the original split the models did very well (due to memorization), with up to 91% correct simple sentences
• On the new benchmark the best model got only up to 20% correct simple sentences
• The task is much more challenging than previously demonstrated
[Figure: bar chart (axis 0 to 50) of simple sentences labeled correct, repeated, missing or unsupported, on the original split and the new split]
![Page 85: Roee Aharoni and Yoav Goldberg - ACL Member Portal › anthology › attachments › P18-2114.Presentation.pdfThe Split and Rephrase Task • Narayan, Gardent, Cohen & Shimorina, EMNLP](https://reader033.vdocuments.site/reader033/viewer/2022042400/5f0f42ad7e708231d4434796/html5/thumbnails/85.jpg)
Conclusions
• Simple neural models seem to perform well due to memorization
• We propose a more challenging data split for the task to discourage this
• A similar update was proposed by Narayan et al. in parallel to our work (WebSplit v1.0)
• We perform automatic evaluation and error analysis on the new benchmarks, showing that the task is still far from being solved
![Page 94: Roee Aharoni and Yoav Goldberg - ACL Member Portal › anthology › attachments › P18-2114.Presentation.pdfThe Split and Rephrase Task • Narayan, Gardent, Cohen & Shimorina, EMNLP](https://reader033.vdocuments.site/reader033/viewer/2022042400/5f0f42ad7e708231d4434796/html5/thumbnails/94.jpg)
More Broadly
• Creating datasets is hard!
• Think how models can “cheat”
• Create a challenging evaluation environment to capture generalization
• Look for leakage of train to dev/test
• Numbers can be misleading!
• Look at the data
• Look at the model
• Error analysis
![Page 95: Roee Aharoni and Yoav Goldberg - ACL Member Portal › anthology › attachments › P18-2114.Presentation.pdfThe Split and Rephrase Task • Narayan, Gardent, Cohen & Shimorina, EMNLP](https://reader033.vdocuments.site/reader033/viewer/2022042400/5f0f42ad7e708231d4434796/html5/thumbnails/95.jpg)
Thank You!