updating a name tagger using contemporary unlabeled data
DESCRIPTION
Presentation at ACL-IJCNLP 2009 of Cristina Mota & Ralph Grishman (2009a). “Updating a name tagger using contemporary unlabeled data.” Proc. of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, August 2009, Singapore.
TRANSCRIPT
![Page 1: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/1.jpg)
Updating a Name Tagger Using Contemporary Unlabeled Data
ACL-IJCNLP 2009
Singapore, August 3rd - 5th
Cristina Mota (1,2) and Ralph Grishman (2)
(1) IST & L2F INESC-ID (Portugal)
(2) New York University (USA)
(Advisors: Ralph Grishman & Nuno Mamede)
This research was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000)
![Page 2: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/2.jpg)
Motivation
[Figure: F-measure (0.79 to 0.85) against time gap in years (0 to 7), with fitted line y = −0.00391x + 0.82479, R² = 0.3647]
The performance of a co-trained named entity tagger decreases as the time gap between training and test sets increases (Mota & Grishman, 2008)
Do we need to update the seeds or the unlabeled data?
Does more older data help?
![Page 4: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/4.jpg)
Related Work
“More data are better data” (Church & Mercer, 1993)
Enlarge labeled data as a way of improving performance
Contemporary (labeled) data reduces out-of-vocabulary rates
Time-adaptive language model (Auzanne et al., 2000)
Generation of offline name lists (Palmer & Ostendorf, 2005)
Daily adaptation of the language model of a broadcast news transcription system (Martins et al., 2006)
![Page 5: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/5.jpg)
Data Sets
Data sets were drawn from the Politics section of the CETEMPúblico corpus (Santos & Rocha, 2001)
Language: Portuguese
Time span: 8 years (1991-1998)
Time gap: 1 unit = 6 months
For each six-month period:
Seeds (S): names collected from the first 192 extracts∗
Test data (T): next 208 extracts
Unlabeled data (U): next 7856 extracts
∗1 extract = approx. 2 paragraphs
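The per-period split described above can be sketched as consecutive slices over one period's extracts. This is only an illustration: `split_period`, `PeriodSplit` and the placeholder extract ids are hypothetical names, and how seed names are then collected from the first block is not shown.

```python
from typing import List, NamedTuple, Sequence

# Extract counts per six-month period, as listed on the slide.
N_SEED, N_TEST, N_UNLABELED = 192, 208, 7856


class PeriodSplit(NamedTuple):
    seed_source: List[str]  # extracts from which seed names are collected
    test: List[str]
    unlabeled: List[str]


def split_period(extracts: Sequence[str]) -> PeriodSplit:
    """Slice one period's extracts into consecutive seed/test/unlabeled blocks."""
    needed = N_SEED + N_TEST + N_UNLABELED
    if len(extracts) < needed:
        raise ValueError(f"need at least {needed} extracts, got {len(extracts)}")
    seed_end = N_SEED
    test_end = seed_end + N_TEST
    return PeriodSplit(
        seed_source=list(extracts[:seed_end]),
        test=list(extracts[seed_end:test_end]),
        unlabeled=list(extracts[test_end:needed]),
    )


# Usage with placeholder extract ids (not real corpus data).
split = split_period([f"extract-{k}" for k in range(8256)])
print(len(split.seed_source), len(split.test), len(split.unlabeled))
```

Keeping the three blocks disjoint means the test extracts never appear in the unlabeled training material of the same period.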
![Page 7: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/7.jpg)
Named Entity Tagger
[Figure: tagger architecture. Training: seeds and unlabeled text; identification yields pairs (spelling features, contextual features); co-training yields spelling + contextual rules. Testing: test text; identification; classification; labeled pairs; propagation; text with classified NE]
Based on a co-training classifier (Collins & Singer, 1999)
Includes propagation step
Needs few seeds and performance is high (above 80%)
Performance is parametrized by the combination of seeds, unlabeled set and test set: (S,U,T)
Tagger is evaluated after propagation with the HAREM scoring programs
![Page 8: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/8.jpg)
Update seeds or unlabeled data?
[Figure: timeline from 91a to 98b showing seeds Si, unlabeled examples Ui and test set Tn]
Experiment 1: Baseline (vary seeds and unlabeled data synchronously as in Mota & Grishman (2008))
![Page 16: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/16.jpg)
Update seeds or unlabeled data?
[Figure: F-measure (0.74 to 0.84) by training epoch (91a to 98b) for three (S,U,T) configurations: (i,i,98b), (98b,i,98b) and (i,98b,98b)]
Performance decays as the time gap increases (Mota & Grishman, 2008)
![Page 17: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/17.jpg)
Update seeds or unlabeled data?
[Figure: timeline from 91a to 98b showing contemporary seeds Sn, unlabeled examples Ui and test set Tn]
Experiment 2: Update seeds (vary unlabeled data but use contemporary seeds)
![Page 25: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/25.jpg)
Update seeds or unlabeled data?
[Figure: F-measure (0.74 to 0.84) by training epoch (91a to 98b) for three (S,U,T) configurations: (i,i,98b), (98b,i,98b) and (i,98b,98b)]
Contemporary seeds slightly attenuate the decrease
![Page 26: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/26.jpg)
Update seeds or unlabeled data?
[Figure: timeline from 91a to 98b showing seeds Si, contemporary unlabeled examples Un and test set Tn]
Experiment 3: Update unlabeled data (vary seeds but use contemporary unlabeled data)
![Page 34: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/34.jpg)
Updating the unlabeled data is better than updating the seeds
[Figure: F-measure (0.74 to 0.84) by training epoch (91a to 98b) for three (S,U,T) configurations: (i,i,98b), (98b,i,98b) and (i,98b,98b)]
Contemporary unlabeled data maintain the performance
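The three (S,U,T) configurations in the plot legend can be enumerated per training epoch i. The sketch below only generates the triples for Experiments 1 to 3 with the test set fixed at 98b; the function and key names are hypothetical, and no tagger is trained.

```python
# Sixteen six-month epochs, 91a through 98b.
EPOCHS = [f"{year}{half}" for year in range(91, 99) for half in "ab"]


def configurations(test_epoch="98b"):
    """Return one dict of (seeds, unlabeled, test) triples per training epoch i."""
    rows = []
    for i in EPOCHS:
        rows.append({
            "baseline": (i, i, test_epoch),                   # Experiment 1: (i, i, 98b)
            "update_seeds": (test_epoch, i, test_epoch),      # Experiment 2: (98b, i, 98b)
            "update_unlabeled": (i, test_epoch, test_epoch),  # Experiment 3: (i, 98b, 98b)
        })
    return rows


for row in configurations()[:2]:
    print(row)
```

At i = 98b the three triples coincide, which is why the three curves would be expected to meet at the last epoch.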
![Page 35: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/35.jpg)
Augment unlabeled data?
[Figure: timeline from 91a to 98b showing contemporary seeds Sn, unlabeled examples Ui and test set Tn]
Experiment 4: Enlarge unlabeled data with older data and use contemporary seeds
![Page 42: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/42.jpg)
Augment unlabeled data?
[Figure: F-measure (0.74 to 0.84) by time frame (semester, 91a to 98b) for the configurations (i,98b,98b), (i,u[i,...,98a],98b) and (98b,u[i,...,98a],98b)]
Green line: same seeds for all taggers (98b); unlabeled data is enlarged backwards in time
Blue line: different seeds for each tagger; same unlabeled data for all taggers (98b)
Larger amounts of older unlabeled data do not always result in better performance
![Page 44: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/44.jpg)
Augment unlabeled data?
[Figure: timeline from 91a to 98b showing seeds Si, unlabeled examples Ui and test set Tn]
Experiment 5: Enlarge the size of unlabeled data and vary seeds
![Page 51: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/51.jpg)
Updating the unlabeled data is better than accumulating older unlabeled data
[Figure: F-measure (0.74 to 0.84) by time frame (semester, 91a to 98b) for the configurations (i,98b,98b), (i,u[i,...,98a],98b) and (98b,u[i,...,98a],98b)]
Violet line: seeds from the same time frame as the unlabeled set being added; unlabeled data is enlarged backwards in time
Blue line: same seeds as in the violet line; same unlabeled data for all taggers (98b)
Green line: same seeds for all taggers (98b); unlabeled data is enlarged backwards in time
Larger amounts of unlabeled data are worse than training with contemporary unlabeled data
Larger amounts of unlabeled data do not outperform the tagger trained with contemporary seeds and unlabeled data
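The backwards-growing unlabeled sets written u[i,...,98a] in the legend can be sketched as ranges over the epoch list. The helper names are hypothetical, and the upper bound 98a is taken literally from the legend; whether 98b itself is also pooled is not stated on the slides.

```python
# Sixteen six-month epochs, 91a through 98b.
EPOCHS = [f"{year}{half}" for year in range(91, 99) for half in "ab"]


def accumulated_unlabeled(start, end="98a"):
    """Epochs whose unlabeled sets are pooled, from `start` through `end` inclusive."""
    lo, hi = EPOCHS.index(start), EPOCHS.index(end)
    return EPOCHS[lo:hi + 1]  # empty when `start` is later than `end`


def experiment4(i, test="98b"):
    # Contemporary seeds, unlabeled data enlarged backwards: (98b, u[i,...,98a], 98b)
    return (test, accumulated_unlabeled(i), test)


def experiment5(i, test="98b"):
    # Seeds from epoch i, unlabeled data enlarged backwards: (i, u[i,...,98a], 98b)
    return (i, accumulated_unlabeled(i), test)


print(experiment4("97b"))
print(experiment5("97a"))
```

Moving i towards 91a adds one older unlabeled set per step, so the earliest epoch corresponds to the largest pooled set.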
![Page 54: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/54.jpg)
Final remarks
Contemporary unlabeled data are better data
But...
Why doesn’t the labeled data impact the performance more?
Are other semi-supervised approaches also sensitive?
![Page 55: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/55.jpg)
Acknowledgments
This research work was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000)
![Page 57: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/57.jpg)
Example of (mis)classification
Test set 98b includes two instances of “Tizi Ouzou”:
Tizi Ouzou tem (en: Tizi Ouzou has)
manifestações em Tizi Ouzou (en: demonstrations in Tizi Ouzou)
It does not occur in u 91a, so classification depends on contexts:
("n v" "tem") ORGANIZATION 0.52
("type" "nprop v") PERSON 0.43
("len" 2) PERSON 0.62
But it occurs in u 98b:
noite em Tizi (en: night in Tizi)
ruas de Tizi Ouzou (en: streets of Tizi Ouzou)
ir a Tizi-Ouzou (en: go to Tizi-Ouzou)
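One way to read the three rules above is as a decision list in which the matching rule with the highest confidence decides the label. That selection policy is an assumption here (the slide does not say how competing rules are combined), and `classify` is a hypothetical helper.

```python
# Rules copied from the slide: feature -> (label, confidence).
RULES = {
    ("n v", "tem"): ("ORGANIZATION", 0.52),
    ("type", "nprop v"): ("PERSON", 0.43),
    ("len", 2): ("PERSON", 0.62),
}


def classify(features, rules):
    """Return the (label, confidence) of the strongest matching rule, or None."""
    matches = [rules[f] for f in features if f in rules]
    if not matches:
        return None
    return max(matches, key=lambda label_conf: label_conf[1])


# Features assumed to fire for the instance "Tizi Ouzou tem".
features = [("n v", "tem"), ("type", "nprop v"), ("len", 2)]
print(classify(features, RULES))
```

Under this policy none of the available rules could yield a location label, which illustrates why a name absent from the unlabeled training data is at the mercy of generic context and length rules.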
![Page 58: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/58.jpg)
NE tagger: Identification
[Figure: identification pipeline. Stages: raw text; lexical analysis; chunking; NE + context identification; pairs (NE, context); text with unclassified NE. Resources: Portuguese dictionary, priority dictionaries, morphological grammars, chunking grammars, NE + context grammars]
Identification designed with NooJ (Silberztein, 2004)
1 Elisa Ferreira começou por criticar Cavaco Silva (en: Elisa Ferreira started by criticizing Cavaco Silva)
2 [Elisa Ferreira]SEQM [começou por criticar]V+Complexo+Pred=criticar [Cavaco Silva]SEQM
3 [Elisa Ferreira]nprop v+criticar começou por criticar [Cavaco Silva]v nprop+criticar
4 [Elisa Ferreira]nprop v+criticar [Cavaco Silva]v nprop+criticar
![Page 59: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/59.jpg)
NE tagger: Classification
[Figure: classification by co-training. Steps: seeds; label with name rules; labeled examples; infer context rules; context rules; label with context rules; labeled examples; infer name rules; name rules; label with name + context rules; labeled examples; infer name + context rules; name + context rules. Input: list of examples]
Spelling features ← SEEDS: (Elisa Ferreira, PESSOA, 0.9999) (PESSOA = PERSON)
1 LABEL: Elisa Ferreira, criticar ← PESSOA
2 INFER: (criticar, PESSOA, 0.98)
3 LABEL: Cavaco Silva, criticar ← PESSOA
4 INFER: (Silva, PESSOA, 0.97)
5 REPEAT
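The LABEL/INFER alternation above can be sketched as a much-simplified co-training loop. This is not the authors' implementation: it keeps no confidence scores or thresholds, a rule is kept only when every labeled example with that feature agrees, and the feature functions are hypothetical.

```python
from collections import defaultdict


def spelling_features(name):
    # Hypothetical spelling view: the full name plus each of its tokens.
    return {name, *name.split()}


def context_features(context):
    # Hypothetical contextual view: the context string itself.
    return {context}


def match(features, rules):
    """Return the label if all matching rules agree on one, else None."""
    labels = {rules[f] for f in features if f in rules}
    return labels.pop() if len(labels) == 1 else None


def infer(labeled, feature_fn):
    """Keep feature -> label rules for features seen with a single label."""
    seen = defaultdict(set)
    for value, label in labeled:
        for feature in feature_fn(value):
            seen[feature].add(label)
    return {f: next(iter(ls)) for f, ls in seen.items() if len(ls) == 1}


def cotrain(pairs, seeds, rounds=3):
    name_rules, context_rules = dict(seeds), {}
    for _ in range(rounds):
        # LABEL with name rules, then INFER context rules.
        by_name = []
        for name, context in pairs:
            label = match(spelling_features(name), name_rules)
            if label is not None:
                by_name.append((context, label))
        context_rules.update(infer(by_name, context_features))
        # LABEL with context rules, then INFER name rules.
        by_context = []
        for name, context in pairs:
            label = match(context_features(context), context_rules)
            if label is not None:
                by_context.append((name, label))
        name_rules.update(infer(by_context, spelling_features))
        name_rules.update(seeds)  # seed rules are never overwritten
    return name_rules, context_rules


pairs = [("Elisa Ferreira", "criticar"), ("Cavaco Silva", "criticar")]
name_rules, context_rules = cotrain(pairs, {"Elisa Ferreira": "PESSOA"})
print(context_rules)
print(name_rules.get("Silva"))
```

With the slide's two pairs, the seed labels the first pair, the context "criticar" becomes a PESSOA rule, and that rule in turn labels "Cavaco Silva", mirroring steps 1 to 4.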
![Page 60: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/60.jpg)
NE tagger performance decreases over time (Mota & Grishman, 2008)
Detailed analysis using six-month periods (instead of periods of 1 year)
Linear fit y = a + bx for configurations (Si, Ui, Tj):
| Measure | a (intercept) | b (slope) | R² |
| --- | --- | --- | --- |
| P | 0.827 | -0.0024 | 0.24824 |
| R | 0.773 | -0.0022 | 0.19393 |
| F | 0.799 | -0.0023 | 0.23765 |
[Figure: F-measure (0.74 to 0.82) against time gap (0 to 15; 1 = 6 months), with fitted line y = −0.00232x + 0.79906, R² = 0.2376]
The performance decreases at an estimated rate of 0.00232 in F-measure every 6 months (0.0348 after 8 years)
The low R-squared values show that not all variation is attributable to the increasing time gap
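The 0.0348 figure is consistent with multiplying the F-measure slope by 15, the largest gap between the 16 six-month epochs. That reading is an assumption; the sketch below simply evaluates the fitted lines from the slide, with hypothetical helper names.

```python
# (intercept a, slope b) per measure, copied from the slide; x is the gap in six-month steps.
FITS = {"P": (0.827, -0.0024), "R": (0.773, -0.0022), "F": (0.79906, -0.00232)}

MAX_GAP = 15  # assumption: 16 semesters (91a..98b) give a largest gap of 15 steps


def predicted(measure, gap):
    """Value of the fitted line for a measure at a given time gap."""
    a, b = FITS[measure]
    return a + b * gap


def total_drop(measure, gap=MAX_GAP):
    """Estimated loss between gap 0 and the given gap."""
    return predicted(measure, 0) - predicted(measure, gap)


for measure in FITS:
    print(measure, round(total_drop(measure), 4))
```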
![Page 61: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/61.jpg)
Updating the unlabeled data is better than updating the seeds (Complete training-test configurations)
[Figure: F-measure (0.74 to 0.82) against time gap (0 to 15; 1 = 6 months) with no update, with fitted line y = −0.00232x + 0.79906, R² = 0.2376]
| Update? | a (intercept) | b (slope) | R² |
| --- | --- | --- | --- |
| No | 0.799 | -0.0023 | 0.238 |
| Seeds | 0.800 | -0.0019 | 0.192 |
| Unlabeled | 0.807 | -0.0005 | 0.019 |
![Page 62: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/62.jpg)
Updating the unlabeled data is better than updating the seeds (Complete training-test configurations)
[Figure: F-measure (0.76 to 0.82) against time gap (0 to 15; 1 = 6 months) with updated seeds, with fitted line y = −0.00189x + 0.80025, R² = 0.1917]
![Page 63: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/63.jpg)
Updating the unlabeled data is better than updating the seeds (Complete training-test configurations)
[Figure: F-measure (0.77 to 0.83) against time gap (0 to 15; 1 = 6 months) with updated unlabeled data, with fitted line y = −0.00051x + 0.80769, R² = 0.0189]
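The three fitted lines in the table can be compared directly. The sketch below evaluates each line at the largest gap (15 six-month steps, an assumption) and picks the least negative slope; names are hypothetical and the coefficients are the rounded table values.

```python
# (intercept a, slope b) per update strategy, from the table (rounded values).
FITS = {
    "No": (0.799, -0.0023),
    "Seeds": (0.800, -0.0019),
    "Unlabeled": (0.807, -0.0005),
}


def f_at(strategy, gap):
    """F-measure predicted by the fitted line for a strategy at a given gap."""
    a, b = FITS[strategy]
    return a + b * gap


def slope_ratio(strategy, reference="No"):
    """Decay rate of a strategy relative to the no-update baseline."""
    return FITS[strategy][1] / FITS[reference][1]


# The strategy whose fitted line decays most slowly.
flattest = max(FITS, key=lambda s: FITS[s][1])

for strategy in FITS:
    print(strategy, round(f_at(strategy, 15), 4), round(slope_ratio(strategy), 2))
print("flattest:", flattest)
```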
![Page 64: Updating a Name Tagger Using Contemporary Unlabeled Data](https://reader034.vdocuments.site/reader034/viewer/2022052600/55895b83d8b42a5d178b469d/html5/thumbnails/64.jpg)
Confusion matrices
91a | 335 12 22 | 330 16 20 | 393 12 22
    | 52 453 79 | 52 456 69 | 12 463 38
    | 23 21 330 | 28 14 342 | 5 11 371
92b | 368 19 42 | 368 16 40 | 391 11 22
    | 19 435 55 | 23 445 39 | 14 463 29
    | 23 32 334 | 19 25 352 | 5 12 380
95b | 375 14 34 | 387 14 30 | 394 12 26
    | 22 465 78 | 13 461 73 | 12 463 43
    | 13 7 319 | 10 11 328 | 4 11 362
98a | 390 16 31 | 386 16 28 | 395 11 28
    | 11 458 58 | 13 460 48 | 11 464 39
    | 9 12 342 | 11 10 355 | 4 11 364
98b | 394 9 20 | 394 9 20 | 394 9 20
    | 8 467 29 | 8 467 29 | 8 467 29
    | 8 10 382 | 8 10 382 | 8 10 382
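A simple summary of each 3×3 matrix is the share of counts on its diagonal. The slide does not state the class order, which axis holds the reference labels, or which configuration each matrix belongs to, so the sketch below assumes only that diagonal cells are agreements and labels the three 91a matrices by position.

```python
# The three matrices of the 91a row, keyed by their position on the slide.
MATRICES_91A = {
    "left": [[335, 12, 22], [52, 453, 79], [23, 21, 330]],
    "middle": [[330, 16, 20], [52, 456, 69], [28, 14, 342]],
    "right": [[393, 12, 22], [12, 463, 38], [5, 11, 371]],
}


def diagonal_share(matrix):
    """Fraction of all counts that lie on the main diagonal."""
    agree = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return agree / total


for position, matrix in MATRICES_91A.items():
    print(position, round(diagonal_share(matrix), 4))
```

Because the share does not depend on row/column orientation, it can be compared across matrices without knowing which axis is the gold standard.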