Bootstrapping Regular-Expression Recognizer to Help Human Annotators

Tae Woo Kim

Uploaded by ferris-lloyd on 02-Jan-2016


Page 1

Bootstrapping Regular-Expression Recognizer to Help Human Annotators

Tae Woo Kim

Page 2

Background

• Human annotators mark up entities by hand

• They read each document top to bottom, one person at a time

• They record whatever they can find

Page 3

Background

Person

Name: Mary Eliza Warner

Birth date: 1826

Death date:

Residence:

Father: Samuel Selden Warner

Mother: Azubah Tully Warner

Page 4

Background

Person

Name: Samuel Selden Warner

Birth date:

Death date:

Residence:

Father:

Mother:

Page 5

Background

• Each completed form fills in the corresponding ontology snippet

Page 6

Motivation

• There are too many genealogical documents for human annotators to handle alone

• 611,923 historical documents and family tree with Ely

• The documents present information in recurring, similar patterns

• Why not use these patterns?

Page 7

Solution

• While human annotators annotate entities, the system watches and learns

• Break the text of the documents into sentence fragments

• Find sentence fragments that follow the same pattern

• Turn each pattern into a regular expression
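The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the talk's implementation: it assumes one fragment per line, and the generalization rules (a four-digit number becomes `(\d{4})`, a capitalized word becomes `([A-Z][a-z]+)`, everything else is kept literally), the helper names, and the `min_support` threshold are all hypothetical choices.

```python
import re
from collections import defaultdict

# Tokens: runs of letters, runs of digits, runs of whitespace, or any single other character.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+|\s+|[^A-Za-z\d\s]")


def split_fragments(text):
    """Assumption: each non-empty line is one sentence fragment."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def token_to_regex(tok):
    """Map one token to a regex piece (hypothetical generalization rules)."""
    if tok.isdigit():
        if len(tok) == 4:
            return r"(\d{4})"       # looks like a year
        if len(tok) == 1:
            return r"(\d)"          # single-digit entry number
        return r"(\d+)"
    if tok.isspace():
        return r"\s"
    if re.fullmatch(r"[A-Z][a-z]+", tok):
        return r"([A-Z][a-z]+)"     # capitalized word, e.g. part of a name
    return re.escape(tok)           # keep everything else literally: "b", ".", ","


def fragment_pattern(fragment):
    """Generalize a fragment into a regular-expression string."""
    return "".join(token_to_regex(t) for t in TOKEN_RE.findall(fragment))


def learn_patterns(text, min_support=2):
    """Group fragments by generalized pattern; keep patterns seen at least min_support times."""
    groups = defaultdict(list)
    for frag in split_fragments(text):
        groups[fragment_pattern(frag)].append(frag)
    return {p: frags for p, frags in groups.items() if len(frags) >= min_support}


# Invented sample text for illustration.
sample = (
    "1. John Smith, b. 1850, d. 1921.\n"
    "2. Jane Smith, b. 1852, d. 1930.\n"
    "See the index for more names.\n"
)
for pattern, frags in learn_patterns(sample).items():
    print(pattern, frags)
```

With this sample, the two numbered entries generalize to the same pattern and are kept, while the unrelated sentence is dropped for lack of support. Unlike the hand-written regular expression in the talk, this sketch captures each word of a name in its own group.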

Page 8

(Figure: "What human annotators have" compared with "What the system has")

Page 9

Solution

[1-digit num.]._[name],_b._[date],_d._[date].   (_ marks a space)

(\d)\.\s([A-Z][a-z]+\s[A-Z][a-z]+),\sb\.\s(\d{4}),\sd\.\s(\d{4})\.
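Applied to a single fragment, a regular expression of this shape yields the fields an annotator would otherwise type by hand. The Python sketch below escapes the literal periods (an unescaped `.` matches any character) and maps the capture groups to field names; the function name, the field names, and the input line are hypothetical and invented for illustration.

```python
import re

# The slide's pattern with literal periods escaped as "\." so that "." no longer
# matches an arbitrary character.
ENTRY_RE = re.compile(
    r"(\d)\.\s([A-Z][a-z]+\s[A-Z][a-z]+),\sb\.\s(\d{4}),\sd\.\s(\d{4})\."
)


def extract_entry(fragment):
    """Return the captured fields as a dict, or None if the whole fragment doesn't match."""
    m = ENTRY_RE.fullmatch(fragment)
    if m is None:
        return None
    number, name, born, died = m.groups()
    return {"number": int(number), "name": name, "birth": born, "death": died}


print(extract_entry("1. John Smith, b. 1850, d. 1921."))
# {'number': 1, 'name': 'John Smith', 'birth': '1850', 'death': '1921'}
```

Because the name group allows exactly two capitalized words, a three-part name such as "Mary Eliza Warner" does not match this pattern; such fragments need a pattern of their own.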

Page 10

Solution

• Run the regular expressions over the rest of the documents

• Fill out the ontology snippet with the extracted data

• The system fills out the form for the annotators
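Running learned expressions over the remaining documents and turning each match into a pre-filled form could look like the following sketch. The `Recognizer` class, the `prefill_forms` helper, and the sample documents are hypothetical; the field names are borrowed from the form on the earlier slides, and the talk does not say how capture groups are mapped to form fields.

```python
import re


class Recognizer:
    """A learned regular expression plus the form field each capture group fills."""

    def __init__(self, regex, fields):
        self.pattern = re.compile(regex)
        self.fields = tuple(fields)
        if self.pattern.groups != len(self.fields):
            raise ValueError("need exactly one field name per capture group")


def prefill_forms(documents, recognizers):
    """Run every recognizer over every document; return one pre-filled form per match."""
    forms = []
    for doc in documents:
        for rec in recognizers:
            for match in rec.pattern.finditer(doc):
                forms.append(dict(zip(rec.fields, match.groups())))
    return forms


# Hypothetical recognizer: the entry number is matched but not captured.
entry = Recognizer(
    r"\d\.\s([A-Z][a-z]+\s[A-Z][a-z]+),\sb\.\s(\d{4}),\sd\.\s(\d{4})\.",
    ("Name", "Birth date", "Death date"),
)
documents = [
    "1. John Smith, b. 1850, d. 1921.\n2. Jane Smith, b. 1852, d. 1930.",
    "A page with no matching entries.",
]
forms = prefill_forms(documents, [entry])
for form in forms:
    print(form)
```

If two recognizers match the same text, this sketch emits a form for each; deduplicating or ranking overlapping matches is left out.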

Page 11

Conclusion

• The regular-expression recognizer watches and learns from human annotators

• It generates regular expressions that find entities for the annotators

• The system improves as it learns more patterns
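One way to make "improves as it learns more patterns" measurable is coverage: the fraction of fragments that at least one learned pattern fully matches. The metric, the `coverage` function, the two patterns, and the fragments below are illustrative assumptions, not results from the talk.

```python
import re


def coverage(fragments, patterns):
    """Fraction of fragments fully matched by at least one pattern."""
    if not fragments:
        return 0.0
    compiled = [re.compile(p) for p in patterns]
    hits = sum(1 for f in fragments if any(c.fullmatch(f) for c in compiled))
    return hits / len(fragments)


# Invented fragments; the second one lacks a death date.
fragments = [
    "1. John Smith, b. 1850, d. 1921.",
    "2. Jane Smith, b. 1852.",
    "3. Ann Lee, b. 1830, d. 1899.",
]
born_and_died = r"(\d)\.\s([A-Z][a-z]+\s[A-Z][a-z]+),\sb\.\s(\d{4}),\sd\.\s(\d{4})\."
born_only = r"(\d)\.\s([A-Z][a-z]+\s[A-Z][a-z]+),\sb\.\s(\d{4})\."

print(coverage(fragments, [born_and_died]))             # with one learned pattern
print(coverage(fragments, [born_and_died, born_only]))  # after learning a second pattern
```

The first call prints 0.6666666666666666 and the second prints 1.0: each additional pattern can only keep or raise coverage, because a fragment counts as soon as any pattern matches it.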