progress report 0729

Progress Report 2010/7/29 Shu-Ying Li

Page 1: Progress Report 0729

Progress Report

2010/7/29

Shu-Ying Li

Page 2: Progress Report 0729

Associated information

There are two possible cases:

1. Two different layouts in a web page.

2. No obvious boundary among extracted addresses.

We apply a parser and Tidy to the web pages and consider the following two fields:

1. Path based on number

2. Terminal value
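As a rough illustration of the two fields, the sketch below walks a small, already well-formed page and lists a numbered path and a terminal value for each text-bearing element. It assumes that each number in a path is the 1-based position of a node among its parent's children and that the terminal value is the element's own text; the slides do not define either precisely. The function name `numbered_paths` and the sample markup are hypothetical, and Python's standard `xml.etree.ElementTree` stands in for the parser and Tidy step.

```python
import xml.etree.ElementTree as ET


def numbered_paths(root):
    """Return (path, terminal_value) pairs for elements that carry text.

    Assumption: a path is the chain of 1-based child positions from the
    root, joined with backslashes as in the slides. Tail text (text that
    follows a child element) is ignored in this sketch.
    """
    results = []

    def walk(node, indexes):
        text = (node.text or "").strip()
        if text:
            results.append(("\\".join(map(str, indexes)), text))
        for position, child in enumerate(node, start=1):
            walk(child, indexes + [position])

    walk(root, [1])
    return results


# Hypothetical, already tidied page fragment.
SAMPLE = (
    "<html><body><div>"
    "<p>Information1</p>"
    "<p>1410 Pines Road, Oregon, IL, 61061, USA</p>"
    "</div></body></html>"
)

pairs = numbered_paths(ET.fromstring(SAMPLE))
for path, value in pairs:
    print(path, "|", value)
```

Real pages would first need to be repaired into well-formed markup (the role Tidy plays in the slides) before a strict XML parser like this one can read them.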

Page 3: Progress Report 0729

Method

Find the position of address a1 and record its path pa.

For each path_i:

    If path_i matches pa:

        extract the corresponding information

    If path_i matches pa as a prefix and length(path_i) > length(pa):

        extract the corresponding information

Path based on number | Terminal value

1\2\8\1\1\1\1\1\4\3\2 | Information1

1\2\8\1\1\1\1\1\4\3\2\4 | Information2

1\2\8\1\1\1\1\1\4\3\2 (pa) | 1410 Pines Road, Oregon, IL, 61061, USA (a1)
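The matching step of the method can be sketched in a few lines. This is only an illustration under stated assumptions: "match" is read as "equal to pa, or beginning with pa and longer", and the three path/value rows are paired in the order they appear on the slide. The helper names `matches` and `associated_information` are hypothetical.

```python
def matches(path, pa):
    """True when path equals pa or extends it (pa is a prefix of path).

    Paths are compared component by component so that, for example,
    1\\2 is not treated as a prefix of 1\\20.
    """
    parts, anchor = path.split("\\"), pa.split("\\")
    return parts[:len(anchor)] == anchor


def associated_information(rows, address):
    """Collect terminal values whose path matches the address path pa."""
    # Find the position of the address a1 and record its path pa.
    pa = next(path for path, value in rows if value == address)
    # Keep every other terminal value whose path matches pa.
    return [value for path, value in rows
            if value != address and matches(path, pa)]


# Rows taken from the slide; the pairing of path and value is assumed.
ADDRESS = "1410 Pines Road, Oregon, IL, 61061, USA"
ROWS = [
    ("1\\2\\8\\1\\1\\1\\1\\1\\4\\3\\2", "Information1"),
    ("1\\2\\8\\1\\1\\1\\1\\1\\4\\3\\2\\4", "Information2"),
    ("1\\2\\8\\1\\1\\1\\1\\1\\4\\3\\2", ADDRESS),
]

found = associated_information(ROWS, ADDRESS)
print(found)
```

Both rules of the method collapse into one prefix test here, because an exact match is simply a prefix match of equal length.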

Page 4: Progress Report 0729

Case 1: No obvious boundary among extracted addresses (1/2)

Page 5: Progress Report 0729

Case 1: No obvious boundary among extracted addresses (2/2)

Page 6: Progress Report 0729

Case 2: Two different layouts in a web page (1/2)

Page 7: Progress Report 0729

Case 2: Two different layouts in a web page (2/2)

[Figure annotations: "miss" (two occurrences)]

Page 8: Progress Report 0729

Case 3:

Page 9: Progress Report 0729

Case 3:

[Figure annotation: "miss"]

Page 10: Progress Report 0729

Discussion: Using ANNIE annotation

Annotation types: Organization, Person, Date, Address, Location

Page 11: Progress Report 0729

Discussion: Using ANNIE annotation

[Figure labels: Organization, Address, Location, Address, Organization, Location]

Page 12: Progress Report 0729

Discussion: Using ANNIE annotation

[Figure labels: Date, Organization, Organization, Organization, Organization, Person, Person, Person, Location]

Page 13: Progress Report 0729

Discussion: Using ANNIE annotation

[Figure labels: Organization, Date, Location, Organization, Location, Date, Person, Organization, Location, Date]