Extracting Paper Titles, Authors and Conferences from Lists on the Web Nguyen Bach Sue Ann Hong Ben Lambert

Post on 22-Dec-2015



Extracting Paper Titles, Authors and Conferences

from Lists on the Web

Nguyen Bach

Sue Ann Hong

Ben Lambert


We will attempt to extract these predicates and relations:

• isAuthor(X)

• isPaperTitle(X)

• isConferenceName(X)

• publishedAt(<paper>, <conference>)

• authorOf(<author>, <paper>)
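As a rough illustration of how these predicates and relations might be stored, here is a minimal sketch. The `KnowledgeBase` class, its field names, and the `add_paper` helper are assumptions made for this example and do not come from the slides; the author names and title in the usage note are made up.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical container for the predicates and relations listed above."""

    # Unary predicates: membership in the set means the predicate holds.
    authors: set = field(default_factory=set)           # isAuthor(X)
    paper_titles: set = field(default_factory=set)      # isPaperTitle(X)
    conference_names: set = field(default_factory=set)  # isConferenceName(X)
    # Binary relations, stored as tuples.
    published_at: set = field(default_factory=set)      # (paper, conference)
    author_of: set = field(default_factory=set)         # (author, paper)

    def add_paper(self, title, authors, conference=None):
        """Record one extracted citation and everything it implies."""
        self.paper_titles.add(title)
        for author in authors:
            self.authors.add(author)
            self.author_of.add((author, title))
        if conference is not None:
            self.conference_names.add(conference)
            self.published_at.add((title, conference))
```

For example, `KnowledgeBase().add_paper("Some Title", ["A. One", "B. Two"], "AAAI-05")` would record two `authorOf` facts and one `publishedAt` fact.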


By looking at these Web pages:

• Authors’ publication lists (on their home pages)

• Conference accepted papers list (on the conference Web page)


AAAI-05 Paper List


Then we look for patterns…

<author> and <author>, <title>.

<author>, <author>, <author> and <author>, <title>.

<author>, <author>, <author> and <author>, <title>.
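One way to apply patterns like these is a regular expression. The sketch below is not the authors' implementation: it assumes the last two authors are joined by " and ", that author names contain no commas, and that everything after the last author's comma is the title. The name `parse_citation` is hypothetical, and single-author citations are deliberately not handled.

```python
import re

# <author>(, <author>)* and <author>, <title>.
CITATION = re.compile(
    r"^(?P<head>.+?)\s+and\s+(?P<last>[^,]+),\s*(?P<title>.+?)\.?\s*$"
)


def parse_citation(line):
    """Return (authors, title) if the line fits the pattern, else None."""
    match = CITATION.match(line.strip())
    if match is None:
        return None
    # Authors before " and " are comma-separated; the last one follows it.
    authors = [a.strip() for a in match.group("head").split(",") if a.strip()]
    authors.append(match.group("last").strip())
    return authors, match.group("title").strip()
```

With a made-up citation such as `"A. One, B. Two, C. Three and D. Four, Some Title."`, the function returns the four author names and the title without its trailing period. A line with no " and " returns `None`.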


But…

• Maybe the patterns are wrong, so look for some more evidence…

• Once we have enough evidence we can add it to our knowledge base.
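The idea of waiting for enough evidence can be sketched as a counter over independent source pages. The slides do not say how much evidence is enough, so the `min_sources` threshold, the `EvidenceStore` class, and the fact tuples below are all assumptions for illustration.

```python
from collections import defaultdict


class EvidenceStore:
    """Hypothetical store that promotes a fact once enough pages support it."""

    def __init__(self, min_sources=2):
        self.min_sources = min_sources     # assumed threshold, not from the slides
        self._sources = defaultdict(set)   # candidate fact -> pages that support it
        self.knowledge_base = set()        # facts accepted so far

    def observe(self, fact, source):
        """Record that `source` supports `fact`; return True if it is now accepted."""
        self._sources[fact].add(source)    # repeated sightings on one page count once
        if len(self._sources[fact]) >= self.min_sources:
            self.knowledge_base.add(fact)
            return True
        return False
```

Counting distinct pages rather than raw matches means a pattern that misfires repeatedly on a single page does not promote a wrong fact on its own.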


…And around 100 other citations in roughly the same format …


Conclusion

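• Redundancy = # of authors + 1 (a paper can appear on each author’s publication list and on the conference’s accepted-papers list)

• Seems to be very precise, since titles usually do not have many spelling variations.

• Helps to find alternate name spellings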

– Could there be a “Q. Yang” and a “Qiang Yang” with nearly identical publications?

• Works for other fields also (e.g. History)
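The alternate-spelling question above can be sketched as a check that two names are compatible and that their publication sets overlap heavily. This is only one possible heuristic: the functions `names_compatible`, `jaccard`, and `likely_same_author`, the surname and first-initial rule, and the `min_overlap` threshold are assumptions, not something stated in the slides.

```python
def names_compatible(a, b):
    """True if the surnames match and the first names share an initial."""
    parts_a = a.replace(".", "").split()
    parts_b = b.replace(".", "").split()
    if not parts_a or not parts_b:
        return False
    if parts_a[-1].lower() != parts_b[-1].lower():
        return False
    return parts_a[0][0].lower() == parts_b[0][0].lower()


def jaccard(x, y):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def likely_same_author(name_a, pubs_a, name_b, pubs_b, min_overlap=0.8):
    """Flag two names as one author if they are compatible and share most titles."""
    overlap = jaccard(set(pubs_a), set(pubs_b))
    return names_compatible(name_a, name_b) and overlap >= min_overlap
```

Under these assumptions, "Q. Yang" and "Qiang Yang" with identical title sets would be flagged as the same person, while two people who merely share a surname and initial but have no titles in common would not.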