
Post on 22-Dec-2015


Extracting Paper Titles, Authors and Conferences from Lists on the Web

Nguyen Bach

Sue Ann Hong

Ben Lambert

We will attempt to extract these predicates and relations:

• isAuthor(X)

• isPaperTitle(X)

• isConferenceName(X)

• publishedAt(<paper>, <conference>)

• authorOf(<author>, <paper>)

By looking at these Web pages:

• An author’s publication list (on their home page)

• A conference’s accepted-papers list (on the conference Web page)
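One way to hold the extracted predicates in code is as small immutable records. The sketch below is illustrative only: the `Fact` class and the helper names (`is_author`, `author_of`, and so on) are hypothetical and simply mirror the predicate names listed above. Frozen dataclasses are used so that identical facts compare equal and collapse when stored in a set.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Fact:
    """Hypothetical record for one extracted predicate instance."""
    predicate: str          # e.g. "isAuthor" or "authorOf"
    args: Tuple[str, ...]   # one argument for unary predicates, two for relations


# Unary predicates
def is_author(x: str) -> Fact:
    return Fact("isAuthor", (x,))


def is_paper_title(x: str) -> Fact:
    return Fact("isPaperTitle", (x,))


def is_conference_name(x: str) -> Fact:
    return Fact("isConferenceName", (x,))


# Binary relations
def published_at(paper: str, conference: str) -> Fact:
    return Fact("publishedAt", (paper, conference))


def author_of(author: str, paper: str) -> Fact:
    return Fact("authorOf", (author, paper))


# Frozen dataclasses are hashable, so duplicate extractions collapse in a set.
facts = {is_author("Ben Lambert"), is_author("Ben Lambert"), is_conference_name("AAAI-05")}
print(len(facts))  # 2
```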

AAAI-05 Paper List

Then we look for patterns…

<author> and <author>, <title>.

<author> , <author>, <author> and <author>, <title>.
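The citation patterns above can be approximated with a regular expression. This is a minimal sketch under stated assumptions, not the authors' extractor: `NAME`, `CITATION`, and `parse_citation` are hypothetical; names are assumed to be words starting with an ASCII capital letter (initials such as "Q." allowed); and the title is assumed to contain no period. Single-author citations, which neither pattern covers, are not matched.

```python
import re
from typing import List, Optional, Tuple

# A name: capitalized words separated by single spaces, e.g. "Q. Yang" or "Sue Ann Hong".
NAME = r"[A-Z][\w.'\-]*(?: [A-Z][\w.'\-]*)*"

# <author>, ..., <author> and <author>, <title>.
CITATION = re.compile(
    rf"^(?P<authors>{NAME}(?:, {NAME})* and {NAME}), (?P<title>[^.]+)\.$"
)


def parse_citation(line: str) -> Optional[Tuple[List[str], str]]:
    """Return (authors, title) if the line fits the pattern, else None."""
    m = CITATION.match(line.strip())
    if m is None:
        return None
    # Split the author list on ", " and on the final " and ".
    authors = re.split(r", | and ", m.group("authors"))
    return authors, m.group("title")


print(parse_citation("Nguyen Bach and Sue Ann Hong, Extracting Paper Titles from Lists on the Web."))
# (['Nguyen Bach', 'Sue Ann Hong'], 'Extracting Paper Titles from Lists on the Web')
```

Because the author list must end in exactly one " and <name>", a title that itself contains commas or "and" is still assigned to the title group.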

But…

• The patterns may be wrong, so we look for more evidence…

• Once we have enough evidence, we can add the facts to our knowledge base.
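The "look for more evidence" step can be read as counting independent sources. The sketch below assumes that one source is one Web page, that a fact is accepted once it appears on at least `min_sources` distinct pages, and that facts are hashable tuples; `EvidenceStore`, `min_sources`, and the example.org URLs are hypothetical. Since a paper can appear on each author's publication list and on the conference list, the number of supporting pages for a paper is at most the number of authors plus one.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set


class EvidenceStore:
    """Hypothetical sketch: promote a candidate fact to the knowledge base
    once it has been extracted from at least `min_sources` distinct pages."""

    def __init__(self, min_sources: int = 2) -> None:
        self.min_sources = min_sources
        self._sources: Dict[Hashable, Set[str]] = defaultdict(set)  # fact -> page URLs
        self.knowledge_base: Set[Hashable] = set()

    def observe(self, fact: Hashable, source_url: str) -> bool:
        """Record one extraction of `fact` from `source_url`.

        Seeing the same fact again on the same page adds no new evidence.
        Returns True only on the observation that promotes the fact."""
        self._sources[fact].add(source_url)
        if fact in self.knowledge_base:
            return False
        if len(self._sources[fact]) >= self.min_sources:
            self.knowledge_base.add(fact)
            return True
        return False


store = EvidenceStore(min_sources=2)
fact = ("authorOf", "Ben Lambert", "Example Paper Title")
store.observe(fact, "https://example.org/~lambert/pubs.html")    # False: one page so far
store.observe(fact, "https://example.org/aaai05/accepted.html")  # True: second distinct page
```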

…And around 100 other citations in roughly the same format…

Conclusion

• Redundancy = number of authors + 1 (a paper can appear on each author’s publication list and on the conference’s accepted-papers list)

• Seems to be very precise, since titles usually do not have many spelling variations

• Helps to find alternate spellings of names

– Could there be a “Q. Yang” and a “Qiang Yang” with nearly identical publication lists?

• Also works for other fields (e.g., history)
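The alternate-spelling idea can be sketched as two checks: the names must be compatible, and the two publication lists must share most of their titles. This is a hypothetical heuristic, not the method from the slides: `names_compatible`, `title_overlap`, `likely_same_author`, and the 0.8 Jaccard threshold are all assumptions, and the name check assumes "Given ... Surname" order.

```python
from typing import Iterable, Set


def _given_names_match(x: str, y: str) -> bool:
    # Two full given names must be equal; an initial only needs to match the first letter.
    if len(x) > 1 and len(y) > 1:
        return x.lower() == y.lower()
    return x[0].lower() == y[0].lower()


def names_compatible(a: str, b: str) -> bool:
    """Hypothetical heuristic: same surname and pairwise-compatible given names,
    so "Q. Yang" is compatible with "Qiang Yang"."""
    ta = a.replace(".", " ").split()
    tb = b.replace(".", " ").split()
    if not ta or not tb or ta[-1].lower() != tb[-1].lower():
        return False
    if len(ta) != len(tb):
        return False
    return all(_given_names_match(x, y) for x, y in zip(ta[:-1], tb[:-1]))


def normalize_title(title: str) -> str:
    # Titles rarely vary in spelling, so folding case and whitespace is assumed to suffice.
    return " ".join(title.lower().split())


def title_overlap(titles_a: Iterable[str], titles_b: Iterable[str]) -> float:
    """Jaccard similarity of two publication lists after title normalization."""
    sa: Set[str] = {normalize_title(t) for t in titles_a}
    sb: Set[str] = {normalize_title(t) for t in titles_b}
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def likely_same_author(name_a: str, titles_a: Iterable[str],
                       name_b: str, titles_b: Iterable[str],
                       threshold: float = 0.8) -> bool:
    return names_compatible(name_a, name_b) and title_overlap(titles_a, titles_b) >= threshold


pubs_a = ["Paper One", "Paper Two", "Paper Three"]    # hypothetical titles
pubs_b = ["paper one", "Paper  Two", "Paper Three"]
print(likely_same_author("Q. Yang", pubs_a, "Qiang Yang", pubs_b))  # True
```

Relying on title overlap rather than name similarity alone follows the observation above that titles are the more stable key.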
