Post on 22-Dec-2015
Extracting Paper Titles, Authors and Conferences
from Lists on the Web
Nguyen Bach
Sue Ann Hong
Ben Lambert
We will attempt to extract these predicates and relations:
• isAuthor(X)
• isPaperTitle(X)
• isConferenceName(X)
• publishedAt(<paper>, <conference>)
• authorOf(<author>, <paper>)
By looking at these Web pages:
• Authors' publication lists (on their home pages)
• Conferences' accepted-paper lists (on the conference Web pages)
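One way to hold the predicates and relations listed above is as a set of ground facts. The sketch below is only an illustration: the KnowledgeBase class, its method names, and the sample authors and title are hypothetical placeholders, not taken from the slides.

```python
class KnowledgeBase:
    """Stores ground facts as (predicate, args) pairs, e.g. ("isAuthor", ("A. Example",))."""

    def __init__(self):
        self.facts = set()

    def add(self, predicate, *args):
        self.facts.add((predicate, args))

    def holds(self, predicate, *args):
        return (predicate, args) in self.facts

    def add_citation(self, authors, title, conference):
        # One citation yields unary facts for each entity and binary facts linking them.
        self.add("isPaperTitle", title)
        self.add("isConferenceName", conference)
        self.add("publishedAt", title, conference)
        for author in authors:
            self.add("isAuthor", author)
            self.add("authorOf", author, title)


# Placeholder citation (invented for illustration only).
kb = KnowledgeBase()
kb.add_citation(["A. Example", "B. Sample"], "A Placeholder Title", "AAAI-05")
print(kb.holds("authorOf", "B. Sample", "A Placeholder Title"))
```

Storing facts in a set means that seeing the same citation on several pages does not create duplicates.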
AAAI-05 Paper List
Then we look for patterns…
<author> and <author>, <title>.
<author>, <author>, <author> and <author>, <title>.
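A minimal sketch of matching the patterns above might look like the following. It assumes that author names contain no commas, that the final two authors are joined by " and ", and that the citation ends with a period. The function name parse_citation and the sample citations are hypothetical.

```python
def parse_citation(line):
    """Parse '<author>, ... and <author>, <title>.' into (authors, title).

    Returns None when the line does not fit the pattern.
    """
    body = line.strip()
    if not body.endswith("."):
        return None
    body = body[:-1]

    # The first " and " separates the leading authors from "<last author>, <title>".
    head, sep, tail = body.partition(" and ")
    if not sep:
        return None

    # The first comma after the last author starts the title (titles may contain commas).
    last_author, comma, title = tail.partition(",")
    if not comma:
        return None

    authors = [name.strip() for name in head.split(",") if name.strip()]
    authors.append(last_author.strip())
    return authors, title.strip()


# Placeholder citations (invented for illustration only).
two_authors = parse_citation("A. Example and B. Sample, A Placeholder Title.")
four_authors = parse_citation(
    "A. Example , B. Sample, C. Demo and D. Mock, Parsing Lists, Quickly."
)
print(two_authors)
print(four_authors)
```

A single-author citation has no " and " and is rejected here, since only the multi-author patterns above are covered.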
But…
• Maybe the patterns are wrong, so look for some more evidence…
• Once we have enough evidence we can add it to our knowledge base.
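The idea of waiting for enough evidence can be sketched as counting how many distinct pages support each extracted fact. The threshold of two pages, the function name promote_facts, and the sample observations are assumptions made for illustration. The slides do not say how much evidence is enough.

```python
from collections import defaultdict

MIN_SOURCES = 2  # assumed threshold, not specified in the slides


def promote_facts(observations, min_sources=MIN_SOURCES):
    """Return the facts that were seen on at least min_sources distinct pages.

    observations is an iterable of (fact, page) pairs.
    """
    pages_by_fact = defaultdict(set)
    for fact, page in observations:
        pages_by_fact[fact].add(page)
    return {fact for fact, pages in pages_by_fact.items() if len(pages) >= min_sources}


# Placeholder observations (invented for illustration only).
observations = [
    (("authorOf", "A. Example", "A Placeholder Title"), "conference-page"),
    (("authorOf", "A. Example", "A Placeholder Title"), "author-home-page"),
    # The same page repeated counts only once, so this fact is not promoted.
    (("authorOf", "A. Example", "Misparsed Fragment"), "conference-page"),
    (("authorOf", "A. Example", "Misparsed Fragment"), "conference-page"),
]
accepted = promote_facts(observations)
print(accepted)
```

Counting distinct pages rather than raw matches keeps a pattern that misfires repeatedly on one page from promoting a wrong fact.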
…and around 100 other citations in roughly the same format…
Conclusion
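• Redundancy = # of authors + 1 (one listing on each author's publication list, plus one on the conference's accepted-paper list)
• Seems to be very precise, since titles usually do not have many spelling variations.
• Helps to find alternate name spellings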
– Could there be a “Q. Yang” and a “Qiang Yang” with nearly identical publications?
• Works for other fields also (e.g. History)
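The alternate-spelling point can be illustrated by comparing the sets of titles attributed to each name. In the sketch below, the Jaccard overlap measure, the 0.8 threshold, and the function names are assumptions, and the paper titles are placeholders. The slides do not describe a specific method.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def likely_same_author(titles_by_name, threshold=0.8):
    """Return pairs of names whose publication sets overlap at or above threshold."""
    pairs = []
    for name1, name2 in combinations(sorted(titles_by_name), 2):
        if jaccard(titles_by_name[name1], titles_by_name[name2]) >= threshold:
            pairs.append((name1, name2))
    return pairs


# Placeholder publication sets (titles invented for illustration only).
titles_by_name = {
    "Q. Yang": {"Paper 1", "Paper 2", "Paper 3", "Paper 4", "Paper 5"},
    "Qiang Yang": {"Paper 1", "Paper 2", "Paper 3", "Paper 4", "Paper 5", "Paper 6"},
    "R. Other": {"Paper 7", "Paper 8"},
}
candidates = likely_same_author(titles_by_name)
print(candidates)
```

Because titles rarely vary in spelling, exact title matching is used here. Name pairs that pass the threshold are only candidates for merging and would still need checking.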