Post on 22-Dec-2015
Extracting Paper Titles, Authors and Conferences
from Lists on the Web
Nguyen Bach
Sue Ann Hong
Ben Lambert
We will attempt to extract these predicates and relations:
• isAuthor(X)
• isPaperTitle(X)
• isConferenceName(X)
• publishedAt(<paper>, <conference>)
• authorOf(<author>, <paper>)
By looking at these Web pages:
• Authors’ publication lists (on their home pages)
• Conferences’ accepted-paper lists (on the conference Web pages)
AAAI-05 Paper List
Then we look for patterns…
<author> and <author>, <title>.
<author>, <author>, <author> and <author>, <title>.
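As a minimal sketch of how a pattern of this shape could be matched, the Python function below splits a citation line into an author list and a title. It is an illustration rather than the method used in the slides: the function name `parse_citation` and the sample names and titles are hypothetical, and it assumes the first " and " on the line sits inside the author list and that the title ends with a period.

```python
def parse_citation(line):
    """Split '<author>, ... and <author>, <title>.' into (authors, title).

    Returns None when the line does not fit the pattern.
    """
    line = line.strip()
    if not line.endswith("."):
        return None

    # Assumption: the first " and " joins the last two authors.
    head, sep, tail = line[:-1].partition(" and ")
    if not sep:
        return None

    # After " and " we expect "<last author>, <title>".
    last_author, sep, title = tail.partition(", ")
    if not sep:
        return None

    # Earlier authors are comma-separated before the " and ".
    authors = [a.strip() for a in head.split(",") if a.strip()]
    authors.append(last_author.strip())
    return authors, title.strip()


if __name__ == "__main__":
    # Hypothetical citation lines, not taken from any real page.
    print(parse_citation("A. Smith and B. Jones, Parsing, Quickly and Well."))
    print(parse_citation("A. Smith, B. Jones, C. Lee and D. Wu, Learning to Extract."))
```

Splitting on the first ", " after the " and " keeps commas and further occurrences of "and" inside the title intact. Single-author citations are not handled, since none of the patterns above covers them.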
But…
• Maybe the patterns are wrong, so look for some more evidence…
• Once we have enough evidence we can add it to our knowledge base.
… and around 100 other citations in roughly the same format …
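The idea of collecting evidence before committing a fact could be sketched as below. This is a hedged illustration only: the class name `CandidateFacts`, the threshold of two pages, and the page labels are assumptions, since the slides do not say how much evidence counts as "enough". Facts are stored as tuples whose first element is one of the predicate names listed earlier.

```python
from collections import defaultdict


class CandidateFacts:
    """Holds extracted facts until enough distinct pages support them."""

    def __init__(self, threshold=2):
        # Assumed rule: a fact is accepted once `threshold` distinct
        # pages have produced it.
        self.threshold = threshold
        self.sources = defaultdict(set)  # fact -> set of supporting pages
        self.knowledge_base = set()      # accepted facts

    def observe(self, fact, page):
        """Record that `fact` was extracted from `page`."""
        self.sources[fact].add(page)
        if len(self.sources[fact]) >= self.threshold:
            self.knowledge_base.add(fact)


if __name__ == "__main__":
    facts = CandidateFacts(threshold=2)
    title_fact = ("isPaperTitle", "Learning to Extract")  # hypothetical title
    facts.observe(title_fact, "author-home-page")         # hypothetical page labels
    facts.observe(title_fact, "conference-paper-list")
    print(title_fact in facts.knowledge_base)
```

Counting distinct pages rather than raw matches means that a pattern misfiring repeatedly on one page does not promote a fact on its own.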
Conclusion
• Redundancy = # of authors + 1 (a paper can appear on each author’s publication list and once on the conference’s paper list)
• Seems to be very precise, since titles usually do not have many spelling variations.
• Helps to find alternate name spellings
– Could there be a “Q. Yang” and a “Qiang Yang” with nearly identical publications?
• Works for other fields also (e.g. History)
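The alternate-spelling point can be illustrated with a small sketch that flags pairs of author names whose extracted title sets overlap heavily. The Jaccard measure, the 0.8 cutoff, the function names, and the placeholder titles are all assumptions made for the example; the slides do not specify how "nearly identical" would be measured.

```python
from itertools import combinations


def title_overlap(titles_a, titles_b):
    """Jaccard similarity between two sets of paper titles."""
    union = titles_a | titles_b
    if not union:
        return 0.0
    return len(titles_a & titles_b) / len(union)


def likely_same_author(pubs_by_name, min_overlap=0.8):
    """Return name pairs whose publication sets overlap at least `min_overlap`."""
    pairs = []
    for name_a, name_b in combinations(sorted(pubs_by_name), 2):
        score = title_overlap(pubs_by_name[name_a], pubs_by_name[name_b])
        if score >= min_overlap:
            pairs.append((name_a, name_b))
    return pairs


if __name__ == "__main__":
    # Placeholder titles; only the two name spellings come from the slide.
    pubs = {
        "Q. Yang": {"Title 1", "Title 2", "Title 3"},
        "Qiang Yang": {"Title 1", "Title 2", "Title 3"},
        "B. Lee": {"Title 4"},
    }
    print(likely_same_author(pubs))
```

Exact title matching is used here because, as noted above, titles tend to have few spelling variations, which makes them a steadier key than author names.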