Text and Data Mining with CrossRef
DESCRIPTION
Text and Data Mining with CrossRef. At The British Library's "Text Mining: Opportunities and Tools" event.
TRANSCRIPT
Text and Data Mining with CrossRef
Joe Wass
www.crossref.org
@joewass
British Library, November 2014
Joe Wass (CrossRef) 1 / 30
Academic Life before Computers
URLs for everyone!
But linkrot!
3% of links unavailable after a year 1
1 https://en.wikipedia.org/wiki/Link_rot
DOI
Digital Object Identifier
http://dx.doi.org/10.5555/12345678
persistent
unique
cross-publisher industry standard
you can click them!
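As a small illustration of "you can click them", a bare DOI becomes a link by prefixing it with the resolver address. The following is a minimal sketch in Python: the helper name `doi_to_url` is hypothetical, and the resolver prefix is simply the one shown on the slide.

```python
from urllib.parse import quote

RESOLVER = "http://dx.doi.org/"  # resolver prefix as shown on the slide


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI such as '10.5555/12345678' into a clickable URL."""
    doi = doi.strip()
    # Accept input that already carries the resolver prefix or a 'doi:' label.
    for prefix in (RESOLVER, "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # DOIs may contain characters that are not URL-safe; keep '/' unescaped.
    return RESOLVER + quote(doi, safe="/")


print(doi_to_url("10.5555/12345678"))
```

The same DOI keeps working if the publisher moves the article, because only the resolver's record of the target changes, not the link itself.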
2
est 2000
2 Other DOI Registration Agencies Available
DOIs forever
DOIs everywhere
DOIs everywhere!
DOIs everywhere!!
DOIs everywhere!!!
DOIs everywhere!!!!
Metadata In Metadata Out
CrossRef
Association of scholarly publishers
15 years old this year
70,416,598 DOIs
not only links
  CrossCheck plagiarism detection
  CrossMark retraction notices
  an API
  metadata
    titles
    tables of contents
    authors
    ISSN
    datasets
    funding information
    license information
    full-text links
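To make the metadata list concrete, here is a hedged sketch in Python of pulling those fields out of one record. The sample record is hand-written in the general shape of a CrossRef REST API work record; every value in it is made up for illustration, and the helper name `summarise` is hypothetical.

```python
import json

# Hand-written sample record; all values are invented for illustration.
SAMPLE_RECORD = json.loads("""
{
  "DOI": "10.5555/12345678",
  "title": ["An example title"],
  "author": [{"given": "Ada", "family": "Example"}],
  "ISSN": ["0000-0000"],
  "license": [{"URL": "http://creativecommons.org/licenses/by/3.0/"}],
  "link": [{"URL": "http://example.org/fulltext.pdf",
            "content-type": "application/pdf"}]
}
""")


def summarise(work):
    """Pull out a few metadata fields, tolerating ones that are missing."""
    return {
        "doi": work.get("DOI"),
        "title": (work.get("title") or [None])[0],
        "authors": [
            " ".join(part for part in (a.get("given"), a.get("family")) if part)
            for a in work.get("author", [])
        ],
        "issn": work.get("ISSN", []),
        "funders": [f.get("name") for f in work.get("funder", [])],
        "licenses": [lic.get("URL") for lic in work.get("license", [])],
        "full_text_links": [link.get("URL") for link in work.get("link", [])],
    }


summary = summarise(SAMPLE_RECORD)
for field, value in summary.items():
    print(f"{field}: {value}")
```

Not every record carries every field, so the sketch falls back to empty values rather than assuming that, say, funding information is present.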
What’s this got to do with TDM?
It’s all about the links (and metadata).
Workflow for Text and Data Mining
1. Identify corpus
2. Somehow get hold of corpus
   1. Figure out the license for each document
   2. Figure out where to get the document
   3. Download it
3. Clever algorithms
   1. That’s your problem
Repeat for very large numbers of documents.
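Step 2 of the workflow can be sketched as a filter over metadata records. This is a minimal Python illustration, not CrossRef's own tooling: the function name `plan_downloads`, the sample records, and the allow-list of licenses are all assumptions made up for the example, and the download itself is left out.

```python
def plan_downloads(works, allowed_licenses, wanted_type="application/pdf"):
    """Return (doi, url) pairs for works whose license is on the allow-list
    and that advertise a full-text link of the wanted content type."""
    plan = []
    for work in works:
        # Step 2.1: figure out the license for each document.
        licenses = {lic.get("URL") for lic in work.get("license", [])}
        if not licenses & allowed_licenses:
            continue
        # Step 2.2: figure out where to get the document.
        for link in work.get("link", []):
            url = link.get("URL")
            if url and link.get("content-type") == wanted_type:
                # Step 2.3 (the download itself) is left to the caller.
                plan.append((work.get("DOI"), url))
                break
    return plan


CC_BY = "http://creativecommons.org/licenses/by/3.0/"

# Made-up records for illustration only.
WORKS = [
    {"DOI": "10.5555/example-1",
     "license": [{"URL": CC_BY}],
     "link": [{"URL": "http://example.org/1.pdf",
               "content-type": "application/pdf"}]},
    {"DOI": "10.5555/example-2",
     "license": [{"URL": "http://example.org/restrictive-license"}],
     "link": [{"URL": "http://example.org/2.pdf",
               "content-type": "application/pdf"}]},
    {"DOI": "10.5555/example-3",
     "license": [{"URL": CC_BY}],
     "link": [{"URL": "http://example.org/3.html",
               "content-type": "text/html"}]},
]

plan = plan_downloads(WORKS, allowed_licenses={CC_BY})
print(plan)
```

Only the first record survives: the second has a license that is not on the allow-list, and the third offers no link of the wanted type.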
CrossRef Metadata
DOIs + license information + full-text URLs = corpus
cross-publisher API
cross-publisher data schema
api.crossref.org
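A hedged sketch of what a query against the API might look like from Python. The `/works` route and the filter names `has-license` and `has-full-text` are given here as assumptions to be checked against the REST API documentation linked at the end of the talk. The helper names are hypothetical, and `fetch_items` needs network access, so it is defined but not called.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "http://api.crossref.org/works"  # assumed route for work records


def build_query_url(rows=5):
    """Build a URL asking for works that have both a license and a full-text link."""
    params = {
        # Filter names are assumptions; verify them against the API documentation.
        "filter": "has-license:true,has-full-text:true",
        "rows": rows,
    }
    # Leave ':' and ',' unescaped so the filter expression stays readable.
    return API_BASE + "?" + urlencode(params, safe=":,")


def fetch_items(url):
    """Fetch one page of results. Requires network access; not called here."""
    with urlopen(url) as response:
        payload = json.load(response)
    return payload["message"]["items"]


print(build_query_url())
```

Each item returned by such a query would be a metadata record carrying the DOI, license information and full-text links that together define a corpus.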
Demo time!
More metadata
> 1,100,000 articles and counting
11 million more coming soon
more publishers in the pipeline
  American Institute of Physics (AIP)
  American Physical Society (APS)
  Elsevier
  HighWire Press
  Institute of Physics (IoPP)
  Springer
  Taylor & Francis
  Walter de Gruyter
  Wiley
120,000 Creative Commons articles
http://www.crossref.org
http://tdmsupport.crossref.org
http://api.crossref.org
https://github.com/CrossRef/rest-api-doc/blob/master/rest_api_tour.md