DiggiCORE: Digging into Connected Repositories
DESCRIPTION
This is a presentation I gave in Bristol describing the DiggiCORE project and the challenges it addresses.
TRANSCRIPT
![Page 1: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/1.jpg)
DiggiCORE: Digging into Connected Repositories
Petr Knoth
Knowledge Media Institute
The Open University
![Page 2: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/2.jpg)
Outline
1. Connecting by aggregating Open Access (OA) publications
  • Why aggregate and who is it for
  • The added value of aggregations
2. The CORE system
3. Supporting research in mining databases of scientific publications
![Page 3: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/3.jpg)
Outline
1. Connecting by aggregating Open Access (OA) publications
  • Why aggregate and who is it for
  • The added value of aggregations
2. The CORE system
3. Supporting research in mining databases of scientific publications
![Page 4: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/4.jpg)
The rapid rise of OA articles
The graph (from Laakso and Björk's paper, BMC Medicine 2012, 10:124) shows the number of papers published in three different types of online open access journals from 2000 to 2011.
![Page 5: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/5.jpg)
Growth of Open Access repositories
![Page 6: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/6.jpg)
Why do we need aggregations?
“Each individual repository is of limited value for research: the real power of Open Access lies in the possibility of connecting and tying together repositories, which is why we need interoperability. In order to create a seamless layer of content through connected repositories from around the world, Open Access relies on interoperability, the ability for systems to communicate with each other and pass information back and forth in a usable format. Interoperability allows us to exploit today's computational power so that we can aggregate, data mine, create new tools and services, and generate new knowledge from repository content.”
[COAR manifesto]
![Page 7: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/7.jpg)
Access to information according to the level of abstraction
[Diagram. Labels: Repository (×3); Metadata Transfer; Interoperability; Aggregation; Metadata; Content; Semantic Enrichment; Interfaces; OLTP; OLAP; Raw data access; Transaction information access; Analytical information access]
![Page 8: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/8.jpg)
Who should be supported by aggregations?
• The following user groups (divided according to the level of abstraction of the information they need):
  • Raw data access. Developers, DLs, DL researchers, companies …
  • Transaction information access. Researchers, students, life-long learners …
  • Analytical information access. Funders, government, business intelligence …
![Page 9: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/9.jpg)
What is it all about?
![Page 10: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/10.jpg)
Outline
1. Connecting by aggregating Open Access (OA) publications – why, how, what for?
2. The CORE system
3. Supporting research in mining databases of scientific publications
![Page 11: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/11.jpg)
CORE objective
CORE aims to provide a technical infrastructure for Open Access scholarly publications that will support access and reuse of scholarly materials at different levels of abstraction.
![Page 12: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/12.jpg)
CORE functionality
Content harvesting, processing
![Page 13: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/13.jpg)
CORE functionality
Semantic enrichment
![Page 14: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/14.jpg)
CORE functionality
Providing services
![Page 15: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/15.jpg)
What does CORE provide at different access levels?
[Diagram. Labels: Repository (×3); Metadata Transfer; Interoperability; Aggregation; Metadata; Content; Enrichment; Interfaces; OLTP; OLAP. Access levels: Raw data access; Transaction information access; Analytical information access. CORE services: CORE API; CORE Portal, CORE Mobile, CORE Plugin; Repository Analytics]
![Page 16: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/16.jpg)
CORE Applications
CORE Portal – Allows searching and navigating scientific publications aggregated from Open Access repositories
![Page 17: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/17.jpg)
CORE Applications
CORE Mobile – Allows searching and navigating scientific publications aggregated from Open Access repositories
![Page 18: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/18.jpg)
CORE Applications
CORE Plugin – A plugin to a system that provides recommendations for related items.
![Page 19: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/19.jpg)
CORE Applications
Repository Analytics – An analytical tool supporting providers of open access content (in particular repository managers).
![Page 20: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/20.jpg)
![Page 21: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/21.jpg)
CORE Applications
CORE API – Enables external systems and services to interact with the CORE repository.
• Search service
• PDF and plain text service
• Similarity service
• Classification service
• Citation service
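As a rough illustration of what a similarity service like the one listed above might compute, the sketch below scores two texts by cosine similarity over raw term counts. This is an assumption for illustration only: the slides do not say which method CORE uses, and the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    a, b = term_counts(text_a), term_counts(text_b)
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample texts, not taken from CORE.
doc1 = "open access repositories aggregate research papers"
doc2 = "aggregating open access research publications"
doc3 = "median salary of academics"

print(cosine_similarity(doc1, doc2))  # shares several terms with doc1
print(cosine_similarity(doc1, doc3))  # shares no terms with doc1
```

A production service would more likely weight terms (for example with tf-idf) and work on full texts, but the ranking idea is the same: higher scores mean more shared vocabulary.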
![Page 22: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/22.jpg)
CORE Applications
CORE API registered users:
• British Education Index
• Cottagelabs
• UKCORR
• Europeana
• ULCC
• Library, The Open University
• Los Alamos National Laboratory, USA
• University of Manchester Library
• Universidad de los Andes, Bogotá, Colombia
• UNESCO
![Page 23: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/23.jpg)
CORE visits (October 2012)
More than 6000 visits per day
![Page 24: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/24.jpg)
Outline
1. Connecting by aggregating Open Access (OA) publications – why, how, what for?
2. The CORE system
3. Supporting research in mining databases of scientific publications
![Page 25: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/25.jpg)
Objective
Software for exploration and analysis of very large and fast-growing amounts of research publications stored across Open Access Repositories (OAR).
![Page 26: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/26.jpg)
DiggiCORE networks
Three networks: (a) semantically related papers, (b) citation network, (c) author citation network
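To make the relationship between networks (b) and (c) concrete, here is a minimal sketch of how an author citation network could be derived from a paper citation network. The toy data, the function name and the decision to keep self-citations are all assumptions for illustration; the slides do not describe how DiggiCORE builds these networks.

```python
from collections import defaultdict

# Hypothetical toy data: paper -> its authors, and paper -> papers it cites.
paper_authors = {
    "p1": ["alice", "bob"],
    "p2": ["carol"],
    "p3": ["alice"],
}
paper_citations = {
    "p1": ["p2"],
    "p3": ["p1", "p2"],
}


def author_citation_network(paper_authors, paper_citations):
    """Count, for each (citing author, cited author) pair, how many
    paper-to-paper citations connect them. Self-citations are kept."""
    edges = defaultdict(int)
    for citing, cited_papers in paper_citations.items():
        for cited in cited_papers:
            for src in paper_authors.get(citing, []):
                for dst in paper_authors.get(cited, []):
                    edges[(src, dst)] += 1
    return dict(edges)


network = author_citation_network(paper_authors, paper_citations)
for (src, dst), weight in sorted(network.items()):
    print(f"{src} -> {dst}: {weight}")
```

Each paper-level citation is projected onto every pair of citing and cited authors, so edge weights grow with both the number of citations and the number of co-authors involved.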
![Page 27: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/27.jpg)
The problem of result transparency
Google Scholar
Microsoft Academic Search
![Page 28: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/28.jpg)
DiggiCORE objectives
Allow researchers to use this platform to analyse publications. Why?
• To identify patterns in the behaviour of research communities
• To detect trends in research disciplines
• To gain new insights into the citation behaviour of researchers
• To discover features that distinguish papers with high impact
![Page 29: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/29.jpg)
Questions the system can help answer
• What are the attributes of high-impact publications?
• Do these attributes differ in the humanities, social sciences and computer sciences?
• What are the features of research groups within disciplines and how do these features relate to contributions generated by the group?
• What are the attributes of high-impact authors and what is their role within the group?
• What are the dynamics of successful research groups?
![Page 30: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/30.jpg)
Questions the system can help answer
• What is the mechanism of cross-fertilisation within disciplines, especially between the humanities and the sciences?
• Who are the authors whose work is worth monitoring because they contribute to the achievements of their own discipline and also inspire other disciplines?
• How should the novice in the discipline get acquainted with key achievements in the discipline?
• How should he/she search for the most important publications?
![Page 31: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/31.jpg)
Challenges
• Technical issues of quick Open Access harvesting
• Lack of understanding of Open Access licenses among publishers and academics
• Explain the added value of full-text vs metadata aggregations:
  • User experience
  • Text-mining
![Page 32: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/32.jpg)
The power of full-text aggregations (WorldCat vs CORE)
![Page 33: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/33.jpg)
Text-mining
“There are currently over 144,000 full time equivalent academic professionals (teaching and research) working in UK higher education. Using data from the Higher Education Statistics Agency (HESA) for UK academic salaries, the median salary for a UK academic falls into a band of between £42k and £55k, which translates to between £26 and £33 per working hour. If text mining enabled just a 2% increase in productivity – corresponding to only 45 minutes per academic per working week (and looking at CIBER’s analysis of the impact of eJournals, this is very much an underestimate), this would imply over 4.7 million working hours and additional productivity worth between £123.5m and £156.8m in working time per year.” [McDonald & Kelly, 2012] – JISC report on text-mining
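The arithmetic behind the quoted figures can be retraced with a short calculation. The staff count, hourly rates and 45 minutes per week come from the quote; the 37.5-hour working week and 44 working weeks per year are assumptions chosen here because they approximately reproduce the reported totals, and they are not stated in the quote itself.

```python
# Figures taken from the quoted report.
fte_academics = 144_000
hourly_rate_low, hourly_rate_high = 26, 33  # GBP per working hour
minutes_saved_per_week = 45

# Assumptions (not stated in the quote).
hours_per_week = 37.5
working_weeks_per_year = 44

hours_saved_per_week = minutes_saved_per_week / 60
productivity_gain = hours_saved_per_week / hours_per_week
hours_saved_per_year = fte_academics * hours_saved_per_week * working_weeks_per_year
value_low = hours_saved_per_year * hourly_rate_low
value_high = hours_saved_per_year * hourly_rate_high

print(f"Productivity gain: {productivity_gain:.1%}")
print(f"Hours saved per year: {hours_saved_per_year / 1e6:.2f} million")
print(f"Value: GBP {value_low / 1e6:.1f}m to {value_high / 1e6:.1f}m")
```

Under these assumptions the result lands within a fraction of a percent of the quoted range, which suggests the report used a similar working-year length.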
![Page 34: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/34.jpg)
http://rossmounce.co.uk/2012/09/04/the-gold-oa-plot-v0-2/
Cost of Gold OA
![Page 35: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/35.jpg)
Summary
• Aggregations should serve the needs of different user groups.
• Transparency is crucial.
• Machine access to publications provides lots of new opportunities.
• We can have many services that are part of the infrastructure, but they should work with the same data.
• CORE aims to
  • prepare the way for innovative open access services
  • demonstrate the benefits of programmable access to publications
  • data mine publications for impact characteristics
![Page 36: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/36.jpg)
Partners
Advisory Board
![Page 37: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/37.jpg)
Questions?
![Page 38: DiggiCORE: Digging into Connected Repositories](https://reader036.vdocuments.site/reader036/viewer/2022062418/554bdb98b4c905ac708b5363/html5/thumbnails/38.jpg)