Page 1: Data for Research (DfR) service

JSTOR Advanced Technology Research

Denver 25th January 2008 John Burns Clare Llewellyn

Page 2

Today we will introduce a public beta of our Data for Research service and show you some of the other services that JSTOR’s advanced technology group is working on.

Mission: Working with other researchers on large-scale text and data mining initiatives with an eye toward beneficial applications for scholars and students.

Page 3

What is Data Mining?

“Data mining is the process of extracting hidden patterns from data” Lyman and Varian 2003

“As data sets and the information extracted from them have grown in size and complexity, direct hands-on data analysis has increasingly been supplemented and augmented with indirect, automatic data processing using more complex and sophisticated tools, methods and models”

Kantardzic 2002

Example: using consumer purchasing patterns to predict which products are bought together (e.g. gas and flights)
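The gas-and-flights example is a classic market-basket (association-rule) problem. A minimal sketch, assuming each purchase is represented as a list of item names: count how often each pair of items appears in the same basket, then report support (the share of baskets containing both items) and confidence (how often the second item appears given the first). The function name `pair_rules` and the toy baskets are hypothetical, invented for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets):
    """Return {(a, b): (support, confidence)} for every ordered item pair a -> b.

    support    = fraction of baskets containing both a and b
    confidence = fraction of baskets containing a that also contain b
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeats within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))  # unordered pairs, a < b
    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n
        rules[(a, b)] = (support, together / item_counts[a])
        rules[(b, a)] = (support, together / item_counts[b])
    return rules


# Hypothetical toy data, invented for this example.
baskets = [
    ["flight", "gas", "hotel"],
    ["flight", "gas"],
    ["gas", "snacks"],
    ["flight", "hotel"],
]
rules = pair_rules(baskets)
support, confidence = rules[("flight", "gas")]
print(f"flight -> gas: support={support:.2f}, confidence={confidence:.2f}")
```

This prints a support of 0.50 and a confidence of 0.67: half of the baskets contain both items, and two of the three flight purchases also include gas. Production algorithms such as Apriori prune rare itemsets before counting; this sketch counts every pair, which is only reasonable for small inventories.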

Page 4

What is Text Mining?

“In text mining the patterns are extracted from natural language text rather than from structured databases of facts”

Marti Hearst 2003

“Text mining attempts to discover new, previously unknown information by applying techniques from information retrieval, natural language processing and data mining”

National Centre for Text Mining, UK

Example: looking at which words co-occur in articles in order to predict interactions (magnesium and migraines)
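The magnesium and migraine example echoes Swanson's "ABC" style of literature-based discovery: if term A co-occurs with some term B in one set of articles, and B co-occurs with term C in another, B is a candidate link between A and C even when A and C never appear together. A minimal sketch, assuming each article is already reduced to a list of tokens; the function `abc_links` and the hand-written "abstracts" are hypothetical, invented for illustration.

```python
def abc_links(docs, a, c):
    """Find bridging terms B that co-occur with term a in some documents and
    with term c in others (a Swanson-style "ABC" search).

    Returns (sorted list of B terms, whether a and c ever co-occur directly).
    """
    doc_sets = [set(doc) for doc in docs]
    near_a = set().union(*(d for d in doc_sets if a in d)) - {a, c}
    near_c = set().union(*(d for d in doc_sets if c in d)) - {a, c}
    direct = any(a in d and c in d for d in doc_sets)
    return sorted(near_a & near_c), direct


# Hypothetical, hand-written "abstracts", already tokenised.
docs = [
    "magnesium deficiency linked to vascular spasm".split(),
    "migraine patients show vascular spasm".split(),
    "magnesium acts as a calcium channel blocker".split(),
    "calcium channel blockers may prevent migraine".split(),
]
bridges, direct = abc_links(docs, "magnesium", "migraine")
print(bridges, direct)
```

Here the bridging terms are calcium, channel, spasm and vascular, and magnesium and migraine never co-occur directly. A realistic system would also remove stop words, normalise word forms (blocker/blockers) and rank candidate B terms rather than simply intersecting sets.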

Page 5

Advanced Technology at JSTOR

•  Why we are here
•  Who we are
•  What we are doing

Page 6

Why are we releasing our system here?

Librarians are the point from which innovation spreads throughout the academy.

“New roles and functions for librarians include:
•  information consultants and producers
•  information gatekeepers and intermediators
•  end-user educators
•  managers and leaders
•  data analysts in data administration centers
•  preservers of knowledge
•  information equalizers”

Park 1987

A Data Support Role: “Helping students get their hands dirty with the data”

Robin Rice 2008 2nd DCC / RIN Research Data Management Forum

Page 7

Who we are - Advanced Technology Research

•  A formal commitment by JSTOR to a proactive role in technology innovation to face new challenges and opportunities
•  Our MO is to collaborate with and aid the scholarly community
•  We are a team of world-class scientists and technologists with a proven track record of innovation

Mission Statement

“The Advanced Technology Research Group is dedicated to creating, discovering and using relevant technologies in support of JSTOR and the broader scholarly community.”

Page 8

ATR - Collaborations with the academic community.

For other researchers we provide:
•  Access to large well-curated data sets
•  An exposure channel on JSTOR for research results
•  Facilities on JSTOR to expose tools and techniques to users
•  Collaboration opportunities

For JSTOR we:
•  Evaluate novel techniques
•  Present rapid prototypes to users
•  Develop peer relationships with research institutions
•  Bring new forms of traffic to the JSTOR data
•  Reuse JSTOR data in new and exciting ways

Page 9

What we are doing - Projects and Partners

•  University of Washington – Citation Network Analysis
•  Princeton University – Topic Analysis
•  UIUC – Software Environment for the Advancement of Scholarly Research (SEASR)
•  University of Michigan – Linguistic tools
•  Tufts – Classics Studies
•  University of Liverpool – OAI-ORE, Text Mining, Data Analysis
•  University of Queensland – Annotations
•  Los Alamos National Labs – Annotation Management
•  DFKI (German Artificial Intelligence Centre) – Document capture and reconstruction / remastering
•  XRCE (EuroPARC, France) – Scanned Document Analysis
•  …

Page 10

Advanced Technology Research - Showcase

Showcase provides a preview of interesting and useful technologies. It allows our research partners to demonstrate their tools and gain feedback and it allows JSTOR to assess candidate technologies before committing them to the product roadmap.

Page 11

Advanced Technology Research - Showcase

A place to expose JSTOR data and tools and to encourage new research

•  Access to JSTOR datasets
•  A facility to expose and use tools created by researchers from JSTOR and elsewhere
•  Explanation of ongoing research
•  A forum to facilitate connections between groups working with JSTOR data

URL: http://showcase.jstor.org

Page 12

Data for Research

•  DFR is a set of web tools designed to allow for the visual exploration of large-scale data sets and the download of word frequencies in JSTOR articles

•  Beta Version launched 01/23/09

•  URL: http://dfr.jstor.org
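To make "word frequencies" concrete: a per-article word count is essentially a bag of words. The sketch below builds one from raw text, assuming simple lower-casing and splitting on non-letters. DfR's actual tokenisation rules and download file format are not described on these slides, so `word_frequencies` and its `min_length` parameter are hypothetical.

```python
import re
from collections import Counter


def word_frequencies(text, min_length=2):
    """Count lower-cased alphabetic tokens, skipping tokens shorter than min_length."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if len(token) >= min_length)


# Invented sample text in early-modern spelling.
sample = "Thefe experiments in chymistry hath shewn that chymistry is a noble art."
freqs = word_frequencies(sample)
for word, count in freqs.most_common(3):
    print(word, count)
```

The `min_length` filter is one crude way to drop one-letter tokens such as "a"; a real pipeline would more likely use a stop-word list and would have to decide how to treat OCR variants.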

Page 13

Why Word Frequencies

[Chart: data requested from JSTOR users in 2008, by type: OCR data, citation data, usage data, word frequency]

Page 14

What can you do with word counts?

Real life requests:

“I would like to request time and word distribution frequencies in linguistics (specific movement removed). These sorts of frequencies could potentially allow me to better understand and delimit the formation of groups, and the underlying impetus behind these groups as expressed in linguistic form.”

“I would like to create subject headings for material, using word frequency as a guide to selecting the appropriate terms for the headings.”
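One common way to turn word frequencies into candidate heading terms is TF-IDF, which favours words that are frequent in one document but rare across the collection. A minimal sketch, assuming documents are already tokenised; the requester's actual method is not stated, and `top_terms` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def top_terms(docs, k=3):
    """Return the k highest TF-IDF terms for each tokenised document.

    tf  = count of the term in the document / total terms in the document
    idf = log(number of documents / number of documents containing the term)
    """
    term_counts = [Counter(doc) for doc in docs]
    n = len(docs)
    # Document frequency: iterating a Counter yields each distinct term once.
    df = Counter(term for counts in term_counts for term in counts)
    result = []
    for counts in term_counts:
        total = sum(counts.values())
        scores = {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result


# Hypothetical toy documents; "the" appears everywhere, so its idf is 0.
docs = [
    "the soil crop soil farm".split(),
    "the nurse patient patient care".split(),
    "the prime number proof".split(),
]
for terms in top_terms(docs):
    print(terms)
```

Because "the" occurs in every document its score is zero and it never becomes a heading candidate, while repeated, document-specific words such as "soil" and "patient" rank first.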

Page 15

DFR – DEMO!

http://dfr.jstor.org

Page 16

DFR – Front Page

Page 17

Thefe

Page 18

Hath – pre 1900

Page 19

Hath – post 1900

Page 20

Chymistry

Page 21

Download Page

Page 22

Files Downloaded

Page 23

[Chart: use of the word “Chymistry” by year; x-axis: years from 1666 to 2005; y-axis: 0 to 8]
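A chart like the "Chymistry" one can be built from per-article word counts. A minimal sketch, assuming the downloaded data has already been parsed into (publication year, {word: count}) pairs; that structure, the function `term_by_year` and the sample numbers are hypothetical. It returns both the raw count and the count divided by the total words for that year, since raw counts alone can rise simply because more articles were published.

```python
from collections import defaultdict


def term_by_year(articles, term):
    """Map each year to (raw count of term, count / total words that year)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, counts in articles:
        hits[year] += counts.get(term, 0)
        totals[year] += sum(counts.values())
    return {
        year: (hits[year], hits[year] / totals[year])
        for year in sorted(totals)
        if totals[year] > 0  # skip years with no words at all
    }


# Hypothetical per-article word counts, invented for this example.
articles = [
    (1669, {"chymistry": 3, "experiment": 5, "hath": 4}),
    (1669, {"chymistry": 1, "fire": 2}),
    (1783, {"chemistry": 2, "air": 6}),
    (1889, {"chymistry": 1, "chemistry": 9}),
]
series = term_by_year(articles, "chymistry")
for year, (count, share) in series.items():
    print(year, count, f"{share:.3f}")
```

Plotting the first element of each tuple gives a raw-count chart; plotting the second gives a size-adjusted curve. Which one the slide's chart shows is not stated.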

Page 24

Page 25

3 Journals from 1957:
•  Agricultural History
•  American Journal of Nursing
•  The Annals of Mathematics

Page 26

Any questions / feedback?

Please take a look at the site and tell us what you think. Email: [email protected]

Contact details Email: [email protected] Phone: 609-986-2282