Improving Research Efficiency: User and Content Fingerprinting

Kevin Cohn, Chief Operating Officer, @Atypon. Improving Research Efficiency: User and Content Fingerprinting. Academic Publishing in Europe, Berlin, 30 January 2013.

Upload: atypon

Post on 31-May-2015


DESCRIPTION

Academic Publishing in Europe, 30 January 2013 Speaker: Kevin Cohn

TRANSCRIPT

Page 1: Improving Research Efficiency: User and Content Fingerprinting

Kevin Cohn, Chief Operating Officer

@Atypon

Improving Research Efficiency

Academic Publishing in Europe, Berlin, 30 January 2013

User and Content Fingerprinting

About Atypon

• Provider of Software as a Service content delivery for publishers

• Literatum platform used to deliver 15M journal articles and 70,000 eBooks

• 1.5 billion user sessions in 2012

Thesis

• Research efficiency can be greatly improved if publishers tap into their huge volume of data to better connect users to content.


Users don’t want “advanced search...”


...but they do want relevant results.


This is the APE I’m looking for.


Data can drive this behavior.

Observations

• Relevancy is the only order that matters

• More than 50% of clicks are on the first result

• More than 90% of clicks are on the first page

• Filters/facets aren’t used

Objectives

• Give users what they want: a simple, Google-like search interface

• But use proprietary data to calculate relevancy for each individual user

Automatic Topic Modeling

Automatic Topic Modeling (ATM)

• Based on a statistical model called latent Dirichlet allocation (LDA)

• Creates “topics”: collections of words that occur together with great frequency

Topic #1: {mammal, primate, hominoidea}

Topic #2: {academic, publishing, europe}
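To make the idea of LDA topics concrete, here is a minimal sketch of a collapsed Gibbs sampler written with the Python standard library only. It is not the implementation described in this talk. The function name `fit_lda`, the hyperparameters `alpha` and `beta`, the toy documents, and the reading of a document's topic proportions as its "content fingerprint" are all assumptions made for illustration.

```python
import random


def fit_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents (illustrative only)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (document, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Take the token out of the counts before resampling its topic.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed topic proportions per document (assumed "content fingerprint").
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    # The three most frequent words assigned to each topic.
    top_words = []
    for t in range(n_topics):
        ranked = sorted(range(n_words), key=lambda wid: topic_word[t][wid], reverse=True)
        top_words.append([vocab[wid] for wid in ranked[:3]])
    return theta, top_words


# Hypothetical toy corpus: two documents about primates, two about the conference.
docs = [
    "ape mammal primate hominoidea ape primate".split(),
    "mammal primate hominoidea ape mammal".split(),
    "ape academic publishing europe ape publishing".split(),
    "academic publishing europe ape academic".split(),
]
theta, top_words = fit_lda(docs)
for t, words in enumerate(top_words):
    print(f"Topic #{t + 1}:", words)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of `theta` sums to 1, so every document is described as a mixture of topics rather than a single label. On a corpus this small the sampled topics can vary with the seed; a production system would typically rely on an established library and far more data.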


Topic #1


Topic #2


Applications

• My search for “APE” returns results about this conference, not primates

• The same is true for recommendations

• Better related articles (topics 1 and 2 are not related, despite sharing “APE”)
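One way the "APE" example could work is sketched below. The slides do not describe how user fingerprints or relevancy scores are actually computed, so everything here is an assumption: the user fingerprint is taken to be the average topic mixture of articles the user has read, relevancy blends a text-match score with cosine similarity to that fingerprint, and the `text_score` field, the `weight` parameter, and the article titles are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def user_fingerprint(read_articles):
    """Average the topic mixtures of the articles a user has read (assumed definition)."""
    n_topics = len(read_articles[0])
    return [sum(a[t] for a in read_articles) / len(read_articles) for t in range(n_topics)]


def rerank(results, fingerprint, weight=0.5):
    """Order results by a blend of text-match score and topic similarity to the user."""
    scored = [
        ((1 - weight) * r["text_score"] + weight * cosine(r["topics"], fingerprint), r["title"])
        for r in results
    ]
    return [title for _, title in sorted(scored, reverse=True)]


# Topic mixtures are [primates, academic publishing]; all values are made up.
history = [[0.05, 0.95], [0.10, 0.90]]  # this user mostly reads about publishing
results = [
    {"title": "Great ape locomotion", "text_score": 0.9, "topics": [0.95, 0.05]},
    {"title": "APE 2013 conference report", "text_score": 0.8, "topics": [0.05, 0.95]},
]
fingerprint = user_fingerprint(history)
print(rerank(results, fingerprint))              # personalized order
print(rerank(results, fingerprint, weight=0.0))  # text match only
```

With `weight=0.0` the ranking falls back to the text-match score alone, which is a simple way to honor a user who opts out of personalization.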

Not a Taxonomy/Ontology...

• Topics are self-updating = low-cost, low-maintenance

• Flat (not hierarchical) = avoids troublesome questions about classification

• Probabilistic (not binary) = better at expressing relevancy to topics
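The "probabilistic, not binary" point can be illustrated with a small sketch. The three topic mixtures, the 0.5 threshold, and the choice of Hellinger distance as the comparison are assumptions for illustration, not details from the talk.

```python
import math


def binary_labels(mixture, threshold=0.5):
    """Collapse a topic mixture into yes/no category membership."""
    return [1 if p >= threshold else 0 for p in mixture]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


# Hypothetical articles described as [topic 1, topic 2] mixtures.
a = [0.90, 0.10]  # almost entirely topic 1
b = [0.55, 0.45]  # straddles both topics
c = [0.10, 0.90]  # almost entirely topic 2

# Binary membership files b in exactly the same category as a ...
print(binary_labels(a), binary_labels(b), binary_labels(c))
# ... while the mixtures show that b sits between a and c.
print(round(hellinger(a, b), 3), round(hellinger(b, c), 3), round(hellinger(a, c), 3))
```

Under binary labels, article `b` looks identical to `a` and unrelated to `c`; the distances between mixtures keep the information that `b` is partly about both topics.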


...But Is Helped by Them

• Topics are “collections of words that occur together with great frequency”

• Knowing that “APE” is an acronym for “Academic Publishing in Europe”

• Knowing that “CC0” and “CC BY” are Creative Commons license types
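One plausible reading of how a controlled vocabulary helps is that known multi-word terms are merged into single tokens before topic modeling, so that "CC BY" is counted as one term rather than two unrelated words. The sketch below assumes exactly that; the `VOCABULARY` table and the `tokenize` function are hypothetical and not taken from the talk.

```python
import re

# Hypothetical controlled vocabulary: surface form -> canonical token.
VOCABULARY = {
    "academic publishing in europe": "academic_publishing_in_europe",
    "cc by": "cc_by",
    "cc0": "cc0",
}


def tokenize(text, vocabulary=VOCABULARY):
    """Lowercase the text, merge known phrases into single tokens, then split into words."""
    text = text.lower()
    # Replace longer phrases first so that shorter entries cannot split them.
    for phrase in sorted(vocabulary, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", vocabulary[phrase], text)
    return re.findall(r"[a-z0-9_]+", text)


print(tokenize("Licensed CC BY at Academic Publishing in Europe"))
```

The resulting token lists can be passed to a topic model in place of a naive whitespace split. Resolving the bare acronym "APE" to the conference or to the animal still depends on context, which is where the topic mixtures come in.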

Worth Mentioning

• We didn’t invent ATM (or LDA)

• Our implementation started as a collaboration with academic researchers...

• ...and will require considerable experimentation and testing to get right

Privacy

• Usage is not personally identifiable

• Usage is not shared with third parties

• Users can opt out of personalization

Summary

• ATM uses proprietary data to calculate relevancy for each individual user

• Gives users what they want: a simple, Google-like search interface

• Improves research efficiency by freeing up searching time for reading


Thank You


[email protected]

Kevin Cohn, Chief Operating Officer, Atypon