machines are people too

MACHINES ARE PEOPLE TOO Dr. Paul Groth | @pgroth | pgroth.com Disruptive Technology Director Elsevier Labs | @elsevierlabs Theory and Practice of Digital Libraries 2017

Upload: paul-groth

Post on 23-Jan-2018


TRANSCRIPT

Page 1: Machines are people too

MACHINES ARE PEOPLE TOO

Dr. Paul Groth | @pgroth | pgroth.com

Disruptive Technology Director

Elsevier Labs | @elsevierlabs

Theory and Practice of Digital Libraries 2017

Page 2: Machines are people too

THANKS FOR CONVERSATION & SLIDES!

Riffing off of Brad’s Dublin Core 2016 keynote

https://www.slideshare.net/bpa777/dc2016-keynote-20161013-67164305

Page 3: Machines are people too

THE SUCCESS OF DIGITAL LIBRARIES

“Live every day like it's NBER day”

Page 4: Machines are people too

THE SUCCESS OF DIGITAL LIBRARIES

Page 5: Machines are people too

THE SUCCESS OF DIGITAL LIBRARIES

Page 6: Machines are people too

THE SUCCESS OF DIGITAL LIBRARIES

Page 7: Machines are people too

THE SUCCESS OF DIGITAL LIBRARIES

Page 8: Machines are people too

THE NEXT MEDIA: DATA

Page 9: Machines are people too
Page 10: Machines are people too

FAIR EVERYWHERE

Page 11: Machines are people too
Page 12: Machines are people too

RESEARCH DATA MANAGEMENT

Page 13: Machines are people too

DATA SEARCH

Antony Scerri, John Kuriakose, Amit Ajit Deshmane, Mark Stanger, Peter Cotroneo, Rebekah Moore, Raj Naik, Anita de Waard; Elsevier’s approach to the bioCADDIE 2016 Dataset Retrieval Challenge, Database, Volume 2017, 1 January 2017, bax056, https://doi.org/10.1093/database/bax056

Page 14: Machines are people too

THE CENTRALITY OF THE USER

Page 15: Machines are people too

HOW DO RESEARCHERS SEARCH FOR DATA?

Gregory, K., Groth, P., Cousijn, H., Scharnhorst, A., & Wyatt, S. (2017). Searching Data: A Review of Observational Data Retrieval Practices. arXiv preprint arXiv:1707.06937.

Some observations from @gregory_km survey:

1. The needs and behaviours of specific user groups (e.g. early career researchers, policy makers, students) are not well documented.

2. Background uses of observational data are better documented than foreground uses.

3. Reconstructing data tables from journal articles, using general search engines, and making direct data requests are common.

Page 16: Machines are people too

BUT ARE WE MISSING A USER?

Page 17: Machines are people too

WHY MACHINES?

Page 18: Machines are people too

ELSEVIER’S BUSINESS: PROVIDING ANSWERS FOR RESEARCHERS, DOCTORS AND NURSES

My work is moving towards a new field; what should I know?

• Journal articles, reference works, profiles of researchers, funders & institutions

• Recommendations of people to connect with, reading lists, topic pages

How should I treat my patient given her condition & history?

• Journal articles, reference works, medical guidelines, electronic health records

• Treatment plan with alternatives personalized for the patient

How can I master the subject matter of the course I am taking?

• Course syllabus, reference works, course objectives, student history

• Quiz plan based on the student’s history and course objectives

Page 19: Machines are people too

INFORMATION OVERLOAD

Page 20: Machines are people too

WHAT CAN MACHINE INTELLIGENCE DO TODAY?

If there’s a task that a normal person can do with less than one second of thinking, there’s a very good chance we can automate it with deep learning.

Andrew Ng, Chief Scientist, Baidu (lecture at Bay Area Deep Learning School, Stanford, CA, September 24, 2016)

Page 21: Machines are people too

HUMAN SPEECH RECOGNITION

Word error rate was 23% in 2013, and over 35% in 2012.

https://venturebeat.com/2017/05/17/googles-speech-recognition-technology-now-has-a-4-9-word-error-rate/

Page 22: Machines are people too

IMAGE RECOGNITION

https://devblogs.nvidia.com/parallelforall/author/czhang/

Page 23: Machines are people too

THESE RESULTS ARE DRIVEN BY DATA

“The paradigm shift of the ImageNet thinking is that while a lot of people are paying attention to models, let’s pay attention to data, …”

– Prof. Fei-Fei Li [1]

[1] The data that transformed AI research—and possibly the world

https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/

Page 24: Machines are people too

THE GROWTH IN DATA ENGINEERS

https://www.stitchdata.com/resources/reports/the-state-of-data-engineering

Page 25: Machines are people too

BUT DO DIGITAL LIBRARIES HELP MACHINES?

• Machines’ proficiency in learning to answer questions from text, audio, images and video will depend on our ability to train them effectively to read information from the Web

• How machines read the Web today

  • Crawling and indexing Web resources, possibly semantically tagged (e.g. using schema.org)

  • Find-and-follow crawling of open linked data resources for ontology and data sharing and reuse

  • Programmatic access to APIs mediated through HTTP/S and other Internet protocols
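The first reading mode listed above, indexing pages that carry schema.org tagging, can be sketched roughly as follows. This is a minimal illustration using only the Python standard library. The sample page, the `JsonLdExtractor` class and the `extract_jsonld` helper are assumptions made for the example, not part of the talk, and a real crawler would also have to handle fetching, malformed markup and other embedding syntaxes.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.items.append(json.loads("".join(self._buffer)))


def extract_jsonld(html):
    """Return every JSON-LD object embedded in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical landing page for a dataset, invented for this sketch.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example observations", "keywords": ["climate", "temperature"]}
</script>
</head><body><p>Human-readable landing page.</p></body></html>
"""

records = extract_jsonld(SAMPLE_PAGE)
for record in records:
    print(record["@type"], "-", record["name"])
```

The human reader sees only the paragraph in the body, while the machine reader sees the structured description in the head.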

Page 26: Machines are people too

DIGITAL LIBRARIES & LINKED DATA STANDARDS

Page 27: Machines are people too

THE SEMANTIC WEB WAS INTENDED FOR MACHINE READING

… that’s the real idea behind the Semantic Web: letting software use the vast collective genius embedded in its published pages.

Swartz, A. (2013). Aaron Swartz's A programmable Web: An unfinished work. San Rafael, Calif.: Morgan & Claypool Publishers.

Page 28: Machines are people too

BUT THE SEMANTIC WEB IS BUILT FOR PEOPLE, NOT MACHINES

• The Semantic Web is largely a logicist take on the way knowledge is to be represented

• The latest advances in machine intelligence are based on a connectionist approach to knowledge representation

• There is a gap between how knowledge is represented in the Semantic Web and what deep learning is exploiting to such good effect

• The Semantic Web is silent about how machines can become better readers, and hence better partners in the second machine age

• How will we evolve metadata standards to better accommodate machines?

Page 29: Machines are people too

MACHINE READING IS ENABLED BY MACHINE LEARNING

[Diagram contrasting Programming (labels: input, algorithm, output; CPU) with Machine learning (labels: input, output, data, learning architecture, model; GPU, CPU)]
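The contrast on this slide can be illustrated with a small sketch: in programming a person writes the algorithm, while in machine learning example inputs and outputs are used to fit a model. The temperature-conversion task, the function names and the choice of ordinary least squares as the "learning architecture" are all assumptions made for this example.

```python
def fahrenheit_programmed(celsius):
    # Programming: a person writes the algorithm by hand.
    return celsius * 9.0 / 5.0 + 32.0


def fit_line(inputs, outputs):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    sxx = sum((x - mean_x) ** 2 for x in inputs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Machine learning: example inputs and outputs go in, a model comes out.
celsius_examples = [-40.0, 0.0, 37.0, 100.0]
fahrenheit_examples = [-40.0, 32.0, 98.6, 212.0]
slope, intercept = fit_line(celsius_examples, fahrenheit_examples)


def fahrenheit_learned(celsius):
    # The "model" is just the two fitted numbers.
    return slope * celsius + intercept


print(fahrenheit_programmed(20.0), fahrenheit_learned(20.0))
```

In the learned version the quality of the result depends entirely on the example data, which is the point the following slides build on.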

Page 30: Machines are people too

MACHINES SEE THINGS DIFFERENTLY THAN PEOPLE

From: Alain, G. and Bengio, Y. (2016). Understanding intermediate layers using linear classifier probes. arXiv:1610.01644v1.

Page 31: Machines are people too

MACHINES LEARN THINGS DIFFERENTLY THAN PEOPLE

Page 32: Machines are people too

VOCABULARIES ARE SETS OF VECTOR EMBEDDINGS

From: Eisner, B., Rocktäschel, T., Augenstein, I., Bošnjak, M. and Riedel, S. (2016). Emoji2vec: learning emoji representations from their description. arXiv:1609.08359v1.
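As a rough sketch of what a vocabulary looks like to a machine, each term can be held as a vector and relatedness computed as cosine similarity. The three-dimensional vectors and the terms below are invented for illustration; learned embeddings such as those in the cited paper have many more dimensions and come from training.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for this sketch.
EMBEDDINGS = {
    "library": [0.9, 0.1, 0.3],
    "archive": [0.8, 0.2, 0.4],
    "volcano": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(term):
    """Return the other vocabulary term whose vector is most similar."""
    others = (t for t in EMBEDDINGS if t != term)
    return max(
        others,
        key=lambda t: cosine_similarity(EMBEDDINGS[term], EMBEDDINGS[t]),
    )


print(nearest("library"))
```

Unlike a thesaurus entry, nothing here states why two terms are related; the relationship exists only as geometry.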

Page 33: Machines are people too

TRAINING DATASETS ARE GROWING IN VOLUME AND COVERAGE

From: Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, P., Toderici, G., Varadarajan, B. and Vijayanarasimhan, S. YouTube-8M: a large-scale video classification benchmark. arXiv:1609.08675.

Page 34: Machines are people too

MODELS ARE BECOMING REUSABLE DATA RESOURCES

Check out: sujitpal.blogspot.com for more

Page 35: Machines are people too

MACHINE LEARNING DATASETS AND MODELS ARE BECOMING PART OF THE WEB

• Machines need lots and lots of data to learn how to read

• Datasets with ad-hoc formats are being made openly available

  • Open Images “~9 million URLs to images that have been annotated with labels spanning over 6000 categories” (The Open Images Dataset. (n.d.). Retrieved September 29, 2016, from https://github.com/openimages/dataset.)

  • YouTube-8M: “8 million YouTube video URLs (representing over 500,000 hours of video), along with video-level labels from a diverse set of 4800 Knowledge Graph entities” (Vijayanarasimhan S. and Natsev, P. (2016). Announcing YouTube-8M: A Large and Diverse Labeled Video Dataset for Video Understanding Research. Retrieved September 29, 2016, https://research.googleblog.com/2016/09/announcing-youtube-8m-large-and-diverse.html.)

  • Stanford Natural Language Inference: “570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral, supporting the task of natural language inference” (The Stanford Natural Language Inference (SNLI) Corpus. (n.d.). Retrieved September 29, 2016, from http://nlp.stanford.edu/projects/snli/.)

• Standard architectures for machine (deep) learning are being released as open source

  • Dense neural networks for classification

  • Convolutional neural networks for image, audio and video recognition

  • Recurrent neural networks for sequence processing and generation

• Advances in the field are being published quickly and transferred to industrial application just as quickly

Page 36: Machines are people too

THE OPPORTUNITY FOR LIBRARIANS AND PUBLISHERS

As machines become increasingly capable of general-purpose language understanding, the burden of effort in building machine intelligences will shift from software engineering to the acquisition, organization and curation of training content and data.

Page 37: Machines are people too

THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER

SAVE THE TIME OF THE MACHINE READER

Perhaps this law is not so self-evident as the others. None the less, it has been responsible for many reforms in library administration and has a great potentiality for effecting many more reforms in the future.

Ranganathan, S.R. (1931). The five laws of library science. Madras: The Madras Library Association.

Page 38: Machines are people too

IMAGE SOURCE: HTTP://WESTPORTLIBRARY.ORG/ABOUT/NEWS/ROBOTS-ARRIVE-WESTPORT-LIBRARY

WHAT DOES IT LOOK LIKE TO HAVE MACHINES AS LIBRARY PATRONS?

Tasks

1. Dataset / Model / Vocabulary Curation

2. Combating Bias

3. Explanation

4. Interoperability

5. Data Narratives

Page 39: Machines are people too

DATASET CURATION

Page 40: Machines are people too

MODEL CURATION

Page 41: Machines are people too

VOCABULARY CURATION

Page 42: Machines are people too

BATTLING BIAS

Page 43: Machines are people too

BATTLING BIAS: ALGORITHMIC LITERACY

“Algorithms all have their own ideologies. As computational methods and data science become more and more a part of every aspect of our lives, it is essential that work begin to ensure there is a broader literacy about these techniques and that there is an expansive and deep engagement in the ethical issues surrounding them.”

– Trevor Owens (Library of Congress / Former IMLS)

http://www.pewinternet.org/2017/02/08/theme-7-the-need-grows-for-algorithmic-literacy-transparency-and-oversight/

Page 44: Machines are people too

THE RIGHT TO AN EXPLANATION

“The data subject shall have the right to obtain … the existence of automated decision-making, including profiling … meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject.”

EU General Data Protection Regulation, Chapter 3, Article 15

Page 45: Machines are people too

PROVENANCE FOR EXPLANATION

Credits: Curt Tilmes, Peter Fox

Tilmes, C.; Fox, P.; Ma, X.; McGuinness, D.L.; Privette, A.P.; Smith, A.; Waple, A.; Zednik, S.; Zheng, J.G., "Provenance Representation for the National Climate Assessment in the Global Change Information System," Geoscience and Remote Sensing, IEEE Transactions on, vol. 51, no. 11, pp. 5160-5168, Nov. 2013
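One way to picture provenance serving explanation is a set of records that say which activity generated each item and what that activity used, which can then be walked back to the original sources. The sketch below is loosely inspired by the generation and usage relations in W3C PROV, but the record layout, the entity names and the `explain` function are hypothetical and are not the representation used in the cited system.

```python
# Each record maps an entity to (the activity that generated it,
# the entities that activity used). All names are invented.
PROVENANCE = {
    "figure": ("plotting", ["processed_dataset"]),
    "processed_dataset": ("cleaning", ["raw_observations"]),
    "raw_observations": (None, []),
}


def explain(entity, records):
    """Return lines tracing an entity back to its sources, depth first."""
    lines = []
    stack = [(entity, 0)]
    seen = set()
    while stack:
        current, depth = stack.pop()
        if current in seen:
            continue  # guard against cycles and repeated sources
        seen.add(current)
        activity, used = records.get(current, (None, []))
        indent = "  " * depth
        if activity is None:
            lines.append(f"{indent}{current}: original source")
        else:
            sources = ", ".join(used)
            lines.append(
                f"{indent}{current}: generated by {activity}, which used {sources}"
            )
        # Push in reverse so sources are visited in their listed order.
        for source in reversed(used):
            stack.append((source, depth + 1))
    return lines


for line in explain("figure", PROVENANCE):
    print(line)
```

Each printed line answers "where did this come from?" for one step, which is the kind of trace an explanation can be built on.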

Page 46: Machines are people too

NATIONAL CLIMATE CHANGE ASSESSMENT

PROVENANCE

Page 47: Machines are people too

INTEROPERABILITY

Page 48: Machines are people too

DATA NARRATIVE GENERATION

Towards Automating Data Narratives. Gil, Y.; and Garijo, D. In Proceedings of the Twenty-Second ACM International Conference on Intelligent User Interfaces (IUI-17), Limassol, Cyprus, 2017.

Page 49: Machines are people too

THE CHALLENGE: DIGITAL LIBRARIES FOR MACHINES

• Digital Libraries have made tremendous strides in making media available

• The investment in Linked Data and APIs has made integration and building applications easier and can help machine reader use cases

• But a new user needs new support:

  • new forms of media (models, data)

  • new vocabulary representations

  • new forms of transparency

  • new ways to interoperate

  • new mechanisms to communicate

  • ….

Page 50: Machines are people too

THANK YOU

Dr. Paul Groth | @pgroth | pgroth.com

labs.elsevier.com