nypl labs 9-10-13 hackshackers presentation

Post on 09-Feb-2016

DESCRIPTION

Slides from NYPL Labs talk to Hacks Hackers NYC on September 10, 2013.

TRANSCRIPT

The Great Data Migration

or... hackin’ the library with nypl labs

9/10/13

What is NYPL Labs?

Ben Vershbow | Founder & Manager - NYPL Labs | benjaminvershbow@nypl.org | @subsublibrary

@nypl_labs | #HacksHackers | Ben Vershbow | benjaminvershbow@nypl.org | @subsublibrary

New York Public Library

existing patron base + global community of users

free for all to use + hack / build / improve

books, archives, images, documents, A/V etc. + digital material, data & APIs

Map Warper

maps.nypl.org

What’s on the Menu?

menus.nypl.org

Stereogranimator

stereo.nypl.org

NYPL ↑ | NYPL ↑ | BPL ↑

+

Direct Me NYC: 1940

directme.nypl.org

Genealogy community

NYPL collections

U.S. Geological Survey

OpenStreetMap (via MapBox)

New York Times API

NYPL users & staff

Text

api.repo.nypl.org

Crowd-sourcing the transcription of historical theater programs

Ensemble

ensemble.nypl.org

Paul Beaudoin | paulbeaudoin@nypl.org | @nonword

@nypl_labs | #HacksHackers | Ensemble | Paul Beaudoin | paulbeaudoin@nypl.org | @nonword

FromThePage / Transcribe Bentham

Scripto

T-PEN

Freeform text transcription is not complex entity extraction

Crowd-sourcing complex entity extraction of documents with inconsistent layouts

e.g. historical theater programs

Ensemble

NYPL Labs | What’s on the Menu?

Transcribable & DocumentCloud

Zooniverse | Notes From Nature

Zooniverse | Old Weather

NYPL Labs | Ensemble

http://ensemble.nypl.org

Built from Scribe: https://github.com/zooniverse/scribe

demo

Archives & Manuscripts

archives.nypl.org

Trevor Thornton | trevorthornton@nypl.org | @trevorthornton

Matt Miller | matthewmiller@nypl.org | @thisismmiller

or: where to find Timothy Leary’s Powerglove

Unique, unpublished materials: correspondence, personal papers, organizational records, literary manuscripts, AV documentation, electronic records

Typically included within discrete collections, which are often acquired in whole

Finding aids provide researchers with guidance on collection contents

EAD (Encoded Archival Description): XML schema for encoding finding aids

NYPL Archives & Manuscripts
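
To make the idea of an EAD-encoded finding aid concrete, here is a minimal Python sketch that walks a tiny, made-up finding aid and prints its hierarchy. The element names (archdesc, did, unittitle, dsc, c01, c02, container) come from the EAD tag set; the collection, its titles, and the `outline` helper are hypothetical, and this is not how archives.nypl.org processes its files.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed finding aid. Real EAD files carry a header,
# often a namespace, and many more descriptive elements.
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Example Family Papers</unittitle>
      <unitdate>1900-1950</unitdate>
    </did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did>
            <unittitle>Letters to the editor</unittitle>
            <container type="box">1</container>
          </did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def is_component(tag):
    """True for EAD component tags: unnumbered <c> or numbered <c01>, <c02>, ..."""
    return tag == "c" or (tag.startswith("c") and tag[1:].isdigit())


def outline(ead_xml):
    """Return (depth, level, title) tuples for the collection and its components."""
    root = ET.fromstring(ead_xml)
    rows = []

    def walk(node, depth):
        title = node.findtext("did/unittitle", default="(untitled)")
        rows.append((depth, node.get("level"), title))
        for child in node:
            if child.tag == "dsc":
                # The collection-level description keeps its components in <dsc>.
                for comp in child:
                    if is_component(comp.tag):
                        walk(comp, depth + 1)
            elif is_component(child.tag):
                walk(child, depth + 1)

    walk(root.find("archdesc"), 0)
    return rows


for depth, level, title in outline(SAMPLE_EAD):
    print("  " * depth + f"{title} [{level}]")
```

The nesting of series, files, and containers is what a finding aid exposes to researchers; once it is parsed into rows like these, it can be indexed or rendered in ways other than one long page.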

@nypl_labs | #HacksHackers | Archives & Manuscripts | Trevor Thornton | trevorthornton@nypl.org | @trevorthornton | Matt Miller | matthewmiller@nypl.org | @thisismmiller

The traditional model for presenting EAD-encoded finding aids

What we did (more or less)

System overview

Video Annotation & Synchronization

digitalcollections.nypl.org/tools/video/compose

For NYPL Digital Collections

Jerome Robbins Dance Division

Brian Foo | brianfoo@nypl.org | @beefoo

Scenarios

Jerome Robbins Dance Division

Enhance & Improve video data
• e.g. Sync multiple angles of the same performance
• e.g. Annotate a performance

Discovery
• e.g. Compare multiple performances

Instruction
• e.g. Enhance lecture with multimedia

Probably many more
• e.g. Mash-ups

@nypl_labs | #HacksHackers | Video Annotation & Synchronization | Brian Foo | brianfoo@nypl.org | @beefoo

Technology Used

Ruby on Rails (RoR) - Backend Framework

Backbone.js - JavaScript MVC Framework

Brightcove - Video delivery platform

Popcorn.js - HTML5 media framework by Mozilla
• Does not natively support multi-video
• Does not natively support Brightcove
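
Because multi-video is not supported out of the box, syncing several angles of one performance needs some notion of a shared timeline. The Python sketch below shows one way to think about that: each angle stores an offset against a master clock, and a master position is translated into a per-video position. The `Angle` class, the offsets, and the `local_times` helper are hypothetical illustrations, not the tool's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Angle:
    name: str
    offset: float    # seconds into this video at which the shared timeline starts
    duration: float  # length of this video in seconds


def local_times(angles, master_time):
    """Map a shared-timeline position to a playback position for each angle.

    An angle with no footage at that moment maps to None.
    """
    positions = {}
    for angle in angles:
        t = master_time + angle.offset
        positions[angle.name] = t if 0 <= t <= angle.duration else None
    return positions


angles = [
    Angle("front", offset=0.0, duration=120.0),
    Angle("side", offset=4.5, duration=90.0),  # side camera started 4.5 s earlier
]
print(local_times(angles, 10.0))   # {'front': 10.0, 'side': 14.5}
print(local_times(angles, 100.0))  # the side angle has run out of footage
```

A player built on this idea would seek every video to its local time whenever the master position changes.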

demo

teh vectorizor

github.com/NYPL/map-vectorizer

mauricio giraldo arteaga

NYPL Labs

mauriciogiraldo@nypl.org | @mgiraldo

background

@nypl_labs | #HacksHackers | teh vectorizor | Mauricio Giraldo Arteaga | mauriciogiraldo@nypl.org | @mgiraldo

building =

not paper

not black

> 20 m² (~215 ft²)

< 3,000 m² (~32,000 ft²)

+ attributes (color, dots, crosses...)
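
Read as a filter, the definition above could look like the following Python sketch. Only the two area bounds come from the slides; the `Polygon` class, the paper colour, and the colour thresholds are assumptions made up for illustration and are not taken from the map-vectorizer code.

```python
from dataclasses import dataclass, field

# Assumed values for illustration only; the talk gives no colour thresholds.
PAPER_RGB = (222, 210, 185)  # hypothetical average colour of bare map paper
COLOR_TOLERANCE = 40         # hypothetical per-channel distance that counts as paper
BLACK_MAX = 60               # hypothetical: all channels below this means black
MIN_AREA_M2 = 20             # from the slides
MAX_AREA_M2 = 3000           # from the slides


@dataclass
class Polygon:
    area_m2: float
    mean_rgb: tuple
    attributes: dict = field(default_factory=dict)  # e.g. color, dots, crosses


def is_paper(rgb):
    return all(abs(c - p) <= COLOR_TOLERANCE for c, p in zip(rgb, PAPER_RGB))


def is_black(rgb):
    return all(c < BLACK_MAX for c in rgb)


def is_building(poly):
    """Apply the rules: not paper, not black, area within the two bounds."""
    return (
        not is_paper(poly.mean_rgb)
        and not is_black(poly.mean_rgb)
        and MIN_AREA_M2 < poly.area_m2 < MAX_AREA_M2
    )


candidates = [
    Polygon(150.0, (214, 120, 110), {"color": "pink"}),  # plausible building
    Polygon(150.0, (225, 212, 190)),                     # paper-coloured
    Polygon(5.0, (214, 120, 110)),                       # too small
    Polygon(9000.0, (120, 160, 110)),                    # too large
    Polygon(150.0, (20, 20, 20)),                        # black ink
]
print([is_building(p) for p in candidates])  # [True, False, False, False, False]
```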

process

https://github.com/NYPL/map-vectorizer

test it! (please)

gdal_polygonize.py

generates polygons automagically!

$ gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
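
For batch runs over many map sheets, the same command can be driven from a script. The sketch below is a hypothetical Python wrapper: `polygonize_command` and `polygonize` are names invented here, and the wrapper only runs anything if GDAL's `gdal_polygonize.py` script is found on the PATH.

```python
import shlex
import shutil
import subprocess


def polygonize_command(raster, out_file, layer, driver="ESRI Shapefile"):
    """Build the same argument list as the command shown above."""
    return ["gdal_polygonize.py", raster, "-f", driver, out_file, layer]


def polygonize(raster, out_file, layer):
    """Run gdal_polygonize.py when it is installed; return True if it ran."""
    cmd = polygonize_command(raster, out_file, layer)
    if shutil.which(cmd[0]) is None:
        print("gdal_polygonize.py not found on PATH; skipping")
        return False
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return True


# Show the shell-quoted command without running it.
print(shlex.join(polygonize_command("test.tif", "test.shp", "test")))
```

Passing the arguments as a list avoids shell quoting problems with the driver name, which contains a space.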

we need to optimize the input

we need to simplify the output

(for those polygons that we care about)

library(sp)         # spsample()
library(alphahull)  # ashape()

# sample points inside the polygon, varying the sampling type and count
pts = spsample(polygon, n=1000, type="hexagonal")
pts = spsample(polygon, n=1000, type="regular")
pts = spsample(polygon, n=1000, type="random")
pts = spsample(polygon, n=500, type="hexagonal")

# alpha shape of the sampled points
x.as = ashape(pts@coords, alpha=2.0)
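
To show what sampling points inside a polygon means, here is a rough pure-Python analog of the "regular" case: it lays a grid over the polygon's bounding box and keeps the grid points that fall inside. It is not the sp implementation; it takes a grid spacing rather than a target point count, and `point_in_polygon` and `sample_regular` are hypothetical helper names. The alpha-shape step is not reproduced here.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon given as a vertex list?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count the edges crossed by a horizontal ray from (x, y) to the right.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_regular(ring, step):
    """Return grid points, spaced `step` apart, that fall inside the polygon."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    points = []
    y = min(ys) + step / 2
    while y < max(ys):
        x = min(xs) + step / 2
        while x < max(xs):
            if point_in_polygon(x, y, ring):
                points.append((x, y))
            x += step
        y += step
    return points


# An L-shaped footprint: a 4x4 square with the top-right 2x2 corner removed.
footprint = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
pts = sample_regular(footprint, step=1.0)
print(len(pts))  # 12 cell centres: 16 in the square minus 4 in the notch
```

The sampled points stand in for the jagged pixel-edge polygon, so a hull fitted to them gives a simpler outline than the original trace.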

we need to validate the output

(polygonzo!)

demo

Old NYC

Dan Vanderkam

SOME OTHER COMPANY

danvdk@gmail.com | @danvdk

~40,000 images

Mostly taken from 1920–1950

Many were taken by Percy Loomis Sperr, who was commissioned by the library to take photographs of buildings soon to be demolished

Milstein Collection

@nypl_labs | #HacksHackers | Old NYC | Dan Vanderkam | danvdk@gmail.com | @danvdk

demo

Images on the NYPL site were small, pictures even smaller.

What’s MrSID?

Challenges

First find the areas that aren’t brown:

Then find the rectangles:
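
The two steps can be sketched in plain Python on a toy image: build a mask of pixels that are not brown, then take the bounding box of each connected region in that mask. The brown test, its thresholds, and the function names are assumptions for illustration; the approach actually used for Old NYC is described in the blog post linked under "What’s left?" and may differ.

```python
from collections import deque


def is_brown(rgb):
    """Hypothetical test for the brown backing card; thresholds are made up."""
    r, g, b = rgb
    return r > g > b and 20 < r - b < 120 and r < 220


def find_rectangles(image):
    """Return bounding boxes (top, left, bottom, right) of non-brown regions.

    `image` is a list of rows of (r, g, b) tuples. Regions are found with a
    4-connected flood fill over the non-brown mask.
    """
    height, width = len(image), len(image[0])
    mask = [[not is_brown(px) for px in row] for row in image]
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r0 in range(height):
        for c0 in range(width):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            top, left, bottom, right = r0, c0, r0, c0
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    in_bounds = 0 <= nr < height and 0 <= nc < width
                    if in_bounds and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append((top, left, bottom, right))
    return boxes


BROWN, GREY = (150, 110, 70), (128, 128, 128)
# A 5x7 toy scan: a brown card with two grey photographs mounted on it.
scan = [[BROWN] * 7 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 3):
        scan[r][c] = GREY  # photo 1: rows 1-3, columns 1-2
    for c in range(4, 6):
        scan[r][c] = GREY  # photo 2: rows 1-3, columns 4-5
print(find_rectangles(scan))  # [(1, 1, 3, 2), (1, 4, 3, 5)]
```

Real scans are noisy, so a practical version would also need to discard tiny regions and tolerate photographs that are themselves brownish.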

UI work

Better geocoding for boroughs with complicated streets

Keep your eyes out for an Old NYC launch this fall!

http://www.danvk.org/wp/2013-02-09/finding-pictures-in-pictures/

What’s left?

what’s next

(hint: you)

Dave Riordan | davidriordan@nypl.org | @riordan

this thing we’re doing is way too big to do alone

@nypl_labs | #HacksHackers | What’s Next | Dave Riordan | davidriordan@nypl.org | @riordan

this used to be a reservoir of water, now it’s a reservoir of knowledge

–an anonymous nypl docent

now it’s a reservoir of data

now it’s time to use it

datasets

Maps (GIS + GeoTIFFs) | Digital Collections API | Menus API | City Directories | Archives | Ensemble API

there will be more

Hackathons

Publishing Hackathon | Maphack

there will be more

NYPL Tech Challenges (coming soon)

like the x-prize but for way lower stakes and civic good

Questions for you:

what kind of things would you want to work on with nypl labs?

making ebooks easier to borrow?

opening up historical social networks?

we want to know what questions you’re interested in

how you want to use the library today...

...will be how everyone will use the library very soon.

help us make that happen

it’s gonna be awesome

@nypl_labs | @subsublibrary | @nonword | @beefoo | @trevorthornton | @thisismattmiller | @mgiraldo | @riordan

Special thanks to: Chrys Wu
