nypl labs 9-10-13 hackshackers presentation
Post on 09-Feb-2016
The Great Data Migration
or... hackin’ the library with nypl labs
9/10/13
What is NYPL Labs?
Ben Vershbow | Founder & Manager - NYPL Labs | benjaminvershbow@nypl.org | @subsublibrary
@nypl_labs | #HacksHackers | Ben Vershbow | benjaminvershbow@nypl.org | @subsublibrary
New York Public Library
existing patron base + global community of users
free for all to use + hack / build / improve
books, archives, images, documents, A/V etc. + digital material, data & APIs
Map Warper
maps.nypl.org
What’s on the Menu?
menus.nypl.org
Stereogranimator
stereo.nypl.org
NYPL ↑   NYPL ↑   BPL ↑
+
Direct Me NYC: 1940
directme.nypl.org
NYPL collections
Genealogy community
U.S. Geological Survey
OpenStreetMap (via MapBox)
New York Times API
NYPL users & staff
Text
api.repo.nypl.org
Crowd-sourcing the transcription of historical theater programs
Ensemble
ensemble.nypl.org
Paul Beaudoin | paulbeaudoin@nypl.org | @nonword
@nypl_labs | #HacksHackers | Ensemble | Paul Beaudoin | paulbeaudoin@nypl.org | @nonword
FromThePage / Transcribe Bentham
Scripto
T-PEN
Freeform text transcription is not complex entity extraction
Crowd sourcing complex entity extraction of documents with inconsistent layouts
e.g. historical theater programs
Ensemble
NYPL Labs | What’s on the Menu?
Transcribable & DocumentCloud
Zooniverse | Notes From Nature
Zooniverse | Old Weather
NYPL Labs | Ensemble
http://ensemble.nypl.org
Built from Scribe
https://github.com/zooniverse/scribe
demo
FromThePage beta.fromthepage.com
T-PEN t-pen.org
What’s on the Menu? menus.nypl.org
Transcribable github.com/propublica/transcribable
Notes from Nature notesfromnature.org
Old Weather oldweather.org
Ensemble ensemble.nypl.org
Archives & Manuscripts
archives.nypl.org
Trevor Thornton | trevorthornton@nypl.org | @trevorthornton
Matt Miller | matthewmiller@nypl.org | @thisismmiller
or: where to find Timothy Leary’s Powerglove
Unique, unpublished materials: correspondence, personal papers, organizational records, literary manuscripts, AV documentation, electronic records
Typically included within discrete collections, which are often acquired in whole
Finding aids provide researchers with guidance on collection contents
EAD (Encoded Archival Description): XML schema for encoding finding aids
NYPL Archives & Manuscripts
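As a rough illustration of what an EAD-encoded finding aid looks like to a program, here is a minimal Python sketch that reads a tiny, made-up EAD-style fragment and lists its hierarchy. The sample XML, the collection title, and the helper names `outline` and `read_finding_aid` are assumptions for illustration only; real finding aids are much larger, are usually namespaced, and this is not the code behind archives.nypl.org.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EAD-style fragment (not a real NYPL finding aid).
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example family papers</unittitle></did>
    <dsc>
      <c level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c level="file">
          <did><unittitle>Letters, 1920-1925</unittitle></did>
        </c>
      </c>
      <c level="series">
        <did><unittitle>Photographs</unittitle></did>
      </c>
    </dsc>
  </archdesc>
</ead>
"""


def outline(node, depth):
    """Return (depth, level, title) rows for a component and its children."""
    rows = [(depth, node.get("level"), node.findtext("did/unittitle"))]
    for child in node.findall("c"):
        rows.extend(outline(child, depth + 1))
    return rows


def read_finding_aid(xml_text):
    """Flatten the collection description and its nested components."""
    root = ET.fromstring(xml_text)
    archdesc = root.find("archdesc")
    rows = [(0, archdesc.get("level"), archdesc.findtext("did/unittitle"))]
    for component in archdesc.findall("dsc/c"):
        rows.extend(outline(component, 1))
    return rows


for depth, level, title in read_finding_aid(SAMPLE_EAD):
    print("  " * depth + f"[{level}] {title}")
```

Flattening the nested components into rows like this is one way to get from a single long XML document to something a database-backed site could index and link to.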
@nypl_labs | #HacksHackers | Archives & Manuscripts | Trevor Thornton | trevorthornton@nypl.org | @trevorthornton | Matt Miller | matthewmiller@nypl.org | @thisismmiller
The traditional model for presenting EAD-encoded finding aids
What we did (more or less)
System overview
Video Annotation & Synchronization
digitalcollections.nypl.org/tools/video/compose
For NYPL Digital Collections
Jerome Robbins Dance Division
Brian Foo | brianfoo@nypl.org | @beefoo
Scenarios
Jerome Robbins Dance Division
Enhance & Improve video data
• e.g. Sync multiple angles of the same performance
• e.g. Annotate a performance
Discovery
• e.g. Compare multiple performances
Instruction
• e.g. Enhance lecture with multimedia
Probably many more
• e.g. Mash-ups
@nypl_labs | #HacksHackers | Video Annotation & Synchronization | Brian Foo | brianfoo@nypl.org | @beefoo
Technology Used
Ruby on Rails (RoR) - Backend Framework
Backbone.js - JavaScript MVC Framework
Brightcove - Video delivery platform
Popcorn.js - HTML5 media framework by Mozilla
• Does not natively support multi-video
• Does not natively support Brightcove
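Syncing several angles of one performance comes down to mapping a single shared timeline onto each clip's own playback time. The sketch below shows that idea in Python (the tool itself is built in JavaScript). The `Clip` type, its `offset` field, and the sample values are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str
    offset: float    # seconds on the shared timeline at which this clip starts
    duration: float  # clip length in seconds


def local_times(clips, master_time):
    """Map a shared-timeline position to each clip's local playback time.

    Clips that have not started yet, or have already ended, map to None.
    """
    result = {}
    for clip in clips:
        t = master_time - clip.offset
        result[clip.name] = t if 0 <= t <= clip.duration else None
    return result


# Two hypothetical camera angles of the same performance.
angles = [
    Clip("front", offset=0.0, duration=120.0),
    Clip("side", offset=4.5, duration=110.0),
]
print(local_times(angles, 10.0))
```

A player built this way would seek every active clip to its local time whenever the shared timeline moves, and hide or pause the clips that map to `None`.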
demo
teh vectorizor
github.com/NYPL/map-vectorizer
mauricio giraldo arteaga
NYPL Labs
mauriciogiraldo@nypl.org | @mgiraldo
background
@nypl_labs | #HacksHackers | teh vectorizor | Mauricio Giraldo Arteaga | mauriciogiraldo@nypl.org | @mgiraldo
building =
not paper
not black
> 20 m² (~180 ft²)
< 3,000 m² (~27,000 ft²)
+ attributes (color, dots, crosses...)
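The "building =" criteria can be read as a filter over candidate polygons. Here is a minimal Python sketch of such a filter, assuming each polygon already carries a color label and vertex coordinates in meters. The dictionary layout, the color labels, and the function names are hypothetical; this is not the code in the map-vectorizer repository.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula (units squared)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Area thresholds from the slide, in square meters.
MIN_AREA_M2 = 20
MAX_AREA_M2 = 3000

# Hypothetical color labels; a real map sheet would need its own color sampling.
NOT_BUILDING_COLORS = {"paper", "black"}


def looks_like_building(polygon):
    """polygon: dict with 'color' (label) and 'points' (vertices in meters)."""
    if polygon["color"] in NOT_BUILDING_COLORS:
        return False
    area = polygon_area(polygon["points"])
    return MIN_AREA_M2 < area < MAX_AREA_M2


candidates = [
    {"color": "pink", "points": [(0, 0), (10, 0), (10, 10), (0, 10)]},   # 100 m2
    {"color": "paper", "points": [(0, 0), (10, 0), (10, 10), (0, 10)]},  # background
    {"color": "pink", "points": [(0, 0), (2, 0), (2, 2), (0, 2)]},       # 4 m2, too small
]
buildings = [p for p in candidates if looks_like_building(p)]
print(len(buildings))
```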
process
https://github.com/NYPL/map-vectorizer
test it! (please)
gdal_polygonize.py generates polygons automagically!
$ gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
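For batch work, the same command can be driven from a script. This is a minimal Python sketch that builds the argument list shown on the slide and runs it only when the GDAL script and the input file are actually present. The helper name `polygonize_command` is an assumption for illustration, and the file names are the slide's own placeholders.

```python
import os
import shutil
import subprocess


def polygonize_command(raster, out_file, layer, ogr_format="ESRI Shapefile"):
    """Build the argument list for the gdal_polygonize.py call on the slide."""
    return ["gdal_polygonize.py", raster, "-f", ogr_format, out_file, layer]


cmd = polygonize_command("test.tif", "test.shp", "test")
print(" ".join(cmd))

# Only run the tool if GDAL's script is on PATH and the input raster exists.
if shutil.which(cmd[0]) and os.path.exists("test.tif"):
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the space in "ESRI Shapefile".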
we need to optimize the input
we need to simplify the output
(for those polygons that we care about)
library(sp)         # provides spsample()
library(alphahull)  # provides ashape()
pts = spsample(polygon, n=1000, type="hexagonal")
pts = spsample(polygon, n=1000, type="regular")
pts = spsample(polygon, n=1000, type="random")
pts = spsample(polygon, n=500, type="hexagonal")
x.as = ashape(pts@coords, alpha=2.0)
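To make the sampling step more concrete, here is a small Python sketch that lays a regular grid over a polygon and keeps the grid points that fall inside it. It is only loosely analogous to `spsample(..., type="regular")` in R: the `spacing` parameter, the function names, and the L-shaped footprint are assumptions for illustration, and the alpha-shape step (`ashape`) is not reproduced here.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def regular_sample(polygon, spacing):
    """Return grid points, `spacing` apart, that fall inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    y = min(ys) + spacing / 2
    while y < max(ys):
        x = min(xs) + spacing / 2
        while x < max(xs):
            if point_in_polygon(x, y, polygon):
                points.append((x, y))
            x += spacing
        y += spacing
    return points


# An L-shaped footprint with hypothetical coordinates.
footprint = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
pts = regular_sample(footprint, spacing=1.0)
print(len(pts))
```

Fewer, evenly spread points give a coarser outline once a hull is fitted around them, which is the trade-off the different `n` and `type` values above explore.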
we need to validate the output
(polygonzo!)
demo
Old NYC
Dan Vanderkam
SOME OTHER COMPANY | danvdk@gmail.com | @danvdk
~40,000 images
Mostly taken from 1920–1950
Many were taken by Percy Loomis Sperr, who was commissioned by the library to take photographs of buildings soon to be demolished
Milstein Collection
@nypl_labs | #HacksHackers | Old NYC | Dan Vanderkam | danvdk@gmail.com | @danvdk
demo
Images on the NYPL site were small, pictures even smaller.
What’s MrSID?
Challenges
First find the areas that aren’t brown:
Then find the rectangles:
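The two steps (mask out the brown, then find rectangles) can be sketched in plain Python on a tiny grid of RGB pixels. This is a simplified stand-in that uses a color threshold and flood fill; it is not the method used for Old NYC. The `BROWN` reference color, the `TOLERANCE` value, and the sample image are hypothetical.

```python
from collections import deque

BROWN = (150, 110, 70)  # hypothetical background color
TOLERANCE = 60          # hypothetical per-channel tolerance


def is_brown(pixel):
    """True if every channel is within TOLERANCE of the reference brown."""
    return all(abs(c - b) <= TOLERANCE for c, b in zip(pixel, BROWN))


def find_rectangles(image):
    """image: rows of (r, g, b) tuples.

    Returns bounding boxes (left, top, right, bottom) of connected
    non-brown regions, in scan order.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or is_brown(image[y][x]):
                continue
            # Flood-fill one non-brown region, tracking its extent.
            left = right = x
            top = bottom = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx]
                            and not is_brown(image[ny][nx])):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


B = BROWN
G = (200, 200, 200)  # grayish "photo" pixel
card = [
    [B, B, B, B, B, B],
    [B, G, G, B, B, B],
    [B, G, G, B, G, B],
    [B, B, B, B, G, B],
]
print(find_rectangles(card))
```

On real scans the boxes would still need cleanup, for example discarding tiny regions and merging boxes that belong to one photo.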
UI work
Better geocoding for boroughs with complicated streets
Keep your eyes out for an Old NYC launch this fall!
http://www.danvk.org/wp/2013-02-09/finding-pictures-in-pictures/
What’s left?
what’s next
(hint: you)
Dave Riordan | davidriordan@nypl.org | @riordan
this thing we’re doing is way too big to do alone
@nypl_labs | #HacksHackers | What’s Next | Dave Riordan | davidriordan@nypl.org | @riordan
this used to be a reservoir of water, now it’s a reservoir of knowledge
–an anonymous nypl docent
now it’s a reservoir of data
now it’s time to use it
datasets
Maps (GIS + GeoTIFFs) | Digital Collections API | Menus API | City Directories | Archives | Ensemble API
there will be more
Hackathons
Publishing Hackathon | Maphack
there will be more
NYPL Tech Challenges (coming soon)
like the X Prize, but for way lower stakes and civic good
Questions for you:
what kind of things would you want to work on with nypl labs?
making ebooks easier to borrow?
opening up historical social networks?
we want to know what questions you’re interested in
how you want to use the library today...
...will be how everyone will use the library very soon.
help us make that happen
it’s gonna be awesome
@nypl_labs | @subsublibrary | @nonword | @beefoo | @trevorthornton | @thisismattmiller | @mgiraldo | @riordan
Special thanks to: Chrys Wu