Big Metadata
Mining Special Collections Catalogs for New Knowledge
@AllisonJaiODell
#rbms15
2015 RBMS Conference, 25 June, Oakland & Berkeley, CA
#gillsans #sorrynotsorry
Metadata
Data about data
“Metadata was traditionally in the card catalogs of libraries”
-- Wikipedia
“We kill people based on metadata”
Big Data
“Big data is an evolving term that describes any voluminous amount of structured, semi-structured, and unstructured data that has the potential to be mined for information.”
-- Margaret Rouse
IT Acronyms: A Quick Reference Guide
Volume
Velocity
Variety
Veracity
Big Metadata
A voluminous amount of semi-structured data that has the potential to be mined for information
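Catalog records are a good example of semi-structured data: tagged fields and coded subfields wrapped around free text. As a minimal sketch, assuming records exported one field per line in a mnemonic MARC text layout (`=TAG`, two spaces, two indicator characters, then `$`-coded subfields, roughly the layout of MarcEdit's .mrk files), the Python below turns a record into nested dictionaries that can then be counted and compared. The function name `parse_mnemonic` and the sample record are hypothetical, and the sketch does not handle a literal `$` inside field data.

```python
import re

# One field per line: "=TAG", two spaces, then the field body.
FIELD_LINE = re.compile(r"^=(\w{3})  (.*)$")

def parse_mnemonic(text):
    """Turn line-per-field MARC text into {tag: [occurrence, ...]}."""
    record = {}
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if not match:
            continue  # skip blank or malformed lines
        tag, body = match.groups()
        if tag == "LDR" or tag < "010":
            # Leader and control fields (00X) have no indicators or subfields.
            record.setdefault(tag, []).append(body)
            continue
        # Data fields: two indicator characters, then $-delimited subfields.
        subfields = {}
        for chunk in body[2:].split("$")[1:]:
            if chunk:
                subfields.setdefault(chunk[0], []).append(chunk[1:])
        record.setdefault(tag, []).append(subfields)
    return record

# Hypothetical sample record; "\" stands for a blank indicator.
SAMPLE = r"""=LDR  00000nam a2200000 a 4500
=001  ocm00000001
=245  10$aFlora of an imaginary county /$cby A. Botanist.
=650  \0$aBotany$zFlorida.
=650  \0$aPlants$vPictorial works."""

record = parse_mnemonic(SAMPLE)
print([field["a"][0] for field in record["650"]])  # ['Botany', 'Plants']
```

Keeping every occurrence in a list matters for catalog data, since fields such as 650 often repeat within one record.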
Data Mining
“Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.”
-- David Hand, Heikki Mannila, Padhraic Smyth
Principles of Data Mining
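One small, concrete version of "finding unsuspected relationships" in a catalog is counting which subject headings turn up together on the same record. The sketch below assumes records have already been reduced to lists of subject headings; the `subjects` key, the function name, and the sample data are hypothetical. It is a plain co-occurrence count, a starting point rather than a full data-mining method.

```python
from collections import Counter
from itertools import combinations

def subject_pairs(records):
    """Count how often each pair of subject headings shares a record."""
    pairs = Counter()
    for record in records:
        # Dedupe and sort so ("A", "B") and ("B", "A") count as one pair.
        headings = sorted(set(record["subjects"]))
        pairs.update(combinations(headings, 2))
    return pairs

# Hypothetical records, already reduced to their subject headings.
records = [
    {"subjects": ["Botany", "Florida"]},
    {"subjects": ["Florida", "Botany", "Maps"]},
    {"subjects": ["Maps"]},
]
print(subject_pairs(records).most_common(1))  # [(('Botany', 'Florida'), 2)]
```

Sorting the headings before pairing means the same two subjects always produce the same key, whatever order a cataloger entered them in.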
Digital Humanities
“By digital humanities, we mean research that uses information technology as a central part of its methodology, for creating and/or processing data.
“The digital humanities used to be known as Humanities Computing, or ICT (Information and Communications Technology) for humanities research. The use of the term reflects a growing sense of the importance that digital tools and resources now have for humanities subjects.”
-- University of Oxford
What are the Digital Humanities?
Visualization
“Data visualization is the presentation of data in a pictorial or graphical format. For centuries, people have depended on visual representations such as charts and maps to understand information more easily and quickly.”
-- SAS
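Even without a charting tool, a quick picture of catalog data takes only a few lines. This hypothetical sketch groups publication years (assumed to be already extracted as integers) by decade and prints a text bar chart scaled to a fixed width; a graphical tool such as Tableau or D3 would draw the same counts as a real chart.

```python
from collections import Counter

def decade_bar_chart(years, width=40):
    """Return a text bar chart of how many years fall in each decade."""
    if not years:
        return ""
    decades = Counter((year // 10) * 10 for year in years)
    peak = max(decades.values())
    lines = []
    for decade in sorted(decades):
        count = decades[decade]
        # Scale bars to the busiest decade; keep at least one mark per decade.
        bar = "#" * max(1, round(count / peak * width))
        lines.append(f"{decade}s | {bar} {count}")
    return "\n".join(lines)

# Hypothetical publication years from a handful of records.
print(decade_bar_chart([1851, 1855, 1862], width=10))
# 1850s | ########## 2
# 1860s | ##### 1
```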
Topic Modeling
“Topic models provide a simple way to analyze large volumes of unlabeled text. A ‘topic’ consists of a cluster of words that frequently occur together.”
-- MAchine Learning for LanguagE Toolkit (MALLET)
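MALLET's main topic-modeling routine fits Latent Dirichlet Allocation (LDA). To make the idea concrete, here is a deliberately tiny, standard-library-only LDA fitted by collapsed Gibbs sampling, run on hypothetical tokenized catalog notes. It is a teaching sketch with assumed parameter values (`alpha`, `beta`, `iterations`), not MALLET's implementation; for real collections, MALLET or the Topic-Modeling-Tool is the practical choice.

```python
import random

def lda_topics(docs, k, iterations=200, alpha=0.1, beta=0.01, top_n=3, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return the top words per topic."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    index = {word: i for i, word in enumerate(vocab)}
    v = len(vocab)
    doc_topic = [[0] * k for _ in docs]       # tokens in doc d assigned to topic t
    topic_word = [[0] * v for _ in range(k)]  # times word w assigned to topic t
    topic_total = [0] * k                     # tokens assigned to topic t
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z = []
        for word in doc:
            t = rng.randrange(k)
            z.append(t)
            doc_topic[d][t] += 1
            topic_word[t][index[word]] += 1
            topic_total[t] += 1
        assignments.append(z)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w, t = index[word], assignments[d][i]
                # Remove this token's current assignment from the counts...
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # ...then resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    # Rank each topic's words by how many tokens of that word it holds.
    return [
        [vocab[w] for w in sorted(range(v), key=lambda w: -topic_word[t][w])[:top_n]]
        for t in range(k)
    ]

# Hypothetical notes fields, already lowercased and tokenized.
docs = [
    ["herbarium", "specimen", "botany", "plants"],
    ["plants", "botany", "herbarium", "specimen"],
    ["map", "survey", "coast", "florida"],
    ["florida", "coast", "map", "survey"],
]
for topic in lda_topics(docs, k=2):
    print(topic)
```

The fixed `seed` makes runs repeatable; on a corpus this small the topics are illustrative only.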
Pattern Matching
“Pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern.”
-- Wikipedia
“A regular expression (regex or regexp for short) is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids.”
-- regular-expressions.info
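In a catalog, regular expressions are handy for pulling structured values out of transcribed text. As a hypothetical example, the pattern below finds four-digit years between 1400 and 2099 in imprint or date statements; the year range and sample strings are assumptions, and real imprints (roman numerals, "18--", dual dates) would need more patterns.

```python
import re

# A four-digit year from 1400 to 2099 not embedded in a longer digit run,
# so "c1923" and "[ca. 1850]" match but a number like "12345" does not.
YEAR = re.compile(r"(?<!\d)(1[4-9]\d\d|20\d\d)(?!\d)")

def extract_years(statement):
    """Return every plausible year found in an imprint or date statement."""
    return [int(year) for year in YEAR.findall(statement)]

for statement in ["London : Printed for J. Dodsley, 1789.", "[ca. 1850]", "c1923", "1878-1882"]:
    print(statement, "->", extract_years(statement))
# London : Printed for J. Dodsley, 1789. -> [1789]
# [ca. 1850] -> [1850]
# c1923 -> [1923]
# 1878-1882 -> [1878, 1882]
```

Lookarounds are used instead of `\b` because a word boundary would miss years glued to a letter, as in "c1923".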
Tools
R
D3
Gephi
MIT Exhibit
Tableau
FusionCharts
PALLADIO
MALLET
Topic-Modeling-Tool
ArchExtract
Stanford Named Entity Recognizer
Jigsaw
More in the DH Toychest
Shop Your Closet
“You really can repurpose what you have. Look in the back of the closet at the garments and whole outfits you forgot you have. Mix it all up in new combinations.”
-- Deborah L. Jacobs
10 Ways to ‘Shop Your Closet’
Provenance Metadata
“Assertions about description statements or description sets”
-- DCMI Metadata Provenance Task Group
Creation & revision history
Policy documentation
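Some creation and revision history is already sitting in MARC records: field 005 records the date and time of the latest transaction as `yyyymmddhhmmss.f`. As a minimal sketch (the function name and sample values are hypothetical), the Python below tallies records by the year of their last revision, a rough view of when a collection's records were last touched.

```python
from collections import Counter
from datetime import datetime

def revisions_by_year(stamps):
    """Count MARC 005 timestamps (yyyymmddhhmmss.f) by year of last revision."""
    years = Counter()
    for stamp in stamps:
        try:
            when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%f")
        except ValueError:
            continue  # skip malformed or empty values
        years[when.year] += 1
    return years

# Hypothetical 005 values gathered from a catalog export.
print(revisions_by_year(["20150625103045.0", "20141201080000.0", "20150101000000.0", "bad"]))
# Counter({2015: 2, 2014: 1})
```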
Summary
Metadata is data
Your catalog is full of data
Do some data mining
Make some cool discovery experiences
Make your researchers happy
Questions?
Allison Jai O’Dell
Metadata Librarian
University of Florida
@AllisonJaiODell
#rbms15