Video from the Web and its Context: Defining Boundaries, Developing Technology
NDIIPP Partner Meeting
Arlington, VA
July 9, 2008
Gary Marchionini & Helen Tibbo
School of Information & Library Science, UNC-Chapel Hill
Context Emerges Through Use
• A primary concern is capturing dynamics associated with video content.
• Given a video, what associated context will help people in the future understand the video and its role in human history?
Usage Characteristics
Beyond the usual metadata (Title, Description, Username, Time when video added, Duration in seconds, Category, Keywords):
• Number of times viewed
• Number of times annotated
  – Text
  – Video
• Rank in Results List
• Number of times favorited
• Number of times linked to
  – From the web
  – From (specific) blogs
• Allusions and Mashups
• References, reviews in other venues (print, e-media)
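The list above amounts to a per-video record in which the usage characteristics, unlike the descriptive metadata, change from one harvest to the next. A minimal Python sketch of such a record follows; the class and field names are hypothetical and are not ContextMiner's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class VideoContext:
    """A harvested video: the usual metadata plus usage characteristics.

    Field names are illustrative assumptions, not an actual schema.
    """
    # The usual descriptive metadata
    title: str
    description: str
    username: str
    time_added: datetime
    duration_seconds: int
    category: str
    keywords: List[str] = field(default_factory=list)

    # Usage characteristics: these change over time, so a harvester
    # would store one snapshot per visit rather than overwrite the last.
    views: int = 0
    text_annotations: int = 0
    video_annotations: int = 0
    rank_in_results: Optional[int] = None
    favorites: int = 0
    links_from_web: int = 0
    links_from_blogs: Dict[str, int] = field(default_factory=dict)  # blog -> links
    allusions_and_mashups: List[str] = field(default_factory=list)
    outside_references: List[str] = field(default_factory=list)  # print, e-media

    def times_annotated(self) -> int:
        """Total annotations, text and video combined."""
        return self.text_annotations + self.video_annotations

    def times_linked_to(self) -> int:
        """Web links plus blog links, assuming the two counts do not overlap."""
        return self.links_from_web + sum(self.links_from_blogs.values())
```

Keeping text and video annotations (and web and blog links) as separate fields preserves the breakdown shown on the slide while still allowing the totals to be derived.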
What to Harvest? A Collection Development Issue
• Topic
  – US Presidential Election 2008
  – Epidemics and pandemics
  – Energy
  – Medical issues
  – Natural disasters
  – Truth commissions
• Source (where to harvest)
  – YouTube (Blinkx, thenewsroom, etc.)
  – Blogosphere
  – Specialized collections (e.g., NYT, CNN, C-SPAN, specialized archives, Open Video, public.tv, etc.)
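The two dimensions of this collection policy, topic and source, can be expanded into concrete harvest jobs. A minimal sketch, assuming a hypothetical `harvest_jobs` helper that uses each topic label directly as its query string; the queries actually used in the project are not listed on this slide.

```python
from itertools import product

# Topics and sources from the slide; treating them as flat lists is a
# simplification (e.g., "Specialized collections" covers many archives).
TOPICS = [
    "US Presidential Election 2008",
    "Epidemics and pandemics",
    "Energy",
    "Medical issues",
    "Natural disasters",
    "Truth commissions",
]
SOURCES = ["YouTube", "Blogosphere", "Specialized collections"]


def harvest_jobs(topics, sources):
    """Expand a topic x source policy into (source, query) jobs.

    Using the topic label itself as the query is an assumption; a curator
    would normally refine each topic into several more specific queries.
    """
    return [(source, topic) for topic, source in product(topics, sources)]
```

With six topics and three sources this yields 18 jobs, grouped by topic.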
How to Harvest
• Crawl (follow links) vs. query (use API)
• Metadata and context plus video files
• Storage (DB, SRB)
• Parameters
  – How often? (daily)
  – How many results? (100 hits)
  – How many hops? (0)
• (Use the YouTube API to execute 57 queries each day; store results in a MySQL database and Flash video files in an SRB storehouse at the Odum Institute)
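The harvest loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `search` and `related` are hypothetical stand-ins for a video site's API, SQLite replaces the MySQL database so the example is self-contained, and downloading video files to SRB is omitted. The `max_results` and `hops` parameters mirror the slide's settings (100 hits, 0 hops), so by default only the query results themselves are recorded; raising `hops` turns the query into a bounded crawl over related-video links.

```python
import sqlite3
from collections import deque
from typing import Callable, Iterable, Tuple

# Hypothetical API stand-ins (assumptions, not a real client library):
#   search(query, max_results) -> iterable of (video_id, title), best first
#   related(video_id)          -> iterable of (video_id, title)
Search = Callable[[str, int], Iterable[Tuple[str, str]]]
Related = Callable[[str], Iterable[Tuple[str, str]]]


def harvest(db: sqlite3.Connection, query: str, day: str,
            search: Search, related: Related,
            max_results: int = 100, hops: int = 0) -> int:
    """Run one query, follow related-video links up to `hops` steps,
    and record every video found. Returns how many videos were new."""
    db.execute("CREATE TABLE IF NOT EXISTS videos ("
               "video_id TEXT PRIMARY KEY, title TEXT)")
    # One row per (video, query, day): a dated record of usage context.
    db.execute("CREATE TABLE IF NOT EXISTS hits ("
               "video_id TEXT, query TEXT, day TEXT, hop INTEGER, "
               "result_rank INTEGER, PRIMARY KEY (video_id, query, day))")

    seen = set()
    frontier = deque()
    # Hop 0: the query results themselves, keeping their rank in the list.
    for rank, (vid, title) in enumerate(search(query, max_results), start=1):
        if vid not in seen:
            seen.add(vid)
            frontier.append((vid, title, 0, rank))

    new_videos = 0
    # Breadth-first, so each video is recorded at its smallest hop count.
    while frontier:
        vid, title, hop, rank = frontier.popleft()
        known = db.execute("SELECT 1 FROM videos WHERE video_id = ?",
                           (vid,)).fetchone()
        if known is None:
            db.execute("INSERT INTO videos VALUES (?, ?)", (vid, title))
            new_videos += 1
        db.execute("INSERT OR IGNORE INTO hits VALUES (?, ?, ?, ?, ?)",
                   (vid, query, day, hop, rank))
        if hop < hops:
            for rid, rtitle in related(vid):
                if rid not in seen:
                    seen.add(rid)
                    # Videos reached by following links have no result rank.
                    frontier.append((rid, rtitle, hop + 1, None))
    db.commit()
    return new_videos
```

Calling `harvest` once per query per day builds a dated history in the `hits` table, so a video's rank for a given query can be compared across days, while the `videos` table holds each video only once.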
Progress (adapted from JCDL 08 paper)
~21,000 videos today
Curator Tool (ContextMiner) (www.contextminer.org)