1
CS4624 Closing Slides (May 6, 2009)
“From Multimedia to Hypertext to Information Access to Digital Libraries”
by Edward A. Fox
• http://fox.cs.vt.edu
• Dept. of Computer Science, Virginia Tech
• Blacksburg, VA 24061 USA
Acknowledgements
• Mentors (Licklider, Kessler, Salton)
• Virginia Tech, CS, Digital Library Research Laboratory
• NSF and other sponsors
• Students, colleagues, co-investigators
• Marcos André Gonçalves, Doug Gorton, Rao Shen, ...
• Barbara Wildemuth, Jeffrey Pomerantz, Sanghee Oh, Seungwon Yang
2
3
CC2001 Information Management Areas
IM1. Information models and systems*
IM2. Database systems*
IM3. Data modeling*
IM4. Relational DBs
IM5. Database query languages
IM6. Relational DB design
IM7. Transaction processing
IM8. Distributed DBs
IM9. Physical DB design
IM10. Data mining
IM11. Information storage and retrieval
IM12. Hypertext and hypermedia
IM13. Multimedia information & systems
IM14. Digital libraries
* Core components
4
DL Curriculum Framework
Course structure
• Semester 1: DL collections: development/creation
• Semester 2: DL services and sustainability
Core DL topics
• Digitization, Storage, Interchange
• Digital objects, Composites, Packages
• Metadata, Cataloging, Author submission
• Naming, Repositories, Archives
• Spaces (conceptual, geographic, 2/3D, VR)
• Architectures (agents, buses, wrappers/mediators), Interoperability
• Services (searching, linking, browsing, etc.)
• Intellectual property rights mgmt., Privacy, Protection (watermarking)
• Archiving and preservation, Integrity
Related topics
• Documents, E-publishing, Markup
• Info. needs, Relevance, Evaluation, Effectiveness
• Thesauri, Ontologies, Classification, Categorization
• Bibliographic information, Bibliometrics, Citations
• Routing, Filtering, Community filtering
• Search & search strategy, Info seeking behavior, User modeling, Feedback
• Info summarization, Visualization
• Multimedia streams/structures, Capture/representation, Compression/coding, Content-based analysis, Multimedia indexing
• Multimedia presentation, rendering
DL Curric. Project – Acknowledgements, Info.
• NSF award to VT and UNC-CH
• CS and LIS
• http://curric.dlib.vt.edu/
• http://curric.dlib.vt.edu/wiki/index.php/Main_Page
• Advisory Board, reviewers, field testers
5
6
7
8
9
For More Information
• Magazine: www.dlib.org
• Books: http://fox.cs.vt.edu/DLSB.html (1994)
– MIT Press: Arms, plus books by Borgman, Licklider (1965)
– Morgan Kaufmann: Witten... (several), Lesk (2nd edition)
• Conferences
– ECDL: www.ecdl2008.org
– ICADL: www.icadl.org
– JCDL: www.jcdl2008.org
• Associations
– ASIS&T DL SIG
– IEEE TCDL: www.ieee-tcdl.org (student awards, doctoral consortia)
• NSF: www.dli2.nsf.gov
• Labs: VT: www.dlib.vt.edu, http://ei.cs.vt.edu/~dlib/ (old)
10
Synchronous Scholarly Communication
Same time, Same or different place
11
Asynchronous, Digital Library Mediated Scholarly Communication
Different time and/or place
12
Libraries of the Future
J.C.R. Licklider, 1965, MIT Press
Locating Digital Libraries in Computing and Communications Technology Space
[Figure: axis labels are Computing (flops), Digital content, and Communications (bandwidth, connectivity), ranging from less to more; scale labels are World, Nation, State, City, Community]
Digital Libraries technology trajectory: intellectual access to globally distributed information
Note: we should consider 4 dimensions: computing, communications, content, and community (people)
14
Borgman et al.: Workshop Report on Social Aspects of Digital Libraries: http://www-lis.gseis.ucla.edu/DL/
Information Life Cycle
15
Information Life Cycle
• Authoring, Modifying
• Organizing, Indexing
• Storing, Retrieving
• Distributing, Networking
• Retention / Mining
• Accessing, Filtering
• Using, Creating
16
Digital Libraries Shorten the Chain from
Editor
Publisher
A&I
Consolidator
Library
Reviewer
17
DLs Shorten the Chain to
Author
Reader
Digital Library
Editor
Reviewer
Teacher
Learner
Librarian
18
DL Definitions - 1
• “A digital library is an organized and focused collection of digital objects, including text, images, video, and audio, along with methods of access and retrieval, and for selection, creation, organization, maintenance, and sharing of the collection.”
• Witten & Bainbridge – “How to Build a Digital Library” – Morgan Kaufmann 2003
19
DL Definitions - 2
• “Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities”
• Waters, D.J. CLIR Issues, July/August 1998
• www.clir.org/pubs/issues/issues04.html
20
DL Definitions - 3
• Issues and Spectra
– Collection vs. Institution
– Content vs. System
– Access vs. Preservation
– “Free” vs. Quality
– Managed vs. Comprehensive
– Centralized vs. Distributed
21
DL Definitions - 4
• NOT a “digitized library”
• NOT a “deconstruction” of existing systems and institutions, moving them to an electronic box in a Library
• IS a new way to deal with knowledge
– Authoring, Self-archiving, Collecting,
– Organizing, Preserving,
– Accessing, Propagating, Re-using
22
Digital Library Content
Content Types
• Text Documents: Articles, Reports, Books
• Video, Audio: Speech, Music
• Geographic Information: (Aerial) Photos
• Software, Programs: Models, Simulations
• Bio Information: Genome (human, animal, plant)
• Images and Graphics: 2D, 3D, VR, CAT
23
Informal 5S & DL Definitions
DLs are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
24
5S Layers
Societies
Scenarios
Spaces
Structures
Streams
25
5Ss
Ss | Examples | Objectives
Streams | Text; video; audio; image | Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
Structures | Collection; catalog; hypertext; document; metadata | Specifies organizational aspects of the DL content
Spaces | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components
Scenarios | Searching, browsing, recommending | Details the behavior of DL services
Societies | Service managers, learners, teachers, etc. | Defines managers, responsible for running DL services; actors, that use those services; and relationships among them
26
5S and DL formal definitions and compositions (April 2004 TOIS)
[Figure: concept map of definitions; definition numbers (d.) are given as printed on the slide]
• Mathematical foundations: relation (d.1), function (d.2), sequence (d.3), tuple (d.4)*, language (d.5), graph (d.6), grammar (d.7)
• Kinds of spaces: measurable (d.12), measure (d.13), probability (d.14), vector (d.15), topological (d.16)
• 5S: streams (d.9), structures (d.10), spaces (d.18), scenarios (d.21), societies (d.24)
• Scenario-related: event (d.10), state (d.18), services (d.22), transmission (d.23)
• Metadata: structural metadata specification (d.25), descriptive metadata specification (d.26)
• DL constructs: structured stream (d.29), digital object (d.30), collection (d.31), metadata catalog (d.32), repository (d.33), hypertext (d.36)
• Services: indexing service (d.34), searching service (d.35), browsing service (d.37)
• digital library (minimal) (d.38)
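As a rough illustration of how the constructs on this slide compose, the sketch below models a minimal DL as a repository of digital objects, a metadata catalog, and indexing, searching, and browsing services. The class names, method names, and the whitespace tokenizer are assumptions made for this example; they are not taken from the TOIS definitions.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """A digital object: a handle plus named streams (raw content)."""
    handle: str
    streams: dict = field(default_factory=dict)  # stream name -> bytes


@dataclass
class MinimalDL:
    """Hypothetical container: repository + metadata catalog + services."""
    repository: dict = field(default_factory=dict)  # handle -> DigitalObject
    catalog: dict = field(default_factory=dict)     # handle -> metadata dict
    index: dict = field(default_factory=dict)       # term -> set of handles

    def submit(self, obj, metadata):
        """Store the object, catalog its metadata, and update the index."""
        self.repository[obj.handle] = obj
        self.catalog[obj.handle] = metadata
        for value in metadata.values():
            for term in value.lower().split():
                self.index.setdefault(term, set()).add(obj.handle)

    def search(self, query):
        """Searching service: handles whose metadata contains every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(hits)

    def browse(self, field_name):
        """Browsing service: group handles by the value of one metadata field."""
        groups = {}
        for handle, metadata in self.catalog.items():
            groups.setdefault(metadata.get(field_name, "(none)"), []).append(handle)
        return groups
```

A society of actors and service managers is left out here to keep the sketch short; it could be added as a further field.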
27
[Figure: 5S ontology; node and edge labels recovered from the diagram]
• Streams: text, audio, image, video
• Structures: digital object, Collection, Catalog, Repository, Index, metadata specifications
• Spaces: Measurable, Measure, Probabilistic, Vector, Topological, Metric
• Scenarios: Service, Scenario, event, operation
• Societies: Service Manager, Actor
• Relationship labels: describes, stores, is_version_of/cites/links_to, contains, is_a, extends, reuses, redefines, invokes, precedes, happens_before, executes, participates_in, recipient, runs, inherits_from/includes, association, uses, employs, produces
28
[Figure: taxonomy of DL services]
• Infrastructure Services
– Repository-Building, Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting
– Repository-Building, Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format)
– Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language)
• Information Satisfaction Services: Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing
29
Ontology: Applications
30
31
ETANA-DL
• Archaeological DL
• Integrated DL
– Heterogeneous data handling
• Applies and extends the OAI-PMH
– Open Archives Initiative Protocol for Metadata Harvesting
• Design considerations
– Componentized
– Extensible
– Portable
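To make the OAI-PMH bullet concrete, here is a minimal sketch of how a harvester might form a ListRecords request. The verb and the metadataPrefix, from, and set arguments come from the OAI-PMH specification; the function name and the base URL in the usage note are placeholders, not ETANA-DL endpoints.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec    # selective harvesting by set
    return f"{base_url}?{urlencode(params)}"
```

For example, `list_records_url("http://example.org/oai", from_date="2009-05-06")` asks a (hypothetical) provider only for records added or changed since that date.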
32
ETANA-DL Architecture
[Figure: site databases (DigBase and DigKit, Lahav, Nimrin, Umayri, Hisban, Megiddo, Jalul, new sites) connect through database wrappers to the ETANA-DL union catalog; the user interface offers Search, Browse, Recommend, Note, Personalize, Review, Visualizations, and archaeology-specific services (work in progress)]
33
Map courtesy: www.enchantedlearning.com
Initial ETANA-DL Member Locations
Virginia Tech
Mississippi State University
Vanderbilt University
Canadian University College
Walla Walla College
Andrews University
CWRU
Willamette University
34
35
36
Lahav Website
37
Megiddo Opening Screen
38
Locus Screen: Pictures
View all
39
Area Screen
40
41
ETANA-DL Website
42
Marking Items
Marking – writing notes for a specific user
43
ETANA-DL Multi-dimensional Browsing
3 new sites
2 new types of artifacts
44
Visual Browsing Nimrin: Topographical Drawings
Full site North west quadrant
Square:N40/W20
45
Visual Browsing Nimrin : Square information
Square:N40/W20
Locus: 86
Loci layout
46
Visual Browsing Bab edh-Dhra'
Cemetery
Pottery # 25
47
Visual Browsing Bab edh-Dhra'
Cemetery
Pottery # 25
48
ETANA Societies
1. Historic and pre-historic societies (being studied)
2. Archaeologists (in academic institutes, fieldwork settings, or local and national governmental bodies)
3. Project directors
4. Technical staff (consisting of photographers, technical illustrators, and their assistants)
5. Field staff (responsible for the actual work of excavation)
6. Camp staff (e.g., camp managers, registrars, tool stewards)
7. General public (e.g., educators, learners, citizens)
49
ETANA Societies
• Social issues
1. Who owns the finds?
2. Where should they be preserved?
3. What nationality and ethnicity do they represent?
4. Who has publication rights?
5. What interactions took place between those at the site studied, and others? What theories are proposed by whom about this?
50
ETANA Scenarios
1. Life in the site in former times
2. Digital recording: the planning stage and the excavation stage
3. Planning stage: remote sensing, fieldwalking, field surveys, building surveys, consulting historical and other documentary sources, and managing the sites and monuments
4. Excavation
   1. Detailed information is recorded, including for each layer of soil, and for features such as pole holes, pits, and ditches.
   2. Data about each artifact is recorded together with information about its exact find spot.
   3. Numerous environmental and other samples are taken for laboratory analysis, and the location and purpose of each is carefully recorded.
   4. Large numbers of photographs are taken, both general views of the progress of excavation and detailed shots showing the contexts of finds.
5. Organization and storage of material
6. Analysis and hypotheses generation and testing
7. Publications, museum displays
8. Information services for the general public
51
ETANA Spaces
1. Geographic distribution of found artifacts
2. Temporal dimension (as inferred by archaeologists)
3. Metric or vector spaces
   1. used to support retrieval operations, and to calculate distance (and similarity)
   2. used to browse / constrain searches spatially
4. 3D models of the past, used to reconstruct and visualize archaeological ruins
5. 2D interfaces for human-computer interaction
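The vector-space item can be illustrated with a small sketch: descriptions become term-count vectors, and cosine similarity ranks them against a query. This is a generic textbook technique shown under simple assumptions (whitespace tokenization, raw term counts), not the retrieval model ETANA-DL actually uses; the function names are made up for the example.

```python
import math
from collections import Counter


def to_vector(text):
    """Represent a text as a sparse term-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return the names of matching documents, most similar first."""
    q = to_vector(query)
    scored = [(cosine_similarity(q, to_vector(text)), name) for name, text in docs.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]
```

Distance can be derived from the same quantity, for instance as one minus the similarity.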
52
ETANA Structures
1. Site Organization
   1. Region, site, partition, sub-partition, locus, …
2. Temporal orderings (ages, periods)
3. Taxonomies
   1. for bones, seeds, building materials, …
4. Stratigraphic relationships
   1. above, beneath, coexistent
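Stratigraphic relationships form a structure that can be processed as a graph. The sketch below, assuming each "above" observation is recorded as an (upper, lower) pair of locus labels, derives one deposition order consistent with all observations using a topological sort. The locus labels in the usage note are invented, and "coexistent" loci are not modeled.

```python
from graphlib import CycleError, TopologicalSorter


def deposition_order(above_pairs):
    """Return loci oldest first, given (upper, lower) 'above' observations.

    Raises CycleError if the observations contradict each other.
    """
    sorter = TopologicalSorter()
    for upper, lower in above_pairs:
        # The lower locus was deposited first, so it must precede the upper one.
        sorter.add(upper, lower)
    return list(sorter.static_order())
```

For instance, `deposition_order([("L3", "L2"), ("L2", "L1")])` places L1 first and L3 last; a contradictory pair such as ("L1", "L3") added to that list raises `CycleError`.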
53
ETANA Streams
1. successive photos and drawings of excavation sites, loci, unearthed artifacts
2. audio and video recordings of excavation activities and discussions
3. textual reports
4. 3D models used to reconstruct and visualize archaeological ruins.
54
Integrated CCLINC Translingual Information System
DARPA
[Figure: CCLINC server with Info Detection, Extraction, Summarization, Translation, and 2-way Speech Translation]
Example queries: “What is the north korean movement in the front line?” and “What is the status of nk missile launch against japan?”
Example source text (romanized Korean): BugHanI IlBonE Ddo MiSaIlEul BalSaHan Deus HaDa
Example system output: It seems that North Korea launch a missile again. After North Korea launched a Daipodong missile last month, NK is perceived to proceed to an additional test launch. Korea, US and Japan enter into an alert state, and prepare for a joint response policy. Korea estimates that the additional launch will be on 09/05. Japan estimates that NK’s missile range is short. US information says that there is no sign of launch yet.
55
Structured Video Browser (making video into hypermedia)
www.learn.umd.edu
• IBrowse
• Expository multimedia
• Narrative Structures
56
MPEG-7 Video Library Systems Tech.
ICU (Information and Communication University)
Architecture
[Figure: Video Data, Description Generator, Description Schemes Design Tool, Description Scheme, Meta Database, Video Database, Retrieval Server Module, Player, Presentation Module]
57
The AMICO Library™
58
Textual information retrieval
Query on Google using Sunset and Rio de Janeiro
Query result
59
Content Based Information Retrieval
60
Degree of Structure
Chaotic Organized Structured
Web DLs DBs
61
Digital Objects (DOs)
• Born digital
• Digitized version of “real” object
– Is the DO version the same, better, or worse?
– Decision for ETDs: structured + rendered
• Surrogate for “real” object
– Not covered explicitly in metamodel for a minimal DL
– Crucial in metamodel for archaeology DL
62
Complex to Simple
MARC ($50) Dublin Core (DC)
+thesis
63
OAI – Repository Perspective
Required: Protocol
[Figure: a repository containing digital objects (DO) and metadata objects (MDO)]
64
OAI – Black Box Perspective
[Figure: open archives OA 1 through OA 7]
65
The World According to OAI
[Figure: Service Providers (Discovery, Current Awareness, Preservation) obtain records from Data Providers by metadata harvesting]
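On the service-provider side, harvesting comes down to parsing the XML that a data provider returns. The sketch below extracts identifiers, datestamps, and Dublin Core titles from a ListRecords response. The namespaces are the standard OAI-PMH and Dublin Core ones; the sample record is invented purely for illustration, and the function name is an assumption.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response; a real harvester would fetch this over HTTP.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:etd-1</identifier>
        <datestamp>2009-05-06</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Thesis</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def harvest(xml_text):
    """Extract one metadata dict per record from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "datestamp": rec.findtext("oai:header/oai:datestamp", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
        })
    return records
```

Records gathered this way from several data providers are what a discovery or current-awareness service would then index.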
66
Institutional Repositories - 1
• “Institutional repositories are digital collections that capture and preserve the intellectual output of a single university or a multiple institution community of colleges and universities.”
• Crow, R. “Institutional repository checklist and resource guide”, SPARC, Washington, D.C., USA
• www.arl.org/sparc/IR/IR_Guide_v1.pdf
67
Institutional Repositories - 2
• “A university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.”
• Lynch, C.A. In ARL Bimonthly Report 226, pp. 1-7, Feb. 2003, www.arl.org/newsltr/226/ir.html
68
69
70
What is Fedora™?
• Slides courtesy Vinod Chachra of VTLS
Flexible Extensible Digital Object Repository Architecture
71
Fedora™ Digital Object Architecture
• Persistent ID (PID): globally unique persistent id
• Disseminators: public view; access methods for obtaining “disseminations” of digital object content
• System Metadata: internal view; metadata necessary to manage the object
• Datastreams: protected view; content that makes up the “basis” of the object
– EAD, TEI, DC, MARC, VRA Core, MIX, etc.
– Images, E-books, E-journals, Music, Video, etc.
The Mellon Fedora Project
Adapted from Slide by V. Chachra, VTLS
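The four parts of the object model on this slide can be mirrored in a few lines of code. What follows is only a conceptual sketch of the idea (PID, system metadata, datastreams, disseminators); the class, its fields, and the `disseminate` method are assumptions for illustration and do not reproduce Fedora's actual API or storage format.

```python
from dataclasses import dataclass, field


@dataclass
class FedoraLikeObject:
    """Hypothetical stand-in for the slide's digital object model."""
    pid: str                                             # globally unique persistent id
    system_metadata: dict = field(default_factory=dict)  # internal view
    datastreams: dict = field(default_factory=dict)      # protected view: name -> bytes
    disseminators: dict = field(default_factory=dict)    # public view: name -> callable

    def disseminate(self, name):
        """Run a named access method over this object and return its result."""
        return self.disseminators[name](self)


def describe(obj):
    """Example disseminator: a one-line summary of the object."""
    return f"{obj.pid} has {len(obj.datastreams)} datastream(s)"
```

With `obj = FedoraLikeObject("demo:1", datastreams={"DC": b"<dc/>"}, disseminators={"summary": describe})`, calling `obj.disseminate("summary")` yields the summary without exposing the datastream bytes directly.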
72
Fedora™ Repository
[Figure: repository architecture; grouping of labels is approximate]
• Clients: Client Application, Batch Program, Server Application, Web Browser (HTTP, SOAP)
• Web Service Exposure Layer: Manage, Access, Search, OAI Provider
• Management Subsystem: Component Mgmt, Object Mgmt, Object Validation, PID Generation
• Access Subsystem: Object Dissemination, Object Reflection
• Security Subsystem: Session Management, User Authentication, Policy Enforcement, Policy Mgmt, Policies, Users/Groups
• Storage Subsystem: Digital Objects, Datastreams, Content, XML Files, Relational DB
• External Content Retriever (HTTP, FTP): External Content Sources
• Local Service, Remote Service
Adapted from Slide by V. Chachra, VTLS
73
VITAL / Fedora Relationship
74
Digital library architecture for local and interoperable CITIDEL services
[Figure: three layers]
• PORTALS: Educators, Administrators, Learners
• SERVICES: Multilingual Searching, Revising, Annotating, Filtering, Browsing, Administering
• REPOSITORIES: Annotations, Filtering Profiles, User Profiles, Union Metadata
• OAI Data Harvester and OAI Data Provider link to Remote and Peer Digital Libraries (e.g., NSDL-CIS)
75
Cluster NDLTD-Computing
76
Example of Union Service: CitiViz
77
78
79
NSDL Information Architecture
Essentially as developed by the Technical Infrastructure Workgroup
[Figure: components attached to the Core NSDL “Bus”]
• User Interfaces: Portals & Clients
• Usage Enhancement: CI Services (annotation, discussion, personalization, authentication, browsing), Other NSDL Services
• Core Services: information retrieval, metadata gathering
• Collection Building: Core Collection-Building Services (harvesting, protocols)
• NSDL Collections, Special Databases, referenced items & collections
A Digital Library Case Study
• Domain: graduate education, research
• Genre: ETDs = electronic theses & dissertations
• Submission: http://etd.vt.edu
• Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org
[Figure: ETD workflow]
1. Student Gets Committee Signatures and Submits ETD
2. Signed; Grad School
3. Library Catalogs ETD, Access is Opened to the New Research
4. WWW, NDLTD
83
http://scholar.lib.vt.edu/theses/available/etd-2227102539751141/
84
85
Architecture of a Union DL
[Figure: DL1 and DL2 combine into a Union DL]
• DL1, DL2: Repository1, Repository2; Catalog1, Catalog2; Societies (archaeologists, general public); Services (browsing, searching)
• Union DL: Union Repository, Union Catalog, Union Society (archaeologists, general public), Union Service
• Union Service: Harvesting, Mapping, Searching, Browsing, Clustering, Visualization
86
Union Catalog Integration
[Figure: Virtual Nimrin (VN) and Halif DigMaster (HD) each have a Mapping Tool and a Wrapper; the VN Metadata Format and HD Metadata Format are mapped to the Global Metadata Format, and the VN Catalog and HD Catalog feed the Union Catalog of the Union ArchDL]
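The integration step amounts to mapping each site's metadata fields onto a global format and merging the results. Below is a minimal sketch under that reading; every field name in the two mapping tables is invented for illustration and does not reflect the real Virtual Nimrin or Halif DigMaster schemas.

```python
# Hypothetical per-site mappings: local field name -> global field name.
VN_TO_GLOBAL = {"obj_name": "title", "sq": "square", "loc": "locus"}
HD_TO_GLOBAL = {"label": "title", "field": "square", "locus_no": "locus"}


def to_global(record, mapping, source):
    """Wrapper step: rename mapped fields, drop unmapped ones, tag the source."""
    out = {mapping[key]: value for key, value in record.items() if key in mapping}
    out["source"] = source
    return out


def build_union_catalog(sources):
    """Merge records from all sites into one list in the global format.

    `sources` maps a site name to a (mapping, records) pair.
    """
    union = []
    for source, (mapping, records) in sources.items():
        union.extend(to_global(record, mapping, source) for record in records)
    return union
```

Keeping the mappings as data rather than code is one way a mapping tool could let site curators confirm or revise them without changing the wrapper.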
87
Mapping confirmation
Mapping history
88
89
90
91
92
93
Conclusions
• Digital libraries integrate multimedia, hypertext, and information access into a unified framework.
• The 5S theory helps with analysis, specification, system development, implementation, assessment, and refinement.
• We provide services atop repositories that include digital objects and a catalog of metadata objects. Examples include archaeology and education.
• Integration extends to distributed sites, including heterogeneous systems where schema mapping as well as union services are needed.
• There is worldwide benefit in all areas.