Digital libraries, computer science, and education: 10 projects
Edward A. Fox
Department of Computer Science, Virginia Tech (VPI&SU), Blacksburg, Virginia, USA
http://fox.cs.vt.edu
Virginia Tech
• Digital Library Research Laboratory
• Center for Human-Computer Interaction
• College of Engineering
• System X, Terascale Computing Facility: largest academic supercomputer, for only $5M
• Largest university in Virginia (26K students)
• Ongoing collaboration with TUD, especially with Mechanical Engineering
• Prior visit from CS by Deborah Tatar and Adrian Sandu to TUD -> collaboration between VT and IPSI
Acknowledgements
• Students
• Faculty, Staff
• Collaborators
• Support
• Mentors
Acknowledgements: Students
• Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Gonçalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Unni Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Baoping Zhang, Qinwei Zhu, …
Acknowledgements: Faculty, Staff
• Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Rohit Kelapure, Neill Kipp, Douglas Knight, Deborah Knox, Aaron Krowne, Alberto Laender, Gail McMillan, Claudia Medeiros, Manuel Perez, Naren Ramakrishnan, Layne Watson, …
Other Collaborators (Selected)
• Brazil: FUA, UFMG, UNICAMP
• Case Western Reserve University
• Emory, Notre Dame, Oregon State
• Germany: Univ. Oldenburg
• Mexico: UDLA (Puebla), Monterrey
• College of NJ, Hofstra, Penn State, Villanova
• University of Arizona
• University of Florida, Univ. of Illinois
• University of Virginia
• VTLS (slides on digital repositories, NDLTD)
Acknowledgements: Sponsors
• ACM, Adobe, AOL, CAPES, CNI, CONACyT, DFG, IBM, IEEE, Microsoft, NASA, NDLTD, NLM, OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed., VTLS
• NSF (IIS-9986089, 0086227, 0080748, 0325579; ITR-0325579; DUE-0121679, 0136690, 0121741, 0333601)
Outline
• More information
• Information life cycle
• Digital libraries
• DL curriculum
• DL textbook
• 5S
• ETANA
• OCKHAM
• CITIDEL, NSDL
• GetSmart
• Networked Digital Library of Theses and Dissertations (NDLTD)
• Quality, Metasearch
• Stepping Stones and Pathways
• Personalization, SenseCam, SI
For More Information
• Magazine: www.dlib.org
• Books: http://fox.cs.vt.edu/DLSB.html (1994)
– MIT Press: Arms, plus by Borgman, Licklider (1965)
– Morgan Kaufmann: Witten... (several), Lesk (2nd edition)
• Conferences
– ECDL: www.ecdl2005.org
– ICADL: http://icadl2004.sjtu.edu.cn
– JCDL: www.jcdl2005.org
• Associations
– ASIS&T DL SIG; DELOS
– IEEE TCDL: www.ieee-tcdl.org (student awards, doctoral consortia)
• NSF: www.dli2.nsf.gov
• Labs: VT: www.dlib.vt.edu, http://ei.cs.vt.edu/~dlib/
Information Life Cycle
• Authoring, modifying
• Organizing, indexing
• Storing, retrieving
• Distributing, networking
• Retention / mining
• Accessing, filtering
• Using, creating
Locating Digital Libraries in Computing and Communications Technology Space
[Figure: axes are computing (flops), communications (bandwidth, connectivity), and digital content, each running from less to more. Digital libraries technology trajectory: intellectual access to globally distributed information.]
Note: we should consider 4 dimensions: computing, communications, content, and community (people)
Digital Library Content
Content types:
• Articles, reports, books
• Text documents
• Speech, music
• Video, audio
• (Aerial) photos
• Geographic information
• Models, simulations
• Software, programs
• Genome: human, animal, plant
• Bio information
• 2D, 3D, VR, CAT
• Images and graphics
Digital Libraries --- Objectives
• World Lit.: 24hr / 7day / from desktop
• Integrated “super” information systems: 5S: Table of related areas and their coverage
• Ubiquitous, Higher Quality, Lower Cost
• Education, Knowledge Sharing, Discovery
• Disintermediation -> Collaboration
• Universities Reclaim Property
• Interactive Courseware, Student Works
• Scalable, Sustainable, Usable, Useful
Digital Library (DL) Challenges
• Preservation - so people will trust DLs
• Supporting infrastructure - networks, ...
• (Semantic) interoperability
• DL industry - critical mass by covering libraries, archives, museums, corporate info, govt info, personal info - “quality WWW” integrating DB, HCI, HT, IR, MM, networking, ...
– Need tools/methods to make building them easier
DL Challenges – 2: Terminology
• Digital / electronic / virtual library
• Born digital, hybrid (digital/physical)
• Universal access (all people/places/times)
– Accommodate disabilities (color, visual, auditory)
– Mobile (office, home, laptop, PDA, mobile)
• Archiving, self-archiving
• Open (source, standards, archives)
How to organize a DL course?
• Various frameworks
– What, Why, How
– History, Current status, Future (research)
– Economics: open source, sustainability
– Social: users/patrons, management
– Technical: DB, HCI, HT, IR, LIS, MM, Web
• Suggest that concept maps be drawn by readers to help in working with this book
• Instructors can access “expert” maps with IHMC tools
CC2001 Information Management Areas
IM1. Information models and systems*
IM2. Database systems*
IM3. Data modeling*
IM4. Relational DBs
IM5. Database query languages
IM6. Relational DB design
IM7. Transaction processing
IM8. Distributed DBs
IM9. Physical DB design
IM10. Data mining
IM11. Information storage and retrieval
IM12. Hypertext and hypermedia
IM13. Multimedia information & systems
IM14. Digital libraries
* Core components
DL Curriculum Framework
Course structure
• Semester 1: DL collections: development/creation
• Semester 2: DL services and sustainability
Core DL topics
• Digitization, storage, interchange
• Digital objects, composites, packages
• Metadata, cataloging, author submission
• Naming, repositories, archives
• Spaces (conceptual, geographic, 2/3D, VR)
• Architectures (agents, buses, wrappers/mediators), interoperability
• Services (searching, linking, browsing, etc.)
• Intellectual property rights mgmt., privacy, protection (watermarking)
• Archiving and preservation, integrity
Related topics
• Documents, e-publishing, markup
• Info. needs, relevance, evaluation, effectiveness
• Thesauri, ontologies, classification, categorization
• Bibliographic information, bibliometrics, citations
• Routing, filtering, community filtering
• Search & search strategy, info seeking behavior, user modeling, feedback
• Info summarization, visualization
• Multimedia streams/structures, capture/representation, compression/coding
• Content-based analysis, multimedia indexing
• Multimedia presentation, rendering
Book Parts – Fox & Goncalves
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
• Part 2 – Higher DL Constructs
• Part 3 – Advanced Topics
• Appendix
Book Parts and Chapters - 1
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
– Ch. 2: Streams
– Ch. 3: Structures
– Ch. 4: Spaces
– Ch. 5: Scenarios
– Ch. 6: Societies
Book Parts and Chapters - 2
• Part 2 – Higher DL Constructs
– Ch. 7: Collections
– Ch. 8: Catalogs
– Ch. 9: Repositories and Archives
– Ch. 10: Services
– Ch. 11: Systems
– Ch. 12: Case Studies
Book Parts and Chapters - 3
• Part 3 – Advanced Topics
– Ch. 13: Quality
– Ch. 14: Integration
– Ch. 15: How to build a digital library
– Ch. 16: Research Challenges, Future Perspectives
• Appendix
– A: Mathematical preliminaries
– B: Formal Definitions: Ss
– C: Formal Definitions: DL terms, Minimal DL
– D: Formal Definitions: Archeological DL
– E: Glossary of terms, mappings
Informal 5S & DL Definitions
DLs are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
5S Layers
Societies
Scenarios
Spaces
Structures
Streams
5Ss: examples and objectives
• Streams. Examples: text; video; audio; image. Objective: describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
• Structures. Examples: collection; catalog; hypertext; document; metadata. Objective: specifies organizational aspects of the DL content
• Spaces. Examples: measure; measurable, topological, vector, probabilistic. Objective: defines logical and presentational views of several DL components
• Scenarios. Examples: searching, browsing, recommending. Objective: details the behavior of DL services
• Societies. Examples: service managers, learners, teachers, etc. Objective: defines managers, responsible for running DL services; actors, that use those services; and relationships among them
Hypotheses
• A formal theory for DLs can be built based on 5S.
• The formalization can serve as a basis for modeling and building high-quality DLs.
Research Questions
1. Can we formally elaborate 5S?
2. How can we use 5S to formally describe digital libraries?
3. What are the fundamental relationships among the Ss and high-level DL concepts?
4. How can we allow digital librarians to easily express those relationships?
5. Which are the fundamental quality properties of a DL? Can we use the formalized DL framework to characterize those properties?
6. Where in the life cycle of digital libraries can key aspects of quality be measured and how?
5S and DL formal definitions and compositions (April 2004 TOIS)
[Figure: concept map of the formal definitions; (d.N) is the definition number as printed on the slide.]
• Mathematical preliminaries: relation (d.1), function (d.2), sequence (d.3), tuple (d.4)*, language (d.5), graph (d.6), grammar (d.7)
• 5S: streams (d.9), structures (d.10), spaces (d.18), scenarios (d.21), societies (d.24)
• Spaces: measurable (d.12), measure (d.13), probability (d.14), vector (d.15), topological (d.16)
• Scenario-related: event (d.10), state (d.18), services (d.22), transmission (d.23)
• Metadata: structural metadata specification (d.25), descriptive metadata specification (d.26)
• Higher constructs: structured stream (d.29), digital object (d.30), collection (d.31), metadata catalog (d.32), repository (d.33)
• Services: indexing service (d.34), searching service (d.35), hypertext (d.36), browsing service (d.37)
• Digital library (minimal) (d.38)
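The definitions above build up to a minimal digital library made of a repository, a metadata catalog, and indexing, searching, and browsing services. The sketch below is only an illustration of that composition, not the formal 5S definitions: the class names, the whitespace tokenizer, and the AND-style search are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    handle: str    # unique identifier of the object
    streams: dict  # stream name -> content (text, image bytes, ...)


@dataclass
class MinimalDL:
    """Hypothetical minimal DL: repository + catalog + three services."""
    repository: dict = field(default_factory=dict)  # handle -> DigitalObject
    catalog: dict = field(default_factory=dict)     # handle -> metadata dict
    index: dict = field(default_factory=dict)       # term -> set of handles

    def submit(self, obj, metadata):
        """Store the object, record its metadata, and index the metadata terms."""
        self.repository[obj.handle] = obj
        self.catalog[obj.handle] = metadata
        for value in metadata.values():
            for term in str(value).lower().split():
                self.index.setdefault(term, set()).add(obj.handle)

    def search(self, query):
        """Return handles whose metadata contains every query term."""
        postings = [self.index.get(t, set()) for t in query.lower().split()]
        return sorted(set.intersection(*postings)) if postings else []

    def browse(self, field_name):
        """Group handles by the value of one metadata field."""
        groups = {}
        for handle, metadata in self.catalog.items():
            groups.setdefault(metadata.get(field_name), []).append(handle)
        return groups


dl = MinimalDL()
dl.submit(DigitalObject("h1", {"text": "..."}), {"title": "Iron Age pottery", "site": "Nimrin"})
dl.submit(DigitalObject("h2", {}), {"title": "Bronze Age bone", "site": "Umayri"})
```

Keeping the catalog separate from the repository mirrors the slide's distinction between digital objects and the metadata that describes them.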
[Figure: 5S ontology of DL concepts and relationships.]
• Areas: Streams, Structures, Spaces, Scenarios, Societies
• Concepts: text, audio, image, video, digital object, metadata specifications, collection, catalog, repository, index, service, scenario, event, operation, service manager, actor, and topological, probabilistic, metric, measurable, measure, and vector spaces
• Relationships: describes, stores, contains, is_version_of / cites / links_to, is_a, precedes, happens_before, extends, reuses, redefines, invokes, executes, participates_in, recipient, runs, inherits_from / includes, association, uses, employs, produces
Infrastructure Services
• Repository-Building, Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting
• Repository-Building, Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format)
• Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language)
Information Satisfaction Services
• Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing
Ontology: Applications
Composition of key fundamental / infrastructure services
[Figure: composition of services; edges are labeled p and e on the slide, and services are marked as fundamental or composite.]
• Infrastructure services (repository-building): Acquiring, Authoring, Digitizing, Submitting, Describing, Cataloguing, Indexing, Linking
• Infrastructure services (Add_Value): Rating, Training, Indexing
• Information satisfaction services: Searching, Browsing, Requesting, Recommending, Filtering, Binding, Visualizing, Expanding query
• Inputs and outputs shown: universal collection, collection C, {digital object}, hypertext, index, query, query’, anchor, handle, user model, query/category, classifier, binder, space, transformer, {(digital object, actor, rate)}, actor (Society)
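[Figure: 5S in the DL development life cycle. Phases: Requirements, Analysis, Design, Implementation, Test. Artifacts and tools shown: 5S formal theory/metamodel, 5SL, 5SGraph, 5SLGen, OO classes, workflow, components, DL XML log, DL evaluation.]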
ETANA-DL
• Archaeological DL
• Integrated DL
– Heterogeneous data handling
• Applies and extends the OAI-PMH
– Open Archives Initiative Protocol for Metadata Harvesting
• Design considerations
– Componentized
– Extensible
– Portable
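Since ETANA-DL applies OAI-PMH, a rough sketch of what a harvester does may help: it issues a ListRecords request and reads record identifiers from the XML response. The base URL, identifiers, and sample response below are hypothetical, and no network call is made; only the verb, argument names, and XML namespace come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH data provider."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces all others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def record_identifiers(xml_text):
    """Extract the header identifiers from a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.iter("{%s}identifier" % OAI_NS)]


# Hypothetical response fragment, shaped like an OAI-PMH 2.0 reply.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:locus-86</identifier></header></record>
    <record><header><identifier>oai:example.org:pottery-25</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.org/oai")
ids = record_identifiers(SAMPLE)
```

A real harvester would fetch the URL, follow any resumptionToken in the response, and repeat until the list is complete.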
Lahav Website
Megiddo Opening Screen
Locus Screen: Pictures
View all
Area Screen
ETANA-DL Approach
• Applying and extending Digital Library (DL) techniques to solve key problems: making primary data available, data preservation, and interoperability
• Modeling archaeological information systems using 5S to better understand the domain and design the system and the supporting services
• Rapidly prototyping DLs that handle heterogeneous archaeological data using componentized frameworks:
– eliciting requirements
– refining metamodel and union schema
– modeling sites
– mapping
– harvesting
– providing useful services
ETANA-DL Website
Marking – writing notes for a specific user
Marking Items
Marked Items Display
• Sender, Date, Object OAI ID
• Sender Comments
• Options: View Record, Add record to Items Of Interest, Re-mark item (Redirect), Unmark item (Remove item from list)
Discussions Page
• Discussions about an object
• View/Post messages, create new threads
Recommendations
• Items recommended on the basis of similar interests
ETANA-DL Searching Service
Search
ETANA-DL Multi-dimensional Browsing
3 new sites
2 new types of artifacts
ETANA-DL Visual Browsing Service
Visual Browse
By site
Visual Browsing Nimrin: Topographical Drawings
Full site North west quadrant
Square:N40/W20
Visual Browsing Nimrin : Square information
Square:N40/W20
Locus: 86
Loci layout
Visual Browsing Nimrin : locus sheet
Visual Browsing Bab edh-Dhra'
Cemetery
Pottery # 25
ETANA Societies
1. Historic and pre-historic societies (being studied)
2. Archaeologists (in academic institutes, fieldwork settings, or local and national governmental bodies)
3. Project directors
4. Technical staff (consisting of photographers, technical illustrators, and their assistants)
5. Field staff (responsible for the actual work of excavation)
6. Camp staff (e.g., camp managers, registrars, tool stewards)
7. General public (e.g., educators, learners, citizens)
ETANA Societies
• Social issues
1. Who owns the finds?
2. Where should they be preserved?
3. What nationality and ethnicity do they represent?
4. Who has publication rights?
5. What interactions took place between those at the site studied, and others? What theories are proposed by whom about this?
ETANA Scenarios
1. Life in the site in former times
2. Digital recording: the planning stage and the excavation stage
3. Planning stage: remote sensing, fieldwalking, field surveys, building surveys, consulting historical and other documentary sources, and managing the sites and monuments
4. Excavation
   1. Detailed information is recorded, including for each layer of soil, and for features such as pole holes, pits, and ditches.
   2. Data about each artifact is recorded together with information about its exact find spot.
   3. Numerous environmental and other samples are taken for laboratory analysis, and the location and purpose of each is carefully recorded.
   4. Large numbers of photographs are taken, both general views of the progress of excavation and detailed shots showing the contexts of finds.
5. Organization and storage of material
6. Analysis and hypotheses generation and testing
7. Publications, museum displays
8. Information services for the general public
ETANA Spaces
1. Geographic distribution of found artifacts
2. Temporal dimension (as inferred by archaeologists)
3. Metric or vector spaces
   1. used to support retrieval operations, and to calculate distance (and similarity)
   2. used to browse / constrain searches spatially
4. 3D models of the past, used to reconstruct and visualize archaeological ruins
5. 2D interfaces for human-computer interaction
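The list above mentions vector spaces used to calculate distance and similarity for retrieval. As a minimal sketch of that idea, the function below compares two text descriptions as term-frequency vectors with cosine similarity. The whitespace tokenization and the choice of cosine are assumptions made for illustration; the slides do not say which similarity measure ETANA-DL uses.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two term-frequency vectors."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    # Dot product over the terms the two descriptions share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


same = cosine_similarity("iron age pottery", "iron age pottery")
partial = cosine_similarity("iron age pottery", "bronze age pottery")
disjoint = cosine_similarity("bone", "seed")
```

Here `partial` falls between the other two values because the descriptions share two of three terms.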
ETANA Structures
1. Site organization
   1. Region, site, partition, sub-partition, locus, …
2. Temporal orderings (ages, periods)
3. Taxonomies
   1. for bones, seeds, building materials, …
4. Stratigraphic relationships
   1. above, beneath, coexistent
ETANA Streams
1. successive photos and drawings of excavation sites, loci, unearthed artifacts
2. audio and video recordings of excavation activities and discussions
3. textual reports
4. 3D models used to reconstruct and visualize archaeological ruins.
Member DLs of ETANA-DL
• Member sites shown: Lahav, Madaba, Megiddo, Umayri, …
• Each member DL has its own repository, catalog, and database; a searching and browsing service; and a society of archaeologists
ETANA-DL: a Union DL
• Union Catalog
• Union Repository
• Union Society: Archaeologists, General Public
• Union Services: Harvesting, Mapping, Searching, Browsing, Recommendation, Annotation, Object comparison, Object Sharing, Blinding, Visualization
Site            Artifact type        Original data source      Records harvested
Bab edh-Dhra’   Pottery              cp6 database file         786
Lahav           Figurine             Tab-delimited text file   563
Madaba          Locus field record   Tables in Access DB       786
Mozan           Publication          PDF files                 19
Nimrin          Bone field record    Table in Oracle DB        7419
Nimrin          Seed field record    Table in Oracle DB        429
Nimrin          Locus field record   Table in Oracle DB        2101
Umayri          Bone field record    2 tables in Access DB     2122
Total                                                          18404
Heterogeneous data handling
Automation of DL Generation
[Figure: 5SSuite. The DL expert supplies the 5S metamodel; the DL designer uses 5SGraph to produce a 5SL DL model; 5SGen combines the model with a component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …) to generate tailored DL services for practitioners, researchers, and teachers. Phases: Requirements (1), Analysis (2), Design (3), Implementation (4). Tools: 5SGraph, Mapping Tool, 5SGen.]
[Figure: 5SSuite applied to a union DL. Starting from the 5S metamodel, 5SGraph is used to build a structure sub-model (local schema, union DL schema) and a scenario sub-model (union services descriptions: harvesting, mapping, searching, browsing, …). The Mapping Tool and a wrapper relate local data to global data in the union catalog, and 5SGen draws on the component pool (browsing, …) to produce tailored union services. Roles shown: DL expert, DL designer. Tools: 5SGraph, Mapping Tool, 5SGen.]
5SGraph: Structure model
[Screenshot: local schema and global schema.]
A Minimal DL in the 5S Framework
[Figure: streams, structures, spaces, scenarios, and societies underlie structured streams, structural and descriptive metadata specifications, hypertext, and services (indexing, browsing, searching); these build up to digital object, collection, metadata catalog, repository, and the minimal DL.]
A Minimal ArchDL in the 5S Framework
[Figure: the same layering extended for archaeology with SpaTemOrg, StraDia, Arch descriptive metadata specification, ArchObj, ArchDO, ArchColl, ArchDColl, Arch metadata catalog, ArchDR, and the minimal ArchDL.]
Local Schema Global Schema
Mapping list
Initial set of mappings for flint tool based on rules and name-based matching
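The slide above refers to an initial mapping list produced from rules and name-based matching. A minimal sketch of the name-based part is shown here: each local field is paired with the most similar union-schema field once both names are normalized. The field names, the 0.6 threshold, and the use of Python's difflib are assumptions for illustration and are not taken from the ETANA-DL Mapping Tool.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case a field name and drop separators such as '_' and spaces."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def propose_mappings(local_fields, union_fields, threshold=0.6):
    """Suggest local -> union field mappings by name similarity."""
    mappings = {}
    for local in local_fields:
        best, best_score = None, 0.0
        for union in union_fields:
            score = SequenceMatcher(None, normalize(local),
                                    normalize(union)).ratio()
            if score > best_score:
                best, best_score = union, score
        # Keep only suggestions that clear the (hypothetical) threshold.
        if best_score >= threshold:
            mappings[local] = best
    return mappings


suggested = propose_mappings(
    ["Locus_Number", "FLINT TYPE", "xyz"],      # hypothetical local schema
    ["locusNumber", "flintType", "site"],       # hypothetical union schema
)
```

Suggestions like these would only be a starting point; a designer would still confirm or correct each pair.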
5SGen and Component Pool
Searching Multi-dimensional browsing Integrated searching and browsing Visualization
Exploring Services
Example of Union Service: CitiViz
EtanaViz: Percentages of Animal Bones across Nimrin Culture Phases
OCKHAM Library Network
[Figure: the OCKHAM Library Network links NSDL, OCKHAM services, NSDL services, and library services with teachers, learners, and librarians.]
OCKHAM
• Simplicity (a la OCCAM’s razor)
• Support by Mellon and DLF
• Four main ideas:
1. Components
2. Lightweight protocols
3. Open reference models (e.g., 5S, OAIS)
4. Community perspective and involvement
• Funded by NSF in NSDL, with P2P
Lightweight Protocols
• “Lightweight”, or relatively small and simple protocols seem to have clear advantages over “Full” protocols that attempt to be comprehensive.
• The success of protocols considered lightweight is illuminating.
• Examples: TCP/IP, HTTP, LDAP, and the OAI PMH
OCKHAM Proposed Services
• Alerting
• Browsing
• Cataloging
• Conversion
• OAI – Z39.50
• Pathfinding
• Registry
Computing and Information Technology Interactive Digital Educational Library (CITIDEL)
• Domain: computing / information technology
• Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), …
• Submission & Collection: sub/partner collections www.citidel.org
www.CITIDEL.org
• Led by Virginia Tech, with co-PIs:
– Fox (director, DL systems)
– Lee (history)
– Perez (user interface, Spanish support)
• Partners
– College of New Jersey (Knox)
– Hofstra (Impagliazzo)
– Villanova (Cassel)
– Penn State (Giles)
Distributed repository structure
[Figure: repositories for laboratories, applets, papers, syllabi, and others each sit behind an OAI data provider; an OAI data harvester gathers their metadata into the union metadata repository, which supports the digital library services.]
CITIDEL: Computing & Information Technology Interactive Digital Education Library
CITIDEL Technology Features
• Component architecture (Open Digital Library)
– Re-use and compose re-deployable digital library components
• Built using open standards and technologies
– OAI: used to collect DL resources and for DL interoperability
– XSL and XML: interface rendering with multi-lingual, community-based translation of screens and content (Spanish, …)
– Perl: component integration
• ESSEX: search engine functionality
– Very fast, utilizing in-memory processing
– Includes snapshots for persistence
• Multi-scheming
– Integrates multiple classifications / views through maps, closure
CITIDEL + PIPE
• Adds Interaction Personalization to CITIDEL
• Automatically handles multi-modal conversion to cell phone, PDA, etc.
• Can be adapted to any digital data set; it only requires an XML file of the content with the hierarchy maintained.
CITIDEL -> NSDL
• A collection project in the National STEM (science, technology, engineering, and mathematics) education Digital Library – NSDL
• National Science Digital Library
• www.nsdl.org
NSDL Information Architecture
Essentially as developed by the Technical Infrastructure Workgroup
[Figure: a core NSDL “bus” connects three layers.]
• User interfaces: portals & clients
• Usage enhancement: CI services (annotation, discussion, personalization, authentication, browsing) and other NSDL services
• Collection building: core services (information retrieval, metadata gathering), core collection-building services (harvesting, protocols), NSDL collections, special databases, referenced items & collections
GetSmart
• Led by Hsinchun Chen at U. of Arizona for NSDL
• Concept maps for students and instructors to help with learning
– Notes attached to nodes
– URLs attached to nodes
• Integration with meta-searching
• Record keeping for individuals and groups of students
• Similar system to Cmap tools from IHMC
• Dissertation on English-Spanish ETDs, and summarization using concept maps
A Digital Library Case Study
• Domain: graduate education, research
• Genre: ETDs = electronic theses & dissertations
• Submission: http://etd.vt.edu
• Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org
[Figure: ETD workflow. The student gets committee signatures and submits the ETD; the signed ETD goes to the Grad School; the library catalogs the ETD, and access is opened to the new research through the WWW and NDLTD.]
Key Aspects
• NDLTD incorporated as non-profit corporation, with international board of directors, and annual conference
– Berlin, Kentucky, Sydney, Montreal, …
• Scirus provides full-text search service atop over 200K documents harvested in diverse languages from hundreds of member universities
• Next: multimedia works, other services
http://scholar.lib.vt.edu/theses/available/etd-2227102539751141/
OCLC SRU Interface
ETD Union Search Mirror Site in China (CALIS) (http://ndltd.calis.edu.cn – popular site!)
Language = German; hits = 137
Describing Quality in Digital Libraries
• What’s a “good” digital library?
– Central concept: quality!
– Hypothesis of this work: formal theory can help to define “what’s a good digital library” by:
• New formalizations of quality indicators for DLs within our 5S framework
• Contextualizing these measures within the Information Life Cycle
Quality Dimensions
DL concept: dimensions of quality
• Digital object: Accessibility, Pertinence, Preservability, Relevance, Similarity, Significance, Timeliness
• Metadata specification: Accuracy, Completeness, Conformance
• Collection: Completeness, Impact Factor
• Catalog: Completeness, Consistency
• Repository: Completeness, Consistency
• Services: Composability, Efficiency, Effectiveness, Extensibility, Reusability, Reliability
Examples of DL Quality Concepts, Dimensions, and Measures
DL concept; dimension of quality: factors in measuring
• Digital Object
– Accessibility: collection, no. of structured streams, rights management metadata, actor
– Timeliness: storage time; creation time; modification time; access time
• Structural Metadata Specification
– Accuracy: accurate attributes, no. of attributes in the record
– Completeness: missing attributes, schema size
• Descriptive Metadata Specification
– Appropriateness; Accuracy, Completeness, Conformance
• Collection
– Completeness: collection size; size of the “ideal collection”
– Impact Factor: size of the collection; number of citations
• Metadata Catalog
– Completeness: no. of digital objects without a metadata spec; size of the corresponding collection
– Validity: no. of invalid metadata specs; catalog size
• Repository
– Consistency: no. of collections in repository
• Services
– Consistency: scenario paths; log entries
– Effectiveness: precision/recall (search); F1 measure (classification), etc.
– Reusability: no. of reused services; no. of services in the DL; no. of lines of code per service manager
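Two of the measures named above can be sketched briefly. Precision, recall, and F1 use their standard definitions. The completeness function is an assumption: the slide lists only the factors (missing attributes, schema size), so the ratio form "1 - missing / schema size" and the sample field names are illustrative rather than the authors' stated formula.

```python
def precision_recall_f1(retrieved, relevant):
    """Standard effectiveness measures for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def completeness(record, schema_fields):
    """Assumed form: share of schema attributes that the record fills in."""
    missing = [f for f in schema_fields if not record.get(f)]
    return 1 - len(missing) / len(schema_fields)


p, r, f1 = precision_recall_f1(retrieved=[1, 2, 3, 4], relevant=[2, 4, 6])
score = completeness({"title": "x", "creator": ""},
                     ["title", "creator", "date", "subject"])
```

In this example two of four retrieved items are relevant, and the record fills one of four schema attributes.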
Metadata Specifications and Metadata Format: Completeness
• OCLC NDLTD Union catalog
[Chart: completeness score (axis 0 to 1, ticks every 0.1) per catalog: GWUD, LSU, VTETD, MIT, UBC, PHYSNET, VTINDIV, VANDERBILT, NCSU, USASK, PITT, HKU, HUMBOLT, OCLC, BGMYU, DRESDEN, VIENNA, GATECH, ETSU, USF, MUENCHEN, UTENN, CCSD, WATERLOO, NSYSU, LAVAL, UPSALLA, CALTECH, UCL, WagUniv.]
Metadata Specifications and Metadata Format: Conformance
• Based on ETD-MS
[Chart: conformance score (axis 0.75 to 1, ticks every 0.05) per catalog: GWUD, LSU, VTETD, MIT, UBC, PHYSNET, VTINDIV, VANDERBILT, NCSU, USASK, PITT, HKU, HUMBOLT, OCLC, BGMYU, DRESDEN, VIENNA, GATECH, ETSU, USF, MUENCHEN, UTENN, CCSD, WATERLOO, NSYSU, LAVAL, UPSALLA, CALTECH, UCL, WagUniv.]
Quality and the Information Life Cycle
[Figure: the information life cycle annotated with quality dimensions.]
• Life cycle: Creation (authoring, modifying); describing, organizing, indexing; storing, archiving; Distribution (networking); Seeking (searching, browsing, recommending); accessing, filtering; Utilization; retention, mining; discard
• Activity states: active, semi-active, inactive
• Quality dimensions placed on the cycle: accuracy, completeness, conformance, preservability, timeliness, accessibility, relevance, pertinence, significance, similarity
[Figure: metasearch architecture. Search systems SS1, SS2, …, SSn feed the meta search system, which returns results. Record attributes pass through a pre-scoring engine (with a standard pre-scoring system and a pre-scoring engine configuration) that accepts pre-scoring plug-ins (Seonho Kim’s inference system, Wensi Xi’s inference system). The resulting records with quality metadata go to a Lucene indexing engine, which produces indexed records.]
[Screenshot: Stepping Stones and Pathways interface. Prompt: “Type a two-topic query here”; the example query is “parallel” and “distributed”.]
• End stones (End Stone 1, End Stone 2): parallel, distributed
• Stepping stones (Stepping Stone 1 to 3) along the pathway: message passing, shared memory, communication overhead
• Documents listed (Id, Document Title; ids 1 to 6):
– A comparative study of distributed shared memory system design issues
– Implementing object-based distributed shared memory
– Update protocols and cluster based shared memory
– Towards OpenMP execution on software distributed shared memory systems
– Is OpenMP for grids?
– Differences between distributed and parallel systems
Collection; query; document titles
• Operating System. End Stone 1: message passing; Stepping Stone: area networks; End Stone 2: remote procedure call. Document titles: Implementing Object-based Distributed Shared Memory; A Causally Consistent Protocol for Distributed Shared Memory
• Data Mining. End Stone 1: pattern discovery; Stepping Stone: application of data mining; End Stone 2: personalization. Document titles: Discovery of Interesting Usage Patterns from Web Data; Discovery of Aggregate Usage Profiles for Web Personalization
• Information Retrieval. End Stone 1: expansion; Stepping Stone: probabilistic model; End Stone 2: modeling. Document titles: a probabilistic model of information retrieval development and status; an information retrieval logic model implementation and experiments
Query; description; sub-query 1; sub-query 2
• Salmon dams Pacific northwest
– Description: What harm have power dams in the Pacific northwest caused to salmon fisheries?
– Sub-query 1: northwest, pacific, dam, fish, river, Oregon, specie, spawn, ocean, water, California, marine, Idaho, habitat, wildlife
– Sub-query 2: salmon, fishery, bycatch
• quilts, income
– Description: In what ways have quilts been used to generate income?
– Sub-query 1: quilt, deduct, median, jacket
– Sub-query 2: income, family, tax, taxpay, house, household, percent, person, low, trust, children, high, invest, rate
• Lyme disease arthritis
– Description: What evidence is there to link tick-borne Lyme disease with arthritis?
– Sub-query 1: disease, arthritis, patient, rheumatoid, drug, research, health, infect, medical, cancer, treatment, clinic, immunize, blood, cell, symptom, doctor
– Sub-query 2: lyme, diabetes, biotechnology
• tourists, violence
– Description: Where are tourists likely to be subjected to acts of violence causing bodily harm or death?
– Sub-query 1: violence, Egypt, foreign, military, Egyptian, Islam, police, destiny, kill, industry
– Sub-query 2: tourist, tourism, visitor, hotel, travel, attack, tour
Personalization of Content: Bridging the Gap Between NSDL and its Users
Dr. Manuel A. Pérez-Quiñones†, Dr. Edward Fox†, Dr. Lillian Cassel††, Dr. Patrick Fan†
† Virginia Tech, Blacksburg, VA
†† Villanova University, Villanova, PA
Problem Statement
• The NSDL needs to be integrated into the current pedagogical practices of educators and students.
• As of April 2005, 331 out of 406 NSDL collections gather no information about the user, thus personalization is not currently a possibility for most of the NSDL.
• We will explore how to provide personalized content for instructors and students based on the context provided by the course website.
Bringing NSDL to the users
• The goal of this project is to get NSDL content closer to its intended audience by designing, implementing, and evaluating the integration of its content with a course management system using personalization techniques.
• We will conduct studies with instructors and students to identify how NSDL resources are used in class planning or class activities.
• We will prototype interfaces using some of the leading CMS and utilizing standard APIs for communication between the CMS and the NSDL collections.
Extending a CMS
For the instructor
• Provide textbook recommendations, based on the textbook being used in the course and on typical textbooks used in other similar courses.
• Show syllabi similar to that of the course being shown in the CMS.
• Suggest demos, visualizations, etc. that match some of the topics covered in the course.
• Allow the instructor to define parameters to configure a genetic algorithm that will recommend supplementary material to the students.
For the students
• Show other online resources (e.g. extra readings) at a particular knowledge unit level for his/her course.
NSDL Recommendation
• Alternative Textbooks
• Slides, code, labs at Publishers Site
• Tutorials on Eclipse at w3schools.org
• Papers on Teaching CS2 in the ACM DL
• Syllabi for similar courses
SenseCam Project
• Microsoft MyLifeBits RFP, award to Perez and Fox starting 2006
• Near continuous capture of photos, audio, GPS, temperature
• Integration with laptop daily, for email, calendar, and other activities
• Two particular types of users
– Veterinary students
– Students with various disabilities
Superimposed Information
• Another project with NSDL
• Lois Delcambre, David Maier, Lillian (Boots) Cassel (Portland State, Villanova)
• Middleware for “marks”
• Integrate with various knowledge management systems: wiki, Cmap tools, SIMPEL (multimedia presentation)
• Allow management of educational materials at the sub-document level
Selected Links - http://fox.cs.vt.edu
• CITIDEL (computing education resources)
– www.citidel.org
• NCSTRL (computing technical reports)
– www.ncstrl.org
• NDLTD (electronic theses and dissertations worldwide)
– www.ndltd.org and etdguide.org
• NSDL (National Science Digital Library)
– www.nsdl.org
• OAI (Open Archives Initiative)
– www.openarchives.org
• Virginia Tech Digital Library Research Laboratory (DLRL, www.dlib.vt.edu)
Questions? Discussion?
Thank You!