Digital Object Architecture: an Advanced Architecture for Managing Digital Information
Presentation by
Robert E. Kahn
President & CEO
Corporation for National Research Initiatives
WSIS Forum 2011, May 19, 2011
Origins of the Internet
• Multiple Different Packet Networks
• Open Architecture
• Implemented via the TCP/IP Protocols
• Standards Processes
• Sustained Research Support
• Eventually resulting in
  – Commercialization
  – Widespread Dissemination
  – Global Acceptance
Three Initial Networks
• DARPA originally funded three seminal packet networks – ARPANET, Packet Radio, Packet Satellite
• The Internet came about from a desire to enable users and their computers to communicate efficiently, independent of the network they were using
• Initial challenges were in areas such as:
  – Addressing
  – Routing
  – Congestion Control
  – Host Protocols
• Addressing (16 bits to the wire, 32-bit IPv4 addresses; later: 128-bit IPv6 addresses, URLs)
Key Initial Decisions
• Global Addresses (IP) freed us from ARPANET addressing of the wires
• Gateways introduced for IP routing and for Network “Impedance Matching” – now called routers
• TCP dealt with network-related concerns
  – different packet sizes, duplicates, error detection, losses due to tunnels, mountains, jamming, etc.
• Enabled separate network administration
• Global information system based on an open architecture
From Packet Communication to Information Management
• The Internet did not start out with a primary goal of assisting users in managing information.
• Fast, efficient, reliable, global connectivity was the main goal
  – Information management was limited to ensuring proper information flows in the Internet
  – The World Wide Web was an important step in simplifying user access to information
  – Other alternatives are now emerging.
• We now present an open architecture approach to information management that
  – Makes use of existing Internet capabilities
  – Allows different types of information management systems to be developed and interoperate.
Digital Object Architecture
• To reformulate the Internet architecture to focus more specifically on managing information rather than just communicating bits
• Making use of its world-wide connectivity, but independent of current technology choices
• Enabling existing and new types of information to be reliably managed and accessed in the Internet environment, including over very long periods of time
• Providing mechanisms to stimulate dynamic new forms of expression and to manifest older forms
• Support for multi-lingual identifier names in most native/local scripts
• While supporting privacy, security, intellectual property protection, managed access and well-formed business practices
Digital Object Architecture
• Technical Components
  – Digital Objects (DOs)
    • Structured data with a unique persistent identifier
  – Resolution of the Unique Identifiers
    • To “state information” about the DOs
  – Repositories
    • To deposit DOs
    • To access DOs with security
  – Registries
    • To create and store metadata
    • For secure searching
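The digital object and repository components listed above can be pictured with a small sketch. It is an illustration under stated assumptions, not CNRI's implementation: the class names `DigitalObject` and `Repository`, the `allowed_users` access list, and the example identifier are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Structured data with a unique persistent identifier (hypothetical model)."""
    identifier: str                                   # e.g. "10.1234/Conf_Summary"
    data: bytes
    allowed_users: set = field(default_factory=set)   # assumed access-control list


class Repository:
    """Hypothetical repository: deposit DOs, and access DOs with security."""

    def __init__(self):
        self._objects = {}

    def deposit(self, obj: DigitalObject) -> None:
        # Identifiers are unique, so a second deposit under the same one is refused.
        if obj.identifier in self._objects:
            raise ValueError(f"identifier already in use: {obj.identifier}")
        self._objects[obj.identifier] = obj

    def access(self, identifier: str, user: str) -> bytes:
        obj = self._objects[identifier]               # KeyError if the identifier is unknown
        if user not in obj.allowed_users:
            raise PermissionError(f"{user} may not access {identifier}")
        return obj.data
```

A caller would deposit an object once and then retrieve it by identifier alone, without knowing where or how the repository stores it.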
Digital Object Architecture
[Diagram: a User, through a Client, reaches Resource Discovery, the Resolution System, and Repositories / Collections]
Resource Discovery
• Metadata Registries in lieu of traditional
  – Search Engines
  – Metadata Databases
  – Catalogues, Guides, etc.
Selected Digital Object Types
• Documents, Books, Music, Videos, Spreadsheets
• Personal data (coordinates, financial, medical)
• Observational data (climate, radio astronomy)
• Networking Information (operations, provisioning, forecasting)
• Commerce and Business Information (contracts, bills of lading, letters of credit, etc.)
• Software (programs, running processes & distributed systems)
• Information about “Things”
Repositories
[Diagram: a repository of Any Hardware & Software Configuration behind a Logical External Interface, reached via the Digital Object Protocol]
• Store and Access Digital Objects on the Net
Digital Object Protocol
• Uniform interface for accessing repositories and their digital objects
• Based on the use of identifiers
• Provides authentication of both users and servers upon request or where required
• Uses identity management based on the use of public keys
• Key means of implementing interoperability
The Digital Object Protocol is a Meta-Level, Extensible Interface
<input sequence> <H1> <H2> <Params> <output sequence>
H1 is a handle for the operation applied to the Target DO H2. Similarly, both A and B are known by their Handles HA and HB. The steps of the protocol are:
• Establish a connection from A to B
• {Optionally} A asks B to authenticate himself
• If successful, A provides an input string to B
• {Optionally} B asks A to authenticate herself
• B provides the results of the operation
• Either party may choose to continue or close
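The steps above can be traced in a short in-process sketch. Everything named here is an assumption made for illustration: the `Party` and `Request` classes, the `run_exchange` function, and the `99.…` handles are hypothetical, and the trust-list lookup is only a stand-in for the public-key authentication the protocol calls for.

```python
from dataclasses import dataclass


@dataclass
class Request:
    operation: str        # H1: handle of the operation
    target: str           # H2: handle of the target digital object
    params: str = ""


class Party:
    """One end of the exchange, known by its own handle (HA or HB)."""

    def __init__(self, handle, trusted=(), operations=None):
        self.handle = handle
        self.trusted = set(trusted)
        self.operations = operations or {}

    def authenticates(self, other):
        # Placeholder for a public-key challenge: here just a trust-list lookup.
        return other.handle in self.trusted


def run_exchange(a, b, request, a_checks_b=True, b_checks_a=True):
    steps = ["connect"]                                  # A connects to B
    if a_checks_b:                                       # optional: A asks B to authenticate
        if not a.authenticates(b):
            raise PermissionError("A could not authenticate B")
        steps.append("B authenticated")
    steps.append("input sent")                           # A provides the input to B
    if b_checks_a:                                       # optional: B asks A to authenticate
        if not b.authenticates(a):
            raise PermissionError("B could not authenticate A")
        steps.append("A authenticated")
    operation = b.operations[request.operation]          # look up the operation named by H1
    output = operation(request.target, request.params)   # apply it to the target H2
    steps.append("output returned")
    steps.append("closed")                               # this sketch always closes
    return output, steps
```

Because the operation is itself named by a handle, new operations can be added to a repository without changing the shape of the exchange, which is one way to read "meta-level, extensible".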
Metadata Registry
• Registers the existence and access conditions for Digital Objects
  – Enables collections to be defined with appropriate access controls
• Provides a user interface to browse and search the registry, and an API for other programs to search the registry
• Integrates existing technologies
  – Handle System for identification and access
  – Digital Object Repository for metadata object storage and access
  – XML for object description and submission
  – Specification of Metadata Schemas
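A registry that records existence and access conditions, and that other programs can search, might look roughly like the following. This is a minimal sketch under assumptions: the `MetadataRegistry` class, its `register` and `search` methods, and the `readers` access list are hypothetical names, and plain dictionaries stand in for XML metadata and schemas.

```python
class MetadataRegistry:
    """Hypothetical registry of metadata records keyed by identifier."""

    def __init__(self):
        self._records = {}   # identifier -> (metadata dict, readers or None)

    def register(self, identifier, metadata, readers=None):
        # readers=None means the record is visible to everyone (an assumed convention).
        allowed = None if readers is None else set(readers)
        self._records[identifier] = (dict(metadata), allowed)

    def search(self, user, **criteria):
        """Return identifiers whose metadata matches every criterion and that
        the given user is allowed to see."""
        hits = []
        for identifier, (metadata, readers) in self._records.items():
            if readers is not None and user not in readers:
                continue
            if all(metadata.get(key) == value for key, value in criteria.items()):
                hits.append(identifier)
        return sorted(hits)
```

Search results are identifiers rather than the objects themselves, so a caller would still go through resolution and a repository to obtain the content.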
CORDRA
[Diagram: CORDRA federation. Each CORDRA Community has Content Repositories and a CORDRA Registry; Intermediate Registries of Registries and a Master Registry of Registries hold Federation Level Metadata.]
What are Handles? Why Resolution Systems?
• CNRI uses the name “Handles” to denote digital object identifiers
• Others may prefer to use their own descriptors
• Existing identifier schemes are accommodated
• Identifiers provide a way to identify data structures independent of their physical form or location, if any
• Identifiers can be of many forms, and may contain randomly generated strings, date-time stamps as well as semantics
• The identifier itself will not usually contain useful information about the digital object
• The resolution system is intended to make available the useful information
Why are Identifiers Important?
• For global addressing
  – and possibly routing
• For long-term information preservation
• For building linkages
  – In lieu of attachments
  – To create virtual structures
• For accessing related metadata
  – To convey search results
  – To authenticate/validate
    • Connectivity
    • Individual Digital Objects
    • Identity
Structure of the Identifiers
• Digital Object Identifiers are structured as “prefix/suffix”
• They may be conveyed in various forms, such as:
  – 10.1234/Conf_Summary
  – HDL:10.1234/Conf_Summary
  – hdl.handle.net/10.1234/Conf_Summary
• Each prefix has its own administrator with PKI access to the system for creation, change and deletion.
• Resolution of an identifier results in a returned resolution record – generally within a fraction of a second
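Since the same identifier may arrive in any of the forms listed above, a client would typically normalize it to its prefix and suffix before resolving it. The function below is a minimal sketch of that step; the name `parse_handle` and the exact set of accepted spellings (including an optional `http://` or `https://` scheme) are assumptions, not part of the Handle System itself.

```python
def parse_handle(text):
    """Split an identifier written in one of the forms shown on the slide
    into its (prefix, suffix) pair."""
    s = text.strip()
    # Drop an optional URL scheme in front of the proxy host name.
    for scheme in ("https://", "http://"):
        if s.lower().startswith(scheme):
            s = s[len(scheme):]
    # Drop the proxy host or the "HDL:" label, whichever is present.
    if s.lower().startswith("hdl.handle.net/"):
        s = s[len("hdl.handle.net/"):]
    elif s.lower().startswith("hdl:"):
        s = s[len("hdl:"):]
    # The prefix ends at the first slash; the suffix is everything after it.
    prefix, slash, suffix = s.partition("/")
    if not slash or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix identifier: {text!r}")
    return prefix, suffix
```

All three forms on the slide reduce to the pair `("10.1234", "Conf_Summary")`, which is what would be handed to the resolution system.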
Resolution Mechanism
[Diagram: Multiple Workstations, Distributed Globally, send a DO Identifier to the Handle System <www.handle.net> and receive a Resolution Record]
• System is non-nodal
• Scaleable & Distributed
• Supports global (and local) resolution
Handle System Features
• Supports both Resolution and Administration
• Internationalized character sets
• Secured resolution service
• Provides for Unique Persistent Identifiers
• Current Users include: DOI System, Open Archives Initiative, Library of Congress, CNNIC, Office of European Publications, DataCite, EIDR, DSpace Community and others
Handle Resolution
• The Handle System is a collection of handle services, each of which consists of one or more replicated sites, each of which may have one or more servers.
[Diagram: a Client reaches the Global Handle Registry (GHR) and the Local Handle Services (LHS); each handle service has replicated sites (Site 1, Site 2, Site 3, ... Site n), each with servers (#1, #2, ... #n). Example handle record for 123.456/abc:]
  URL 4 http://www.acme.com/
  URL 8 http://www.ideal.com/
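The two-level lookup implied by the slide (the Global Handle Registry points to a Local Handle Service, which holds the record) can be sketched as follows. This is an illustration under assumptions, not the Handle System's actual protocol: the class and function names are hypothetical, replication across sites and servers is left out, and reading the 4 and 8 in the example record as value indexes is an assumption.

```python
from typing import NamedTuple


class HandleValue(NamedTuple):
    index: int     # assumed meaning of the 4 and 8 in the slide's example
    type: str
    data: str


class LocalHandleService:
    """Hypothetical LHS: holds the handle records for the prefixes it serves."""

    def __init__(self, records):
        self._records = records          # handle -> list of HandleValue

    def lookup(self, handle):
        return self._records[handle]     # KeyError if the handle is unknown


class GlobalHandleRegistry:
    """Hypothetical GHR: knows which local service is responsible for a prefix."""

    def __init__(self):
        self._services = {}              # prefix -> LocalHandleService

    def assign(self, prefix, service):
        self._services[prefix] = service

    def service_for(self, handle):
        prefix = handle.split("/", 1)[0]
        return self._services[prefix]


def resolve(ghr, handle):
    """Ask the GHR which service holds the prefix, then ask that service for the record."""
    return ghr.service_for(handle).lookup(handle)


# The example record from the slide.
lhs = LocalHandleService({
    "123.456/abc": [
        HandleValue(4, "URL", "http://www.acme.com/"),
        HandleValue(8, "URL", "http://www.ideal.com/"),
    ]
})
ghr = GlobalHandleRegistry()
ghr.assign("123.456", lhs)
```

Calling `resolve(ghr, "123.456/abc")` returns both values, leaving the client to choose among them.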
Mirroring the Global Handle Registry
[Diagram: the Global Handle Registry as one primary (P) with mirrors (M); labels: Administration, users]
• Contains System Handle Records
• Non-System Handle Records are in lots of Local Handle Services
Planned Deployment of a Multi-Primary Global Registry
[Diagram: several primaries (P), plus mirrors, serving users]
• A limited number of primaries, each Administered Separately
• Plus Mirrors
• Contains System Handle Records
• Non-System Handle Records are in lots of Local Handle Services
Observations
• Identifiers provide the glue that holds complex distributed systems together
• Security can be provided at a very fine level of granularity in the system
• Repositories enable reliable long-term access to digital objects over generations of technology change
• Registries enable digital objects to be made known and findable using multiple metadata schemas
• The Multi-primary Global Registry enables distributed administration on a collaborative basis by multiple parties around the world.
• Finally, DONA will provide a framework for the management of the DO Architecture in the future.