Digital Object Architecture: An open approach to Information Management on the Net. Bibliotheca Alexandrina. Dr. Robert E. Kahn, Corporation for National Research Initiatives, Reston, Virginia 20191. November 19, 2009


Digital Object Architecture

An open approach to Information Management on the Net

Bibliotheca Alexandrina

Dr. Robert E. Kahn

Corporation for National Research Initiatives

Reston, Virginia 20191

November 19, 2009


Historically

• The initial challenge was to get different computers to interoperate when they were all on a single network.

• Subsequently, the Internet challenge was to get different packet networks to interoperate

• And to enable computers on those diverse networks to talk to each other reliably

• One initial objective was communicating bits without regard for what the receiver would later do with them

• Or for hostile intermediate actors

• Mission accomplished in the mid-1970s

• Things have gotten much more sophisticated since then


What is the Internet

• It’s a set of protocols and procedures that allow different computers and networks to interoperate.

• It links together virtually any packet network, independent of its internal characteristics

• The Internet is not itself a network.

• Rather it’s a global information system where the information flows allow the different constituent networks to work together.


Trust and Authentication

• These are both critical aspects

• Bits received can always be checked for correctness using agreed encryption techniques
  – Certain techniques may be easier to employ
  – Others may be more efficient

• But applications can be corrupted, and systems can be compromised

• And even if the underlying application runs (apparently) properly, one needs to be sure that nothing nefarious is going on.
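The point about checking received bits for correctness can be illustrated with a keyed message authentication code. This is only a minimal sketch: the slides do not name a specific technique, so HMAC-SHA256, the pre-shared key, and the function names `tag_message` and `verify_message` are all assumptions made for illustration.

```python
import hashlib
import hmac


def tag_message(key: bytes, message: bytes) -> bytes:
    """Sender side: compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_message(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver side: recompute the tag and compare in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


shared_key = b"example-shared-key"  # assumption: agreed out of band
bits = b"payload bits as received"
tag = tag_message(shared_key, bits)
```

A check of this kind confirms only that the bits were not altered in transit. It says nothing about whether the application that produced them has been corrupted, which is the concern of the last two bullets.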


Focusing

• I purposely focus in the remainder of these remarks on what one can do to manage information in the Internet environment

• I purposely do not address physical threats of the form that destroy capabilities. Surreptitious physical threats that modify capabilities lie in between.

• I assume that all system components as well as information in digital form may be viewed as logical entities, of the same genre – known as “digital objects”

• And communication between components (including users) is authenticable in a single logical fashion.

• Given this, the problem is transformed into two alternate ones
  – how can the authentication be managed systemically
  – how can the components protect themselves from information attacks.


Properties of DOs

• They are machine independent and portable from platform to platform

• Parts of a digital object may be accessed and protected separately from the object as a whole

• Authentication of a DO may be enabled by using fingerprints of one or more parts of a DO

• Which enables portability of such objects in many situations.
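One way to picture authentication by fingerprints of parts is to hash each part separately, so that a single part can be verified without the rest of the object. The sketch below assumes SHA-256 as the fingerprint function and uses hypothetical part names; the slides do not specify either.

```python
import hashlib


def fingerprint_parts(parts: dict[str, bytes]) -> dict[str, str]:
    """Return a SHA-256 hex fingerprint for each named part of a DO."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in parts.items()}


def verify_part(name: str, data: bytes, fingerprints: dict[str, str]) -> bool:
    """Check one part against its recorded fingerprint, independently of the rest."""
    return hashlib.sha256(data).hexdigest() == fingerprints.get(name)


# Hypothetical parts of a digital object.
do_parts = {"payload": b"report text", "metadata": b'{"title": "Report"}'}
recorded = fingerprint_parts(do_parts)
```

Because the fingerprints depend only on the bytes of each part, they remain valid when the object is moved from one platform to another.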


What are Digital Objects

• If you can’t uniquely identify a digital object, it doesn’t qualify as a digital object

• The identifier is not the same as a name; it’s more like the object’s DNA. You can exist without a name, but not without your DNA

• Like DNA, the identifier must be a part of the digital object

• A digital object (DO) is defined as “structured data” that is machine parsable and contains a unique persistent identifier.
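The definition can be made concrete with a small sketch in which the identifier travels inside the structured data and parsing fails without it. JSON as the encoding, the `content` field, and the identifier `12345/abc` are assumptions chosen for illustration, not part of the architecture as presented.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DigitalObject:
    identifier: str  # unique persistent identifier, carried inside the object
    content: str     # hypothetical field standing in for the rest of the data


def serialize(obj: DigitalObject) -> str:
    """Encode the object as machine-parsable structured data."""
    return json.dumps(asdict(obj), sort_keys=True)


def parse(text: str) -> DigitalObject:
    """Decode structured data; reject it if no identifier is present."""
    data = json.loads(text)
    if not data.get("identifier"):
        raise ValueError("not a digital object: missing identifier")
    return DigitalObject(identifier=data["identifier"], content=data.get("content", ""))
```

For example, `parse(serialize(DigitalObject("12345/abc", "hello")))` returns an equal object, while data without an identifier raises `ValueError`.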


Is that all there is to a DO?

• In one sense, yes. In another sense, no.

• An important part of a DO is what I call the payload. When one accesses a DO, the payload is normally what is wanted

• But a DO will generally have associated with it additional information, known as metadata, that provides state information about the DO.
  – Some of the metadata is always part of the payload
  – Some (or even all) of the metadata may be stored apart from the payload or even duplicated there

• And a part of the metadata may be transaction information referencing the use of that digital object.
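The relationship between payload, metadata, and transaction information might be sketched as follows. The class name `StoredObject`, its fields, and the form of the transaction entries are hypothetical; the slides describe the concepts but not a concrete layout.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    identifier: str
    payload: bytes                                          # what an accessor normally wants
    metadata: dict[str, str] = field(default_factory=dict)  # state information about the DO
    transactions: list[str] = field(default_factory=list)   # record of uses of the DO

    def access(self, requester: str) -> bytes:
        """Return the payload and note the access as transaction metadata."""
        self.transactions.append(f"access by {requester}")
        return self.payload
```

In this sketch the metadata lives beside the payload; as the bullets note, some or all of it could equally be stored elsewhere or duplicated inside the payload.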


Finding DOs

• In many cases, one may know the identity of a DO a priori or even its location; in other cases, one may only know properties or characteristics of a DO and must rely on that knowledge to find it.

• Search engines find web pages on the Internet by crawling the Web; but many computers, applications and systems are not available for a “public crawl”

• But they can be characterized explicitly by their owners, managers or creators or with their permission

• Systems that provide this information are called Metadata Registries. At a minimum, such registries respond to queries by returning the digital object identifiers, usually in a presentation format that can be visualized by a user.

• Within the government, a good example of a metadata registry is ADL-R, created for the Advanced Distributed Learning Initiative in the Pentagon.
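A metadata registry of the kind described can be sketched as a lookup from owner-supplied properties to identifiers. This in-memory version is only an illustration: the class `MetadataRegistry`, its methods, and the sample identifiers are assumptions and do not reflect how ADL-R or any other real registry is built.

```python
class MetadataRegistry:
    """In-memory registry mapping identifiers to owner-supplied metadata."""

    def __init__(self) -> None:
        self._records: dict[str, dict[str, str]] = {}

    def register(self, identifier: str, metadata: dict[str, str]) -> None:
        """Record metadata supplied by the object's owner, manager or creator."""
        self._records[identifier] = dict(metadata)

    def query(self, **criteria: str) -> list[str]:
        """Return identifiers whose metadata matches every criterion."""
        return sorted(
            ident
            for ident, meta in self._records.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )


registry = MetadataRegistry()
registry.register("12345/course-1", {"subject": "networking", "format": "video"})
registry.register("12345/course-2", {"subject": "networking", "format": "text"})
registry.register("12345/course-3", {"subject": "history", "format": "text"})
```

A query such as `registry.query(subject="networking", format="text")` returns identifiers only; obtaining current state information for them is a separate resolution step.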


Metadata Registries

• Are generally used for searching, browsing or creating collections of information

• They do not track operational details

• The identifiers they return may be “resolved” to determine the relevant state information via a “resolution system”

• We call these identifiers “handles”

• The Handle System is the pre-eminent system for resolving digital object identifiers


Handle System: A general purpose resolution system

• A detailed description is at www.handle.net

• It has been operational on the net since 1994 and is in widespread use in many applications

• Software may be downloaded from the net and users can run their own local handle services

• Resolution of a handle produces a “handle record” which contains state information needed for immediate decision making or action

• For example, the state information may contain
  – One or more IP Addresses
  – Terms and Conditions for access
  – Public Keys
  – Authentication information to validate the object itself
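Resolution can be pictured as a lookup from a handle to a record of typed values. The sketch below uses a local table in place of a real handle service, and the type names (`ip_address`, `terms`, `public_key`), the handle `12345/abc`, and the values are all hypothetical; it is not the Handle System's actual record format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandleValue:
    type: str
    data: str


# Hypothetical local table standing in for a handle service.
HANDLE_TABLE: dict[str, list[HandleValue]] = {
    "12345/abc": [
        HandleValue("ip_address", "192.0.2.10"),
        HandleValue("terms", "access restricted to registered users"),
        HandleValue("public_key", "<PEM-encoded key>"),
    ],
}


def resolve(handle: str) -> list[HandleValue]:
    """Return the handle record (a list of typed values) for a handle."""
    try:
        return HANDLE_TABLE[handle]
    except KeyError:
        raise LookupError(f"handle not found: {handle}") from None


def values_of(record: list[HandleValue], type_: str) -> list[str]:
    """Pick out the values of one type, e.g. every IP address in the record."""
    return [value.data for value in record if value.type == type_]
```

A client would call `resolve("12345/abc")` and then select the values it needs, for instance the IP addresses to contact or the terms that govern access.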


Repositories

• Repositories provide access to digital objects

• A repository may be housed in a physical location, or it may be a mobile program.

• Communication with a Repository is via the digital object protocol which supports
  – Access to DOs based on handles
  – Authentication in both directions.


Repository Notion

[Figure: REPOSITORY diagram with components labeled Storage System, Digital Object Manager, and Digital Object Protocol]


DO Repository Server Software

• Takes inputs based on identifiers and returns digital objects

• Connects to existing and older legacy systems

• Based on an open architecture

• Achieves interoperability with other repository systems that support the protocol

• Can provide additional application dependent functionality, if desired, by depositing executable digital objects


Specific Interface Capabilities

• Standard Interface is at a “meta level”

• Allows new functionality to be added by defining new digital objects

• Supports Authentication of Users and Services

• Provides object level protection


Extensible Interface

<input sequence><H1> <H2> <Parameters> <output sequence>

Where H1 is a handle for the operation to be applied to the target DO H2. Similarly, the two parties, A and B, are known by their handles HA and HB. The steps of the protocol are:

Establish a connection from A to B

{Optionally} A asks B to authenticate himself

If successful, A provides an input string to B

{Optionally} B asks A to authenticate herself

B provides the results of the operation

Either party may choose to continue or close
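The core of the exchange, in which A sends an input sequence and B applies operation H1 to target H2, can be sketched as below. Connection setup and both optional authentication steps are omitted, and the whitespace-separated input format, the `Repository` class, and the operation handles are assumptions made for illustration rather than the actual digital object protocol.

```python
from typing import Callable

# An operation takes the target object's bytes and the parameters, and returns output bytes.
Operation = Callable[[bytes, list[str]], bytes]


class Repository:
    """Party B: holds digital objects and operations, both keyed by handle."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}
        self.operations: dict[str, Operation] = {}

    def invoke(self, input_sequence: str) -> bytes:
        """Apply operation H1 to target H2 with the remaining parameters."""
        h1, h2, *params = input_sequence.split()
        operation = self.operations[h1]
        target = self.objects[h2]
        return operation(target, params)


repo = Repository()
repo.objects["12345/doc-1"] = b"digital object architecture"
# Hypothetical operation handles; new functionality is added by registering more.
repo.operations["12345/op-get"] = lambda target, params: target
repo.operations["12345/op-head"] = lambda target, params: target[: int(params[0])]
```

Because operations are themselves looked up by handle, the interface stays fixed while the set of available operations grows, which is the sense in which the interface is extensible.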


Displaced Vulnerabilities?

• The Handle System can be attacked
  – But it’s fully distributed and can be replicated
  – And can be locally protected from external unauthorized intrusions so external actions won’t affect local usage

• Private Keys can be lost
  – But revocation will prevent continued damage
  – And replication of digital objects can mitigate against corruption of information
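How replication can mitigate corruption might be sketched as choosing, among several copies, one whose fingerprint matches a trusted value. The use of SHA-256, the idea that the trusted fingerprint is held somewhere other than with the copies, and the function name `first_intact_copy` are assumptions for illustration.

```python
import hashlib
from typing import Optional


def first_intact_copy(replicas: list[bytes], expected_sha256: str) -> Optional[bytes]:
    """Return the first replica whose SHA-256 matches the trusted fingerprint."""
    for copy in replicas:
        if hashlib.sha256(copy).hexdigest() == expected_sha256:
            return copy
    return None


original = b"archived record"
trusted = hashlib.sha256(original).hexdigest()  # assumption: stored apart from the replicas
replicas = [b"archived rec0rd", original]       # the first copy has been corrupted
```

Here `first_intact_copy(replicas, trusted)` skips the corrupted copy and returns the intact one; if every copy fails the check, the function returns `None`.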


Vulnerabilities (cont’d)

• Registries can be corrupted, or access denied to authorized users due to hostile action. Replication of registries is one solution to this problem.

• Repositories may be corrupted and produce the wrong information. One must take care where one trusts the deposit of information, just as one must take care in depositing other assets in, say, banks


Bottom Line

• This approach allows for digital information to be managed effectively over very long as well as very short time frames

• All the architectural components have well defined open interfaces, protocols and returned objects which will stand the test of time.

• The architecture allows the investment in creating digital information to be made once and easily ported from technology base to technology base.

• The modular nature of the architecture allows the system to be managed component by component