
Standardization aspects of eBook content formats

Kyong-Ho Lee *, Nicholas Guttenberg, Victor McCrary

IT Laboratory, National Institute of Standards and Technology, 100 Bureau Drive, Stop 8951, Gaithersburg, MD 20899, USA

Received 3 November 2001; received in revised form 25 March 2002; accepted 6 April 2002

Abstract

This paper presents the necessity and direction of a standard for representing the contents of electronic books (eBooks). To identify the current problems of content formats, we discuss, as a case study, the advantages and disadvantages of the Open eBook Publication Structure (OEBPS) and the Portable Document Format (PDF) in terms of functional aspects as well as critical standardization issues such as interoperability, openness, applicability, and extensibility. In particular, this paper suggests an Extensible Markup Language (XML)-based improvement of the OEB format as a standard. It also describes how PDF can be used together with such a standard, because PDF addresses different aspects of the publishing process. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: eBook; Content format; Open eBook Publication Structure; PDF; Standardization; XML

1. Introduction

With the development of information technology and the widespread use of the Internet, the volume of digital information is increasing. According to a recent study, over 93% of new information is created in digital format [9]. Because of this growing volume of digital content, interest in electronic books (eBooks) is growing. In particular, dedicated reading devices for eBooks, a relatively new form of reading apparatus, have been designed [2,4,18–20].

eBooks are more efficient than paper-based books in several respects, such as storage, transfer, delivery, and accessibility. Because eBooks can be compressed, the storage devices that hold them are far smaller than the paper they would be printed on. Multimedia and hypertext links can be introduced into eBooks. For example, eBooks allow the reader to view video clips, listen to sound and narration, or jump to a location on the Web simply by selecting a link. Accessibility features may also be implemented: text-to-speech or Braille output programs [17] could increase access for blind readers.

Online libraries such as the Internet Public Library,1 Project Gutenberg, and netLibrary provide information that is in the public domain. This allows people who would otherwise be unable to access public domain materials to view the book of their choice at any computer connected to the Web. Digital library technologies, coupled with mobile reading devices, can support the vision of ubiquitous access to electronic materials including eBooks [1,11].

0920-5489/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.

PII: S0920-5489(02)00032-6

* Corresponding author.

E-mail address: [email protected] (K.-H. Lee).

www.elsevier.com/locate/csi

1 The Web sites of interest in this article are listed in Appendix

A in the order that they are mentioned.

Computer Standards & Interfaces 24 (2002) 227–239


Although eBooks will not completely replace paper-based books, many predict that they will succeed [3,5–7,14,21]. However, in spite of these advantages and this optimism, the acceptance of eBook technology has been slow because of several obstacles, including users' unwillingness to read from a screen. Sottong [23] states that eBooks are not comparable to paper-based books on several criteria, including display quality (such as resolution and contrast), durability, cost, and ease of use.

In particular, the lack of a standard exacerbates this situation. Without one, some reading devices and software for eBooks may be unable to read publications created in another vendor's format, and conflicting formats confuse potential customers. Currently, there are only a few producers of eBook devices. Examples of dedicated eBook devices are the Rocket eBook from NuvoMedia and the SoftBook Reader from SoftBook Press, which are now manufactured as the REB 1100 and REB 1200 by Gemstar eBook. Because these devices use proprietary formats, additional versions of each eBook must be made available. Therefore, a standard is one of the most important factors for the success of eBooks.

Authors, for their part, are afraid to publish their books electronically because of the potential for book piracy. Many companies are developing methods for the digital rights management (DRM) of eBooks on their devices. However, these methods are also non-standard. Different encryption methods would make it increasingly hard for different eBook devices and publications to interoperate. DRM is thus another major concern of the eBook industry.

This paper, however, focuses on content formats and presents the necessity and direction of a standard for representing the contents of eBooks. We identify the current problems of eBook formats and discuss critical standardization issues. Specifically, as a case study, we analyze in detail the advantages and disadvantages of the Open eBook Publication Structure (OEBPS) and Adobe's Portable Document Format (PDF). Although the LIT format of Microsoft Reader may compete with PDF in the eBook marketplace, we chose PDF because it is widely used as a de facto standard for electronic publishing. Meanwhile, there is an effort to develop a standard format for eBooks. The National Institute of Standards and Technology (NIST) and related hardware and software companies established the Open eBook Forum (OEBF) in October 1998. To create and maintain standards and to promote the successful adoption of eBooks, the OEBF released the OEBPS specification in September 1999.

Neither OEBPS nor PDF is ideal as a standard for representing the contents of eBooks. Despite its advanced formatting capability, PDF has an inherent liability with respect to standardization issues, notably openness. Although OEBPS offers only a limited level of style control, this paper suggests that, for a standard, it is desirable to improve OEBPS on the basis of Extensible Markup Language (XML) technologies. Considering the future direction of technological development, XML-related technologies are promising and feasible [24]. This paper also describes how PDF can be used together with a standard format, because it addresses different aspects of the publishing process.

This paper is organized as follows. Section 2 describes the need for a standard eBook format and addresses the critical issues that a standard should support. To identify current problems and future requirements, Section 3 presents a case study analyzing OEBPS and PDF. Finally, Section 4 summarizes our conclusions.

2. Critical issues for standardization

As shown in Table 1, vendors use different kinds of formats, including Hypertext Markup Language (HTML), PDF, Rich Text Format (RTF), and proprietary formats. Each vendor has developed a reader or viewer based on its own formats. As a result, to read the eBooks of a particular vendor, users have to purchase or download the corresponding software.

Table 1
Examples of various kinds of eBook readers and formats

Reader or viewer      Format
Acrobat Reader        PDF
AportisDoc            AportisDoc format
MobiPocket Reader     OEBPS, HTML, DOC, PCF
Microsoft Reader      OEBPS, HTML, LIT
Palm Reader           PDB
Everybook Viewer      PDF, RTF
eRocket               OEBPS, RocketEdition


Furthermore, this has introduced the problem of double investment in the eBook industry and has prevented eBooks from being actively and widely accepted. Therefore, a standard format is required for the active use of eBooks.

In developing a standard, in addition to functional and technical requirements such as formatting capability and logical structure, critical issues such as interoperability, extensibility, applicability, and openness should be considered. Brief descriptions of these critical issues follow.

Fig. 1. An illustration of interoperability.

Fig. 2. An illustration of extensibility.

2.1. Interoperability

The participants in the eBook industry, including traditional publishers, eBook producers, authors, solution developers, and service providers, should be able to exchange eBooks independently of software and hardware, as shown in Fig. 1. To this end, a non-proprietary standard format should be developed with the consent of the whole eBook community. Interoperability is one of the most important problems to be solved if eBook technology is to achieve widespread success.

2.2. Extensibility

An eBook standard should be extensible to include new functionality such as multimedia and user interaction, as shown in Fig. 2. With the rapid advance of computer technology, new media and technologies will be invented, and user demand for them will increase. In particular, forward and backward compatibility between different generations of eBooks should be considered.

2.3. Applicability

An eBook format should be easily applicable to related fields such as database systems and the wireless Internet, as illustrated in Fig. 3. One example of the applicability issue is whether a format can specify content and user interfaces for wireless narrowband devices such as digital mobile phones, personal digital assistants, and other wireless terminals. An XML document can be reused (e.g., split) and directed, by using different style sheets, towards eBook devices with smaller screens and limited communication bandwidth.
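As a rough illustration of the reuse described above, the following Python sketch splits one XML document into per-chapter fragments that a narrowband device could fetch one at a time. The element names (`book`, `chapter`, `title`, `p`) and the function `split_chapters` are hypothetical and are not taken from any eBook specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; the element names are illustrative only.
BOOK_XML = """
<book>
  <chapter><title>One</title><p>First chapter text.</p></chapter>
  <chapter><title>Two</title><p>Second chapter text.</p></chapter>
</book>
"""


def split_chapters(xml_text):
    """Return each <chapter> element as its own serialized XML fragment."""
    root = ET.fromstring(xml_text)
    fragments = []
    for chapter in root.findall("chapter"):
        chapter.tail = None  # drop the whitespace that followed the element
        fragments.append(ET.tostring(chapter, encoding="unicode"))
    return fragments


fragments = split_chapters(BOOK_XML)
for number, fragment in enumerate(fragments, start=1):
    print(number, len(fragment), "characters")
```

Each fragment is itself well-formed XML, so a device-specific style sheet could be applied to it independently of the rest of the book.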

2.4. Openness

An eBook standard should be independent of any particular vendor. That is, it must be an open standard that is freely accessible, as illustrated in Fig. 4.

Fig. 3. An illustration of applicability.

3. A case study: analysis of OEBPS and PDF

To identify the current problems of content formats, this section discusses the advantages and disadvantages of OEBPS and PDF in terms of the standardization issues proposed above as well as functional and technical aspects.

3.1. OEBPS

3.1.1. Overview

Based on XML, OEBPS draws on well-established techniques from various document publishing and representation communities. It incorporates elements from Extensible Hypertext Markup Language (XHTML), Cascading Style Sheets (CSS), Dublin Core metadata, and Unicode. In OEBPS terms, content providers deliver publications to reading systems in the form defined by the OEB format. An OEB publication is a collection of OEB documents (XML documents that conform to OEBPS) and other files, including structured text and graphics, that together constitute a unit of publication. A reading system is a combination of hardware and/or software that accepts OEB publications and makes them available to readers, directly or indirectly.

A publication conforming to OEBPS must include exactly one OEB package file, which specifies the OEB documents, images, and other objects that make up the OEB publication and how they relate to each other. As shown in Fig. 5, the main parts of the OEB package file are:

• Metadata: Publication metadata (title, author, publisher, etc.).
• Manifest: A list of files (documents, images, style sheets, etc.) that makes up the publication. The manifest also includes fallback declarations for files of types not supported by this specification.
• Spine: An arrangement of documents providing a linear reading order.
• Tours: A set of alternate reading sequences through the publication, such as selective views for various reading purposes, reader expertise levels, etc.
• Guide: A set of references to fundamental structural features of the publication, such as table of contents, foreword, bibliography, etc.

Fig. 4. An illustration of openness.
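The five parts listed above can be sketched as a small XML tree. The Python below is a minimal illustration under stated assumptions, not a conforming package generator: the element and attribute names (`item`, `itemref`, `idref`, `reference`) and the media type `text/x-oeb1-document` are recalled from OEBPS 1.0 and should be checked against the specification, the metadata is simplified (no Dublin Core namespace), and `build_package` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_package(title, author, files, reading_order):
    """Assemble a package-like tree with metadata, manifest, spine, tours, guide.

    `files` maps an item id to (href, media type); `reading_order` lists item ids.
    All element and attribute names are assumptions to verify against OEBPS.
    """
    package = ET.Element("package")

    metadata = ET.SubElement(package, "metadata")
    ET.SubElement(metadata, "title").text = title
    ET.SubElement(metadata, "creator").text = author

    manifest = ET.SubElement(package, "manifest")
    for item_id, (href, media_type) in files.items():
        ET.SubElement(manifest, "item",
                      {"id": item_id, "href": href, "media-type": media_type})

    # The spine gives the linear reading order by pointing at manifest items.
    spine = ET.SubElement(package, "spine")
    for item_id in reading_order:
        if item_id not in files:
            raise ValueError(f"spine refers to unknown item {item_id!r}")
        ET.SubElement(spine, "itemref", {"idref": item_id})

    ET.SubElement(package, "tours")  # alternate reading sequences, empty here

    guide = ET.SubElement(package, "guide")
    first = reading_order[0]
    ET.SubElement(guide, "reference",
                  {"type": "toc", "title": "Table of Contents",
                   "href": files[first][0]})
    return package


files = {
    "toc": ("toc.html", "text/x-oeb1-document"),
    "ch1": ("chapter1.html", "text/x-oeb1-document"),
    "cover": ("cover.png", "image/png"),
}
package = build_package("Sample Book", "A. Author", files, ["toc", "ch1"])
print(ET.tostring(package, encoding="unicode"))
```

Note that the manifest lists every file, while the spine names only those that belong to the linear reading order; the cover image appears in the first but not the second.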

In particular, an OEB document that restricts itself to the constructs defined in the OEB specification is called a basic OEB document. Basic OEB documents are valid XML documents that conform fully to the OEB document Document Type Definition (DTD). OEBPS defines a style language that is based on CSS and includes additional properties for page layout, headers, and footers. OEBPS can handle simple formatting with horizontal text of any color, graphics in Portable Network Graphics (PNG) and JPEG formats, and organizational structures such as tables and lists.

3.1.2. Pros and cons

The Cascading Style Sheets Level 1 (CSS1) standard permits document objects to layer and overlap. All objects are treated as boxes or blocks, which cannot be rotated. Cascading Style Sheets Level 2 (CSS2) has extensions that allow text to flow either left to right or right to left, but not vertically or at an angle. The languages that can be represented by CSS are therefore limited to those that run horizontally. Because languages such as Chinese, Japanese, and Korean are commonly written vertically, a workable solution is necessary for the OEB format to be fully adopted. While specific positioning of images and text in a document is possible through CSS and a number of tricks, it is not a capability of the OEB style language itself. Likewise, preformatted text can simulate vertical text, but the effort required would be greater than with other formats, such as PDF, for the same tasks.

On the other hand, because the format is human readable, any content provider with a simple text editor and a basic knowledge of the format can easily create OEB documents. An example of the simplest form of a valid OEB document is shown in Fig. 6. The first two lines describe the document type. As they are the most daunting part of most documents, users who ignore them will likely find the rest of OEB document creation simple. OEB documents will be viewable in most modern Web browsers because the DTD is essentially derived from XHTML 1.0. One difference between viewers designed explicitly for OEBPS and those designed for Web browsing is that OEB viewers are required to support PNG for images, whereas most Web browsers support only JPEG and GIF. The PNG format is gaining acceptance and may eventually replace GIF in viewers designed for HTML and Web applications.

Fig. 5. The structure of the OEBPS package.

Fig. 6. An example of an OEB document.
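To give a feel for how little markup a simple document needs, the Python sketch below holds a minimal XHTML-style document as text and checks that it is well-formed XML. It is not a reproduction of Fig. 6: the two DOCTYPE lines are an assumption recalled from OEBPS 1.0 and must be checked against the specification, and the parse confirms well-formedness only, not validity against the DTD.

```python
import xml.etree.ElementTree as ET

# The DOCTYPE lines are an assumption to verify against the OEBPS
# specification; the rest is ordinary XHTML-style markup.
MINIMAL_DOC = """<!DOCTYPE html PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0 Document//EN"
  "http://openebook.org/dtds/oeb-1.0/oebdoc1.dtd">
<html>
  <head><title>A minimal document</title></head>
  <body><p>Hello, reader.</p></body>
</html>
"""

# Parsing succeeds only if the text is well-formed XML. The external DTD is
# not fetched, so this is not a validity check.
root = ET.fromstring(MINIMAL_DOC)
title = root.findtext("head/title")
paragraphs = [p.text for p in root.iter("p")]
print(root.tag, title, paragraphs)
```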

One of the advantages of XML is its ability to embed logical structure information in a document. Logical structure information enables a multiplicity of applications, including hierarchical browsing, structural hyperlinking, logical component-based storage and retrieval, and style translation. XML is intended to separate a document's content from its presentation by means of an external style sheet mechanism. Together, logical structure information and external style sheets enable the re-purposing of an XML document and multiple views of it.
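The separation of content and presentation can be illustrated with a toy renderer. In the Python sketch below, one logical document is rendered twice using two different "style sheets", each of which is simply a mapping from element name to a text template. The element names, the two style dictionaries, and the function `render` are hypothetical; real style sheets such as CSS are far richer.

```python
import xml.etree.ElementTree as ET

# One logical document; the element names are hypothetical.
DOCUMENT = "<chapter><title>Formats</title><para>OEBPS is based on XML.</para></chapter>"

# Two toy "style sheets": each maps an element name to a text template.
DESKTOP_STYLE = {"title": "== {} ==", "para": "    {}"}
PHONE_STYLE = {"title": "{}:", "para": "{}"}


def render(xml_text, style):
    """Render each child element with the template that its name selects."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root:
        template = style.get(element.tag, "{}")  # unstyled elements pass through
        lines.append(template.format(element.text or ""))
    return "\n".join(lines)


desktop_view = render(DOCUMENT, DESKTOP_STYLE)
phone_view = render(DOCUMENT, PHONE_STYLE)
print(desktop_view)
print(phone_view)
```

The document text is never edited to produce the second view; only the style mapping changes, which is the property that makes re-purposing possible.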

The OEB document DTD is based on HTML, which is intended for rendering to human readers. Although a table of contents can be inserted through the Guide of the OEB package, OEBPS is fundamentally limited as a format for structured documents. OEBPS describes an extension mechanism that enables users to include any kind of XML document, that is, extended OEB documents. However, reading systems are not required to interpret extended OEB documents. Meanwhile, OEBPS supports external style sheets. In particular, the external style sheet mechanism and the fallback function of the OEB package enable eBooks to be directed towards reading devices with various display sizes and bandwidths.
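The fallback function mentioned above can be pictured as a chain that a reading system follows until it reaches a file type it can render. The Python sketch below is one possible reading of that idea, not the procedure defined by OEBPS: the manifest layout, the item ids, and the function `resolve` are hypothetical.

```python
# Hypothetical manifest: item id -> (media type, id of the fallback item or None).
MANIFEST = {
    "fig-svg": ("image/svg+xml", "fig-png"),
    "fig-png": ("image/png", None),
    "doc-pdf": ("application/pdf", "doc-oeb"),
    "doc-oeb": ("text/x-oeb1-document", None),
}


def resolve(item_id, manifest, supported_types):
    """Follow the fallback chain until an item the device can render is found."""
    seen = set()
    while item_id is not None:
        if item_id in seen:
            raise ValueError("fallback chain loops back to " + item_id)
        seen.add(item_id)
        media_type, fallback = manifest[item_id]
        if media_type in supported_types:
            return item_id
        item_id = fallback
    return None  # nothing in the chain is renderable on this device


basic_reader = {"text/x-oeb1-document", "image/png", "image/jpeg"}
print(resolve("fig-svg", MANIFEST, basic_reader))
print(resolve("doc-pdf", MANIFEST, basic_reader))
```

A device that supports PDF would receive `doc-pdf` directly, while the basic reader above is handed the OEB alternative, so a single publication serves both.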

As mentioned before, interoperability means that eBook contents should be exchangeable across platforms among authors, editors, publishers, and content owners. In particular, the human readability and platform independence of the ASCII format can make the OEB format truly interoperable. The OEB format has been designed as a non-proprietary standard, based on open and public domain specifications used on the Web, by a group of over 85 organizations involved in electronic publishing. There is also a large existing user base for OEB-like content, namely the Internet population.

In terms of applicability, because XML is being widely accepted in various fields including wireless Internet communications and databases [22], the application potential of OEB documents is high. Recently, an expansion of OEBPS has been under consideration to include advanced style control, multimedia, user interaction, and navigational structure [12].

3.2. PDF

3.2.1. Overview

PDF lets users view and print a file exactly as its author designed it, without needing the application or fonts used to create the file. PDF is a de facto standard for electronic document distribution worldwide.

A PDF file is a physical container in a file system that holds a PDF document and other data such as the version and the object catalog. A PDF document contains one or more pages, where each page consists of text, graphics, and/or images as well as hyperlinks and sounds. As shown in Fig. 7, a PDF file consists of a header (which specifies the PDF version), a body (a sequence of objects), a cross-reference table (which records where to find each object), and a trailer (which tells where to find the cross-reference table). The body represents the hierarchy of objects that make up a document, as shown in Fig. 8. A root Catalog object references the root of a Page object tree. Each Page object has imageable contents, a thumbnail, and/or annotations.
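The four-part layout can be made concrete with a toy byte string. The Python sketch below assembles a header, a one-object body, a cross-reference table, and a trailer, then reads the version from the header and the table offset from the trailer. The keywords (`%PDF-`, `xref`, `trailer`, `startxref`, `%%EOF`) are those of the PDF file syntax; the byte string itself is an illustration, not a complete viewable PDF, and `describe_layout` is a hypothetical helper.

```python
import re

# A toy byte string laid out like the four parts described above. It is not a
# complete, viewable PDF; the single object is a placeholder.
header = b"%PDF-1.4\n"
body = b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_offset = len(header) + len(body)  # byte position where the table starts
xref = b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
trailer = b"trailer\n<< /Size 2 /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % xref_offset
toy_file = header + body + xref + trailer


def describe_layout(data):
    """Return the version from the header and the xref offset from the trailer."""
    version = re.match(rb"%PDF-(\d+\.\d+)", data).group(1).decode("ascii")
    # The last "startxref" keyword is followed by the byte offset of the table.
    matches = re.findall(rb"startxref\s+(\d+)", data)
    offset = int(matches[-1])
    return version, offset


version, offset = describe_layout(toy_file)
print(version, offset, toy_file[offset:offset + 4])
```

Reading from the end of the file to find the table, and from the table to find each object, is what lets a viewer jump to an object without scanning the whole body.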

PDF is technically a binary format, although the primary contents of a PDF document are ASCII characters. Binary segments are inserted into the header of the file by PDF software to prevent transfer programs from treating the file as a text file. PDF is intended not for authoring documents directly but as a target format into which existing files are converted. There are several ways to create a PDF file. Specifically, Adobe Acrobat Distiller is widely used to convert PostScript to PDF. On desktop platforms, Adobe PDF Writer emulates a printer driver and converts the GDI or QuickDraw commands issued by Windows and Macintosh applications into PDF documents. Additionally, other Adobe and third-party tools support Web pages, scanned images, and Microsoft Word.

Fig. 7. The layout of a PDF file.

3.2.2. Pros and cons

PDF can represent fairly complex formatting in a document. It supports rotation and transformation as well as a form of layering. It can use graphics primitives and include links to other documents and Web sites. Inline media, including sound and video, are supported and can be compressed in a number of ways, as long as the target platform supports the particular media type. The newest PDF specification includes support for JavaScript, which allows a document to interact with the user in a programmed way. The ability of PDF to handle complex formatting would be an advantage to many content providers. For example, certain types of documents use different writing directions to highlight words. PDF is based on the PostScript imaging model, which makes sharp and precise printing possible on almost all printers.

The free Adobe Acrobat Reader, available both as a plug-in for Web browsers and as a standalone application, has played an important role in the wide success of PDF. Furthermore, tools for handling PDF are common on various platforms. This powerful environment for electronic publishing is one of the advantages of PDF, especially given that tools that fully support CSS are still rare. Adobe's advanced technology makes it possible to reduce file size and increase rendering speed. However, a detailed description of this technology is omitted because it is not directly related to the format itself.

Fig. 8. The structure of a PDF document.

To the average user, these advanced tools may be inaccessible and difficult to find or use. Writing PDF files by hand is an even more difficult task. As the PDF format is not human readable, authoring tools are necessary, potentially increasing the cost of eBook development. Some free authoring software exists but may be hard for the average author to use. Conversion utilities also exist, allowing one to create a document in a different program and then convert it to a PDF file. Such conversion may occasionally introduce errors because features of the original program may not be available in PDF.

On the other hand, because many things in PDF are referenced through absolute locations as opposed to relative locations, a document may not fit on the display on which it is intended to be viewed. Implementing PDF on various eBook reading devices may be difficult because of the number of data conversions involved. PDF is built primarily for document exchange on computing systems with a full range of interfaces and outputs. If a PDF file were transferred to an eBook reading device that cannot use some of the data in the file, the space used to store that data would be wasted. A PDF stripping utility, which would remove selected elements from a PDF file depending on the capabilities of the reading device, could reduce the size of documents. As mentioned before, this problem of multiple views can be solved by an external style sheet mechanism that separates the content and the presentation of a document.

PDF has a feature that allows incremental changes to a document. This may provide functionality for writing notes in the electronic margins and for underlining or marking up text. Changes are made on top of the existing material, without removing or replacing it. Such changes could be removed or simply not rendered, if desired, leaving the book clean for the next user. However, it may be difficult to transmit just the changes between users, because they become part of the PDF document.

PDF files are commonly used for electronic documents across platforms because of their advanced style control and the wide availability of free reading software. PDF software offers flexible formatting and compresses a source file into a compact one. PDF has interactive features and other capabilities that make it appropriate for use in eBooks. Additionally, its features can be extended by using various plug-ins. However, although the PDF specification is available without a paid license, the specification is still owned and controlled by Adobe. This is a disadvantage to its use as a standard for eBooks in terms of interoperability and openness, because it would make the future eBook standard dependent on a proprietary format.

With regard to applicability, the re-purposing and multiple views of content have been emphasized. Recently, Adobe Acrobat software has introduced tagged PDF, an enhancement to the PDF specification that allows content to be re-flowed to fit small-screen devices and logical document structure to be embedded in PDF files. However, the need to separate content from presentation remains. Additionally, for database applications such as component-based storage and retrieval, the ability to represent logical structure information should be sufficient to move data into and out of legacy database systems.

4. Conclusions

The success of eBooks will be affected by factors such as the quantity, quality, and cost of content and the feel of reading. With the help of advanced technology, quality must and will go up, and cost must and will go down [4,16]. Recently, a new kind of digital display, an electronic paper that is thin and flexible enough to roll into a tube, has been introduced [10]. However, a large number and variety of eBook titles is still not available because of the lack of a standard format. Content providers are still using many proprietary formats, one for each reading device or software product.

As shown in Table 2, PDF and OEBPS each have their own advantages and disadvantages. Both can represent the majority of documents and are available to most authors and publishers. However, neither is ideal as a universal standard, with respect to either the critical issues that a standard format should satisfy or technical and functional aspects. Each has particular faults that may make it unsuitable for certain audiences. An ideal solution would be to combine the positive aspects of each in a new format. However, such a format might be incompatible with existing software and would invalidate the experience that users have gained with other formats. Some of the problems may be alleviated by improving existing formats and creating advanced software. PDF affords detailed control and maximal flexibility in formatting and presentation. However, its problematic standardization issue, namely the proprietary nature of the format, must be dealt with.

When we consider the future direction of information technology as well as the critical issues proposed above, a standard based on XML is desirable. XML was developed to make it easy to interchange structured documents over the Web. It is flexible enough to describe any logical document structure. In particular, the logical structure of XML documents facilitates various kinds of data processing, including database storage and retrieval. With the help of the style sheet mechanism, content is separated from presentation, and this separation makes multiple views of an XML document possible. This benefit is becoming more and more important because of the availability of different kinds of reading devices and the need for various levels of service.

Therefore, an improved OEB format based on XML-related technologies is more appropriate as a standard. The central problem of OEBPS is that its current version cannot represent complex formatting. For OEBPS to be a general standard for eBooks, it must, at the least, be possible to convert paper-based books into OEB representations. On the other hand, the Extensible Stylesheet Language (XSL), which has recently been approved as a Web standard, supports sophisticated formatting, including the ability to handle text at any angle. A future version of OEBPS may consider XSL as its base style language. To maximize the benefits of XML, a future OEBPS should be able not only to represent logical structure information but also to separate content from presentation.

Recently, there has been growing interest in eLearning and Web-based education. There is a demand for advanced eBooks in technical and scientific fields such as physics and the computational sciences [8]. To apply eBook technology to the eLearning environment, future eBooks may be extended to incorporate recent developments in XML-related technologies, including MathML and Scalable Vector Graphics (SVG). In particular, interactivity and user-tailored eBooks based on multimedia databases might be important [13,15].

Because PDF addresses different aspects of the publishing process than OEBPS does, the two formats are not direct competitors, and PDF can be used together with OEBPS in a variety of ways. For instance, a reading system might use PDF internally as a rendering format for eBook contents. In addition, existing PDF-format content, like content in any other non-OEBPS format, can be embedded in an OEBPS publication, provided that the publication contains an alternative representation of the content that can be used by reading systems that lack PDF support.
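The alternative-representation requirement can be sketched as a fallback lookup. The manifest below is a simplified, hypothetical fragment modeled loosely on a package manifest; the file names, item identifiers, and the resolve function are assumptions for illustration, and a real package file carries more structure than is shown here.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical manifest: the PDF item names a fallback item
# that reading systems without PDF support can use instead.
MANIFEST_XML = """<manifest>
  <item id="fig1-pdf" href="figure1.pdf"
        media-type="application/pdf" fallback="fig1-png"/>
  <item id="fig1-png" href="figure1.png" media-type="image/png"/>
  <item id="ch1" href="chapter1.html" media-type="text/x-oeb1-document"/>
</manifest>"""


def resolve(manifest, item_id, supported_types):
    """Follow the fallback chain until a supported media type is found.

    Returns the href of the first usable item, or None if the chain
    ends (or loops) without reaching a supported representation.
    """
    items = {item.get("id"): item for item in manifest.findall("item")}
    seen = set()
    current = item_id
    while current is not None and current not in seen:
        seen.add(current)
        item = items.get(current)
        if item is None:
            return None
        if item.get("media-type") in supported_types:
            return item.get("href")
        current = item.get("fallback")
    return None


manifest = ET.fromstring(MANIFEST_XML)
with_pdf = resolve(manifest, "fig1-pdf", {"application/pdf", "image/png"})
without_pdf = resolve(manifest, "fig1-pdf", {"image/png"})
unsupported = resolve(manifest, "fig1-pdf", {"text/plain"})
```

A reading system with PDF support would display the PDF item directly, while one without it would follow the chain to the PNG alternative. The visited-set check guards against a publication whose fallback references form a cycle.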

Table 2
The advantages and disadvantages of OEBPS and PDF

Format | Advantages                                          | Disadvantages
-------+-----------------------------------------------------+-----------------------------------------------
OEBPS  | Interoperability by a large number of organizations | Limited level of style control
       | Open format based on Web standards                  | Lack of advanced authoring tools
       | Applicability and extensibility                     | Limited level of logical structure information
       | Re-purpose and multiple views of contents           |
       | Human readability based on ASCII text               |
       | Simplicity and conciseness                          |
-------+-----------------------------------------------------+-----------------------------------------------
PDF    | Advanced style control                              | Vendor-owned format
       | Free Acrobat Reader over various platforms          | Intended for converting a file
       | Correct printing on any printing device             | Difficult to modify or edit contents
       | Easy extension with various plug-ins                | No logical structure information
       | Compressed smaller file sizes/font embedding        | Lack of re-use and multiple views of contents
       | Advanced rendering technology of Adobe              |


On the other hand, because the current version of the OEB specification does not address DRM and copy protection, the eBook industry is still likely to use proprietary wrappers, for example, Microsoft’s LIT format and MobiPocket’s PCF format, for end-user delivery. To ensure the success of eBooks, a DRM standard should also be issued. OEBF recognizes that these are extremely important issues for the publishing community and is leading work in this area. Recently, the Electronic Book Exchange (EBX) working group, an industry consortium for protecting copyright in eBooks, merged with OEBF. The OEBF Rights and Rules Working Group is the center of DRM activity in OEBF and is collaborating with the Publication Structure Working Group to provide the electronic publishing community with a consistent and mutually supporting set of specifications.

Acknowledgements

Certain commercial equipment, instruments, or materials are identified in this paper to foster understanding. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.

Appendix A. URLs of Interest

Internet Public Library, http://www.ipl.org/

Project Gutenberg, http://www.promo.net/pg/

netLibrary page, http://www.netlibrary.com/

NuvoMedia, http://www.nuvomedia.com/

SoftBook Press, http://www.softbook.com/

Gemstar eBook, http://www.ebook-gemstar.com/

Open eBook Forum, http://www.openebook.org/

OEB Publication Structure 1.0.1 Specification, http://www.openebook.org/oebps/oebps1.0.1/download/index.htm

PDF Reference Manual 1.3 page, http://partners.adobe.com/asn/developer/acrosdk/docs/pdfspec.pdf

National Institute of Standards and Technology, http://www.nist.gov/

Extensible Markup Language (XML) 1.0 Specification page, http://www.w3.org/TR/REC-xml

XML frequently asked questions, http://www.ucc.ie/xml/

XML resource page, http://www.computer.org/internet/xml/

HTML 4.01 Specification page, http://www.w3.org/TR/html401/

Acrobat Reader page, http://www.adobe.com/products/ebookreader/

AportisDoc page, http://www.aportis.com/

MobiPocket Reader page, http://www.mobipocket.com/

Microsoft Reader page, http://www.microsoft.com/reader/

Palm Reader page, http://www.peanutpress.com/

Everybook Viewer page, http://www.everybook.net/

eRocket page, http://www.rocket-library.com/

Unicode Consortium page, http://www.unicode.org/

Wireless Application Protocol (WAP) Forum, http://www.wapforum.org

Portable Network Graphics (PNG) page, http://www.libpng.org/pub/png/

Scalable Vector Graphics (SVG) 1.0 Specification page, http://www.w3.org/TR/SVG/

RTF 1.6 Specification page, http://msdn.microsoft.com/library/specs/rtfspec.htm

XHTML 1.0 Specification page, http://www.w3.org/TR/xhtml1/

XSL 1.0 Specification page, http://www.w3.org/TR/xsl/

CSS page, http://www.w3.org/Style/CSS/

CSS Level 1 Specification page, http://www.w3.org/TR/REC-CSS1

CSS Level 2 Specification page, http://www.w3.org/TR/REC-CSS2/

Dublin Core Metadata Initiative page, http://dublincore.org/

MathML 1.01 Specification, http://www.w3.org/TR/REC-MathML/

IMS Global Learning Consortium page, http://www.imsproject.org/

References

[1] R. Burk, Don’t be afraid of E-Books, Library Journal 125 (7) (2000) 25–42.

[2] A. Crossen, J. Budzik, M. Warner, L. Birnbaum, K.J. Hammond, Xlibris: an automated library research assistant, Proc. Int’l Conf. IUI, Santa Fe, NM (2001) 49–52.

[3] S. Ditlea, The real E-Books, Technology Review 103 (4) (2000) 70–78.

[4] B.L. Harrison, E-Books and the future of reading, IEEE Computer Graphics and Applications 20 (3) (2000) 32–39.

[5] D.T. Hawkins, Electronic books: a major publishing revolution: Part 1. General considerations and issues, Online 24 (4) (2000) 14–28.

[6] D.T. Hawkins, Electronic books: a major publishing revolution: Part 2. The marketplace, Online 24 (5) (2000) 18–36.

[7] J. Heilmann, H. Linna, The technology and applications of the new generation of electronic books, Proc. Conf. the Technical Association of the Graphic Arts (TAGA) (2001) 581–590.

[8] R.H. Landau, D. Vediner, P. Wattanakasiwich, K.R. Kyle, Future Scientific Digital Documents with MathML, XML, and SVG, IEEE Computing in Science and Engineering 4 (2) (2002) 77–85.

[9] P. Lyman, H.R. Varian, How Much Information, http://www.sims.berkeley.edu/how-much-info.

[10] C.C. Mann, Electronic paper turns the page, Technology Review 104 (2) (2001) 42–48.

[11] C.C. Marshall, G. Golovchinsky, M.N. Price, Digital libraries and mobility, Communications of the ACM 44 (5) (2001) 55–56.

[12] Open eBook Forum, Toward the Next Version of the Publication Structure Specification, http://www.openebook.org/members/ps/20000928-expanded.htm (2000) (members only).

[13] G. Ozsoyoglu, N.H. Balkir, G. Cormode, Z.M. Ozsoyoglu, Electronic books in digital libraries, Proc. the Forum on Research and Technology Advances in Digital Libraries (ADL) (2000) 5–14.

[14] L. Press, From P-books to E-books, Communications of the ACM 43 (5) (2000) 17–21.

[15] J.C. Principe, N.R. Euliano, W.C. Lefebvre, Innovating adaptive and neural systems instruction with interactive electronic books, Proceedings of the IEEE 88 (1) (2000) 81–95.

[16] A.R. Procter, Electronic paper and the E-books: a real or imagined threat to paper? Pulp and Paper of Canada 102 (6) (2002) 8.

[17] J. Roberts, O. Slattery, D. Kardos, Rotating-wheel Braille display for continuous refreshable Braille, Proc. Society for Information Display (SID) Symposium Digest of Technical Papers (2000) 1130–1133.

[18] B.N. Schilit, G. Golovchinsky, M.N. Price, Beyond paper: supporting active reading with free form digital ink annotations, Proc. Int’l Conf. CHI, NY (1998) 249–256.

[19] B.N. Schilit, M.N. Price, G. Golovchinsky, Digital library information appliances, Proc. ACM Digital Libraries (1998) 217–226.

[20] B.N. Schilit, M.N. Price, G. Golovchinsky, K. Tanaka, C.C. Marshall, The reading appliance revolution, IEEE Computer 32 (1) (1999) 65–73.

[21] K. Schreiner, E-Books: it’s all in the revolution, IEEE Multimedia 7 (2) (2000) 15–17.

[22] L. Seligman, A. Rosenthal, XML’s impact on databases and data sharing, IEEE Computer 34 (6) (2001) 59–67.

[23] S. Sottong, E-Book technology: waiting for the “false Pretender”, Information Technology and Libraries 20 (2) (2001) 72–80.

[24] R. Vidgen, S. Goodwin, XML: what is it good for? Computing and Control Engineering Journal 11 (3) (2000) 119–124.

Kyong-Ho Lee received BS, MS, and PhD degrees in computer science from Yonsei University, Seoul, Korea, in 1995, 1997, and 2001, respectively. Currently, he is a guest researcher in the IT Laboratory at the National Institute of Standards and Technology (NIST), MD, USA. Prior to coming to NIST, he was a member of the Korea EBook standard working group. His research interests include multimedia document engineering, knowledge and data engineering, pattern matching, and XML. He is a member of the Korea Information Science Society, the Korea Information Processing Society, the Korea Multimedia Society, and the IEEE Computer Society.

Nicholas Guttenberg is currently majoring in physics at McGill University in Montreal, Quebec. He has worked on digital media storage and displays at the National Institute of Standards and Technology, Gaithersburg, MD, for the past 4 years.

Victor R. McCrary is currently chief of the new Convergent Information Systems Division at the National Institute of Standards and Technology in Gaithersburg, MD. In his current position, he leads a group of researchers and computer scientists in NIST’s Information Technology Laboratory. His organization conducts research in display characterization, optical disk (DVD) storage and reliability, biometrics, electronic books, trust management, interactive digital television, and digital cinema. Dr. McCrary organized the world’s first conference on electronic books in October 1998, and subsequent conferences in 1999 and 2000. His research group has developed a prototype of the electronic book and a Braille reader for electronic books. He also served as the chair of the newly formed Open Electronic Book Forum, an industry group dedicated to the development and promotion of standards for electronic books. Concurrently, Dr. McCrary is an adjunct professor in the Executive Masters of Technology Management Program at the University of Pennsylvania. The program is jointly administered by the Graduate School of Engineering and the Wharton School of Business.

K.-H. Lee et al. / Computer Standards & Interfaces 24 (2002) 227–239