MHE - Consultants for Document and Datament Technologies
The XML Bubble
William J. “Bill” McCalpin
EDPP, CDIA, MIT, LIT
Principal, MHE
Xplor 21st Global Conference and Exhibit
Miami Beach, Florida
October 30, 2000
Introduction
The Hegelian Dialectic
Thesis, Antithesis, Synthesis
In the philosophy of Hegel, these words show the inevitable transition of thought, by contradiction and reconciliation, from an initial conviction to its opposite and then to a new, higher conception that involves but transcends both of them
The Hegelian Dialectic
• Thesis: Most businesses have well-established, productive legacy systems
• Antithesis: XML is springing forth everywhere
• Synthesis: XML will be integrated with legacy systems - enhancing some processes, changing many others, and eliminating some altogether
• In short, XML will affect what you do
The Document In The 20th Century
What Is A Document?
• The American Heritage Dictionary defines a document as “information in writing placed on a medium such as paper, often used as a record.”
• Documents have been placed on clay tablets, gold leaf, animal skins, all types of paper, microfilm, optical storage, and so on
Information And Presentation
• In every case, the document represents a fundamental union of information and presentation
• But “presentation” presumes that the primary audience for the document is a human being
• With the coming of the Internet, this is no longer the case
The Curse Of Presentation
• Composition products require that you specify a printer, even before you know where the document will print
Why Are Print, Image, And Presentation Formats Incompatible?
Printing And Imaging Formats
• Many printing formats: AFP, Metacode, DJDE, XES (UDK), PostScript, PCL, etc.
• All formats use external resources like fonts, forms, graphics, etc., although sometimes inconsistently
• Most are escape-sequence based, some are formal data architectures, and some are almost programming languages
Printing And Imaging Formats
• Many imaging formats - while most used CCITT Group 4 for image compression, most also had proprietary data wrappers
• Later systems adopted text-based formats such as PDF, although storing other print streams is not unknown
• Systems which store text-based formats must wrestle with resource issues
Different Print Formats
• Why do printers have different formats? Because of physical constraints imposed by the hardware:
– resources reduce the amount of data sent through the pipeline to the printer
– pages must be imaged in a fraction of a second
– complex graphics can be developed on the printer, but this needs a special language
Different Imaging Formats
• Why do imaging systems have different formats? Because of physical constraints imposed by the hardware:
– Mass storage was expensive
– Indexing schemes were too close to the application
– Text is avoided sometimes because of resource issues
– Interoperability with other products an issue
Result
• In each case, data architecture decisions were made in order to enhance some aspect of legibility of the stored objects.
• If there were no requirement to present the information (to a human reader), then the requirement for custom data formats for each vendor would probably disappear!
Universal Literacy
Who’s reading our documents?
The Road To Universal Literacy
• First, only the few could read
• After the printing press, the many began to read
• Eventually, educational reforms brought the ability to read to all
Literacy In The Internet Age
• Can there be a spread of literacy beyond “all”?
• How many webpages have you ever read?
• You will never be able to keep up with the Web – alone
Intelligent Agents
• Just around the corner is software that will read the Web for us – not search, but read
• So we have to spread literacy to an audience beyond “all” – people, that is
• Does increased quality in presentation mean better computer literacy?
Noise On The Net
• Think of the average webpage:
– three dimensional spinning objects
– marquees scrolling across the bottom
– multiple frames
– bookmarks
– audio
• These items are all designed to attract the eye – your eye
• This does nothing for the machine reading the webpage
The Cost Of Data Differences
“NASA lost a $125 million Mars orbiter because one engineering team used metric units while another used English units for a key spacecraft operation...” CNN 9/30/99
The Nature Of XML
XML And SGML
• XML is eXtensible Markup Language
• XML is a subset (a restricted form) of SGML, Standard Generalized Markup Language, an ISO standard (ISO 8879)
• XML is “extensible” because people and enterprises with common interests get together to define the tags which describe their data
XML And Print Formats
• In most print formats, something like account number would be:
– AMB 200 AMI 300 SCFL 01 STO 0,90 TRN 12345-67890
• In XML, the same information is:
– <account_number>12345-67890</account_number>
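As a rough illustration of why the tagged form is easier for software to read, the following Python sketch pulls the account number out of the XML fragment using only the standard library. The closing tag is assumed so that the fragment is well-formed, and the variable names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Fragment from the slide, with a closing tag assumed so it is well-formed XML.
fragment = "<account_number>12345-67890</account_number>"

element = ET.fromstring(fragment)

# The tag name says what the value is; no layout information is involved.
field_name = element.tag
account_number = element.text

print(field_name, "=", account_number)
```

The reader never has to know where on the page the value would be printed, only what it is called.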
XML And Print Formats
• The nature of all print formats is to be focused on the presentation of the information.
• The nature of XML is focused on the “author’s content”, that is, information is described as what it is, not how it looks.
Why XML Over Print?
• Given that print formats are focused on the presentation, it is often difficult for the non-human reader to derive information out of the print data.
• E.g., we could have:
– AMB 200 AMI 300 SCFL 01 STO 0,90 TRN 12345 RMI 120 TRN - RMI 24 TRN 67890
– Note the data is not required to be contiguous
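To make the difficulty concrete, here is a minimal Python sketch that tries to recover the account number from the fragmented example. It treats the mnemonics on the slide as plain text, which is an assumption for illustration only (real print data streams are binary), and it assumes that the TRN entries carry the text.

```python
import re

# Mnemonic print stream from the slide, treated here as plain text.
# Real print data streams are binary; the text form is an assumption.
print_stream = (
    "AMB 200 AMI 300 SCFL 01 STO 0,90 "
    "TRN 12345 RMI 120 TRN - RMI 24 TRN 67890"
)

# Collect the operand of every TRN entry, in stream order.
fragments = re.findall(r"\bTRN\s+(\S+)", print_stream)

# Rejoining is a guess: nothing in the stream says that these
# three pieces belong to one field called "account number".
account_number = "".join(fragments)

print(fragments)
print(account_number)
```

The code works for this one line, but only because the programmer already knows which pieces belong together. The stream itself carries positions and fonts, not meaning.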
Separating Information From Presentation
• XML enables the total separation of information from presentation
• Thus, some XML objects have only tagged information, while others have content and presentation information
[Figure: diagram with boxes labeled XML, XSL, and XML]
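The separation can be sketched in Python as one content-only XML object rendered by two independent presentation routines. In the slide's terms, XSL would play the presentation role; plain Python functions stand in for it here because the Python standard library has no XSLT processor. The element names, values, and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical content-only XML: tags say what the data is, not how it looks.
statement_xml = """
<statement>
  <account_number>12345-67890</account_number>
  <balance>150.00</balance>
</statement>
""".strip()


def render_text(root):
    """Plain-text presentation of the content, one field per line."""
    return "\n".join(f"{child.tag}: {child.text}" for child in root)


def render_html(root):
    """HTML table presentation of the same content."""
    rows = "".join(
        f"<tr><th>{escape(child.tag)}</th><td>{escape(child.text)}</td></tr>"
        for child in root
    )
    return f"<table>{rows}</table>"


root = ET.fromstring(statement_xml)
text_view = render_text(root)
html_view = render_html(root)

print(text_view)
print(html_view)
```

Either renderer can be changed or replaced without touching the XML, and a program that needs only the data can ignore both.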
The Four Spaces
Dr. Davidson’s DocumentSpace
• Dr. Keith Davidson, EDPP, hypothesized that we work in something called the “DocumentSpace”
• He believes that industries will become spaces under the influence of the Internet
Three Spaces
• Dr. Davidson stated that there were three spaces: PrintSpace, MarketSpace, and DecisionSpace
• PrintSpace comprises our existing industry
• MarketSpace covers documents used in financial transactions
• DecisionSpace deals with documents used in knowledge management
Three Spaces Become Four
• I have added a fourth space: ArchiveSpace, the use of documents in archival and records management to preserve information
• These four spaces can be viewed as --->
The Use Of The Document In The Four Spaces
Document And Information
• The document is used as a container of information, particularly in the exchange of information across the boundaries between the four spaces
• Documents are used for two reasons:
– (1) The lack of common data standards across the four spaces, and
– (2) The requirement that humans be able to read and process the information
Print To Image
Print To Image Format
• Print formats are Metacode, DJDE, AFP, PCL, PostScript, and so on
• Image formats are TIFF, MO:DCA, other proprietary formats using CCITT-4, and PDF
• Only AFP & MO:DCA, and PostScript & PDF are closely related, but PostScript to PDF requires a transform, and AFP and MO:DCA often aren’t implemented the same way
Print To Market
Print To Market Formats
• Print formats are Metacode, DJDE, AFP, PCL, PostScript, and so on
• Financial Interchange formats are OFX/IFX, XML, and “transaction” data
• The significant data must be extracted out of the print stream to create data for SGML formats - a sometimes hazardous process
• However, using original transaction data may not be correct
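One way to reduce the hazard of extraction is to check each extracted value before it is tagged. The Python sketch below is only an illustration under stated assumptions: the field names, the sample values, the format checks, and the `<statement>` element are all hypothetical and do not come from OFX, IFX, or any other interchange format.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fields already pulled out of a print stream.
extracted = {"account_number": "12345-67890", "amount_due": "150.00"}

# Assumed format checks; a real interchange format defines its own rules.
expected = {
    "account_number": re.compile(r"\d{5}-\d{5}"),
    "amount_due": re.compile(r"\d+\.\d{2}"),
}


def to_xml(fields):
    """Build a <statement> element, refusing values that fail their check."""
    root = ET.Element("statement")
    for name, value in fields.items():
        if not expected[name].fullmatch(value):
            raise ValueError(f"suspicious value for {name}: {value!r}")
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


statement = to_xml(extracted)
print(statement)
```

A fragment such as "12345" on its own would be rejected rather than silently tagged as a complete account number, which is the kind of error that non-contiguous print data invites.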
Print To Knowledge
Print To Knowledge Formats
• Print formats are Metacode, DJDE, AFP, PCL, PostScript, and so on
• True Knowledge Management does not yet exist - it’s often blob management
• XML and its many related standards will make KM possible, if you think of KM as something like human knowledge
• As noted, XML out of existing processes can be hazardous
The Growth Of The XML Bubble
[Figure, repeated across seven slides: a diagram of business processes labeled Policy Print, Reports, 1:1 Mark., Billing, EDI, Compliance, Campaign Manage., CRM, Pol. & Proc., Archive, Notices, New Sales, HR, and Reprints; from the second slide onward the labels XML and EBPP are added]
William J. “Bill” McCalpin
EDPP, CDIA, MIT, LIT
Principal, MHE
1400 Cheyenne Dr.
Richardson, Texas 75080-3921
972-231-3660 (v) 972-690-4521 (f)