from xml to ebooks part 2: the details

From XML to eBooks, Part II: The Devil is in the Details

Richard Hamilton, XML Press

[email protected]

Slight Recap

For most tech comm situations:
● Two formats matter: ePub & Kindle
● XML processes (especially DocBook or DITA) will make things much easier

● Content strategy is the hardest part
● Authoring is next hardest
● Production is tough, but doable
● Distribution is easiest

Overview

● Authoring
● Storing and managing content
● Producing output

Content strategy is critical, but not for this presentation

Authoring

Authoring formats at XML Press:
● DocBook XML: 5 books
● DITA XML: 2 books (so far)
● Word: 4 books
● Wiki (Confluence): 1 book
● Wiki (PBworks): 3 books
● Author-it: 1 book
● InDesign: 1 book

All but 3 (1 each in Word, InDesign, & Author-it) were ultimately produced from XML

Authoring in a Wiki

● Based on PBworks
● Authors, editor, reviewers, and indexers work in the wiki
● Parallel access throughout most of the process
● Content exported for proofs as needed
● Content moved to SVN for final production

Requires a clean, clear breaking point where content moves from wiki to SVN

HTML to XHTML (tool: Tidy)
● Convert
● Cleanup

Pre-process XHTML (tool: XSL stylesheet)
● Remove empty elements
● Normalize
● Handle headings

Convert to DocBook (tool: Herold)
● Infer hierarchy
● Convert
● Define structure

Process Supplemental Markup (tool: Perl script)
● Index entries
● Footnotes
● Endnotes
● Sidebars
● Epigraphs
● Block quotes
● Convert all to DocBook

Supplemental Markup

Indexing:
{in primary; secondary; tertiary}
{id term 1; term 2}
{is id; primary; secondary; tertiary}
{ie id}
{is term; see term}

Footnotes, sidebars, etc.:
{if footnote text}
{ib sidebar text}
{ip epigraph text;;attribution;;source}
{it endnote text}
{iq quotation;;attribution}
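
To make the supplemental markup concrete, here is a minimal sketch of how a few of these codes could be turned into DocBook. The production tool described in these slides is a Perl script; this is a Python stand-in, not that script. The element mapping (`indexterm`, `footnote`, `blockquote`) and the function names are assumptions for illustration, and only three of the codes are handled. It also assumes the input is already XML, so text is passed through without further escaping.

```python
import re

# Matches {xx body}: a two-letter code starting with "i", whitespace, then
# a body that contains no braces. Nested markup is not handled in this sketch.
TOKEN = re.compile(r"\{(i[a-z])\s+([^{}]*)\}")


def index_term(body):
    """{in primary; secondary; tertiary} -> <indexterm> (assumed mapping)."""
    levels = [part.strip() for part in body.split(";")]
    tags = ["primary", "secondary", "tertiary"]
    inner = "".join(
        f"<{tag}>{text}</{tag}>" for tag, text in zip(tags, levels) if text
    )
    return f"<indexterm>{inner}</indexterm>"


def footnote(body):
    """{if footnote text} -> <footnote> (assumed mapping)."""
    return f"<footnote><para>{body.strip()}</para></footnote>"


def quotation(body):
    """{iq quotation;;attribution} -> <blockquote> (assumed mapping)."""
    parts = [part.strip() for part in body.split(";;")]
    attribution = parts[1] if len(parts) > 1 else ""
    attr = f"<attribution>{attribution}</attribution>" if attribution else ""
    return f"<blockquote>{attr}<para>{parts[0]}</para></blockquote>"


# Hypothetical dispatch table; codes not listed here are left untouched.
HANDLERS = {"in": index_term, "if": footnote, "iq": quotation}


def convert(text):
    """Replace every recognized {code body} token with DocBook markup."""
    def replace(match):
        handler = HANDLERS.get(match.group(1))
        return handler(match.group(2)) if handler else match.group(0)

    return TOKEN.sub(replace, text)


if __name__ == "__main__":
    print(convert("Editors{in XML; editors} matter{if See chapter 2.}"))
```

Keeping the codes in a dispatch table means the remaining ones (sidebars, epigraphs, endnotes, and the other index forms) could be added one handler at a time.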

Cleanup (tool: XSL stylesheet)
● Handle links
● Validate structure

What about Confluence?

Confluence, Tech Comm, Chocolate used K15t Software's DocBook export plugin, which also handles much of what the supplemental markup handles.

Storing and managing content

Content has one home, but...
● That home can change at certain well-defined points
● For XML, SVN is the home
● For wiki, the wiki is the home until production, then SVN is the home
● Home changes once, irrevocably
● All production comes from SVN

ePub Structure

Top Level Directory
● mimetype (file): contains application/epub+zip; identifies this as an ePub file
● META-INF (folder)
    ● container.xml (file): points to package file in OEBPS folder
● OEBPS (folder): contents on the next slide

An ePub file is simply a zip file of this structure, with mimetype as the first file in the zip. It uses the .epub suffix.
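
The packaging step can be sketched with Python's standard `zipfile` module. This is an illustration, not the command used in the production process described here. The function name `build_epub` is hypothetical, and the sketch assumes the source folder already contains `mimetype`, `META-INF`, and `OEBPS`, and that the output path lies outside that folder. It writes `mimetype` uncompressed, which the ePub container format requires in addition to it being the first entry.

```python
import zipfile
from pathlib import Path


def build_epub(source_dir, epub_path):
    """Zip an unpacked ePub folder, writing the mimetype file first."""
    source = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w") as epub:
        # The mimetype entry goes first and is stored without compression.
        epub.write(source / "mimetype", "mimetype",
                   compress_type=zipfile.ZIP_STORED)
        # Everything else (META-INF, OEBPS, ...) can be compressed normally.
        for path in sorted(source.rglob("*")):
            if not path.is_file():
                continue
            arcname = path.relative_to(source).as_posix()
            if arcname == "mimetype":
                continue  # already written above
            epub.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)
    return epub_path
```

A call such as `build_epub("book", "book.epub")` would then produce the file, where both names are placeholders.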

Ebook production - DocBook: OEBPS Directory Contents

OEBPS (folder). This folder is like any website.
● CSS file: xyz.css
● HTML TOC: ch01-toc.xhtml
● HTML content: ch01.xhtml, ch01s02.xhtml, ch01s03.xhtml, …, chXX.xhtml
● Media: figure.jpg, screen.png, ...
● Navigation file: toc.ncx
● OPF file: package.opf

Notes:
● Names are arbitrary
● Sub-folders ok

NCX View in Kindle

Button for NCX view in emulator

OPF (Open Packaging Format)

<package ...>
  <metadata ...>
    … Dublin Core Metadata elements …
  </metadata>
  <manifest>
    <item id="ncx" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
    <item id="toc" media-type="application/xhtml+xml" href="ch01-toc.xhtml"/>
    <item id="ch01" media-type="application/xhtml+xml" href="ch01.xhtml"/>
    …
  </manifest>
  <spine toc="ncx">
    <itemref idref="cover"/>
    <itemref idref="toc"/>
    …
  </spine>
  <guide>
    <reference type="text" title="Startup page" href="ch01.xhtml"/>
  </guide>
</package>

What each part of the OPF file answers:
● <metadata>: Metadata
● <manifest>: What's in the ePub?
● <spine>: What order is it in?
● <guide>: Where do you start? (Change starting place here)
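
As a rough illustration of how the manifest, spine, and guide fit together, the following sketch reads an OPF file with Python's standard library and reports the reading order and the start page. The function names are hypothetical. It assumes the `<package>` element declares the standard OPF namespace `http://www.idpf.org/2007/opf`, which the slide elides.

```python
import xml.etree.ElementTree as ET

# Assumption: the package element uses the standard OPF namespace.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(opf_text):
    """Return the hrefs of the spine items, in reading order."""
    root = ET.fromstring(opf_text)
    # Manifest: what's in the ePub (id -> file).
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # Spine: what order it is in (list of manifest ids).
    return [
        manifest.get(ref.get("idref"))
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]


def start_page(opf_text):
    """Return the href of the guide's 'text' reference, or None."""
    root = ET.fromstring(opf_text)
    ref = root.find("opf:guide/opf:reference[@type='text']", OPF_NS)
    return ref.get("href") if ref is not None else None
```

A spine entry whose `idref` has no matching manifest item comes back as `None`, which makes that kind of mismatch easy to spot.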

Other tweaks to XHTML

● Remove empty paragraphs (vestige of wiki past)
● Remove the <p> wrapper around the first paragraph inside an <li> (for original Kindle)
● Work around a few epubcheck anomalies
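
The first tweak, removing empty paragraphs, might look something like the sketch below. The slides describe the cleanup only at this level, so everything here is an assumption for illustration: the function name is hypothetical, the input is assumed to be well-formed XHTML in the standard XHTML namespace, and "empty" is taken to mean no child elements and only whitespace text.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"


def remove_empty_paragraphs(xhtml_text):
    """Drop <p> elements with no children and only whitespace text.

    Returns the cleaned document as a string and the number removed.
    """
    # Serialize XHTML elements without a namespace prefix.
    ET.register_namespace("", XHTML)
    root = ET.fromstring(xhtml_text)
    p_tag = f"{{{XHTML}}}p"
    removed = 0
    # ElementTree has no parent pointers, so visit each element as a parent.
    for parent in list(root.iter()):
        for child in list(parent):
            is_empty = (
                child.tag == p_tag
                and len(child) == 0
                and not (child.text or "").strip()
                # Leave the paragraph alone if real text follows it,
                # since removing the element would drop that text too.
                and not (child.tail or "").strip()
            )
            if is_empty:
                parent.remove(child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```

Paragraphs that hold only an image or other child element are kept, since they are not empty in any useful sense.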

ePub/Kindle from DocBook

● Based on open-source DocBook stylesheets
● ePub3 transform by Bob Stayton
● CSS added
● A few minor tweaks for personal preference
● Kindle (.mobi) produced using kindlegen
● Amazon tests .mobi and converts to smaller file

Generating ePub from DocBook

DocBook XSL
● ePub3 transform
● Based on HTML5 transform
● Generates all ePub3 files

File cleanup
● Adjust .opf file
● Clean up XHTML

File preparation
● Copy images
● Copy in CSS file
● Run zip to create .epub file

ePub/Kindle from DITA

● Based on DITA Open Toolkit and DITA for Publishers toolkit extensions (developed by Eliot Kimber)
● Does not require content to use DITA for Publishers specialization
● Generates ePub2 compliant files
● Kindle (.mobi) produced using kindlegen
● Amazon tests .mobi and converts to smaller file

Thanks for listening

Richard Hamilton, XML Press
