From XML to eBooks Part 2: The Details

From XML to eBooks Part II: The Devil is in the Details Richard Hamilton XML Press [email protected]


Post on 06-May-2015


DESCRIPTION

LavaCon 2012 presentation about creating eBooks from DocBook XML. This presentation provides details of the XML Press process for creating eBooks. A companion presentation (From XML to eBooks Part 1: Overview) is an introduction.

TRANSCRIPT

Page 1: From XML to eBooks Part 2: The Details

From XML to eBooks
Part II: The Devil is in the Details

Richard Hamilton
XML Press

[email protected]

Page 2: From XML to eBooks Part 2: The Details

Slight Recap

For most tech comm situations:
●Two formats matter: ePub & Kindle
●XML processes (especially DocBook or DITA) will make things much easier
●Content strategy is the hardest part
●Authoring is next hardest
●Production is tough, but doable
●Distribution is easiest

Page 3: From XML to eBooks Part 2: The Details

Overview

●Authoring
●Storing and managing content
●Producing output

Content strategy is critical, but not for this presentation

Page 4: From XML to eBooks Part 2: The Details

Authoring

Authoring formats at XML Press:
●DocBook XML: 5 books
●DITA XML: 2 books (so far)
●Word: 4 books
●Wiki (Confluence): 1 book
●Wiki (PBworks): 3 books
●Author-it: 1 book
●InDesign: 1 book

All but 3 (1 each in Word, InDesign, & Author-it) were ultimately produced from XML

Page 5: From XML to eBooks Part 2: The Details

Authoring in a Wiki

●Based on PBworks
●Authors, editor, reviewers, and indexers work in the wiki
●Parallel access throughout most of the process
●Content exported for proofs as needed
●Content moved to SVN for final production

Requires a clean, clear breaking point where content moves from wiki to SVN

Page 6: From XML to eBooks Part 2: The Details
Page 7: From XML to eBooks Part 2: The Details
Page 8: From XML to eBooks Part 2: The Details

HTML to XHTML

●Convert
●Cleanup

Tool: Tidy

Page 9: From XML to eBooks Part 2: The Details

Pre-process XHTML

●Remove empty elements
●Normalize
●Handle headings

Tool: XSL stylesheet
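The "remove empty elements" step can be sketched in a few lines. The slide names an XSL stylesheet as the actual tool; the Python below is a stand-in for illustration only. The function names, the list of tags allowed to stay empty, and the namespace-free sample input are all assumptions.

```python
import xml.etree.ElementTree as ET

# Assumption: elements that are legitimately empty in XHTML and must be kept.
VOID_TAGS = {"br", "hr", "img"}


def is_empty(elem):
    """True if the element has no children and no non-whitespace text."""
    return len(elem) == 0 and not (elem.text or "").strip()


def remove_empty(parent):
    """Recursively drop empty, non-void children, preserving tail text."""
    for child in list(parent):
        remove_empty(child)
        if child.tag not in VOID_TAGS and is_empty(child):
            # Keep any text that followed the removed element.
            tail = child.tail or ""
            idx = list(parent).index(child)
            if idx == 0:
                parent.text = (parent.text or "") + tail
            else:
                prev = parent[idx - 1]
                prev.tail = (prev.tail or "") + tail
            parent.remove(child)


def preprocess(xhtml):
    """Parse an XHTML fragment, strip empty elements, and serialize it again."""
    root = ET.fromstring(xhtml)
    remove_empty(root)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample; real wiki exports carry the XHTML namespace.
sample = "<div><p>Keep me</p><p>  </p><p><b></b></p><p>Line<br/>break</p></div>"
cleaned = preprocess(sample)
print(cleaned)
```

Children are processed before their parent, so a paragraph that only wraps an empty inline element is removed as well.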

Page 10: From XML to eBooks Part 2: The Details

Convert to DocBook

●Infer hierarchy
●Convert
●Define structure

Tool: Herold
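"Infer hierarchy" means turning a flat run of headings and paragraphs into nested sections. The sketch below shows one common way to do that with a stack. It is not Herold's implementation, and the element names, function name, and sample input are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def infer_hierarchy(body):
    """Turn a flat run of h1..h6 and p elements into nested DocBook-style sections."""
    root = ET.Element("chapter")
    # Stack of (heading level, container element); the chapter counts as level 0.
    stack = [(0, root)]
    for elem in body:
        if elem.tag in HEADINGS:
            level = int(elem.tag[1])
            # Close any open sections at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section")
            ET.SubElement(section, "title").text = "".join(elem.itertext())
            stack.append((level, section))
        elif elem.tag == "p":
            # Inline markup is flattened to plain text in this sketch.
            ET.SubElement(stack[-1][1], "para").text = "".join(elem.itertext())
    return root


# Hypothetical flat input, as a wiki export might produce it.
flat = ET.fromstring(
    "<body><h1>Intro</h1><p>a</p><h2>Detail</h2><p>b</p><h1>Next</h1><p>c</p></body>"
)
chapter = infer_hierarchy(flat)
print(ET.tostring(chapter, encoding="unicode"))
```

Each heading closes every open section at its own level or deeper, then opens a new section under whatever remains on the stack.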

Page 11: From XML to eBooks Part 2: The Details

Process Supplemental Markup

●Index entries
●Footnotes
●Endnotes
●Sidebars
●Epigraphs
●Block quotes
●Convert all to DocBook

Tool: Perl script

Page 12: From XML to eBooks Part 2: The Details

Supplemental Markup

Indexing:
{in primary; secondary; tertiary}
{id term 1; term 2}
{is id; primary; secondary; tertiary}
{ie id}
{is term; see term}

Footnotes, sidebars, etc.:
{if footnote text}
{ib sidebar text}
{ip epigraph text;;attribution;;source}
{it endnote text}
{iq quotation;;attribution}
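A rough sketch of how two of these markers might be rewritten as DocBook. The slides say a Perl script does this work; the Python below is only an illustration. The target elements (indexterm, footnote) are an assumed mapping, the function names are hypothetical, and only the {in ...} and {if ...} forms are handled. Text inside the braces is assumed to be XML-escaped already.

```python
import re

INDEX_LEVELS = ("primary", "secondary", "tertiary")


def convert_index(match):
    """{in primary; secondary; tertiary} -> <indexterm> (assumed mapping)."""
    terms = [t.strip() for t in match.group(1).split(";")]
    parts = [
        f"<{tag}>{term}</{tag}>"
        for tag, term in zip(INDEX_LEVELS, terms)
        if term
    ]
    return "<indexterm>" + "".join(parts) + "</indexterm>"


def convert_footnote(match):
    """{if footnote text} -> <footnote><para>...</para></footnote> (assumed mapping)."""
    return f"<footnote><para>{match.group(1).strip()}</para></footnote>"


def convert_supplemental(text):
    """Rewrite the two handled marker types; every other marker is left untouched."""
    text = re.sub(r"\{in\s+([^{}]*)\}", convert_index, text)
    text = re.sub(r"\{if\s+([^{}]*)\}", convert_footnote, text)
    return text


# Hypothetical wiki text using the markers from the slide.
source = "XSLT{in stylesheets; XSL} is used here.{if See the stylesheet docs.} {ie range1}"
converted = convert_supplemental(source)
print(converted)
```

Markers the sketch does not recognize, such as {ie id}, pass through unchanged so a later step can still see them.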

Page 13: From XML to eBooks Part 2: The Details

Cleanup

●Handle links
●Validate structure

Tool: XSL stylesheet

Page 14: From XML to eBooks Part 2: The Details

What about Confluence?

The book Confluence, Tech Comm, Chocolate used K15t Software's DocBook export plugin, which also handles much of what the supplemental markup handles.

Page 15: From XML to eBooks Part 2: The Details

Storing and managing content

Content has one home, but...
●That home can change at certain well-defined points
●For XML, SVN is the home
●For wiki, the wiki is the home until production, then SVN is the home
●Home changes once, irrevocably
●All production comes from SVN

Page 16: From XML to eBooks Part 2: The Details

ePub Structure

Top Level Directory
●mimetype (file): contains application/epub+zip; identifies this as an ePub file
●META-INF (folder)
●container.xml (file, inside META-INF): points to package file in OEBPS folder
●OEBPS (folder): (next page)

ePub file is simply a zip file of this structure, with mimetype as first file in the zip. Uses .epub suffix.
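The "zip with mimetype first" rule can be sketched with Python's standard zipfile module. The function name, the OEBPS/package.opf path, and the placeholder file contents are assumptions, so the result is not a valid book. Storing mimetype without compression is not stated on the slide; it comes from the ePub container specification.

```python
import io
import zipfile

# Assumption: the package file lives at OEBPS/package.opf.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(files):
    """Zip `files` (archive path -> bytes) into an ePub container held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as epub:
        # mimetype goes in first, stored without compression.
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        epub.writestr("META-INF/container.xml", CONTAINER_XML,
                      compress_type=zipfile.ZIP_DEFLATED)
        for path, data in files.items():
            epub.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


# Placeholder contents only; a real book needs a full OPF, NCX, and XHTML files.
epub_bytes = build_epub({
    "OEBPS/package.opf": b"<package/>",
    "OEBPS/ch01.xhtml": b"<html/>",
})
with zipfile.ZipFile(io.BytesIO(epub_bytes)) as check:
    names = check.namelist()
    mimetype_info = check.getinfo("mimetype")
    mimetype_body = check.read("mimetype")
print(names)
```

Writing the returned bytes to a file with an .epub suffix gives the container described above.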

Page 17: From XML to eBooks Part 2: The Details

Ebook production - DocBook
OEBPS Directory Contents

OEBPS (folder): this folder is like any website
●xyz.css: CSS file
●ch01-toc.xhtml: HTML TOC
●ch01.xhtml, ch01s02.xhtml, ch01s03.xhtml, …, chXX.xhtml: HTML content
●figure.jpg, screen.png, ...: Media
●toc.ncx: Navigation file
●package.opf: OPF file

Notes:
●Names are arbitrary
●Sub-folders ok

Page 18: From XML to eBooks Part 2: The Details

NCX View in Kindle

Button for NCX view in emulator

Page 19: From XML to eBooks Part 2: The Details


Page 20: From XML to eBooks Part 2: The Details

OPF (Open Packaging Format)

<package ...>
  <metadata ...>
    … Dublin Core Metadata elements …
  </metadata>
  <manifest>
    <item id="ncx" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
    <item id="toc" media-type="application/xhtml+xml" href="ch01-toc.xhtml"/>
    <item id="ch01" media-type="application/xhtml+xml" href="ch01.xhtml"/>
    …
  </manifest>
  <spine toc="ncx">
    <itemref idref="cover"/>
    <itemref idref="toc"/>
    …
  </spine>
  <guide>
    <reference type="text" title="Startup page" href="ch01.xhtml"/>
  </guide>
</package>

●<metadata>: Metadata
●<manifest>: What's in the ePub?
●<spine>: What order is it in?
●<guide>: Where do you start? (Change starting place)
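To make the three questions concrete, here is a small sketch that reads an OPF file and answers "what order is it in?" and "where do you start?". The sample OPF is a trimmed, hypothetical package with the metadata omitted for brevity, and the function names are assumptions.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical, trimmed package file; a real one also needs a <metadata> section.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <manifest>
    <item id="ncx" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
    <item id="toc" media-type="application/xhtml+xml" href="ch01-toc.xhtml"/>
    <item id="ch01" media-type="application/xhtml+xml" href="ch01.xhtml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="toc"/>
    <itemref idref="ch01"/>
  </spine>
  <guide>
    <reference type="text" title="Startup page" href="ch01.xhtml"/>
  </guide>
</package>
"""


def reading_order(opf_text):
    """Resolve each spine itemref to the href of the manifest item it names."""
    root = ET.fromstring(opf_text)
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    return [
        hrefs[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]


def start_page(opf_text):
    """Return the href of the guide's type="text" reference, if there is one."""
    root = ET.fromstring(opf_text)
    ref = root.find("opf:guide/opf:reference[@type='text']", OPF_NS)
    return ref.get("href") if ref is not None else None


order = reading_order(SAMPLE_OPF)
start = start_page(SAMPLE_OPF)
print(order, start)
```

The manifest is only a lookup table. The spine decides the reading order, and the guide entry decides where a reader opens the book.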

Page 21: From XML to eBooks Part 2: The Details

Other tweaks to XHTML

●Remove empty paragraphs (vestige of wiki past)
●Remove <p> around first para after an <li> (for original Kindle)
●Work around a few epubcheck anomalies
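The second tweak, unwrapping a leading <p> inside an <li>, might look like the sketch below. The slides do not say which tool performs this step, so Python's ElementTree is used purely for illustration. The function name and the namespace-free sample are assumptions.

```python
import xml.etree.ElementTree as ET


def unwrap_first_para(root):
    """Replace a leading <p> in each <li> with that paragraph's own content."""
    for li in list(root.iter("li")):
        # Skip items with no children, no leading <p>, or text before the <p>.
        if len(li) == 0 or li[0].tag != "p" or (li.text or "").strip():
            continue
        para = li[0]
        li.remove(para)
        # Hoist the paragraph's text and children into the list item.
        li.text = para.text
        for offset, child in enumerate(list(para)):
            li.insert(offset, child)
        # Text that followed </p> now follows the last hoisted node.
        if para.tail and para.tail.strip():
            if len(para):
                last = li[len(para) - 1]
                last.tail = (last.tail or "") + para.tail
            else:
                li.text = (li.text or "") + para.tail


# Hypothetical sample list.
doc = ET.fromstring(
    "<ul><li><p>First <b>bold</b> point</p><p>More</p></li><li>Plain</li></ul>"
)
unwrap_first_para(doc)
unwrapped = ET.tostring(doc, encoding="unicode")
print(unwrapped)
```

Only the first paragraph of each item is unwrapped; later paragraphs in the same item keep their <p> wrapper.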

Page 22: From XML to eBooks Part 2: The Details

ePub/Kindle from DocBook

●Based on open-source DocBook stylesheets
●ePub3 transform by Bob Stayton
●CSS added
●A few minor tweaks for personal preference
●Kindle (.mobi) produced using kindlegen
●Amazon tests .mobi and converts to smaller file

Page 23: From XML to eBooks Part 2: The Details

Generating ePub from DocBook

●ePub3 transform
●Based on HTML5 transform
●Generates all ePub3 files

Tool: DocBook XSL

Page 24: From XML to eBooks Part 2: The Details

Generating ePub from DocBook

●Adjust .opf file
●Clean up XHTML

Step: File cleanup

Page 25: From XML to eBooks Part 2: The Details

Generating ePub from DocBook

●Copy images
●Copy in CSS file
●Run zip to create .epub file

Step: File preparation

Page 26: From XML to eBooks Part 2: The Details

ePub/Kindle from DITA

●Based on DITA Open Toolkit and DITA for Publishers toolkit extensions (developed by Eliot Kimber)
●Does not require content to use DITA for Publishers specialization
●Generates ePub2 compliant files
●Kindle (.mobi) produced using kindlegen
●Amazon tests .mobi and converts to smaller file

Page 27: From XML to eBooks Part 2: The Details

Thanks for listening

Richard Hamilton
XML Press

[email protected]