From XML to eBooks Part 2: The Details
DESCRIPTION
LavaCon 2012 presentation about creating eBooks from DocBook XML. This presentation provides details of the XML Press process for creating eBooks. A companion presentation (From XML to eBooks Part 1: Overview) is an introduction.
TRANSCRIPT
Slight Recap
For most tech comm situations:
●Two formats matter: ePub & Kindle
●XML processes (esp. DocBook or DITA) will make things much easier
●Content strategy is the hardest part
●Authoring is next hardest
●Production is tough, but doable
●Distribution is easiest
Overview
●Authoring
●Storing and managing content
●Producing output
Content strategy is critical, but not for this presentation
Authoring
Authoring formats at XML Press:
●DocBook XML: 5 books
●DITA XML: 2 books (so far)
●Word: 4 books
●Wiki (Confluence): 1 book
●Wiki (PBworks): 3 books
●Author-it: 1 book
●InDesign: 1 book
All but 3 (1 each in Word, InDesign, & Author-it) were ultimately produced from XML
Authoring in a Wiki
●Based on PBworks
●Authors, editors, reviewers, and indexers work in the wiki
●Parallel access throughout most of the process
●Content exported for proofs as needed
●Content moved to SVN for final production
Requires a clean, clear breaking point where content moves from wiki to SVN
HTML to XHTML
●Convert
●Cleanup
●Tool: Tidy
Pre-process XHTML
●Remove empty elements
●Normalize
●Handle headings
●Tool: XSL stylesheet
Convert to DocBook
●Infer hierarchy
●Convert
●Define structure
●Tool: Herold
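The "infer hierarchy" step can be pictured as turning a flat run of headings into nested sections. The following is a minimal Python sketch, assuming the headings arrive as (level, title) pairs in document order. It uses a generic stack-based approach for illustration only; it is not Herold's actual implementation, and `infer_hierarchy` is a hypothetical name.

```python
def infer_hierarchy(headings):
    """Nest a flat list of (level, title) pairs into a section tree.

    Levels are assumed to be integers >= 1 (h1 -> 1, h2 -> 2, ...).
    """
    root = {"level": 0, "title": None, "children": []}
    stack = [root]  # path from the root to the most recently opened section
    for level, title in headings:
        # Close every open section that is at the same depth or deeper.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"level": level, "title": title, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


if __name__ == "__main__":
    flat = [(1, "Authoring"), (2, "Wiki"), (2, "XML"), (1, "Production")]
    for section in infer_hierarchy(flat):
        print(section["title"], [c["title"] for c in section["children"]])
```

A heading that skips a level (an h3 directly under an h1, say) is simply attached to the nearest shallower section in this sketch; a real conversion would probably need an explicit policy for that case.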
Process Supplemental Markup
●Index entries, footnotes, endnotes, sidebars, epigraphs, block quotes
●Convert all to DocBook
●Tool: Perl script
Supplemental Markup
Indexing:
{in primary; secondary; tertiary}
{id term 1; term 2}
{is id; primary; secondary; tertiary}
{ie id}
{is term; see term}
Footnotes, sidebars, etc.:
{if footnote text}
{ib sidebar text}
{ip epigraph text;;attribution;;source}
{it endnote text}
{iq quotation;;attribution}
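The slides name a Perl script for this step but do not show it. As an illustration of the idea, here is a minimal Python sketch that expands just two of the codes, {in ...} and {if ...}, into DocBook `indexterm` and `footnote` elements. The exact output the XML Press script produces is not given in the slides, so the element mapping, the function names, and the decision to leave every other code untouched are all assumptions.

```python
import re
from xml.sax.saxutils import escape

# Matches only {in ...} and {if ...}; all other codes pass through unchanged.
MARKUP = re.compile(r"\{(in|if)\s+([^{}]*)\}")


def index_term(body):
    """'primary; secondary; tertiary' -> a DocBook indexterm element."""
    parts = [p.strip() for p in body.split(";") if p.strip()]
    tags = ["primary", "secondary", "tertiary"]
    inner = "".join(f"<{t}>{escape(p)}</{t}>" for t, p in zip(tags, parts))
    return f"<indexterm>{inner}</indexterm>"


def footnote(body):
    """'footnote text' -> a DocBook footnote wrapping a single para."""
    return f"<footnote><para>{escape(body.strip())}</para></footnote>"


HANDLERS = {"in": index_term, "if": footnote}


def convert(text):
    """Replace each supported code in text with its DocBook equivalent."""
    return MARKUP.sub(lambda m: HANDLERS[m.group(1)](m.group(2)), text)


if __name__ == "__main__":
    print(convert("Wikis{in wiki; export} help{if See the PBworks chapter.}."))
```

Keeping the handlers in a dictionary keyed by code makes it straightforward to add the remaining codes once their intended DocBook output is pinned down.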
Cleanup
●Handle links
●Validate structure
●Tool: XSL stylesheet
What about Confluence?
The book Confluence, Tech Comm, Chocolate used K15t Software's DocBook export plugin, which also handles much of what the supplemental markup handles.
Storing and managing content
Content has one home, but...
●That home can change at certain well-defined points
●For XML, SVN is the home
●For wiki, the wiki is the home until production, then SVN is the home
●Home changes once, irrevocably
●All production comes from SVN
ePub Structure
Top Level Directory:
●mimetype (file): contains application/epub+zip; identifies this as an ePub file
●META-INF (folder): contains container.xml (file), which points to the package file in the OEBPS folder
●OEBPS (folder): (next page)
An ePub file is simply a zip file of this structure, with mimetype as the first file in the zip. It uses the .epub suffix.
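The packaging rule above can be sketched in Python with the standard `zipfile` module. This is a minimal illustration, assuming a source folder that already contains META-INF and OEBPS; `build_epub` is a hypothetical helper name, not part of the XML Press toolchain. The mimetype entry holds the string application/epub+zip and is written first and without compression, which is what the ePub container format expects.

```python
import zipfile
from pathlib import Path


def build_epub(source_dir, epub_path):
    """Zip source_dir into epub_path with mimetype as the first entry."""
    source = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # The mimetype entry goes first and is stored, not deflated.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for path in sorted(source.rglob("*")):
            arcname = path.relative_to(source).as_posix()
            # Skip directories and any mimetype file already in the folder.
            if path.is_file() and arcname != "mimetype":
                zf.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)


if __name__ == "__main__":
    import sys
    build_epub(sys.argv[1], sys.argv[2])
```

Write the .epub outside the source folder, otherwise the archive would try to include itself. Running a validator such as epubcheck on the result is still advisable.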
Ebook production - DocBook
OEBPS Directory Contents
OEBPS (folder): this folder is like any website
●CSS file: xyz.css
●HTML TOC: ch01-toc.xhtml
●HTML content: ch01.xhtml, ch01s02.xhtml, ch01s03.xhtml, …, chXX.xhtml
●Media: figure.jpg, screen.png, ...
●Navigation file: toc.ncx
●OPF file: package.opf
Notes:
●Names are arbitrary
●Sub-folders ok
NCX View in Kindle
Button for NCX view in emulator
OPF (Open Packaging Format)
<package ...>
  <metadata ...>
    … Dublin Core Metadata elements …
  </metadata>
  <manifest>
    <item id="ncx" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
    <item id="toc" media-type="application/xhtml+xml" href="ch01-toc.xhtml"/>
    <item id="ch01" media-type="application/xhtml+xml" href="ch01.xhtml"/>
    …
  </manifest>
  <spine toc="ncx">
    <itemref idref="cover"/>
    <itemref idref="toc"/>
    …
  </spine>
  <guide>
    <reference type="text" title="Startup page" href="ch01.xhtml"/>
  </guide>
</package>
What each part answers:
●<metadata>: Metadata
●<manifest>: What's in the ePub?
●<spine>: What order is it in?
●<guide>: Where do you start?
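To make the relationship between the manifest and the spine concrete, here is a minimal Python sketch that resolves the spine's idref values against the manifest to get the reading order as file names. It assumes the package element declares the standard OPF namespace (http://www.idpf.org/2007/opf), which the slide elides; `reading_order` and the sample document are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# A cut-down package file, modeled on the slide's example.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ncx" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
    <item id="toc" media-type="application/xhtml+xml" href="ch01-toc.xhtml"/>
    <item id="ch01" media-type="application/xhtml+xml" href="ch01.xhtml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="toc"/>
    <itemref idref="ch01"/>
  </spine>
</package>"""


def reading_order(opf_text):
    """Return the hrefs of the spine items, in reading order."""
    root = ET.fromstring(opf_text)
    # manifest: id -> href for everything in the ePub
    manifest = {item.get("id"): item.get("href")
                for item in root.findall("opf:manifest/opf:item", OPF_NS)}
    # spine: an ordered list of ids that must exist in the manifest
    return [manifest[ref.get("idref")]
            for ref in root.findall("opf:spine/opf:itemref", OPF_NS)]


if __name__ == "__main__":
    print(reading_order(SAMPLE_OPF))
```

A spine entry whose idref has no matching manifest item raises a KeyError here, which is a reasonable failure mode for a sketch since such a package would not validate anyway.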
Change starting place
Other tweaks to XHTML
●Remove empty paragraphs (vestige of wiki past)
●Remove <p> around first para after an <li> (for original Kindle)
●Work around a few epubcheck anomalies
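The first tweak, removing empty paragraphs, could look roughly like the following Python sketch. It assumes well-formed XHTML in the standard XHTML namespace with no undeclared named entities, and it treats a paragraph as empty when it has no child elements and only whitespace text. `remove_empty_paragraphs` is a hypothetical name; the slides do not show the actual cleanup code.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML)  # serialize without an ns0: prefix


def remove_empty_paragraphs(xhtml_text):
    """Drop <p> elements that hold no child elements and no visible text."""
    root = ET.fromstring(xhtml_text)
    p_tag = f"{{{XHTML}}}p"
    for parent in list(root.iter()):
        for child in list(parent):
            empty = len(child) == 0 and not (child.text or "").strip()
            if child.tag != p_tag or not empty:
                continue
            # Preserve any text that followed the paragraph being removed.
            tail = child.tail or ""
            index = list(parent).index(child)
            if index == 0:
                parent.text = (parent.text or "") + tail
            else:
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + tail
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    page = (f'<html xmlns="{XHTML}"><body>'
            '<p>Keep</p><p> </p><p/><p><img src="a.png" alt=""/></p>'
            '</body></html>')
    print(remove_empty_paragraphs(page))
```

A paragraph that contains only an image is kept, since it has a child element even though it has no text.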
ePub/Kindle from DocBook
●Based on open-source DocBook stylesheets
●ePub3 transform by Bob Stayton
●CSS added
●A few minor tweaks for personal preference
●Kindle (.mobi) produced using kindlegen
●Amazon tests .mobi and converts to smaller file
Generating ePub from DocBook
DocBook XSL:
●ePub3 transform
●Based on HTML5 transform
●Generates all ePub3 files
File cleanup:
●Adjust .opf file
●Clean up XHTML
File preparation:
●Copy images
●Copy in CSS file
●Run zip to create .epub file
ePub/Kindle from DITA
●Based on DITA Open Toolkit and DITA for Publishers toolkit extensions (developed by Eliot Kimber)
●Does not require content to use DITA for Publishers specialization
●Generates ePub2-compliant files
●Kindle (.mobi) produced using kindlegen
●Amazon tests .mobi and converts to smaller file