Dynamic Chunking of Component-Authored Information
Ben Colborn, Manager, Technical Publications
Owen Richter, Web Application Architect
Converged compute and storage
All intelligence in software
Distributed everything
Self-healing system
Web-scale converged infrastructure
Automation and Rich Analytics
Technical publications responsibilities
› Software documentation
› Release documentation
› Hardware documentation
› Support knowledge base
› Education collaboration
› Localization
Problem
Ben didn’t like any available options for publishing documentation
Monolithic documentation
Fragmented documentation
Advantages
Monolithic
• Easy to produce
• Familiar for audience
• Portable
Fragmented
• Easy to link
• Short page load time
• Familiar for authors
Opportunity
Growing company; development of new support portal
Every page is page one
› Every page is a potential entry point
› Sometimes hierarchy and sequence are relevant
› Often hierarchy and sequence are not relevant
› Multiplicity of navigation options is required
Information foraging behavior
› Information scent: Users estimate a given hunt’s likely success from … assessing whether their path exhibits cues related to the desired outcome.
› Informavores will keep clicking as long as they sense that they're “getting warmer”—the scent must keep getting stronger and stronger, or people give up.
› Progress must seem rapid enough to be worth the predicted effort required to reach the destination.
› As users drill down the site, … provide feedback about the current location and how it relates to users' tasks.
Documentation use cases
1. A new user may want to browse a complete high-level document.
2. A developing user may want an intermediate-sized chunk that has subject/sequence affinity.
3. An experienced user may want a small chunk with a particular item of information.
4. A support technician may need to give a customer a chunk scoped at an intermediate level, so that the customer is neither overloaded with too much information nor given too little.
Document levels
Document
Part
Chapter
Section
Topic
DITA gets us halfway there
Authoring and management are done at the topic level.
Chunking exists as an approach, but:
› Chunking control is manual
› Chunks are static
Ben’s magical solution
If I had an infinite number of monkeys, I could chunk all topics in all possible combinations
Cross-disciplinary thinking to the rescue
› We need a recursive document!
› A document is:
1. A title
2. A globally unique key (document name + sub-document ID)
3. A locally unique key (sub-document ID)
4. A list of tags
5. A (recursive) list of documents
› DITA is recursive, but none of the existing presentation mechanisms are recursive.
› JSON is a natural way to represent a recursive document.
› XSLT is a natural way to generate such a JSON document.
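The recursive document defined above maps directly onto nested JSON. As a minimal sketch (in Python, whereas the pipeline described here generates the JSON with XSLT), the hypothetical `Document` dataclass below carries the five listed parts; the field names, the `make_node` helper, and the `/` separator in the global key are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Document:
    """One node of a recursive document (field names are hypothetical)."""
    title: str
    key: str        # globally unique: document name + sub-document ID
    local_key: str  # locally unique: sub-document ID
    tags: List[str] = field(default_factory=list)
    documents: List["Document"] = field(default_factory=list)  # the recursion


def make_node(doc_name: str, sub_id: str, title: str,
              tags: Optional[List[str]] = None,
              children: Optional[List[Document]] = None) -> Document:
    """Build a node whose global key is derived from the document name and sub-document ID."""
    return Document(
        title=title,
        key=f"{doc_name}/{sub_id}",  # the "/" separator is an assumption
        local_key=sub_id,
        tags=list(tags or []),
        documents=list(children or []),
    )


guide = make_node("admin-guide", "root", "Admin Guide", tags=["admin"], children=[
    make_node("admin-guide", "ch1", "Installing", children=[
        make_node("admin-guide", "ch1-s1", "Requirements"),
    ]),
])

# asdict() recurses into nested dataclasses, so the whole tree becomes plain dicts and lists.
print(json.dumps(asdict(guide), indent=2))
```

Because `asdict` recurses through `documents`, serializing any single node yields a complete, self-contained JSON subtree, which is what lets each level stand on its own as a chunk.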
JSON generation process
DITA source → HTML → JSON
› DITA source to HTML: D4P HTML2
› HTML to JSON: custom XSLT
[Diagram also shows: content repository; file server/CDN]
Theoretical document: Complete
Document
1. Chapter
   1.1 Section
2. Chapter
   2.1 Section
      2.1.1 Topic
   2.2 Section
      2.2.1 Topic
3. Chapter
Theoretical document: Chunks
› 1. Chapter (with 1.1 Section)
› 2. Chapter (with 2.1 Section, 2.1.1 Topic, 2.2 Section, 2.2.1 Topic)
› 3. Chapter
› 2.1 Section (with 2.1.1 Topic)
› 2.2 Section (with 2.2.1 Topic)
› 2.1.1 Topic
› 2.2.1 Topic
› 1.1 Section
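The chunk list follows mechanically from the complete document: every node, together with its subtree, is a chunk. A minimal Python sketch, assuming a simple `(label, children)` tuple representation of the theoretical document (the representation and the function names are hypothetical):

```python
from typing import Iterator, List, Tuple

# The theoretical document from the slides, as hypothetical (label, children) tuples.
TREE = ("Document", [
    ("1. Chapter", [("1.1 Section", [])]),
    ("2. Chapter", [
        ("2.1 Section", [("2.1.1 Topic", [])]),
        ("2.2 Section", [("2.2.1 Topic", [])]),
    ]),
    ("3. Chapter", []),
])


def chunks(node: Tuple) -> Iterator[Tuple]:
    """Yield every node in pre-order; each yielded node carries its whole subtree."""
    yield node
    for child in node[1]:
        yield from chunks(child)


def outline(node: Tuple, depth: int = 0) -> List[str]:
    """Render one chunk as indented lines."""
    label, children = node
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(outline(child, depth + 1))
    return lines


all_chunks = list(chunks(TREE))
print(len(all_chunks))  # 9
for chunk in all_chunks[1:]:
    print(" / ".join(line.strip() for line in outline(chunk)))
```

This yields one chunk per node (nine here, counting the complete document), rather than every possible combination of topics.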
DITA to JSON 1: DITAMAP
[Screenshot callouts: document properties; topic references]
DITA to JSON 2: HTML index
[Screenshot callouts: document properties; topic references]
DITA to JSON 3: JSON
[Screenshot callouts: document properties; topic; topic]
DITA to JSON 4: Sub-document
Field: source
› Title: topic title
› ID: topic filename
› Unique key: top-level document filename + topic filename
› Ancestors: list of ancestor topics at all levels
› Summary*: topic shortdesc
› Body: topic body
› HREF: topic path + topic filename
› Documents*: list of sub-documents
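The field mapping above can be sketched in Python as a stand-in for the custom XSLT. Unlike the actual pipeline, which transforms D4P HTML2 output, this hypothetical sketch reads a simplified DITA map directly with `xml.etree.ElementTree`; the sample map, the `navtitle`-based titles, and the `/` separator in the unique key are assumptions.

```python
import json
import posixpath
import xml.etree.ElementTree as ET

# A hypothetical, simplified DITA map. Real maps often carry titles in <topicmeta>
# or take them from the topics themselves; navtitle attributes keep the sketch short.
DITAMAP = """<map>
  <title>Admin Guide</title>
  <topicref href="install/install.dita" navtitle="Installing">
    <topicref href="install/requirements.dita" navtitle="Requirements"/>
  </topicref>
  <topicref href="upgrade.dita" navtitle="Upgrading"/>
</map>"""


def topicref_to_doc(ref: ET.Element, top_name: str) -> dict:
    """Map one <topicref> (and, recursively, its child topicrefs) to a sub-document."""
    href = ref.get("href", "")
    topic_file = posixpath.basename(href)
    return {
        "Title": ref.get("navtitle"),
        "ID": topic_file,                         # topic filename
        "UniqueKey": f"{top_name}/{topic_file}",  # top-level filename + topic filename ("/" is assumed)
        "HREF": href,                             # topic path + topic filename
        # Summary (shortdesc) and Body would be read from the topic itself; omitted here.
        "Documents": [topicref_to_doc(child, top_name) for child in ref.findall("topicref")],
    }


def map_to_json(map_xml: str, top_name: str) -> dict:
    """Convert a DITA map into one recursive JSON-ready document."""
    root = ET.fromstring(map_xml)
    return {
        "Title": root.findtext("title"),
        "ID": top_name,
        "UniqueKey": top_name,
        "Documents": [topicref_to_doc(ref, top_name) for ref in root.findall("topicref")],
    }


print(json.dumps(map_to_json(DITAMAP, "admin-guide.ditamap"), indent=2))
```

`findall("topicref")` matches only direct children, so the recursion in `topicref_to_doc` reproduces the map's nesting one level at a time.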
Document Loading Process
1. Flatten each node
2. Create a unique ID
3. Establish ancestry
4. Convert relative image and cross-reference links to absolute links
5. Create a standalone document of each node
6. Load to DB
7. Load to search index
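A minimal sketch of the flattening, ID, ancestry, link-conversion, and standalone-document steps, in Python. It assumes sub-documents shaped like the field table (`Title`, `ID`, `HREF`, `Body`, `Documents`), a hypothetical `BASE_URL` for the published content, and a regex rather than a real HTML parser; loading into the database and search index is not shown.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://docs.example.com/"  # hypothetical publication root

# Matches src="..." and href="..." attributes. A regex is enough for a sketch;
# production code would use an HTML parser.
LINK_ATTR = re.compile(r'\b(src|href)="([^"]+)"')


def absolutize(html: str, topic_href: str) -> str:
    """Resolve relative image and cross-reference URLs against the topic's own URL."""
    topic_url = urljoin(BASE_URL, topic_href)
    return LINK_ATTR.sub(
        lambda m: f'{m.group(1)}="{urljoin(topic_url, m.group(2))}"', html)


def absolutize_tree(node: dict) -> dict:
    """Return a copy of the tree in which every Body contains only absolute links."""
    copy = dict(node)
    copy["Body"] = absolutize(node.get("Body", ""), node.get("HREF", ""))
    copy["Documents"] = [absolutize_tree(c) for c in node.get("Documents", [])]
    return copy


def flatten(node: dict, top_name: str, ancestors: tuple = ()) -> list:
    """Emit one standalone record per node: its subtree plus a unique ID and ancestry."""
    record = dict(node)
    record["UniqueKey"] = f"{top_name}/{node['ID']}"
    record["Ancestors"] = list(ancestors)  # every ancestor, top level first
    records = [record]
    here = ({"UniqueKey": record["UniqueKey"], "Title": node["Title"]},)
    for child in node.get("Documents", []):
        records.extend(flatten(child, top_name, ancestors + here))
    return records


tree = {
    "ID": "install.dita", "Title": "Installing", "HREF": "install/install.dita",
    "Body": '<p><img src="../images/fig1.png"/> See <a href="requirements.dita">Requirements</a>.</p>',
    "Documents": [
        {"ID": "requirements.dita", "Title": "Requirements",
         "HREF": "install/requirements.dita", "Body": "<p>Hardware list</p>", "Documents": []},
    ],
}

records = flatten(absolutize_tree(tree), "admin-guide")
for r in records:
    print(r["UniqueKey"], [a["Title"] for a in r["Ancestors"]])
```

Because `urljoin` resolves each link against the topic's own URL, `../images/fig1.png` becomes an absolute URL under a flat `images` folder, and each flattened record can then be served (and indexed) on its own.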
Search
Task topic
[Screenshot callouts: breadcrumb; scrollbar]
Chapter
[Screenshot callouts: breadcrumb; scrollbar]
Document
[Screenshot callouts: breadcrumb; scrollbar]
TOC
Multi-modality
DITA output targets
1. PDF: monolithic
2. ePUB: monolithic
3. HTML: fragmented
4. JSON: dynamically chunked
Conventions
› Images
All image paths need to be converted to absolute paths. Keeping all images in a single flat folder called "images" is one easy way to accomplish this.
› Cross references
Cross-reference links within the JSON are all relative. Like image paths, they need to be converted to absolute links.
› JSON tag recursion
It is tedious to add tags at every level of the JSON document, so most tags are programmatically pulled through to all sub-documents. Tags can be overridden in children if desired.
› Permissions
Can be set in the source.
› Anchors
Not supported. We currently have a single-page app, which makes anchors difficult, but they are somewhat irrelevant because each level is available as an independent link.
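Tag pull-through with child overrides can be sketched as a recursive merge. This Python sketch assumes tags are stored as a name-to-value mapping (the deck describes a list of tags; a mapping is assumed here so that a child can override one specific inherited tag). It also pulls through every tag, whereas the deck says most tags are pulled through; the tag names and values are hypothetical.

```python
from typing import Optional


def inherit_tags(node: dict, inherited: Optional[dict] = None) -> dict:
    """Return a copy of the tree in which each node carries its ancestors' tags,
    with the node's own tags overriding inherited tags of the same name."""
    tags = {**(inherited or {}), **node.get("tags", {})}
    return {
        **node,
        "tags": tags,
        "documents": [inherit_tags(child, tags) for child in node.get("documents", [])],
    }


doc = {
    "title": "Admin Guide",
    "tags": {"product": "example-product", "audience": "administrator"},
    "documents": [
        {"title": "Installing", "documents": []},
        {"title": "REST API", "tags": {"audience": "developer"}, "documents": []},
    ],
}

tagged = inherit_tags(doc)
for child in tagged["documents"]:
    print(child["title"], child["tags"])
```

Authors then tag only the top-level document and the few children that differ; the original tree is left unmodified.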
What’s next?
› More publishing automation
Publishing is currently a two-step process: JSON publication followed by document loading. It would be better to provide a one-step process controlled by the document publisher.
› Holistic approach
  › Search cultivation
  › Search analytics
  › Chat
  › Case deflection analysis driving documentation
› Tag-based navigation
Ben is less dissatisfied
Problems solved
• Apparently dynamic presentation
• Satisfactory context-sensitive help targets
• CMS/search loading
Problems not solved
• Static transformations
Problems created
• Content removal
• Proofing
• Custom software