dynamic chunking of component-authored information

Dynamic Chunking of Component-Authored Information

Ben Colborn, Manager, Technical Publications
Owen Richter, Web Application Architect

2

Converged compute and storage

All intelligence in software

Distributed everything

Self-healing system

Web-scale converged infrastructure

Automation and Rich Analytics

3

Technical publications responsibilities

› Software documentation

› Release documentation

› Hardware documentation

› Support knowledge base

› Education collaboration

› Localization

4

Problem

Ben didn’t like any available options for publishing documentation

5

Monolithic documentation

6

Fragmented documentation

7

Advantages

Monolithic

• Easy to produce

• Familiar for audience

• Portable

Fragmented

• Easy to link

• Short page load time

• Familiar for authors

8

Opportunity

Growing company; development of new support portal

9

Every page is page one

› Every page is a potential entry point

› Sometimes hierarchy and sequence are relevant

› Often hierarchy and sequence are not relevant

› Multiplicity of navigation options is required

10

Information foraging behavior

› Information scent: Users estimate a given hunt’s likely success from … assessing whether their path exhibits cues related to the desired outcome.

› Informavores will keep clicking as long as they sense that they're “getting warmer”—the scent must keep getting stronger and stronger, or people give up.

› Progress must seem rapid enough to be worth the predicted effort required to reach the destination.

› As users drill down the site, … provide feedback about the current location and how it relates to users' tasks.

11

Documentation use cases

1. A new user may want to browse a complete high level document.

2. A developing user may want an intermediate-sized chunk that has subject/sequence affinity.

3. An experienced user may want a small chunk with a particular item of information.

4. A support technician may need to provide a chunk scoped at an intermediate level to a customer so they are not overloaded with too much information, but also not given too little.

12

Document levels

Document

Part

Chapter

Section

Topic

13

DITA gets us halfway there

Authoring and management is done at the topic level

Chunking exists as an approach, but:

› Chunking control is manual

› Chunks are static

14

Ben’s magical solution

If I had an infinite number of monkeys, I could chunk all topics in all possible combinations

15

Cross-disciplinary thinking to the rescue

› We need a recursive document!

› A document is:

1. A title
2. A globally unique key (document name + sub document ID)
3. A locally unique key (sub document ID)
4. A list of tags
5. A (recursive) list of documents

› DITA is recursive, but none of the existing presentation mechanisms are recursive.

› JSON is a natural way to represent a recursive document.

› XSLT is a natural way to generate such a JSON document.
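The five-part definition above can be sketched as a small recursive data structure. This is only an illustration in Python, not the XSLT the presenters used: the class name `Document`, the field names, the sample titles, and the `/` separator in the global key are all assumptions made for the example.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Document:
    """One node of the recursive document (all names here are hypothetical)."""
    title: str
    global_key: str  # document name + sub document ID (separator is assumed)
    local_key: str   # sub document ID
    tags: list[str] = field(default_factory=list)
    documents: list[Document] = field(default_factory=list)  # the recursion


# A tiny tree: document -> chapter -> section.
guide = Document(
    title="Example Guide",
    global_key="example-guide",
    local_key="example-guide",
    tags=["guide"],
    documents=[
        Document(
            title="Chapter",
            global_key="example-guide/chapter-1",
            local_key="chapter-1",
            documents=[
                Document("Section", "example-guide/section-1-1", "section-1-1"),
            ],
        )
    ],
)

# asdict() walks nested dataclasses, so the whole tree serializes in one call.
guide_json = json.dumps(asdict(guide), indent=2)
```

Because every node has the same shape, any sub-tree of the JSON is itself a valid document, which is what makes chunking at an arbitrary level possible.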

16

JSON generation process

[Diagram: DITA source (content repository) → D4P HTML2 → HTML → custom XSLT → JSON → file server/CDN]

17

Theoretical document: Complete

Document
1. Chapter
1.1 Section
2. Chapter
2.1 Section
2.1.1 Topic
3. Chapter

18

Theoretical document: Chunks

1. Chapter
1.1 Section

2. Chapter
2.1 Section
2.1.1 Topic
2.2 Section
2.2.1 Topic

3. Chapter

2.1.1 Topic

2.2.1 Topic

1.1 Section

19

DITA to JSON 1: DITAMAP

Document Properties

Topic References

20

DITA to JSON 2: HTML index

Document Properties

Topic References

21

DITA to JSON 3: JSON

Document Properties

Topic

Topic

22

DITA to JSON 4: Sub-document

Field        Source
Title        Topic title
ID           Topic filename
Unique key   Top-level document filename + topic filename
Ancestors    List of ancestor topics at all levels
Summary*     Topic shortdesc
Body         Topic body
HREF         Topic path + topic filename
Documents*   List of sub-documents

23

Document Loading Process

1. Flatten each node
2. Create unique ID
3. Establish ancestry
4. Convert relative image and cross references to absolute links
5. Create a standalone document of each node
6. Load to DB
7. Load to search index
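The loading steps can be sketched roughly as follows. This is an assumed Python illustration, not the presenters' loader: the function names, the dictionary field names, the `/` key separator, and the `docs.example.com` base URL are hypothetical, and a plain dictionary stands in for the database and search index.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://docs.example.com/guide/"  # hypothetical publishing location


def absolutize(html, base_url):
    """Rewrite relative src/href attribute values as absolute URLs."""
    def fix(match):
        attr, quote, url = match.group(1), match.group(2), match.group(3)
        return f"{attr}={quote}{urljoin(base_url, url)}{quote}"
    return re.sub(r"\b(src|href)=([\"'])(.*?)\2", fix, html)


def absolutize_tree(node, base_url):
    """Return a copy of the tree with absolute links in every body."""
    return {
        **node,
        "body": absolutize(node.get("body", ""), base_url),
        "documents": [absolutize_tree(child, base_url)
                      for child in node.get("documents", [])],
    }


def flatten(node, doc_name, ancestors=()):
    """Yield every node as a standalone record with a unique key and ancestry.

    Each record keeps its own sub-documents, so a chapter record is still a
    complete chunk on its own.
    """
    yield {**node, "key": f"{doc_name}/{node['id']}", "ancestors": list(ancestors)}
    for child in node["documents"]:
        yield from flatten(child, doc_name, ancestors + (node["id"],))


guide = {
    "id": "guide", "title": "Example Guide", "body": "",
    "documents": [{
        "id": "chapter-2", "title": "Chapter",
        "body": '<img src="images/rack.png">',
        "documents": [{
            "id": "topic-2-1-1", "title": "Topic",
            "body": '<a href="chapter-2.html">Back</a>',
            "documents": [],
        }],
    }],
}

records = list(flatten(absolutize_tree(guide, BASE_URL), "guide"))
by_key = {record["key"]: record for record in records}  # stand-in for the DB load
```

Looking up `by_key["guide/chapter-2"]` returns the chapter together with its nested topic, while `by_key["guide/topic-2-1-1"]` returns the topic alone with its ancestor list available for a breadcrumb.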

24

Search

25

Task Topic

Breadcrumb

Scrollbar

26

Chapter

Breadcrumb

Scrollbar

27

Document

Breadcrumb

Scrollbar

28

TOC

29

Multi-modality

30

DITA output targets

1. PDF: monolithic
2. ePUB: monolithic
3. HTML: fragmented
4. JSON: dynamically chunked

31

Conventions

› Images
    › All image paths need to be converted to absolute paths. Having all of them in a flat folder called “images” is one easy way to accomplish this.

› Cross References
    › Cross reference links within the JSON are all relative. Like images, they need to be converted to absolute links.

› JSON Tag Recursion
    › It is tedious to add tags to all levels of the JSON Document, so most tags are programmatically pulled through to all sub documents. Tags can be overridden in children if desired.

› Permissions – can be set in source

› Anchors not supported
    › Our single page app currently makes anchors difficult, but they are somewhat irrelevant since each level is available as an independent link.
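The tag recursion convention could look something like the sketch below. It assumes, purely for illustration, that tags are name/value pairs so that a child can override an inherited value. The function name `inherit_tags`, the field names, and the sample tag values are hypothetical, and the sketch pulls through every tag rather than only "most" of them.

```python
def inherit_tags(node, inherited=None):
    """Return a copy of the tree in which each node carries its ancestors' tags.

    A node's own tags take precedence over inherited ones, which models
    "tags can be overridden in children".
    """
    merged = {**(inherited or {}), **node.get("tags", {})}
    return {
        **node,
        "tags": merged,
        "documents": [inherit_tags(child, merged)
                      for child in node.get("documents", [])],
    }


source = {
    "title": "Example Guide",
    "tags": {"product": "example-os", "audience": "admin"},
    "documents": [{
        "title": "Chapter",
        "tags": {"audience": "developer"},  # overrides the parent's value
        "documents": [{"title": "Topic", "documents": []}],  # no tags of its own
    }],
}

tagged = inherit_tags(source)
chapter = tagged["documents"][0]
topic = chapter["documents"][0]
```

Here the topic ends up with `product` from the top-level document and `audience` from the chapter, without either tag being authored on the topic itself.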

32

What’s next?

› More publishing automation
    › Publishing is currently a 2 step process: JSON publication followed by document loading. It would be better to provide a 1 step process controlled by the document publisher.

› Holistic approach
    › Search cultivation
    › Search analytics
    › Chat
    › Case deflection analysis driving documentation

› Tag-based navigation

33

Ben is less dissatisfied

Problems solved
• Apparently dynamic presentation
• Satisfactory context-sensitive help targets
• CMS/search loading

Problems not solved
• Static transformations

Problems created
• Content removal
• Proofing
• Custom software
