Improving the output capabilities of Stata with
Open Document Format XML
Adam Jacobs
Dianthus Medical Limited
Stata’s 3-fold capabilities
Statistics
Graphics
Data management
But there is a 4th...
Text output
A recent clinical study:
- 92 pages of raw data listings
- 124 pages of descriptive data tabulations
- 3 pages of statistical analysis
All from a study in 12 healthy volunteers
Stata’s text output
Problems with Stata’s text output
No pagination
No formatting (or limited formatting with SMCL)
Variable labels not always shown
No Unicode support
No tables of contents
etc etc
Some examples...
So how did I do it?
Open Document Format
An open standard, approved by ISO
XML based
For a variety of office-type documents
Used by the popular open-source office suite OpenOffice.org
Here, we are just interested in word-processing documents
.odt files
.odt is the native file format of OpenOffice.org Writer
A zip file
Contains various files, the most important of which is content.xml
content.xml is simply a plain-text file
Stata is good at writing plain-text files!
The Stata code
Creates the content.xml file by writing data with appropriate XML tags
content.xml is added to the other required files and zipped into a .odt file
.odt file can be opened directly with Writer
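The packaging step can be sketched outside Stata. The following is a minimal Python illustration, not the author's ado or Mata code: it writes a hand-made content.xml into a zip archive together with a mimetype entry and a manifest. The function name build_odt and the sample paragraph are hypothetical, and the assumption that these three entries are enough for a reader such as Writer to open the file is exactly that, an assumption; real documents normally also carry files such as styles.xml and meta.xml.

```python
import io
import zipfile

# Media type string for ODF text documents.
MIMETYPE = "application/vnd.oasis.opendocument.text"

# Manifest listing the package contents (assumed minimal set of entries).
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0" manifest:version="1.2">
 <manifest:file-entry manifest:full-path="/" manifest:media-type="application/vnd.oasis.opendocument.text"/>
 <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
</manifest:manifest>
"""

# A hand-written content.xml holding a single paragraph (illustrative only).
CONTENT = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2">
 <office:body>
  <office:text>
   <text:p>Hello from a hand-written content.xml</text:p>
  </office:text>
 </office:body>
</office:document-content>
"""


def build_odt(content_xml: str) -> bytes:
    """Return the bytes of a small .odt package wrapping content_xml."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as odt:
        # The mimetype entry goes first and is stored uncompressed.
        odt.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        odt.writestr("META-INF/manifest.xml", MANIFEST,
                     compress_type=zipfile.ZIP_DEFLATED)
        odt.writestr("content.xml", content_xml,
                     compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


odt_bytes = build_odt(CONTENT)
```

Writing odt_bytes to a file with an .odt extension gives something to try opening in Writer. The point of the sketch is only that the document body is ordinary text wrapped in a zip archive, which is why a tool that can write text files can produce it.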
Some examples...
Basics of XML
<company name="Dianthus Medical Limited">
  <employee role="speaker">
    <firstname>Adam</firstname>
    <lastname>Jacobs</lastname>
  </employee>
  <employee role="delegate">
    <firstname>Flavia</firstname>
    <lastname>White</lastname>
  </employee>
</company>
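To make the roles of elements and attributes concrete, here is a small Python sketch that reads the company example with the standard library parser. The helper name list_employees is hypothetical; the XML is the slide's example written with straight quotes, which XML requires around attribute values.

```python
import xml.etree.ElementTree as ET

XML = """<company name="Dianthus Medical Limited">
  <employee role="speaker">
    <firstname>Adam</firstname>
    <lastname>Jacobs</lastname>
  </employee>
  <employee role="delegate">
    <firstname>Flavia</firstname>
    <lastname>White</lastname>
  </employee>
</company>"""


def list_employees(xml_text):
    """Return (role, firstname, lastname) for each employee element."""
    root = ET.fromstring(xml_text)
    return [
        # role is an attribute; the names are the text of child elements.
        (emp.get("role"), emp.findtext("firstname"), emp.findtext("lastname"))
        for emp in root.findall("employee")
    ]


company_name = ET.fromstring(XML).get("name")
employees = list_employees(XML)
# employees == [("speaker", "Adam", "Jacobs"), ("delegate", "Flavia", "White")]
```

Attributes (name, role) sit inside the opening tag, while element content sits between the opening and closing tags. The ODF table markup uses both in the same way.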
XML code for start of table
<table:table table:style-name="Table42">
<table:table-column table:style-name="TabCol13"/>
<table:table-column table:style-name="TabCol9"/>
<table:table-column table:style-name="TabCol8"/>
<table:table-column table:style-name="TabCol8"/>
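Markup like this is regular enough to generate from a list of style names. The sketch below is in Python rather than Stata and is not the author's implementation; the function name table_start is hypothetical, and the style names are simply those shown on the slide, which would need to be defined elsewhere in the document for a reader to apply them.

```python
from xml.sax.saxutils import quoteattr


def table_start(table_style, column_styles):
    """Return the opening table tag followed by one column tag per style."""
    lines = [f"<table:table table:style-name={quoteattr(table_style)}>"]
    for style in column_styles:
        # quoteattr adds the surrounding quotes and escapes special characters.
        lines.append(f"<table:table-column table:style-name={quoteattr(style)}/>")
    return "\n".join(lines)


start = table_start("Table42", ["TabCol13", "TabCol9", "TabCol8", "TabCol8"])
```

The table element opened here still has to be closed with </table:table> once the rows have been written.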
XML code for table cells
<table:table-cell table:style-name="cell1211">
<text:p text:style-name="Table_20_Contents">Mileage (mpg)</text:p>
</table:table-cell>
<table:table-cell table:style-name="cell1111">
<text:p text:style-name="Table_20_Contents">N</text:p>
</table:table-cell>
<table:table-cell table:style-name="cell1111">
<text:p text:style-name="Table_20_ContentsNumeric">52<text:s text:c="3"/></text:p>
</table:table-cell>
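Each cell follows the same pattern: a table-cell element wrapping a text paragraph. A hedged Python sketch of a cell writer follows; the helper names table_cell and table_row are hypothetical and the approach is an illustration rather than the talk's Stata code. It escapes the cell text, since characters such as & and < in a variable label would otherwise produce malformed XML.

```python
from xml.sax.saxutils import escape, quoteattr


def table_cell(text, cell_style, para_style="Table_20_Contents"):
    """Return one table cell containing a single paragraph of escaped text."""
    return (
        f"<table:table-cell table:style-name={quoteattr(cell_style)}>"
        f"<text:p text:style-name={quoteattr(para_style)}>{escape(text)}</text:p>"
        "</table:table-cell>"
    )


def table_row(cells):
    """Wrap already-built cell strings in a table-row element."""
    return "<table:table-row>" + "".join(cells) + "</table:table-row>"


row = table_row([
    table_cell("Mileage (mpg)", "cell1211"),
    table_cell("N", "cell1111"),
    table_cell("52", "cell1111", para_style="Table_20_ContentsNumeric"),
])
```

The style names are taken from the slide. The slide's numeric cell also carries a text:s element after the value, which this sketch leaves out.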
Was this a lot of work?
123 kB of code
21 ado files
45 Mata functions
And not finished yet!
Any questions?