The Evolving Information Ecosystem of Publishing

Evan Owens, Chief Information Officer, Publishing, American Institute of Physics
JATS Conference, 2 November 2010



Page 2

This Presentation

The Past & Present: Standards

The Future: New Challenges

Page 3

The World View in the 1990s

• How to prepare for the electronic publishing future:
  – Create a version of record in SGML full text
  – Make the perfect master file
  – Prepare to publish simultaneously to print and online
• Multiple outputs was the perceived benefit of SGML
• How did you make that happen?
  – Write your own DTD
  – Work with your vendors
  – Set up SGML-based production processes

A very document-centric view

But what place did standards have in this picture?

Page 4

Journal Article Standards

A much-cited paper on the history of journal standards:

A Decade of DTDs and SGML in Scholarly Publishing: What Have We Learned? Bruce Rosenblum and Irina Golfman, Extreme Markup Languages 2002

“The AAP and 12083 DTDs were important projects. They laid the structural foundations for subsequent DTDs used in journal publishing. They did not succeed, however, in their goal of becoming industry-standard DTDs. This goal was not reached because, while these DTDs were generalized for the needs of the industry, they did not meet the specific business requirements of individual organizations within the scholarly publishing community.”

• AAP Serial DTD (Z39.59, 1983 to 1987)
• ISO 12083 (ANSI 1988, ISO 1993; last updated 1995)
• NLM Tag Suite (v1 2003 … v3 2010)
• NISO JATS (in progress)

Page 5

Standards are Great: Everyone Should Have One!

Page 6

Standards

• Role of standards
  – Codify existing practices
  – Enable new practices or technologies
• Success of standards
  – Technical value
  – Business / political
    • Must meet real business needs
    • Costs must align with benefits
• Conventional wisdom in the 90s:
  – SGML succeeds best in highly concentrated industries with strong exchange requirements; e.g., aviation, auto, defense
  – Scholarly publishing was a highly fragmented industry

Page 7

What Has Changed in the Ecosystem?

• Rise of aggregations
• Move away from proprietary delivery platforms
• Publishers now managing current and back content
  – Early online, current online, digitized back file
• Exchange of data has changed business needs
  – CrossRef for metadata
  – Multiple hosting, preservation for full text
  – Text mining will drive future needs
• Enormous amounts of content flowing around
  – Every publishing deal now includes “and also send to X, Y, Z”

Business conditions are now ripe for standardization

Page 8

Early Adopters

Typesetting service providers saw the need for standards well before their customers:
• Vendor A (1990s) produced content in their internal house DTD, then exported it to the customer DTD
• Vendor B (various) produced content in the Elsevier DTD because they could, then exported it to the customer DTD
• Vendor C (2010) would rather produce content in NLM, then export it to the customer’s DTD

• Vendor A (an early adopter) produced all content in SGML/XML workflows and simply discarded it if the customer wanted only the PDF returned

Page 9

Why Adopt NLM / JATS Now?

Preaching to the choir . . .
• Delivery platform requirement
• Business need for compatibility
• Leverage the experience in the design
• Concentrate on your specific customizations
  – Rather than reinventing the wheel
• Good documentation

University of Chicago Press moved to NLM when it moved to a shared delivery platform

AIP will move to JATS in 2011

Page 10

Where Are We Now?

• Is the battle over?
• Every problem solved?
• Just implement NLM / JATS and all your publishing problems will be solved?

We may have won this battle, but the real challenges of truly digital publishing are just starting to appear. For the first decade, online journal publishing was like old wine in new bottles; now we are seeing real innovations.

Page 11

SIDEBAR: Books versus Journals

• Strong metadata exchange needs (e.g., Amazon)
  – Strong standards and groups
• Came later to online and electronic publishing
• E-book readers are intrinsically different:
  – External to publisher’s platform
  – Forces standards conformance
• EPUB standard
  – Focus was packaging rather than text structuring
  – But is evolving quickly

A different ecosystem, but the boundaries are beginning to blur

Perhaps we (books and journals) will meet in the middle?

Page 12

The Future

Page 13

Current and Future Trends in Journal Publishing

• Articles, not issues
• Rapid publication with limited prepress
• Multimedia and “supplemental” stuff
• Multiple “manifestations” and “expressions”
  – HTML, PDF, app, reader
  – Article, podcast
• Revisions (?)
• Comments, annotations, blogs
• Magazine-like features
• Semantics, text mining
• Information, not articles

Page 14

Ecosystem: The XML Instance

We have come a long way!
• Mechanics are easier
  – Unicode, MathML, table models, etc.
• Managing the structure of the content
  – Much of this conference
  – XML Versioning Workshop at Balisage 2008
• Managing the instances
  – Version & validation checking

But the journal publishing world is becoming less static, less document-centric . . . and a lot more complicated!
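The “Managing the instances” bullet can be made concrete with a small inventory check. A minimal Python sketch, assuming each stored instance declares its tag-set version in a `dtd-version` attribute on the root `<article>` element (as NLM/JATS articles can); the function names and the sample corpus are hypothetical. It only reports declared versions and does not validate against a DTD, which the standard-library ElementTree parser cannot do.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def dtd_version_inventory(instances: dict[str, str]) -> Counter:
    """Count stored instances by the dtd-version attribute on the root element.

    Instances that lack the attribute are counted under "unknown".
    """
    counts: Counter = Counter()
    for xml_text in instances.values():
        root = ET.fromstring(xml_text)
        counts[root.get("dtd-version", "unknown")] += 1
    return counts


def needs_migration(instances: dict[str, str], target: str) -> list[str]:
    """Return the sorted names of instances not declared at the target version."""
    return sorted(
        name
        for name, xml_text in instances.items()
        if ET.fromstring(xml_text).get("dtd-version") != target
    )


# Hypothetical corpus: identifiers and version strings are illustrative only.
corpus = {
    "a1": '<article dtd-version="2.3"/>',
    "a2": '<article dtd-version="3.0"/>',
    "a3": "<article/>",
}
print(dtd_version_inventory(corpus))
print(needs_migration(corpus, "3.0"))  # ['a1', 'a3']
```

Keeping the check read-only means it can run over a whole archive before deciding which instances to migrate, in line with the slide’s point that the instances themselves now need managing.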

Page 15

Ecosystem: Content and Metadata

• The XML instance as pseudo-database:
  <article copyeditor="XYZ" maildate="00/00/00">
• What metadata goes inside and what lives outside?
  – Descriptive (bibliographic)
  – Provenance (process history)
  – Structural (components)
  – Technical (formats, versions)
• Is the XML instance just a piece of a larger system?
  – How does it fit into a larger information architecture?
  – Is the XML instance where this information should live?

• An implementation / design decision
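One way to act on the inside-versus-outside question is to move process history out of the instance. A minimal Python sketch, assuming the slide’s illustrative `copyeditor` and `maildate` attributes count as provenance that belongs in an external store; the function name, the `PROVENANCE_ATTRS` tuple, and the `article_id` key are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes treated here as process history (provenance). The names come
# from the slide's example and are illustrative, not part of any tag set.
PROVENANCE_ATTRS = ("copyeditor", "maildate")


def externalize_provenance(xml_text: str, article_id: str) -> tuple[str, dict]:
    """Remove provenance attributes from the root element.

    Returns the cleaned XML and a separate metadata record keyed by
    article_id, suitable for storing outside the instance.
    """
    root = ET.fromstring(xml_text)
    record = {"article_id": article_id}
    for name in PROVENANCE_ATTRS:
        value = root.attrib.pop(name, None)  # drop it from the instance if present
        if value is not None:
            record[name] = value
    return ET.tostring(root, encoding="unicode"), record


instance = '<article copyeditor="XYZ" maildate="00/00/00"><front/></article>'
clean_xml, provenance = externalize_provenance(instance, "article-0001")  # hypothetical ID
print(clean_xml)
print(provenance)
```

Which attributes go in `PROVENANCE_ATTRS` is exactly the implementation and design decision the slide mentions; descriptive metadata would normally stay in the instance.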

Page 16

Ecosystem: Reference Linking

• Connecting XML documents to external resources
• Do we rewrite the XML or externalize the links?
  – An implementation question only?
• ApJ, NASA ADS, bibcodes
  – Linking identifiers that could be pre-calculated
  – Resolution could be added afterwards
• CrossRef and DOI linking
  – Backfill problem: early or late binding
  – Dynamic resolution solutions; e.g., Elsevier, AIP
  – Externalizes big parts of the document
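The early-versus-late binding choice can be illustrated with the late-binding side: the XML keeps only the DOI, and the link is computed when a page is rendered, so links can be backfilled or changed without rewriting the instance. A minimal Python sketch, assuming JATS-style `<ref>` and `<pub-id pub-id-type="doi">` markup and the public `https://doi.org/` resolver; it is not a description of the Elsevier or AIP systems named on the slide, and the function name and sample DOI are hypothetical.

```python
import xml.etree.ElementTree as ET

DOI_PROXY = "https://doi.org/"  # public DOI resolver


def late_bound_links(xml_text: str) -> dict[str, str]:
    """Map each <ref> id to a DOI URL computed at display time.

    The stored XML is only read, never rewritten: links are derived on demand
    (late binding) from <pub-id pub-id-type="doi"> elements in the references.
    """
    root = ET.fromstring(xml_text)
    links: dict[str, str] = {}
    for ref in root.iter("ref"):
        doi = ref.find(".//pub-id[@pub-id-type='doi']")
        if doi is not None and doi.text:
            links[ref.get("id")] = DOI_PROXY + doi.text.strip()
    return links


# Hypothetical reference list; the DOI below is a placeholder, not a real article.
sample = """<article><back><ref-list>
  <ref id="r1"><element-citation>
    <pub-id pub-id-type="doi">10.9999/placeholder.001</pub-id>
  </element-citation></ref>
  <ref id="r2"><element-citation/></ref>
</ref-list></back></article>"""
print(late_bound_links(sample))
```

References without a DOI (like `r2`) simply get no link yet; a later metadata match could supply one without touching the XML. A production resolver would also percent-encode DOIs that contain reserved URL characters.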

Page 17

Ecosystem: Semantic Enrichment

• An old-school example: updating classification schemes
  – Do you update the instances retroactively?
• Some approaches to semantic enrichment:
  – Known entity identification
  – Generic entity extraction
    • Resolution/identification done later
  – Inline markup; e.g., <named-content>
    • Entities are known in advance
  – Completely externalized solutions
    • In a separate delivery system or repository
    • In a search engine or XML database, not in the content
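The completely externalized option can be sketched as standoff annotation: entity mentions are recorded as character offsets outside the content, rather than wrapped inline in `<named-content>`. A minimal Python sketch for the known-entity case, assuming plain text has already been extracted from the XML and that entities come from a simple name-to-identifier dictionary; the `Annotation` class, the function, and the identifiers are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int      # character offset where the mention begins
    end: int        # offset just past the end of the mention
    entity_id: str  # identifier from an external vocabulary


def standoff_annotations(text: str, vocabulary: dict[str, str]) -> list[Annotation]:
    """Find known entity names in text and return standoff annotations.

    The text is left untouched; the annotations can live in a separate
    repository or search index and be regenerated when the vocabulary changes.
    """
    found = []
    for name, entity_id in vocabulary.items():
        # \b keeps a name from matching inside a longer word.
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append(Annotation(match.start(), match.end(), entity_id))
    return sorted(found, key=lambda a: (a.start, a.end))


# Hypothetical vocabulary; the identifiers are placeholders.
vocab = {"boron nitride": "ex:BN", "graphene": "ex:G"}
for ann in standoff_annotations("graphene and boron nitride", vocab):
    print(ann)
```

Because nothing is written back, updating a classification scheme means regenerating the annotations rather than editing every instance retroactively; the trade-off is that offsets must be recomputed whenever the text itself changes.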

Page 18

Ecosystem: Identity Management

• ORCID (Open Researcher and Contributor ID)
  – Logistical issues:
    • Known in advance or applied retroactively?
    • Future publications and/or historical?
    • Store in article instances or an external layer?
• Larger identity management issues:
  – Bibliographic identity
  – Business identity (author, reviewer, subscriber, etc.)
  – Community identity (ORCID, social networking, etc.)
• Another potential use of layered information architectures
  – Feels like an RDF kind of problem!
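The external-layer and RDF remarks can be illustrated with a tiny triple store kept outside the article instances: contributor identifiers are attached as separate subject-predicate-object statements, so they can be applied to historical articles without rewriting the XML. A minimal Python sketch using plain tuples rather than an RDF library; the predicate names, node naming scheme, and identifiers are hypothetical.

```python
# Hypothetical predicate names; a real system would use an RDF vocabulary.
HAS_AUTHOR = "hasAuthor"
SAME_AS = "sameAs"

Triple = tuple[str, str, str]


def works_by_person(triples: set[Triple], person_id: str) -> set[str]:
    """Return works whose author node is linked to person_id.

    Follows sameAs edges from author nodes to the person identifier, then
    hasAuthor edges from works to those author nodes.
    """
    author_nodes = {s for s, p, o in triples if p == SAME_AS and o == person_id}
    return {s for s, p, o in triples if p == HAS_AUTHOR and o in author_nodes}


# Identity layer stored outside the article XML. "person:X" stands in for a
# contributor identifier such as an ORCID; all values are placeholders.
layer: set[Triple] = {
    ("article:A", HAS_AUTHOR, "article:A#author1"),
    ("article:B", HAS_AUTHOR, "article:B#author2"),
    ("article:A#author1", SAME_AS, "person:X"),
}
print(works_by_person(layer, "person:X"))

# Applying the identifier retroactively to a historical article is one new
# triple; neither article instance is edited.
layer.add(("article:B#author2", SAME_AS, "person:X"))
print(sorted(works_by_person(layer, "person:X")))  # ['article:A', 'article:B']
```

The same layer could hold business or community identities alongside bibliographic ones, which is where a proper RDF store and vocabulary would start to pay off.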

Page 19

Some Things to Think About

• Content management strategy
  – Standards, standards, standards
  – Versioning, formats, validation, necessary metadata
• Information lifecycle should inform everything
  – Not just publish once and we’re done
  – Formats change, needs change, even content changes
• Content is going to come at us from many directions
  – User-contributed, not just the formal publishing process
• Information architecture strategy
  – Think beyond just fixed documents
  – Plan for interactions with external systems

Page 20

NLM’s Contribution to Our Industry

Page 21

Evan Owens, Chief Information Officer, Publishing, American Institute of Physics

Questions? Comments?