Tech 802: Data, Databases & XML

Data, Databases & XML: A Crash Course. Monique Sherrett, monique@boxcarmarketing.com

Upload: somisguided

Post on 12-Jan-2015


DESCRIPTION

Presentation from Monday, January 14, 2013 on three types of data (unstructured, structured and semi-structured) and how XML plays a role in content management systems, ONIX (bibliographic data sharing), RSS (Really Simple Syndication) and XML-first publishing for ebooks.

TRANSCRIPT

Page 1: Tech 802: Data, Databases & XML

Data, Databases & XML: A Crash Course

Monique Sherrett, monique@boxcarmarketing.com

Page 2: Tech 802: Data, Databases & XML

3 Types of Data

Unstructured Data
• e.g. Word documents, PDFs, audio/video files, emails
• No search
• No version control

Structured Data
• e.g. inventory management database, WordPress
• Searchable
• Version and user control (secure access)
• Relationship structures (show everything tagged “winter”)
• Import / Export
• Display options
• Machine readable; run queries against the data

Semi-Structured Data
• e.g. XML (HTML, ONIX, RSS)
• Formal/standardized data
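As a rough illustration of "run queries against the data", here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are invented for this example; a real CMS such as WordPress uses its own, more elaborate schema.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (post_id INTEGER REFERENCES posts(id), tag TEXT);
""")
conn.executemany("INSERT INTO posts (id, title) VALUES (?, ?)", [
    (1, "Snowshoe trails"),
    (2, "Summer reading list"),
    (3, "Holiday gift guide"),
])
conn.executemany("INSERT INTO tags (post_id, tag) VALUES (?, ?)", [
    (1, "winter"), (2, "summer"), (3, "winter"), (3, "gifts"),
])


def titles_tagged(tag):
    """Return the titles of all posts carrying the given tag, sorted by title."""
    rows = conn.execute(
        "SELECT p.title FROM posts p JOIN tags t ON t.post_id = p.id "
        "WHERE t.tag = ? ORDER BY p.title",
        (tag,),
    )
    return [title for (title,) in rows]


# "Show everything tagged winter"
winter_titles = titles_tagged("winter")
print(winter_titles)
```

The same question asked of a folder of Word documents or PDFs would mean opening each file by hand, which is the practical difference between structured and unstructured data.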

Page 3: Tech 802: Data, Databases & XML

Structured Data: WordPress

• Open-source content management system based on PHP and MySQL
– Open source: source code is freely available, which encourages development by many independent programmers.
– CMS: a database + presentation layer (set of templates)
– MySQL: a type of database
– PHP: a scripting language designed to produce dynamic web pages
• Plugin architecture (Akismet for spam, SEO by Yoast, WP to Twitter, etc.)
• Pages & Posts
• Categories & Tags

Page 4: Tech 802: Data, Databases & XML

Pages vs Posts

Page (~unstructured)
• Static content, won’t change frequently
• e.g. About page
• Can be organized manually in a hierarchy: page (parent) and subpages (child)
– About Us > Team; About Us > History

Post (~structured)
• Frequently updated content dynamically organized in a hierarchy (chronological, category), plus archive
– News articles, event information
– Frequently published in an RSS feed that is subscribed to by users

Page 5: Tech 802: Data, Databases & XML

Semi-Structured Data: RSS

• Really Simple Syndication or Rich Site Summary
• Publish it. Subscribe to it. Pull it into other websites.
• RSS is a standardized XML file format.
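To make "pull it into other websites" concrete, here is a small sketch that reads a feed with Python's standard library. The feed below is a made-up, minimal RSS 2.0 document embedded as a string; a real site would fetch the feed over HTTP first.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal RSS 2.0 feed; example.com URLs are placeholders.
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example Press News</title>
    <link>https://example.com/</link>
    <description>News from a made-up publisher</description>
    <item>
      <title>Spring catalogue released</title>
      <link>https://example.com/news/spring-catalogue</link>
    </item>
    <item>
      <title>Author event announced</title>
      <link>https://example.com/news/author-event</link>
    </item>
  </channel>
</rss>"""


def read_items(rss_text):
    """Return (title, link) pairs for every <item> in the feed."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


items = read_items(RSS_SAMPLE)
for title, link in items:
    print(f"{title} -> {link}")
```

Because every RSS feed uses the same element names, the same few lines work for any publisher's feed, which is what "standardized" buys you.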

Page 6: Tech 802: Data, Databases & XML

WordPress As Database

• Instead of a series of HTML files, WordPress offers a system that allows for the organization and efficient storage & retrieval of information.
– Structured data can be exported into semi-structured data (RSS, XML)
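The export from structured to semi-structured data can be sketched as follows. This is not how WordPress itself generates feeds; the post rows and the function name posts_to_rss are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Rows as they might come out of a CMS database (hypothetical sample data).
posts = [
    {"title": "Fall launch party", "url": "https://example.com/fall-launch",
     "summary": "Join us & celebrate."},
    {"title": "New distribution partner", "url": "https://example.com/partner",
     "summary": "Details inside."},
]


def posts_to_rss(site_title, site_url, rows):
    """Build a minimal RSS 2.0 document from database-style rows."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Latest posts from {site_title}"
    for row in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = row["title"]
        ET.SubElement(item, "link").text = row["url"]
        ET.SubElement(item, "description").text = row["summary"]
    # Special characters such as "&" are escaped automatically on output.
    return ET.tostring(rss, encoding="unicode")


rss_text = posts_to_rss("Example Press", "https://example.com/", posts)
print(rss_text)
```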

Page 7: Tech 802: Data, Databases & XML

RSS is XML

• eXtensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is machine- and human-readable.
• RSS, XHTML (unzipped EPUB) and ONIX (ONline Information eXchange, a standard for sharing bibliographic data) are some of the hundreds of XML-based languages that have been developed.
• How might we use XML for the Tech Project?

Page 8: Tech 802: Data, Databases & XML

Diagram: Current db → Export to XML → Rename / Modify XML → Import from XML → New db
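The export, rename/modify and import steps in the diagram can be sketched end to end. Everything here is an assumption for illustration: the two databases are in-memory SQLite stand-ins, and the field rename (name to title) is an invented example of the kind of change a migration might need.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical "current" database that calls the title field "name".
old_db = sqlite3.connect(":memory:")
old_db.execute("CREATE TABLE books (isbn TEXT, name TEXT)")
old_db.executemany("INSERT INTO books VALUES (?, ?)", [
    ("9780000000001", "First Example Title"),
    ("9780000000002", "Second Example Title"),
])

# 1. Export to XML: one <book> element per row.
root = ET.Element("books")
for isbn, name in old_db.execute("SELECT isbn, name FROM books ORDER BY isbn"):
    book = ET.SubElement(root, "book")
    ET.SubElement(book, "isbn").text = isbn
    ET.SubElement(book, "name").text = name
exported = ET.tostring(root, encoding="unicode")

# 2. Rename / modify XML: the new system expects <title> instead of <name>.
tree = ET.fromstring(exported)
for element in list(tree.iter("name")):
    element.tag = "title"

# 3. Import from XML into the new database.
new_db = sqlite3.connect(":memory:")
new_db.execute("CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT)")
for book in tree.iter("book"):
    new_db.execute("INSERT INTO books VALUES (?, ?)",
                   (book.findtext("isbn"), book.findtext("title")))

migrated = new_db.execute("SELECT isbn, title FROM books ORDER BY isbn").fetchall()
print(migrated)
```

The XML file in the middle is what lets two systems with different internal structures exchange data without knowing anything about each other.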

Page 10: Tech 802: Data, Databases & XML

ONIX is XML

• International standard for representing and communicating book and product info in electronic form
– Text-readable (human & computer)
– Tagged/markup
– Transferred by email or FTP (File Transfer Protocol)
– More info: bisg.org
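For a feel of what "tagged/markup" looks like in practice, here is a sketch that reads a heavily simplified, ONIX 3.0-style record. The record is not schema-valid (a real message has a header, a namespace and many more required fields), all values are made up, and the element names and code values should be checked against the ONIX specification before relying on them.

```python
import xml.etree.ElementTree as ET

# Heavily simplified ONIX 3.0-style record with made-up values.
ONIX_SAMPLE = """<ONIXMessage release="3.0">
  <Product>
    <RecordReference>example.com.0001</RecordReference>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780000000001</IDValue>
    </ProductIdentifier>
    <DescriptiveDetail>
      <TitleDetail>
        <TitleType>01</TitleType>
        <TitleElement>
          <TitleElementLevel>01</TitleElementLevel>
          <TitleText>An Example Title</TitleText>
        </TitleElement>
      </TitleDetail>
    </DescriptiveDetail>
  </Product>
</ONIXMessage>"""


def read_products(onix_text):
    """Return a list of {isbn13, title} dicts, one per <Product>."""
    products = []
    for product in ET.fromstring(onix_text).iter("Product"):
        isbn13 = None
        for identifier in product.iter("ProductIdentifier"):
            # Code 15 marks an ISBN-13 in the ONIX product identifier code list.
            if identifier.findtext("ProductIDType") == "15":
                isbn13 = identifier.findtext("IDValue")
        products.append({
            "isbn13": isbn13,
            "title": product.findtext(".//TitleText"),
        })
    return products


products = read_products(ONIX_SAMPLE)
print(products)
```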

Page 11: Tech 802: Data, Databases & XML

Diagram: Publisher db → Export to ONIX & FTP file to Server → Server → Grab file from Server & Import from ONIX → Bookseller db

Page 13: Tech 802: Data, Databases & XML

EDI: Electronic Data Interchange

• Structured (db to db) transmission of data
• Often XML tagged format

Page 14: Tech 802: Data, Databases & XML

Questions on XML?

• Data, database questions?
• Tech project?

Page 15: Tech 802: Data, Databases & XML

WEBCAST

A Roadmap to Efficiently Producing Multi-Format/Multi-Screen eBooks

Lessons from Market Innovators

November 8, 2012

Page 16: Tech 802: Data, Databases & XML

Speakers

§ Thad McIlroy – Electronic publishing analyst and author, The Future of Publishing

§ Stephen Driver – Vice President, Production Services, The Rowman & Littlefield Publishing Group

Page 17: Tech 802: Data, Databases & XML

XML Workflows for eBooks

Page 18: Tech 802: Data, Databases & XML

XML Adoption by Sector

[Chart comparing sectors: STM, Educational, Trade]

Page 19: Tech 802: Data, Databases & XML

XML Defined

XML is:
• A device-independent, system-independent method of storing and processing electronic text
• Markup for form and/or meaning
• A data interchange format used by many applications on the Web

Page 20: Tech 802: Data, Databases & XML

XML Provides Real Solutions

• But it is a big, ugly, unwieldy bear
• And its conceptual metaphors bear little resemblance to book publishing
• It’s based on 25-year-old thinking about technical documents and ecommerce
• Yet it’s the only real game in town
• ONIX book metadata is enabled by XML

Page 21: Tech 802: Data, Databases & XML

The Importance of XML

• XML enables content management
• Separates form from content
• Combines style sheets with the power of databases in an extensible language
• Its long-term killer feature is semantic markup – marking up meaning, making text discoverable
• Future-proofing content
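"Separates form from content" can be shown with one source and two outputs. This is only a sketch: the chapter markup, element names and both rendering functions are invented for illustration and do not follow any particular publishing schema.

```python
import xml.etree.ElementTree as ET
from html import escape

# One piece of content, marked up by meaning rather than appearance.
CHAPTER = """<chapter>
  <title>Winter Reading</title>
  <para>Five books for long nights.</para>
  <para>Each one is under 300 pages.</para>
</chapter>"""


def to_html(chapter):
    """Render the chapter for the web."""
    parts = [f"<h1>{escape(chapter.findtext('title'))}</h1>"]
    parts += [f"<p>{escape(p.text)}</p>" for p in chapter.iter("para")]
    return "\n".join(parts)


def to_plain_text(chapter):
    """Render the same chapter as plain text, e.g. for an email newsletter."""
    title = chapter.findtext("title")
    lines = [title.upper(), "=" * len(title)]
    lines += [p.text for p in chapter.iter("para")]
    return "\n".join(lines)


chapter = ET.fromstring(CHAPTER)
html_output = to_html(chapter)
text_output = to_plain_text(chapter)
print(html_output)
print(text_output)
```

Adding a third output format means writing a third rendering function; the content itself does not change, which is the sense in which it is future-proofed.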

Page 22: Tech 802: Data, Databases & XML

XML Tagging

Semantic tagging requires human judgment but offers the benefit of meaning

<book price="49.95" ISBN="string" publicationdate="2012-12-09">
  <title>string</title>
  <author>
    <first-name>string</first-name>
    <last-name>string</last-name>
  </author>
  <genre>string</genre>
</book>
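To show what "the benefit of meaning" gives a program, here is a sketch that reads the slide's book structure with Python's standard library. The values replacing "string" (title, author name, ISBN, genre) are made up for the example.

```python
import xml.etree.ElementTree as ET

# The slide's structure, with made-up values in place of "string".
BOOK_XML = """<book price="49.95" ISBN="9780000000001" publicationdate="2012-12-09">
  <title>An Example Title</title>
  <author>
    <first-name>Jane</first-name>
    <last-name>Doe</last-name>
  </author>
  <genre>Reference</genre>
</book>"""

book = ET.fromstring(BOOK_XML)

# Attributes are read with .get(); child elements with .findtext().
record = {
    "title": book.findtext("title"),
    "author": f"{book.findtext('author/first-name')} {book.findtext('author/last-name')}",
    "genre": book.findtext("genre"),
    "price": float(book.get("price")),
    "isbn": book.get("ISBN"),
    "published": book.get("publicationdate"),
}
print(record)
```

Because the tags say what each piece of text is, the program can pick out the author or the price directly instead of guessing from layout or font size.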

Page 23: Tech 802: Data, Databases & XML

Structured Tagging by Authors?

Typéfi sample approach

Page 24: Tech 802: Data, Databases & XML

If you show this to editors... “They’re going to start drinking at their desks”

Page 25: Tech 802: Data, Databases & XML

Templated Designs

How much book content fits into automatic composition?

Page 26: Tech 802: Data, Databases & XML

The Human Factor: New Internal Skills & Positions

• The production skill set changes substantially
• Much of the existing knowledge base changes or becomes obsolete
• The move from design & composition & production management to content & product architecting and engineering
• There is an enormous training challenge ahead

Page 27: Tech 802: Data, Databases & XML

Key Takeaways

• XML is complex, but packed with value
• XML is not an all-or-nothing deal
• You should start with small steps
• XML’s complexity demands outside help
• Services, consultants, trainers, associations
• The rapid proliferation of output formats can only be mastered with a structured approach like XML

Page 28: Tech 802: Data, Databases & XML

Obstacles to using XML

• XML is intimidating, full of jargon
• We’re editors, not programmers
• And what about the authors?
• You mean I can’t move that line of text half a pica?! And other design concerns
• Editorial, or “my book’s too good for a template”

Page 29: Tech 802: Data, Databases & XML

So how’d we solve it?

• We manipulated XML to our uses, not the other way around
• We still used authors’ Word documents as the source
• Template interiors were something we had already been doing for years
• XML coding was translated into a coding structure virtually all production people know: typesetting short tags
• We adapted existing XML approaches to our specific needs by discarding coding that didn’t fit our content

Page 30: Tech 802: Data, Databases & XML

But weren’t there problems?

Page 31: Tech 802: Data, Databases & XML

A Multi-Channel Workflow Example

Page 32: Tech 802: Data, Databases & XML

1. Word document received from author

Page 33: Tech 802: Data, Databases & XML

2. Word file coded for XML conversion (resembles standard typesetting short tags)

Page 34: Tech 802: Data, Databases & XML

3. Typesetting short tags replaced with XML via conversion process (some file editing required)

Page 35: Tech 802: Data, Databases & XML

4. Final PDF generated after style template applied to XML file. EPUB, .mobi and WebPDF generated.
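The conversion in step 3 can be sketched in a few lines. The slides do not show the actual codes or the conversion tool used, so everything below is hypothetical: the short tags (ct, h1, tx), the XML element names and the function short_tags_to_xml are invented purely to illustrate the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical short tags; real typesetting codes vary by publisher and vendor.
SHORT_TAG_TO_ELEMENT = {"ct": "title", "h1": "heading", "tx": "para"}

# A manuscript after step 2: each line starts with a short tag.
CODED_MANUSCRIPT = """<ct>Chapter One
<tx>It was a dark & stormy night.
<h1>The Next Morning
<tx>The rain had stopped."""


def short_tags_to_xml(coded_text):
    """Replace each leading short tag with a properly closed XML element."""
    chapter = ET.Element("chapter")
    for line in coded_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.startswith("<") or ">" not in line:
            # The kind of problem that needs manual file editing.
            raise ValueError(f"Line has no short tag: {line!r}")
        code, text = line[1:].split(">", 1)
        if code not in SHORT_TAG_TO_ELEMENT:
            raise ValueError(f"Unknown short tag: {code!r}")
        ET.SubElement(chapter, SHORT_TAG_TO_ELEMENT[code]).text = text.strip()
    return chapter


chapter = short_tags_to_xml(CODED_MANUSCRIPT)
xml_output = ET.tostring(chapter, encoding="unicode")
print(xml_output)
```

The errors raised for untagged or unknown lines correspond to the "some file editing required" caveat: a conversion like this only works on lines that were coded consistently in step 2.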

Page 36: Tech 802: Data, Databases & XML

Insider Tips

• Know your staff: Who can adjust, and how will you address those who can’t?
• Know your content: Using the right tool for the job is critical; not all content is suitable for XML composition.
• Be realistic about the learning curve: If you’re still paper editing, making the leap straight to XML may be too great, so start small.
• Be flexible: You’ll likely revisit several core values of your publishing program. Identify the most important things and be honest about the less important ones.

Page 37: Tech 802: Data, Databases & XML

Insider Tips, cont.

• XML need not be an off-the-shelf product: You can and should work to customize it to your own production needs.
• See it through: It’s taken us two years to arrive at a point where we’re comfortable, and we’re still making changes.
• Partner with the right vendors: Find someone willing and capable of adapting to your publishing needs.
• When you need a hammer, use a hammer: Remember XML is just another tool; it shouldn’t be your only tool.

Page 38: Tech 802: Data, Databases & XML

Questions?

Page 39: Tech 802: Data, Databases & XML

What’s Next

Tech Course 802
1. Christine on Tues 15th: coming in to talk templates and WordPress
2. Next Tues 22nd: Chloe and Stacey coming in to talk about ebooks and XML
3. Following Mon 28 and Tues 29: Brenda J Walker and Haig Armen on apps

Tech Project 607
1. This Wed 16th: Content to present assignment to Design & Tech so we can all be on the same page and on Thurs carry on with wireframes/design mockups (Design), platform set up (Tech) and discoverability/ed calendar (Content)
2. Following Wed 23rd: Present to Alan and David designs and ideas so far.