Smart Content Navigation with the SAP HANA Platform Georg Nold (Springer Science+Business Media) and Philipp Scholl (SAP AG) May 15, 2013




© 2013 SAP AG or an SAP affiliate company. All rights reserved.

Springer Science – overview

Springer Science+Business Media is one of the leading publishers of scientific publications

2,700+ journals and 7,000 new book titles annually

Our customers are accessing content via SpringerLink (link.springer.com)

Subscriptions cover, for example, disciplines, business branches, or collections of specific journals

Large archive of publications (from 1842 to today) – most of it digitized

~7.6 million documents available via SpringerLink

Continual migration from print to electronic publishing

Goal: identify new business models for journal and book content

Grow e-book revenues

Target corporate and individual customers

Simplify functional areas around the content database to be more efficient


Springer Brands and Imprints


Springer Science – current challenges

Most content is accessed through search engines (such as Google)

Visitors make a single stop and leave – bad for cross-selling

Content is stored on eight blade servers used for storing metadata (A++ XML format) and for indexing and searching documents (MarkLogic)

Heavy use of preaggregated indices, split per discipline

Frequent reprocessing of XML data

Approx. 400 million accesses per month (resulting in ~86 GB of raw server log data)

200+ million full-text downloads per year (as of 2012)

Structured data available only in server log data

Generation of usage reports takes up to several days – live reporting not easily possible

Springer is going through a transformation towards digital business
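
To put the access figures above into perspective, the following Python sketch derives a few per-access quantities. It is only a back-of-the-envelope illustration: it assumes that the ~86 GB of raw log data is a monthly volume, that 1 GB is 10^9 bytes, that a month has 30 days, and that the rounded figures from the slide are exact. The variable names are illustrative.

```python
# Figures copied from the slide (rounded; treated as exact for this sketch).
accesses_per_month = 400_000_000
raw_log_bytes_per_month = 86 * 10**9   # assumption: ~86 GB is per month, decimal GB
downloads_per_year = 200_000_000       # lower bound ("200+ million", as of 2012)

# Average size of one raw log record.
bytes_per_access = raw_log_bytes_per_month / accesses_per_month

# Average request rate, assuming a 30-day month and an even load.
accesses_per_second = accesses_per_month / (30 * 24 * 3600)

# Share of accesses that are full-text downloads.
download_share = downloads_per_year / (accesses_per_month * 12)

print(f"Raw log bytes per access: {bytes_per_access:.0f}")
print(f"Average accesses per second: {accesses_per_second:.0f}")
print(f"Full-text downloads as share of accesses: {download_share:.1%}")
```

Under these assumptions, each access produces roughly 215 bytes of raw log data, and about 4% of accesses are full-text downloads.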


What is smart content navigation?

Support for users in navigating large volumes of unstructured content

Enable users to find relevant documents efficiently

Enable explorative navigation as well as direct access

Refine search goal by current context

Enable automatic content recommendations

Offer helpful workflow-supporting features


Text analytics overview in SAP HANA SPS05

[Architecture diagram; labels: SAP HANA, text processor (linguistic processing, entity and fact extraction), column store, SQL engine, XS engine, domain-specific terminology, documents, Web application, structured data]


The numbers

One box for SAP HANA

Intel Xeon CPU E7-8870, 2.5 GHz (8 CPUs with 10 cores each)

1 TB of RAM

The data

50% of all Springer content (~3.8 million PDF documents)

– 3.57 TB stored as disk-based column

– 280 GB full-text and index (compressed and in memory)

– Text engine indexes ~50,000 documents per hour

– 98 million PDF metadata entries

2.8 billion entities extracted from text

– 39 GB (compressed and in-memory)

14 months of Web server access logs

– 160 GB (compressed and in memory)

Total: 482 GB of data in memory
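
As a rough sanity check of the figures above, the following Python sketch derives a few quantities that the slide does not state directly. It is an illustration only: the variable names are made up for this sketch, the rounded figures from the slide are treated as exact, and indexing throughput is assumed to be constant.

```python
# Figures copied from the slide (rounded; treated as exact for this sketch).
documents = 3_800_000            # ~3.8 million PDF documents
index_rate_per_hour = 50_000     # text engine indexing throughput
entities = 2_800_000_000         # entities extracted from text
metadata_entries = 98_000_000    # PDF metadata entries

# In-memory footprint of the three itemized components, in GB.
in_memory_gb = {"full-text and index": 280, "entities": 39, "access logs": 160}

indexing_hours = documents / index_rate_per_hour
entities_per_document = entities / documents
metadata_per_document = metadata_entries / documents
itemized_total_gb = sum(in_memory_gb.values())

print(f"Indexing time: {indexing_hours:.0f} h ({indexing_hours / 24:.1f} days)")
print(f"Entities per document: {entities_per_document:.0f}")
print(f"Metadata entries per document: {metadata_per_document:.1f}")
print(f"Itemized in-memory total: {itemized_total_gb} GB")
```

At the stated rate, indexing all 3.8 million documents would take about 76 hours. The three itemized components add up to 479 GB, slightly below the 482 GB total given on the slide; the slide does not itemize the remainder.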


Thank you

Contact information:

Georg Nold, Director Platform Integration, [email protected]

Philipp Scholl, SAP Strategic Projects Lab, [email protected]


© 2013 SAP AG or an SAP affiliate company.

All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG.

The information contained herein may be changed without prior notice.

Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.

National product specifications may vary.

These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries.

Please see http://www.sap.com/corporate-en/legal/copyright/index.epx#trademark for additional trademark information and notices.