Smart Content Navigation with the SAP HANA Platform
Georg Nold (Springer Science+Business Media) and Philipp Scholl (SAP AG)
May 15, 2013
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 2
Springer Science – overview
Springer Science+Business Media is one of the leading publishers of scientific publications
2,700+ journals and 7,000 new book titles annually
Customers access content via SpringerLink (link.springer.com)
Subscriptions cover, for example, disciplines, business branches, or collections of specific journals
Large archive of publications (from 1842 to today) – most of it digitized
~7.6 million documents available via SpringerLink
Continual migration from print to electronic publishing
Goal: identify new business models for journal and book content
Grow e-book revenues
Target corporate and individual customers
Simplify functional areas around the content database to be more efficient
Springer Brands and Imprints
Springer Science – current challenges
Most content is accessed through search engines (such as Google)
Visitors make one stop and leave – bad for cross-selling
Content is stored on eight blade servers that hold the metadata (A++ XML format) and index and search the documents (MarkLogic)
Heavy use of preaggregated indices, split per discipline
Frequent reprocessing of XML data
Approx. 400 million accesses per month (resulting in ~86 GB of raw server log data)
200+ million full-text downloads per year (as of 2012)
Structured data available only in server log data
Generation of usage reports takes up to several days – live reporting not easily possible
Springer is going through a transformation towards digital business
What is smart content navigation?
Support for users in navigating large volumes of unstructured content
Enable users to find relevant documents efficiently
Enable explorative navigation as well as direct access
Refine search goal by current context
Enable automatic content recommendations
Offer helpful workflow-supporting features
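The slides do not say how content recommendations are computed. As one possible illustration only, the sketch below ranks documents by how often they were opened in the same session as the current document, the kind of signal that web server access logs could provide. The session data, the document IDs, and the function names `build_co_access` and `recommend` are hypothetical and are not taken from the presentation or from any SAP HANA API.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_access(sessions):
    """Count how often two documents are opened in the same session."""
    co_access = defaultdict(Counter)
    for docs in sessions:
        # Deduplicate within a session so repeated views count once.
        for a, b in combinations(sorted(set(docs)), 2):
            co_access[a][b] += 1
            co_access[b][a] += 1
    return co_access


def recommend(co_access, doc_id, k=3):
    """Return up to k documents most often co-accessed with doc_id."""
    neighbours = co_access.get(doc_id, Counter())
    # Sort by descending count, then by ID so ties are deterministic.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc for doc, _ in ranked[:k]]


# Hypothetical sessions: each list holds the documents one visitor opened.
sessions = [
    ["doc-a", "doc-b", "doc-c"],
    ["doc-a", "doc-b"],
    ["doc-b", "doc-d"],
]
co_access = build_co_access(sessions)
print(recommend(co_access, "doc-a"))
```

A production system would more likely combine such usage signals with the entities extracted from the text; the sketch only shows the co-access idea in its simplest form.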
Text analytics overview in SAP HANA SPS05
[Architecture diagram; labels only: SAP HANA, Text processor (Linguistic processing; Entity and fact extraction), Column store, SQL engine, XS engine, Domain-specific terminology, Documents, Web application, Structured data]
The numbers
One box for SAP HANA
Intel Xeon CPU E7-8870, 2.5 GHz (8 CPUs with 10 cores each)
1 TB of RAM
The data
50% of all Springer content (~ 3.8 million PDF documents)
– 3.57 TB stored as disk-based column
– 280 GB full-text and index (compressed and in memory)
– Text engine indexes ~ 50,000 documents per hour
– 98 million PDF metadata entries
2.8 billion entities extracted from text
– 39 GB (compressed and in memory)
14 months of Web server access logs
– 160 GB (compressed and in memory)
Total: 482 GB of data in memory
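As a quick plausibility check, the figures on this slide can be put through some simple arithmetic. The sketch below uses only numbers stated above. The indexing time assumes a constant rate of 50,000 documents per hour over a single pass, which the slide does not claim. The three itemized in-memory figures add up to 479 GB, and the slide does not say where the remaining 3 GB of the stated 482 GB comes from.

```python
# In-memory figures itemized on the slide, in GB.
in_memory_gb = {
    "full text and index": 280,
    "extracted entities": 39,
    "web server access logs": 160,
}
itemized_total_gb = sum(in_memory_gb.values())
stated_total_gb = 482  # total given on the slide

# Document and entity counts from the slide.
documents = 3_800_000
docs_per_hour = 50_000
entities = 2_800_000_000

# Assumption: one pass at a constant indexing rate.
indexing_hours = documents / docs_per_hour
entities_per_document = entities / documents

print(f"Itemized in-memory total: {itemized_total_gb} GB (slide states {stated_total_gb} GB)")
print(f"Single-pass indexing time: {indexing_hours:.0f} hours")
print(f"Average entities per document: {entities_per_document:.0f}")
```

At the assumed constant rate, indexing the 3.8 million documents once would take 76 hours, a little over three days, and each document yields roughly 737 extracted entities on average.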
Thank you
Contact information:
Georg Nold, Director Platform Integration, Georg.Nold@springer.com
Philipp Scholl, SAP Strategic Projects Lab, P.Scholl@sap.com
© 2013 SAP AG or an SAP affiliate company.
All rights reserved.
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG.
The information contained herein may be changed without prior notice.
Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.
National product specifications may vary.
These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or
warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group
products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing
herein should be construed as constituting an additional warranty.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in
Germany and other countries.
Please see http://www.sap.com/corporate-en/legal/copyright/index.epx#trademark for additional trademark information and notices.