03 SAP HANA Data Provisioning


TRANSCRIPT

Page 1: 03 sap hana_data_provisioning

Welcome to today’s training on data provisioning with SAP HANA. In this training, we identify the methods and underlying processes for loading and extracting data from a source system into SAP HANA.


Page 2: 03 sap hana_data_provisioning

After completing this lesson, you will be able to:

Define methods of data provisioning for SAP HANA

Explain the initial and periodic-load processes for importing data from existing data sources into SAP HANA

Describe the key technologies used for loading data into SAP HANA

Page 3: 03 sap hana_data_provisioning

To load data from a source system to SAP HANA for the first time, the initial steps involve creating a connection to the source system, and importing all metadata into the SAP HANA database.

When your client is loading metadata into the SAP HANA database for the first time, or conducting an initial data load, you will need to create a connection and the tables into which the metadata will be loaded.

Page 4: 03 sap hana_data_provisioning

In order to get data into SAP HANA, tables must be available in the SAP HANA database into which data can be loaded. The tables can be in the row store or column store format. There are three different ways to create these tables in HANA:

First, within the SAP HANA Studio, you can create a table simply by right-clicking on the table node located under the desired schema. A form lets you enter the name of the new table and configure the properties of its columns.

The next method leverages a SQL statement. To create a table in this manner, open a SQL Editor window and write a CREATE COLUMN TABLE statement describing the table you want to create. Both of these first two methods support importing metadata from both SAP Business Suite and non-SAP systems.
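For illustration, a minimal statement of the kind you might enter in the SQL Editor is shown below. The schema SALES and the columns shown are hypothetical placeholders, not part of this training's demo content:

    -- Create a simple column-store table under the SALES schema
    CREATE COLUMN TABLE "SALES"."CUSTOMER" (
        "CUSTOMER_ID" INTEGER NOT NULL,
        "NAME"        NVARCHAR(100),
        "COUNTRY"     NVARCHAR(3),
        "CREATED_AT"  TIMESTAMP,
        PRIMARY KEY ("CUSTOMER_ID")
    );

Depending on the system configuration, a plain CREATE TABLE may default to the row store; specifying COLUMN or ROW makes the choice explicit.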

The third method of table creation leverages the HANA Studio import functionality. This functionality only supports metadata imports from SAP Business Suite systems, and imports the metadata via the creation of an SAP Applications Datastore in Data Services. With this functionality, the process of creating a system connection is somewhat more complex than in the previous two methods. Once established, this connection is used by the HANA Studio to retrieve metadata from the SAP Business Suite system.


Page 5: 03 sap hana_data_provisioning

Now that the connections to the source system are established, we need to import metadata into the SAP HANA database. For improved performance when using change data capture (CDC) or auto correct load, Data Services uses a temporary staging table in HANA to load the target table. Data Services first loads the data into a staging table and then it applies the operation codes (INSERT, UPDATE, and DELETE) to update the target table.
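Conceptually, the staging step resembles the SQL sketch below. The tables ORDERS (target) and STG_ORDERS (staging) and the OPCODE column are hypothetical, and the statements that Data Services actually generates may differ:

    -- Apply INSERT and UPDATE rows from the staging table in one pass,
    -- matching rows on the target table's primary key
    UPSERT "ORDERS"
        SELECT "ORDER_ID", "STATUS", "AMOUNT"
          FROM "STG_ORDERS"
         WHERE "OPCODE" IN ('I', 'U')
    WITH PRIMARY KEY;

    -- Apply DELETE rows
    DELETE FROM "ORDERS"
     WHERE "ORDER_ID" IN (SELECT "ORDER_ID"
                            FROM "STG_ORDERS"
                           WHERE "OPCODE" = 'D');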

With the Bulk loader option selected in the target table editor, the staging mechanism is triggered if the data flow contains a Map_CDC_Operation transform, a Map_Operation transform that outputs UPDATE or DELETE rows, or a Table Comparison transform. In addition, setting the Auto correct load option in the target table editor to "Yes" also triggers the staging mechanism.


Page 6: 03 sap hana_data_provisioning

When bulk loading is enabled, you can specify the mode for loading data into the target table. The default mode is "append", which adds new records to the table; alternatively, you can truncate or delete all existing records in the table and then add the new records.

You can also set the maximum number of rows loaded into the staging and target tables before the data is committed. The default commit size is 10,000 for the column store and 1,000 for the row store.

Lastly, you can specify how the input rows are applied to the target table. If you select UPDATE, Data Services issues an UPDATE to the target table; this is the default setting for column store tables. If you select DELETE-INSERT, Data Services issues a DELETE to the target table for data that matches the old data in the staging table, then issues an INSERT with the new data; this is the default setting for row store tables. However, do not use DELETE-INSERT if the update rows do not contain data for all columns in the target table, because Data Services will replace the missing data with NULLs.
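To make the difference concrete, a DELETE-INSERT application of a staged batch might look like the following sketch, again using the hypothetical ORDERS and STG_ORDERS tables:

    -- DELETE-INSERT: first remove target rows whose keys match the staged rows
    DELETE FROM "ORDERS"
     WHERE "ORDER_ID" IN (SELECT "ORDER_ID" FROM "STG_ORDERS");

    -- ...then insert the new versions of those rows
    INSERT INTO "ORDERS" ("ORDER_ID", "STATUS", "AMOUNT")
    SELECT "ORDER_ID", "STATUS", "AMOUNT"
      FROM "STG_ORDERS";

The INSERT writes only the columns present in the staging table; any target column not supplied ends up NULL, which is why DELETE-INSERT is unsafe when update rows carry only partial data.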


Page 7: 03 sap hana_data_provisioning

Next, the selected tables must be kept in sync with the SAP HANA database.

SAP HANA offers both real-time replication and periodic-load options to move data from source systems to the SAP HANA database. Replication-based data provisioning provides near-real-time synchronization of data sets between source systems and SAP HANA. In this scenario, data is pushed from the sources to SAP HANA as soon as it is available.

On the other hand, extraction-based data provisioning, provided by SAP BusinessObjects Data Services, loads snapshots of data periodically as a batch and is triggered from the target system.

Page 8: 03 sap hana_data_provisioning

Trigger-based data replication captures database changes at a high level of abstraction in the source system. The SAP Landscape Transformation (SLT) tool provides high availability and security by utilizing reliable replication features such as auto-reconnect functions and the buffering of database changes in source systems in case of power outages. SAP Landscape Transformation also provides the flexibility to merge data from different source systems, along with reliable central monitoring of the entire replication process.

There are numerous considerations when determining the best option for extracting data out of source systems and loading that data into HANA. In the next slide we’ll take a closer look at some of these considerations.


Page 9: 03 sap hana_data_provisioning

There are numerous factors that need to be considered when determining the best option for extracting data out of source systems and into SAP HANA. For instance, are the tables "insert only", or do they also allow updates? Is it possible to use existing SAP Business Suite Extractors, just as in an SAP Business Warehouse implementation? Does the Relational Database Management System (RDBMS) have Change Data Capture (CDC) capabilities? If not, do the tables you're interested in have create and change timestamps with which you could implement your own delta extraction mechanism?

Depending on the option chosen, the complexity, supportability and maintenance cost of the implementation can vary significantly.


Page 10: 03 sap hana_data_provisioning

For those already familiar and experienced with Data Services, a direct RDBMS connection is a viable option. Extracting data directly from SAP Business Suite tables is quick and easy; however, this approach has limitations, since it doesn't work with pooled or clustered tables. Also, you might not be able to access the RDBMS directly because of the license you have with your vendor. Because of these limitations, this approach is not currently supported by SAP. While it is technically possible, the recommendation is to extract SAP Business Suite data using the application layer only and not to establish a direct connection to the underlying RDBMS.

If you determine that the stated limitations don’t apply to you and you still want to extract SAP Business Suite data directly from the RDBMS, then you should consider the following:

First, consider a full refresh, since it is easy to implement and easy to manage. It consists of deleting all records in the target table and reloading it with a full extract of all records from the source table. This method ensures that no data will be overlooked or left out due to technical or programming errors; however, this implementation is not recommended for tables with a large number of rows.
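Assuming, hypothetically, that the extracted data has already been staged in SAP HANA (in practice Data Services performs the extract and reload for you), a full refresh reduces to two statements:

    -- Full refresh: empty the target, then reload every record
    TRUNCATE TABLE "SALES"."CUSTOMER";

    INSERT INTO "SALES"."CUSTOMER"
    SELECT * FROM "STG_CUSTOMER_FULL";

The simplicity is the appeal; the cost is rereading and rewriting every row on every run, which is why this approach does not scale to large tables.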

Another option is source-based Change Data Capture (CDC), sometimes called incremental extraction. Source-based CDC implementations extract only the changed rows from the source. This method is preferred because it improves performance by extracting the fewest rows.

RDBMS-based CDC is a specific capability offered by some RDBMS vendors. With CDC scenarios, the RDBMS takes the changed data and places it in internally replicated tables. SAP BusinessObjects Data Services can then use these records of changes to load the deltas. For extraction from the SAP Business Suite RDBMS, SAP BusinessObjects Data Services offers access to source-based CDC data from Oracle and Microsoft. Oracle's CDC packages are used to create and manage CDC tables. These packages make use of a "publish and subscribe" model. You can create a CDC datastore for Oracle sources using the Data Services Designer. Microsoft SQL Replication Server is used to capture changed data from Microsoft SQL Server databases.

If you’re using an RDBMS other than Microsoft SQL Server or Oracle for your SAP Business Suite system, you can consider using timestamp-based CDC. There are many possible implementations of a timestamp-based CDC solution, depending on what types of timestamps are available in the source tables.
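At its core, a timestamp-based delta extraction is just a filtered read of the source table. In the sketch below, the table and column names are hypothetical, and :last_run_ts stands for a watermark value your job stores between runs:

    -- Pick up rows created or changed since the previous extraction run
    SELECT *
      FROM "SRC"."SALES_ORDERS"
     WHERE "CREATED_AT" > :last_run_ts
        OR "CHANGED_AT" > :last_run_ts;

After a successful load, the job would advance the stored watermark to the timestamp at which this extraction began, so that no interval is skipped or read twice.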


Page 11: 03 sap hana_data_provisioning

Target-based CDC extracts all of the data from the source system and compares it against the target system. The result of the comparison is a set of rows flagged as Insert, Update, or Delete. It is implemented in SAP BusinessObjects Data Services using the Table Comparison transform.
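Conceptually, the comparison the Table Comparison transform performs resembles the three queries below, run against hypothetical source and target copies SRC_ORDERS and TGT_ORDERS:

    -- Rows in the source but not the target: flagged as Insert
    SELECT s.*
      FROM "SRC_ORDERS" s
      LEFT JOIN "TGT_ORDERS" t ON t."ORDER_ID" = s."ORDER_ID"
     WHERE t."ORDER_ID" IS NULL;

    -- Rows in both, but with different values: flagged as Update
    SELECT s.*
      FROM "SRC_ORDERS" s
      JOIN "TGT_ORDERS" t ON t."ORDER_ID" = s."ORDER_ID"
     WHERE s."STATUS" <> t."STATUS" OR s."AMOUNT" <> t."AMOUNT";

    -- Rows in the target but not the source: flagged as Delete
    SELECT t.*
      FROM "TGT_ORDERS" t
      LEFT JOIN "SRC_ORDERS" s ON s."ORDER_ID" = t."ORDER_ID"
     WHERE s."ORDER_ID" IS NULL;

Because every source row must be read and compared on each run, this option trades simplicity for load on both systems, which is why source-based CDC is preferred where it is available.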

Lastly, extracting data via the ABAP Application Layer is the recommended and supported approach for SAP Business Suite tables. This method enables data extraction via tables or SAP Business Suite Content Extractors.


Page 12: 03 sap hana_data_provisioning

To allow you to familiarize yourself with some of the concepts covered in this training, we have made available a number of offline demonstrations. The scenarios listed on this slide relate to tasks you would perform in the SAP HANA modeling studio. Use these demonstration scenarios to gain a greater understanding of some of the technical aspects of SAP HANA. You should start with the Modeling scenario and only move on to the other scenarios when you are ready.

Note that these demonstrations require the iRise Reader to be installed on your system. You can download the iRise Reader for free from www.irise.com. All that is required for this download is an email address.

The demonstrations launch in a web browser. Expand the view to full screen in the browser using the F11 key to avoid any issues with small areas being cut off at the side of the demo.


Page 13: 03 sap hana_data_provisioning

Lastly, it is important to note that these demonstrations do not include a great deal of scripting, so you must make sure to use the ‘guides’ to get to know the flow of each scenario. The clickable areas in the demo have a small black guide icon next to them. Hover your mouse over a guide to see instructions that will help you proceed through the scenario. The guide icons are only available when the Guides option in the iRise Reader toolbar is On.

After you have installed the iRise Reader, come back to this page and simply click on the graphic to access all of the SAP HANA iRise demo content in the Partner Demo Library.


Page 14: 03 sap hana_data_provisioning

Correct!

In order to get data into HANA, there must be tables available in the SAP HANA database where data can be loaded. There are three different ways to create these tables in HANA before execution of the Data Services Job: right-clicking on the table node, SQL Statement, and HANA Studio import functionality.

Sorry, that is incorrect. The correct answers are:

Right-click on the table node

SQL Statement

HANA Studio Import

In order to get data into HANA, there must be tables available in the SAP HANA database where data can be loaded. There are three different ways to create these tables in HANA before execution of the Data Services Job: right-clicking on the table node, SQL Statement, and HANA Studio import functionality.


Page 15: 03 sap hana_data_provisioning

Correct!

Extracting data from SAP Business Suite tables via the ABAP Application Layer is the recommended and supported approach. This method enables data extraction via tables or SAP Business Suite Content Extractors.

Sorry, that is incorrect. The correct answer is:

Via the ABAP Application Layer

Extracting data from SAP Business Suite tables via the ABAP Application Layer is the recommended and supported approach. This method enables data extraction via tables or SAP Business Suite Content Extractors.


Page 16: 03 sap hana_data_provisioning

You should now be able to:

Define the methods of data provisioning for SAP HANA

Explain the initial and periodic-load processes for importing data from existing data sources into SAP HANA

Describe the key technologies used for loading data into SAP HANA

Page 17: 03 sap hana_data_provisioning

For more information on topics discussed in this course, see the references listed here.

Page 18: 03 sap hana_data_provisioning

Thank you for completing this training.
