0409 - data provisioning - slt for hana 1.0 - large cluster table loads

Contents

Introduction
Process Summary Checklist
What to know and do before you start
Source System Preparation
Table Data Balancing
Type 4 Table Load Configuration
How do we change the number of concurrent access plans?
How to monitor progress of the access plan execution
Cluster Table Parallel Load Count
Problem Resolution
Duplicate Records

Authors: Guenter Weber, Mihajlo Mitevski, Greg Monaco
Version: 1.5
Date: March 23, 2012
Contact: [email protected]

HANA SLT – Reading Type 4 Replication Techniques

Introduction

The use of reading type 4 to load a large cluster table attempts to address two key challenges:

- Load duration
- Database resource exhaustion

Initial attempts to use reading type 5 for the initial table load ran into problems with the Oracle-based source ECC system: because reading type 5 executes a full table scan, Oracle error ORA-01555 “snapshot too old” would occur, often hours or days into the table load. Using reading type 4, coupled with the approaches suggested in this document, enables the parallelization of both the access plan calculation and the data load for a cluster table, which should greatly reduce the initial load duration.

Process Summary Checklist

- ECC: Create tablespace for DMC_INDXCL
- ECC: DMC_CREATE_CLUSTER
- ALL: Map optimal balancing of table data for multi-threading
- SLT: Configure table for Type-4 loading
- SLT: Configure table filter (optional)
- ALL: Optimize resource utilization during access plan calculation and data load

What to know and do before you start

When you are running an initial load which utilizes a reading type of 4, no other initial loads may run. This can be a bit of a surprise if you try to load some smaller tables and the loads just never start. So keep this in mind and arrange your loads accordingly. Maybe you will want to load all of the smaller tables first so that the modeling and reporting teams can have some data to work with while waiting, maybe for days, for the load of the larger cluster table(s).

Double-check that the trigger log for your cluster table is empty. If you have run multiple load tests, the trigger log may contain records that are no longer needed. Make sure that this log is empty before starting your load.

Oracle and ASSM issues: Oracle 10.2.0.4 has issues with LOB tables that cause inserts into table DMC_INDXCL to hang. See Note 1166242 for the workaround:

alter table SAPR3.DMC_INDXCL modify lob (CLUSTD) (pctversion 10);

Configure, if applicable, filtering that will be used for replication.

For updates of cluster table records, the record must first be deleted in HANA and then inserted. You may see ‘delete’ statements in HANA and wonder why. This processing is unique to cluster tables.

Consider changing the mergedog parameters on HANA. Once a large table has been loading for a while, the default values will result in almost constant merging. This may lead to replication disconnections and/or slower overall loading and replication. The recommended approach is:

Adjust the auto_merge_decision_func. For example, “DMS>1000 or (TMD>3601 and DRC > 1000000) or DCC>800 or DLS>5000” will merge the delta if there are more than 1,000,000 records in it.

Source System Preparation

In addition to the published requirements for an ECC source system in an SLT landscape, the following tasks must be completed to utilize reading type 4:

- Create a database container (i.e., a tablespace) to contain table DMC_INDXCL (optional)
- Create table DMC_INDXCL using program DMC_CREATE_CLUSTER

Always check whether table DMC_INDXCL already exists on the ECC system. If it does, this may indicate that there is an ongoing TDMS project. In that case, check with the TDMS project team to make sure that you do not impact each other’s projects.

Table DMC_INDXCL is loaded with the source system table data. DMC_INDXCL then serves as the source table for the data load via SLT to HANA.

Table DMC_INDXCL can be created in its own tablespace, or you can allow the DMC_CREATE_CLUSTER program to determine the default tablespace for this table. A separate tablespace that does not contain productive tables seems like the safer approach.

Sizing the DMC_INDXCL table and its tablespace is, like any sizing exercise, based on a good guess. A good starting point is to assume a compression rate of 10:1, with the caveat that some tables, such as cluster tables, may see hardly any compression.

Note that the tablespace assignment for table DMC_INDXCL needs specific consideration if Oracle or DB6 is the database of the source system. For other database vendors, you need to ensure that enough space is available in the source system.
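As a rough illustration of the 10:1 starting point mentioned above, the following Python sketch brackets the space DMC_INDXCL might need. It assumes that the ratio applies to the size of the source table data and that "hardly any compression" means a ratio close to 1:1. The function names and the 500 GB figure are hypothetical and only serve the example.

```python
def estimate_indxcl_size_gb(source_table_gb: float, compression_ratio: float = 10.0) -> float:
    """Estimate DMC_INDXCL size, assuming the ratio applies to the source table size."""
    if source_table_gb < 0:
        raise ValueError("source table size must not be negative")
    if compression_ratio < 1:
        raise ValueError("compression ratio must be at least 1 (1 means no compression)")
    return source_table_gb / compression_ratio


def sizing_range_gb(source_table_gb: float) -> tuple:
    """Return (optimistic, pessimistic) estimates: 10:1 compression versus none at all."""
    optimistic = estimate_indxcl_size_gb(source_table_gb, 10.0)
    pessimistic = estimate_indxcl_size_gb(source_table_gb, 1.0)
    return optimistic, pessimistic


# Hypothetical example: a 500 GB source table.
low, high = sizing_range_gb(500.0)
print(f"Plan for between {low:.0f} GB and {high:.0f} GB")
```

Planning the tablespace toward the pessimistic end is the more cautious choice for cluster tables, since the text above warns that they may barely compress.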

Note that after the successful initial load, it is perfectly acceptable to remove this cluster and this tablespace to reclaim space. It is not mandatory that you do so but if the space is needed, be confident that there will be no adverse impact on the ongoing operation of SLT.

Table Data Balancing

The objective of this effort is to understand how best to segregate and balance the table data across multiple threads. Those threads should greatly reduce processing time for both the access plan calculation and the table data load into HANA. For our recent project, the objective was to load cluster table BSEG data into HANA for fiscal years 2010 to current. We also knew that the most active document numbers were in company codes “L001”, “201”, and “202-205”. With this information available, we separated our data access into the following 15 logical threads:

1. ( BUKRS BETWEEN 'A ' AND 'L000' ) AND GJAHR = 2010 AND MANDT = '100'
2. BUKRS = 'L001' AND GJAHR = 2010 AND MANDT = '100'
3. ( BUKRS BETWEEN 'L002' AND 'ZZZZ' ) AND GJAHR = 2010 AND MANDT = '100'
4. ( BUKRS BETWEEN ' ' AND '200' ) AND GJAHR = 2010 AND MANDT = '100'
5. BUKRS = '201' AND GJAHR = 2010 AND MANDT = '100'
6. ( BUKRS BETWEEN '202' AND '205' ) AND GJAHR = 2010 AND MANDT = '100'
7. ( BUKRS BETWEEN '206' AND '9999' ) AND GJAHR = 2010 AND MANDT = '100'
8. ( BUKRS BETWEEN 'A ' AND 'L000' ) AND GJAHR = 2011 AND MANDT = '100'
9. BUKRS = 'L001' AND GJAHR = 2011 AND MANDT = '100'
10. ( BUKRS BETWEEN 'L002' AND 'ZZZZ' ) AND GJAHR = 2011 AND MANDT = '100'
11. ( BUKRS BETWEEN ' ' AND '200' ) AND GJAHR = 2011 AND MANDT = '100'
12. BUKRS = '201' AND GJAHR = 2011 AND MANDT = '100'
13. ( BUKRS BETWEEN '202' AND '205' ) AND GJAHR = 2011 AND MANDT = '100'
14. ( BUKRS BETWEEN '206' AND '9999' ) AND GJAHR = 2011 AND MANDT = '100'
15. GJAHR = 2012 AND MANDT = '100'

Only through testing can we determine whether these criteria are optimal, but they are a good starting point.

Concerning the performance of an access plan calculation with such separations, it is important to keep in mind that with reading type 4, unlike reading type 5, we want the database to use index access to the cluster table. The database system can only do this efficiently if the selection criteria include the first key fields of the physical cluster table. In our example of BSEG, we need to specify selection criteria for the first key fields of RFBLG, which are MANDT and BUKRS. Here we specify a single value for the client field MANDT and either single values or ranges for the next field BUKRS (the company code).

If the customer had all their data in one client and company code, you would use the next field BELNR (the document number) for the selection, so the selections would be as follows:

First access plan: MANDT = <only relevant client in system> AND BUKRS = <only company code> AND BELNR < '1111111111'
Second access plan: MANDT = <only relevant client in system> AND BUKRS = <only company code> AND BELNR BETWEEN '1111111111' AND '2222222222'

...and so on.

So it is critical that your selection criteria allow an efficient selection on the primary index of the respective table, ideally with single values for the first one or two key fields and a range for the subsequent one. In this example, it is fine to further subdivide by GJAHR, but GJAHR alone, without the preceding key fields as selection criteria, would lead the DB optimizer to choose a full table scan, which would be equivalent to reading type 5.
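Typing fifteen selections by hand invites typos, so it can help to generate them. The Python sketch below rebuilds the list used in this example from the company-code split and the fiscal years. It is only an illustration: the function names are hypothetical, and the client, company codes and years are the ones from this project, not general recommendations.

```python
from typing import List, Sequence, Tuple, Union

# A company-code selector is a single value or an inclusive (low, high) range.
Selector = Union[str, Tuple[str, str]]


def bukrs_clause(selector: Selector) -> str:
    """Render one BUKRS condition in the style used in this document."""
    if isinstance(selector, tuple):
        low, high = selector
        return f"( BUKRS BETWEEN '{low}' AND '{high}' )"
    return f"BUKRS = '{selector}'"


def build_selections(client: str,
                     split_years: Sequence[int],
                     whole_years: Sequence[int],
                     selectors: Sequence[Selector]) -> List[str]:
    """One WHERE clause per (year, company-code selector), plus one per whole year."""
    clauses = []
    for year in split_years:
        for selector in selectors:
            clauses.append(
                f"{bukrs_clause(selector)} AND GJAHR = {year} AND MANDT = '{client}'"
            )
    for year in whole_years:
        # Years with little data are loaded as a single thread.
        clauses.append(f"GJAHR = {year} AND MANDT = '{client}'")
    return clauses


# Company-code split from the example in this section.
SELECTORS = [("A ", "L000"), "L001", ("L002", "ZZZZ"),
             (" ", "200"), "201", ("202", "205"), ("206", "9999")]

selections = build_selections("100", [2010, 2011], [2012], SELECTORS)
for number, clause in enumerate(selections, start=1):
    print(f"{number}. {clause}")
```

Keeping the split in one list makes it easy to rebalance after a test run: change a range boundary, regenerate, and compare the resulting thread count with the capacity you plan to allocate.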

Type 4 Table Load Configuration

The following steps demonstrate the process for configuring table BSEG within SLT for reading type 4.

1. Verify that Note 1671800 - HANA initial load for cluster table: reading type 4 or 5 is applied to the SLT system.

2. Verify that Note 1690485 - Parallelization of HANA data replication doesn't work is applied to the SLT system.

3. Verify that Note 1699822 - HANA Initial Load caused duplicates for parallelized objects is applied to the SLT system.

4. Decrease the number of total jobs and load jobs to 0 from the web interface, then verify in SM37 that all data load jobs have finished.

5. In HANA, add BSEG from data provisioning for replication.

6. Now is the perfect time to partition, if applicable, your cluster table in HANA.

7. Monitor the status of the load in MWBMON->Relevant Tables. Ensure that the flags Def, Gen and Calc are set to ‘X’.

8. Stop the monitor job via SM37->Cancel Active Job. Also, verify that no IUUC* and DMC* jobs are running. To be safe, check back a few months and for all users.

9. Modify the table entry for BSEG in table DMC_MT_TABLES via SE16:

Blocksize - 16000000

TABLE_MAX_PARALL - This value represents both the total concurrent number of access plan batch jobs on the source server (MWB*) and the total concurrent number of parallel load processes on SLT during the load phase. You can change this value later, so you may want to start with a low value and increase it as capacity allows. In the example below, I set it to 15, but I could as easily have set it to ‘1’ and then increased the value as capacity permits. I will cover this in a later section.

Copy the content of field COBJ_GUID for use in a subsequent step.

Record COBJ_GUID here: uL4eKnp4W}6S2uIFQOpTWm

10. Create a new record in table DMC_PREC_HDR with the following entries:

OWNER = COBJ_GUID from DMC_MT_TABLES <- uL4eKnp4W}6S2uIFQOpTWm
ID = 1 (Note: this does not represent the MT_ID value. It is just a ‘1’)
GUID = any arbitrary content, just don’t leave it blank
DESCR = any value
is_calculated = 3
segment_size = 1
blocksize = 16000000
search_depth = 1

Record GUID here: 12345678

11. Create 15 records in table DMC_ACS_PLAN_HDR, with the following content:

owner = value of the GUID field from DMC_PREC_HDR <- 12345678
ID = consecutive numbers from 1 to 15
GUID = arbitrary content, don’t leave it blank, and use a unique value for each of these 15 records
is_calculated = '-'
blocksize = 16000000
prec_state = 'D'
failed = '-'
loaded = '-'
in_process = '-'

Below is an example of the contents of table DMC_ACS_PLAN_HDR after configuration for each of the 15 records. The picture only shows 12 of the 15 entries. You’ll have to trust me that the other 3 are there.

12. Create a record in table DMC_SELSTRING for each of the 15 records created in DMC_ACS_PLAN_HDR:

GUID = respective value from DMC_ACS_PLAN_HDR
line_no = 1
LINE = corresponding WHERE clause
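The records from steps 10 to 12 reference each other through their OWNER and GUID fields, and a mismatch is easy to make when keying them in by hand. The Python sketch below prepares the rows as plain dictionaries and checks the cross-references before you enter them in SE16. It does not talk to SAP in any way. The function names are hypothetical, and the 22-character generated GUID is an assumption that mirrors the length of the COBJ_GUID example, not a documented limit.

```python
import uuid

BLOCKSIZE = 16000000  # value used throughout this walkthrough


def new_guid() -> str:
    # Assumption: any non-blank, unique string is acceptable as a GUID here.
    return uuid.uuid4().hex[:22].upper()


def build_config_rows(cobj_guid: str, where_clauses: list):
    """Prepare rows for DMC_PREC_HDR, DMC_ACS_PLAN_HDR and DMC_SELSTRING."""
    prec_hdr = {
        "OWNER": cobj_guid, "ID": 1, "GUID": new_guid(), "DESCR": "type 4 load",
        "IS_CALCULATED": "3", "SEGMENT_SIZE": 1, "BLOCKSIZE": BLOCKSIZE,
        "SEARCH_DEPTH": 1,
    }
    plan_rows, sel_rows = [], []
    for plan_id, clause in enumerate(where_clauses, start=1):
        guid = new_guid()
        plan_rows.append({
            "OWNER": prec_hdr["GUID"], "ID": plan_id, "GUID": guid,
            "IS_CALCULATED": "-", "BLOCKSIZE": BLOCKSIZE, "PREC_STATE": "D",
            "FAILED": "-", "LOADED": "-", "IN_PROCESS": "-",
        })
        # One selection string per access plan, linked through the plan GUID.
        sel_rows.append({"GUID": guid, "LINE_NO": 1, "LINE": clause})
    return prec_hdr, plan_rows, sel_rows


def find_problems(prec_hdr: dict, plan_rows: list, sel_rows: list) -> list:
    """Return human-readable problems; an empty list means the rows are consistent."""
    problems = []
    guids = [row["GUID"] for row in plan_rows]
    if len(set(guids)) != len(guids):
        problems.append("access plan GUIDs are not unique")
    if any(not guid.strip() for guid in guids + [prec_hdr["GUID"]]):
        problems.append("a GUID is blank")
    if any(row["OWNER"] != prec_hdr["GUID"] for row in plan_rows):
        problems.append("an access plan does not point to the DMC_PREC_HDR GUID")
    if sorted(row["GUID"] for row in sel_rows) != sorted(guids):
        problems.append("selection strings and access plans do not match 1:1")
    return problems
```

For example, `build_config_rows("uL4eKnp4W}6S2uIFQOpTWm", clauses)` with the 15 WHERE clauses yields one header row, 15 access plan rows and 15 selection rows, and `find_problems(...)` should return an empty list before anything is entered into the system.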

13. In MWBMON->Relevant Tables, change the reading type to 4 and set the size category to ‘A’.

After saving the changes, you will be told that you have to regenerate the migration object. You can then regenerate the migration object for BSEG in the “STEPS” tab of MWBMON.

I was concerned that if I left Parallel Load Progr set to 15 from my changes to DMC_MT_TABLES in the prior steps, 15 MWB* jobs would start concurrently on the source system against BSEG and swamp the system. So I limited the number to 7 as part of this test. It is also at this point that I set Size Category to ‘A’. Select ‘Save’ and review the status for BSEG in MWBMON.

14. Edit the record for BSEG in DMC_MT_TABLES

Set Calculated to ‘P’
Set Access Plan Type to ‘PRE’

15. Regenerate the BSEG runtime object

You will get a number of messages across the bottom of the screen.

16. Recalculate the Access Plan

Notice that I set the number of jobs to 10, purely for demonstration. This value represents the number of ACC_PLAN_CALC* batch jobs which will be started on the SLT server. I set this value to 10 and the TABLE_MAX_PARALL field in DMC_MT_TABLES, which represents the number of MWB* batch jobs on the source ECC system, to 7 to demonstrate, later in this example, the relationship between the two values.

Note: There is an ‘X’ in the Err column for this screenshot. This was due to some residual configuration from prior tests. Please ignore.

We scheduled 10 jobs, but 3 ended quickly and now 7 are running. This is because there is a 1:1 relationship between the ACC_PLAN_CALC* jobs on the SLT server and the MWBACPLCALC* batch jobs running on the source ECC system, and the latter are limited to 7 by TABLE_MAX_PARALL. From the source ECC system:

Now the access plan jobs are running. When all 15 jobs complete, the data load will start. Simple, right?

You may now start the SLT monitor job via the LTR transaction and configure the total and initial load job count. Take care not to over-allocate batch work processes.

For type 4, when LOAD finishes, be VERY patient waiting for Replication to start. Walk away for an hour. Do not panic if it does not switch to Replication right away.

How do we change the number of concurrent access plans?

You have started your access plan execution. The MWB* job is running on the source ECC system and the ACC* job is running on the SLT server. You would like to run more than one job and get this load over with so you can go home. For our example, we will now run 3 access plan jobs concurrently.

1. On the SLT Server: Modify field TABLE_MAX_PARALL in table DMC_MT_TABLES for your table (BSEG, in our example) and set the value to the number of concurrent jobs which you would like to run.

2. On the SLT Server: MWBMON->

How to use the screen above to manage job count:

We have 1 job already running. Setting “Number of jobs to be scheduled” to ‘3’ starts 3 additional ACC* jobs.

If we also select the check box for “Restart”, we are saying that we want a total of ‘3’ jobs to run and so 2 additional ACC* jobs would start.

Assuming that we have 8 concurrent jobs and we want to reduce the count to 6: Specify 6 in the “Number of jobs to be scheduled” field, select the check box for “Restart”. As access plan jobs complete, new jobs will not be started and ultimately, only 6 jobs will run concurrently.
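The job-count rules above are easy to mix up, so here is a small Python sketch of them as described in this document. It is a model of the described behaviour, not of SLT itself, and the function names are hypothetical. The second function reflects the earlier observation that 10 scheduled jobs settled at 7 running when TABLE_MAX_PARALL was 7.

```python
def jobs_to_start(currently_running: int, number_to_schedule: int, restart: bool) -> int:
    """Additional ACC* jobs started by one scheduling action, per the rules above."""
    if currently_running < 0 or number_to_schedule < 0:
        raise ValueError("job counts must not be negative")
    if restart:
        # With "Restart", the number is the desired total, so only the shortfall starts.
        # If more jobs are running than requested, none start and the count drains down.
        return max(number_to_schedule - currently_running, 0)
    # Without "Restart", the number is added on top of what is already running.
    return number_to_schedule


def jobs_that_keep_running(scheduled: int, table_max_parall: int) -> int:
    """Jobs that stay active once TABLE_MAX_PARALL caps the source-side MWB* jobs."""
    if scheduled < 0 or table_max_parall < 0:
        raise ValueError("job counts must not be negative")
    return min(scheduled, table_max_parall)


print(jobs_to_start(1, 3, restart=False))  # one job running, schedule 3 more
print(jobs_to_start(1, 3, restart=True))   # one job running, want 3 in total
print(jobs_to_start(8, 6, restart=True))   # reducing from 8 to 6 starts nothing new
print(jobs_that_keep_running(10, 7))       # the 10-versus-7 demonstration
```

Working through the numbers before touching the screen helps avoid accidentally doubling the load on the source system.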

How to monitor progress of the access plan execution

Each MWB* batch job in the source ECC system should, depending on record count, produce a job log that shows progress in 100,000 record increments. Review the job log via transaction SM37. Of course, an access plan execution may return only a few records, in which case no record count will appear in the job log.

You may also review the status of your access plan execution via table DMC_ACS_PLAN_HDR. It is a good idea to review and become familiar with this table.

As for overall system monitoring, standard Basis transactions coupled with MWBMON->Application Logs should help you ensure that your source and SLT systems are functioning as required.

Make sure that a backup was completed and the log directory cleaned up (alter system reclaim log) before starting the load. Monitor all capacities during the load. We had the log directory surprise us and fill up during our load of BSEG and can confirm that this leads to high levels of panic and sadness.
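If you export the DMC_ACS_PLAN_HDR entries for your table (for example from SE16), a few lines of Python can turn them into a progress summary. This sketch assumes the rows are available as dictionaries with an IS_CALCULATED key and uses the values described in the Problem Resolution section: ‘X’ for calculated, ‘S’ for in progress, and ‘-’ as the initial value. The function name and the sample rows are hypothetical.

```python
from collections import Counter

# Meaning of IS_CALCULATED as described in this document.
STATE_LABELS = {"X": "calculated", "S": "in progress", "-": "pending"}


def summarize_calculation(rows: list) -> dict:
    """Count access plans per calculation state; unexpected values are 'unknown'."""
    counts = Counter(STATE_LABELS.get(row["IS_CALCULATED"], "unknown") for row in rows)
    labels = list(STATE_LABELS.values()) + ["unknown"]
    return {label: counts.get(label, 0) for label in labels}


# Hypothetical snapshot of five access plans.
snapshot = [
    {"ID": 1, "IS_CALCULATED": "X"},
    {"ID": 2, "IS_CALCULATED": "X"},
    {"ID": 3, "IS_CALCULATED": "S"},
    {"ID": 4, "IS_CALCULATED": "-"},
    {"ID": 5, "IS_CALCULATED": "-"},
]

summary = summarize_calculation(snapshot)
done = summary["calculated"]
print(f"{done} of {len(snapshot)} access plans calculated: {summary}")
```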

Cluster Table Parallel Load Count

To control the total number of concurrent load jobs for a cluster table, the Initial Load value found in the HANA LTR Web Dynpro is not used. Instead, the Total Job count sets the number of concurrent cluster table load processes. You may therefore control this value by changing the Total Job count in the Web Dynpro or by modifying the value in field TABLE_MAX_PARALL in table DMC_MT_TABLES. It is best to use TABLE_MAX_PARALL to control this operation. Make sure that you have enough total jobs to manage the replication for your other tables. This is an odd bug that will likely be fixed in the next release.

Problem Resolution

The phase in which the failure occurs, access plan calculation or data load, dictates how to respond to a system failure.

During the Access Plan Calculation -

1. Stop the SLT batch jobs.

2. Identify, using standard SAP Basis skills and transactions, the source of the failure. The assumption is that the access plan batch jobs have failed. Common issues include a full file system or tablespace due to DMC_INDXCL writes, a shutdown of the source system due to poor communication, or one of a million other possible issues. Identify and resolve the issue.

3. Review table DMC_ACS_PLAN_HDR on the SLT server. In column ‘IS_CALCULATED’, an ‘X’ means that the access plan calculation has completed successfully. If the value is ‘S’, the access plan was in progress at the time of the system failure. Reset the value ‘S’ to ‘-’ for each corresponding record.

4. Reset the ‘FAILED’ flag in DMC_MT_TABLES for your table to ‘-’. You can do this by editing the table or by using ‘Reset Mass Transfer Indicators’ in the Expert Functions of transaction IUUC_SYNC_MON on the SLT server.

5. Reset the mass transfer indicators.

6. Restart the batch jobs.

The access plans for the failed jobs will restart from the beginning. They do not resume from the point of failure.
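Step 3 above can be prepared offline before you edit anything. The Python sketch below takes exported DMC_ACS_PLAN_HDR rows as dictionaries (an assumption about how you obtain the data) and lists the records whose IS_CALCULATED value must be reset from ‘S’ to ‘-’. It only produces a to-do list and changes nothing in the system. The function name and sample rows are hypothetical.

```python
def plan_calculation_resets(rows: list) -> list:
    """Return (ID, GUID) of access plans that were mid-calculation at failure time."""
    return [(row["ID"], row["GUID"]) for row in rows if row["IS_CALCULATED"] == "S"]


# Hypothetical export taken after a failure during access plan calculation.
exported = [
    {"ID": 1, "GUID": "PLAN-01", "IS_CALCULATED": "X"},  # finished, leave alone
    {"ID": 2, "GUID": "PLAN-02", "IS_CALCULATED": "S"},  # interrupted, reset to '-'
    {"ID": 3, "GUID": "PLAN-03", "IS_CALCULATED": "S"},  # interrupted, reset to '-'
    {"ID": 4, "GUID": "PLAN-04", "IS_CALCULATED": "-"},  # not started yet
]

for plan_id, guid in plan_calculation_resets(exported):
    print(f"Access plan {plan_id} ({guid}): set IS_CALCULATED from 'S' to '-'")
```

Keeping this list also tells you which access plans will be recalculated from the beginning once the batch jobs are restarted.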

During the Data Load -

1. Identify, using standard SAP Basis skills and transactions, the source of the failure. The assumption is that the load batch jobs have been interrupted and/or have failed.

2. Halt the SLT batch jobs. Reset the MT indicators.

3. Review table DMC_ACS_PLAN_HDR on the SLT server. In column ‘FAILED’, an ‘X’ means that the load has failed. Reset this value to ‘-’ for each applicable record.

4. Restart the SLT batch jobs.

5. Review the load in MWBMON->Runtime Information->1->Enter. Your load may have a few error or duplicate records. If so, the error records are usually processed by the load. For duplicate records, open a ticket with BC-HAN-LTR because I am too lazy to document this process right now.

Duplicate Records

Often when a load is restarted, you may notice two new conditions in MWBMON->Runtime Information:

1. Many errors in the column ‘Err’

2. An increase, probably x2, of the count in the ‘Run’ column.

How to address the duplicate records:

1. Let the load continue until it has completed. It will have an error state as reported in MWBMON->Relevant Tables.

2. Halt SLT batch jobs.

3. Reset mass transfer indicators in IUUC_SYNC_MON->Expert Functions.

4. Edit the table in MWBMON:

5. Change the Transfer Behaviour from ‘1’ to ‘3’ and SAVE the change.

6. Start batch jobs.

7. Review status in MWBMON->Relevant Tables.

8. Review status in MWBMON->Runtime Information. The count in the Err column should be decreasing.

How to explain the increase in the ‘Run’ column:

Access plans were being processed at the time of the server or processing halt, and so some portions were still in status “IN_PROCESS”. When the batch jobs are restarted, a new group of processes is added while the existing assigned processes are still ‘tied’ to the access plans in the “IN_PROCESS” state. The key point is that you are not really running double the work processes. It is more a factor of the state of the access plans. Once all of those other “IN_PROCESS” access plans are actually processed, the number will decrease. Short answer? Don’t panic.
