
Efficient Transaction Processing in SAP HANA Database – The End of a Column Store Myth

Ravi Raj Kadam (2709227)

ABSTRACT

• SAP’s (Systems, Applications and Products) new data management platform.

• Provide a generic but powerful system for different query scenarios-both transactional and analytical. (Scalable execution)

• General architecture- design criteria and Initial Myth about usage Of Column and Row Store Database

• Concept of record life cycle management to use different storage formats for different stages of the record.

• SAP HANA database abilities to efficiently work in analytical as well as transactional workload environment

Row vs. Column Store:
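The slide at this point showed a figure contrasting the two layouts. As a minimal, hypothetical sketch in C++ (not HANA code), the same table can be stored once as an array of row structs and once as one array per column; an OLAP-style aggregate then only scans the single column it touches, while the row layout keeps whole records contiguous for OLTP-style point access:

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// One logical table Orders(id, customer, amount), stored two ways.
struct OrderRow { std::int64_t id; std::int64_t customer; double amount; };

// Row store: each record is contiguous; good for point access to whole records.
using RowStore = std::vector<OrderRow>;

// Column store: each attribute is contiguous; good for scans/aggregates on few columns.
struct ColumnStore {
    std::vector<std::int64_t> id;
    std::vector<std::int64_t> customer;
    std::vector<double> amount;
};

int main() {
    RowStore rows = {{1, 42, 9.5}, {2, 7, 120.0}, {3, 42, 33.3}};

    ColumnStore cols;
    for (const auto& r : rows) {           // load the same data column-wise
        cols.id.push_back(r.id);
        cols.customer.push_back(r.customer);
        cols.amount.push_back(r.amount);
    }

    // OLAP-style aggregate: SUM(amount) only reads the 'amount' column.
    double sum = 0.0;
    for (double a : cols.amount) sum += a;
    std::cout << "sum(amount) = " << sum << "\n";

    // OLTP-style point access: fetch the whole record with id == 2 from the row store.
    for (const auto& r : rows)
        if (r.id == 2)
            std::cout << "order 2: customer " << r.customer
                      << ", amount " << r.amount << "\n";
}
```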

INTRODUCTION

• Challenges of data management in the modern business software industry.

• Classic Enterprise Resource Planning (ERP) and Online Transaction Processing (OLTP).

• SAP aims to be efficient, flexible, robust, and cost-efficient in the data management (DM) layer across different application scenarios.

• The OLTP workload of ERP systems typically involves thousands of concurrent users and transactions issuing very selective point queries.

Observation of the Current Situation

Usage Perspective: Users increasingly want to interact with the database directly. Besides the application layer, scripting languages are a main mechanism for specific application domains, e.g. R for statistical programming, Pig for working on Hadoop, and SAP FOX for planning scenarios.

Cost Awareness: There is a clear demand for a lower Total Cost of Ownership (TCO) for the complete data management stack, from hardware and setup costs to operational and maintenance costs.

Performance: Performance is the main reason to use specialized systems.

• The challenge is to provide a flexible solution with the ability to use specialized operators or data structures whenever they are needed.

Features Of SAP HANA DB

• It is a combination of hardware and software designed to process massive amounts of data in real time using in-memory computing.

• It combines row-based and column-based database technology.

• The primary copy of the data resides in main memory (RAM) rather than on disk; disk is used for persistence.

• It is best suited for performing real-time analytics and for developing and deploying real-time applications.

• Complex calculations on data are not carried out in the application layer but are pushed down into the database.

Contribution and Outline

• The HANA DB comprises a multi-engine query processing environment.

• It supports different degrees of structure, from well-structured relational data over irregularly structured data graphs to unstructured text.

• It supports transaction-level and statement-level snapshot isolation.

• It represents application-specific business objects (e.g. OLAP cubes) and logic (domain-specific function libraries) directly inside the database engine.

Contribution and Outline

• The HANA DB is optimized for efficient communication between the data management (DM) layer and the application layer (AL).

• For example, it supports the data types of the SAP application server and its scripting languages, while internally keeping a highly optimized column-oriented data representation. This is achieved by a multi-step record life-cycle management approach.

• Transactional processing uses multi-version concurrency control (MVCC) to implement transaction-level and statement-level snapshot isolation.
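The slides do not spell out how MVCC yields the two isolation flavors; the following is a minimal, hypothetical sketch (not HANA's implementation): every write creates a new row version tagged with the commit timestamp of the writing transaction, and a reader sees only versions committed no later than its snapshot timestamp. Transaction-level snapshot isolation takes that timestamp once per transaction, statement-level isolation takes it per statement:

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// A simplified versioned row: each write creates a new version tagged with the
// commit timestamp of the writing transaction.
struct Version {
    std::string value;
    std::uint64_t commit_ts;
};

struct VersionedRow {
    std::vector<Version> versions;  // oldest first

    // A reader with snapshot timestamp 'snap_ts' sees the newest version that
    // was committed at or before its snapshot was taken.
    const std::string* read(std::uint64_t snap_ts) const {
        const std::string* visible = nullptr;
        for (const auto& v : versions)
            if (v.commit_ts <= snap_ts) visible = &v.value;
        return visible;
    }
};

int main() {
    std::uint64_t clock = 0;  // global commit timestamp counter
    VersionedRow row;

    row.versions.push_back({"v1", ++clock});   // committed at ts = 1

    // Transaction-level snapshot isolation: the snapshot is taken once,
    // at transaction start, and reused for every statement.
    std::uint64_t txn_snapshot = clock;

    row.versions.push_back({"v2", ++clock});   // concurrent writer commits at ts = 2

    // Statement-level snapshot isolation: each statement takes a fresh snapshot.
    std::uint64_t stmt_snapshot = clock;

    std::cout << "txn-level read:  " << *row.read(txn_snapshot)  << "\n";  // v1
    std::cout << "stmt-level read: " << *row.read(stmt_snapshot) << "\n";  // v2
}
```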

Layered Architecture of the SAP HANA DB

• SAP provides support for SAP Business Warehouse (BW) to speed up queries and transactions.

• To provide this capability, data loading and transformation tool modules are used to create and maintain complex data flows into and out of SAP HANA.

• Business Intelligence Consumer Services (BICS), MDX, and SQL can be used to access the SAP HANA appliance.

• The SAP Business Suite, SAP NetWeaver Business Warehouse (NW BW), and third-party products build their services on top of this database.

Overview of HANA DB Layered Architecture

Calculation Graph Model

• The "calculation graph" (calc graph for short) forms the heart of the logical query processing framework.

• The calc model defines a set of intrinsic operators, e.g. aggregation, projection, join, and union. In addition, it provides operators which implement core business algorithms, such as currency conversion (a small sketch follows the figure reference below).

Example of a SAP HANA Calc Model Graph
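As a rough, hypothetical illustration of the idea behind such a calc graph (this is not the figure from the slides and not HANA's API), a data-flow plan can be modelled as a DAG of operator nodes, where an intrinsic operator such as aggregation is simply a node that transforms the output of its input nodes:

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A "table" is kept trivially simple: one numeric column keyed by a group label.
using Table = std::vector<std::pair<std::string, double>>;

// A calc-graph node: zero or more input nodes plus a transformation function.
struct CalcNode {
    std::vector<std::shared_ptr<CalcNode>> inputs;
    std::function<Table(const std::vector<Table>&)> op;

    Table evaluate() const {
        std::vector<Table> in;
        for (const auto& n : inputs) in.push_back(n->evaluate());
        return op(in);
    }
};

int main() {
    // Source node: constant data (in a real system this would be a table scan).
    auto source = std::make_shared<CalcNode>();
    source->op = [](const std::vector<Table>&) {
        return Table{{"EUR", 10.0}, {"USD", 20.0}, {"EUR", 5.0}};
    };

    // Aggregation node: SUM(value) GROUP BY label, consuming the source node.
    auto aggregate = std::make_shared<CalcNode>();
    aggregate->inputs = {source};
    aggregate->op = [](const std::vector<Table>& in) {
        std::map<std::string, double> sums;
        for (const auto& [label, value] : in[0]) sums[label] += value;
        return Table(sums.begin(), sums.end());
    };

    for (const auto& [label, total] : aggregate->evaluate())
        std::cout << label << ": " << total << "\n";
}
```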

Calculation Graph Model

Deploy and compile time vs. run time

• Dynamic SQL nodes: calc operators that execute an SQL statement inside the data flow; this can result in a form of "nested" calc models.

• Custom nodes: used to implement domain-specific operations in C++ for performance reasons.

• R nodes: used to forward incoming data sets to an R execution environment.

• L nodes: execute scripts written in L, the internal language of the HANA database.

• Relational operators: the collection of relational operators that handle classic relational query graphs, e.g. equi-joins against the unified table.

• OLAP operators: optimized for star joins over fact and dimension tables.

• L runtime: the runtime for the internal language L provides the building blocks to execute L code. Using the split-and-combine operator pair, the L runtime can be invoked in parallel (see the sketch after this list).

• Text operators and graph operators.
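The split-and-combine pattern mentioned for the L runtime can be sketched as follows (a hypothetical illustration, not the actual HANA runtime): the input is split into partitions, each partition is processed by a parallel worker, and the partial results are combined afterwards:

```cpp
#include <cstddef>
#include <future>
#include <iostream>
#include <numeric>
#include <vector>

// Split phase: cut the input into roughly equal partitions.
std::vector<std::vector<int>> split(const std::vector<int>& data, std::size_t parts) {
    std::vector<std::vector<int>> out(parts);
    for (std::size_t i = 0; i < data.size(); ++i) out[i % parts].push_back(data[i]);
    return out;
}

int main() {
    std::vector<int> input(1000);
    std::iota(input.begin(), input.end(), 1);   // 1..1000

    // Split: one partition per worker.
    auto partitions = split(input, 4);

    // Parallel phase: each worker runs the same operation (here: a sum) on its partition.
    std::vector<std::future<long long>> workers;
    for (const auto& part : partitions)
        workers.push_back(std::async(std::launch::async, [&part] {
            return std::accumulate(part.begin(), part.end(), 0LL);
        }));

    // Combine: merge the partial results into the final answer.
    long long total = 0;
    for (auto& w : workers) total += w.get();
    std::cout << "total = " << total << "\n";   // 500500
}
```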

Life-Cycle Management of DB Records

• The physical operators provide excellent performance for aggregation queries as well as for highly selective point queries.

• This contrasts with classical update-in-place-style database systems, where records are modified directly in their final storage location.

• The unified table consists of both row-oriented and column-oriented structures.

• L1 delta: a row-oriented structure that accepts all incoming data requests and stores them in a write-optimized manner, i.e. it preserves the logical row format of the record. It is optimized for fast insert, delete, update, and record projection.

• L2 delta: a column-oriented structure. In contrast to the L1 delta, it employs dictionary encoding to achieve better memory usage (see the sketch after this list).

• Main store: the final stage, which represents the core data format with the highest compression rate, exploiting a variety of different compression schemes.
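The dictionary encoding used from the L2 delta onwards can be sketched roughly like this (a simplified, hypothetical example): each distinct column value is stored once in a dictionary, and the column itself shrinks to a vector of small integer value IDs:

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// A dictionary-encoded column: distinct values are stored once, the column
// itself is just a vector of integer value IDs pointing into the dictionary.
struct DictColumn {
    std::vector<std::string> dictionary;                  // id -> value
    std::unordered_map<std::string, std::uint32_t> ids;   // value -> id
    std::vector<std::uint32_t> data;                      // one id per row

    void append(const std::string& value) {
        auto it = ids.find(value);
        if (it == ids.end()) {                             // first occurrence: extend dictionary
            it = ids.emplace(value, static_cast<std::uint32_t>(dictionary.size())).first;
            dictionary.push_back(value);
        }
        data.push_back(it->second);
    }

    const std::string& at(std::size_t row) const { return dictionary[data[row]]; }
};

int main() {
    DictColumn country;
    for (const char* v : {"DE", "US", "DE", "DE", "FR", "US"}) country.append(v);

    std::cout << "rows: " << country.data.size()
              << ", distinct values: " << country.dictionary.size() << "\n";
    std::cout << "row 3 decodes to: " << country.at(3) << "\n";   // DE
}
```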

Overview of the Unified table concept:

Column Store Structure:

Memory Compression

1) Prefix Encoding
2) Run-Length Encoding
3) Cluster Encoding
4) Sparse Encoding
5) Indirect Encoding
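As a minimal, hypothetical sketch of one of these schemes, run-length encoding stores a run of identical value IDs as a (value, count) pair, which works particularly well on sorted or clustered columns:

```cpp
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

// Run-length encoding: a run of identical value IDs is stored as (value, count).
std::vector<std::pair<std::uint32_t, std::uint32_t>>
rle_encode(const std::vector<std::uint32_t>& ids) {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> runs;
    for (std::uint32_t v : ids) {
        if (!runs.empty() && runs.back().first == v) ++runs.back().second;
        else runs.push_back({v, 1});
    }
    return runs;
}

int main() {
    // A sorted or clustered value-ID column compresses very well.
    std::vector<std::uint32_t> ids = {0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2};
    auto runs = rle_encode(ids);

    std::cout << ids.size() << " ids -> " << runs.size() << " runs\n";
    for (auto [value, count] : runs)
        std::cout << "value " << value << " x " << count << "\n";
}
```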

Persistency Mapping

• The persistency mapping has two parts: a REDO log and a SAVEPOINT data area.

REDO Log: The log records the incoming changes held in the L1 delta and L2 delta, i.e. the data that has not yet reached a savepoint. When a system crash occurs, the log is replayed during restart and processing continues from the last persisted entry; the L2 delta already contains the data up to the point where it had been updated.

SAVEPOINT Data Area: This is essentially the backup of the data created during the L2-delta-to-main propagation. It is used for recovery after a failure; it does not cover the L1-delta-to-L2-delta movement, which is protected by the redo log.
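The interplay of redo log and savepoint can be sketched as follows (a simplified, hypothetical model, not HANA's on-disk format): recovery reloads the last savepoint and then replays every log entry written after that savepoint was taken:

```cpp
#include <iostream>
#include <string>
#include <vector>

// Simplified persistency model: a savepoint is a full snapshot of the store,
// the redo log records every change made after that snapshot was taken.
struct Store {
    std::vector<std::string> rows;
};

int main() {
    Store store;
    std::vector<std::string> redo_log;

    auto insert = [&](const std::string& row) {   // every change is logged first
        redo_log.push_back(row);
        store.rows.push_back(row);
    };

    insert("r1");
    insert("r2");

    // SAVEPOINT: persist a snapshot of the store and truncate the redo log.
    Store savepoint = store;
    redo_log.clear();

    insert("r3");                                  // changes after the savepoint
    insert("r4");

    // --- crash: the in-memory store is lost ---
    store = Store{};

    // Recovery: reload the savepoint, then replay the redo log on top of it.
    store = savepoint;
    for (const auto& row : redo_log) store.rows.push_back(row);

    std::cout << "recovered " << store.rows.size() << " rows:";
    for (const auto& r : store.rows) std::cout << " " << r;
    std::cout << "\n";
}
```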

Overview of the Persistency mechanisms of the unified table

Details of the L1-to-L2-Delta Merge

Characteristics of the SAP HANA database record life cycle

CONCLUSION

• Column store systems are well known to provide superb performance for OLAP-style workloads: aggregation queries touching only a few columns of hundreds of millions of rows benefit greatly from a column-oriented data layout.

• On the one hand, operational systems embed more and more statistical operations for on-the-fly business decisions into the individual business process; on the other hand, classical data-warehouse infrastructures are required to capture transactional feeds for real-time analytics.

• We also explained in more detail the common unified table data structure, which internally consists of different states but provides a common interface to the consuming query engines.

General Questions

• What if a system failure occurs?

Although the SAP HANA database is a main-memory-centric database system, its full ACID support guarantees atomicity and durability, including recovery in case of a system restart after a regular shutdown or a system failure.

• Does this support all languages?

Yes, any language can be used as long as it connects through a common connection layer. The database is intended for both OLAP and OLTP workloads.

Encouraging SAP Community:

Ravi Raj Kadam (2709227)