Data Schema Registry: SAP iXp Project 2020
Post on 06-Feb-2022
INTERNAL
Heet Rajesh Palod
August 03, 2020
Data Schema Registry
SAP iXp Project 2020
INTERNAL © 2020 SAP SE or an SAP affiliate company. All rights reserved.
Setting the Agenda
• Why?
• Objectives & Goals
• Solution: Challenges & Outcome
• Business Impact
Why? What’s the problem?
SAP Concur has thousands of data schemas at rest and in motion across various systems.
This creates the following problems for developers working with data:
• Lack of shared understanding of privacy, security, and compliance requirements when storing and processing data.
• Lack of shared understanding of how data objects relate to each other.
Objectives & Goals
• Collaborate and partner with different teams to gather requirements.
• Identify the pain points and overcome them by charting strategies and actionable solutions.
• Design and propose a new data modeling application that enables data sharing through a centralized, queryable source of schema knowledge.
• Develop an end-to-end application to automate decisions about correctly handling data as it lives and moves within the systems.
• Minimize consumption of resources such as time, memory, and effort.
Solution: Challenges & Outcome
Graph-Knowledge Powered Solution…
Challenge
Should we use a relational database? An RDBMS demands multiple tables with multiple foreign keys. Nested SQL queries and complex joins can become unwieldy when navigating the data, and will not perform well as the data grows over time.
Solution
Amazon Neptune, a fully managed graph database service, uses graph structures such as nodes (data entities), edges (relationships), and properties to represent and store data.
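To make the contrast concrete, here is a minimal sketch of a property graph held in plain Python, not in Neptune. Answering "how are these two objects related?" becomes a path search over edges, where a relational layout would typically need one join per hop. The vertex names reuse the deck's Employee example, but the specific edges and the helper names (`neighbors`, `find_path`) are assumptions made for illustration.

```python
from collections import deque

# Illustrative property graph; the edge set below is an assumption.
NODES = {
    "Employee": {"label": "Domain Object"},
    "Employee Report": {"label": "Domain Object"},
    "employee_id": {"label": "Field"},
    "report_id": {"label": "Field"},
}
EDGES = [
    # (source node, relationship, target node)
    ("Employee", "contains", "employee_id"),
    ("Employee Report", "contains", "employee_id"),
    ("Employee Report", "contains", "report_id"),
]


def neighbors(node):
    """Yield (relationship, other_node) pairs, following edges in both directions."""
    for src, rel, dst in EDGES:
        if src == node:
            yield rel, dst
        elif dst == node:
            yield rel, src


def find_path(start, goal):
    """Breadth-first search: shortest list of nodes linking start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _rel, nxt in neighbors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_path("Employee", "report_id"))
```

In this toy graph the two domain objects are linked through the shared `employee_id` field, so the search returns a four-node path without any query having to know the number of hops in advance.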
Schema-managed Solution…
Challenge
How to manage and store the schema?
Solution
Model schema data as a graph with data fields as vertices and relationships as edges.
[Figure: example schema graph. Vertices with label = Domain Object: Employee, Employee Report. Vertices with label = Field: employee_id, employee_name, report_id. Edge labels: belongs_to, contains. Domain Object properties shown: source = employee_db, name = Employee, id = 2344-2123-12212342. Field properties shown: classification = PII, name = employee_id, id = 2323-3343-34343355, data_type = long.]
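The graph model on this slide can be sketched in a few lines of Python. This is an in-memory illustration only, not the project's Neptune code: the `SchemaGraph` class and its methods are hypothetical, the edge directions are assumed (Domain Object contains Field, Field belongs_to Domain Object), and the id given to `employee_name` is a placeholder because the slide does not show one. The property values for Employee and employee_id are the ones from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str                                      # "Domain Object" or "Field"
    properties: dict = field(default_factory=dict)


@dataclass
class SchemaGraph:
    vertices: dict = field(default_factory=dict)    # vertex id -> Vertex
    edges: list = field(default_factory=list)       # (from_id, edge_label, to_id)

    def add_vertex(self, vertex_id, label, **properties):
        self.vertices[vertex_id] = Vertex(label, dict(properties, id=vertex_id))

    def add_edge(self, from_id, edge_label, to_id):
        self.edges.append((from_id, edge_label, to_id))

    def fields_of(self, domain_id, classification=None):
        """Names of fields a domain object contains, optionally filtered by classification."""
        names = []
        for from_id, edge_label, to_id in self.edges:
            if from_id == domain_id and edge_label == "contains":
                props = self.vertices[to_id].properties
                if classification is None or props.get("classification") == classification:
                    names.append(props["name"])
        return names


EMPLOYEE = "2344-2123-12212342"
EMPLOYEE_ID = "2323-3343-34343355"
EMPLOYEE_NAME = "placeholder-employee-name"  # hypothetical id, not on the slide

graph = SchemaGraph()
graph.add_vertex(EMPLOYEE, "Domain Object", name="Employee", source="employee_db")
graph.add_vertex(EMPLOYEE_ID, "Field", name="employee_id",
                 classification="PII", data_type="long")
graph.add_vertex(EMPLOYEE_NAME, "Field", name="employee_name")
for field_id in (EMPLOYEE_ID, EMPLOYEE_NAME):
    graph.add_edge(EMPLOYEE, "contains", field_id)    # assumed direction
    graph.add_edge(field_id, "belongs_to", EMPLOYEE)  # assumed direction

print(graph.fields_of(EMPLOYEE, classification="PII"))
```

Keeping the classification as a vertex property means a question such as "which Employee fields are PII?" is a single edge walk plus a property filter.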
RESTful API-Tier Solution…
Challenge
How to manage data interaction?
Solution
A RESTful API layer on the graph database enables users to query and retrieve the data as and when they need it.
• PubSub/Data Platform engineer: can query the data classification level for a particular field or set of fields to make decisions about:
  o Who can the data be shared with?
  o How can the data be shared with them?
  o What compliance responsibilities does the consumer take on by receiving this data?
• Privacy expert: can add or update the data classification for any data field.
• Data owner: can add or update relationships or other metadata about datasets.
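The three personas above suggest a role-aware API surface. The sketch below shows one way such a tier might dispatch requests. It is a plain function rather than a real web server, and the route shape (`/fields/<name>/<resource>`), the role names, the permission table, and the sample data are all assumptions, not the project's actual API.

```python
# Hypothetical role -> allowed (method, resource) pairs, one entry per persona.
PERMISSIONS = {
    "platform_engineer": {("GET", "classification")},
    "privacy_expert": {("GET", "classification"), ("PUT", "classification")},
    "data_owner": {("GET", "metadata"), ("PUT", "metadata")},
}

# Stand-in for the graph database; a real tier would query it instead.
FIELDS = {"employee_id": {"classification": "PII", "metadata": {}}}


def handle(role, method, path, body=None):
    """Dispatch a request such as GET /fields/employee_id/classification.

    Returns an (HTTP status code, JSON-like payload) pair.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "fields":
        return 404, {"error": "unknown route"}
    _, field_name, resource = parts
    if (method, resource) not in PERMISSIONS.get(role, set()):
        return 403, {"error": "forbidden"}
    record = FIELDS.get(field_name)
    if record is None:
        return 404, {"error": "unknown field"}
    if method == "PUT":
        if resource == "classification":
            record["classification"] = body["classification"]  # replace the value
        else:
            record["metadata"].update(body)                    # merge new metadata
    return 200, {"field": field_name, resource: record[resource]}


print(handle("platform_engineer", "GET", "/fields/employee_id/classification"))
```

Checking permissions before touching the data keeps each persona limited to the operations listed for it: a platform engineer can read a classification but cannot change it.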
Secured Solution… In line with AWS Migration Directive
Challenge
Is our solution secure?
Solution
• An Amazon Neptune DB cluster can only be created in an Amazon Virtual Private Cloud (Amazon VPC).
• Its endpoints are only accessible within that VPC, usually from an Amazon Elastic Compute Cloud (Amazon EC2) instance running in that VPC.
• Manageably secure.
Technology Stack
Business Impact
Business Impact (1) – Speed, Storage and Security
Challenges
• Lack of understanding of how data objects are related.
• Slow query processing leads to a slow data attestation rate.
• Lack of security.
Solutions
• Purpose-built to store and navigate data objects and relationships.
• Graph queries boost processing speed and hence improve the data attestation rate.
• AWS facilitates better security.
Business Impact (2) – Data Governance and Efforts
Reactive Approach
• Manually perform the data governance-related activities time and again.
• Demands human effort and is hence prone to human error.
Proactive Approach
• Automate the data governance-related activities to meet dynamic government data compliance laws and regulations.
• Reduces manual effort by >50%.
Business Impact (3) – Data, Metadata and Structure
Challenges
• Contains duplicated data.
• Schema-less structure with no metadata, and hence requires high memory storage.
Solutions
• De-duplicates data by maintaining a graph.
• Schema structure allows storing metadata and re-use of existing data, and hence saves memory storage by >70%.
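One way to picture the de-duplication point is a get-or-create registry: a field is stored once as a vertex, and every schema that uses it only adds an edge. The sketch below is an illustration under assumptions. The `FieldRegistry` class is hypothetical, and keying fields by name alone is a simplification that a real registry would likely refine. It does not attempt to reproduce the storage figure quoted on the slide.

```python
class FieldRegistry:
    """Get-or-create store: one vertex per field name, shared by every schema using it."""

    def __init__(self):
        self.fields = {}       # field name -> metadata dict (the single shared vertex)
        self.contains = set()  # (schema name, field name) edges

    def register(self, schema, field_name, **metadata):
        # setdefault creates the vertex on first use and returns the existing one afterwards
        vertex = self.fields.setdefault(field_name, {})
        vertex.update(metadata)
        self.contains.add((schema, field_name))
        return vertex


registry = FieldRegistry()
first = registry.register("Employee", "employee_id", classification="PII")
second = registry.register("Employee Report", "employee_id")

print(len(registry.fields), len(registry.contains))
```

Both schemas point at the same vertex, so the PII classification recorded once is visible from either of them.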
Thank You.
Heet Rajesh Palod
Software Developer, SAP iXp Intern
Seattle, WA
Contact information:
heet.palod@sap.com
(206)-697-6374