CONFIDENCE
AT EVERY TURN
Prakriteswar Santikary, PhD
Vice President and Global Chief Data Officer
13 FEBRUARY, 2019
MODERN DATA PLATFORM AT SCALE : ROLE OF CLOUD, MICROSERVICES
AND SERVERLESS ARCHITECTURE
Agenda Items
Challenges in Clinical Trials
• Data Volume
• Data Variety
• Data Security
• Data Privacy
• Data Protection
Modern Data Architecture
• Microservices
• Serverless pipeline
• Lambda Architecture
• Automation
• AI-enablement
Modern Data Platform
• Cloud architecture
• Integration at scale
• Data-as-a-service
• Analytics-as-a-service
• Open Data API
| Copyright ERT 2018 2
ABOUT ERT
Founded in 1977; privately held
Supporting Pharmacos, Biotechs & CROs around the world
Operations in 12 countries
2500+ employees
MINIMIZING RISK & UNCERTAINTY, SO YOU CAN MOVE AHEAD QUICKLY WITH CONFIDENCE
CLINICAL TRIAL INDUSTRY CHALLENGES
Complexity in clinical trials
• Multiple data sources
• Multiple devices
• Multiple data types
• Multiple vendors
• Data siloes
• Difficult to identify issues and resolve them in real time
• Lack of visibility into actions being taken to address issues
• Disparate systems
• Lack of metadata management
• Lack of master data management
• Lack of data governance
• Regulated industry
• GDPR
Challenge areas: Data Collection; Data Processing and Access; Data Security and Privacy; Data Quality
CLINICAL TRIALS GETTING MORE COMPLEX
Increase in Complex Trials
• Key insight: Increasing interest in building protocols based on real-time outcomes, mitigating the risk of protocol change mid-trial.
• Implications: Continued move to a single cloud-based platform as the trial methodology of choice, powered by data. Allows for agility and speed to market.
Advancement of Precision Medicine
• Key insight: Technology advancements driving highly individualized patient therapeutics administration.
• Implications: Ability to do smaller, targeted trials, improving outcomes based on lifestyle, environment and genetic makeup.
Patient Centricity
• Key insight: Shift toward more active patient engagement: improving access to trials via mHealth and virtual trials, and focusing on outcomes.
• Implications: Further investment in clinical trials that meet patient needs: less burdensome, increased diversity and better outcomes.
More Big Data & AI Modeling
• Key insight: An explosion of data from wearables, genomics, social, imaging, etc., driving advances in data interaction and visualization.
• Implications: Need to standardize data sets, mine previous data sets, and better ingest, analyze and manage larger, more complex data.
Blockchain
• Key insight: The promise of blockchain to help address concerns over privacy, data sharing and reproducibility.
• Implications: Increased investment in blockchain technology to improve data security as a competitive differentiation.
EXPONENTIAL DATA GROWTH IN LIFE SCIENCES
VOLUME, VELOCITY, VARIETY, VERACITY
• Clinical, compliance, medical, RWE/commercial and actimetry
• Ability to gain insight from aligned data sets
• Digitization has increased the speed of information
• More types along with more endpoints
Implications
• Organizations must implement scalable infrastructures to deal with increased data volumes
• Leading customer experiences require contemporary insights that can be acted on
• Flexible data models are required to integrate customer information
• Need to enable multiple user groups with consumable insights
CONDUCTING CLINICAL TRIALS COMES WITH UNCERTAINTY AND RISK
GROWING TRIAL COMPLEXITY ADDS TO DECLINING PERFORMANCE
TRIAL START to TRIAL END:
• <10% OF TRIALS END ON TIME
• 48% OF SITES UNDER-ENROLL SUBJECTS
• 5-12 SYSTEMS PER GLOBAL STUDY
• 58% INCREASE IN SITES PER TRIAL
• 71% INCREASE IN AVERAGE ENDPOINTS
• 30% AVG PATIENT DROPOUT RATE
CLINICAL TRIAL TEAMS ARE DROWNING IN DATA
WAVES OF INFORMATION THAT AREN’T ACTIONABLE
• High administrative burden to extract and re-enter data into other systems
• Difficult to filter out noise to identify real issues that require attention
• More effort is spent confirming the accuracy of data than interpreting it for decision-making
• Cost overruns and delays in time-to-market
Our customers are crying out for help: “Please simplify the complex!”
ERT’s APPROACH - MODERN DATA ARCHITECTURE AND SERVICES
Data Availability, Data Quality, Data Consistency, Data Security, Data Auditability
Reporting, Analytics and Data Science Services
• Reporting
• Self-service BI
• Open Data API
• Data Science
Master Data Management, Metadata Management and Data Governance
• Data Lineage
• Data Standards
• Data Policies and Procedures
• Business Metadata
• Technical Metadata
Data Architecture and Data Technology
• Relational
• Dimensional
• In-memory
• Polyglot
Data Integration, Data Services and Data Adapters
• Structured Data
• Unstructured Data
• Semi-structured Data
• Binary Data
Data Sources
• Internal
• External
• Third party
• Future M&A
Microservices Architecture
Build scalable platforms and applications
Characteristics
• Decentralized
• Independent
• Do one thing well
• Polyglot
• API-first Design
• You build it; You own it
Benefits
• Agility
• Innovation
• Quality
• Scalability
• Availability
Challenges
• Distributed systems
• Monolith->Microservices transition not easy
• Organizational issues (DevOps)
• Skillsets
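To make "do one thing well" and "API-first design" concrete, here is a minimal sketch of a single-purpose service written against Python's standard WSGI interface. The service name, the URL shape and the in-memory enrollment numbers are all hypothetical and chosen only for illustration; nothing here is taken from ERT's platform.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical data owned by this one service ("you build it; you own it").
SITE_ENROLLMENT = {"site-001": 42, "site-002": 17}


def enrollment_service(environ, start_response):
    """WSGI app with one job: GET /sites/<site_id>/enrollment returns JSON."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    found = (
        environ.get("REQUEST_METHOD") == "GET"
        and len(parts) == 3
        and parts[0] == "sites"
        and parts[2] == "enrollment"
        and parts[1] in SITE_ENROLLMENT
    )
    if found:
        status = "200 OK"
        body = {"site_id": parts[1], "enrolled": SITE_ENROLLMENT[parts[1]]}
    else:
        status = "404 Not Found"
        body = {"error": "not found"}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]


def call(app, path, method="GET"):
    """Invoke a WSGI app in-process (no network); return (status, parsed JSON)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["REQUEST_METHOD"] = method
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)
```

Because the contract is only the JSON over HTTP, a consumer written in another language could call the same endpoint, which is one way to read "polyglot". For a local trial run the app can be served with `wsgiref.simple_server.make_server("", 8000, enrollment_service).serve_forever()`.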
Serverless Architecture
Build applications without having to manage servers
Characteristics
• Function-as-a-service
• Compute as a service (100ms interval)
• Stateless
• Ephemeral
• Event-triggered
Benefits
• Code without provisioning
• No server HW to maintain
• No server SW to maintain
• Increase productivity
• Scale your code with HA
• Pay-per-use CPU cycle
• Zero administration
Challenges
• Vendor lock-in
• Vendor control
• Monitoring and debugging
• Startup latency
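The characteristics above (stateless, ephemeral, event-triggered, metered in 100ms intervals) can be sketched in a few lines of Python. This is an illustration only: the `(event, context)` handler shape follows the common function-as-a-service convention, while the `records` and `value` fields, and the rule that even the shortest run is billed as one interval, are assumptions made for the example.

```python
import json
import math


def billed_duration_ms(duration_ms, interval_ms=100):
    """Round execution time up to the next billing interval.

    Assumption: the shortest possible run is still billed as one interval.
    """
    return max(1, math.ceil(duration_ms / interval_ms)) * interval_ms


def handler(event, context=None):
    """Stateless, event-triggered function.

    Everything it needs arrives in `event`; nothing is kept in memory
    between invocations, so any number of copies can run in parallel.
    The `records` / `value` fields are a hypothetical event shape.
    """
    records = event.get("records", [])
    valid = [r for r in records if r.get("value") is not None]
    return {
        "statusCode": 200,
        "body": json.dumps({"received": len(records), "valid": len(valid)}),
    }
```

With these definitions a run that takes 230 ms is metered as 300 ms, which is why short, single-purpose functions tend to suit the pay-per-use model.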
Modern Data Platform - Serverless Architecture
EXAMPLE USE CASES FROM OUR MODERN DATA PLATFORM
EXAMPLE USE CASES FROM OUR CLOUD PLATFORM
• Real-time data integration from any System of Record once available.
• Configurable State Machines as serverless Step Functions to compute real-time compliance and data state.
• Push transformed data in real time to external endpoints, or request it incrementally or cumulatively.
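The idea of a configurable state machine that derives a record's data state and a compliance figure can be sketched in plain Python as below. This is not the Step Functions API or its state-language syntax; the state names, event names and transition table are hypothetical and stand in for whatever a real study configuration would define.

```python
# Hypothetical configuration: state -> {event -> next state}.
# Changing this table changes the behaviour without touching the code.
DATA_STATE_MACHINE = {
    "EXPECTED": {"received": "RECEIVED", "window_closed": "MISSED"},
    "RECEIVED": {"validated": "COMPLIANT", "rejected": "QUERY_OPEN"},
    "QUERY_OPEN": {"resolved": "COMPLIANT"},
    "COMPLIANT": {},
    "MISSED": {},
}


def advance(state, event, machine=DATA_STATE_MACHINE):
    """Return the next state; events with no configured transition are ignored."""
    return machine[state].get(event, state)


def replay(events, start="EXPECTED", machine=DATA_STATE_MACHINE):
    """Fold a stream of events into the current data state of one record."""
    state = start
    for event in events:
        state = advance(state, event, machine)
    return state


def compliance_rate(final_states):
    """Share of records whose current state is COMPLIANT."""
    if not final_states:
        return 0.0
    return sum(s == "COMPLIANT" for s in final_states) / len(final_states)
```

Because each transition depends only on the current state and the incoming event, every step can run as a short stateless function, with the state itself persisted outside the function.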
OUR DATA SCIENCE APPROACH AT SCALE
Role of Scalable Data Platform
Data pre-processing
• Garbage-in, garbage-out
• Missing values
• Outliers
• Data quality (cleaning)
• Normalization
• Transformation
• Requires a platform that scales
Feature Engineering
• Difficult step in AI/ML
• Time consuming
• Requires expert domain knowledge
• Feature selection
• Feature extraction
• Requires a platform that scales
Algorithm choice
• Analytic sandbox
• Availability of integrated dataset
• Requires a platform that scales
Model Development
• Productization
• Automation
• Dynamic training of model
• Requires a platform that scales
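The pre-processing steps named on this slide (missing values, outliers, normalization) can be illustrated with the Python standard library. The specific choices below, median imputation, Tukey-fence clipping at 1.5 times the interquartile range, and min-max scaling, are assumptions picked for the sketch; the slide does not say which techniques were used.

```python
from statistics import median, quantiles


def impute_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def min_max_normalize(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sensor readings with one gap and one implausible spike.
raw = [10, 12, None, 11, 13, 100]
clean = min_max_normalize(clip_outliers(impute_missing(raw)))
```

On a single small list this is trivial; the slide's point is that the same steps have to run over every integrated data set, which is where the scalable platform comes in.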
KEY TAKEAWAYS FROM TODAY’S DISCUSSION
Data Governance
To unleash the potential of data: master data management, data quality, data profiling, data policies and standards. Fosters cross-organizational collaboration.
Modern Data Foundation
Ingest any data of any type at any velocity. Data processing at scale, easy data access, self-service, and advanced analytics capabilities including AI and ML.
Data security and access
Governed data store, transparency, regulatory compliance.
QUESTIONS?
Prakriteswar Santikary, PhD