Copyright © 2016, SAS Institute Inc. All rights reserved.
BIG DATA GOVERNANCE
GÖRKEM ŞEVİK
INFORMATION MANAGEMENT SOLUTION LEAD
Big data governance is part of a broader data governance program that formulates, monitors and enforces policies relating to big data.
BIG DATA GOVERNANCE CONSIDERATIONS
Big transaction data
Human generated data
Web & social media
Internet of Things
A big data reference architecture provides a common framework that describes all the components needed for the required functionality.
Enterprise data architects can use a big data reference architecture to plot big data road maps and discover gaps in technology implementations.
Data scientists and business users can take advantage of this approach to make sense of a complex landscape.
A comprehensive big data reference architecture consists of 16 regions in 3 main sections.
SECTION 1 INGESTION
REGION 1: BIG DATA SOURCES
Access big data sources
Virtually blend data from multiple sources without physically moving it
REGION 2: HADOOP DISTRIBUTIONS
Access and process data on Hadoop distributions
Push processing down to Hadoop for improved performance
REGION 3: DATA STREAMS
Examine, filter and analyze large volumes of real-time data
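As a rough illustration of what "examine, filter and analyze" can mean for a data stream, the sketch below drops malformed events and raises an alert when a rolling mean crosses a limit. It is a minimal sketch, not a SAS interface: the event fields (`sensor`, `value`), the window size and the threshold are all hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Iterator


def rolling_alerts(
    events: Iterable[Dict],
    window: int = 3,       # hypothetical window size
    threshold: float = 80.0,  # hypothetical alert limit
) -> Iterator[Dict]:
    """Yield an alert whenever the mean of the last `window` valid values exceeds `threshold`."""
    recent = deque(maxlen=window)
    for event in events:
        value = event.get("value")
        # Filter step: skip events whose value is missing or not numeric.
        if not isinstance(value, (int, float)):
            continue
        recent.append(value)
        # Analyze step: only evaluate once the window is full.
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > threshold:
                yield {"sensor": event["sensor"], "rolling_mean": mean}
```

Because the function is a generator, it processes one event at a time and never holds more than `window` values in memory, which is the property that matters for high-volume streams.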
REGION 4: DATABASES
Read and write more than 60 relational and big data sources
Manage the business glossary for attributes in these data sources
SECTION 2 CLEANSING, INTEGRATION AND GOVERNANCE
REGION 5: BIG DATA INTEGRATION
Self-service big data preparation to enhance analyst productivity and improve the quality of data
Bridge the skills gap, giving all users access to their Hadoop data regardless of technical ability
Business analysts and data scientists can summarize, aggregate, merge, transpose or join data in Hadoop
REGION 6: TEXT ANALYTICS
Derive value from unstructured data by answering business questions related to call logs, claims adjuster notes, social media, etc.
REGIONS 7 AND 8: BIG DATA DISCOVERY AND BIG DATA QUALITY
Business users and data scientists can profile data natively, in cluster and in parallel
Standardize, parse, match or de-duplicate data in Hadoop without writing code
Gender analysis, pattern analysis, etc.
Run in different runtime environments, including in-stream, in-memory and in-Hadoop
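To make the data quality operations above concrete, here is a small single-machine sketch of standardization, pattern analysis and de-duplication. It is an assumption-laden illustration, not how any SAS product implements these steps: the function names and the matching rule (exact match on a standardized key) are hypothetical, and a real deployment would run the equivalent logic in the cluster.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List


def standardize(text: str) -> str:
    """Lower-case the text and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def pattern(value: str) -> str:
    """Pattern analysis: map every digit to '9' and every letter to 'A'."""
    digits_masked = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", digits_masked)


def profile_patterns(values: Iterable[str]) -> Counter:
    """Count how often each character pattern occurs in a column."""
    return Counter(pattern(v) for v in values)


def deduplicate(records: List[Dict], key: str) -> List[Dict]:
    """Keep the first record for each standardized value of `key`."""
    seen = set()
    unique = []
    for record in records:
        match_key = standardize(record[key])
        if match_key not in seen:
            seen.add(match_key)
            unique.append(record)
    return unique
```

For example, `"ACME, Inc."` and `"acme inc"` standardize to the same key and are treated as duplicates, while a pattern profile of a product-code column quickly shows which values deviate from the dominant format.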
REGIONS 9 AND 10: METADATA AND INFORMATION POLICY MANAGEMENT
Common metadata repository that spans data management, analytics and third-party data sources and tools, to store, manage and deliver metadata
Workflow-driven business glossary to relate and track the associated business rules, reference data, technical owners, data stewards and other roles
Visualization of relationships and impact analysis to see how changes to one data element affect information in other systems
Document acceptable use standards
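Impact analysis of this kind can be thought of as a traversal of a lineage graph stored in the metadata repository. The sketch below assumes lineage is available as a simple mapping from each data element to the elements derived from it; that representation and the element names in the usage note are hypothetical.

```python
from collections import deque
from typing import Dict, List


def impacted_elements(lineage: Dict[str, List[str]], changed: str) -> List[str]:
    """Return every element downstream of `changed`, in breadth-first order."""
    seen = {changed}
    queue = deque([changed])
    downstream = []
    while queue:
        element = queue.popleft()
        for dependent in lineage.get(element, []):
            # Visit each dependent once, even if it is reachable by several paths.
            if dependent not in seen:
                seen.add(dependent)
                downstream.append(dependent)
                queue.append(dependent)
    return downstream
```

Given a mapping in which `crm.customer_id` feeds `dw.dim_customer`, which in turn feeds a scoring mart and a report, calling `impacted_elements(lineage, "crm.customer_id")` lists all three downstream elements, so a steward can see what to review before the source field changes.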
REGION 11: MASTER DATA MANAGEMENT
Organically built on top of data quality
Support multiple data domains
Data access from Apache Hive, Cloudera Impala
Include data glossary and lineage
SECTION 3 ANALYTICS, SECURITY AND LIFE CYCLE
REGION 12: DATA WAREHOUSES AND DATA MARTS
Run model scoring inside databases for improved performance and governance
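The idea behind in-database scoring is to send the model to the data rather than extract the data to the model. The sketch below illustrates this with a linear model expressed as a SQL expression and evaluated by SQLite; the table layout (an `id` column plus numeric feature columns), the function name and the use of SQLite are assumptions chosen only to keep the example self-contained.

```python
import sqlite3
from typing import Dict, List, Tuple


def score_in_database(
    conn: sqlite3.Connection,
    table: str,
    intercept: float,
    weights: Dict[str, float],
) -> List[Tuple[int, float]]:
    """Compute intercept + sum(weight * column) inside the database for every row."""
    if not weights:
        raise ValueError("at least one feature weight is required")
    # Table and column names cannot be bound as parameters, so validate them.
    for name in [table, *weights]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    terms = " + ".join(f"? * {column}" for column in weights)
    sql = f"SELECT id, ? + {terms} AS score FROM {table} ORDER BY id"
    # Only ids and scores leave the database; the feature rows stay where they are.
    return conn.execute(sql, [intercept, *weights.values()]).fetchall()
```

Keeping the scoring query in the database also helps governance: the exact expression that produced a score can be logged and audited alongside the data it ran against.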
REGION 13: BIG DATA ANALYTICS AND REPORTING
Advanced analytical capabilities for big data analytics, visualization and reporting
Take governance of analytical models into consideration
Streamline the process of creating, managing, administering and monitoring analytical models
Integrate with workflow engine
REGION 14: BIG DATA SECURITY AND PRIVACY
Data stewards classify data from an information security perspective, such as «Public», «Internal», «Confidential»
Dynamic data masking and encryption
Ensure the right users have access to the right data
Secure data access
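Classification and dynamic masking work together: the steward's label on a column decides, at read time, whether a given user sees the real value. The following is a minimal sketch under stated assumptions. The three levels mirror the examples above, while the clearance model, the `read_record` function and the rule of showing only the last four characters are hypothetical.

```python
from enum import IntEnum
from typing import Dict


class Classification(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


def mask(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    keep = min(max(visible, 0), len(value))
    return "*" * (len(value) - keep) + value[len(value) - keep:]


def read_record(
    record: Dict[str, str],
    labels: Dict[str, Classification],
    clearance: Classification,
) -> Dict[str, str]:
    """Return a copy of `record` with values above the reader's clearance masked."""
    result = {}
    for column, value in record.items():
        # Columns without a label are treated as confidential (fail closed).
        label = labels.get(column, Classification.CONFIDENTIAL)
        result[column] = value if label <= clearance else mask(value)
    return result
```

The stored data is never altered; masking is applied per request, so the same record can be shown in full to one user and partially hidden from another.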
REGION 15: BIG DATA LIFECYCLE MANAGEMENT
Understand and categorize the usefulness of the data and how it relates to other data sets
Consider data source onboarding, how and when to archive data, what stage of its life cycle the data is in, and who owns the decision rights around that data
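One way to make the "how and when to archive" question operational is a simple age-based rule per data set. The sketch below is only an illustration: the stage names and the default thresholds (one year to archive, roughly seven years to review for purging) are hypothetical and would in practice come from whoever owns the decision rights for that data.

```python
from datetime import date


def lifecycle_stage(
    last_accessed: date,
    today: date,
    archive_after_days: int = 365,   # hypothetical policy value
    purge_after_days: int = 2555,    # hypothetical policy value (about 7 years)
) -> str:
    """Classify a data set as 'active', 'archive' or 'purge-review' by days since last access."""
    age_days = (today - last_accessed).days
    if age_days >= purge_after_days:
        return "purge-review"
    if age_days >= archive_after_days:
        return "archive"
    return "active"
```

Passing `today` explicitly rather than reading the system clock keeps the rule reproducible, which is useful when a policy decision has to be explained later.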
REGION 16: CLOUD
Manage cloud-based data
Support cloud-based executions with web-based services architecture
Deploy to Amazon EC2, Rackspace or other platform-as-a-service providers
SAS offers a robust platform and comprehensive approach to big data governance, data management and analytics that can accommodate the evolution of data.
THANK YOU
#SAS
#DataManagement
#DataStrategy
#DataPrep
#Hadoop
#BigData
#DataIntegration
#DataQuality
#DataGovernance
#IoT
#BigDataGovernance
#DataEvolution
#DataLifecycle
#MDM
#Data