SQL Server 2016 Operational Analytics
Strategic sponsors
Silver sponsors
Łukasz Grala Microsoft MVP Data Platform | MCT | MCSE
• Architect and mentor: Data Platform & Business Intelligence solutions
• Trainer: Data Platform and Business Intelligence
• University lecturer
• Author of webcasts and publications
• Microsoft MVP Data Platform
• Leader of PLSSUG Poznań
• PhD student at Poznan University of Technology, Faculty of Computing Science (topics: database and data warehouse architecture, data mining, machine learning)
Marcin Szeliga
• Data Philosopher
• BI Expert and Consultant
• Data Platform Architect
• 20 years of experience with SQL Server
• Ph.D. Candidate at Politechnika Śląska
Operational Database Management Systems
Data Warehouse Database Management Systems
Business Intelligence and Analytics Platforms
x86 Server Virtualization
Cloud Infrastructure as a Service
Enterprise Application Platform as a Service
Public Cloud Storage
Leader in the 2014 Gartner Magic Quadrants
Microsoft platform leads the way on-premises and in the cloud
Hyperscale cloud
Deeper insights across data
Do more. Achieve more.
Performance | Security | Availability | Scalability
Operational analytics: insights on operational data; works with In-Memory OLTP and disk-based OLTP
In-Memory OLTP enhancements: greater T-SQL surface area, terabytes of memory supported, and a greater number of parallel CPUs
Query data store: monitor and optimize query plans
Native JSON: expanded support for JSON data
Temporal database support: query data as points in time
Always Encrypted: sensitive data remains encrypted at all times, with the ability to query it
Row-level security: apply fine-grained access control to table rows
Dynamic data masking: real-time obfuscation of data to prevent unauthorized access
Other enhancements: audit success/failure of database operations; TDE support for storage of In-Memory OLTP tables; enhanced auditing for OLTP with the ability to track the history of record changes
Enhanced AlwaysOn: three synchronous replicas for automatic failover across domains; round-robin load balancing of replicas; automatic failover based on database health; DTC for transactional integrity across database instances with AlwaysOn; support for SSIS with AlwaysOn
Enhanced database caching: cache data with automatic, multiple TempDB files per instance in multi-core environments
SQL Server 2016 improvements
Mission-critical performance
• Refers to the operational workload (i.e. OLTP)
• Examples:
  • Enterprise Resource Planning (ERP): inventory, orders, sales
  • Machine data: data from machine operations on the factory floor
  • Online stores (e.g. Amazon, Expedia)
  • Stock/security trades
• Mission critical:
  • No downtime (high availability), because downtime impacts revenue
  • Low latency and high transaction throughput
What does operational mean?
• Analytics:
  • Studying past data (e.g. operational, social media) to identify potential trends
  • Analyzing the effects of certain decisions or events (e.g. an ad campaign)
  • Analyzing past/current data to predict outcomes (e.g. credit score)
• Goal: enhance the business by gaining knowledge to make improvements or changes
Source: MIT Sloan Management Review
What does analytics mean?
[Diagram: presentation layer (IIS Server) → application tier → SQL Server operational database → ETL (hourly, daily, weekly) → SQL Server relational DW → SQL Server Analysis Services → BI and analytics]
Key issues:
• Complex implementation
• Requires two servers (CapEx and OpEx)
• Data latency in analytics
• More and more businesses demand real-time analytics
Traditional BI architecture
[Diagram: presentation layer (IIS Server) → application tier → SQL Server database with analytics-specific indexes added, queried directly by BI and analytics (SQL Server Analysis Services)]
Benefits:
• No data latency
• No ETL
• No separate DW
Challenges:
• Analytics queries are resource-intensive and can cause blocking
• How to minimize the impact on the operational workload
• Sub-optimal execution of analytics on a relational schema
This is OPERATIONAL ANALYTICS
Minimizing data latency for analytics
SQL Server 2016
Quick recap: columnstore index
Rowstore: data stored as rows. Ideal for OLTP; efficient operation on a small set of rows.
Columnstore: data stored as columns (C1, C2, C3, C4, C5). Ideal for DW workloads.
• Improved compression: data from the same domain compresses better
• Reduced I/O: fetch only the columns needed
• Improved performance: more data fits in memory; optimized for CPU utilization
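The compression and I/O points above can be illustrated with a toy model. This is a conceptual Python sketch only, not SQL Server's storage format: it assumes a hypothetical three-column orders table, pivots it into one list per column, and uses run-length encoding as a stand-in for columnstore compression. Values of one column come from the same domain, so they form runs and shrink, and a count over one column never reads the others.

```python
from itertools import groupby

# Hypothetical sample table: (order_id, region, status)
ROWS = [
    (1, "EU", "SHIPPED"),
    (2, "EU", "SHIPPED"),
    (3, "EU", "OPEN"),
    (4, "US", "SHIPPED"),
    (5, "US", "SHIPPED"),
    (6, "US", "SHIPPED"),
]
COLUMNS = ("order_id", "region", "status")


def to_columns(rows):
    """Pivot row tuples into one list per column (columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def rle_encode(values):
    """Run-length encode a column: consecutive equal values become (value, count)."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def count_where(columns, column, value):
    """COUNT(*) WHERE column = value, reading only that one (encoded) column."""
    return sum(count for v, count in rle_encode(columns[column]) if v == value)


columns = to_columns(ROWS)
print(rle_encode(columns["region"]))              # [('EU', 3), ('US', 3)]
print(count_where(columns, "status", "SHIPPED"))  # 5
```

Real columnstores combine several encodings (dictionary, bit-packing, run-length); run-length encoding is used here only because it makes the "same domain compresses better" effect visible in a few lines.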
Clustered Columnstore Performance: TPC-H
Key points:
• Create an updateable nonclustered columnstore index (NCCI) for analytics queries
• Drop all other indexes that were created for analytics
• No application changes
• The columnstore index is maintained just like any other index
• The query optimizer will choose the columnstore index where needed
[Diagram: relational table (clustered index/heap) with a B-tree index and a nonclustered columnstore index (NCCI) with a delete bitmap and delta rowgroups]
Operational Analytics with columnstore index
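The delete bitmap and delta rowgroups in the diagram can be modeled in a few lines of Python. This is a toy sketch under stated assumptions (unique keys, a tiny hypothetical `rowgroup_size`, "compression" that merely freezes rows into a tuple), not how SQL Server implements the NCCI. It shows why the index stays updateable: compressed rowgroups are never modified in place, so deletes become delete-bitmap entries and an update becomes a delete plus an insert into the delta rowgroup.

```python
import itertools


class ToyColumnstore:
    """Toy updateable columnstore: immutable compressed rowgroups, a delta
    rowgroup for new rows, and a delete bitmap. Assumes unique keys."""

    def __init__(self, rowgroup_size=3):
        self.rowgroup_size = rowgroup_size  # hypothetical; real rowgroups are far larger
        self._row_ids = itertools.count()   # internal id per stored row version
        self.compressed = []                # tuples of (row_id, key, value), never modified
        self.delta = []                     # (row_id, key, value) rows still in row form
        self.delete_bitmap = set()          # row_ids deleted from compressed rowgroups

    def insert(self, key, value):
        self.delta.append((next(self._row_ids), key, value))
        if len(self.delta) >= self.rowgroup_size:
            # Stand-in for compressing a full delta rowgroup.
            self.compressed.append(tuple(self.delta))
            self.delta = []

    def delete(self, key):
        remaining = [row for row in self.delta if row[1] != key]
        if len(remaining) != len(self.delta):
            self.delta = remaining  # delta rows are removed in place
            return
        for rowgroup in self.compressed:
            for row_id, row_key, _ in rowgroup:
                if row_key == key and row_id not in self.delete_bitmap:
                    self.delete_bitmap.add(row_id)  # compressed rows are only flagged
                    return

    def update(self, key, value):
        # An update is a delete followed by an insert.
        self.delete(key)
        self.insert(key, value)

    def scan(self):
        """Live rows: compressed rows not flagged in the bitmap, plus delta rows."""
        live = [(key, value)
                for rowgroup in self.compressed
                for row_id, key, value in rowgroup
                if row_id not in self.delete_bitmap]
        live.extend((key, value) for _, key, value in self.delta)
        return sorted(live)


cs = ToyColumnstore(rowgroup_size=3)
for k in range(5):
    cs.insert(k, k * 10)
cs.update(1, 111)   # old version of key 1 is flagged in the delete bitmap
cs.delete(4)
print(cs.scan())    # [(0, 0), (1, 111), (2, 20), (3, 30)]
```

The internal row ids matter: flagging keys instead of row versions would also hide the new version of an updated row once it is compressed.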
Key points:
• Create the columnstore index only on cold data, using a filter predicate to minimize maintenance
• Analytics queries access both the columnstore and the "hot" data transparently
• Example: an order management application
  CREATE NONCLUSTERED COLUMNSTORE INDEX ….. WHERE order_status = 'SHIPPED'
[Diagram: relational table (clustered index/heap) with a B-tree index and a filtered nonclustered columnstore index (NCCI) with a delete bitmap and delta rowgroups; DML operations target the HOT rows outside the filter]
Minimizing CSI overhead
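The hot/cold split behind the filtered index can be simulated with a small Python class. Everything here is hypothetical (the class, the integer amounts, the write counter); it only mirrors the slide's filter `order_status = 'SHIPPED'`. Changes to hot rows never touch the columnstore copy, only rows entering or leaving the filter do, while the analytics query still sees every order by combining both parts.

```python
class FilteredColumnstoreTable:
    """Toy orders table with a filtered columnstore copy of cold (SHIPPED) rows."""

    COLD_STATUS = "SHIPPED"  # mirrors WHERE order_status = 'SHIPPED'

    def __init__(self):
        self.rowstore = {}            # order_id -> (status, amount): the base table
        self.columnstore = {}         # order_id -> amount, cold rows only
        self.columnstore_writes = 0   # how often DML had to touch the columnstore

    def upsert(self, order_id, status, amount):
        self.rowstore[order_id] = (status, amount)
        if status == self.COLD_STATUS:
            self.columnstore[order_id] = amount
            self.columnstore_writes += 1
        elif order_id in self.columnstore:
            # The row no longer matches the filter, so it leaves the columnstore.
            del self.columnstore[order_id]
            self.columnstore_writes += 1

    def total_amount(self):
        """Analytics query: cold rows from the columnstore, hot rows from the rowstore."""
        cold = sum(self.columnstore.values())
        hot = sum(amount for status, amount in self.rowstore.values()
                  if status != self.COLD_STATUS)
        return cold + hot


orders = FilteredColumnstoreTable()
orders.upsert(1, "NEW", 10)
orders.upsert(1, "PAID", 10)      # hot row changes: columnstore untouched
orders.upsert(2, "SHIPPED", 5)
orders.upsert(1, "SHIPPED", 10)   # row becomes cold and enters the columnstore
orders.upsert(3, "NEW", 7)
print(orders.total_amount(), orders.columnstore_writes)  # 22 2
```

Five DML operations caused only two columnstore writes, which is the maintenance saving the slide is after.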
Operational Analytics with columnstore on In-Memory Tables
• No explicit delta rowgroup: rows not yet in the columnstore (the "tail") stay in the In-Memory OLTP table
• No columnstore index overhead when operating on the tail
• A background task migrates rows from the tail to the columnstore in chunks of 1 million rows that have not changed in the last hour
• Deleted Rows Table (DRT): tracks deleted rows
• Columnstore data is fully resident in memory
• Persisted together with the operational data
• No application changes required
[Diagram: In-Memory OLTP table with hash and range indexes and an updateable clustered columnstore index (CCI) with its DRT; the hot tail acts like a delta rowgroup]
Query processing
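The tail/columnstore split on this slide can be simulated in Python. This is a toy sketch with hypothetical names: `CHUNK_SIZE` is shrunk from the slide's 1 million rows to 3, time is a plain number of seconds, and each stored version gets an internal id so the Deleted Rows Table (DRT) can hide stale columnstore copies without hiding their newer versions. The background task moves only rows unchanged for at least one hour, and only in full chunks; a scan combines the columnstore (minus the DRT) with the tail.

```python
import itertools

ONE_HOUR = 3600   # seconds: rows unchanged this long are "cold"
CHUNK_SIZE = 3    # toy stand-in for the 1 million rows per migrated chunk


class ToyInMemoryCCI:
    """Conceptual model of a memory-optimized table with a clustered columnstore."""

    def __init__(self):
        self._versions = itertools.count()
        self.tail = {}          # key -> (version, value, last_modified): not yet in the columnstore
        self.columnstore = []   # migrated chunks, each a tuple of (version, key, value)
        self.drt = set()        # Deleted Rows Table: versions removed from the columnstore

    def _live_columnstore_version(self, key):
        for chunk in self.columnstore:
            for version, row_key, _ in chunk:
                if row_key == key and version not in self.drt:
                    return version
        return None

    def upsert(self, key, value, now):
        old = self._live_columnstore_version(key)
        if old is not None:
            self.drt.add(old)   # the migrated copy is now stale
        self.tail[key] = (next(self._versions), value, now)

    def delete(self, key):
        if self.tail.pop(key, None) is None:
            old = self._live_columnstore_version(key)
            if old is not None:
                self.drt.add(old)

    def migrate(self, now):
        """Background task: move cold tail rows into the columnstore in full chunks."""
        cold = sorted(k for k, (_, _, ts) in self.tail.items() if now - ts >= ONE_HOUR)
        while len(cold) >= CHUNK_SIZE:
            batch, cold = cold[:CHUNK_SIZE], cold[CHUNK_SIZE:]
            self.columnstore.append(
                tuple((self.tail[k][0], k, self.tail[k][1]) for k in batch))
            for k in batch:
                del self.tail[k]

    def scan(self):
        rows = {k: v for chunk in self.columnstore
                for version, k, v in chunk if version not in self.drt}
        rows.update({k: v for k, (_, v, _) in self.tail.items()})
        return rows


table = ToyInMemoryCCI()
for k in range(5):
    table.upsert(k, k * 10, now=0)
table.upsert(4, 41, now=3000)   # recently changed, so it stays in the tail
table.migrate(now=3600)         # keys 0-2 form one chunk; key 3 waits for a full chunk
table.upsert(1, 11, now=3700)   # stale columnstore copy of key 1 goes to the DRT
table.delete(2)
print(table.scan())             # {0: 0, 3: 30, 4: 41, 1: 11}
```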
Demo time
Performance improvements
Scan type | Elapsed time (s) | Speedup
Row store scan, interop | 44.441 | baseline
Row store scan, native | 28.445 | 1.6x
CSI scan, interop | 0.802 | 55.4x
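The speedup column is the baseline elapsed time (row store scan, interop) divided by each scan's elapsed time. A quick Python check using only the table's numbers reproduces it:

```python
BASELINE_S = 44.441  # row store scan, interop (seconds)


def speedup(elapsed_s, baseline_s=BASELINE_S):
    """How many times faster a scan is than the baseline scan."""
    return baseline_s / elapsed_s


for name, elapsed in [("Row store scan, native", 28.445), ("CSI scan, interop", 0.802)]:
    print(f"{name}: {speedup(elapsed):.1f}x")
```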
Insert, update, delete costs and query time
Operation | Elapsed time with CSI (s) | Elapsed time without CSI (s) | DML time increase | Query time increase
CSI scan, interop | 0.802 | - | - | baseline
Insert 400 000 rows | 53.5 | 47.8 | 11.9% | -
CSI scan, interop | 0.869 | - | - | 8.4%
Update 400 000 rows | 42.4 | 28.9 | 46.7% | -
CSI scan, interop | 1.181 | - | - | 47.3%
Delete 400 000 rows | 38.3 | 30.5 | 25.6% | -
CSI scan, interop | 1.231 | - | - | 53.5%
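Both percentage columns are relative increases, (new / base − 1) × 100: each DML row compares elapsed time with and without the columnstore index, and each scan row compares against the 0.802 s baseline scan. A short Python check using only the table's numbers:

```python
def increase_pct(new, base):
    """Relative increase of `new` over `base`, in percent."""
    return (new / base - 1) * 100


# DML: elapsed time with the columnstore vs. without it (seconds)
dml = {"Insert": (53.5, 47.8), "Update": (42.4, 28.9), "Delete": (38.3, 30.5)}
# Query: CSI scan time after each DML step vs. the 0.802 s baseline scan
scans = {"after insert": 0.869, "after update": 1.181, "after delete": 1.231}

for op, (with_csi, without_csi) in dml.items():
    print(f"{op}: +{increase_pct(with_csi, without_csi):.1f}%")
for step, seconds in scans.items():
    print(f"Scan {step}: +{increase_pct(seconds, 0.802):.1f}%")
```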
Single thread insert and update
Operation | Rows affected | Row store (s) | Secondary CSI (s) | Primary CSI (s)
1000 updates | 10 000 | 0.893 | 1.400 | 6.866
10% insert | 18M | 233.9 | 566 | 291.4
2% update | 3.96M | 123.2 | 314.3 | 275.9
Single thread scan
State | Millions of rows | Row store | Secondary CSI | Primary CSI
Newly built | 180 | 99.1 | 4.7 | 1.71
After 1000 updates | 180 | 99.4 | 5.4 | 1.75
After 10% inserts | 198 | 108.7 | 14.5 | 9.5
After 2% updates | 198 | 109.5 | 16.8 | 10.0
Comparing performance
Operation | Billions of values per second (no SIMD) | Billions of values per second (SIMD) | Speedup
Bit unpacking 6 bits | 2.08 | 11.55 | 5.55x
Bit unpacking 12 bits | 1.91 | 9.76 | 5.11x
Bit unpacking 21 bits | 1.96 | 5.29 | 2.70x
Compaction 32 bits | 1.24 | 6.70 | 5.40x
Range predicate 16 bits | 0.94 | 11.42 | 5.06x
Sum 16-bit values | 2.86 | 14.46 | 5.06x
128-bit bitmap filter | 0.97 | 11.42 | 11.77x
64KB bitmap filter | 1.01 | 2.37 | 2.35x
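The "bit unpacking" rows measure decoding of values stored with a small, fixed number of bits each. A scalar version of that decoding (one shift and one mask per value) is sketched below in Python; SIMD variants decode several values per instruction, which is what the speedup column compares. This is a toy illustration: packing into a single Python integer with the first value in the lowest bits is an assumption, not SQL Server's on-disk format.

```python
def pack(values, bits):
    """Pack non-negative ints into one integer, `bits` bits each, first value in the lowest bits."""
    packed = 0
    for i, value in enumerate(values):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
        packed |= value << (i * bits)
    return packed


def unpack(packed, bits, count):
    """Scalar bit unpacking: one shift and one mask per value."""
    mask = (1 << bits) - 1
    return [(packed >> (i * bits)) & mask for i in range(count)]


values = [5, 63, 0, 42]
packed = pack(values, 6)
print(unpack(packed, 6, len(values)))                             # [5, 63, 0, 42]
print((len(values) * 6 + 7) // 8, "bytes instead of", len(values) * 4)  # 3 bytes instead of 16
```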
Query performance (1)
Predicate or aggregation | Duration SQL2014 (ms) | Duration SQL2016 (ms) | Speedup | Billions of rows per second
Q1-Q4: select count(*) from LINEITEM where <predicate>
L_ORDERKEY = 235236 | 220 | 140 | 1.57x | 12.9
L_QUANTITY = 1900 | 664 | 68 | 9.76x | 26.5
L_SHIPMODE = 'AIR' | 694 | 147 | 4.72x | 12.2
L_SHIPDATE between '01.01.1997' and '01.01.1998' | 512 | 87 | 5.89x | 20.7
Query performance (2)
Predicate or aggregation | Duration SQL2014 (ms) | Duration SQL2016 (ms) | Speedup | Billions of rows per second
Q5-Q6: select count(*) from PARTSUPP where <predicate>
PS_AVAILQTY < 10 | 50 | 27 | 1.85x | 8.9
PS_AVAILQTY = 10 | 45 | 15 | 3.00x | 16
Q7-Q8: select <aggregates> from LINEITEM
avg(L_DISCOUNT) | 1272 | 196 | 6.49x | 9.1
avg(L_DISCOUNT), min(L_ORDERKEY), max(L_ORDERKEY) | 1978 | 356 | 5.56x | 5.1
Availability Groups as data warehouse
Key points:
• Mission-critical operational workloads are typically configured for high availability using AlwaysOn Availability Groups
• You can offload analytics to a readable secondary replica
[Diagram: presentation layer (IIS Server) → application tier → SQL Server database on the primary replica of an AlwaysOn Availability Group with three secondary replicas; BI and analytics (SQL Server Analysis Services) run against a readable secondary replica with analytics-specific indexes added; all on high-end server hardware]
Minimizing data latency for analytics
SSAS Enterprise Readiness: Tabular
New DirectQuery
DirectQuery for Oracle, Teradata, APS
DirectQuery support for MDX queries (Excel tools)