marko.hotti@microsoft… · Gartner disclaims all warranties, expressed or implied, with respect...
TRANSCRIPT
* Disclaimer: Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors
with the highest ratings. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner
disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
GARTNER MAGIC QUADRANT DW & BI
Business Intelligence and Analytics Platforms
Data Warehouse Database Management Systems
The Traditional Data Warehouse
Breaking Points of The Traditional Data Warehouse
Introducing The Modern Data Warehouse
Data Sources
Business Intelligence
Microsoft Hadoop Vision
Insights to all users by activating new types of data
Limitations: Performance and Scale today
Existing Tables (Partitions): Rowstore
• Diminishing performance
• Diminishing scale as requirements grow
• Non-optimal performance for many DW queries
• Scale UP
SQL Server 2012 Parallel Data Warehouse (PDW)
Insights on any data of any size
Next-generation Performance At Scale
Built For Big Data
Manageable Costs
• Appliance Simplicity: HW + SW
• Query Performance
• Scale Out MPP versus Scale Up SMP
• “Big Data” Integration
• Updateable xVelocity Columnstore
What is Parallel Data Warehouse?
• Shared-nothing parallel database system
» Massively parallel processing (MPP)
» A “Control” server that accepts user queries, generates a plan, and distributes operations in parallel to compute nodes
» Multiple “Compute” servers running SQL Server
» A “Management” server for administering the system
» A “Data Movement Service” that facilitates parallel SQL operations
• Delivered as an appliance
» Balanced and pre-configured software and industry-standard hardware from Dell or HP
» Single Call Support
» Fastest Time to Market
» Scales from 2 to 56 Nodes
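The control/compute split described above can be sketched in a few lines of Python. This is only an illustration of the scatter/gather idea, not PDW code: the in-memory `COMPUTE_NODES` list, the function names, and the sample rows are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for compute nodes: each holds only its own slice of
# rows (shared-nothing), here as (store_id, dollars_sold) tuples.
COMPUTE_NODES = [
    [(1, 10.0), (2, 5.0)],
    [(1, 7.5), (3, 2.5)],
    [(2, 4.0), (3, 1.0)],
]

def local_sum_by_store(rows):
    """Runs on one compute node: aggregate only the rows stored locally."""
    totals = {}
    for store_id, dollars in rows:
        totals[store_id] = totals.get(store_id, 0.0) + dollars
    return totals

def control_node_query(nodes):
    """Plays the control node: fan the operation out, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(local_sum_by_store, nodes))
    merged = {}
    for partial in partials:
        for store_id, subtotal in partial.items():
            merged[store_id] = merged.get(store_id, 0.0) + subtotal
    return merged

result = control_node_query(COMPUTE_NODES)
print(result)
```

Each node only ever touches its own rows; the control step sees partial aggregates, never the raw data, which is what lets the work scale with the number of nodes.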
HP Example
Key Design Elements
• Modular Design
• High Density
• Leverage latest Microsoft software features
» Windows Server 2012 Storage Spaces
» Windows Server 2012 Hyper-V
» SQL Server 2012 xVelocity ColumnStore
Ultra Shared Nothing architecture: Distribution
Larger Fact Table is Hash Distributed Across All Compute Nodes
Time Dim (TD): Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
Store Dim (SD): Store Dim ID, Store Name, Store Mgr, Store Size
Product Dim (PD): Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
Mktg Campaign Dim (MD): Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
Sales Facts (SF): Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold
[Diagram: each compute node holds TD, SD, PD and MD plus one range of SF distributions: 01-08, 09-16, 17-24, 25-32, 33-n]
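A rough sketch of what hash distribution means for the Sales Facts table: each fact row is assigned to a node by hashing its distribution key, while a small dimension table is available on every node so joins can run locally. The node count, the use of `zlib.crc32` as the hash, and the sample data are assumptions for illustration; the hash function PDW actually uses is not given in the slides.

```python
import zlib

NUM_NODES = 4  # assumption for illustration only

def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution-key value to a node with a deterministic hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

def distribute(fact_rows, key_index, num_nodes=NUM_NODES):
    """Place every fact row on exactly one node, chosen by its key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in fact_rows:
        nodes[node_for(row[key_index], num_nodes)].append(row)
    return nodes

def local_join(node_rows, store_dim):
    """Join one node's fact rows against its local copy of the Store Dim."""
    return [(date_id, store_dim[store_id], qty)
            for date_id, store_id, qty in node_rows]

# Hypothetical data: (Date Dim ID, Store Dim ID, Qty Sold)
store_dim = {1: "Downtown", 2: "Airport"}
sales_facts = [(20130101 + i, 1 + i % 2, i + 1) for i in range(20)]

nodes = distribute(sales_facts, key_index=0)
joined = [row for node in nodes for row in local_join(node, store_dim)]

for i, node in enumerate(nodes):
    print(f"node {i}: {len(node)} fact rows")
```

Because the dimension copy is local, the join never has to move fact rows between nodes; only the choice of distribution key decides how evenly the rows spread.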
In-Memory Columnstore in PDW V2 & SQL Server 2014
• xVelocity in-memory columnstore in PDW: columnstore index as primary data store in a scale-out MPP data warehouse (PDW V2 appliance)
» Updateable clustered columnstore index (CCI)
» Support for bulk load and insert/update/delete
» Extended data types: decimal/numeric for all precision and scale
» Query processing enhancements for more batch mode processing (for example, outer/semi/antisemi joins, UNION ALL, scalar aggregation)
Customer benefits
• Outstanding query performance from in-memory columnstore index
» 600 GB per hour for a single 12-core server
• Significant hardware cost savings due to high compression
» 4–15x compression ratio
• Improved productivity through updateable index
» Ships in PDW V2 appliance and SQL Server 2014
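To give a feel for why storing data by column compresses well, here is a small sketch using two generic columnar techniques, dictionary encoding followed by run-length encoding. It is not a description of xVelocity internals, which the slides do not detail; the function names and sample rows are assumptions for the example.

```python
def to_columns(rows, names):
    """Pivot row tuples into one list per column (the 'columnstore' layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def dictionary_encode(values):
    """Replace each value by a small integer code into a sorted dictionary."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]

def run_length_encode(codes):
    """Collapse consecutive repeats into [code, count] pairs."""
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return runs

def decode(dictionary, runs):
    """Invert both encodings to recover the original column."""
    return [dictionary[code] for code, count in runs for _ in range(count)]

# Hypothetical rows: (Prod Category, Qty Sold)
rows = [("Grocery", 3), ("Grocery", 1), ("Grocery", 2),
        ("Toys", 5), ("Toys", 1), ("Garden", 4)]
columns = to_columns(rows, ["prod_category", "qty_sold"])

dictionary, codes = dictionary_encode(columns["prod_category"])
runs = run_length_encode(codes)
print(dictionary, runs)
```

Columns with few distinct values and long repeats, which are typical in fact tables, shrink the most under this kind of scheme; a column of unique values would gain little.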
Introducing PolyBase
Fundamental breakthrough in data processing
Single Query; Structured and Unstructured
• Query and join Hadoop tables with relational tables
• Use standard SQL language
• SELECT, FROM, WHERE
Existing SQL Skillset
No IT Intervention
Save Time and Costs
Analyze All Data Types
[Diagram: SQL queries go to SQL Server 2012 PDW, powered by PolyBase, which reaches both the database and HDFS (Hadoop)]
External Tables
» An external table is PDW’s representation of data residing in HDFS
» The “table” (metadata) lives in the context of a SQL Server database
» The actual table data resides in HDFS
CREATE EXTERNAL TABLE table_name ({<column_definition>} [,...n ])
{WITH (LOCATION = '<URI>', [FORMAT_OPTIONS = (<VALUES>)])}
[;]
» LOCATION: required to indicate the location of the Hadoop cluster
» FORMAT_OPTIONS: optional format options associated with parsing of data from HDFS (e.g. field delimiters and reject-related thresholds)
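The role of the format options can be illustrated with a small parser: a field delimiter decides how a raw line is split into columns, and a reject threshold decides how many unparseable rows are tolerated before the read fails. This is a sketch under assumptions, not PolyBase behavior: `parse_delimited`, `reject_limit`, and the sample lines are hypothetical names and data.

```python
from datetime import date

def parse_delimited(lines, column_types, field_terminator="|", reject_limit=0):
    """Split each line on the terminator and cast fields to the column types.

    Rows that do not fit the declared columns are counted as rejects; the
    read fails once the count exceeds reject_limit (a hypothetical option).
    """
    rows, rejected = [], 0
    for line in lines:
        fields = line.rstrip("\n").split(field_terminator)
        try:
            if len(fields) != len(column_types):
                raise ValueError("wrong field count")
            rows.append(tuple(cast(f) for cast, f in zip(column_types, fields)))
        except ValueError:
            rejected += 1
            if rejected > reject_limit:
                raise RuntimeError(
                    f"{rejected} rejected rows exceed limit {reject_limit}")
    return rows, rejected

# Hypothetical file content for columns (url, event_date, user_IP)
sample_lines = [
    "a.com|2013-01-01|10.0.0.1",
    "b.com|not-a-date|10.0.0.2",
    "c.com|2013-01-03|10.0.0.3",
]
column_types = (str, date.fromisoformat, str)

rows, rejected = parse_delimited(sample_lines, column_types, reject_limit=1)
print(rows, rejected)
```

The column definitions live with the table metadata, while the data stays in the file; parsing and rejection only happen when the file is actually read.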
Native Query Across Hadoop and PDW
Parallel Data Import from HDFS into PDW
• Persistently storing data from HDFS in PDW tables
» Fully parallelized via CREATE TABLE AS SELECT (CTAS) with external tables as source table and PDW tables (either distributed or replicated) as destination
CREATE TABLE ClickStream_PDW WITH (DISTRIBUTION = HASH(url))
AS SELECT url, event_date, user_IP FROM ClickStream
• Retrieval of data in HDFS “on-the-fly”
[Diagram: parallel import. The enhanced PDW query engine produces the CTAS results from the external table; DMS Readers 1…N perform parallel HDFS reads through the HDFS bridge and import the data in parallel. Unstructured data (sensor & RFID, web apps, social apps, mobile apps) sits in Hadoop on the HDFS data nodes; structured data from traditional DW applications sits in PDW.]
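The import path can be approximated as follows: several readers each parse one slice of the source file in parallel, and every parsed row is then placed by hashing `url`, mirroring the HASH(url) distribution in the CTAS statement. The splits, the reader pool, the distribution count, and `zlib.crc32` as the hash are assumptions for the sketch; the real DMS readers and HDFS bridge are not modeled.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_DISTRIBUTIONS = 4  # assumption for illustration only

# Hypothetical file splits, one per reader, using '|' as field terminator.
HDFS_SPLITS = [
    ["a.com|2013-01-01|10.0.0.1", "b.com|2013-01-01|10.0.0.2"],
    ["a.com|2013-01-02|10.0.0.3"],
    ["c.com|2013-01-02|10.0.0.4", "b.com|2013-01-03|10.0.0.5"],
]

def read_split(lines):
    """One reader: parse its own split into (url, event_date, user_IP) rows."""
    return [tuple(line.split("|")) for line in lines]

def import_clickstream(splits, num_distributions=NUM_DISTRIBUTIONS):
    """Read all splits in parallel, then hash-distribute the rows on url."""
    with ThreadPoolExecutor(max_workers=len(splits)) as pool:
        parsed = list(pool.map(read_split, splits))
    table = [[] for _ in range(num_distributions)]
    for rows in parsed:
        for row in rows:
            target = zlib.crc32(row[0].encode("utf-8")) % num_distributions
            table[target].append(row)
    return table

clickstream_pdw = import_clickstream(HDFS_SPLITS)
for i, distribution in enumerate(clickstream_pdw):
    print(i, distribution)
```

Rows for the same url always end up in the same distribution regardless of which reader parsed them, so later aggregations by url need no further data movement.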
Native Query Across Hadoop and PDW
Parallel Data Export from PDW into HDFS
• Fully parallelized via CREATE EXTERNAL TABLE AS SELECT (CETAS) with external tables as destination table and PDW tables as source
• “Round-trip of data” possible by first importing data from HDFS, joining it with relational data, and then exporting results back to HDFS
CREATE EXTERNAL TABLE ClickStream (url, event_date, user_IP)
WITH (LOCATION = 'hdfs://MyHadoop:5000/users/outputDir', FORMAT_OPTIONS
(FIELD_TERMINATOR = '|')) AS SELECT url, event_date, user_IP FROM ClickStream_PDW
[Diagram: parallel export. The enhanced PDW query engine produces the CETAS results for the external table; DMS Writers 1…N read in parallel from PDW and perform parallel HDFS writes through the HDFS bridge. Structured data from traditional DW applications sits in PDW.]
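The export direction can be sketched the same way: one writer per distribution emits its rows as '|'-terminated lines into a shared output directory. A local temporary directory stands in for the HDFS output folder, and the `part-NNNNN.txt` file naming, function names, and sample rows are assumptions for the example.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def write_part(job):
    """One writer: dump a single distribution as '|'-delimited text."""
    out_dir, index, rows = job
    path = Path(out_dir) / f"part-{index:05d}.txt"  # hypothetical naming
    with path.open("w", encoding="utf-8") as handle:
        for row in rows:
            handle.write("|".join(row) + "\n")
    return path

def export_table(distributions, out_dir):
    """Write every distribution in parallel; returns the files in order."""
    jobs = [(out_dir, i, rows) for i, rows in enumerate(distributions)]
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        return list(pool.map(write_part, jobs))

# Hypothetical contents of ClickStream_PDW, already split by distribution.
clickstream_pdw_rows = [
    [("a.com", "2013-01-01", "10.0.0.1")],
    [("b.com", "2013-01-02", "10.0.0.2"), ("c.com", "2013-01-03", "10.0.0.3")],
]

with tempfile.TemporaryDirectory() as out_dir:
    paths = export_table(clickstream_pdw_rows, out_dir)
    exported = [p.read_text(encoding="utf-8").splitlines() for p in paths]

print(exported)
```

Writing one file per distribution keeps the writers independent of each other, which is the property that makes the export parallel.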
PDW V2.0 Management Dashboard
Microsoft Business Intelligence Platform