Post on 04-Jan-2016

Page 1: Riaan Vermeulen Data Solutions Architect Eclipse Networks riaanv@eclipse.co.za
Page 2:

Taking Your Database beyond Relations with Microsoft SQL Server 2008
With Special Focus on FILESTREAM

Riaan Vermeulen
Data Solutions Architect
Eclipse Networks
riaanv@eclipse.co.za

Page 3:

Information-Centric Applications

[Diagram: applications drawing on four kinds of data]
Relational Data (structured)
XML (semi-structured)
Documents and Multimedia (unstructured)
Spatial (structured)

Page 4:

Key Application Challenges

[Diagram: horizontal axis "Growth of the types of data to be hosted"; vertical axis "Increasing complexity"]

Data Model Tier
  Search and Indexing
  Caching and Synch
  Object Mapping
  Rich Query Capability

Application Tier
  Business Intelligence (BI) Analysis
  Application Integration
  Compliance
  Reporting

Storage Tier
  Large Data Sets
  Transactions and Security
  Reliability and Scale
  Referential Integrity

Data types: Relational Data, Documents and Multimedia, Spatial, XML

Page 5:

Strategic Challenges and Goals

Pain Points
  Dealing with relational and non-relational data platforms
  Growth in application complexity and duplicated functionality
  Compensating for unavailable services

Goals
  Reduce the cost of managing all types of data
  Simplify the development of applications that use relational and non-relational data
  Extend services currently available for relational data to non-relational data
  Provide non-relational services to relational data

Page 6:

Four Pillars of SQL Server
Your Data, Any Place, Any Time

Pillars: Dynamic Development, Beyond Relational, Pervasive Insight, Enterprise Data Platform

[Diagram labels]
Platforms: Cloud, Server, Mobile and Desktop
Data: RDBMS (1), OLAP (2), XML, FILE
Services: Query, Analysis, Reporting, Integration, Synch, Search

1: Relational database management system
2: Online Analytical Processing

Page 7:

Beyond Relational Feature Overview

Relational BR Support
  SQL Server 2005: User Defined Types
  SQL Server 2008: Large UDTs, Sparse Columns, Wide Tables/Column Set, Filtered Indices, HierarchyID

Documents & Multimedia
  SQL Server 2005: Full Text Indexing
  SQL Server 2008: Remote BLOB Store API, FILESTREAM, Integrated Full-Text Search (FTS)

Spatial
  SQL Server 2008: Fully supported Geometry and Geography data types and functions

XML
  SQL Server 2005: XML Data Type and Functions
  SQL Server 2008: XML Upgrades

Page 8:

Key Application Challenges

[Diagram: horizontal axis "Growth of the types of data to be hosted"; vertical axis "Increasing complexity"]

Data Model Tier
  Search and Indexing
  Caching and Synch
  Object Mapping
  Rich Query Capability

Application Tier
  Business Intelligence (BI) Analysis
  Application Integration
  Compliance
  Reporting

Storage Tier
  Large Data Sets
  Transactions and Security
  Reliability and Scale
  Referential Integrity

Data types: Relational Data, Documents and Multimedia, XML, Spatial

Page 9:

BLOB storage options

Use File Servers
  Advantages: Low cost per GB; Streaming performance
  Challenges: Complex application development and deployment; Integration with structured data
  Example: Windows File Servers, NetApp NetFiler

Dedicated BLOB Store
  Advantages: Lower cost/GB at scale; Scalability & expandability
  Challenges: Complex application development and deployment; Separate management; Enterprise scale only
  Example: EMC Centera, Fujitsu Nearline

Store BLOBs in Database
  Advantages: Integrated management; Data-level consistency
  Challenges: Poor data streaming support; File size limitations; Highest cost per GB
  Example: SQL Server VARBINARY(MAX)

[Diagram: each option shows the Application, the DB and the BLOBs]

Documents & Multimedia

Page 10:

SQL Server 2008 BLOBs

[Diagram: four storage options, each showing the Application, the DB and the BLOB]
Use File Servers
Dedicated BLOB Store
Store BLOBs in Database
Store BLOBs in DB + File System

SQL Server 2008 options shown: Remote BLOB Storage, SQL BLOB, FILESTREAM Storage

Documents & Multimedia

Page 11:

FILESTREAM

Storage attribute on VARBINARY(MAX)
Works with integrated FTS
Unstructured data stored directly in the file system (requires NTFS)
Dual programming model
  T-SQL (same as SQL BLOB)
  Win32 streaming APIs with T-SQL transactional semantics
Data consistency
Integrated manageability
  Back up/restore
  Administration
Size limit is the file system volume size
SQL Server security stack

[Diagram: Store BLOBs in DB + File System, showing the Application, the DB and the BLOB]

Documents & Multimedia

Page 12:

FILESTREAM : Basics

Enabled per instance
  When enabling for the first time, a computer or instance restart may be required to hook in the file-system filter driver
FILESTREAM data is stored in FILESTREAM filegroups
  Stored as NTFS directories, called 'data containers'
  Can be on compressed volumes
Create a FILESTREAM filegroup at database create time or add one later

Documents and Multimedia
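The two steps on this slide can be sketched in T-SQL as follows. This is a minimal sketch, not the demo from the talk: the database names, logical file names and paths are hypothetical, and it assumes FILESTREAM has already been enabled for the Windows service (for example in SQL Server Configuration Manager).

```sql
-- Allow T-SQL and Win32 streaming access for this instance
-- (0 = disabled, 1 = T-SQL only, 2 = T-SQL and Win32).
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
GO

-- Hypothetical database. C:\SQLData must already exist, but the data
-- container directory (DocArchiveBlobs) must not exist yet.
CREATE DATABASE DocArchive
ON PRIMARY
    (NAME = DocArchive_data, FILENAME = 'C:\SQLData\DocArchive.mdf'),
FILEGROUP DocArchive_fs CONTAINS FILESTREAM
    (NAME = DocArchive_blobs, FILENAME = 'C:\SQLData\DocArchiveBlobs')
LOG ON
    (NAME = DocArchive_log, FILENAME = 'C:\SQLData\DocArchive.ldf');
GO

-- Alternative: add a FILESTREAM filegroup to an existing (hypothetical) database.
ALTER DATABASE ExistingDb ADD FILEGROUP ExistingDb_fs CONTAINS FILESTREAM;
ALTER DATABASE ExistingDb
    ADD FILE (NAME = ExistingDb_blobs, FILENAME = 'C:\SQLData\ExistingDbBlobs')
    TO FILEGROUP ExistingDb_fs;
GO
```

The FILENAME given for a FILESTREAM filegroup is a directory (the data container), not a file.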

Page 13:

FILESTREAM : Basics

FILESTREAM columns are defined as varbinary(max) columns that have the FILESTREAM attribute
The table must also have a uniqueidentifier ROWGUIDCOL column
If full-text indexing is required, the table also needs a column to store the document file type extension

Documents and Multimedia
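A table that meets these requirements might look like the sketch below. The table and column names are hypothetical. It assumes the current database already has a FILESTREAM filegroup, which is used by default because no FILESTREAM_ON clause is given. The ROWGUIDCOL column must be non-null and unique.

```sql
CREATE TABLE dbo.Documents
(
    -- Required by FILESTREAM: a unique, non-null ROWGUIDCOL column.
    DocumentId    UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE
                  DEFAULT NEWSEQUENTIALID(),
    FileName      NVARCHAR(260)  NOT NULL,
    -- Only needed for full-text indexing: the file type extension, e.g. '.pdf'.
    FileExtension NVARCHAR(10)   NOT NULL,
    -- The BLOB itself, stored in the FILESTREAM data container.
    Content       VARBINARY(MAX) FILESTREAM NULL
);
```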

Page 14:

FILESTREAM : Programming

T-SQL
  Cannot be used to perform partial updates to FILESTREAM data
  Underlying BLOB data in the file system is deleted asynchronously by the garbage collector
Win32 API
  Win32 streaming APIs with T-SQL transactional semantics
  A token must be obtained before FILESTREAM files can be accessed
  GET_FILESTREAM_TRANSACTION_CONTEXT() provides the token that binds the FILESTREAM file system streaming operations to a started transaction
  The SqlFileStream API can be used to call Win32 streaming interfaces, such as ReadFile() and WriteFile()

Documents and Multimedia
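The server side of both programming models can be sketched as follows. The example assumes a hypothetical table dbo.Documents with columns DocumentId (uniqueidentifier ROWGUIDCOL), FileName, FileExtension and Content (varbinary(max) FILESTREAM). The client-side streaming code is not shown.

```sql
DECLARE @id UNIQUEIDENTIFIER = NEWID();

-- T-SQL access: the whole value is written or replaced, never patched in place.
INSERT INTO dbo.Documents (DocumentId, FileName, FileExtension, Content)
VALUES (@id, N'hello.txt', N'.txt', CAST('Hello, FILESTREAM' AS VARBINARY(MAX)));

-- Win32 access: inside a transaction, fetch the logical path of the BLOB
-- and the token that ties file system I/O to this transaction.
BEGIN TRANSACTION;

SELECT Content.PathName()                   AS BlobPath,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS TxContext
FROM dbo.Documents
WHERE DocumentId = @id;

-- A client would now open BlobPath with TxContext (for example through
-- SqlFileStream in .NET), stream the data, and close the handle
-- before the transaction is committed.

COMMIT TRANSACTION;
```

GET_FILESTREAM_TRANSACTION_CONTEXT() returns NULL when it is called outside a transaction, so the BEGIN TRANSACTION is not optional.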

Page 15:

FILESTREAM demo

Page 16:

FILESTREAM : Security

FILESTREAM directories have restricted access while the database is open
  Users have permissions to open files based on SQL Server table/column security settings
  Privileged users *can* tamper with the files though…
When the database is closed, FILESTREAM directory access is gated by Windows security
  As the demo showed, the FILESTREAM directories and files are not file-locked like open databases and can be altered or deleted when not actively in use

Documents and Multimedia

Page 17:

FILESTREAM : Data Consistency

When a FILESTREAM value is updated (by creating a new file), the old file is not deleted immediately
  The link on the data page is updated to point to the new file
  After the transaction has committed, the OLD file is made available for garbage collection
  If the transaction rolls back, the change to the link in the data page is rolled back and the NEW file is made available for garbage collection
Garbage collection is driven by tombstone tables (stored as internal tables in the default filegroup)
Isolation semantics
  Win32: only the Read Committed isolation level is supported
  T-SQL: full support, up to Serializable

Documents and Multimedia
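The rollback behaviour described above can be observed from T-SQL with a sketch like the one below. It assumes a hypothetical table dbo.Documents with columns DocumentId, FileName, FileExtension and a FILESTREAM column Content.

```sql
DECLARE @id UNIQUEIDENTIFIER = NEWID();

INSERT INTO dbo.Documents (DocumentId, FileName, FileExtension, Content)
VALUES (@id, N'note.txt', N'.txt', CAST('version 1' AS VARBINARY(MAX)));

BEGIN TRANSACTION;
    -- Creates a NEW file in the data container and repoints the row to it.
    UPDATE dbo.Documents
    SET Content = CAST('version 2' AS VARBINARY(MAX))
    WHERE DocumentId = @id;
ROLLBACK TRANSACTION;

-- The row points at the original file again, so this still reads 'version 1'.
-- The file written for 'version 2' is left for the garbage collector.
SELECT CAST(Content AS VARCHAR(MAX)) AS CurrentContent
FROM dbo.Documents
WHERE DocumentId = @id;
```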

Page 18:

FILESTREAM : Interaction with other Features

Failover clustering
  FILESTREAM filegroups must be on shared storage
Log shipping
  Both databases must have FILESTREAM enabled
Backup/Restore
  FILESTREAM data is backed up as part of regular backup types
  FILESTREAM filegroups can be excluded from database backups
Full-Text Indexing
  The table must have a column storing the FILESTREAM BLOB file type extension
SQL Express is supported
  The 4 GB database size limit does not include FILESTREAM data

Documents and Multimedia
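For the full-text indexing point, a minimal sketch is shown below. It assumes a hypothetical table dbo.Documents whose DocumentId column is a non-null uniqueidentifier, with the FILESTREAM data in Content and the file type extension in FileExtension. The catalog and index names are also hypothetical.

```sql
CREATE FULLTEXT CATALOG DocumentCatalog;
GO

-- A full-text index needs a unique, single-column, non-nullable key index.
CREATE UNIQUE INDEX UX_Documents_DocumentId
    ON dbo.Documents (DocumentId);
GO

-- TYPE COLUMN names the column holding the extension (for example '.docx'),
-- which tells the full-text engine which filter to use for each BLOB.
CREATE FULLTEXT INDEX ON dbo.Documents
    (Content TYPE COLUMN FileExtension)
    KEY INDEX UX_Documents_DocumentId
    ON DocumentCatalog;
GO
```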

Page 19:

FILESTREAM : Replication

FILESTREAM data can be replicated

When the replication topology involves instances using different versions of SQL Server, there are limitations on the size of data that can be sent to down-level instances.

When transactional replication is used, the replication filter options determine whether the FILESTREAM attribute is replicated.

When merge replication is used, both it and FILESTREAM require a uniqueidentifier column. Care must be taken with the table schema when using merge replication so that the GUIDs are sequential (i.e., use NEWSEQUENTIALID() rather than NEWID()).

Documents and Multimedia

Page 20:

FILESTREAM : Unsupported Features

Database mirroring
  DBM cannot be configured if the database contains FILESTREAM data
  This may be a MAJOR barrier to adoption in V1
Database snapshots
  Snapshots can be created on non-FILESTREAM filegroups only
SQL Server encryption
  FILESTREAM data cannot be encrypted using either column-level encryption or Transparent Data Encryption

Documents and Multimedia

Page 21:

Write Performance (Remote)

[Chart "Insert": throughput (Mbps) by BLOB size (240 KB, 480 KB, 1 MB, 2 MB, 4 MB, 8 MB) for FILESTREAM Win32 (file system) access, FILESTREAM T-SQL and varbinary, plus the file system Win32 access gain (%)]

Documents and Multimedia

Page 22:

Read Performance (Remote)

[Chart: throughput (Mbps) by BLOB size (240 KB, 480 KB, 1 MB, 2 MB, 4 MB, 8 MB) for FILESTREAM Win32 (file system) access, FILESTREAM T-SQL and varbinary, plus the file system Win32 access gain (%)]

Documents and Multimedia

Page 23:

FILESTREAM : Limitations

Only works with NTFS
FILESTREAM files cannot be properly deleted or renamed through Win32
  Operations have to be done transactionally, otherwise they cause corruption
  "SELECT GET_FILESTREAM_TRANSACTION_CONTEXT()"
FILESTREAM data cannot be stored remotely
FILESTREAM columns cannot be INCLUDED columns
In-place partial updates are not supported
  A new file is created and the old file only becomes available for deletion after commit

Documents and Multimedia

Page 24:

Migrating SQL BLOBs to FILESTREAM

Conceptual walkthrough
  Create one (or more) FILESTREAM filegroups
  Alter the table to add a FILESTREAM column
  For each existing row, update the new column with an empty FILESTREAM value
  Write the LOB data (file I/O or T-SQL access)
  Optional: drop the varbinary(max) column

BLOB migration utilities support (intended for SQL Server 2008 R2)
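The walkthrough above can be sketched in T-SQL as follows, using the T-SQL access path to copy the data in one statement rather than writing an empty value first and streaming through file I/O. All names (LegacyDb, dbo.Scans, Image, ImageFs, the path) are hypothetical. The sketch assumes the table has no ROWGUIDCOL column yet and that the new filegroup is the only FILESTREAM filegroup, so it is used by default.

```sql
-- 1. Create a FILESTREAM filegroup and its data container.
ALTER DATABASE LegacyDb ADD FILEGROUP LegacyDb_fs CONTAINS FILESTREAM;
ALTER DATABASE LegacyDb
    ADD FILE (NAME = LegacyDb_blobs, FILENAME = 'D:\SQLData\LegacyDbBlobs')
    TO FILEGROUP LegacyDb_fs;
GO

-- 2. FILESTREAM needs a unique, non-null ROWGUIDCOL column on the table.
ALTER TABLE dbo.Scans
    ADD RowGuid UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL
        CONSTRAINT DF_Scans_RowGuid DEFAULT NEWID()
        CONSTRAINT UQ_Scans_RowGuid UNIQUE;
GO

-- 3. Add the FILESTREAM column next to the existing varbinary(max) column.
ALTER TABLE dbo.Scans
    ADD ImageFs VARBINARY(MAX) FILESTREAM NULL;
GO

-- 4. Copy the LOB data across. On a large table this would normally be
--    done in batches to keep transactions small.
UPDATE dbo.Scans
SET ImageFs = Image;
GO

-- 5. Optional: drop the original column once the copy has been verified.
ALTER TABLE dbo.Scans DROP COLUMN Image;
GO
```

The GO separators matter here: a batch that references a column added earlier in the same batch fails to compile.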

Page 25:

Case Study: What are we doing?

Department of Education, Mpumalanga
Script and Marksheet Management System
Track, scan and store 500k+ marksheets every year
650 GB to 1.3 TB of files per annum
FILESTREAM Win32 API write and read access (C#, VS 2008)
Partitioned over 6 LUNs
Use iFTS
SharePoint Search Portal

Page 26:

Remote Blob Store Architecture

Customer applications can transparently support different BLOB stores
Each Remote Blob Store vendor is responsible for delivering its own providers

[Diagram layers, top to bottom]
Customer Application
SQL RBS API (alongside the SQL DB)
Provider API
Provider libraries: NetApp lib, IBM lib, Centera lib
BLOB stores: NetApp, IBM, Centera

RBS Services
•Create
•Fetch
•GC
•Delete

Documents & Multimedia

Page 27:

RBS Workflow

[Diagram: Application, RBS Client Library, BLOB Store Provider Library, BLOB Store and SQL Server, separated by a machine boundary]

1. Write BLOB (Photo)
2. Return Blob ID
3. Write Blob ID to PhotoRef field

Example row in SQL Server:
ClaimID  ClaimDate  PhotoRef
4390     6/5/2007   <Binary(20)>

Page 28:

RBS Fundamentals

Most useful in environments where interoperability is required
No restrictions on back-end store
Back-end can change with no app change
Looser (link-level) consistency guarantees
SQL Server handles link consistency and garbage collection

Page 29:

Unstructured Storage in SQL08

[Table comparing four options: File Stores / External Blob Stores (CAS), SQL BLOBs, Remote Blob API, FILESTREAM]
Rows: Streaming Performance, Link Level Consistency, Data Level Consistency, Integrated Management, Non-local Windows File Servers, External Blob Stores
Surviving cell text: "Depends on external store" (twice) and "n/a" (twice)

Documents & Multimedia

Page 30:

FILESTREAM Comparison

Comparison point | File server / file system | SQL Server (using varbinary(max)) | FILESTREAM
Maximum BLOB size | NTFS volume size | 2 GB - 1 byte | NTFS volume size
Streaming performance of large BLOBs | Excellent | Poor | Excellent
Security | Manual ACLs | Integrated | Integrated + automatic ACLs
Cost per GB | Low | High | Low
Manageability | Difficult | Integrated | Integrated
Integration with structured data | Difficult | Data-level consistency | Data-level consistency
Application development and deployment | More complex | Simpler | Simpler
Recovery from data fragmentation | Excellent | Poor | Excellent
Performance of frequent small updates | Excellent | Moderate | Poor

Page 31:

Full Text Search Challenges

Indexes stored outside SQL Server lead to manageability challenges
Mixed query performance suffers from having to pull over the complete full-text result set
Scaling issues on big boxes

Documents & Multimedia

Page 32:

Full Text Indexing Upgrades

Full-text indexes fully integrated into SQL Server
Make mixed queries perform and scale

SELECT * FROM candidates
WHERE CONTAINS(resume, '"SQL Server"')
AND ZipCode = '98052'

Page 33:

Spatial Data Overview

Storage and retrieval of spatial data using standard SQL syntax
  New spatial data types (geometry, geography)
  New spatial methods (intersects, buffer, etc.)
  New spatial indexes
Offers full set of Open Geospatial Consortium components (OGC/SQL MM, ISO 19125)
Spatial Builder Interface
SSMS Visualization
Integration with Virtual Earth

Spatial

See also: DAT03-HOL Integrating Microsoft SQL Server 2008 Spatial Support with Microsoft Virtual Earth
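As an illustration of the new types and methods, the sketch below builds two geography points and applies a distance, a buffer and an intersection test. The coordinates are arbitrary illustrative values and the variable names are hypothetical. Note that geography::Point takes latitude first, while well-known text lists longitude first.

```sql
-- SRID 4326 is WGS 84; distances on this SRID are returned in metres.
DECLARE @office   geography = geography::Point(47.6397, -122.1289, 4326);
DECLARE @customer geography =
    geography::STGeomFromText('POINT(-122.3493 47.6205)', 4326);

-- Distance between the two points.
SELECT @office.STDistance(@customer) AS DistanceInMetres;

-- Is the customer within 20 km of the office? Returns 1 or 0.
SELECT @office.STBuffer(20000).STIntersects(@customer) AS WithinRange;
```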

Page 34:

XML Improvements

Improved XML Schema validation
  Support for storing and validating Office 12 document formats
  Support for lax validation
  Full xs:dateTime support
    Support for values without timezone
    Timezone preservation
  Improved support for lists and union types
Added support for let-clause in XQuery
Added fn:upper-case()/fn:lower-case()
Added support for insert sql:variable("@xml") into /a/b

XML
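Three of the XQuery additions listed above (inserting an XML variable with sql:variable, the let clause, and upper-case) can be tried together in a sketch like this one. The documents and variable names are made up for illustration. The insert target is written as (/a/b)[1] because modify() needs a single target node.

```sql
DECLARE @doc XML = N'<a><b/></a>';
DECLARE @xml XML = N'<c>beyond relational</c>';

-- Insert the contents of the @xml variable under the first /a/b element.
SET @doc.modify('insert sql:variable("@xml") into (/a/b)[1]');

SELECT @doc AS UpdatedDocument;

-- Use a let clause and upper-case() on the inserted element.
SELECT @doc.query('
    for $c in /a/b/c
    let $text := upper-case(string($c))
    return <shout>{ $text }</shout>
') AS Result;
```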

Page 35:

Storing Relational Data

HierarchyID: Store arbitrary hierarchies of data and efficiently query them
Large UDTs: No more 8K limit on User Defined Types
Sparse Columns: Optimized storage for sparsely populated columns
Wide Tables: Support for thousands of sparse columns (up to 30,000 columns per table)
Filtered Indices: Define indices over subsets of data in tables

Relational Data
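Two of these features are easy to try in isolation. The sketch below declares a table with sparse columns and a column set, then builds a small hierarchyid path. The table, column and variable names are hypothetical.

```sql
-- Sparse columns take no space when NULL; the column set exposes all of
-- them together as a single XML value.
CREATE TABLE dbo.Products
(
    ProductId        INT           NOT NULL PRIMARY KEY,
    Name             NVARCHAR(100) NOT NULL,
    ScreenSizeInches DECIMAL(4,1)  SPARSE NULL,
    ShoeSize         TINYINT       SPARSE NULL,
    AllAttributes    XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

INSERT INTO dbo.Products (ProductId, Name, ScreenSizeInches)
VALUES (1, N'Monitor', 24.0);

SELECT ProductId, Name, AllAttributes
FROM dbo.Products;

-- hierarchyid: create a root node and its first child, then inspect them.
DECLARE @root  hierarchyid = hierarchyid::GetRoot();
DECLARE @child hierarchyid = @root.GetDescendant(NULL, NULL);

SELECT @child.ToString()             AS ChildPath,
       @child.GetLevel()             AS ChildLevel,
       @child.IsDescendantOf(@root)  AS IsUnderRoot;
```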

Page 36:

Filtered Indexes

Filtered indexes and statistics
  Indexing a portion of the data in a table
  Filtered/co-related statistics creation and usage
  Query/DML optimization to use filtered indexes and statistics
Restrictions
  Simple, limited grammar for the predicate
  Only on non-clustered indexes
Benefits
  Lower storage and maintenance costs for a large number of indexes
  Query/DML performance benefits: I/O only for qualifying rows

Relational BR Support
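A filtered index is an ordinary non-clustered index with a WHERE clause. The sketch below uses a hypothetical dbo.Orders table and indexes only the rows that have not shipped yet.

```sql
CREATE TABLE dbo.Orders
(
    OrderId     INT      NOT NULL PRIMARY KEY,
    CustomerId  INT      NOT NULL,
    OrderDate   DATETIME NOT NULL,
    ShippedDate DATETIME NULL
);

-- Only unshipped orders are stored in the index, so it stays small
-- even if the table holds years of shipped orders.
CREATE NONCLUSTERED INDEX IX_Orders_Unshipped
    ON dbo.Orders (CustomerId, OrderDate)
    WHERE ShippedDate IS NULL;

-- A query whose predicate matches the filter can be answered from the index.
SELECT CustomerId, OrderDate
FROM dbo.Orders
WHERE ShippedDate IS NULL
  AND CustomerId = 42;
```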

Page 37:

Summary

SQL Server 2008 will make it easier to create information-centric applications that require
  Unstructured documents
  XML
  Semi-structured information
It combines the above with relational data by:
  Reducing the cost of managing all types of data
  Simplifying the development of applications that use relational and non-relational data
  Extending services currently available for relational data to non-relational data

Page 38:

Future of Beyond Relational

Rich unstructured data
  Enable existing BR apps to store data in SQL Server
    Win32 file I/O API compatibility
  Better integration of FILESTREAM and RBS programming models
  Better scalability of FILESTREAM
  Property search and promotion
  iFTS improvements in functionality and scale/performance
Deep Spatial
  More functionality
  Across BI components

Page 39:

Resources: Whitepapers & Videos

FILESTREAM/RBS whitepapers:
  http://msdn.microsoft.com/en-us/library/cc949109.aspx
  http://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-manage-unstructured.aspx
What's new for XML in SQL Server 2008:
  http://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-whats-new-xml.aspx
iFTS:
  http://msdn.microsoft.com/en-us/library/cc721269.aspx
SQL Server 2008 Business Value Calculator:
  http://www.moresqlserver.com

Page 40:

Question & answer
riaanv@eclipse.co.za

Page 41:

Resources

Sessions On-Demand & Community: www.microsoft.com/teched
Resources for IT Professionals: http://microsoft.com/technet
Resources for Developers: http://microsoft.com/msdn
Microsoft Certification & Training Resources: www.microsoft.com/learning

Page 42:

Please complete an evaluation

Page 43:

© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.