Taking Your Database beyond Relations with Microsoft SQL Server 2008
With Special Focus on FILESTREAM
Riaan Vermeulen
Data Solutions Architect
Eclipse Networks
Information-Centric Applications
Applications work with four kinds of data:
  Relational Data (structured)
  XML (semi-structured)
  Documents and Multimedia (unstructured)
  Spatial (structured)
Key Application Challenges
Growth of the types of data to be hosted
Increasing complexity
Data Model Tier: Search and Indexing, Caching and Synch, Object Mapping, Rich Query Capability
Application Tier: Business Intelligence (BI) Analysis, Application Integration, Compliance, Reporting
Storage Tier: Large Data Sets, Transactions and Security, Reliability and Scale, Referential Integrity
Data types: Relational Data, Documents and Multimedia, Spatial, XML
Strategic Challenges and Goals
Pain Points
  Dealing with relational and non-relational data platforms
  Growth in application complexity and duplicated functionality
  Compensating for unavailable services
Goals
  Reduce the cost of managing all types of data
  Simplify the development of applications which use relational and non-relational data
  Extend services currently available for relational data to non-relational data
  Provide non-relational services to relational data
Four Pillars of SQL Server
Your Data, Any Place, Any Time
Pillars: Dynamic Development, Beyond Relational, Pervasive Insight, Enterprise Data Platform
Platforms: Server, Mobile and Desktop, Cloud
Data: RDBMS (relational database management systems), XML, FILE, OLAP (online analytical processing)
Services: Query, Analysis, Reporting, Integration, Synch, Search
Beyond Relational Feature Overview
Relational BR Support
  SQL Server 2005: User Defined Types
  SQL Server 2008: Large UDTs, Sparse Columns, Wide Tables/Column Set, Filtered Indices, HierarchyID
Documents & Multimedia
  SQL Server 2005: Full Text Indexing
  SQL Server 2008: Remote BLOB Store API, FILESTREAM, Integrated Full-Text Search (FTS)
Spatial
  SQL Server 2008: Fully supported Geometry and Geography data types and functions
XML
  SQL Server 2005: XML Data Type and Functions
  SQL Server 2008: XML Upgrades
Documents & Multimedia
BLOB storage options
Use File Servers
  Advantages: low cost per GB; streaming performance
  Challenges: complex application development and deployment; integration with structured data
  Examples: Windows File Servers, NetApp NetFiler
Dedicated BLOB Store
  Advantages: lower cost/GB at scale; scalability and expandability
  Challenges: complex application development and deployment; separate management; enterprise scales only
  Examples: EMC Centera, Fujitsu Nearline
Store BLOBs in Database
  Advantages: integrated management; data-level consistency
  Challenges: poor data streaming support; file size limitations; highest cost per GB
  Example: SQL Server VARBINARY(MAX)
SQL Server 2008 BLOBs
Remote BLOB Storage: use file servers or a dedicated BLOB store
SQL BLOB: store BLOBs in the database
FILESTREAM Storage: store BLOBs in the database and the file system
FILESTREAM
Storage attribute on VARBINARY(MAX)
Works with integrated FTS
Unstructured data stored directly in the file system (requires NTFS)
Dual programming model
  T-SQL (same as SQL BLOB)
  Win32 streaming APIs with T-SQL transactional semantics
Data consistency
Integrated manageability
  Backup/restore
  Administration
Size limit is the file system volume size
SQL Server security stack
FILESTREAM: Basics
Enabled per instance
  When enabling for the first time, a computer or instance restart may be required to hook in the file-system filter driver
FILESTREAM data is stored in FILESTREAM filegroups
  Stored as NTFS directories, called 'data containers'
  Can be on compressed volumes
  Create a FILESTREAM filegroup at database creation time or add one later
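The setup steps above can be sketched in T-SQL roughly as follows. This is a minimal illustration, assuming FILESTREAM has already been switched on for the Windows service in SQL Server Configuration Manager; the database name Archive, the filegroup and file names, and the C:\Data paths are all hypothetical.

```sql
-- Allow both T-SQL and Win32 streaming access on this instance
-- (0 = disabled, 1 = T-SQL only, 2 = T-SQL and Win32 streaming).
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
GO

-- Create a database with a FILESTREAM filegroup from the start.
-- For the FILESTREAM file, FILENAME is a directory (the data container):
-- C:\Data must exist, but the ArchiveFS folder must not exist yet.
CREATE DATABASE Archive
ON PRIMARY
    (NAME = Archive_data, FILENAME = 'C:\Data\Archive.mdf'),
FILEGROUP ArchiveFsGroup CONTAINS FILESTREAM
    (NAME = Archive_fs, FILENAME = 'C:\Data\ArchiveFS')
LOG ON
    (NAME = Archive_log, FILENAME = 'C:\Data\Archive.ldf');
GO

-- Alternatively, add a FILESTREAM filegroup to an existing database later.
ALTER DATABASE Archive
    ADD FILEGROUP ArchiveFsGroup2 CONTAINS FILESTREAM;
ALTER DATABASE Archive
    ADD FILE (NAME = Archive_fs2, FILENAME = 'C:\Data\ArchiveFS2')
    TO FILEGROUP ArchiveFsGroup2;
GO
```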
FILESTREAM: Basics
FILESTREAM columns are defined as varbinary(max) columns that have the FILESTREAM attribute
The table must have a uniqueidentifier ROWGUIDCOL column and a varbinary(max) column with the FILESTREAM attribute
If full-text indexing is required, the table also needs a column to store the document file type extension
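A table that meets these requirements might look like the sketch below. The table and column names are hypothetical, and the statement assumes the database already has a default FILESTREAM filegroup.

```sql
CREATE TABLE dbo.Documents
(
    -- FILESTREAM tables need a non-null, unique ROWGUIDCOL column.
    DocumentId    UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE
                  DEFAULT NEWSEQUENTIALID(),
    -- File type extension (for example '.doc'), needed for full-text indexing.
    FileExtension NVARCHAR(10) NOT NULL,
    -- The BLOB itself, stored as a file in the FILESTREAM data container.
    Content       VARBINARY(MAX) FILESTREAM NULL
);
```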
FILESTREAM: Programming
T-SQL
  Cannot be used to perform partial updates to FILESTREAM data.
  Underlying BLOB data in the file system is deleted asynchronously by the garbage collector.
Win32 API
  Win32 streaming APIs with T-SQL transactional semantics.
  A token must be obtained before FILESTREAM files can be accessed.
  GET_FILESTREAM_TRANSACTION_CONTEXT() provides the token that binds the FILESTREAM file system streaming operations to a started transaction.
  The SqlFileStream API can be used to call Win32 streaming interfaces, such as ReadFile() and WriteFile().
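The server side of both access paths can be sketched as follows. The dbo.Documents table (DocumentId, FileExtension, Content) is a hypothetical example, and the client-side streaming code is left out; only the T-SQL that the client would issue is shown.

```sql
-- T-SQL access: a FILESTREAM column is written like any varbinary(max) value.
DECLARE @id UNIQUEIDENTIFIER = NEWID();

INSERT INTO dbo.Documents (DocumentId, FileExtension, Content)
VALUES (@id, N'.txt', CAST('Hello, FILESTREAM' AS VARBINARY(MAX)));

-- Win32 access: inside a transaction, fetch the logical path and the token.
BEGIN TRANSACTION;

SELECT Content.PathName()                   AS FilePath,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS TransactionToken
FROM dbo.Documents
WHERE DocumentId = @id;

-- The client passes FilePath and TransactionToken to SqlFileStream
-- and streams the data before the transaction is committed.

COMMIT TRANSACTION;
```

Outside a transaction GET_FILESTREAM_TRANSACTION_CONTEXT() returns NULL, which is why the SELECT sits between BEGIN TRANSACTION and COMMIT.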
FILESTREAM demo
FILESTREAM: Security
FILESTREAM directories have restricted access while the database is open
  Users have permissions to open files based on SQL Server table/column security settings
  Privileged users *can* mess with the files though…
When the database is closed, FILESTREAM directory access is gated by Windows security
As the demo showed, the FILESTREAM directories and files are not file-locked like open databases and can be altered or deleted when not actively in use
FILESTREAM: Data Consistency
When a FILESTREAM value is updated (by creating a new file), the old file is not deleted immediately
  The link on the data page is updated to point to the new file
  After the transaction has committed, the old file is made available for garbage collection
  If the transaction rolls back, the change to the link in the data page is rolled back and the new file is made available for garbage collection
Garbage collection is driven by tombstone tables (stored as internal tables in the default filegroup)
Isolation semantics
  Win32: only the read-committed isolation level is supported
  T-SQL: full serializable support
FILESTREAM: Interaction with other Features
Failover clustering
  FILESTREAM filegroups must be on shared storage
Log shipping
  Both the primary and the secondary server must have FILESTREAM enabled
Backup/Restore
  FILESTREAM data is backed up as part of regular backup types
  FILESTREAM filegroups can be excluded from database backups
Full-Text Indexing
  The table must have a column storing the FILESTREAM BLOB file type extension
SQL Server Express is supported
  The 4 GB database size limit does not include FILESTREAM data
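As an illustration of the full-text requirement, the sketch below indexes a FILESTREAM column using the extension column as its type column. The table dbo.Documents (DocumentId, FileExtension, Content), the index name and the catalog name are hypothetical.

```sql
-- A full-text index needs a unique, single-column, non-nullable key index.
CREATE UNIQUE INDEX UX_Documents_DocumentId
    ON dbo.Documents (DocumentId);

CREATE FULLTEXT CATALOG DocumentCatalog AS DEFAULT;

-- TYPE COLUMN names the column that holds the file extension, which tells
-- the full-text engine which filter to use for each BLOB.
CREATE FULLTEXT INDEX ON dbo.Documents
    (Content TYPE COLUMN FileExtension)
    KEY INDEX UX_Documents_DocumentId
    ON DocumentCatalog;

-- Mixed query: a full-text predicate plus an ordinary relational predicate.
-- The index is populated in the background, so new rows may not match at once.
SELECT DocumentId
FROM dbo.Documents
WHERE CONTAINS(Content, '"SQL Server"')
  AND FileExtension = N'.doc';
```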
FILESTREAM: Replication
FILESTREAM data can be replicated
When the replication topology involves instances using different versions of SQL Server, there are limitations on the size of data that can be sent to down-level instances.
With transactional replication, the replication filter options determine whether the FILESTREAM attribute is replicated.
Merge replication and FILESTREAM both require a uniqueidentifier column. Take care with the table schema when using merge replication so that the GUIDs are sequential (i.e., use NEWSEQUENTIALID() rather than NEWID()).
FILESTREAM: Unsupported Features
Database mirroring
  Database mirroring (DBM) cannot be configured if the database contains FILESTREAM data
  This may be a MAJOR barrier to adoption in V1
Database snapshots
  Snapshots can be created on non-FILESTREAM filegroups only
SQL Server encryption
  FILESTREAM data cannot be encrypted using either column-level encryption or Transparent Data Encryption
Write Performance (Remote)
[Chart: insert throughput (Mbps) against BLOB size (240 KB, 480 KB, 1 MB, 2 MB, 4 MB, 8 MB) for FILESTREAM Win32 (file system) access, FILESTREAM T-SQL and varbinary, with the file system Win32 access gain (%)]
Read Performance (Remote)
[Chart: read throughput (Mbps) against BLOB size (240 KB, 480 KB, 1 MB, 2 MB, 4 MB, 8 MB) for FILESTREAM Win32 (file system) access, FILESTREAM T-SQL and varbinary, with the file system Win32 access gain (%)]
FILESTREAM: Limitations
Only works with NTFS
FILESTREAM files cannot be properly deleted or renamed through Win32
  Operations have to be done transactionally, otherwise they cause corruption
  "SELECT GET_FILESTREAM_TRANSACTION_CONTEXT()"
FILESTREAM data cannot be stored remotely
FILESTREAM columns cannot be INCLUDED columns
In-place partial updates are not supported
  A new file is created and the old file only becomes available for deletion after commit
Migrating SQL BLOBs to FILESTREAM
Conceptual walkthrough
  Create one (or more) FILESTREAM filegroups
  Alter the table to add a FILESTREAM column
  For each existing row, update the new column with an empty FILESTREAM value
  Write the LOB data (file I/O or T-SQL access)
  Optional: drop the varbinary(max) column
BLOB migration utilities support (intended for SQL Server 2008 R2)
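The walkthrough could translate into T-SQL along these lines. This is only a sketch: the database Insurance, the table dbo.Claims (ClaimID, Photo VARBINARY(MAX)), the new column names and the C:\Data path are hypothetical, and the table is assumed to have no ROWGUIDCOL column yet.

```sql
-- 1. Create a FILESTREAM filegroup and its data container.
ALTER DATABASE Insurance
    ADD FILEGROUP ClaimsFsGroup CONTAINS FILESTREAM;
ALTER DATABASE Insurance
    ADD FILE (NAME = Claims_fs, FILENAME = 'C:\Data\ClaimsFS')
    TO FILEGROUP ClaimsFsGroup;
GO

-- 2. Add the ROWGUIDCOL column FILESTREAM requires, point the table at the
--    filegroup, then add the new FILESTREAM column.
ALTER TABLE dbo.Claims
    ADD RowGuid UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID();
ALTER TABLE dbo.Claims SET (FILESTREAM_ON = ClaimsFsGroup);
ALTER TABLE dbo.Claims
    ADD PhotoFs VARBINARY(MAX) FILESTREAM NULL;
GO

-- 3. Give every existing row an empty FILESTREAM value (this creates the file).
UPDATE dbo.Claims SET PhotoFs = 0x;

-- 4. Write the LOB data, here through T-SQL rather than file I/O.
UPDATE dbo.Claims SET PhotoFs = Photo WHERE Photo IS NOT NULL;
GO

-- 5. Optional: drop the original varbinary(max) column.
ALTER TABLE dbo.Claims DROP COLUMN Photo;
```

On a large table, steps 3 and 4 would normally be run in batches to keep the transaction log manageable.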
Case Study – What are we doing?
Department of Education, Mpumalanga
Script and Marksheet Management System
Track, scan and store 500k+ marksheets every year
650 GB to 1.3 TB of files per annum
FILESTREAM Win32 API write and read access (C#, VS 2008)
Partitioned over 6 LUNs
Use ITFs
SharePoint Search Portal
Remote Blob Store Architecture
Customer applications can transparently support different BLOB stores
Each Remote Blob Store vendor is responsible for delivering its own provider
Layers in the diagram: Customer Application; SQL RBS API (with the SQL DB); Provider API; provider libraries (NetApp lib, IBM lib, Centera lib); BLOB stores (NetApp, IBM, Centera)
RBS Services: Create, Fetch, GC, Delete
RBS Workflow
Diagram components: Application, RBS Client Library, BLOB Store Provider Library, BLOB Store, SQL Server, Machine Boundary
Example row in SQL Server: ClaimID 4390, ClaimDate 6/5/2007, PhotoRef <Binary(20)>
1. Write BLOB (Photo)
2. Return Blob ID
3. Write Blob ID to PhotoRef field
RBS Fundamentals
Most useful in environments where interoperability is required
No restrictions on back-end store
Back-end can change with no app change
Looser (link level) consistency guarantees
SQL Server handles link consistency and garbage collection
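Unstructured Storage in SQL08
[Comparison matrix; most cell values did not survive extraction]
Options compared: File Stores / External Blob Stores (CAS), SQL BLOBs, Remote Blob API, FILESTREAM
Criteria: Streaming Performance, Link Level Consistency, Data Level Consistency, Integrated Management, Non-local Windows File Servers, External Blob Stores
Surviving cell values: "Depends on external store" (twice, in the Streaming Performance row) and "n/a" (once each in the Non-local Windows File Servers and External Blob Stores rows)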
FILESTREAM Comparison
Comparison point | File server / file system | SQL Server (using varbinary(max)) | FILESTREAM
Maximum BLOB size | NTFS volume size | 2 GB - 1 byte | NTFS volume size
Streaming performance of large BLOBs | Excellent | Poor | Excellent
Security | Manual ACLs | Integrated | Integrated + automatic ACLs
Cost per GB | Low | High | Low
Manageability | Difficult | Integrated | Integrated
Integration with structured data | Difficult | Data-level consistency | Data-level consistency
Application development and deployment | More complex | Simpler | Simpler
Recovery from data fragmentation | Excellent | Poor | Excellent
Performance of frequent small updates | Excellent | Moderate | Poor
Full Text Search Challenges
Indexes stored outside SQL Server lead to manageability challenges
Mixed query performance suffers from having to pull over the complete full-text result set
Scaling issues on big boxes
Full Text Indexing Upgrades
Full-text indexes fully integrated into SQL Server
Make mixed queries perform and scale
SELECT * FROM candidates WHERE CONTAINS(resume, '"SQL Server"')
AND ZipCode = '98052'
Spatial Data Overview
Storage and retrieval of spatial data using standard SQL syntax
  New spatial data types (geometry, geography)
  New spatial methods (intersects, buffer, etc.)
  New spatial indexes
Offers full set of Open Geospatial Consortium components (OGC/SQL MM, ISO 19125)
Spatial Builder Interface
SSMS Visualization
Integration with Virtual Earth
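A small sketch of the new types and methods in use. The coordinates and variable names are made up for illustration; geography::Point takes latitude, longitude and an SRID.

```sql
-- Two points on the WGS 84 ellipsoid (SRID 4326).
DECLARE @store    geography = geography::Point(47.6422, -122.1369, 4326);
DECLARE @customer geography = geography::Point(47.6097, -122.3331, 4326);

-- Distance between the two points, in metres for SRID 4326.
SELECT @store.STDistance(@customer) AS DistanceInMetres;

-- Buffer the store by 20 km and test whether the customer falls inside it.
SELECT @store.STBuffer(20000).STIntersects(@customer) AS WithinTwentyKm;
```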
Spatial
See also: DAT03-HOL Integrating Microsoft SQL Server 2008 Spatial Support with Microsoft Virtual Earth
XML Improvements
Improved XML Schema validation
  Support for storing and validating Office 12 document formats
  Support for lax validation
  Full xs:dateTime support
    Support for values without time zone
    Time zone preservation
  Improved support for lists and union types
Added support for the let clause in XQuery
Added fn:upper-case()/fn:lower-case()
Added support for insert sql:variable("@xml") into /a/b
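The XQuery additions listed above can be tried with a short sketch like this one. The variable names and the sample XML are invented for illustration.

```sql
DECLARE @doc xml = N'<a><b/></a>';
DECLARE @xml xml = N'<c>hello</c>';

-- Insert the contents of a T-SQL xml variable into the document.
-- modify() needs a single target node, hence the [1].
SET @doc.modify('insert sql:variable("@xml") into (/a/b)[1]');

-- let binds an intermediate value; fn:upper-case() transforms it.
SELECT @doc.query('
    for $c in /a/b/c
    let $text := fn:upper-case(string($c))
    return <shout>{ $text }</shout>
') AS Result;
```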
XML
Storing Relational Data
HierarchyID: Store arbitrary hierarchies of data and efficiently query them
Large UDTs: No more 8K limit on User Defined Types
Sparse Columns: Optimized storage for sparsely populated columns
Wide Tables: Support for up to 30,000 sparse columns
Filtered Indices: Define indices over subsets of data in tables
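Two of these features are easy to illustrate in a few lines. The table, column and variable names below are hypothetical.

```sql
-- Sparse columns with a column set: NULLs in sparse columns take no space,
-- and AllAttributes exposes the populated sparse columns as one XML value.
CREATE TABLE dbo.ProductAttributes
(
    ProductId     INT PRIMARY KEY,
    Color         NVARCHAR(20) SPARSE NULL,
    VoltageRating INT          SPARSE NULL,
    AllAttributes XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

-- HierarchyID: build a root and a first child, then query the relationship.
DECLARE @root  hierarchyid = hierarchyid::GetRoot();
DECLARE @child hierarchyid = @root.GetDescendant(NULL, NULL);

SELECT @child.ToString()            AS ChildPath,
       @child.GetLevel()            AS ChildLevel,
       @child.IsDescendantOf(@root) AS IsUnderRoot;
```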
Filtered Indexes
Filtered Indexes and Statistics
  Indexing a portion of the data in a table
  Filtered/co-related statistics creation and usage
  Query/DML optimization to use filtered indexes and statistics
Restrictions
  Simple limited grammar for the predicate
  Only on non-clustered indexes
Benefits
  Lower storage and maintenance costs for large numbers of indexes
  Query/DML performance benefits: IO only for qualifying rows
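A filtered index over a hypothetical dbo.Orders table (OrderId, CustomerId, OrderDate, ShippedDate) might look like this.

```sql
-- Index only the unshipped orders; shipped rows add no index maintenance cost.
CREATE NONCLUSTERED INDEX IX_Orders_Unshipped
    ON dbo.Orders (CustomerId, OrderDate)
    WHERE ShippedDate IS NULL;

-- A query whose predicate matches the filter can use the smaller index.
SELECT CustomerId, OrderDate
FROM dbo.Orders
WHERE ShippedDate IS NULL
  AND CustomerId = 42;
```

The WHERE clause of the index shows the "simple limited grammar" restriction in practice: it is a plain comparison on a column, with no functions or subqueries.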
Relational BR Support
Summary
SQL Server 2008 will make it easier to create information-centric applications that require
  Unstructured documents
  XML
  Semi-structured information
Combine the above with relational data by:
  Reducing the cost of managing all types of data
  Simplifying the development of applications which use relational and non-relational data
  Extending services currently available for relational data to non-relational data
Future of Beyond Relational
Rich unstructured data
  Enable existing BR apps to store data in SQL Server
  Win32 File I/O API compatibility
  Better integration of FILESTREAM and RBS programming models
  Better scalability of FILESTREAM
  Property search and promotion
  iFTS improvements in functionality and scale/performance
Deep Spatial
  More functionality
  Across BI components
Resources
Whitepapers & Videos
FILESTREAM/RBS whitepapers:
  http://msdn.microsoft.com/en-us/library/cc949109.aspx
  http://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-manage-unstructured.aspx
What's new for XML in SQL Server 2008: http://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-whats-new-xml.aspx
iFTS: http://msdn.microsoft.com/en-us/library/cc721269.aspx
SQL Server 2008 Business Value Calculator: http://www.moresqlserver.com
Question & answer
Resources
Sessions On-Demand & Community: www.microsoft.com/teched
Resources for IT Professionals: http://microsoft.com/technet
Resources for Developers: http://microsoft.com/msdn
Microsoft Certification & Training Resources: www.microsoft.com/learning
© 2009 Microsoft Corporation. All rights reserved.