16.07.2015 SQL Server 2008: Beyond Relational. Yoni Okun, DBA Team Manager, SRL Group
Post on 22-Dec-2015
SQL Server 2008
Beyond Relational
Yoni Okun
DBA Team Manager
SRL Group
1999- 2003 (20 Ex)
12Exabytes
2003: 7.0
2002: 5.3
2001: 3.8
2000: 2.9
1999: 2.2
New Digital Information Created
55% in personal PCs
16% in corporate data warehouses
Internet only 21 TB
Email 500x more than Internet/year
(~400TB)
Data Storage Explosion!
Magnetic (analog): 28%
Magnetic (digital): 64%
Film: 7%
Paper: 1%
Storage of New Information
Hard Drive Prices
1980…40,000$!
1 Megabyte
2 – 250 page novels
1 Gigabyte:
a pickup truck filled with books
2 Terabyte:
100,000 trees made into paper and printed.
2 Petabytes:
All U.S. academic research libraries
2 Exabytes:
Total volume of information generated in 1999
Data Storage Explosion!
Agenda
Spatial Data Types
Filestream Data Storage
HierarchyID Data Type
Sparse Design – Columns & Stats
Filtered Indexes
Integrated Full Text Search
Administrative Features
Spatial Data Type
Answers to location-based queries
Which roads intersect campus?
Does my land claim overlap yours?
All Italian restaurants within 5 km
Does Your Database Include Address?
SELECT * FROM roads WHERE roads.geom.STIntersects(@ms) = 1
Where are all SQL Server Geeks?
Spatial Data Type
Geography: a geodetic model
Geometry: a planar model (e.g. created by the Mercator projection)
Both support two dimensions only. Points can also hold the members Z (for elevation) and M (for measure), but these are not used in any spatial method
Coordinate order: Latitude/Longitude before RTM, Longitude/Latitude in RTM
Spatial Data Type
1. Use with DDL
CREATE TABLE Hotels (HotelName nvarchar(200), HotelLocation geography)
2. Insert Values
INSERT INTO Hotels VALUES
('Dan', geography::Point(29.548306566591655, 34.96617794036865, 4326)),
('Herdos', geography::STGeomFromWKB(0x0101000000B699452D3C8C3D4000000082877B4140, 4326))
3. Use Methods & Properties to work with data
DECLARE @x geography, @y geography
select @x = HotelLocation from Hotels where HotelName = 'Dan'
select @y = HotelLocation from Hotels where HotelName = 'Herdos'
select @x.ToString(), @y.ToString()
select @x.STDistance(@y)
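STDistance on two geography instances returns a distance in meters. As a rough illustration of what a geodetic distance involves, here is a minimal Python sketch using the haversine formula. It assumes a perfect sphere with a mean radius chosen here, whereas SQL Server works on the ellipsoid of the SRID, so the numbers will differ slightly. The function name and the second point are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# The 'Dan' point from the slide, and a hypothetical point one degree further north.
dan = (29.548306566591655, 34.96617794036865)
print(round(haversine_m(dan[0], dan[1], dan[0] + 1.0, dan[1])))
```

One degree of latitude comes out at roughly 111 km, which is a quick sanity check for any distance query.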
Working With Spatial Data Type
Input:
Binary - ST[Type]FromWKB
Text - ST[Type]FromText
GML - GeomFromGml
Output:
Binary - STAsBinary
Text - STAsText
GML - AsGml
Text with Z and M - AsTextZM
SRID: identifies the spatial reference system, i.e. the assumptions made about the Earth's ellipsoid. SRID 4326 is the one used by GPS. Supported values are listed in sys.spatial_reference_systems
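To make the binary (WKB) input format concrete, here is a small Python sketch that decodes a 2-D WKB Point by hand: one byte-order byte, a 4-byte geometry type, then two 8-byte doubles. The hex value is the one passed to STGeomFromWKB in the Hotels example; the function name is an assumption for illustration, and the sketch deliberately does not say which ordinate is latitude and which is longitude.

```python
import struct

def parse_wkb_point(hex_text):
    """Decode a 2-D WKB Point into (x, y). Raises ValueError for anything else."""
    if hex_text.lower().startswith("0x"):
        hex_text = hex_text[2:]
    raw = bytes.fromhex(hex_text)
    byte_order = "<" if raw[0] == 1 else ">"   # 1 = little-endian, 0 = big-endian
    (geom_type,) = struct.unpack(byte_order + "I", raw[1:5])
    if geom_type != 1 or len(raw) != 21:
        raise ValueError("not a 2-D WKB Point")
    return struct.unpack(byte_order + "dd", raw[5:21])

x, y = parse_wkb_point("0x0101000000B699452D3C8C3D4000000082877B4140")
print(x, y)
```

The two ordinates land close to the values used in the geography::Point call, as expected for two nearby hotels.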
Spatial Data Type
Descriptive: STArea, STLength, STCentroid
Relation between two instances: STIntersects, STDistance
Collections:
STGeometryN - geography element in a GeometryCollection
STPointN - Nth point of a geometry
Useful Methods / Properties
Spatial Is Indexed With An Adaptive Multi-level Grid
Index Is Integrated Into The SQL Server
Index Consists Of A Grid-based Hierarchy
Each Level Subdivides The Grid Sector That Is Defined In The Level Above
Spatial Index - Conceptual Model
Spatial Indexes
1. Decomposing space into Grid Hierarchy:
Grids parameter determines density:
Low = 4x4
Medium = 8x8 (Default)
High = 16x16
Low density: 4 levels of 4x4 grids = 16^4 = 65,536 cells !!
2. Tessellation
Fitting Objects Into Grid (Touched Cells).
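The cell counts behind the grid hierarchy are easy to check. The sketch below is a plain Python calculation, assuming four levels and the Low/Medium/High densities listed above; the names are made up for illustration.

```python
# Side length of the grid at one level, per density keyword.
GRID_SIDE = {"LOW": 4, "MEDIUM": 8, "HIGH": 16}

def total_cells(densities):
    """Number of finest-level cells for a sequence of per-level grid densities."""
    total = 1
    for density in densities:
        side = GRID_SIDE[density]
        total *= side * side   # each cell of the level above is subdivided again
    return total

print(total_cells(["LOW"] * 4))                        # 65536
print(total_cells(["MEDIUM"] * 4))                     # 16777216
print(total_cells(["LOW", "LOW", "MEDIUM", "HIGH"]))   # 4194304
```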
Spatial Indexes
Level 1 Intersections, Level 2 Intersections, Level 3 & 4 Intersections: 85 matching Cells
Complete Match Cells Aren't Broken To Lower Level (42 Cells)
Cells Per Object Stops Tessellation At Limit (Per Object = 15, Cells = 13)
Tessellation Process
CREATE SPATIAL INDEX Sindx_col2
ON SpatialTable (geometry_col)
USING GEOMETRY_GRID
WITH (
    BOUNDING_BOX = ( xmin=0, ymin=0, xmax=500, ymax=200 ),
    GRIDS = (LOW, LOW, MEDIUM, HIGH),
    CELLS_PER_OBJECT = 64
)
BOUNDING_BOX - GEOMETRY index only
GRIDS - 4 Grid Levels, Grid Densities For Level - Low, Medium, High
CELLS_PER_OBJECT - number of cells recorded for matching
Creating Spatial Indexes
Agenda
Spatial Data Types
FILESTREAM Data Storage
HierarchyID Data Type
Sparse Design – Columns & Stats
Filtered Indexes
Integrated Full Text Search
Administrative Features
To Blob Or Not To Blob?
Mgmt Complexities Vs. Streaming & Performance
Cons
LOBS take Memory buffers
Updating LOBS cause fragmentation
$ Per GB
However, File system "update" is delete and insert
Pros
Transactional consistency
Point-in-time backup & restore
Single storage and query vehicle
FILESTREAM Storage
Use File Servers
Advantages:
• Low cost per GB
• Streaming performance
Challenges:
• Complex application development & deployment
• Integration with structured data
Example:
• Windows File Servers
• NetApp NetFiler
Dedicated BLOB Store
Advantages:
• Lower cost per GB at scale
• Scalability & expandability
Challenges:
• Complex application development & deployment
• Separate data management
• Enterprise-scales only
Example:
• EMC Centera
• Fujitsu Nearline
Store BLOBs in Database
Advantages:
• Integrated management
• Data-level consistency
Challenges:
• Poor data streaming support
• File size limitations
• Highest cost per GB
Example:
• SQL Server VARBINARY(MAX)
FILESTREAM Storage
FEATURES: Uses NT Cache For Caching File Data.
SQL Bpool Not Used And Is Available To Query Processing
Win 32 File System Interface Provide Streaming Access To Data
Compressed volumes are supported
FILESTREAM Combines The Best Of 2 Worlds
Integrates DB Engine With NTFS
Storing BLOB Data As Files
FILESTREAM Storage
It’s not only about Storing But About
Working With BLOBS:
Image Analysis
Voice Interpretation & Scripting
Mixing Satellite Feeds & Spatial Data Type For
Weather Reports
Etc..
FILESTREAM Storage
At Database Level
Declare A Filegroup & Map To Directory
At Table Level
Define On VARBINARY(MAX)
Must Have UNIQUEIDENTIFIER Column
Integrated Security & Management:
Permissions On FILESTREAM Implied On Files.
Tools And Functions Work For Filestream Data. (Backup)
FILESTREAM Implementation
FILESTREAM Programming
Dual Programming Model
TSQL (Same as SQL BLOB)
Win32 Streaming File IO APIs
1. Begin a SQL Server Tran
2. Obtain a symbolic PATH NAME & TRANSACTION CONTEXT
3. Open a handle using sqlncli10.dll - OpenSqlFilestream
4. Use Handle Within System.IO Classes
5. Commit Tran
FILESTREAM Programming
// 1. Start up a database transaction.
SqlTransaction txn = cxn.BeginTransaction();
// 2. Insert a row to create a handle for streaming.
new SqlCommand("INSERT <Table> VALUES (@mediaId, @fileName, @contentType);", cxn, txn);
// 3. Get a FILESTREAM PathName & transaction context.
new SqlCommand("SELECT <Column>.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() FROM <Table>", cxn, txn);
// 4. Get a Win32 file handle using SQL Native Client call.
SafeFileHandle handle = SqlNativeClient.OpenSqlFilestream(...);
// 5. Open up a new stream to write the file to the blob.
FileStream destBlob = new FileStream(handle, FileAccess.Write);
// 6. Loop through source file and write to FileStream handle.
while ((bytesRead = sourceFile.Read(buffer, 0, buffer.Length)) > 0) {
    destBlob.Write(buffer, 0, bytesRead);
}
// 7. Commit transaction, cleanup connection.
txn.Commit();
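The copy loop in step 6 is ordinary chunked streaming: read a buffer, write what was read, stop at end of stream. A minimal Python sketch of the same pattern follows. In-memory streams stand in for the source file and the FILESTREAM handle, and the function name and chunk size are assumptions for illustration.

```python
import io

def copy_stream(source, dest, chunk_size=64 * 1024):
    """Copy source to dest in fixed-size chunks; return the number of bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:          # empty read means end of stream
            break
        dest.write(chunk)
        total += len(chunk)
    return total

src = io.BytesIO(b"x" * 200_000)   # stand-in for the source file
dst = io.BytesIO()                 # stand-in for the blob stream
copied = copy_stream(src, dst)
print(copied)                      # 200000
```

Memory use stays bounded by the chunk size regardless of the BLOB size, which is the point of the streaming API.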
Remote FILESTREAM storage - Not supported
DB Snapshot and Mirroring – Not supported
Features not integrated
SQL Encryption
Table-Valued Parameters
Initial FILESTREAM Limitations
Agenda
Spatial Data Types
FILESTREAM Data Storage
HierarchyID Data Type
Sparse Design – Columns & Stats
Filtered Indexes
Integrated Full Text Search
Administrative Features
Scenarios:
List Forum Threads
Business Organization Charts
Product Categories
Files/Folders Management
Features:
Compact - 100,000 Nodes, 6 Levels ~ 5 Bytes / Node
Available To CLR Clients As The SqlHierarchyId Data Type
HierarchyID
2k5 Alternatives - Adjacency Model
Pros:
Understandable
2k5 – Recursive CTE
Cons:
De-Normalized (Personnel+Chart)
Not Set-Based
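In the adjacency model each row stores only its parent, so finding a whole subtree means walking one level at a time, which is what a recursive CTE does. The Python sketch below mimics that level-by-level expansion; the employee names and the function name are hypothetical.

```python
from collections import defaultdict

# (employee, manager) pairs, as an adjacency-model table would hold them.
rows = [("Ann", None), ("Bob", "Ann"), ("Cid", "Ann"), ("Dee", "Bob"), ("Eli", "Dee")]

def subordinates(rows, manager):
    """All direct and indirect reports of manager, expanded one level per iteration."""
    children = defaultdict(list)
    for employee, boss in rows:
        children[boss].append(employee)
    result, frontier = [], [manager]
    while frontier:
        # Comparable to the recursive member of a CTE: join the last level to its children.
        frontier = [e for m in frontier for e in children[m]]
        result.extend(frontier)
    return result

print(subordinates(rows, "Ann"))  # ['Bob', 'Cid', 'Dee', 'Eli']
```

The number of iterations grows with the depth of the tree, which is why this approach is described as not set-based.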
Holds Path As A String Concatenation
Pros:
Logical Representation
Cons:
Searches Done With String
Functions
And Predicates On Those Path
Strings
2k5 Alternatives – Path Enumeration
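With path enumeration, a subtree query becomes a string-prefix predicate, comparable to a LIKE 'prefix%' search. A minimal Python sketch, with hypothetical names and paths:

```python
# Each node stores its full path from the root as a string.
paths = {
    "Ann": "/1/", "Bob": "/1/1/", "Cid": "/1/2/",
    "Dee": "/1/1/1/", "Eli": "/1/1/1/1/", "Fay": "/1/10/",
}

def descendants(paths, name):
    """Nodes whose path starts with the given node's path (the node itself excluded)."""
    prefix = paths[name]
    return sorted(n for n, p in paths.items() if p.startswith(prefix) and p != prefix)

def level(path):
    """Depth of a node, with the root '/1/' at level 0."""
    return path.count("/") - 2

print(descendants(paths, "Bob"))   # ['Dee', 'Eli']
print(level(paths["Dee"]))         # 2
```

The trailing slash matters: without it, "/1/1" would also match "/1/10", which is typical of the string-handling pitfalls listed under Cons.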
"Left" And "Right" Columns Represent Edges
Pros:
Predictable , Set Based Results
Cons:
Must Be Maintained Separately
2k5 Alternatives – Nested Sets
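The nested sets model can be sketched in a few lines of Python: a depth-first walk hands out "left" and "right" numbers, and a subtree query is then a pure range comparison. The tree and the function names are hypothetical.

```python
def number_nodes(tree, root):
    """Assign (left, right) to every node by a depth-first walk. tree maps node -> children."""
    bounds, counter = {}, 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        for child in tree.get(node, []):
            visit(child)
        counter += 1
        bounds[node] = (left, counter)

    visit(root)
    return bounds

def descendants(bounds, node):
    """Set-based subtree test: a descendant's interval lies strictly inside the ancestor's."""
    left, right = bounds[node]
    return sorted(n for n, (l, r) in bounds.items() if left < l and r < right)

tree = {"Ann": ["Bob", "Cid"], "Bob": ["Dee"]}
bounds = number_nodes(tree, "Ann")
print(bounds["Ann"], bounds["Bob"])   # (1, 8) (2, 5)
print(descendants(bounds, "Ann"))     # ['Bob', 'Cid', 'Dee']
```

Inserting or moving a node means renumbering part of the tree, which is the maintenance cost noted under Cons.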
1. Insert Root
2. Insert 1st Subordinate
3. Enter Rest of Tree
4. Query Hierarchical Data
5. Reparent Employee
6. Add Subordinate
7. Reparent Node
Demo Structure
GetRoot() - root of the hierarchy tree
ToString() - logical string representation of the value
GetDescendant() - a child node of this
GetAncestor() - hierarchyid of the nth ancestor of this
IsDescendantOf() - true if this is a descendant of the given parent
GetLevel() - integer representing the depth of the node this
GetReparentedValue() - path to newRoot, followed by the path from oldRoot to this
HierarchyID Methods
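To get a feel for what these methods compute, the sketch below re-implements a few of them in Python on the logical string form that ToString() returns, such as "/1/2/". This is only a model under that assumption: the real type is a compact binary value, and the function names here are illustrative stand-ins, not the SQL Server API.

```python
def get_level(node):
    """Depth of the node: "/" -> 0, "/1/2/" -> 2."""
    return node.count("/") - 1

def get_ancestor(node, n):
    """Path of the nth ancestor, or None if the node is not that deep."""
    parts = node.strip("/").split("/") if node != "/" else []
    if n > len(parts):
        return None
    kept = parts[: len(parts) - n]
    return "/" + "".join(p + "/" for p in kept)

def is_descendant_of(node, parent):
    """True if node lies in the subtree rooted at parent (a node counts as its own descendant)."""
    return node.startswith(parent)

def get_reparented_value(node, old_root, new_root):
    """Path to new_root followed by the path from old_root to node."""
    if not is_descendant_of(node, old_root):
        return None   # this sketch simply returns None for nodes outside old_root
    return new_root + node[len(old_root):]

print(get_ancestor("/1/2/3/", 1))                    # /1/2/
print(get_reparented_value("/1/2/3/", "/1/2/", "/4/"))  # /4/3/
```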
HierarchyID Indexes
Depth-first Index: Employees That Report Through A Manager (a subtree is stored together)
Breadth-first Index: Employees That Report Directly To The Same Manager (siblings are stored together)
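The difference between the two index orders can be shown by sorting the same set of nodes two ways. The Python sketch below assumes simple integer labels in the string form of each node; depth-first sorts by path alone, breadth-first sorts by level first and then by path.

```python
nodes = ["/", "/1/", "/2/", "/1/1/", "/1/2/", "/2/1/", "/1/1/1/"]

def key_parts(node):
    """Turn '/1/2/' into (1, 2) so labels compare numerically."""
    return tuple(int(p) for p in node.strip("/").split("/") if p)

# Depth-first: every subtree is contiguous.
depth_first = sorted(nodes, key=key_parts)
# Breadth-first: all nodes of one level are contiguous.
breadth_first = sorted(nodes, key=lambda n: (len(key_parts(n)), key_parts(n)))

print(depth_first)
print(breadth_first)
```

In the depth-first list, everything under "/1/" sits in one run, which suits subtree queries. In the breadth-first list, the children of a node sit next to each other, which suits direct-report queries.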
Agenda
Spatial Data Types
FILESTREAM Data Storage
HierarchyID Data Type
Sparse Design – Columns & Stats
Filtered Indexes
Integrated Full Text Search
Administrative Features
How Can We Model property bags?
Products Catalog
Lab tests with different readings per test
Because they don't appear on each row
they are difficult to model
Sparse Properties
Entity-Attribute-Value
Non Relational
Value Column Issues
Need PIVOT to Make Sparse
Modeling Sparse Properties
Xml Non Relational
Complex Updates
Sparse Columns Hit 1,024 Limit
Storage
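The "Need PIVOT" point about Entity-Attribute-Value is easier to see with data. The Python sketch below turns EAV rows into one wide, mostly empty row per entity, which is the shape a sparse-column table stores directly. The attribute names and values are hypothetical.

```python
# (entity id, attribute, value) rows; every value is a string, one of the
# "value column issues" of the EAV design.
eav = [(1, "color", "red"), (1, "weight_kg", "2.5"), (2, "color", "blue"), (3, "voltage", "220")]

def pivot(eav_rows):
    """Return (columns, {entity: wide row}); attributes an entity lacks stay None."""
    columns = sorted({attr for _, attr, _ in eav_rows})
    wide = {}
    for entity, attr, value in eav_rows:
        wide.setdefault(entity, dict.fromkeys(columns))[attr] = value
    return columns, wide

columns, wide = pivot(eav)
print(columns)    # ['color', 'voltage', 'weight_kg']
print(wide[1])    # {'color': 'red', 'voltage': None, 'weight_kg': '2.5'}
```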
Sparse Columns
Efficient Way Of Managing Empty Data
NULL Data Takes No Physical Space
30,000 Column Limit (RTM)
1024 For "Non-sparse" Columns
Column Set - Xml Of All Sparse Values
For Web Site That Needs To Show The
Properties
Data type                                           Nonsparse bytes   Sparse bytes    NULL percentage
int                                                 4                 8               64%
float                                               8                 12              52%
money                                               8                 12              52%
datetime                                            8                 12              52%
uniqueidentifier                                    16                20              43%
date                                                3                 7               69%
varchar, char, nchar, nvarchar, binary, varbinary   2 + avg. data     4 + avg. data   60%
decimal/numeric(38,s)                               17                21              42%
Sparse columns require more storage for non-NULL values than regular columns
NULL percentage: the percent of the data that must be NULL for a net space saving of 40%
Sparse Columns
Agenda
Spatial Data Types
FILESTREAM Data Storage
HierarchyID Data Type
Sparse Design – Columns & Stats
Filtered Indexes
Integrated Full Text Search
Administrative Features
Filtered Indexes
Dramatic Effect on Performance & Storage:
1. Smaller indexes; statistics are more accurate
2. Reduced management costs (on changes)
Different indexes on frequently / infrequently changed columns.
3. Storage-wise is the same.
Do your queries relate to subsets of Data??
Candidate Columns:
1. Heterogeneous Categories Of Values
2. Columns With Distinct Ranges Of Values
3. Partitioned Tables
4. sparse columns
Can keep track of only non-null value distribution
Filtered Indexes
CREATE INDEX Ind1 ON t(c1)
WHERE c1 IN ('A', 'D')
CREATE INDEX Ind2 ON t(sc7)
WHERE sc7 IS NOT NULL
Filtered Indexes
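The idea behind a filtered index on a sparse column can be modeled in a few lines: index only the rows that pass the predicate, so the structure is smaller and describes only the non-NULL values. This Python sketch uses a dictionary as a stand-in for the index; the data and names are hypothetical.

```python
from collections import defaultdict

# (row id, column value) pairs; None plays the role of NULL.
rows = [(1, "A"), (2, None), (3, "D"), (4, None), (5, "B"), (6, "A")]

def build_index(rows, predicate=lambda value: True):
    """Map value -> row ids, indexing only rows whose value passes the predicate."""
    index = defaultdict(list)
    for row_id, value in rows:
        if predicate(value):
            index[value].append(row_id)
    return dict(index)

full = build_index(rows)
filtered = build_index(rows, lambda value: value is not None)

print(sum(len(ids) for ids in full.values()))       # 6
print(sum(len(ids) for ids in filtered.values()))   # 4
```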
Agenda
Spatial Data Types
FILESTREAM Data Storage
HierarchyID Data Type
Sparse Design – Columns & Stats
Integrated Full Text Search
Administrative Features
Integrated In The Engine
Stored As Internal Table
Index Changes Are Now Fully Logged
Log Backup, Log Shipping, Db Mirroring
New DMVs To Access Full-text Data
sys.dm_fts_index_keywords
sys.dm_fts_index_keywords_by_document
DBCC CHECKDB Validates Full-text Index Structures
Integrated Full Text
Noise Words Are Now Stop Lists
Altered On The Fly
Configurable For Column, Or Query Time
Query Semantics Are Transparent
sys.dm_fts_parser Returns Query Terms
Query Improvements
All Relational Operators Can Be Used
Filters Can Be Applied Anywhere
No Surrogate Keys In Use (If PK Is Integer)
Integrated Full Text
Agenda
Spatial Data Types
FILESTREAM Data Storage
HierarchyID Data Type
Sparse Design – Columns & Stats
Integrated Full Text Search
Administrative Features
Resource Governor
‘Quotas’ on SQL Server
Scenarios:
Run-away queries
Unpredictable workload execution
Setting workload priority
Usages:
Consolidation Servers
Moving Test To Prod
Resource Governor: Limitations
Database Engine only
Each instance controlled individually
I/O controls are planned for V2
Certain workloads may not be entirely suited
short-lived OLTP queries
Resource Governor: Concepts
User connects
Connection is:
Classified
Assigned to a workload group
Workload group is already bound to a pool with limits
Queries execute within the limits of the pool
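The flow above (connection, classification, workload group, pool) can be modeled as two lookups. The Python sketch below is only a conceptual stand-in: the login prefix rule, the group and pool names, and the limit values are all hypothetical, and a real classifier is a T-SQL function registered with the Resource Governor.

```python
# Pools carry the limits; workload groups are bound to exactly one pool.
pools = {"pool_reports": {"max_cpu_percent": 10}, "default": {"max_cpu_percent": 100}}
groups = {"grp_reports": "pool_reports", "default": "default"}

def classify(login):
    """Stand-in for a classifier function: pick a workload group for a new connection."""
    return "grp_reports" if login.startswith("report_") else "default"

def limits_for(login):
    """Follow connection -> workload group -> pool and return the limits that apply."""
    group = classify(login)
    pool = groups.get(group, "default")
    return group, pool, pools[pool]

print(limits_for("report_nightly"))  # ('grp_reports', 'pool_reports', {'max_cpu_percent': 10})
print(limits_for("app_user"))        # ('default', 'default', {'max_cpu_percent': 100})
```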
Resource Governor Resource Pools
MAX settings are only enforced when contention occurs
E.g. If a pool has a max CPU of 10% and another at 90%, why can a query from pool1 exceed 10% CPU?
Two possibilities: multiple CPUs or the second pool is not using its max CPU
Image and scenario taken from the PSS blog
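The second possibility (the other pool is not using its share) can be illustrated with a deliberately simple toy model in Python. It assumes a single CPU budget of 100% and treats MAX as a cap that applies only when total demand exceeds that budget. It is not how the scheduler is implemented, and it does not model the multiple-CPU case.

```python
def allocate_cpu(demand, max_percent):
    """demand and max_percent map pool -> percent of total CPU.

    Toy rule: caps are applied only when the pools together want more than 100%.
    """
    if sum(demand.values()) <= 100:
        return dict(demand)   # no contention: MAX is not enforced
    return {pool: min(wanted, max_percent[pool]) for pool, wanted in demand.items()}

caps = {"pool1": 10, "pool2": 90}
print(allocate_cpu({"pool1": 60, "pool2": 20}, caps))   # {'pool1': 60, 'pool2': 20}
print(allocate_cpu({"pool1": 50, "pool2": 80}, caps))   # {'pool1': 10, 'pool2': 80}
```

In the first call pool1 runs well above its 10% MAX because pool2 leaves CPU idle; in the second call there is contention and the cap takes effect.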
Resource Governor: Dynamic Control
ALTER RESOURCE GOVERNOR {DISABLE | RECONFIGURE}
RECONFIGURE starts the governor if it’s currently disabled
Resources
SQL Server 2008 Home Page http://www.microsoft.com/sqlserver/2008/en/us/default.aspx
SQL Server 2008 Demos and Videos http://www.microsoft.com/sqlserver/2008/en/us/demos.aspx
Microsoft Jump Start http://sqlserver2008jumpstart.microsofttraining.com/content/secure/AttendeeLogin.asp?CcpSubsiteID=69
Microsoft Developer Network (MSDN) & TechNet http://microsoft.com/msdn http://microsoft.com/technet
Trial Software and Virtual Labs http://www.microsoft.com/technet/downloads/trials/default.mspx