
White Paper

Abstract

Microsoft has released SQL Server 2012 with a number of new features as well as significant enhancements to the existing feature set. While the Always On feature addresses critical high availability and disaster recovery requirements, columnstore indexes drastically boost query performance in data warehousing scenarios. With SQL Azure gaining momentum, it is important to develop code that is compatible across on-premise and cloud environments, and SQL Server Data Tools helps to a great extent in achieving this. Further, there are improvements in Full Text Search, the introduction of Semantic Search (which strengthens SQL Server's searching capabilities on unstructured data), File Table and many more features. This document aims to give designers and architects an overview of these key features and to help them in building solutions.

SQL Server 2012 Database Engine Key features and enhancements

www.infosys.com

Sandeep Kalra, Manoj Chandran Nair, Phaneendra Babu Subnivis


Contents

Introduction

Always On

Contained databases

Columnstore indexes

Spatial data enhancements

Search features & enhancements

SQL Server Data Tools (SSDT)

File Table

Conclusion

References

Acknowledgements


Introduction

Microsoft SQL Server established itself as a prominent database product with the release of SQL Server 2005. Since then it has evolved significantly as a Business Intelligence (BI) suite, with key enhancements in Integration Services, Analysis Services and Reporting Services along with stronger database management capabilities. In SQL Server 2008 R2, the inclusion of concepts like Self-Service BI and features like PowerPivot, along with the BI suite's integration with SharePoint, established it as an enterprise-class product in this space. SQL Server 2012 RTM is now available with an exciting feature set and enhancements.

For every release, Microsoft defines themes that determine which features become part of the release. For SQL Server 2012 there are three themes: Mission critical confidence, Breakthrough insights, and Cloud on your terms. Breakthrough insights mostly covers BI-related features; this document covers the features under the themes Mission critical confidence and Cloud on your terms.

The Mission critical confidence theme has a set of features that address high availability, performance, maintainability, accessibility and search enhancements. The following table maps the key features to the themes and gives a brief description of each.

Theme | Feature | Description

Mission critical confidence | Always On | Enables setting up both high availability and disaster recovery for the database environment. Further, the secondary copies of the databases that are configured for failover can be used for database backups and read-only access, increasing the ROI on the customer's investment.

Mission critical confidence | Columnstore Indexes | A relational database engine enhancement which drastically improves data retrieval performance in warehouse environments. However, there are certain limitations when this feature is enabled.

Mission critical confidence | Contained Databases | A new concept which helps in moving databases across different on-premise servers and even to SQL Azure. In the current release only partial containment is supported, while the end goal is to achieve full containment.

Mission critical confidence | Search | Significant enhancements have been made to the existing Full Text Search (FTS), including a NEAR operator that lets users specify flexible search criteria. Further, a new feature named Semantic Search helps in building key phrases based on their occurrence score. It also enables users to search for content based on its meaning.

Cloud on your terms | SQL Server Data Tools | An integrated environment to develop databases as well as Business Intelligence (BI) applications. It also facilitates development targeting a specific platform (on-premise/off-premise), with IntelliSense support.

Cloud on your terms | Spatial data | Improvements to the existing spatial data capability, including representation of the full globe wherever required and support for circular shapes. Further, there are improvements in indexes as well as in the precision used to store longitude and latitude information.

Cloud on your terms | File Table | An enhancement to FILESTREAM in which data is stored across the database and the file system (NTFS) in an integrated fashion. It also allows the data to be accessed directly from the file system, offering extensive flexibility to access data from the database as well as from the application side.

Let us look into more details of the above mentioned features and get a feel of their capabilities and usage.

Always On

This is one of the key features of SQL Server 2012. It was initially code named HADRON and was later publicized as Always On. Before getting into more details, let us have a quick look at the limitations of the existing high availability options, i.e., failover clustering and database mirroring:

Failover clustering:

• It has to be implemented at the instance level. Hence, for databases of huge sizes, maintenance could be a problem.

• The second node is unusable in an Active-Passive implementation of failover clustering. The passive node cannot be used for any data access or backup purposes.

Database mirroring:

• It is painful to implement for applications accessing multiple databases (e.g., large SharePoint installations).

For disaster recovery, these solutions have to be combined: either both together (with or without Log Shipping), or Log Shipping with one of the two.

With Always On, Microsoft provides a solution which addresses both high availability and disaster recovery, since it can be implemented across multiple sites and the secondary node can be used for offloading reporting as well as backup requirements.

Improvements in Failover Clustering Instance (FCI):

Failover clustering across multiple subnets:

From SQL Server 2008 and Windows Server 2008 onwards, geo-clustering (clustering across data centers/servers in disparate geographic locations) has been possible. The downside is that the data centers have to be connected through a VLAN in order to meet the basic requirement that all the nodes in the cluster belong to the same subnet. The figure below depicts this architecture:

Figure 1: Failover Clustering


In SQL Server 2012, multi-subnet failover has been introduced, which means that the nodes participating in the cluster no longer need to be part of the same VLAN. The figure below depicts this architecture:

Figure 2: Multi-subnet failover architecture

In a multi-subnet failover cluster configuration, failover across subnets takes relatively longer than in a normal cluster. Hence it is advised to have a failover cluster at each location in the multi-subnet scenario. This way, in case of local failures, the failover happens within the same cluster, which is faster. Failover across subnets happens only in case of a disaster (the entire local cluster goes down).

Robust failure detection and management of the servers:

In earlier versions, the cluster health check is based on the state of the SQL Server service. As long as the service is running, the instance is considered to be up; when the service stops or does not respond, it is considered a node failure, which subsequently initiates a failover.

In SQL Server 2012, this has been made more robust. It is done in three ways:

• Monitor health status: SQL Server health is monitored in three ways:

  ◦ State of SQL Server service: The Windows Server Failover Clustering (WSFC) service monitors the SQL Server service on the active node and detects when the service stops.

  ◦ Responsiveness of SQL Server instance: While installing SQL Server, WSFC initiates a separate thread to exclusively monitor the responsiveness of the instance. The server response is checked against the value configured for the HealthCheckTimeout property.

  ◦ SQL Server component diagnostics: The system stored procedure sp_server_diagnostics periodically collects diagnostics on the server instance. The diagnostics are collected for the components system, resource, query_processing, io_subsystem and events. The first three components are used for failure detection; the remaining two are used for diagnostic purposes only. This information is written into the SQL Server cluster diagnostics log.

• Determine failures: The SQL Server database engine resource DLL determines whether the detected health status is a condition for failure using the FailureConditionLevel property. This property defines which detected health statuses cause failovers or restarts. There are multiple levels based on which the failure condition is determined. Refer to the link: http://msdn.microsoft.com/en-us/library/ff878664(v=sql.110).aspx#respond for more details on this.

• Respond to failures: WSFC responds depending on the failure conditions detected. An attempt to restart the failed node is made depending on whether the FCI retains WSFC quorum. If the FCI loses the WSFC quorum, the entire FCI is brought offline, which means it has lost its high availability.
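The failure-detection settings described above can be inspected and adjusted from T-SQL. The following is a minimal sketch, assuming it is run on a SQL Server 2012 failover cluster instance; the timeout and level values are illustrative only and should be chosen per environment.

```sql
-- Run the diagnostics procedure once and look at the state reported
-- for each component (system, resource, query_processing,
-- io_subsystem, events).
EXEC sp_server_diagnostics;

-- How long (in milliseconds) the cluster waits for diagnostics data
-- before treating the instance as unresponsive (illustrative value).
ALTER SERVER CONFIGURATION
    SET FAILOVER CLUSTER PROPERTY HealthCheckTimeout = 60000;

-- Which detected health conditions trigger a restart or failover
-- (illustrative value).
ALTER SERVER CONFIGURATION
    SET FAILOVER CLUSTER PROPERTY FailureConditionLevel = 3;
```

A higher FailureConditionLevel makes the instance fail over on a broader set of conditions, so it is worth reviewing the levels in the linked documentation before changing it.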

Availability groups:

An availability group supports a failover environment for a discrete set of user databases that fail over together. The database set is hosted by availability replicas, of which there are two types: the primary replica and secondary replicas. The primary replica accepts application connections and performs read-write operations. A secondary replica serves as a potential failover target for the availability group. The transaction log of the primary replica is applied on each secondary replica, according to the configured synchronization mode, to replicate the data across the environment and keep it ready for a failover. An availability group has exactly one primary replica and one to four secondary replicas. The secondary replicas can be configured for read-only operations so that reporting requirements can be offloaded to them. Any of the secondary replicas can be configured for database backup operations as well. Another important point is that each replica should reside on a separate node of the same WSFC cluster.

Data synchronization between primary and secondary replicas happens in two ways, synchronous and asynchronous (same as in database mirroring). In the synchronous setup, a transaction on the primary replica completes only after it has been committed on the secondary replicas. In the asynchronous setup, data is committed on the primary replica first and moved to the secondary replica subsequently. The latter is preferred when the primary replica should not be slowed down by data synchronization, at the cost of possible data loss on failover.

An availability group can be defined as a group of databases put together for the purpose of high availability. For example, consider an application which depends on several databases (say 5 databases) to access and store data. While designing high availability for this application, one should ensure that all 5 databases are covered. Earlier, this type of requirement was addressed by configuring database mirroring for each database (when clustering was not an option because of its cost). With the availability groups feature, all these databases are put together and defined as one availability group. The high availability configuration is then done so that whenever a failover happens, it takes care of the availability of all five databases.

The basic requirement for configuring an availability group is WSFC. When an availability group is set up, one SQL Server instance hosts the active copy in the Primary role. This primary replica has read-write access and is the server with which the application interacts to perform the required operations on the databases. Along with this, further copies are defined in the Secondary role; these serve as failover copies. One to four secondary replicas can be configured.
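To make the replica roles and synchronization modes concrete, the following is a minimal T-SQL sketch of an availability group with two synchronous replicas for local high availability and one asynchronous, readable replica for disaster recovery. The group name, database names, node names and endpoint URLs are hypothetical. It assumes that WSFC is configured, Always On is enabled on every instance, database mirroring endpoints exist, and the databases use the full recovery model and have been backed up.

```sql
-- Run on the instance that will host the primary replica.
CREATE AVAILABILITY GROUP [AG_App]
WITH (AUTOMATED_BACKUP_PREFERENCE = SECONDARY)   -- prefer backups on a secondary
FOR DATABASE [AppDb1], [AppDb2]                  -- databases that fail over together
REPLICA ON
    N'NODE1' WITH (
        ENDPOINT_URL      = N'TCP://node1.example.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
    ),
    N'NODE2' WITH (
        ENDPOINT_URL      = N'TCP://node2.example.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
    ),
    N'NODE3' WITH (
        ENDPOINT_URL      = N'TCP://node3.example.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,  -- remote site, no commit delay
        FAILOVER_MODE     = MANUAL,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
    );

-- Afterwards, on each secondary instance:
--   ALTER AVAILABILITY GROUP [AG_App] JOIN;
-- and, once the databases have been restored WITH NORECOVERY:
--   ALTER DATABASE [AppDb1] SET HADR AVAILABILITY GROUP = [AG_App];
```

Mixing synchronous replicas nearby with an asynchronous replica at a remote site is one common way to trade commit latency against data protection; other layouts are equally possible.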

Contained databases

Contained databases are a new feature in SQL Server 2012 which supports the concept of containment and tries to align the SQL Server (on-premise) and SQL Azure (cloud) technologies. Here, the database is treated as an application package which can be easily moved between SQL Server instances. All the database settings and metadata required to define the database reside inside the database itself, and there is no dependency on the server instance.

A database can be fully contained, partially contained or uncontained. Full containment is provided by SQL Azure databases, which can be easily moved to other SQL Server instances. As of now, only partial containment is supported for on-premise SQL Server. The two prominent features in SQL Server 2012 that support containment are:

• Collation: The collation information for the database, which was stored in the master database on the server instance, is now stored inside the database itself. The database retains its collation even when it is moved onto another server with a different collation. The same collation applies to the temporary tables that are created inside the database.

• Contained authentication: There is no longer a need to create SQL login objects to authenticate users at the server instance. With contained user objects, a user can log directly into the contained database in SQL Server Management Studio (SSMS) and start working. The user remains a guest user on the server instance and won't be able to access other databases on the server instance.

The option for creating contained databases needs to be enabled on the SQL Server 2012 server instance as shown below. Once this setting is enabled, the containment type of a database can be set to Partial or None in Database Properties --> Options.
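The same settings can also be applied with T-SQL instead of the SSMS dialogs. The following is a minimal sketch; the database name, user name and password are hypothetical placeholders.

```sql
-- Enable contained database authentication on the server instance.
EXEC sp_configure 'contained database authentication', 1;
RECONFIGURE;
GO

-- Switch an existing database to partial containment.
ALTER DATABASE [SalesDb] SET CONTAINMENT = PARTIAL;
GO

-- Create a contained user that authenticates at the database,
-- without a login on the server instance.
USE [SalesDb];
GO
CREATE USER [app_user] WITH PASSWORD = N'Use-A-Strong-Passw0rd!';
GO

-- List objects and features that cross the containment boundary.
SELECT class_desc, major_id, statement_type, feature_name
FROM sys.dm_db_uncontained_entities;
```

When connecting as the contained user, the database name has to be specified explicitly in the connection, because there is no server-level login to fall back on.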

The concept of containment is to realize the database as a package which can be moved between SQL Server instances. Categorizing database objects as contained and uncontained ensures that the application boundary is enforced. The task of deploying the contained database to another database server still needs to be carried out, and DACPAC helps to move databases between environments easily.

Figure 3: Contain DB Settings

Relevance of DACPAC with contained databases

DACPAC, or data-tier application, is a feature introduced in SQL Server 2008 R2 in which a SQL Server database can be implemented as a Visual Studio project and built into a deployment package. This deployment package can subsequently be used to deploy the database across environments. As part of SQL Server 2012, and specifically SQL Server Data Tools, some enhancements have been introduced for creating data-tier applications (for more information, please refer to the section on SQL Server Data Tools).

A SQL Server database, either contained or uncontained, can be registered as a data-tier application by right-clicking the database and selecting Tasks --> Register as Data-tier Application. This creates a DAC entry in the Management folder on the server instance as shown below. The validation check during DAC creation makes sure that no uncontained objects or features are implemented in the database, thus aligning it to SQL Azure.

Figure 4: Data Tier Application


Once a new version of the DAC has been built using Visual Studio, it can be used to upgrade the deployed application by right-clicking the existing DAC in the Management folder and selecting Upgrade Data-tier Application.

Contained databases and DAC are aligned with SQL Azure and improve the movement of databases between on-premise and cloud versions of SQL Server. Though containment is a nice feature to start with, there are some limitations which the user has to face while using contained databases:

• Replication, CDC and change tracking cannot be used

• Numbered procedures are not supported

• Schema-bound objects that depend on built-in functions with collation changes are not supported

• Databases cannot be accessed in the same security context; contained authentication has to be used

Going forward, more features are expected to be supported in SQL Azure, and new features are expected to be released in both SQL Server versions to align the technologies. This would help to achieve full containment and further improve the seamless portability of databases between on-premise and cloud. Also, by realizing a database as a deployment package using data-tier applications, issues with maintaining different versions of databases are reduced and deployment becomes easier.

Columnstore indexes

Columnstore indexes, introduced in SQL Server 2012, are a new way of storing and accessing the data residing in SQL tables. They provide improved query processing and speed up data analysis for business users. This is hugely helpful in data warehouse scenarios, where a significant amount of time is spent on creating aggregations, summary tables or views in order to provide business value to the end user, and for MIS reports on large relational databases. Using columnstore indexes, it is now possible to query data warehouse tables containing millions of records directly and analyze the results quickly.

When a columnstore index is created on a fact table, each column of the table is stored in separate disk pages, rather than storing rows in a disk page as in the case of traditional indexes. The figure below shows the arrangement of columns in disk pages.

The benefit of such an arrangement is that only the columns needed to execute the query are read, which in turn reduces I/O. These frequently accessed columns also remain in memory, which improves query performance. In addition, because data is stored column-wise, the values can be highly compressed, since a particular column tends to contain similar values. SQL Server makes use of the VertiPaq (now called xVelocity) technology, which provides high compression and better buffer hit rates. The same technology is used in PowerPivot and in the newly introduced Business Intelligence Semantic Model (BISM) tabular models in Analysis Services.

The syntax for creating a column store index is as follows:

CREATE NONCLUSTERED COLUMNSTORE INDEX NewNCCSI ON DemoTable(Column1,Column2,Column3)

Figure 5: Columnstore Index Creation

Alternatively, you can create it in Management Studio by right-clicking the Indexes folder and selecting New Index --> Non-Clustered Columnstore Index. The window below opens, where you can add the columns to be included in the index.


Note that only specific data types can be included in the index (int, bigint, float, char, varchar, nvarchar, decimal, money, bit, date, time etc.). Also, there can be only one nonclustered columnstore index per table, and it cannot contain SPARSE columns.

The screenshots show that with a columnstore index the query took barely a second, even though the fact table contains around 17 million records. Columnstore indexes can cater to more complex scenarios as well and, if the query is properly structured, can provide huge performance benefits. However, please note that even though the index has been created, it is up to the optimizer when to use it. The optimizer may still use the traditional clustered/non-clustered indexes if they fit a particular scenario better.

Columnstore indexes are beneficial in scenarios where there is a need to quickly analyze large amounts of data residing in a data warehouse and obtain useful information without maintaining dimensional models and aggregations. This in turn reduces development costs and increases business value through rapid exploration. However, columnstore indexes should not be thought of as a replacement for UDM and cubes, which will still exist and remain the first choice for data warehouse scenarios. As mentioned in the previous paragraph, columnstore indexes are used only where the query optimizer finds them apt; they won't be used if the query uses a seek operation. Another point worth noting is that a table with a columnstore index cannot be updated; the index needs to be removed (or disabled) and rebuilt around each incremental update.
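The points above can be sketched in T-SQL using the DemoTable and NewNCCSI names from the earlier syntax example. The staging table DemoTable_Staging is hypothetical, and Column2 is assumed to be numeric so that it can be aggregated.

```sql
-- An aggregation that touches only two columns; this is the kind of
-- query for which the optimizer may choose the columnstore index.
SELECT Column1, SUM(Column2) AS TotalColumn2
FROM DemoTable
GROUP BY Column1;

-- The same query with a hint that makes the optimizer ignore the
-- columnstore index, which is useful for comparing execution plans.
SELECT Column1, SUM(Column2) AS TotalColumn2
FROM DemoTable
GROUP BY Column1
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);

-- Incremental load: the table is read-only while the index is
-- enabled, so disable the index, load the data, then rebuild it.
ALTER INDEX NewNCCSI ON DemoTable DISABLE;

INSERT INTO DemoTable (Column1, Column2, Column3)
SELECT Column1, Column2, Column3
FROM DemoTable_Staging;

ALTER INDEX NewNCCSI ON DemoTable REBUILD;
```

For large fact tables, rebuilding the whole index on every load can be expensive, so it is worth measuring the rebuild time against the load window before adopting this pattern.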

Figure 6: Columnstore Index Execution Plan

Spatial Data enhancements

Spatial data types were introduced in SQL Server 2008 to provide the ability to store location information (geospatial data) in SQL databases. Along with the data types came a host of T-SQL methods and functions which can be used for analyzing and querying the location information stored in these data types. Additionally, there were provisions to improve the performance of these queries using spatial indexes.

As part of SQL Server 2012, there have been some major upgrades to the spatial data types:

• Introduction of circular arcs: Circular arc objects can be created in SQL Server alongside line strings and polygons. A collection of zero or more continuous arc segments is termed a CircularString. A geometric collection object can be created from CircularString and LineString objects; this collection is termed a CompoundCurve. In the same way, a collection of CircularStrings, LineStrings and CompoundCurves having at least 4 points and the same X and Y coordinates for the start and end point is termed a CurvePolygon. Using circular arcs, locations on the earth such as curved roads or water bodies can be represented more accurately. There are also new methods such as BufferWithCurves(), CurveToLineWithTolerance(), STNumCurves(), STCurveN() and STCurveToLine() which can be used for circular arc segments.

• Improved precision: The precision for storing longitude and latitude values in spatial data objects has been improved to 48 bits, compared to 27 bits in earlier versions. This helps in plotting locations on a map with more accuracy and when using spatial methods during analysis.

• Full globe support: The restriction that a spatial object should be no larger than a single hemisphere has been removed. In SQL Server 2012, an object can be created which represents the complete earth. This is done with the keyword FULLGLOBE during object creation. This helps in representing large objects which earlier had to be implemented as a combination of different spatial objects.

• Spatial index improvements: In SQL Server 2012, there is now an option during spatial index creation to set the grid system to be used. The two options available are auto grid and manual grid. Manual grid stands for the traditional grid system (with 4 grid levels) used in SQL Server 2008, while auto grid has been introduced in SQL Server 2012 with 8 grid levels for better approximation of objects. The grid is specified based on the spatial data type: GEOGRAPHY_AUTO_GRID or GEOMETRY_AUTO_GRID.
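The enhancements listed above can be tried out with a few T-SQL statements. The following is a minimal sketch: the coordinates are arbitrary, and the table dbo.Stores with a geography column Location is hypothetical (a spatial index also requires the table to have a clustered primary key).

```sql
-- A circular arc through three points, and its approximation
-- as a plain line string.
DECLARE @arc geometry = geometry::Parse('CIRCULARSTRING(0 0, 1 1, 2 0)');
SELECT @arc.STNumCurves()              AS CurveCount,
       @arc.STCurveToLine().ToString() AS LineApproximation;

-- A compound curve: a straight segment followed by an arc.
DECLARE @road geometry = geometry::Parse(
    'COMPOUNDCURVE((0 0, 2 0), CIRCULARSTRING(2 0, 3 1, 4 0))');
SELECT @road.STCurveN(2).ToString() AS SecondCurve;

-- An object covering the whole earth.
DECLARE @world geography = geography::Parse('FULLGLOBE');
SELECT @world.STArea() AS AreaInSquareMetres;

-- A spatial index that uses the new auto grid and compression.
CREATE SPATIAL INDEX SIdx_Stores_Location
ON dbo.Stores (Location)
USING GEOGRAPHY_AUTO_GRID
WITH (CELLS_PER_OBJECT = 16, DATA_COMPRESSION = PAGE);
```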


Another feature that is part of the performance improvements is the SPATIAL_WINDOW_MAX_CELLS query hint, which can be used during spatial analysis as follows:

SELECT * FROM [table] t WITH (SPATIAL_WINDOW_MAX_CELLS = 1024) WHERE t.geom.STIntersects(@window) = 1;

This query hint helps in controlling the tradeoff between performance and index efficiency. The default values for the geometry and geography data types are 512 and 768 respectively, but this can be changed to suit specific needs. Compression is now also supported for spatial indexes. For DBAs, two special helper stored procedures are available to evaluate the histogram:

• sp_help_spatial_geography_histogram: Helps determine how many cells a given feature intersects. Outputs a grid at a given level and resolution. Useful for performance tuning.

• sp_help_spatial_geometry_histogram: Helps determine how geometry index cells are created and in turn how many features intersect each cell.

Both of these procedures help in visually analyzing and tuning indexes.

Overall, there have been significant improvements for spatial data in SQL Server 2012, and this in turn opens up a lot of options for creating spatial applications. The support for curve objects, the full globe and the performance improvements in spatial indexes provide opportunities to build complex spatial queries which otherwise had to be implemented on the application front. Also, with improved precision, location information can be accurately represented and plotted onto maps and visualizations to provide informed value and help in making better decisions.

Search features & enhancements

Traditionally, the Full Text Search (FTS) feature is used to perform searches on objects which are stored in blob-like storage in the database. It has a separate search engine which performs the search based on catalogs and indexes that are built while creating the objects. Up to SQL Server 2005, the FTS engine was outside the core SQL Server database engine, which raised concerns about scalability, security etc. This was addressed in the SQL Server 2008 release, where FTS was fully integrated with the database engine and its functionality was further enhanced. While the SQL Server 2008 R2 release brought quite a few enhancements addressing some of the performance related issues, SQL Server 2012 introduces semantic search along with FTS to cater to the requirements of searching data stored in blobs. The section below describes the enhancements made to FTS in the SQL Server 2012 release and provides an overview of the semantic search capabilities.

Full Text Search (FTS):

Following are some of the key enhancements made in FTS as part of the SQL Server 2012 release:

• Optimized query performance while index updates are happening in parallel. An index update takes a shared schema lock, which used to cause a bottleneck. There has been a significant boost in performance (10X in most cases, as published by Microsoft) without having to change the structures.

• Support for property-scoped searches has been enhanced: users can issue queries on the properties of documents which are stored in a table with FTS enabled. Users can query document properties such as the author of the document using the CONTAINS clause, without maintaining the author as a separate column.

• A custom NEAR operator has been introduced, which enables users to search with the following options:

  ◦ Search for words with a maximum number of words between them. For example, to identify documents which contain the terms "SQL Server" and "expertise" with at most 5 words between them, write WHERE CONTAINS(Resume, 'NEAR(("SQL Server", expertise), 5, FALSE)'). Here, the order of "SQL Server" and "expertise" is not mandatory for the record to be part of the result set.

  ◦ If it is written as WHERE CONTAINS(Resume, 'NEAR(("SQL Server", expertise), 5, TRUE)'), the order of the words is mandatory. SQL Server then returns only content in which "SQL Server" is followed by "expertise".

For more information on these enhancements, go through the links given in the References section.
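As complete statements, the NEAR and property-scoped searches described in this section could look like the following minimal sketch. The table dbo.Candidates and its columns are hypothetical; the Resume column is assumed to hold documents and to have a full-text index, and the property search additionally assumes that a search property list containing the Author property is associated with that index.

```sql
-- Terms within 5 words of each other, in any order.
SELECT CandidateId, CandidateName
FROM dbo.Candidates
WHERE CONTAINS(Resume, 'NEAR(("SQL Server", expertise), 5, FALSE)');

-- Same distance, but "SQL Server" has to come before "expertise".
SELECT CandidateId, CandidateName
FROM dbo.Candidates
WHERE CONTAINS(Resume, 'NEAR(("SQL Server", expertise), 5, TRUE)');

-- Property-scoped search on the Author property of the documents.
SELECT CandidateId, CandidateName
FROM dbo.Candidates
WHERE CONTAINS(PROPERTY(Resume, 'Author'), 'Kalra');
```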


Semantic search:

FTS helps users find the documents that meet the search criteria. However, it has the limitation that it cannot fetch documents related to the criteria. This is where semantic search comes to the rescue. Statistical Semantic Search goes one step further and enables users to discover statistically relevant insights through efficient key phrase searches. Semantic search extracts key phrases across the documents and ranks them based on the occurrence of the word/key phrase; these can then be used to carry out search operations and return the documents relevant to the criteria.

Currently, three functions are provided for semantic search:

• SEMANTICKEYPHRASETABLE: Used to find key phrases in a document.

• SEMANTICSIMILARITYTABLE: Used to find similar or related documents.

• SEMANTICSIMILARITYDETAILSTABLE: Returns a table containing the key phrases common to two documents whose content is semantically similar.

Note:

• Semantic search needs a full-text index to be defined.

• Storing the documents in a File Table is a convenient way to work with semantic search. For more information, refer to the File Table section in this document.
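The three functions can be sketched as follows. The table dbo.Documents (integer key DocumentId, primary key index PK_Documents, document column Content, type column FileExtension) and the catalog name are hypothetical, and the semantic language statistics database is assumed to be installed and registered on the instance.

```sql
-- Full-text index with statistical semantics enabled on the column.
CREATE FULLTEXT CATALOG DocumentCatalog;

CREATE FULLTEXT INDEX ON dbo.Documents
    (Content TYPE COLUMN FileExtension LANGUAGE 1033 STATISTICAL_SEMANTICS)
    KEY INDEX PK_Documents
    ON DocumentCatalog;
GO

DECLARE @doc_key int = 1, @other_key int = 2;

-- Top key phrases of one document.
SELECT TOP (10) keyphrase, score
FROM SEMANTICKEYPHRASETABLE(dbo.Documents, Content, @doc_key)
ORDER BY score DESC;

-- Documents most similar to that document.
SELECT TOP (5) matched_document_key, score
FROM SEMANTICSIMILARITYTABLE(dbo.Documents, Content, @doc_key)
ORDER BY score DESC;

-- Key phrases that make two documents similar.
SELECT keyphrase, score
FROM SEMANTICSIMILARITYDETAILSTABLE(
         dbo.Documents, Content, @doc_key, Content, @other_key)
ORDER BY score DESC;
```

The queries return meaningful rows only after the full-text index has finished populating.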

SQL Server Data Tools (SSDT)

SQL Server Data Tools (code name: Juneau) is a new integrated tool for database development, to be made available in the next version of Visual Studio. It brings developers the familiar Visual Studio tools along with standard database features like IntelliSense, a T-SQL editor, a visual table designer and a lot more. With this new tool, database development offers the same experience as application development. People familiar with previous versions of database projects will find this tool an evolution of the existing project templates. It also has a SQL Server Management Studio (SSMS) style browsing structure for hardcore database developers.

SSDT is aimed at professional database and application developers, as well as administrators. Some of the core features are listed below.

Core Features

• Online database development with an SSMS-like tree view in Server Explorer

• Import from an existing database, snapshot or DACPAC project

• IntelliSense support across all the database object files

• New designer tool for a visual experience

• Refactoring of table names, field names, etc. without losing data

• Navigation tools for offline development in the T-SQL editor

• Errors immediately show up in the Error List pane and are platform specific

• Previewing database updates with an option to publish to the target database

• Schema comparison between the current project and the existing state of the database

• Option to sync the changes once the schema comparison has been reviewed

Figure 7: SQL Server Data Tools (Source: TechEd 2011)

Online vs Offline Project Development

Online Mode

SSDT provides an online development platform in the same way as Management Studio (SSMS): you can connect to an instance of SQL Server through Server Explorer. You will see the same hierarchical tree structure as in SSMS, along with the T-SQL editor and IntelliSense support.

For troubleshooting errors, SSDT allows editing in either the T-SQL editor or the table designer. Errors show up immediately in the error pane, which makes it easy to follow them up.

Offline Mode

You can also create a database project using SSDT and import an existing database for offline development. This imports the schemas and database objects into the solution. While working in offline mode, the same T-SQL editor and designer tools still work. This means you still get the Management Studio style interface to work with different databases in different versions of SQL Server (including SQL Azure).

The T-SQL editor for offline development also provides two navigation tools that are useful for Visual Studio developers: Go To Definition and Find All References. Right-clicking a table in a stored procedure or function gives you the option to Find All References to the table in the project. You can also right-click the table name and click Go To Definition, which opens the T-SQL for the table definition.

Project Deployment on target platform, Code Analysis

Using the project properties dialog box in Visual Studio, one can change the target platform to which the project is deployed. Currently supported platforms are SQL Server 2005, SQL Server 2008/2008 R2, SQL Server 2012 (code name Denali) and SQL Azure. You can also create a DACPAC file or a SQL script file as output.

Right-clicking the project and running Code Analysis reports errors in the scripts based on the platform you selected. The same happens when you build the project.

Once all the errors have been removed, you can publish the project. You can also define your publishing preferences and save them in a profile to be reused in future deployments.

Figure 8: SSDT Project Deployment



SQL Compare and Sync

This is a roll-forward of the Visual Studio Database Edition power tools, now integrated into the main product. Schema Compare provides out-of-the-box functionality to visually compare two databases (right-click the solution -> Schema Compare). You can then either update the target database to bring the two in sync, or generate a script to be executed at a later point.

Overall, SQL Server Data Tools provides functionality for both app-tier and data-tier applications, letting developers work against any supported SQL platform from within Visual Studio. Database administrators can also use it to script out objects that need to be moved across servers, to store versions in TFS, and to compare the changes made by developers with the existing database before applying them to the actual environment.

Figure 9: SSDT Schema Compare

File Table

Most of us are aware of the FILESTREAM feature introduced in SQL Server 2008, which integrates the relational database engine with the Windows file system to provide efficient storage and management of unstructured data. FILESTREAM was one of the major features introduced to store unstructured data in SQL Server while maintaining transactional consistency between structured and unstructured data.

File Table is a new feature in SQL Server 2012 that builds on the existing FILESTREAM functionality. It takes the FILESTREAM concept one step further by enabling non-transactional access to file table data. This means the data can be accessed through T-SQL in a transactional way, and also through the Windows APIs as if it were an ordinary file. In effect, a file table exposes a SQL table as a folder that can be browsed through Windows Explorer. The directory structure and the file attributes are stored in the table as columns. Files can be bulk loaded, updated and managed in T-SQL like any other column data. SQL Server also supports backup and restore of file tables.

Figure 10: File Table (table columns such as stream_id, file_stream and name, mapped to an NTFS folder)


File Table is a great step towards managing, inside SQL Server, unstructured data that currently resides as files on servers. Enterprises can move their file stores into file tables, which brings the files under the administration capabilities that SQL Server provides out of the box. At the same time, existing Windows applications that access these files through the file system continue to work unchanged, using non-transactional access to the file table data.
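As a rough illustration of the setup this implies, the following T-SQL sketch enables non-transactional access on a database and creates a file table in it. The database name DocumentDB, the table name dbo.DocumentStore and the directory names are hypothetical, and the sketch assumes that FILESTREAM is already enabled on the instance and that the database already has a FILESTREAM filegroup.

```sql
-- Assumption: FILESTREAM is enabled on the instance and the (hypothetical)
-- database DocumentDB already contains a FILESTREAM filegroup.

-- Allow Windows applications to reach file table data without a transaction,
-- and name the database-level directory that appears under the share.
ALTER DATABASE DocumentDB
    SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL,
                    DIRECTORY_NAME = N'DocumentDB');
GO

USE DocumentDB;
GO

-- Create the file table. No column list is given because the schema is fixed;
-- only the directory name and the file name collation can be specified.
CREATE TABLE dbo.DocumentStore AS FILETABLE
WITH (
    FILETABLE_DIRECTORY = 'DocumentStore',
    FILETABLE_COLLATE_FILENAME = database_default
);
GO

-- Return the UNC path under which the file table is visible to Windows.
SELECT FileTableRootPath(N'dbo.DocumentStore') AS unc_path;
```

The path returned by the last statement is the folder that an existing Windows application would be pointed at instead of its current file share.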

File Table Schema

A FileTable represents a hierarchy of directories and files. The path_locator and parent_path_locator fields are used to represent the files and the hierarchy they belong to. The is_directory attribute identifies whether a row represents a file or a directory.

A file table has a fixed schema that currently cannot be altered. Every row contains the following items:

Figure 11: File Table [Pre-Defined Schema of File Table]
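To make the role of these columns concrete, the query below lists the contents of a file table together with each item's position in the hierarchy. It is a sketch only: dbo.DocumentStore is a hypothetical file table name, while the columns and methods used (path_locator, parent_path_locator, is_directory, GetLevel, GetFileNamespacePath) are part of the fixed file table schema.

```sql
-- Assumption: dbo.DocumentStore is an existing file table (hypothetical name).
SELECT  name,
        is_directory,                               -- 1 = directory, 0 = file
        file_type,                                  -- file extension
        cached_file_size,                           -- size in bytes
        path_locator.GetLevel()            AS depth,         -- nesting level in the hierarchy
        file_stream.GetFileNamespacePath() AS relative_path  -- path relative to the share
FROM    dbo.DocumentStore
ORDER BY path_locator;

-- Items that sit directly in the root of the file table have no parent.
SELECT  name, is_directory
FROM    dbo.DocumentStore
WHERE   parent_path_locator IS NULL;
```

Ordering by path_locator returns the rows in hierarchy order, because the column is of the hierarchyid type.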

File table internals

A File Table is a specialized user table with a pre-defined, fixed schema. The only options you can specify when creating one are the FileTable directory name and the file name collation. Additional objects are created automatically along with the file table. To get a list of these objects, run this query:

SELECT OBJECT_NAME(object_id) AS [Object Name], OBJECT_NAME(parent_object_id) AS [File Table]
FROM sys.filetable_system_defined_objects;

Loading files into a file table is as simple as dragging and dropping them into the file table directory in Windows Explorer. You can also use command-line tools such as XCopy or RoboCopy to achieve the same. SQL Server intercepts these file system operations and automatically inserts or deletes the corresponding rows as files are added to or removed from the file table directory.

Overall, File Table looks like a powerful tool that brings the Windows APIs together with file data stored in SQL Server. It lets an application integrate its storage components with the database, and provides integrated SQL Server services for data management and analysis over unstructured data and its metadata.
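Besides drag and drop, files can also be loaded and removed through ordinary T-SQL statements, as mentioned earlier. The following is a minimal sketch under the assumption that a file table named dbo.DocumentStore exists; the table name, file name and file content are hypothetical.

```sql
-- Assumption: dbo.DocumentStore is an existing file table (hypothetical name).

-- Add a file through T-SQL: only the name and the content are supplied,
-- the remaining columns of the fixed schema take their default values.
INSERT INTO dbo.DocumentStore (name, file_stream)
VALUES (N'readme.txt', CAST('Loaded through T-SQL' AS varbinary(max)));

-- The new row is also visible as a file in the file table directory.
SELECT name, cached_file_size
FROM   dbo.DocumentStore
WHERE  name = N'readme.txt';

-- Deleting the row removes the file from the directory as well.
DELETE FROM dbo.DocumentStore
WHERE  name = N'readme.txt'
  AND  is_directory = 0;
```

Whether to load through T-SQL or through the file system is largely a question of which side the application already lives on; both paths end up in the same table.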


Conclusion

As always, SQL Server 2012 is shaping up to be yet another exciting release, with many prominent features that benefit the user and developer community and customers, not only from the database engine and storage perspective but also in its BI and cloud offerings. We hope this document, which draws on our experience of working with the early bits, gives readers an initial handle on the new capabilities and enhancements of SQL Server 2012. We will share our experiences in the BI space of SQL Server 2012 in another article.

References

Spatial Search

http://msdn.microsoft.com/en-us/library/ff929187.aspx

SSDT

http://www.jamesserra.com/archive/2011/07/sql-server-%E2%80%9Cdenali%E2%80%9D-sql-server-developer-tools-codename-juneau/

http://msdn.microsoft.com/en-us/data/gg427686

Full Text Search and Semantic search

http://technet.microsoft.com/en-gb/library/cc721269(SQL.100).aspx#_Toc202506226

http://blogs.msdn.com/b/sqlfts/archive/2011/04/12/sql-server-2008-r2-fulltext-search-fix-for-improving-queries-performance-during-concurrent-index-updates-http-support-microsoft-com-kb-958947.aspx

http://msdn.microsoft.com/en-us/library/hh213079(v=SQL.110).aspx

http://blogs.msdn.com/b/sqlfts/archive/2011/06/02/fulltext-search-improvements-in-sql-server-denali-ctp1.aspx

http://msdn.microsoft.com/en-us/library/ms187787(v=SQL.110).aspx

http://blogs.msdn.com/b/sqlfts/archive/2011/07/21/introducing-fulltext-statistical-semantic-search-in-sql-server-codename-denali-release.aspx

http://msdn.microsoft.com/en-us/library/ms143544.aspx

http://msdn.microsoft.com/en-us/library/gg492075.aspx

File Table

http://www.infosys.com/microsoft/resource-center/Documents/SQLServer-FILESTREAM-BLOBs.pdf

http://msdn.microsoft.com/en-us/library/gg492084(v=SQL.110).aspx

http://ozamora.com/2010/11/denali-the-next-release-sql-server/

http://coolthingoftheday.blogspot.com/2011/07/sql-server-denali-filetables-feature.html

http://lennilobel.wordpress.com/2011/09/11/its-a-file-system-its-a-database-table-its-sql-server-denali-filetable/

Acknowledgements

We would like to thank Vinod Kumar, Pinal Dave and Vikram Rajkondawar from Microsoft for taking time out of their busy schedules to review our paper and for providing thoughtful comments that helped us improve its content. We would also like to thank our manager, Naveen Kumar, for his kind support and for providing us with the resources to implement and try out the new features of SQL Server.