Copying, Managing, and Transforming Data With DTS
Defining Bulk Insert Task Functionality
Quickly Loads Data from a File into SQL Server
Encapsulates the Transact-SQL Bulk Insert Statement
Supports Table or View Destinations in SQL Server
Loads Data with No Applied Transformations
Supports Format Files to Specify File Layout
Requires Membership in the sysadmin or bulkadmin Fixed Server Role
The Bulk Insert Task is One of Three Ways to Run SQL Server Bulk Copy Operations
Sidebar: SQL Server Bulk Copy Operations
1) Bcp Utility
2) Bulk Insert Task or T-SQL Bulk Insert Statement
3) Bulk Copy APIs for OLE DB, ODBC, and DB-Library Applications
What Do Bulk Copy Operations Offer?
Allow Fast Loading of Data into SQL Server
Allow You to Configure Data Load Batches
Allow You to Control Logging Operations
Ways to Access Bulk Copy Operations
Defining the Sales_stage Table Load
[Diagram: DTS loads a tab-delimited file into Polaris]
Using the Bulk Insert Task to Load Tab-delimited File Data into Sales_stage
Loading Sales_stage with Data Bound for Sales_fact
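The Bulk Insert Task wraps the T-SQL BULK INSERT statement. That statement's options include FIELDTERMINATOR, which handles a tab-delimited file, and BATCHSIZE. The Python sketch below is only a rough analogy and does not use the DTS API. It uses the standard csv and sqlite3 modules, with SQLite standing in for SQL Server, to copy tab-delimited fields into a sales_stage table with no transformations, committing every batch_size rows. The sales_stage column names are an assumption borrowed from the fact-table example later in these slides. The second sample row is hypothetical.

```python
import csv
import io
import sqlite3

SALES_STAGE_DDL = (
    "CREATE TABLE sales_stage (customer_id TEXT, product_id INTEGER, "
    "order_date TEXT, quantity_sales INTEGER, amount_sales INTEGER)"
)
INSERT_SQL = "INSERT INTO sales_stage VALUES (?, ?, ?, ?, ?)"

def bulk_load(conn, text_file, batch_size=1000):
    """Copy tab-delimited rows into sales_stage with no transformations,
    committing once per batch. Returns the number of rows loaded."""
    # QUOTE_NONE: quote characters are treated as ordinary data.
    reader = csv.reader(text_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    batch, loaded = [], 0
    for row in reader:
        if not row:          # skip blank lines
            continue
        batch.append(row)    # fields pass through untouched
        if len(batch) == batch_size:
            conn.executemany(INSERT_SQL, batch)
            conn.commit()
            loaded += len(batch)
            batch = []
    if batch:                # final partial batch
        conn.executemany(INSERT_SQL, batch)
        conn.commit()
        loaded += len(batch)
    return loaded

conn = sqlite3.connect(":memory:")
conn.execute(SALES_STAGE_DDL)
# First row mirrors the slides' sample data; the second is hypothetical.
sample = io.StringIO("ALFI\t123\t2000-01-01\t400\t10789\n"
                     "ANATR\t124\t2000-01-02\t15\t300\n")
print(bulk_load(conn, sample, batch_size=1))
```

All fields arrive as text. SQLite's column affinity then converts the numeric ones, which loosely parallels relying on the destination table's types when no transformation is applied.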
Defining Execute SQL Task Functionality
Executing SQL Statements
Source database must understand SQL syntax
SQL statement determines task performance
Task supports single or multiple SQL statements
You can create queries in the DTS Query Designer
Running Parameterized Queries
Input parameters
Output parameters
Using Parameterized Queries
Understanding Global Variable Basics
User-defined storage locations
Information is shared across package steps
Using Parameters with Global Variables
Assign global variable values to query input parameters
Store query results to a global variable with output parameters
Creating Dynamic Queries
SELECT *
FROM product_dim
WHERE product_name = ?
AND category_name = ?
Question Marks (?) Represent Query Parameters
The Parameter’s Position in the Query Determines Its Name: Parameter 1, Parameter 2
Global Variables Provide Data to Input Parameters
Parameter 1 is supplied by the global variable ProductName
Parameter 2 is supplied by the global variable CategoryName
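SQLite uses the same ? marker, so Python's sqlite3 module can mimic how DTS binds global variables to parameters by position. In this sketch the global variables are a plain dict. The Chai/Beverages values, the Chang row, and the product_dim columns other than product_name and category_name are hypothetical.

```python
import sqlite3

QUERY = """
SELECT *
FROM product_dim
WHERE product_name = ?
AND category_name = ?
"""

# Hypothetical global variable values, modeled as a dict.
global_variables = {"ProductName": "Chai", "CategoryName": "Beverages"}

# Entry i is Parameter i+1: the global variable feeding the (i+1)-th "?".
parameter_mapping = ["ProductName", "CategoryName"]

def run_parameterized(conn, query, mapping, variables):
    """Bind global variable values to the ? markers by position and run the query."""
    params = tuple(variables[name] for name in mapping)
    return conn.execute(query, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_dim (prod_key INTEGER, product_id INTEGER, "
             "product_name TEXT, category_name TEXT)")
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)",
                 [(25, 123, "Chai", "Beverages"), (26, 124, "Chang", "Beverages")])
print(run_parameterized(conn, QUERY, parameter_mapping, global_variables))
```

Because binding is purely positional, swapping the two entries in the mapping silently compares product_name with the category value. That mistake returns no rows rather than raising an error.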
Storing Query Results
Storing Row Values
SELECT begin_date,
       end_date
FROM financial_period
WHERE quarter = 1
Output parameters: begin_date is stored in global variable BeginDate, end_date in global variable EndDate
Storing Entire Rowsets
SELECT *
FROM product
Output parameter: the entire rowset is stored in global variable Product
Store Query Results in Global Variables
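The sketch below shows the two output parameter styles. As before, sqlite3 stands in for the source database and a dict stands in for the package's global variables. The financial_period dates and product rows are hypothetical. Reading only the first returned row in the row-value case is a choice of this sketch.

```python
import sqlite3

def store_row_values(conn, variables):
    """Row value output parameters: each column of one row goes to its own global variable."""
    row = conn.execute(
        "SELECT begin_date, end_date FROM financial_period WHERE quarter = 1"
    ).fetchone()  # this sketch uses the first row returned
    if row is None:
        raise LookupError("no financial_period row for quarter 1")
    variables["BeginDate"], variables["EndDate"] = row

def store_rowset(conn, variables):
    """Rowset output parameter: every row is kept in a single global variable."""
    variables["Product"] = conn.execute("SELECT * FROM product").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE financial_period (quarter INTEGER, begin_date TEXT, end_date TEXT)")
conn.execute("INSERT INTO financial_period VALUES (1, '2000-01-01', '2000-03-31')")
conn.execute("CREATE TABLE product (product_id INTEGER, product_name TEXT)")
conn.executemany("INSERT INTO product VALUES (?, ?)", [(123, "Chai"), (124, "Chang")])

global_variables = {}
store_row_values(conn, global_variables)
store_rowset(conn, global_variables)
print(global_variables)
```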
Defining the Time_dim Data Load
DTS Runs the Time_dim_build Stored Procedure
Input Parameters:
- @p_start_date
- @p_end_date
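The slides do not show the body of Time_dim_build, and SQLite has no stored procedures. The function below is therefore a hypothetical stand-in. It assumes the procedure writes one time_dim row per day from @p_start_date through @p_end_date. The point is the data flow: values held in the BeginDate and EndDate global variables become the procedure's two input parameters. The time_dim columns are assumptions.

```python
import datetime as dt
import sqlite3

TIME_DIM_DDL = ("CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, "
                "calendar_date TEXT, year INTEGER, quarter INTEGER, month INTEGER)")

def time_dim_build(conn, p_start_date, p_end_date):
    """Hypothetical stand-in for Time_dim_build: one row per day,
    start and end dates inclusive. Returns the number of rows added."""
    day = dt.date.fromisoformat(p_start_date)
    end = dt.date.fromisoformat(p_end_date)
    rows = []
    while day <= end:
        quarter = (day.month - 1) // 3 + 1
        rows.append((day.isoformat(), day.year, quarter, day.month))
        day += dt.timedelta(days=1)
    conn.executemany(
        "INSERT INTO time_dim (calendar_date, year, quarter, month) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute(TIME_DIM_DDL)
# Values an earlier Execute SQL Task might have stored through output parameters.
global_variables = {"BeginDate": "2000-01-01", "EndDate": "2000-03-31"}
print(time_dim_build(conn, global_variables["BeginDate"], global_variables["EndDate"]))
```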
Defining the DTS Data Pump
DTS Mechanism for Moving and Transforming Data
Allows for High-speed Batch Copying of Data
Contains Supplied Data Transformations
Can Also Define ActiveX Script Transformations
Provides an Extensible COM-Based Architecture for Custom Transformations (C++)
Permits the Application of Transformation Logic to Specific Phases of a Data Pump Operation
Multiphase Data Pump
Understanding How the Data Pump Processes Data
[Diagram: an OLE DB or ODBC source feeds the DTS data pump (In, Out), which applies transforms such as ActiveX Script, Copy, Trim String, ..., and Custom, then writes to an OLE DB or ODBC destination]
1. Connects to the source and destination
2. Reads OLE DB metadata about source and destination columns
3. Gathers data transformation definitions
4. Implements the transformation
5. Writes completed record to the destination
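The five steps can be sketched as a loop in plain Python. This is a conceptual model, not the data pump's COM interfaces. Connecting and reading OLE DB metadata are reduced to receiving the source column names. Each transformation is a hypothetical (source columns, destination columns, function) triple.

```python
from typing import Callable, Sequence, Tuple

# (source columns, destination columns, function of the source values)
Transformation = Tuple[Sequence[str], Sequence[str], Callable[..., tuple]]

def data_pump(source_columns: Sequence[str], source_rows,
              transformations: Sequence[Transformation], write) -> None:
    """Conceptual model of the data pump's per-row processing."""
    # Steps 1-2: connecting and reading metadata are reduced to receiving
    # the source column names; check every referenced column exists.
    known = set(source_columns)
    for src_cols, _, _ in transformations:
        missing = set(src_cols) - known
        if missing:
            raise KeyError(f"unknown source columns: {sorted(missing)}")
    # Step 3: the transformation definitions are the `transformations` list.
    for values in source_rows:
        record = dict(zip(source_columns, values))
        dest = {}
        for src_cols, dest_cols, fn in transformations:
            # Step 4: apply one transformation to this row's source values.
            dest.update(zip(dest_cols, fn(*(record[c] for c in src_cols))))
        write(dest)  # Step 5: hand the completed record to the destination.

written = []
data_pump(
    ["customer_id", "company_name"],
    [(" alfi ", "Alfreds")],
    [
        (["customer_id"], ["customer_code"], lambda v: (v.strip().upper(),)),
        (["company_name"], ["company_name"], lambda v: (v,)),  # like Copy Column
    ],
    written.append,
)
print(written)
```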
Defining the Tasks That Transform Data
The Transform Data Task
Inserts
The Data Driven Query Task
Inserts
Updates
Deletes
The Parallel Data Pump Task
Processes hierarchical rowsets
Defining the Transform Data Task
Data Movement and Transformation Functionality
Copying data between heterogeneous data sources
Applying optional column level transformations
Extended Data Transfer Functionality
Supporting batch processing of data
Providing error-handling capabilities
Containing optimization settings for SQL Server destinations
Selecting Transformation Types
Transformation: Description
ActiveX Script: Invokes user-defined ActiveX scripts.
Copy Column: Copies data from source to destination.
DateTime String: Converts a date to a new destination format.
Lowercase String: Converts a string to lowercase characters.
Uppercase String: Converts a string to uppercase characters.
Middle of String: Extracts a substring of source data.
Trim String: Removes white space from a source string.
Read File: Copies the contents of a file to a destination column. A source column specifies the file path.
Write File: Copies the contents of a source column to a file. A second source column specifies the file path.
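Several of the supplied transformations reduce to simple string functions. The sketch below models them in Python as functions from source values to a tuple of destination values. The names are illustrative. The 1-based start position for Middle of String is an assumption, chosen to match T-SQL SUBSTRING. ActiveX Script, Read File, and Write File are left out.

```python
import datetime as dt

# Each function takes one or more source values and returns a tuple with
# one value per mapped destination column.

def copy_column(*values):
    return values

def lowercase_string(*values):
    return tuple(v.lower() for v in values)

def uppercase_string(*values):
    return tuple(v.upper() for v in values)

def trim_string(*values):
    return tuple(v.strip() for v in values)

def middle_of_string(start, count):
    """Assumes a 1-based start position, like T-SQL SUBSTRING."""
    def transform(*values):
        return tuple(v[start - 1:start - 1 + count] for v in values)
    return transform

def datetime_string(source_format, dest_format):
    """Parse each value with source_format and re-emit it with dest_format."""
    def transform(*values):
        return tuple(dt.datetime.strptime(v, source_format).strftime(dest_format)
                     for v in values)
    return transform

print(datetime_string("%m/%d/%Y", "%Y-%m-%d")("1/1/2000"))
```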
Defining Column Mappings
One-to-One Mappings
Symmetric Many-to-Many Mappings
Asymmetric Mappings
Creating Efficient Column Mappings
Minimizing the Number of Column Mappings
Using Many-to-Many Mappings When Possible
Grouping Common Transformations Together
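Grouping common transformations means collapsing one-to-one mappings that share a transformation type into a single many-to-many mapping per type. The Python sketch below does that bookkeeping. The Northwind column names are real, but the destination names and the transformation assigned to each column are hypothetical.

```python
def group_mappings(one_to_one):
    """Collapse (source_col, dest_col, kind) one-to-one mappings into one
    symmetric many-to-many mapping per transformation kind, keeping order."""
    grouped = {}
    for source_col, dest_col, kind in one_to_one:
        sources, dests = grouped.setdefault(kind, ([], []))
        sources.append(source_col)
        dests.append(dest_col)
    return [(kind, sources, dests) for kind, (sources, dests) in grouped.items()]

# Hypothetical Customer_dim mappings.
mappings = [
    ("CustomerID", "customer_id", "Copy Column"),
    ("CompanyName", "company_name", "Copy Column"),
    ("City", "city", "Trim String"),
    ("Country", "country", "Trim String"),
    ("Region", "region", "Copy Column"),
]
grouped = group_mappings(mappings)
print(len(mappings), "->", len(grouped))
```

Five one-to-one mappings become two. Each grouped mapping stays symmetric because the i-th source column still feeds the i-th destination column.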
Loading Customer_dim
Northwind OLTP SQL Server Database
Performance Settings
Enabling Fast Load
Using high-speed bulk copy processing
Accepting batches of transformed data
Only applies to SQL Server destinations
Using a Table Lock
Configuring Batch Size
Configuring Batch Size
Assembling Records into Groups
DTS commits records to the database as a group
Insert batch size sets the number of records in the group
Understanding Default Behavior
Insert batch size is 0
DTS assigns one batch for all records
Setting the Insert Batch Size
Value between 0 and 9999
Setting a value can improve performance
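The sketch below shows how an insert batch size groups commits, using sqlite3 as the destination and a hypothetical stage table. A size of 0 follows the default described above, where all records form one batch. In this sketch, an error in a later batch would leave the earlier batches already committed.

```python
import sqlite3

STAGE_DDL = "CREATE TABLE stage (id INTEGER, amount INTEGER)"

def insert_in_batches(conn, rows, insert_batch_size):
    """Insert rows into stage, committing once per batch.
    An insert_batch_size of 0 puts every row in a single batch.
    Returns the number of commits issued."""
    rows = list(rows)
    if not rows:
        return 0
    size = insert_batch_size or len(rows)  # 0 -> one batch for all records
    commits = 0
    for start in range(0, len(rows), size):
        conn.executemany("INSERT INTO stage VALUES (?, ?)", rows[start:start + size])
        conn.commit()
        commits += 1
    return commits

conn = sqlite3.connect(":memory:")
conn.execute(STAGE_DDL)
rows = [(i, i * 10) for i in range(2500)]
print(insert_in_batches(conn, rows, 1000))
```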
Defining SQL Solutions
You Can Use the Source Query of the Transform Data Task to Implement Data Transformations
The Source SQL Statement Must Be Understood by the Source Database
The Performance of the Source Query Depends on the SQL Statement
You Can Use Parameters in the Source Query to Create Dynamic Source SQL Statements
If You Use the Source Query to Manipulate Data, You Can Use the Copy Column Transformation to Load Data into the Destination
Applying SQL Solutions to Load Fact Tables
Using the Source Query to Join Staging Table Data to Dimension Tables
Retrieving Primary Key Values to Store as Foreign Keys on the Fact Table
Using a Copy Column Transformation in the Transform Data Task
Configuring Fast Load for SQL Server Destinations
Loading the Fact Table
Dimension Tables
customer_dim: 201, ALFI, Alfreds
product_dim: 25, 123, Chai
time_dim: 134, 1/1/2000
Source Data
customer id: ALFI, product id: 123, order date: 1/1/2000, quantity_sales: 400, amount_sales: 10,789
Sales Fact Data
cust_key: 201, prod_key: 25, time_key: 134, quantity_sales: 400, amount_sales: 10,789
Identifying Dimension Application Key Values in the Fact Table Source Data
Retrieving Primary Keys from Each Dimension Table to Assign Foreign Keys
Loading Sales_fact
Extracting Data from the Sales_stage Table
Assigning Foreign Keys by Retrieving Primary Keys from the Product_dim, Customer_dim, and Time_dim Dimensions
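The fact-table load can be sketched end to end with sqlite3, using the sample rows from the Loading the Fact Table slide. Dates are rewritten as ISO strings. The source query looks up each dimension's primary key. The load then copies the result columns unchanged, which is the role the Copy Column transformation plays in the Transform Data Task. Table and column names that the slides do not show, such as calendar_date and company_name, are assumptions.

```python
import sqlite3

SOURCE_QUERY = """
SELECT c.cust_key, p.prod_key, t.time_key, s.quantity_sales, s.amount_sales
FROM sales_stage AS s
JOIN customer_dim AS c ON c.customer_id = s.customer_id
JOIN product_dim AS p ON p.product_id = s.product_id
JOIN time_dim AS t ON t.calendar_date = s.order_date
"""

def load_sales_fact(conn):
    """Run the source query, then copy its columns unchanged into sales_fact."""
    rows = conn.execute(SOURCE_QUERY).fetchall()
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (cust_key INTEGER, customer_id TEXT, company_name TEXT);
CREATE TABLE product_dim (prod_key INTEGER, product_id INTEGER, product_name TEXT);
CREATE TABLE time_dim (time_key INTEGER, calendar_date TEXT);
CREATE TABLE sales_stage (customer_id TEXT, product_id INTEGER, order_date TEXT,
                          quantity_sales INTEGER, amount_sales INTEGER);
CREATE TABLE sales_fact (cust_key INTEGER, prod_key INTEGER, time_key INTEGER,
                         quantity_sales INTEGER, amount_sales INTEGER);
INSERT INTO customer_dim VALUES (201, 'ALFI', 'Alfreds');
INSERT INTO product_dim VALUES (25, 123, 'Chai');
INSERT INTO time_dim VALUES (134, '2000-01-01');
INSERT INTO sales_stage VALUES ('ALFI', 123, '2000-01-01', 400, 10789);
""")
print(load_sales_fact(conn))
print(conn.execute("SELECT * FROM sales_fact").fetchall())
```

These are inner joins, so a staging row with no matching dimension row is silently dropped. An outer join plus a check for NULL keys is one way to catch such rows.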
Best Practices - Performing Inserts
Bulk Insert Task
Accessing data in files
Loading data into SQL Server destinations
Copying data with no transformations
Transform Data Task
Accessing any source
Loading to any destination
Creating data transformations
Using input parameters in the source query
Applying custom logic to phases of the data pump
Best Practices - Performance Settings
Tuning the Transform Data Task
Fast load for SQL Server destinations
Batch size
Table lock
Tuning the Bulk Insert Task
Sort order for clustered indexes
Batch size
Table lock
Best Practices - Executing Flexible Queries
The Data Driven Query Task
Execute flexible queries on a row-by-row basis
Meet flexibility needs that outweigh performance needs
Perform non-insert queries
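A Data Driven Query Task decides, row by row, which of its queries to run. In DTS that decision is made by an ActiveX script; the sketch below imitates it with an if statement in Python. It also shows why the task trades speed for flexibility: every source row costs a lookup plus one insert or update. The product_dim layout and the incoming rows are hypothetical.

```python
import sqlite3

def data_driven_query(conn, source_rows):
    """For each (product_id, product_name) row, run an update if the
    product exists and an insert otherwise. Returns per-query counts."""
    counts = {"insert": 0, "update": 0}
    for product_id, product_name in source_rows:
        found = conn.execute(
            "SELECT 1 FROM product_dim WHERE product_id = ?", (product_id,)
        ).fetchone()
        if found:
            conn.execute("UPDATE product_dim SET product_name = ? WHERE product_id = ?",
                         (product_name, product_id))
            counts["update"] += 1
        else:
            conn.execute("INSERT INTO product_dim (product_id, product_name) VALUES (?, ?)",
                         (product_id, product_name))
            counts["insert"] += 1
    conn.commit()
    return counts

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_name TEXT)")
conn.execute("INSERT INTO product_dim VALUES (123, 'Chai')")
# Hypothetical incoming rows: one changed product, one new product.
print(data_driven_query(conn, [(123, "Chai Tea"), (124, "Chang")]))
```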
The Execute SQL Task
Execute SQL statements and extended SQL statements
Perform parameterized queries
Assign query outputs to global variables
Best Practices - Using Custom Tasks
Creating Reusable Functions and Utilities
Adding Functionality to DTS Package Designer
Implementing a Faster Alternative to ActiveX Script Tasks
Best Practices - Creating Efficient Column Mappings
Minimizing the Number of Column Mappings
Using Many-to-Many Mappings When Possible
Grouping Common Transformations Together
Best Practices - The Right Transformation Type
Using Supplied Transformations When Possible
Minimizing ActiveX Script Transformations When Performance Outweighs Flexibility
Using SQL Solutions with Copy Column Transformations
Developing Custom Transformations as a Faster Alternative to ActiveX Script Transformations