
Lab 1: Implementing Data Flow in an SSIS Package

Scenario

In this lab, you will focus on the extraction of customer and sales order data from the InternetSales database used by the company’s e-commerce site, which you must load into the Staging database. This database contains customer data (in a table named Customers) and sales order data (in tables named SalesOrderHeader and SalesOrderDetail). You will extract sales order data at the line item level of granularity. The total sales amount for each sales order line item is then calculated by multiplying the unit price of the product purchased by the quantity ordered. Additionally, the sales order data includes only the ID of the product purchased, so your data flow must look up the details of each product in a separate Products database.
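For orientation, the logic described in this scenario can be sketched in Transact-SQL. This is an illustration of the granularity and the calculation only, not the lab's actual query: in the package you will build, a Derived Column transformation performs the multiplication and a Lookup transformation retrieves the product details, because the Products data lives in a separate database. The line number column and the join key below are assumptions:

    -- Illustrative sketch only: sales order data at line item granularity.
    SELECT soh.SalesOrderNumber,
           sod.SalesOrderLineNumber,                     -- assumed column name
           sod.ProductKey,                               -- looked up in the Products database
           sod.UnitPrice,
           sod.OrderQuantity,
           sod.UnitPrice * sod.OrderQuantity AS SalesAmount
    FROM SalesOrderHeader AS soh
    JOIN SalesOrderDetail AS sod
        ON sod.SalesOrderNumber = soh.SalesOrderNumber;  -- assumed join key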

Objectives

After completing this lab, you will be able to:

• Extract and profile source data.

• Implement a data flow.

• Use transformations in a data flow.

Lab Setup

Estimated Time: 60 minutes

Virtual machine: 20463C-MIA-SQL

User name: ADVENTUREWORKS\Student

Password: Pa$$w0rd

Exercise 1: Exploring Source Data

Scenario

You have designed a data warehouse schema for Adventure Works Cycles, and now you must design an ETL process to populate it with data from various source systems. Before creating the ETL solution, you have decided to examine the source data so you can understand it better.

The main tasks for this exercise are as follows:

1. Prepare the Lab Environment

2. Extract and View Sample Source Data

3. Profile Source Data


Task 1

1. Ensure that the 20463C-MIA-DC and 20463C-MIA-SQL virtual machines are both running, and then log on to 20463C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.

2. In the 20463C-MIA-SQL virtual machine, run Setup.cmd in the D:\Labfiles\Lab04\Starter folder as Administrator.

Task 2

1. Use the SQL Server 2014 Import and Export Data Wizard to extract a sample of customer data from the InternetSales database on the localhost instance of SQL Server to a comma-delimited flat file (a sketch of a suitable sampling query follows these steps).

o Your sample should consist of the first 1,000 records in the Customers table.

o You should use a text qualifier because some string values in the table may contain commas.

2. After you have extracted the sample data, use Excel to view it.
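If you prefer to type the sample query into the wizard yourself rather than selecting the whole table, a minimal sketch might look like the following (the dbo schema is an assumption, and TOP without an ORDER BY simply returns the first 1,000 rows the engine produces):

    SELECT TOP (1000) *
    FROM dbo.Customers;  -- dbo schema assumed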

Task 3

1. Create an Integration Services project named Explore Internet Sales in the D:\Labfiles\Lab04\Starter folder.

2. Add an ADO.NET connection manager that uses Windows authentication to connect to the InternetSales database on the localhost instance of SQL Server.

3. Use a Data Profiling task to generate the following profile requests for data in the InternetSales database (rough Transact-SQL analogues of these requests are sketched after these steps):

o Column statistics for the OrderDate column in the SalesOrderHeader table. You will use this data to find the earliest and latest dates on which orders have been placed.

o Column length distribution for the AddressLine1 column in the Customers table. You will use this data to determine the appropriate column length to allow for address data.

o Column null ratio for the AddressLine2 column in the Customers table. You will use this data to determine how often the second line of an address is null.

o Value inclusion for matches between the PaymentType column in the SalesOrderHeader table and the PaymentTypeKey column in the PaymentTypes table. Do not apply an inclusion threshold, and set a maximum limit of 100 violations. You will use this data to find out whether any orders have payment types that are not present in the table of known payment types.

4. Run the SSIS package and view the report that the Data Profiling task generates in the Data Profile Viewer.
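The Data Profile Viewer report is the deliverable here, but each profile request has a rough Transact-SQL analogue that can help you interpret what the report shows. A sketch, using the tables and columns named above:

    -- Column statistics: earliest and latest order dates
    SELECT MIN(OrderDate) AS EarliestOrder,
           MAX(OrderDate) AS LatestOrder
    FROM SalesOrderHeader;

    -- Column length distribution for AddressLine1
    SELECT LEN(AddressLine1) AS AddressLength,
           COUNT(*) AS RowsWithThisLength
    FROM Customers
    GROUP BY LEN(AddressLine1)
    ORDER BY AddressLength;

    -- Column null ratio for AddressLine2
    SELECT CAST(SUM(CASE WHEN AddressLine2 IS NULL THEN 1 ELSE 0 END) AS float)
           / COUNT(*) AS NullRatio
    FROM Customers;

    -- Value inclusion: payment types with no match in the PaymentTypes table
    SELECT DISTINCT soh.PaymentType
    FROM SalesOrderHeader AS soh
    LEFT JOIN PaymentTypes AS pt
        ON pt.PaymentTypeKey = soh.PaymentType
    WHERE pt.PaymentTypeKey IS NULL;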

Result: After this exercise, you should have a comma-separated text file that contains a sample of customer data, and a data profile report that shows statistics for data in the InternetSales database.


Exercise 2: Transferring Data by Using a Data Flow Task

Scenario

Now that you have explored the source data in the InternetSales database, you are ready to start implementing data flows for the ETL process. A colleague has already implemented data flows for reseller sales data, and you plan to model your Internet sales data flows on those.

The main tasks for this exercise are as follows:

1. Examine an Existing Data Flow

2. Create a Data Flow Task

3. Add a Data Source to a Data Flow

4. Add a Data Destination to a Data Flow

5. Test the Data Flow Task

Task 1

1. Open the D:\Labfiles\Lab04\Starter\Ex2\AdventureWorksETL.sln solution in Visual Studio.

2. Open the Extract Reseller Data.dtsx package and examine its control flow. Note that it contains two Data Flow tasks.

3. On the Data Flow tab, view the Extract Resellers task and note that it contains a source named Resellers and a destination named Staging DB.

4. Examine the Resellers source, noting the connection manager that it uses, the source of the data, and the columns that its output contains.

5. Examine the Staging DB destination, noting the connection manager that it uses, the destination table for the data, and the mapping of input columns to destination columns.

6. Right-click anywhere on the Data Flow design surface, click Execute Task, and then observe the data flow as it runs, noting the number of rows transferred.

7. When the data flow has completed, stop the debugging session.

Task 2

1. Add a new package to the project and name it Extract Internet Sales Data.dtsx.

2. Add a Data Flow task named Extract Customers to the new package’s control flow.

Task 3

1. Create a new project-level OLE DB connection manager that uses Windows authentication to connect to the InternetSales database on the localhost instance of SQL Server.

2. In the Extract Customers data flow, add a source that uses the connection manager that you created for the InternetSales database, and name it Customers.


3. Configure the Customers source to extract all columns from the Customers table in the InternetSales database.

Task 4

1. Add a destination that uses the existing localhost.Staging connection manager to the Extract Customers data flow, and then name it Staging DB.

2. Connect the output from the Customers source to the input of the Staging DB destination.

3. Configure the Staging DB destination to load data into the Customers table in the Staging database.

4. Ensure that all columns are mapped, and in particular that the CustomerKey input column is mapped to the CustomerBusinessKey destination column.

Task 5

1. Right-click anywhere on the Data Flow design surface, click Execute Task, and then observe the data flow as it runs, noting the number of rows transferred. (A row-count check is sketched after these steps.)

2. When the data flow has completed, stop the debugging session.
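As a quick sanity check after the run, you can compare row counts between the source and staging tables. A sketch, assuming both databases are on the localhost instance and use the dbo schema:

    SELECT (SELECT COUNT(*) FROM InternetSales.dbo.Customers) AS SourceRows,
           (SELECT COUNT(*) FROM Staging.dbo.Customers)       AS StagedRows;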

Result: After this exercise, you should have an SSIS package that contains a single Data Flow task, which extracts customer records from the InternetSales database and inserts them into the Staging database.

Exercise 3: Using Transformations in a Data Flow

Scenario

You have implemented a simple data flow to transfer customer data to the staging database. Now you must implement a data flow for Internet sales records. The new data flow must add a new column that contains the total sales amount for each line item (which is derived by multiplying the unit price by the quantity of units purchased), and use a product key value to find additional data in a separate Products database. Once again, you will model your solution on a data flow that a colleague has already implemented for reseller sales data.

The main tasks for this exercise are as follows:

1. Examine an Existing Data Flow

2. Create a Data Flow Task

3. Add a Data Source to a Data Flow

4. Add a Derived Column Transformation to a Data Flow

5. Add a Lookup Transformation to a Data Flow

6. Add a Data Destination to a Data Flow

7. Test the Data Flow Task


Task 1

1. Open the D:\Labfiles\Lab04\Starter\Ex3\AdventureWorksETL.sln solution in Visual Studio.

2. Open the Extract Reseller Data.dtsx package and examine its control flow. Note that it contains two Data Flow tasks.

3. On the Data Flow tab, view the Extract Reseller Sales task.

4. Examine the Reseller Sales source, noting the connection manager that it uses, the source of the data, and the columns that its output contains.

5. Examine the Calculate Sales Amount transformation, noting the expression that it uses to create a new derived column.

6. Examine the Lookup Product Details transformation, noting the connection manager and query that it uses to look up product data, and the column mappings used to match data and add rows to the data flow.

7. Examine the Staging DB destination, noting the connection manager that it uses, the destination table for the data, and the mapping of input columns to destination columns.

8. Right-click anywhere on the Data Flow design surface, click Execute Task, and then observe the data flow as it runs, noting the number of rows transferred.

9. When the data flow has completed, stop the debugging session.

Task 2

1. Open the Extract Internet Sales Data.dtsx package, and then add a new Data Flow task named Extract Internet Sales to its control flow.

2. Connect the pre-existing Extract Customers Data Flow task to the new Extract Internet Sales task.

Task 3

1. Add a source that uses the existing localhost.InternetSales connection manager to the Extract Internet Sales data flow, and then name it Internet Sales.

2. Configure the Internet Sales source to use the Transact-SQL code in the D:\Labfiles\Lab04\Starter\Ex3\InternetSales.sql query file to extract Internet sales records.

Task 4

1. Add a Derived Column transformation named Calculate Sales Amount to the Extract Internet Sales data flow.

2. Connect the output from the Internet Sales source to the input of the Calculate Sales Amount transformation.

3. Configure the Calculate Sales Amount transformation to create a new column named SalesAmount containing the UnitPrice column value multiplied by the OrderQuantity column value, as sketched below.
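In the Derived Column editor this is a single SSIS expression, UnitPrice * OrderQuantity, entered as a new column named SalesAmount. The per-row arithmetic is equivalent to this Transact-SQL sketch (the source table is an assumption; in the package, the rows come from the Internet Sales source):

    -- T-SQL equivalent of the Derived Column calculation (illustration only)
    SELECT UnitPrice,
           OrderQuantity,
           UnitPrice * OrderQuantity AS SalesAmount
    FROM InternetSales.dbo.SalesOrderDetail;  -- assumed source table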


Task 5

1. Add a Lookup transformation named Lookup Product Details to the Extract Internet Sales data flow.

2. Connect the output from the Calculate Sales Amount transformation to the input of the Lookup Product Details transformation.

3. Configure the Lookup Product Details transformation to:

o Redirect unmatched rows to the no match output.

o Use the localhost.Products connection manager and the Products.sql query in the D:\Labfiles\Lab04\Starter\Ex3 folder to retrieve product data.

o Match the ProductKey input column to the ProductKey lookup column.

o Add all lookup columns other than ProductKey to the data flow.

4. Add a flat file destination named Orphaned Sales to the Extract Internet Sales data flow. Then redirect non-matching rows from the Lookup Product Details transformation to the Orphaned Sales destination, which should save any orphaned records in a comma-delimited file named Orphaned Internet Sales.csv in the D:\ETL folder. (The lookup's behavior is sketched after these steps.)
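Use the provided Products.sql file as-is for the lookup query. Conceptually, the Lookup transformation behaves like the left join sketched below: rows that match continue to the staging destination with the extra product columns, while rows that do not match (the left join's NULLs) go to the no match output and end up in Orphaned Internet Sales.csv. The lookup column and table names other than ProductKey are assumptions:

    -- Conceptual equivalent of the Lookup transformation (illustration only)
    SELECT sod.*,
           p.ProductName,                        -- assumed lookup column
           p.ListPrice                           -- assumed lookup column
    FROM InternetSales.dbo.SalesOrderDetail AS sod
    LEFT JOIN Products.dbo.Products AS p         -- assumed table in the Products database
        ON p.ProductKey = sod.ProductKey;
    -- Rows where p.ProductKey IS NULL correspond to the no match output.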

Task 6

1. Add a destination that uses the localhost.Staging connection manager to the Extract Internet Sales data flow, and name it Staging DB.

2. Connect the match output from the Lookup Product Details transformation to the input of the Staging DB destination.

3. Configure the Staging DB destination to load data into the InternetSales table in the Staging database. Ensure that all columns are mapped. In particular, ensure that the *Key input columns are mapped to the *BusinessKey destination columns.

Task 7

1. Right-click anywhere on the Data Flow design surface, click Execute Task, and then observe the data flow as it runs, noting the number of rows.

2. When the data flow has completed, stop the debugging session.

Result: After this exercise, you should have a package that contains a Data Flow task including Derived Column and Lookup transformations.


Lab 2: Implementing Control Flow in an SSIS Package

Scenario

You are implementing an ETL solution for Adventure Works Cycles and must ensure that the data flows you have already defined are executed as a workflow that notifies operators of success or failure by sending an email message. You must also implement an ETL solution that transfers data from text files generated by the company’s financial accounting package to the data warehouse.

Objectives

After completing this lab, you will be able to:

• Use tasks and precedence constraints.

• Use variables and parameters.

• Use containers.

Lab Setup

Estimated Time: 60 minutes

Virtual machine: 20463C-MIA-SQL

User name: ADVENTUREWORKS\Student

Password: Pa$$w0rd

Exercise 1: Using Tasks and Precedence in a Control Flow

Scenario

You have implemented data flows to extract data and load it into a staging database as part of the ETL process for your data warehousing solution. Now you want to coordinate these data flows by implementing a control flow that notifies an operator of the outcome of the process.

The main tasks for this exercise are as follows:

1. Prepare the Lab Environment

2. View a Control Flow

3. Add Tasks to a Control Flow

4. Test the Control Flow


Task 1

1. Ensure the 20463C-MIA-DC and 20463C-MIA-SQL virtual machines are both running, and then log on to 20463C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.

2. Run Setup.cmd in the D:\Labfiles\Lab05A\Starter folder as Administrator.

Task 2

1. Use Visual Studio to open the AdventureWorksETL.sln solution in the D:\Labfiles\Lab05A\Starter\Ex1 folder.

2. Open the Extract Reseller Data.dtsx package and examine its control flow. Note that it contains two Send Mail tasks – one that runs when either the Extract Resellers or Extract Reseller Sales tasks fail, and one that runs when the Extract Reseller Sales task succeeds.

3. Examine the settings for the precedence constraint connecting the Extract Resellers task to the Send Failure Notification task to determine the conditions under which this task will be executed.

4. Examine the settings for the Send Mail tasks, noting that they both use the Local SMTP Server connection manager.

5. Examine the settings of the Local SMTP Server connection manager.

6. On the Debug menu, click Start Debugging to run the package, and observe the control flow as the task executes. Then, when the task has completed, on the Debug menu, click Stop Debugging.

7. In the C:\inetpub\mailroot\Drop folder, double-click the most recent file to open it in Outlook. Then read the email message and close Outlook.

Task 3

1. Open the Extract Internet Sales Data.dtsx package and examine its control flow.

2. Add a Send Mail task to the control flow, configure it with the following settings, and create a precedence constraint that runs this task if the Extract Internet Sales task succeeds:

o Name: Send Success Notification

o SmtpConnection: A new SMTP Connection Manager named Local SMTP Server that connects to the localhost SMTP server

o From: [email protected]

o To: [email protected]

o Subject: Data Extraction Notification

o MessageSourceType: Direct Input

o MessageSource: The Internet Sales data was successfully extracted

o Priority: Normal


3. Add a second Send Mail task to the control flow, configure it with the following settings, and create a precedence constraint that runs this task if either the Extract Customers or Extract Internet Sales task fails:

o Name: Send Failure Notification

o SmtpConnection: The Local SMTP Server connection manager you created previously

o From: [email protected]

o To: [email protected]

o Subject: Data Extraction Notification

o MessageSourceType: Direct Input

o MessageSource: The Internet Sales data extraction process failed

o Priority: High

Task 4

1. Set the ForceExecutionResult property of the Extract Customers task to Failure. Then run the package and observe the control flow.

2. When package execution is complete, stop debugging and verify that the failure notification email message has been delivered to the C:\inetpub\mailroot\Drop folder. You can double-click the email message to open it in Outlook.

3. Set the ForceExecutionResult property of the Extract Customers task to None. Then run the package and observe the control flow.

4. When package execution is complete, stop debugging and verify that the success notification email message has been delivered to the C:\inetpub\mailroot\Drop folder.

5. Close Visual Studio when you have completed the exercise.

Result: After this exercise, you should have a control flow that sends an email message if the Extract Internet Sales task succeeds, or sends an email message if either the Extract Customers or Extract Internet Sales tasks fail.

Exercise 2: Using Variables and Parameters

Scenario

You need to enhance your ETL solution to include the staging of payments data that is generated in comma-separated value (CSV) format from a financial accounts system. You have implemented a simple data flow that reads data from a CSV file and loads it into the staging database. You must now modify the package to construct the folder path and file name for the CSV file dynamically at run time, instead of relying on a hard-coded name in the data flow task settings.


The main tasks for this exercise are as follows:

1. View a Control Flow

2. Create a Variable

3. Create a Parameter

4. Use a Variable and a Parameter in an Expression

Task 1

1. View the contents of the D:\Accounts folder and note the files it contains. In this exercise, you will modify an existing package to create a dynamic reference to one of these files.

2. Open the AdventureWorksETL.sln solution in the D:\Labfiles\Lab05A\Starter\Ex2 folder.

3. Open the Extract Payment Data.dtsx package and examine its control flow. Note that it contains a single data flow task named Extract Payments.

4. View the Extract Payments data flow and note that it contains a flat file source named Payments File, and an OLE DB destination named Staging DB.

5. View the settings of the Payments File source and note that it uses a connection manager named Payments File.

6. In the Connection Managers pane, double-click Payments File, and note that it references the Payments.csv file in the D:\Labfiles\Lab05A\Starter\Ex2 folder. This file has the same data structure as the payments file in the D:\Accounts folder.

7. Run the package, and stop debugging when it has completed.

8. On the Execution Results tab, find the following line in the package execution log:

[Payments File [2]] Information: The processing of the file “D:\Labfiles\Lab05A\Starter\Ex2\Payments.csv” has started

Task 2

1. Add a variable with the following properties to the package:

o Name: fName

o Scope: Extract Payment Data

o Data type: String

o Value: Payments - US.csv


Task 3

1. Add a project parameter with the following settings:

o Name: AccountsFolderPath

o Data type: String

o Value: D:\Accounts\

o Sensitive: False

o Required: True

o Description: Path to accounts files

Task 4

1. Set the Expressions property of the Payments File connection manager in the Extract Payment Data package so that the ConnectionString property uses the following expression:

@[$Project::AccountsFolderPath] + @[User::fName]

2. Run the package and view the execution results to verify that the data in the D:\Accounts\Payments - US.csv file was loaded.

3. Close Visual Studio when you have completed the exercise.

Result: After this exercise, you should have a package that loads data from a text file based on a parameter that specifies the folder path where the file is stored, and a variable that specifies the file name.

Exercise 3: Using Containers

Scenario

You have created a control flow that loads Internet sales data and sends a notification email message to indicate whether the process succeeded or failed. You now want to encapsulate the data flow tasks for this control flow in a sequence container so you can manage them as a single unit.

You have also successfully created a package that loads payments data from a single CSV file based on a dynamically-derived folder path and file name. Now you must extend this solution to iterate through all the files in the folder and import data from each one.

The main tasks for this exercise are as follows:

1. Add a Sequence Container to a Control Flow

2. Add a Foreach Loop Container to a Control Flow


Task 1

1. Open the AdventureWorksETL solution in the D:\Labfiles\Lab05A\Starter\Ex3 folder.

2. Open the Extract Internet Sales Data.dtsx package and modify its control flow so that:

o The Extract Customers and Extract Internet Sales tasks are contained in a Sequence container named Extract Customer Sales Data.

o The Send Failure Notification task is executed if the Extract Customer Sales Data container fails.

o The Send Success Notification task is executed if the Extract Customer Sales Data container succeeds.

3. Run the package to verify that it successfully completes both data flow tasks in the sequence and then executes the Send Success Notification task.

Task 2

1. In the AdventureWorksETL solution, open the Extract Payment Data.dtsx package.

2. Move the existing Extract Payments Data Flow task into a new Foreach Loop Container.

3. Configure the Foreach Loop Container so that it loops through the files in the folder referenced by the AccountsFolderPath parameter, adding each file name to the fName variable.

4. Run the package and count the number of times the Foreach Loop is executed.

5. When execution has completed, stop debugging and view the results to verify that all files in the D:\Accounts folder were processed (a staging row-count sketch follows these steps).

6. Close Visual Studio when you have completed the exercise.
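After the loop has run, you can also confirm in the staging database that rows from every file arrived. A sketch, assuming the payments rows land in a table named dbo.Payments in the Staging database:

    SELECT COUNT(*) AS StagedPaymentRows
    FROM Staging.dbo.Payments;  -- staging table name is an assumption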

Result: After this exercise, you should have one package that encapsulates two data flow tasks in a sequence container, and another that uses a Foreach Loop to iterate through the files in a folder specified in a parameter and uses a data flow task to load their contents into a database.