all tutorials2

Upload: exbis

Post on 18-Oct-2015


The Script Transformation part 1: a simple Transformation
Posted by BI Monkey on September 1, 2009

Fig 1: The Script Transformation

In this post I will be covering the Script Transformation. The sample package can be found here for 2005 and guidelines on use are here.

What does the Script Transformation do?

The question here should really be "what can't it do?" The Script Transform fills in the gap when standard components don't fit the bill by allowing you to create, consume or access rows and columns in the Data Flow and process them within VB.Net code (and also C# in 2008). This opens up a whole array of functionality, as both VB.Net and C# are powerful and flexible languages. The Script Transformation can function as a Source (providing rows to the Data Flow), as a Destination (consuming rows from the Data Flow) or as a Transformation (changing or creating column values in the Data Flow).

Functioning as a Transformation, you aren't restricted to just row in, row out operations either: you can add new outputs, create multiple rows from single rows and create single rows from multiple rows. The capacity to do impressive tricks with your data is, well, impressive! Because of this array of options I will break each one out into a separate post. This first post will cover a simple one row in, one row out transformation.

Sadly there are some downsides to all this, two of which really stand out for me. The first is that if you aren't a programmer (I never moved much beyond VBA) then writing the code, debugging the code, or even knowing what can and can't be done in the code can make working with this component a bit of a struggle. The second is that you lose a lot of visibility over what is being done in the component: unlike with most other transforms, there is no nice GUI to show what column is going where and what is being done to it. You need to be able to actually read the code to understand what is going on, and I must warn that the BI Monkey becomes one Angry Ape when code is insufficiently commented!

Configuring the Script Transformation

It is fairly easy to set up the Script Transformation, but you need to use a little more of the Advanced Editor type features than basic developers are probably used to. Key actions are selecting input columns, defining output columns, and choosing the variables and connection managers used within the component.

First up, simply check the columns from the Input that you want to access in the script component. By access I mean read or alter the value of.

Fig 2: Selecting the Input Columns

Second, define the output columns. If you are adding new columns to the Data Flow as I do in the example, click the Add Column button, which becomes enabled when you select the Output Columns folder. Then name the column and select its data type. By default, when using a Script Transformation as a Transformation, a single output named Output 0 is created for you to add columns to.

Fig 3: Configuring the Output Columns

Finally, enter the variables you want to be able to access in the script (I won't be using any connection managers in this case and will cover those in a future post). There are two options: ReadOnlyVariables and ReadWriteVariables. These are fairly self-explanatory: if you want to change the value of the variable in the script, enter it into the ReadWriteVariables line, and if you don't want the value to change, enter it into the ReadOnlyVariables line. Two quick notes of warning. First, if there's a space in your list of variables in the ReadOnlyVariables line it will cause an error in 2005 Script tasks. Second, remember that variable names are case sensitive.

Fig 4: Specifying the Variables

The last thing you need to do is click on the Design Script button, which will open your code editor. There is a commented section where you can add your code.

Below is my sample code from the package; there's nothing too fancy going on here. Note how the columns selected in the Inputs are available as properties of the Row: they simply pop up in Intellisense as you code. Similarly, variables are accessible from Me.Variables.

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        ' Use a standard VB function
        Row.ShipYear = Row.ShipDate.Year

        ' Use a variable via Me.Variables
        Row.DoubleDue = Row.TotalDue * CType(Me.Variables.Multiplier, Decimal)

    End Sub

I'll dig into how code is structured in one of the future posts in this mini-series.

Where should you use the Script Transformation?

The Script Transformation is there to be used when native SSIS component functionality doesn't meet your needs. The most common use I have had for it is when the Derived Column editor doesn't give me what I need, such as when I want to use Regular Expressions, or reuse a function across many columns.

MSDN Documentation for the Script Transformation can be found here for 2008 and here for 2005.

If you are still struggling, try these additional resources:

- Creating a Synchronous Transformation with the Script Component (MSDN)
- How to use a Script Transformation (Brian Knight)
- Using a Script Transform in SSIS (Jumpstart TV)

If you need specific help or advice, or have suggestions on the post, please leave a comment and I will do my best to help you.

Filed under Integration Services. Tagged with Integration Services, Script Task.

Getting File Information with the Script Task
Posted by BI Monkey on August 26, 2009

A requirement for a current client is to capture some information about Flat Files being loaded for logging purposes. Fortunately this is an easy job for the Script Task, using the following VB.Net code:

    Public Sub Main()

        ' String variable to hold file name
        Dim strFileName As String

        ' File name (including path) passed in from container
        strFileName = Dts.Variables("User::FileName").Value.ToString

        ' Create a System.IO.FileInfo object to retrieve the data
        Dim FileObject As System.IO.FileInfo = New System.IO.FileInfo(strFileName)

        ' Return the information via a message box
        MsgBox("File name " + FileObject.FullName + " created on " + FileObject.CreationTime.ToString)

    End Sub

The sample package for this post can be found here. To see what other properties can be obtained, just use Intellisense on the FileObject object in the code.

Filed under Integration Services. Tagged with Integration Services, Script Task.

The Derived Column Transformation
Posted by BI Monkey on August 23, 2009

Fig 1: The Derived Column Transformation

In this post I will be covering the Derived Column Transformation. The sample package can be found here for 2005 and guidelines on use are here.

What does the Derived Column Transformation do?

The Derived Column Transformation provides a means to change column data as it passes through the data flow. It uses the SSIS Expression Language to transform the data and allows you either to replace an existing column's value or to create a new column. In both cases the expression can use values from other columns or variables to create the new data item.
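
As a rough illustration only, the replace-or-add behaviour described above can be modelled outside SSIS. The Python sketch below is not SSIS code: `apply_derived_columns`, the sample column names and the `variables` dictionary are hypothetical names chosen for this example, and plain Python lambdas stand in for SSIS expressions.

```python
def apply_derived_columns(row, derived, variables):
    """Return a copy of `row` with the derived columns applied.

    `derived` maps a column name to a function of (input_row, variables).
    A name that already exists in the row is replaced; a new name is added.
    Each expression reads the original input row, not the partly built output.
    """
    output = dict(row)
    for name, expression in derived.items():
        output[name] = expression(row, variables)
    return output


# Hypothetical sample row and variable, for illustration only.
row = {"ListPrice": 100.0, "ProductName": "  Widget  "}
variables = {"Multiplier": 1.1}

derived = {
    # Replaces the existing ListPrice column (variable on column).
    "ListPrice": lambda r, v: round(r["ListPrice"] * v["Multiplier"], 2),
    # Adds a new column built from an existing one (pure column).
    "CleanName": lambda r, v: r["ProductName"].strip(),
}

result = apply_derived_columns(row, derived, variables)
```

The input row is left untouched and a new row is returned, which loosely mirrors how the transformation produces output columns from input columns.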

Fig 2: Configuring the Derived Column Transformation

In the Derived Column editor, there are three panes to work with. The top left pane has two folders, one for Variables and one for Columns. The top right pane contains folders for the various functions available. The bottom pane is where you define the Derived Columns. Both the top panes have drag and drop functionality, so you can click and drag a function, column or variable into the Expression area of the Derived Column configuration pane.

Setting up a Derived Column is straightforward. If adding a new column, provide a name for it in the Derived Column Name column. If replacing the content of an existing column, select its name in the Derived Column dropdown, and the Derived Column Name will lock down to the name of the column being replaced (see the ListPrice column in the example above).

The next part is the Expression itself. For details of the language for the expression, see my posts on the SSIS Expression Language. The options you have are basic expressions, such as ROUNDing numbers, YEAR functions for dates and SUBSTRING for strings. Two things that I will raise are NULL Functions and Type Casts. Because SSIS is so strict about data typing, if you evaluate or return a Null in your expression, you have to pull a Null from the Null functions list. For example, when an expression that handles a Date value needs to return a null, you would use the Null function NULL(DT_DATE). Similarly, Type Casts convert data to a specific type, so if you wanted to put a string source value into a float column, you would have to convert it using the Type Cast (DT_R8)[String Column].

The Data Type, Length, Precision, Scale and Code Page are determined automatically from the type of data you are working with. Note that you cannot change the data type of an existing column. The only room you really have to move here is to change string lengths and code pages, or precision and scale for numerics, when you create a new column.

In the sample package I show a few simple examples of column manipulation, using column on column, variable on column, pure variable and pure column operations.

What are the Derived Column Transformation's limitations?

My biggest problem with the Derived Column Transformation is that the function list is small and, worst of all, fixed. In a rare example of Cognos Data Manager being better than SSIS, Data Manager allows for the creation of custom functions that can then be re-used. SSIS offers no such flexibility, which means that if you have complex operations that need to be done repeatedly (e.g. trimming and nulling incoming strings to clean input data) you can't create a custom function to simplify the operation and make it reusable. I've added a Feature Suggestion on Connect to request a package-level extensible function library; please vote for it if you agree this is a big hole in the component.

One thing which trips up a few people is that you cannot use the result of one derived column in another derived column within the same transformation. The logic behind this is pretty simple: each column is treated as a separate, independent item within the data flow and can only consume columns that are input to the transformation. A derived column is effectively an output and so cannot be referenced within the component it was created in.

A lesser gripe is that the Editor is too small. There is no call-out box like the one available when setting component properties using expressions, so complex expressions quickly become difficult to read and debug. Add to that, the only way to get the syntax error messages is to hover over the function that is invalid and try to read the note that appears for about 4 seconds, meaning the only real way to read long error messages is to hover over the function with the mouse and take a screen capture.

Where should you use the Derived Column Transformation?

It is best used when you have to perform simple operations to change data values, for example TRIMming strings, simple IF statements and SUBSTRINGs. It also reproduces all of the functionality in the Audit and Data Conversion transformations, so if you are using those anywhere you may want to consider replacing them with a Derived Column. If your expressions are getting complicated or you repeat a lot of operations, you may want to move these operations to the uglier but more powerful Script Component, which I will be covering soon.

MSDN Documentation for the Derived Column Transformation can be found here for 2008 and here for 2005.

If you are still struggling, try these additional resources:

- Andy Leonard's post on SSIS Expression Language and the Derived Column Transformation

If you need specific help or advice, or have suggestions on the post, please leave a comment and I will do my best to help you.

Filed under Integration Services. Tagged with Derived Column, Integration Services.

The Row Count Transformation
Posted by BI Monkey on August 13, 2009

Fig 1: The Row Count Transformation

In this post I will be covering the Row Count Transformation. The sample package can be found here for 2005 and guidelines on use are here.

What does the Row Count Transformation do?

The Row Count Transformation counts the number of rows that have passed through a Data Flow and puts that count into a variable. Configuration is simple: all you need to do is specify the name of the variable that will hold the row count on the first page of the editor (down at the bottom under Custom Properties). There's a little gotcha here: whilst the tab for Input Columns is active, if you try to select any columns it will return an error and not allow you to continue.

It is worth noting that the variable is only updated once all rows of data have passed through the data flow. I've demonstrated this in the sample package by adding the variable to a column in a Derived Column: it returns zero all the way through, so you cannot use the Row Count as a row number generator.

Where would you use the Row Count Transformation?

The most obvious use is in logging processes, for example counting input rows versus output rows or counting failed rows. It fits anywhere you need to track the number of rows being passed through a given data flow.

MSDN Documentation for the Row Count Transformation can be found here for 2008 and here for 2005.

Filed under Integration Services. Tagged with Integration Services, Row Count.

Formatting SSIS Configuration files
Posted by BI Monkey on August 11, 2009

Jamie Thompson has provided a nugget on quickly formatting SSIS XML Configuration files in Visual Studio. Simply open the XML in Visual Studio, press CTRL-K then CTRL-D, and suddenly it's readable!

I'm in agreement with Jamie: it would be nice if SSIS churned out something a little more easily readable by default.

Filed under Integration Services. Tagged with Configuration files, Integration Services.

The Slowly Changing Dimension Transformation, part 2: Type 2 Dimensions
Posted by BI Monkey on August 11, 2009

Fig 1: The Slowly Changing Dimension Transformation

In this post I will be covering how to use the Slowly Changing Dimension (SCD) Transformation to update a Type 2 Dimension, that is, one that tracks changes in values over time. The sample package can be found here for 2005 and guidelines on use are here. This is the second post in the series, looking at some slightly more advanced behaviour. For the basics of the Slowly Changing Dimension (SCD) Transformation please read my post: The Slowly Changing Dimension Transformation, part 1: Type 1 Dimensions.

Configuring the SCD for a Type 2 Dimension

The first work you need to do for a Type 2 dimension actually resides in your dimension table design. You need to decide whether you are going to track changes in your table using a simple indicator to identify current and expired records, or using effective dates. The component doesn't natively allow you to use both, though you can customise the output to do so. The Current / Expired indicator actually uses a small text string which can be set to either of the string value pairs True / False or Current / Expired. No customisation of these is allowed in the component (again, you can customise the output to change this, but the wizard will only allow mapping of the column to one that will accept text strings). The Effective dates option requires a start date and an end date datetime column, and in the wizard you use a variable to set the time used. The sample package demonstrates a few possibilities, but below I will describe using effective dates.

Fig 2: Select a Dimension Table and Keys

First of all, note that on the first page, Select a Dimension Table and Keys, the Effective dates (and Current indicator) are not mapped. Because I have named the columns in line with what the SCD expects for such indicators, it ignores them completely in the mapping: they cannot even be selected as Input Columns. If you name them differently in your design, simply map them as Not a Key Column.

Fig 3: Slowly Changing Dimension Columns

In the Slowly Changing Dimension Columns page, set the change type of each column to Historic so the component will track the history of changes.

Fig 4: Historical Attribute Options

The wizard will present a page that is only displayed when you have selected Historic change type columns. Here the start and end date columns are specified, and the component needs a datetime variable to use to set the expiry of old records and the start date of new records. Here I have just used the Package Start time variable. In practice you may well want to specify a variable populated with something else, such as the extract date of the data.

Fig 5: Finishing the SCD Wizard

When you go to finish the wizard, you will note that an additional Historical Attribute Output will be generated. In practice this means a set of components will be output to manage the changes, which are illustrated below (click to zoom in). The derived columns add the effective start and end date columns and the OLE DB Command expires old records. Please review the sample component to see how this works in practice.
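
The expire-and-insert behaviour of the Historical Attribute Output can be sketched in a few lines of Python. This is an illustrative model, not what the wizard generates: the function name `apply_type2_change`, the column names `StartDate`, `EndDate`, `Nut_Key` and `Colour`, and the convention that an empty end date marks the current record are all assumptions made for the example.

```python
from datetime import datetime


def apply_type2_change(dimension, incoming, business_key, tracked, change_time):
    """Expire the current row for a changed key and append a new current row.

    A row counts as current when its EndDate is None (an assumed convention).
    Returns "new", "changed" or "unchanged".
    """
    current = next(
        (r for r in dimension
         if r[business_key] == incoming[business_key] and r["EndDate"] is None),
        None,
    )
    if current is not None and all(current[c] == incoming[c] for c in tracked):
        return "unchanged"
    if current is not None:
        # Plays the role of the OLE DB Command that expires the old record.
        current["EndDate"] = change_time
    # Plays the role of the derived columns that add the effective dates.
    dimension.append(dict(incoming, StartDate=change_time, EndDate=None))
    return "changed" if current is not None else "new"


# Hypothetical dimension content and load time.
dimension = [
    {"Nut_Key": 1, "Colour": "Brown", "StartDate": datetime(2009, 1, 1), "EndDate": None},
]
load_time = datetime(2009, 8, 11)

outcome = apply_type2_change(
    dimension, {"Nut_Key": 1, "Colour": "Green"}, "Nut_Key", ["Colour"], load_time
)
```

After the call the original row carries the load time as its end date and a second row for the same key is current, so both versions of the record remain in the table.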

Fig 6: Historical Change Outputs

SCD Considerations for Type 2 Dimensions

One of the most important things to bear in mind is that the component is not intelligent in terms of knowing which data is new. If you had two records for a given key in the sample file, you would have to sort the file so that the most recent record is fed in last and so becomes the current one. It also provides no support for data which has its own change dates, for example if a record had an update date and you wished to use that to form the effective date.

The SCD component is only really suitable for tracking Type 2 changes in sources where there will be one record per key per extract and the source itself has no change tracking capabilities. Given this weakness and the difficulties with using this component generally (in terms of configuration and performance) you may well want to look at the alternatives I mentioned in my original post about the SCD. This is a component that definitely needs an overhaul for the next release.

MSDN Documentation for the Slowly Changing Dimension Transformation can be found here for 2008 and here for 2005.

Filed under Integration Services. Tagged with Integration Services, Slowly Changing Dimension.

SSIS Expression Language basics
Posted by BI Monkey on August 6, 2009

I've seen a fair bit of traffic for my post on the Conditional Split, and I'm betting that a fair number of the problems people are having relate to getting the syntax for the SSIS Expression Language right. The official documentation is here on MSDN, but below I'll spell out some basic concepts to get people going.

Format of variables:

Variables need to be in the format:

    @[Namespace::Variablename]

Note that both Namespace and Variablename are case sensitive, so if you type @[NameSpace::VariableName], it will error stating it is unable to find your variables. Namespace is optional: if you aren't using it this is not a problem, as by default every variable is in the User namespace.

Basic comparison operators:

Equals: Two equals signs (==)

    @[Namespace::Variablename] == 1

Not Equal: Exclamation mark and an equals sign (!=)

    @[Namespace::Variablename] != 1

Less than: Less than symbol (<)

    @[Namespace::Variablename] < 1

Greater than: Greater than symbol (>)

    @[Namespace::Variablename] > 1

Less than or equal to: Less than symbol and an equals sign (<=)

    @[Namespace::Variablename] <= 1

Greater than or equal to: Greater than symbol and an equals sign (>=)

    @[Namespace::Variablename] >= 1

And: Two ampersands (&&)

    @[Namespace::Variablename] == 1 && @[Namespace::Variablename2] == 1

Or: Two pipes (||)

    @[Namespace::Variablename] == 1 || @[Namespace::Variablename2] == 1

If statement: Boolean statement, question mark, result if true, colon, result if false (Boolean ? True : False)

    @[Namespace::Variablename] == 1 ? True : False

I hope this helps get you started. For more information, try these articles / videos:

- Andy Leonard: An Introduction to the SSIS Expression Language
- Jumpstart TV: Expression language basics

Filed under Integration Services. Tagged with Expression Language, Integration Services.

The Slowly Changing Dimension Transformation, part 1: Type 1 Dimensions
Posted by BI Monkey on July 28, 2009

Fig 1: The Slowly Changing Dimension Transformation

In this post I will be covering the basics of the Slowly Changing Dimension (SCD) Transformation. The sample package can be found here for 2005 and guidelines on use are here. This is going to be an introductory post that will not cover all aspects of the SCD Transformation, as it is one of the more involved components to configure and the post will still manage to be quite long.

What does the SCD do?

The simplest explanation is that it compares the attributes (column values) of rows of incoming data against a reference table, using a unique key called the Business Key to identify the record to compare against. What can make it complex is the range of comparison options and possible outputs for the component. The component checks attributes for three scenarios:

1. New record: no record with that business key exists in the reference table
2. Changed attributes: a record with that business key exists and compared attributes have changed
3. Unchanged attributes: a record with that business key exists and compared attributes have not changed

Within those scenarios there is a subset of possibilities to allow for: that the changed attribute shouldn't change (Fixed attributes), that the history of changes needs to be tracked (Historic attributes), or Inferred members, which I will explain in a future post.

For the sake of simplicity, here I will only be covering a basic example where new records are added and changed records are updated, a.k.a. a Type 1 Dimension. See this Slowly Changing Dimension article on Wikipedia for more explanation of the various Dimension Types.

Configuring the SCD for a Type 1 Dimension

The SCD Transformation is configured using a wizard which launches when you double click on the component. It first prompts you to choose the reference table. In the most common Data Warehouse scenarios this would be a Dimension or Fact table. In the example package, it is the dimension table SCD_Nuts.
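
The three scenarios listed earlier in this post, and the Type 1 response to them, can be sketched roughly as follows. This Python model is an assumption-laden illustration, not the component's implementation: `classify`, `apply_type1` and the column names `Nut_Key` and `Nut_Name` are hypothetical, and the reference table is represented as a dictionary keyed by business key.

```python
def classify(incoming, reference, business_key, compared):
    """Classify one incoming row as "new", "changed" or "unchanged"."""
    existing = reference.get(incoming[business_key])
    if existing is None:
        return "new"
    if any(existing[c] != incoming[c] for c in compared):
        return "changed"
    return "unchanged"


def apply_type1(rows, reference, business_key, compared):
    """Insert new rows and overwrite changed ones, keeping no history."""
    counts = {"new": 0, "changed": 0, "unchanged": 0}
    for row in rows:
        outcome = classify(row, reference, business_key, compared)
        counts[outcome] += 1
        if outcome == "new":
            reference[row[business_key]] = dict(row)
        elif outcome == "changed":
            reference[row[business_key]].update({c: row[c] for c in compared})
    return counts


# Hypothetical reference table and incoming rows.
reference = {
    1: {"Nut_Key": 1, "Nut_Name": "Almond"},
    2: {"Nut_Key": 2, "Nut_Name": "Cashew"},
}
incoming_rows = [
    {"Nut_Key": 1, "Nut_Name": "Almond"},      # unchanged
    {"Nut_Key": 2, "Nut_Name": "Cashew Nut"},  # changed
    {"Nut_Key": 3, "Nut_Name": "Pecan"},       # new
]

counts = apply_type1(incoming_rows, reference, "Nut_Key", ["Nut_Name"])
```

Only the columns listed in `compared` are checked, so a column left out of that list never triggers an update.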

Fig 2: Selecting a Dimension Table and Keys

Once you have selected the reference table, the component will automatically map columns from the Input Data Flow to columns in the reference (Dimension) table by name. It will not allow mappings to columns where data types do not match, so if you have columns with the same name in the source data flow and reference table that do not map, check the data types. The wizard will not allow you to force mappings of mismatched data types to repair later. Once you have set up your mappings, use the drop-down to identify the Business Key column(s). The key can be a compound key of multiple columns, or a single column. In our example, it is the unique key column Nut_Key.

Fig 3: Slowly Changing Dimension Columns

On the next page of the wizard, you determine the Change Type of the non-key columns you want to compare. If a column value change doesn't matter and you don't want it to trigger any action, simply leave it off the list on this page. If a change in the column value does trigger an action, then add it to the list of Dimension Columns and set its Change Type. For the purposes of this example, the value we will be selecting is Changing Attribute, and in the sample package all attribute columns are being assessed for changes.

The next page, Fixed and Changing Attribute Options, is for handling Fixed attributes and Type 2 dimension changes, so we will leave any options here unselected and move on. The next page, Inferred Dimension Members, also does not apply here, so uncheck Enable inferred member support and click Next.

Fig 4: Finishing the SCD Wizard

The final page displays a list of outputs the component will generate. In this case it lists only the New Record Output, but it will also generate an output for Changing records. Click Finish and the component will add an OLE DB Destination for the new records, and an OLE DB Command to handle updating changed records. Each added transformation will be fully mapped and, in the case of the Changed records, the update query is written for you.

Problems with the SCD

The SCD has issues with ease of use and performance. I'll start with the most important one: performance. The SCD works by issuing a SQL command for each incoming row of data to compare it against its corresponding row in the reference (or Dimension) table in the database (you can watch this happening in SQL Profiler). This isn't a problem for small reference tables, but once you start processing thousands of incoming rows against tables with thousands of reference rows, performance starts to drag because it is doing these row by row checks. The only performance tuning option you have at your disposal is to index the Business Key in the reference table. It would be much better if it was possible to cache the reference table in memory so lookups could be done in memory instead of row by row against the database. According to this Connect article it may be on the list for the next release.

In terms of ease of use, there are a couple of annoying things that can trip you up with this component. First of all, every time you complete the wizard, it creates new output transformations, deleting the old ones. If you have customised these in any way, e.g. by adding an update date column, it gets annoying fast. Fortunately the workaround is easy: create your own transformations to receive the outputs independently of the SCD, and when the wizard completes, just delete the outputs it creates and re-map the output data flows to your own transformations.

Secondly, the wizard is actually disconnected from what is stored in the package for the data flow. The wizard's data is stored in a separate chunk of XML within the package definition. What this means in practice is that if you use the Advanced Editor to make any changes, these will not be picked up by the wizard if you run it again. So it's quite easy to make tweaks that get lost if you re-use the wizard.

Where would you use the SCD?

As per its name, you will most likely use this in data warehousing scenarios for maintaining slowly changing dimension and fact tables, or any table where you want to update data to reflect current values. Note that the reference / dimension table can only reside in SQL Server.

There are some alternatives to the SCD available, notably Table Difference from SQLBI.com (now at version 2.0), which I have used and is very quick, and the Kimball Method SCD from Codeplex, which I haven't used but will certainly be looking at and may cover in one of the following posts on the SCD.
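
To make the row by row performance point from this post concrete, the sketch below counts round trips for the two lookup strategies. It is a toy Python model under stated assumptions: `CountingTable` is a hypothetical stand-in for the reference table in the database, and it measures only the number of queries issued, not real timings or the memory cost of caching.

```python
class CountingTable:
    """Stand-in for a database table that counts how often it is queried."""

    def __init__(self, rows, key):
        self._rows = {r[key]: r for r in rows}
        self.queries = 0

    def fetch_one(self, key_value):
        # One round trip per call, like a per-row SQL command.
        self.queries += 1
        return self._rows.get(key_value)

    def fetch_all(self):
        # One round trip that pulls the whole table into memory.
        self.queries += 1
        return dict(self._rows)


def lookup_row_by_row(table, incoming_keys):
    """Query the table once for every incoming row."""
    return [table.fetch_one(k) for k in incoming_keys]


def lookup_cached(table, incoming_keys):
    """Load the table once, then resolve every row from memory."""
    cache = table.fetch_all()
    return [cache.get(k) for k in incoming_keys]


# Hypothetical volumes: 1,000 reference rows and 1,000 incoming rows.
reference_rows = [{"Nut_Key": i} for i in range(1000)]
incoming_keys = list(range(500, 1500))

row_by_row_table = CountingTable(reference_rows, "Nut_Key")
cached_table = CountingTable(reference_rows, "Nut_Key")

row_by_row_result = lookup_row_by_row(row_by_row_table, incoming_keys)
cached_result = lookup_cached(cached_table, incoming_keys)
```

Both strategies return the same matches, but the first issues one query per incoming row while the second issues a single query, which is the trade-off a cached reference table would offer.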

MSDN Documentation for the Slowly Changing Dimension Transformation can be found here for 2008 and here for 2005.

Filed under Integration Services. Tagged with Integration Services, Slowly Changing Dimension.

Flat File Sources and the Decimal Data Type
Posted by BI Monkey on July 27, 2009

Question: What Data Type should you use for importing a column in a Flat File containing Decimal data?

Answer: numeric [DT_NUMERIC]

You cannot use the decimal [DT_DECIMAL] type because, in the Advanced Editor of the Flat File connection, the decimal type for some reason only allows you to set the Scale (the number of digits after the decimal point). The Precision (the total number of digits) is greyed out. The numeric data type allows the setting of both values.

Fortunately the SSIS numeric type maps to SQL Server decimal columns without complaint, so you don't have to add a Data Conversion to change numeric to decimal before using it. I have raised a bug on Connect; please vote it up if you consider this worth fixing.

Filed under Integration Services. Tagged with Data Types, Flat File, Integration Services.

Do While / Until Loops in SSIS
Posted by BI Monkey on July 23, 2009

There isn't an explicit Do.. While / Until loop in SSIS, but it is easy to emulate the functionality using a For Loop Container. It's a simple 2 step process:

1. Create a variable to hold your Until / While break value
2. Set only the EvalExpression of the For Loop container to break when your condition is met in the variable

I've attached an example which uses the variable User::WhileCondition. The For Loop container stops executing when this EvalExpression is no longer true:

    @[User::WhileCondition] < 5

The script task within the loop increases User::WhileCondition by 1 on each iteration. When it reaches 5, the loop stops executing.

The scenario I was using this for was monitoring a folder where files might keep arriving even during processing. So after the Foreach loop over the folder ran, the script task would check to see if any files remained in the folder. If there were, the loop would run again and process the new files. Once there were no files left, the Do.. Until loop evaluated that there were zero files and stopped running.

The sample package can be found here for 2005 and guidelines on use are here.
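
The control flow of this pattern can be sketched in Python as a loop that has only a condition and a body. This is a model of the behaviour rather than SSIS code: `run_for_loop` and `increment` are hypothetical names, the dictionary stands in for package variables, and the starting value of 0 for WhileCondition is an assumption.

```python
def run_for_loop(variables, eval_expression, body):
    """Emulate a For Loop container that has only an EvalExpression set.

    The expression is checked before each iteration and the loop ends
    as soon as it is no longer true.
    """
    iterations = 0
    while eval_expression(variables):
        body(variables)
        iterations += 1
    return iterations


def increment(variables):
    # Plays the role of the script task inside the loop.
    variables["WhileCondition"] += 1


# Assumed starting value; the EvalExpression mirrors @[User::WhileCondition] < 5.
variables = {"WhileCondition": 0}
iterations = run_for_loop(variables, lambda v: v["WhileCondition"] < 5, increment)
```

Because the condition is tested before the body runs, a variable that already fails the condition means the body never executes. For the folder-monitoring scenario the body would process the files and then refresh a count of remaining files, with the condition testing that the count is above zero.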