Advanced ETL - SSIS 2012 & Talend
By
Sunny Okoro
Contents
Database Systems
Applications
Microsoft SQL Server Integration Services 2012
Talend Open Studio 5.4
Database Systems
Microsoft SQL Server 2008R2
Microsoft SQL Server 2012
Applications
Microsoft Visio
Microsoft Visual Studio 2010
Microsoft SQL Server Integration Services 2012
Example of Flat File Creation
The connection string ensures that the file is created in the correct folder with the correct name, as declared in the SSIS variable.
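The idea of assembling a file path from variables can be sketched outside SSIS. The following is a minimal Python analogue, not SSIS expression syntax; the function name, the folder `C:\ETL\Output`, and the file name `SalesReport` are hypothetical and do not come from the original package.

```python
from pathlib import PureWindowsPath


def build_flat_file_path(output_folder: str, file_name: str, extension: str = ".csv") -> str:
    """Join a folder variable and a file-name variable into one path string.

    This mirrors what a variable-driven connection string does: the
    destination is computed at run time instead of being hard-coded.
    """
    return str(PureWindowsPath(output_folder) / f"{file_name}{extension}")


if __name__ == "__main__":
    # Hypothetical values standing in for the SSIS variables.
    print(build_flat_file_path(r"C:\ETL\Output", "SalesReport"))
```

Changing either variable changes the destination without editing the connection itself, which is the behavior the connection string provides in the package.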
Example of Pivot Creation
This data flow task contains many tables, files, aggregations, and derived columns; not all of them are illustrated. The previous demonstrations illustrate some of the key components in this data flow. The following illustrations demonstrate the major expressions used in derived columns to transform the data.
The stored procedure, executed from SQL Server Management Studio, returns null values that are transformed to a specific value using an expression in SSIS.
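The null-replacement step can be illustrated with a small Python sketch. It is not the SSIS expression used in the package; the column names, the sample row, and the default value "Unknown" are assumptions made only for illustration.

```python
from typing import Dict, List, Optional


def replace_null(value: Optional[str], default: str = "Unknown") -> str:
    """Return the default when the value is null (None), else the value itself."""
    return default if value is None else value


def clean_rows(rows: List[Dict[str, Optional[str]]], default: str = "Unknown") -> List[Dict[str, str]]:
    """Apply the null replacement to every column of every row."""
    return [{column: replace_null(value, default) for column, value in row.items()} for row in rows]


if __name__ == "__main__":
    # Made-up sample row; the real stored procedure output is not shown in the source.
    sample = [{"City": "Melbourne", "Region": None}]
    print(clean_rows(sample))
```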
Countrycode = AU [AUSTRALIA]
Statecode = VIC [VICTORIA]
EXECUTION
Results Abridged
For this execution, the countrycode is changed to US for the USA and the statecode to CA. The [SalesRpt_FiscalYr_City] table does not contain any Australian cities from the previous demonstration because the table is truncated at the beginning of each package execution.
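The truncate-then-load behavior described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the column names and sample cities are hypothetical, and SQLite has no TRUNCATE statement, so an unqualified DELETE stands in for it.

```python
import sqlite3
from typing import List, Tuple


def reload_table(conn: sqlite3.Connection, rows: List[Tuple[str, str, str]]) -> None:
    """Empty the report table, then load the rows for the current execution."""
    with conn:  # commits on success, rolls back on error
        # SQLite has no TRUNCATE TABLE; DELETE without WHERE empties the table.
        conn.execute("DELETE FROM SalesRpt_FiscalYr_City")
        conn.executemany(
            "INSERT INTO SalesRpt_FiscalYr_City (CountryCode, StateCode, City) VALUES (?, ?, ?)",
            rows,
        )


def demo() -> List[Tuple[str, str]]:
    """Run two executions and return what is left in the table afterwards."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical columns; the real table definition is not shown in the source.
    conn.execute("CREATE TABLE SalesRpt_FiscalYr_City (CountryCode TEXT, StateCode TEXT, City TEXT)")
    reload_table(conn, [("AU", "VIC", "Melbourne")])    # first execution
    reload_table(conn, [("US", "CA", "Los Angeles")])   # second execution
    return conn.execute("SELECT CountryCode, City FROM SalesRpt_FiscalYr_City").fetchall()


if __name__ == "__main__":
    print(demo())
```

After the second call only the US row remains, which matches the observation that no Australian cities survive from the earlier run.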
The countrycode remained the same, but the statecode was changed to IL. The data contents for the state of Illinois were created in the same folder as the state contents for Victoria. The prefix of each file name was changed to IL to reflect the countrycode and statecode, which was done using file connection strings.
No data was found for the city that was in California in the previous execution of this package. I will change the countrycode to CA and the statecode to BC.
The output folder is cluttered, and SSIS will delete all content in the output folder at the beginning of each execution.
The previous content has been deleted by SSIS using the File System Task, which can also be used to create directories, copy files, and so on. The output folder has no content for Great Britain.
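The effect of clearing the output folder before each run can be sketched in Python. This is an analogue of the behavior described, not the File System Task itself; the function name is hypothetical, and the demo runs against a temporary directory rather than the real output folder.

```python
import shutil
import tempfile
from pathlib import Path


def clear_folder(folder: Path) -> int:
    """Delete every file and subdirectory inside the folder, keeping the folder itself.

    Returns the number of top-level entries removed.
    """
    removed = 0
    # Take a snapshot of the entries first so deletion does not disturb iteration.
    for entry in list(folder.iterdir()):
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        output = Path(tmp)
        (output / "old_report.csv").write_text("stale data")
        print(clear_folder(output), "entries removed")
```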
These files will be imported into the MS SQL Server database using a Foreach Loop that picks up each CSV file and loads it into the product tables.
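The loop-over-files pattern can be sketched as follows. This Python example uses an in-memory SQLite database in place of SQL Server; the table name `Product` and the columns `ProductID` and `Name` are assumptions, since the actual file layout is not shown here.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_csv_files(folder: Path, conn: sqlite3.Connection) -> int:
    """Load every *.csv file in the folder into the Product table; return rows loaded."""
    loaded = 0
    for csv_path in sorted(folder.glob("*.csv")):  # one iteration per file, like a Foreach Loop
        with csv_path.open(newline="") as handle:
            rows = [(row["ProductID"], row["Name"]) for row in csv.DictReader(handle)]
        with conn:  # one transaction per file
            conn.executemany("INSERT INTO Product (ProductID, Name) VALUES (?, ?)", rows)
        loaded += len(rows)
    return loaded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "products_1.csv").write_text("ProductID,Name\n1,Helmet\n")
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE Product (ProductID TEXT, Name TEXT)")
        print(load_csv_files(folder, conn), "rows loaded")
```

Files with other extensions in the same folder are ignored because the glob pattern matches only `*.csv`.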
For this demonstration, the Talend ETL application is used to transform the data into an XML format that SSIS can recognize.
Data Mapping
The Adworks XML document and the Adworks XSD document are created in the XML folder.
Data Validation
Only the pivot-based reports are displayed in full. The remaining reports are snapshots, not the entire data set extracted from the database.
Another way to create the XML is to use T-SQL XML features such as FOR XML AUTO with the ELEMENTS directive to render the query result as XML and write it to an XML file that SSIS can read. This method is much faster for small data sets, but not for large data in a laptop environment.
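To show roughly what element-centric XML looks like, here is a small Python sketch that turns query-like rows into one child element per column. It is an emulation for illustration, not T-SQL and not the query used in the package; the element names `Customers` and `Customer`, the sample columns, and the wrapping root element are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List


def rows_to_xml(rows: List[Dict[str, Any]], root_name: str = "Customers", row_name: str = "Customer") -> str:
    """Render rows as element-centric XML: one element per row, one child per column."""
    root = ET.Element(root_name)
    for row in rows:
        row_element = ET.SubElement(root, row_name)
        for column, value in row.items():
            if value is None:
                # Skip null columns, similar to how FOR XML omits NULLs by default.
                continue
            ET.SubElement(row_element, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample row for illustration only.
    print(rows_to_xml([{"CustomerID": 1, "Name": "A", "Phone": None}]))
```

A single root element is added here so that the result is a well-formed document that an XML reader can parse.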
All of the results are abridged
Instead of inserting data for all countries when the package is executed, SSIS inserts data using the countrycode and statecode highlighted above, together with the additional countrycode, to determine which table to populate.
Only the Australian table is populated. The remaining tables were ignored because the condition of the expression on the conditional split did not evaluate to true.
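The routing logic of a conditional split can be sketched in Python. This is a simplified analogue under stated assumptions: the three outputs (AU, CA, US) follow the Australian, Canadian, and American customer tables shown in this section, while the row layout and the function name are hypothetical.

```python
from typing import Dict, List


def split_by_country(rows: List[Dict[str, str]], selected_country: str) -> Dict[str, List[Dict[str, str]]]:
    """Route each row to the output for its country, but only for the selected country.

    Outputs whose condition never evaluates to true stay empty,
    so the corresponding destination table is not populated.
    """
    outputs: Dict[str, List[Dict[str, str]]] = {"AU": [], "CA": [], "US": []}
    for row in rows:
        code = row["CountryCode"]
        if code == selected_country and code in outputs:
            outputs[code].append(row)
    return outputs


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = [
        {"CountryCode": "AU", "Customer": "A"},
        {"CountryCode": "US", "Customer": "B"},
    ]
    result = split_by_country(sample, "AU")
    print({code: len(rows) for code, rows in result.items()})
```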
Australian customer data (all of the results are abridged)
Canadian customer data (all of the results are abridged)
American customer data (all of the results are abridged)
Talend Open Studio 5.4
JDBC drivers have to be uploaded manually to make it easier to connect to different platforms such as Oracle, MySQL, Sybase SQL Anywhere, PostgreSQL, and DB2. Talend also allows ODBC to be used for connections instead of the traditional JDBC. I have worked with Java-based applications such as Oracle SQL Developer and JDeveloper; ODBC does not work well in these environments, and it can be used only where the option is available.
All of the results are Abridged
Results Abridged
Only the Excel files will be read and uploaded into the database.
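Selecting only the Excel files from a mixed folder can be sketched as a filter on the file extension. This Python example is an illustration, not the Talend job configuration; the function name and the assumption that Excel files end in `.xls` or `.xlsx` are mine.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed extensions for Excel workbooks.
EXCEL_SUFFIXES = {".xls", ".xlsx"}


def select_excel_files(file_names: Iterable[str]) -> List[str]:
    """Keep only the names whose extension marks them as Excel files (case-insensitive)."""
    return [name for name in file_names if Path(name).suffix.lower() in EXCEL_SUFFIXES]


if __name__ == "__main__":
    # Made-up file names for illustration only.
    print(select_excel_files(["sales.xlsx", "sales.csv", "notes.txt"]))
```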
Results Abridged
All the file names include the countrycode passed through the context.