Upload: alex-meadows

Post on 16-Apr-2017

Choosing the Right Steps in Pentaho Kettle

Alex Meadows, BI Engineer, iContact

August RTP PUG Meetup

Kettle (PDI): The ETL Swiss Army Knife

Over 100 steps

Plugin Architecture

Scripting Steps

Which to use?!?!?

Example: Loading a Text File

Text File Input, right?

Will work for most text files

Most powerful of text file inputs

There are other options in PDI!

Find the one that closely matches what you're trying to do

Example: Sharded Databases

Sharding support is a default feature of database connections

The shard list is not dynamic, so it has to be updated by hand as shards change

Example: Sharded Databases

Needed a dynamic sharded list

Built job and transformation to read from table and perform function on each shard in table
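
The pattern on this slide (read the shard list from a table, then run the same work once per shard) can be sketched outside of PDI. This is only an illustration, not the presenter's actual job: the `shards` table, its columns, and the `process_shard` function are hypothetical, and an in-memory SQLite database stands in for wherever the real shard list lives.

```python
import sqlite3


def load_shards(conn):
    """Return the active shards as (name, host) tuples, read fresh on every run."""
    rows = conn.execute(
        "SELECT name, host FROM shards WHERE active = 1 ORDER BY name"
    )
    return rows.fetchall()


def process_shard(name, host):
    """Hypothetical per-shard work; in PDI this would be the transformation."""
    return f"processed {name} on {host}"


def run_for_all_shards(conn):
    """Loop over the shard table and apply the same function to each shard."""
    return [process_shard(name, host) for name, host in load_shards(conn)]


# Demo setup: an in-memory table standing in for the shard list (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shards (name TEXT, host TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO shards VALUES (?, ?, ?)",
    [
        ("shard01", "db1.example.com", 1),
        ("shard02", "db2.example.com", 1),
        ("shard03", "db3.example.com", 0),  # inactive, so it is skipped
    ],
)
results = run_for_all_shards(conn)
```

Because the list is queried on every run, adding or retiring a shard becomes a row change in the table rather than an edit to a connection definition.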

Plugins Add More Functions

Community contributions

Teradata Bulk Loader

R/Weka Integration

Treated as siblings of native steps

All native steps are in essence plugins.

Many eventually become part of the core product.

Processing handled directly within the engine, just like native steps
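
The idea that native steps and plugins are siblings can be pictured as a single registry that every step goes through. The sketch below is a generic illustration of that design, not Kettle's actual plugin API: `STEP_REGISTRY`, `register_step`, and both step classes are hypothetical names.

```python
# One shared registry: built-in and contributed steps are looked up the same way.
STEP_REGISTRY = {}


def register_step(step_id):
    """Decorator that adds a step class to the shared registry."""
    def decorator(cls):
        STEP_REGISTRY[step_id] = cls
        return cls
    return decorator


@register_step("uppercase")  # stands in for a "native" step
class UppercaseStep:
    def process_row(self, row):
        # Upper-case every string field, leave other values untouched.
        return {k: v.upper() if isinstance(v, str) else v for k, v in row.items()}


@register_step("add_length")  # stands in for a community plugin
class AddLengthStep:
    def process_row(self, row):
        # Add a field holding the length of the "name" field.
        return {**row, "name_length": len(row["name"])}


def run_pipeline(step_ids, rows):
    """Instantiate steps by id and pass each row through them in order."""
    steps = [STEP_REGISTRY[step_id]() for step_id in step_ids]
    output = []
    for row in rows:
        for step in steps:
            row = step.process_row(row)
        output.append(row)
    return output


output = run_pipeline(["uppercase", "add_length"], [{"name": "kettle"}])
```

The engine function never asks where a step came from, which is the sense in which a plugin is processed just like a native step.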

Scripting Steps

Greatest functionality/flexibility

Executes/compiles at runtime

Can dramatically slow performance

If a script is used in multiple places, turn it into a plugin for potentially better performance
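
The cost of interpreting a script at runtime can be illustrated with a rough Python analogy. This is not how PDI's scripting steps are implemented; `EXPRESSION` is a hypothetical user script, and the three functions compare re-evaluating the script text per row, compiling it once, and using ordinary compiled-in code (the closest analogue to a plugin).

```python
import timeit

EXPRESSION = "row['price'] * row['qty']"  # hypothetical user-supplied script
rows = [{"price": 2.5, "qty": n} for n in range(1000)]


def scripted_per_row(rows):
    """Parse and compile the script text again for every row."""
    return [eval(EXPRESSION, {}, {"row": row}) for row in rows]


def scripted_compile_once(rows):
    """Compile the script once, then evaluate the code object per row."""
    code = compile(EXPRESSION, "<script>", "eval")
    return [eval(code, {}, {"row": row}) for row in rows]


def native(rows):
    """The same logic written directly in the host language."""
    return [row["price"] * row["qty"] for row in rows]


def time_variants(repeat=20):
    """Return seconds taken by each variant for `repeat` passes over the rows."""
    return {
        fn.__name__: timeit.timeit(lambda: fn(rows), number=repeat)
        for fn in (scripted_per_row, scripted_compile_once, native)
    }


if __name__ == "__main__":
    for name, seconds in time_variants().items():
        print(f"{name}: {seconds:.4f}s")
```

All three variants produce the same values; only the per-row overhead differs. Actual timings depend on the machine and the script, so measure before deciding that a script needs to become a plugin.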

Recommended Reading

Pentaho Solutions (general BI audience)

Pentaho Data Integration Beginner's Guide (beginner)

Pentaho Data Integration Cookbook (intermediate)

Pentaho Kettle Solutions (advanced)