Getting Started with Adobe After Effects - Part 6: Motion Blur


Upload Image Close it
Select File

Browse by Tags · View All
#SQLServer 46
SQL Server 44
SSRS EXPERT 41
BI 36
#BI 33
brh 28
SSIS BEST PRACTICES 23
TSQL 19
MS BI STACKS 18
SSRS 17

Archive · View All
March 2011 24
July 2011 17
February 2011 15
April 2010 13
June 2011 11
May 2011 11
April 2011 11
July 2010 11
March 2010 8
December 2011 7

SSIS # 102 – Don’t be afraid to leverage TSQL in SSIS package

Nov 15 2011 5:28AM by Sherry Li   

I hope so far you are convinced that you need to learn at least one ETL tool in order to stay competitive in the SQL/BI profession. I also hope that Microsoft SSIS is your choice.

I’ve worked with two types of people with very different views on ETL tools.

One is to be very skeptical about using an ETL tool over hand coded SQL code. They are even more skeptical when they see I am using stored procedures in a SSIS package. Their argument is that if all can be done in a stored procedure, why use both stored procedures and SSIS. Would it be simpler to just stick to one tool?

Another type tends to completely abandon TSQL code, in favor of data flow tasks only, such as Union All, Merge, Sort, Lookup, Slowly Changing Dimension etc.

To the first group of friends, I hope they have eventually been convinced that an ETL tool is the way to go for ETL work.

Some high level benefits of SSIS:

  • visual representation of work
  • control flow (Including multiple execution paths, and conditional execution path)
  • data transformations from source to target
  • event logging
  • source and target do not need to be on the same server or same DBMS
  • vreusability
  • audit details
  • package configurations for secure and easy deployment
  • memory management
  • etc.

In this blog, I’d like share with you one very important lesson I’ve learned. That is never be afraid to leverage TSQL in SSIS packages.

Why use the Sort data flow task when you can write TSQL code to SORT the data?

Why use the Merge task when you can write TSQL code to JOIN tables?

Why bring too much from the source when you can write TSQL code with proper WHERE clause to limit it to only the data you need?

Why use Derived Column task when you can write TSQL code to derive the column?

Why use data flow at all if you can write a stored procedure that encapsulates all business logics and have it executed from a Execute SQL Task?

Set-based INSERT and UPDATE

This is not just because we want to leverage the SQL Server engine’s computing power, but also for another very good reason. A very simple reason. That is to prefer set-based INSERT and UPDATE over row-by-row in-memory operation.

If you are working with large amount of data, writing TSQL set-based code to do INSERT and UPDATE, can quickly become your only option. In this case, sending large changes into staging will be your first step.

A reader once asked me if using SCD (Slowly Changing Dimension) is the only way for loading dimensions. I hope you have drawn your own conclusion so far. No, it’s is not the only way. It’s not always the best way either.

Tags: SSIS BEST PRACTICES, #SQLServer, SQL Server,


Sherry Li
14 · 12% · 3697
2
 
0
Lifesaver
 
0
Refreshed
 
0
Learned
 
0
Incorrect



Submit

6  Comments  

  • World's not SQL. SSIS can be used to extract data from variety of sources, e.g. XML, and in that case you have no option but to use Merge task together with two Sort tasks to join data which is next put into an SQL table. In case of SQL your reasoning is 100% true.

    commented on Nov 28 2011 2:30AM
    chojrak11
    1009 · 0% · 25
  • chojrak11, You can of course use T-SQL to read xml too with OPENXML

    commented on Nov 28 2011 5:49AM
    Jonathan Roberts
    76 · 2% · 744
  • You can use OPENXML but why? Usually for small data sets or ad-hock tasks is SSIS good enough and it is faster do design than pure T-SQL. Then you go to large data sets and you need extraction, transfer and load optimization. For this task it is T-SQL or mix T-SQL and SSIS usualy better option.

    commented on Nov 28 2011 2:28PM
    jablonov
    1493 · 0% · 13
  • I'm not saying it's the best option just that you do have the option to read xml in T-SQL too.

    commented on Nov 28 2011 4:23PM
    Jonathan Roberts
    76 · 2% · 744
  • BWAA-HAAA!!! I keep looking for reasons to learn SSIS and everytime I turn around, I find reasons not to. ;-) "...Leverage T-SQL in SSIS...". " Usually for small data sets or ad-hock tasks is SSIS good enough..."

    And then I look at all of the questions on various forums about how to do what would be simple things in T-SQL and all the problems people have with doing those things in SSIS and I have to wonder, if I can so easily do all those things in T-SQL, do I really need SSIS for anything?

    No... I'm not badmouthing SSIS. I'm looking for someone to give me a good honest answer to give me incintive to spend valuable time learning to do something a different way when I already know of a good way to accomplish ETL for millions of rows per batch in a very high performance manner.

    commented on Nov 28 2011 9:27PM
    Jeff Moden
    166 · 1% · 291
  • I'm afraid this may become a religious issue. When there are a number of ways to do things, some better in some situations, others better in other situations, it can become a messy conversation.

    This is the same type of conversation as a relational programmer speaking with an object-oriented programmer; one preferring set-based operations and the other preferring class-object-iteration-based operations. Both will work. Both have advantages. Both have costs. Which is "best" often depends on knowledge available, support considerations and standards-in-place.

    Why use the Sort data flow task when you can write TSQL code to SORT the data?

    Why use the Merge task when you can write TSQL code to JOIN tables?

    -- in order to avoid tempdb pressure on a system in production use. By offloading the sorting to a separate machine running SSIS, the SQL Server source can firehouse the data out, leaving the sorting and joining (and the memory and tempdb overhead) to another machine.

    Why bring too much from the source when you can write TSQL code with proper WHERE clause to limit it to only the data you need?

    Why use Derived Column task when you can write TSQL code to derive the column?

    -- Another option would be to define a view, then leverage that view with Data Source, instead of using an Execute SQL task. Also, sometimes you don't have the ability to create objects within the source system. Sometimes it's actually faster to have SQL Server firehouse all of the data out, then have SSIS remove the unneeded data. Applying a WHERE clause can mean table-scans, index-scans, index-seeks, subqueries or joins, all of which are overhead and may cause locking issues on large datasets.

    Why use data flow at all if you can write a stored procedure that encapsulates all business logics and have it executed from a Execute SQL Task?

    SQL Server is a batch-oriented transactional system. Things happen as a unit of work... or they don't (and then roll-back). By setting batch sizes within SSIS tasks, you can limit and throttle the impact of large tasks on the source and destination database servers. This is not something you can do with stored procedures. Using stored procedures, you are going to deal with memory pressure, tempdb pressure and locking issues.

    Set-based INSERT and UPDATE

    This is not just because we want to leverage the SQL Server engine’s computing power, but also for another very good reason. A very simple reason. That is to prefer set-based INSERT and UPDATE over row-by-row in-memory operation.

    This works well when the source and destination are on the same server. If they are on different servers, you then have to content with MSDTC, linked servers or OPENROWSET/OPENDATASOURCE work-arounds which can quickly become problematic. These issues are especially troublesome when dealing with huge datasets over sometimes unreliable WAN links. If the connection drops, the transaction aborts and is rolled back. SSIS can tolerate this, stored procedures cannot.

    If you are working with large amount of data, writing TSQL set-based code to do INSERT and UPDATE, can quickly become your only option. In this case, sending large changes into staging will be your first step.

    I would take the opposite view: If you are working with large amounts of data, set-based INSERT and UPDATE can quickly lead to SSIS (or another ETL tool) being your only option. Especially if you are touching the data source multiple times (for example, feeding multiple fact tables in a data warehouse or data mart). Using stored procedures, every time I need to touch a table, I'm throwing locks and filling up tempdb. Using SSIS, I can read a table once, then split off multiple tasks that consume and process the data.

    A reader once asked me if using SCD (Slowly Changing Dimension) is the only way for loading dimensions. I hope you have drawn your own conclusion so far. No, it’s is not the only way. It’s not always the best way either.

    I completely agree... there is not one-single way to get these types of tasks accomplished. My personal preference is to have the source systems dump data into text files using bcp, then loaded at the destination the same way. bcp has a configurable batch size to reduce tempdb pressure and text compresses very well when transmitted across WAN links using WAN acceleration. In the absense of WAN acceleration, the text files can be compressed using ZIP, 7z, or RAR, then transmitted across a WAN. This also works well across VPNs

    commented on Nov 30 2011 5:18AM
    Marc Jellinek
    97 · 2% · 546

Your Comment


Sign Up or Login to post a comment.

"SSIS # 102 – Don’t be afraid to leverage TSQL in SSIS package" rated 5 out of 5 by 2 readers
SSIS # 102 – Don’t be afraid to leverage TSQL in SSIS package , 5.0 out of 5 based on 2 ratings
    Copyright © Rivera Informatic Private Ltd Contact us      Privacy Policy      Terms of use      Report Abuse      Advertising      [ZULU1097]