I'm afraid this may become a religious issue. When there are a number of ways to do things, some better in some situations, others better in other situations, it can become a messy conversation.
This is the same type of conversation as a relational programmer speaking with an object-oriented programmer: one prefers set-based operations and the other prefers class-object-iteration-based operations. Both will work. Both have advantages. Both have costs. Which is "best" often depends on the knowledge available, support considerations and the standards in place.
Why use the Sort data flow task when you can write TSQL code to SORT the data? Why use the Merge task when you can write TSQL code to JOIN tables?
-- To avoid tempdb pressure on a system in production use. By offloading the sorting to a separate machine running SSIS, the SQL Server source can firehose the data out, leaving the sorting and joining (and the memory and tempdb overhead) to another machine.
Why bring too much from the source when you can write TSQL code with a proper WHERE clause to limit it to only the data you need?
Why use the Derived Column task when you can write TSQL code to derive the column?
-- Another option would be to define a view, then leverage that view from the data flow's source component instead of using an Execute SQL task. Also, sometimes you don't have the ability to create objects within the source system. Sometimes it's actually faster to have SQL Server firehose all of the data out, then have SSIS remove the unneeded data. Applying a WHERE clause can mean table scans, index scans, index seeks, subqueries or joins, all of which are overhead and may cause locking issues on large datasets.
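As a sketch of that first option (the object names here are invented for illustration), a view on the source can apply the WHERE clause and derive columns once, so the data flow's source component only has to select from it:

    -- Hypothetical view on the source system: filters rows and derives a column
    -- so the SSIS source component can simply SELECT FROM dbo.vw_RecentOrders.
    CREATE VIEW dbo.vw_RecentOrders
    AS
    SELECT OrderID,
           CustomerID,
           OrderDate,
           Quantity * UnitPrice AS LineTotal  -- derived at the source
    FROM   dbo.Orders
    WHERE  OrderDate >= DATEADD(DAY, -30, GETDATE());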
Why use a data flow at all if you can write a stored procedure that encapsulates all the business logic and have it executed from an Execute SQL task?
SQL Server is a batch-oriented transactional system. Things happen as a unit of work... or they don't (and then roll back). By setting batch sizes within SSIS tasks, you can limit and throttle the impact of large tasks on the source and destination database servers. This is not something you can do with stored procedures. Using stored procedures, you are going to deal with memory pressure, tempdb pressure and locking issues.
Set-based INSERT and UPDATE
This is not just because we want to leverage the SQL Server engine's computing power, but also for another very good reason. A very simple reason. That is to prefer set-based INSERT and UPDATE over row-by-row operations.
This works well when the source and destination are on the same server. If they are on different servers, you then have to contend with MSDTC, linked servers, or OPENROWSET/OPENDATASOURCE work-arounds, which can quickly become problematic. These issues are especially troublesome when dealing with huge datasets over sometimes unreliable WAN links. If the connection drops, the transaction aborts and is rolled back. SSIS can tolerate this; stored procedures cannot.
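For illustration only (server, database and table names are made up, and OPENROWSET requires the 'Ad Hoc Distributed Queries' option to be enabled), a cross-server set-based insert via one of those work-arounds might look like this:

    -- Hypothetical ad hoc distributed query pulling rows from a remote server
    -- without a linked server; the whole statement runs as one transaction.
    INSERT INTO dbo.SalesStaging (OrderID, Amount)
    SELECT r.OrderID, r.Amount
    FROM OPENROWSET(
             'SQLNCLI',
             'Server=RemoteServer;Trusted_Connection=yes;',
             'SELECT OrderID, Amount FROM SourceDB.dbo.Sales'
         ) AS r;

If the WAN link drops mid-statement, the entire insert rolls back, which is exactly the failure mode described above.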
If you are working with large amounts of data, writing TSQL set-based code to do INSERT and UPDATE can quickly become your only option. In this case, sending large changes into staging will be your first step.
I would take the opposite view: If you are working with large amounts of data, set-based INSERT and UPDATE can quickly lead to SSIS (or another ETL tool) being your only option. Especially if you are touching the data source multiple times (for example, feeding multiple fact tables in a data warehouse or data mart). Using stored procedures, every time I need to touch a table, I'm throwing locks and filling up tempdb. Using SSIS, I can read a table once, then split off multiple tasks that consume and process the data.
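For reference, the set-based staging pattern the quoted text is describing typically looks something like the sketch below (table and column names are invented):

    -- Hypothetical staging pattern: changes are bulk loaded into dbo.SalesStaging first,
    -- then applied to the target with two set-based statements.
    UPDATE t
    SET    t.Amount = s.Amount
    FROM   dbo.Sales AS t
    JOIN   dbo.SalesStaging AS s ON s.OrderID = t.OrderID
    WHERE  t.Amount <> s.Amount;                          -- update changed rows

    INSERT INTO dbo.Sales (OrderID, Amount)
    SELECT s.OrderID, s.Amount
    FROM   dbo.SalesStaging AS s
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.Sales AS t
                       WHERE t.OrderID = s.OrderID);      -- insert new rows

Each of those statements takes locks on the target table and uses tempdb while it runs, which is the cost being weighed here against the read-once, multi-path SSIS approach.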
A reader once asked me if using SCD (Slowly Changing Dimension) is the only way for loading dimensions. I hope you have drawn your own conclusion so far. No, it is not the only way. It's not always the best way.
I completely agree... there is not one single way to get these types of tasks accomplished. My personal preference is to have the source systems dump data into text files using bcp, then load them at the destination the same way. bcp has a configurable batch size to reduce tempdb pressure, and text compresses very well when transmitted across WAN links using WAN acceleration. In the absence of WAN acceleration, the text files can be compressed using ZIP, 7z, or RAR, then transmitted across the WAN. This also works well across VPNs.
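As a rough sketch of that round trip (server, database and file names here are placeholders), the export and import might look like this, with -b setting the commit batch size on the load side:

    rem Export from the source in character mode over a trusted connection
    bcp SourceDB.dbo.FactSales out FactSales.txt -S SourceServer -T -c

    rem Load at the destination, committing in 50,000-row batches to limit log and tempdb pressure
    bcp DestDB.dbo.FactSales in FactSales.txt -S DestServer -T -c -b 50000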