"Fast processing" is a very broad term. In real-world scenarios, it depends on a lot of things.
The Balanced Data Distributor (BDD) is not my favorite, because it depends on too many environmental factors. Splitting data without being able to define conditions (as a Conditional Split does) leaves you either with multiple destinations that compete for I/O or locks, or with transformation logic that you have to duplicate, if you have any. It could, however, speed up processing if your transformation logic contains blocking components (e.g. Script or Sort components). A blocking component waits until all of its input has been read, and because the BDD splits the data into parts, each copy waits only a fraction of the time it would without the BDD.
A possible scenario where the BDD would speed up your ETL is when your first data flow converts the data into something more readable for the rest of the ETL (using column conversion) and persists it into a RAW file. Here the BDD will produce multiple RAW files that can then be processed in parallel by multiple Data Flow tasks.
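To make the blocking-component argument concrete, here is a minimal Python sketch (not SSIS code). It assumes a strict round-robin hand-out of whole buffers, which is a simplification of whatever scheduling the real BDD uses; the names `batches`, `distribute`, and `blocking_step` are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(rows, size):
    """Yield lists of at most `size` rows (a stand-in for data flow buffers)."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def distribute(rows, outputs, buffer_size):
    """Hand whole buffers to the outputs in turn; no per-row condition is evaluated."""
    parts = [[] for _ in range(outputs)]
    for i, batch in enumerate(batches(rows, buffer_size)):
        parts[i % outputs].extend(batch)
    return parts


def blocking_step(part):
    """A blocking transformation: it needs all of its input before it can emit a row."""
    return sorted(part)


rows = list(range(20, 0, -1))  # 20, 19, ..., 1
parts = distribute(rows, outputs=2, buffer_size=5)

# Each output runs its own copy of the blocking step, so each copy only
# waits for its share of the rows.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(blocking_step, parts))
```

Note that each output is sorted on its own, so the overall order across outputs is lost. That is the price of splitting without a condition: the transformation logic is duplicated per output and the results still have to be brought together somewhere.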
The MaxConcurrentExecutables setting can speed up your package when it contains several tasks that can run at the same time. I tend to leave it on something like 255 and let the SSIS runtime decide how many threads to use.
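As an analogy only, the effect of such a cap can be sketched in Python with a thread pool. This is not how the SSIS runtime is implemented; `effective_limit` and `run_tasks` are hypothetical helpers. The one SSIS-specific detail used here is the documented default of -1, which means the number of logical processors plus 2; rejecting other non-positive values is an assumption of this sketch.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def effective_limit(setting):
    """Translate a MaxConcurrentExecutables-style value into a worker count."""
    if setting == -1:
        # Documented default: logical processors + 2.
        return (os.cpu_count() or 1) + 2
    if setting < 1:
        # Assumption of this sketch: other non-positive values are rejected.
        raise ValueError("setting must be -1 or a positive integer")
    return setting


def run_tasks(task_count, setting):
    """Run dummy tasks under the cap and return the peak number running at once."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def task(_):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for real work
        with lock:
            running -= 1

    with ThreadPoolExecutor(max_workers=effective_limit(setting)) as pool:
        list(pool.map(task, range(task_count)))
    return peak


peak = run_tasks(task_count=6, setting=2)
```

With a high value such as 255 the cap is rarely the limiting factor; how many tasks actually run together then depends on the package's precedence constraints and the hardware.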
My favorite is Conditional Split, especially if you receive data that belongs to multiple partitions of your data destination. The split will then separate the data on a partition level, so you can use one data destination per partition and benefit from the speed-up, because the SQL insert operations are independent of one another (and the disk I/O hopefully is, too).
A general strategy to speed things up is to avoid blocking components like Sort or Union. Thinking through your algorithm and making liberal use of Conditional Split and Multicast has always worked well for me.
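The partition-level split can be sketched in Python as follows (again, not SSIS code). The sample rows, the `order_date` column, and the one-partition-per-year rule in `partition_of` are all assumptions chosen for illustration; in a real package the condition would mirror the destination table's partition function.

```python
from collections import defaultdict


def partition_of(row):
    """Hypothetical partition rule: one partition per calendar year."""
    return row["order_date"][:4]


def conditional_split(rows):
    """Route every row to exactly one output, chosen by a per-row condition."""
    outputs = defaultdict(list)
    for row in rows:
        outputs[partition_of(row)].append(row)
    return dict(outputs)


rows = [
    {"id": 1, "order_date": "2010-03-14"},
    {"id": 2, "order_date": "2011-01-02"},
    {"id": 3, "order_date": "2010-11-30"},
    {"id": 4, "order_date": "2011-06-20"},
]

by_partition = conditional_split(rows)

# Each bucket would feed its own destination, so the inserts touch
# different partitions and do not depend on one another.
for partition, bucket in sorted(by_partition.items()):
    print(partition, [row["id"] for row in bucket])
```

Unlike an unconditional distribution, every row's output is determined by its content, which is what lets each destination target a single partition.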
Replied on Jun 20 2011 6:53AM