In SSIS Data Flow Task (DFT) is the most used task as almost all the ETL is done in DFTs. Situation do arise when we need to use data between two or more DFTs. So how can this be achieved? Well there are a number of ways you can save you data in one DFT to so that you access the same data in another DFT. Below are the different ways to store the data in the intermediate process, to be used again in another DFT or even another package.
- Raw File
- Flat File
- SQL Table
- Excel File
- Recorset destination with Script component as the source
Let's elaborate each method.
While saving the data as a Raw File it is saved in the native form hence it is much faster while reading or writing the data to it. As the metadata is same as the SSIS data types, therefore there is no need of conversion of data type while reading or writing. This greatly improves the performance (IO) in case we are dealing with huge amount of data.
The metadata is stored as per our settings in the DFT hence consistency is maintained in the data. They are light weight as compared to other options available.
It is not easily editable as it requires SSIS to read the data or modify it. It could be an advantage as well as a disadvantage. It can just be used in the DFT.
Flat file destination or source can process various kinds of file like text, csv, etc.
It does not have any coupling with the metadata. It is difficult at times to know the data type of each column and it is prone to cause data errors while fetching the data from flat file. Because of this very reason the IO read and write time goes up as the data needs to be converted to appropriate form before reading.
It too is light weight and is used pretty often especially if you want to attach the data to a mail or make it available to people without SSIS. Unlike Raw file it is not dependent on SSIS for viewing or editing.
SQL table is the best choice if you need to store the data and not consume the entire data. In case there are multiple DFTs and each one requires some specific data then SQL table is the best option. Just write the appropriate query and you have the necessary data. There is no need to use any conditional split to filter data. Also a lot of derived column operations could be done in the source itself.
In case you need to get data based on other tables you could perform joins as well in the source. One of the big advantages of SQL server is that you can sort the data in the source (SQL Server) itself as the SSIS sort transform is very expensive.
SQL Server has perfect metadata association so no need to additional validation for data types if we need to use the data in the existing format.
The data security could be applied very efficiently to the tables. Even the data backup and restore is pretty foolproof.
Data is available for simultaneous access and the data locks are maintained by SQL Server effectively. We have the option of using transactions while loading the data in order to maintain the consistency of load.
On the flip side SQL Server maintenance is huge task in itself. SQL server itself needs a lot of resources.
This is one of the options which I do not prefer to use. There are a lot of issues while reading or writing data to excel file. Few common issues faced are:
- Conversion between Unicode and non-Unicode.
- SSIS determines the data type automatically based on sampling first few records and returns NULL for the records it finds with inappropriate data types. Though there is a workaround by setting adding IMEX =1 in the connection string but this also does not guarantee proper data every time.
On the positive side excel file allows the data to stored is in proper tabular format and we can query the data as per our need.
Recordset Destination and Script Component as source
Again I would not recommend this as it is extremely time consuming when compared to Raw file (check the references below).The other methods explained above are much simpler to use and configure. Hence I will not be talking more on this topic.
Mostly we use the first three options and there is no hard and fast rule as to which should be used. Based upon the requirement you need to make the selection which one fits into your scheme the best.