I got a very interesting scenario in Ask Sudeep section of BeyondRelational.com. Let me share the scenario first and then will explain how to go about it.
There is a folder which contains a number of files. The file name follows the following naming convention <Alphanumeric>.<Number> Alphanumeric could be any valid alphanumeric characters and the Number part would have just numeric values and both would be separated by a period [.]. Examples of file name file1Name.1, file1Name.2, file2Name.5 etc. Now the requirement says that for a given group of Alphanumeric part which are same we need to start processing the files from the file having the minimum numeric part and keep on processing the file till we get the numeric part incrementing by one for the files. Then move to the next set of files with another Alphanumeric part. Better to see an example, will give a clear picture.
List of files in the folder:
There can be 2 sets of files based on the alphanumeric part once with a and the other with b. From the above list we need to process just the following files:
As there is a break in the sequence for the file group “a” as a.3 is missing and similarly b.12 is missing so no files with larger numeric part will be processed.
Before starting the work on the Package we need to create 2 tables.
CREATE TABLE [dbo].[FileName1](
[FileName] [varchar](200) NOT NULL,
[SeqNum] [int] NOT NULL,
[FQDN] [varchar](200) NOT NULL
CREATE TABLE [dbo].[FileName](
[FileName] [varchar](200) NOT NULL,
[SeqNum] [int] NOT NULL
Here is the snapshot of the package.
The first Execute SQL task just truncates the table FileName1
Next in the Foreach loop we set it to loop through each files in the folder containing the above files. Here we map the Fully Qualified file name to a variable strFQDN. Next based on this variable and using expressions extract the Alphanumeric and the numeric parts of the file name and save them respectively to the variables: strFileName & intSeqNo. Now in the Execute SQL Task in this Foreach Loop I just insert the value of these 3 variables: strFileName, intSeqNo & strFQDN to the table FileName1 created by the above query. On completion of this loop there will be one record for each file in the folder in the table FileName1.
Next we come to the next Execute SQL Query. What this does is fetches the full file path of each record saved in it. The records are grouped by the fileName and ordered by the SeqNum. This is the query used here:
GROUP BY FileName, seqNum, FQDN
Order BY seqNum
The outcome of this query(full result set) is saved to an object variable objFQDN.
Now comes the place where we do the file selection. The last for each loop that you see in the flow is the core. The enumerator for this is set to “Foreach ADO Enumerator” and the ADO object source variable is mapped to the variable objFQDN. So this loop will run for each file in the folder above but in a ordered manner as we are using the above query to get the grouped and ordered file paths. In the variable mapping I again map it to the variable strFQDN and the other 2 variables(strFileNae & intSeqNo) are getting computed using expressions.
If we look into the foreach loop the first task we have is an Execute SQL task. This task check for 3 things in the 2nd table FileName:
- Is this the first entry to the table?
- Is the Alphanumeric part of the filename (strFileName) new in the table?
- Is there another record in the table with SeqNum just one less than the numeric part (variable intSeqNo) of current file being processed? This implies that the current files numeric part is one greater than the maximum value in the table hence this file follows the sequence.
In case any of the above condition is true we need to process the file. So the query I have used is as follows(T-SQL gurus could enhance this as this may not be the most efficient way, but yes it does manage some incorrect entries in the table FileName)
Declare @FName varchar(50)
DECLARE @SeqNum int
SET @FName = ?
SET @SeqNum = ?
SELECT SUM(CNT) FROM
SELECT COUNT(1) AS CNT, 'a' as col
FROM FileName tbl1
LEFT OUTER JOIN FileName tbl2
AND tbl1.SeqNum = tbl2.SeqNum-1
WHERE tbl2.SeqNum IS NULL AND tbl1.SeqNum+1 = @SeqNum
SELECT COUNT(1) AS CNT, 'a' AS col
WHERE FileName <> @FName
SELECT 1 AS CNT, 'a' AS col
) a GROUP BY col
This query returns a numeric value. If it is >0 it means that the above conditions have been satisfied and we can process the file. We save the value of this Execute SQL Task to an integer variable intCount.
We now need to process the file if the above variable has value greater than 0. I have used a Script task next which can be replaced by any task that you want to process the file. You could use a Data flow task or a file system task or FTP task etc here. But keep in mind we need to connect what ever task after the Execute SQL Task with precedence constraint set to EXPRESSION and the expression needs to be:
After this I have a File system task which puts the file processed above to an archive location so that the file is not processed(Initial part of the package as later the above query will not process it any further) again and the performance will be better as the files will keep growing in production environment.
Lastly in the foreach loop I have a last Execute SQL task :) which inserts the Alphanumeric part of the processed file and the numeric part of the file to the table FileName so that we can do the above condition check on for the next file in the queue.
The expressions that I used to extract the alphanumeric part and the numeric part from the fully qualified file name are listed below.
SUBSTRING ( RIGHT(@[User::strFQDN],FINDSTRING(REVERSE(@[User::strFQDN]),"\\",1)-1) ,
FINDSTRING( RIGHT(@[User::strFQDN],FINDSTRING(REVERSE(@[User::strFQDN]),"\\",1)-1) , ".",1 ) -1
(DT_I4) SUBSTRING( @[User::strFQDN] , FINDSTRING( @[User::strFQDN] ,".", 1) +1, LEN(@[User::strFQDN] ) - FINDSTRING( @[User::strFQDN] , ".", 1) +1)
That’s it. Hope this helps. Would like to hear from you or any other approach that you would take to solve this.