I hope you have a basic understanding of ETL, SSIS and BIDS. So without much delay, let us start building our package to move data from a flat file to a database table. The table does not exist and needs to be created during development. As this is a basic package, we will not be looking at all three components of ETL; we will just look at EXTRACT and LOAD. I will demonstrate transformations in detail later.
The source file is in the following CSV format, with the column header in the first row:
Name, Age, Sex
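As an illustration, a minimal source file matching that layout (the names and values here are made up) could look like this:

```text
Name, Age, Sex
John, 25, M
Jane, 30, F
Sam, 42, M
```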
Start BIDS and on the toolbar click New Project, or press Ctrl+Shift+N, as shown in the image below. You could also go to File -> New -> Project from the menu.
On doing the above, the following window (see image below) will pop up; you need to do the following:
- Under Project types, select Business Intelligence Projects.
- Under Templates, select Integration Services Project.
- Set the Name of the project.
- Browse to the location where you would like to save the project.
- Click OK.
These are the basic settings you need to start creating the package.
Once you complete the above process, you will see the screen shown in the next figure. This is BIDS. If you do not see the Toolbox panel on the left, click the Toolbox button I have highlighted at the top; this will bring up the various other panels available to you. On the right you can see the Solution Explorer. It contains the solution with the name you provided in the previous step.
Under SSIS Packages you will see a default package named Package.dtsx that has been created for you. It is a blank package, as you can see in the middle design pane. If you wish, you can rename this package.
Once you are familiar with the various parts of BIDS shown in the image above, we can start developing the package.
Just below the Package.dtsx [Design]* title you will notice four tabs; by default you are on the Control Flow tab. From the Toolbox on the left, drag and drop a Data Flow Task onto the Control Flow designer pane. Once you drop the Data Flow Task onto the designer pane, you will see a screen similar to the one below.
Double click the Data Flow Task you dropped onto the Control Flow. You will notice that the tab has changed from Control Flow to Data Flow (see the image below).
If you look at the Toolbox now, the tools have changed. These tools are specific to the Data Flow Task. We will talk about them in detail later.
For now, all you need to do is drop two transformations (Flat File Source and OLEDB Destination) from the Toolbox onto the Data Flow pane. You will notice a green arrow projecting downwards from the Flat File Source; click it and drag it to the OLEDB Destination (the other transformation).
NOTE: The tools in the Control Flow are called Tasks, while in the Data Flow the tools are called Transforms/Transformations.
Flat File Source: This transform, as the name suggests, is used to define the source from which the data needs to be pulled. We need to define the metadata of the data we wish to pull; we will see how to do this in the next few steps.
NOTE: On both the transforms that you pulled onto the Data Flow pane, you will notice a red cross mark on the right side. This is because the transforms are not yet set up correctly, and if we execute the package now, we will get an error.
We should now start setting up the Flat File Source. Double click the Flat File Source; the Flat File Source Editor window (see image below) will pop up. The first thing we need in the source is a connection to read the file containing the information to be consumed. Since this is a new package and we have no existing connections, we click the New button to create one.
Flat File Connection Manager Editor (image below):
This is the editor used to create a flat file connection. You may provide a name for the connection (for easy tracking) and a description; otherwise, defaults will be used.
Click the Browse button to select the flat file we need to use.
NOTE: The file needs to exist during development, even if it contains just one record.
If the file is in a simple CSV format, the other settings will be set automatically, namely the code page, text qualifier, header row delimiter, and header rows to skip (set to zero by default).
You need to check “Column names in the first data row”, as the file we are using has a header record. If we do not check this box, the header row will be treated as data.
You do not need to make any changes in the Columns and Advanced tabs, as we are treating all the columns as strings for simplicity and leaving the default settings in place. Since we have specified that the first row holds the header information, the column names will be set from the data in the first row.
Go to the Preview tab and check that the data is in the format you expect and that you do not get any error message. In my case, I can see the 3 columns with the 3 rows in the file, and the headers are set correctly.
Click OK to exit the Flat File Connection Manager.
Preview the data in the Flat File Source editor and click OK.
NOTE: If you now look at the Data Flow Task, you will notice that the red error mark is gone from the Flat File Source but is still visible on the OLEDB Destination.
Next we need to set up the OLEDB Destination. Double click the OLEDB Destination. As with the Flat File Source, we first need to set up a connection, this time to SQL Server. Since this is our first package and we do not have any existing OLEDB connection, we need to create one. To do this, click the New button highlighted in Fig. 9 below.
The next image shows the existing connections in the left pane, but we have not created any yet, so we need to do that now. Click the New button on this pop-up.
Now provide all the information required to set up the connection, namely the server name, the authentication type, and the database.
Note: If SQL Server is running on the local machine as the default (unnamed) instance, you can use a period ‘.’ instead of localhost or the machine name.
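For reference, behind these dialog fields the connection manager builds an ordinary OLE DB connection string along the following lines. The provider and database names here are assumptions; the provider in particular varies with your SQL Server version:

```text
Provider=SQLNCLI10.1;Data Source=.;Integrated Security=SSPI;Initial Catalog=TestDB;
```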
Click the Test Connection button to verify the credentials provided, then click OK. If there is no error we are good; otherwise, check the credentials again.
Select the connection we just created. As I mentioned earlier, the table does not exist when development starts. SSIS provides a nice and simple way to create the table during development, based on the metadata we provide for the source and the transformations applied (in our case, none).
To do the above, click the New button highlighted in the image below.
On clicking the New button, you will see a small text box pop up with the query to create the table in SQL Server. You can change the name, the column types, etc. as you wish; this is just a guide, and you are not bound to the table definition provided.
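As a rough sketch, since our flat file metadata treats every column as a string, the generated definition will look something like the following. The table name and the varchar(50) widths here are assumptions; adjust both to suit your needs:

```sql
CREATE TABLE [Person] (
    [Name] varchar(50),
    [Age] varchar(50),
    [Sex] varchar(50)
)
```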
I changed the table name in the definition for clarity.
Click OK once you are satisfied with the table definition. Click OK on the OLEDB Destination pop-up as well and return to the Data Flow pane. You will see something similar to the next screenshot. Note that the red mark is gone from both the source and the destination. :)
This means that the source and destination are set up correctly. If there is no environmental error, the package will execute successfully.
To execute the package (i.e. perform the intended task), press F5 or click the green Play button marked in red in the screenshot below.
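As an aside, once saved, the same package can also be run outside BIDS with the dtexec command-line utility that ships with SQL Server (the path below is hypothetical; use wherever your project stores Package.dtsx):

```text
dtexec /File "C:\Projects\MyFirstSSISProject\Package.dtsx"
```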
You will notice that the source and destination turn yellow and then green; it happens too fast to notice the yellow. Once they turn green, you will see the number 3 alongside the arrow connecting the source and destination. This indicates how many records were sent from the source to the destination.
The source and destination turning green signifies that both have performed their task successfully.
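You can also double-check the load from SQL Server Management Studio with a quick query. The table name here is an assumption; substitute whatever name you used in your table definition. The count should match the number shown on the arrow (3 in my case):

```sql
SELECT COUNT(*) AS RecordsLoaded FROM [Person];
```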
If you remember, we were doing all this inside the Data Flow Task. To confirm the data flow succeeded overall, move back to the Control Flow tab and see that the Data Flow Task we added has turned green. Since ours was a simple package with no additional tasks in the Control Flow, we do not need to worry about other tasks failing or passing: if the Data Flow is green, our package is successful. :)
Lastly, before closing this article, let me show you the log generated during package execution. Go to the Progress tab, the last tab to the right of the Control Flow tab; check the screenshot below for reference. Here you can see how much time the execution consumed, what steps were taken, etc.
Congratulations!! You now have a fully functional SSIS package. You can now take a look at the other tasks and transforms available in SSIS. Exploration is the key with SSIS.
In the next articles, I will dive deeper into the concepts of Control Flow and Data Flow.