Getting Started with ASP.NET MVC - Part 5: How to do programming with razor syntax
First Time? You can support us by signing up. It takes only 5 seconds. Click here to sign up. If you already have an account, click here to login.


Upload Image Close it
Select File

Alpesh Patel is working on Various Microsoft Products (Visual Studio 6.0/2005/2008, SQL Server 2005/2008, BI Tool etc)

Administrators

Browse by Tags · View All
SQL Server 6
#SQLServer 4
#BI 3
BI 3
SSIS 3
SSRS 3
TSQL 2
#TSQL 1
#SQL Server 1

Archive · View All
September 2011 3
October 2011 2
February 2012 1
November 2011 1
August 2011 1

Alpesh Patel's Blog

Data Profiling Task to Validate data before load to EDW

Feb 1 2012 9:36AM by Alpesh Patel   

Everyone has problem when going to load data into Data warehouse (EDW). They all need to confirm that data quality is good but they don't no how to proceed.

Microsoft has provided new task within SSIS 2008 called "Data Profiler Task". It solve the problem of all SSIS developer.

The task analyze data in a SQL Server database and, from the results of that analysis, generate XML reports that can be saved to a file or an SSIS variable. By configuring one or more of the task's profile types, you can generate a report that provides details such as a column's minimum and maximum values, or the number and percentage of null values, or a column's min and max length with row count, or a column's value with row count, or a column's value's pattern with frequency, or column's functional dependency or a column's candidate key ratio etc..

 

Despite the variety of statistics that the Data Profiling task can provide, the task's practical applications might appear limited when you first try to implement it. After all, what use are data statistics to the automated processes in an SSIS package? However, by accessing the results generated by the Data Profiling task, you can design a workflow that automatically determines the appropriate actions to take based on the validity of the source data.

In this article, I describe a sample SSIS package that includes the Data Profiling task. The task is configured to generate a report based on the Value Inclusion profile type, which provides details that let you determine whether one or more columns in the source table are suitable as a foreign key. Although this type of profile might not seem particularly useful to an SSIS package, it can actually be quite valuable when loading data because you can use it to compare source data with valid data in your target database to determine the validity of the source data.


Let's take sample to understand the functionality of Data Profiler Task.

Capture

1. You want to check length of particular column data length before load to actual table. It may help to prevent your package to crash if data length is higher than the column width.

This moment you can use "ColumnLengthDistributionProfile" and get consolidated details related to given table.

Capture4

You have to pass the name of table and columns which you wish to check for data length.


2. You want to check Null value in columns before load to actual table. It may help to prevent your package to crash due to Null insertion.

This moment you can use "ColumnNullRatioProfile" and get consolidated details related to given table.

Capture1

You have to pass the name of table and columns which you wish to check for Null value count.


3. You want to check min and max value in columns with it's frequency

This moment you can use "ColumnStatisticsProfile" and get consolidated details related to given table.

Capture3

You have to pass the name of table and columns which you wish to check for min and max value and it's frequency in column.


4. You want to check value in columns before load to actual table. It may help to load conditional to warehouse.

This moment you can use "ColumnValueDistributionProfile" and get consolidated details related to given table.

Capture5

You have to pass the name of table and columns which you wish to check for value and frequency of that value in column.


5. You want to check value pattern in columns before load to actual table. It may help to load conditional to warehouse.

This moment you can use "ColumnPatternProfile" and get consolidated details related to given table.

Capture6

You have to pass the name of table and columns which you wish to check for pattern value and frequency of that value in column.

Please refer Log-File.xml for sample Data Profiler Output.

Capture7

You can view the output using the Tool, which is available at
    %programfiles%\Microsoft SQL Server\100\DTS\Binn\DataProfileViewer.exe

Thanks,

Alpesh

Tags: SSIS, SQL Server,


Alpesh Patel
45 · 5% · 1023
4
 
0
Lifesaver
 
0
Refreshed
 
0
Learned
 
0
Incorrect



Submit

Your Comment


Sign Up or Login to post a comment.

"Data Profiling Task to Validate data before load to EDW" rated 5 out of 5 by 4 readers
Data Profiling Task to Validate data before load to EDW , 5.0 out of 5 based on 4 ratings
    Copyright © Beyondrelational.com Contact us      Privacy Policy      Terms of use      Report Abuse      Advertising