Getting Started with Adobe After Effects - Part 6: Motion Blur
First Time? You can support us by signing up. It takes only 5 seconds. Click here to sign up. If you already have an account, click here to login.
Loading

1st Prize - Apple iPad


SQLServer Quiz 2011 - Best SSIS Components for generalized Extraction and Snchronization Process

  • Problem Statement: Requiement from University to load all college information into University Repository. Each College is independent of other in all ways.

    Job is to fetch data once in month from all colleges and stores in University Database

    For more Simplicity we have to fetch following information

    Student: ID, Name, Dgree, DOB

    Teacher: ID, Name, Subject, DOB

    Now my SSIS Solution Will contain Following Three Master Package

    1. Extraction: Fetching data from various colleges from their Data source (Let say: We will get Excel From Each college on monthly basis having name same as college name)

    2. PreProcessing: Remove null and make ready to load again in University Database

    3. Synchronization: Updation/Insertion/Deletion In University Database to make it ready to use

    Now Question Is How Efficiently, How Fast I am going to process this SSIS Task

    1. Answer should contain Small Explanation of each process (Even How to Use Task will also do)

    2. Also there is requirement of processing real time SSIS package so that if any college submitted excel late to University, How we are going to achieve same

    Posted on 06-16-2011 00:00 |
    Vishal Pawar
    24 · 7% · 2229

3  Answers  

Subscribe to Notifications
  • Score
    8

    BDD: A Yes to BDD. It distributes the data to multiple paths in equal proportion. It is good because it redirects buffers to various outputs instead of redirecting an individual row. It means it works on a set of rows and it is faster. BDD is intended for parallelsism so make sure that the package is running on server with multiple cores and various disk spindles supportingn the destination database. It is more useful when the transformations are taking much time. In that case create various outputs from BDD and apply the transformations across all the BDD outputs.

    MaxConcurrentExecutables (Max tasks that can be executed in parallel) is defintely a propety to tweak around in case of parallelism is needed. It depends on the number of processors on the server running the packages.

    Some times it is good to have some Asynchronous component (eg: UnionAll) withing the data flow task to add execution tree or execution path so as to get more thread to work upon.

    In case there are independent data flow paths inside the data flow task, engine thread property of the data flow task can be changed to gain parallelism.

    Multicast component can also be used to introduce execution sub paths inside an execution path so as more parallelism can be achieved.

    So, if my data volume is huge and lot of transformation components are needed without any specific ordering for the rows, I will use BDD. If my destination is not supporitng paralle load, I can merge the BDD output paths using a Union All (it will add a executuion tree also) and then do the insert into destination. If there are tasks at control flow that can be executed in paralle then MaxConcurrent executables property can be modified to gain parallelism.

    Replied on Jun 16 2011 7:52AM  . 
    Nitesh Rai
    135 · 1% · 361
  • Score
    8

    "Fast processing" is a very borad-range word. In real-world scenarios, it depends on a lot of things.

    The Balanced Data Distributor (BDD) is not my favorite, because it depends on too much environment issues. Just splitting data wihtout the possibility to define conditions (as a conditional split does) leaves you with either having multiple destinations that compete for IO or locks, or you have to duplicate your transformation logic, if you have such. It could, however, speed up processing if your trasnformation logic contains blocking components (e.g. script or sort components). In this case, the benefit lies in that the component will wait until the whole data has been read in, and because the data is split into parts, it will wait only a fraction of the time it dows ithout BDD.

    A possible scenario where BDD would speed up your ETL is when your first data flow converts the data into something more readable for the rest of the ETL (using column conversion) and presists it into a RAW file. Here using the BDD will produce multiple RAW files that then can be precessed in parallel by multiple Data Flow tasks.

    The MaxConcurrentExecutables setting can speed up your code. I tend to leave it on something like 255 and let the SSIS runtime decide how much threads it will choose.

    My favorite is Conditional Split, especially if you receive data that contains data of multiple partitions of your data destination. The split will then separate data on a partition level so you can use one Data Destination per partition and will benefit from the speed-up, becuse the SQL insert operations are independent from one another (and the disk I/O hopefully is, too).

    A general strategy to speed up is to avoid blocking components like Sort or Union. Thinking about your algorithm and using conditional split and Multicasts widely has always been a good idea with me.

    Replied on Jun 20 2011 6:53AM  . 
    Guenter
    28 · 6% · 1838
  • Score
    0

    BDD with MaxConcurrentExecutables in parellel threads

    Replied on Jul 18 2011 12:06AM  . 
    Vishal Pawar
    24 · 7% · 2229

Your Answer


Sign Up or Login to post an answer.
Please note that your answer will not be considered as part of the contest because the competition is over. You can still post an answer to share your knowledge with the community.

Copyright © Rivera Informatic Private Ltd.