Getting Started with Adobe After Effects - Part 6: Motion Blur
First Time? You can support us by signing up. It takes only 5 seconds. Click here to sign up. If you already have an account, click here to login.
Loading

1st Prize - Apple iPad


SQLServer Quiz 2011 - Use of Foreign Key constraints

  • Should Foreign Key constraints be used in a Data Warehouse?

    Posted on 06-07-2011 00:00 |
    Mike Lewis
    42 · 4% · 1336

28  Answers  

Subscribe to Notifications
Previous 1 | 2 | 3 Next
  • Score
    0

    If manually running a bunch of extra queries to duplicate the functionality of FK relationships on a one-off basis is your development standard, then we can bet the farm that everyone is doing that. We can also bet that no one will make any mistakes. Ever.

    I wouldn't consider running a bunch of extra queries as part of an ETL process. I'd use SSIS's Lookup Transformation and Cache Transform to handle this for me. If there are key mismatches, I can route the mismatched rows to an error table. This should be a part of a template used by developers when creating new load processes.

    Thank goodness you don't have any metadata/process control tables in that database that occasionally require maintenance in the form of updates and deletions.

    That's a good point. I generally consider metadata/process control tables to be "plumbing" and don't pay much attention to them. Yes, they would be UPDATEd and DELETEd as things change.

    I'm interested to know how you restrict sysadmins from issuing UPDATE and DELETE statements against tables.

    The same way you prevent them from overriding or removing FK constraints... you can't. You have to trust your sysadmins and let them know what is allowed and what is not.

    So as I said, you value speed of loading any data over the ongoing integrity of good data. This is fine if it's what you want to do, it's not for me.

    No. Ongoing integrity is just as important. I often choose to ensure ongoing relational integrity through ETL processing, not FK constraints. Same destination, different road.

    Your example seems to demonstrate one of the cases I mentioned where FK constraints did not degrade performance

    They did not degrade query performance, they would have degraded load performance.

    Replied on Jun 26 2011 1:06PM  . 
    Marc Jellinek
    97 · 2% · 546
  • Score
    0

    "I wouldn't consider running a bunch of extra queries as part of an ETL process. I'd use SSIS's Lookup Transformation and Cache Transform to handle this for me. If there are key mismatches, I can route the mismatched rows to an error table. This should be a part of a template used by developers when creating new load processes."

    If a manual DML change is warranted (a one-off change) for some reason (which was my original statement on this), such as inserting a new row manually for some reason do you create an entirely new package for that? The original question here was if a sysadmin or developer needs to change data in a table for some reason (insert a row, update a value, etc.), how do you continue enforcing relational integrity. Your original answer seemed to indicate that in this situation the person making the change would need to manually validate data integrity as part of your standard process.

    Side Note: BTW, the Lookup Transformation (with or without Cache Transformation) is not necessarily the most efficient way to match up keys in all situations. I recently refactored a handful of Lookup Transformations in a single package that took 3.5 hours to run. It now takes 4 mins 28 secs to run. This also brings us to another point I briefly mentioned earlier, about wall-clock speed as a simplistic measure of performance. If you're caching millions of large rows in a Cache Transformation you will be using a lot of server resources just to cache that information. If you are pulling all these millions of rows across the network you're also utilizing network resources as well. If there are other processes running on the server or pulling data across the network, this could be degrading their performance... It's all about the tradeoffs. Of course the SCD Wizard is right out.

    "The same way you prevent them from overriding or removing FK constraints... you can't. You have to trust your sysadmins and let them know what is allowed and what is not."

    A U.S. President once said, "Trust but verify." With DDL changes like FK constraint changes you can vigilantly monitor in many different ways. Regular polling of database metadata and comparison to the expected model metadata, DDL triggers, etc. What type of analogous verification/validation process do you have to verify that your sysadmins have done the right thing? When it comes to my databases I'm just not willing to Gamble that others will always do the right thing or not make mistakes.

    "No. Ongoing integrity is just as important. I often choose to ensure ongoing relational integrity through ETL processing, not FK constraints. Same destination, different road."

    Negative on that. Your ETL processing stops checking data integrity after the data load has been completed. FK constraints continue to ensure relational integrity on an ongoing basis, protecting you against (1) manual DML changes directly to the database [purposely or accidently], (2) bugs in your ETL process, (3) bugs in other processes that manipulate the data [if there are any, of course]. One-shot validation of your relational integrity during ETL processing is only halfway down the road to ongoing relational integrity.

    "They did not degrade query performance, they would have degraded load performance."

    It all depends on where your priorities are. As you indicate above, there's a tradeoff to be made here. If load performance is your priority then you'll choose it over the other factors.

    Replied on Jun 26 2011 4:00PM  . 
    Michael Coles
    156 · 1% · 302
  • Score
    0

    I think this is a great conversation, and certainly covers a huge amount of the arguments for and against foreign key constraints! I knew when I set the question that there was an element of subjectivity about it, and that different people sit in different camps for this one.

    It's fair to say that the Data Warehouse is complicated beast, and there's no single correct way of building one - you certainly need to design your architecture and database(s) around the requirements of your customer, rather than sticking to one approach simply because you have always done it that way. For example, a retail-based Data Warehouse might need speed of reporting as a primary function, to ensure directors can run lots of reports quickly to "slice and dice" the data. Alternatively, a bank may require data accuracy as a primary function so they can run reports that are accurate to the penny/cent/lowest unit of curreny.

    With that in mind, I want people to put forward valid arguments for/against the use of foreign key constraints. I would encourage those people who have only answered "yes" or "no" to add some reasoning behind their answer. Ultimately, the points awarded is up to me ;-) but there is certainly merit in being able to state why you think you're correct.

    Replied on Jun 27 2011 4:17AM  . 
    Mike Lewis
    42 · 4% · 1336
  • Score
    0

    @Mike

    Good question, really opened my eyes. I'm surprised at the level of importance people place on load speed, and that so many seem to consider removing FKs the low-hanging fruit of ETL "performance tuning."

    You mentioned:

    For example, a retail-based Data Warehouse might need speed of reporting as a primary function, to ensure directors can run lots of reports quickly to "slice and dice" the data. Alternatively, a bank may require data accuracy as a primary function so they can run reports that are accurate to the penny/ cent/lowest unit of curreny.

    I think we've shown that FK constraints help with both of these issues. So if these are the two primary requirements, they both support using FKs. If the priority is ETL load speed over these other two items then getting rid of FKs/using a random # generator, etc., might be the answer. BTW, what good is the fastest report in the world if it's wrong?

    Even folks who value ETL load speed above all else should consider optimizing their packages before dropping FKs. Having completely rewritten hundreds of inefficient SSIS packages I've found the biggest ETL performance issue to be poorly written code, not FKs.

    Replied on Jun 27 2011 10:01AM  . 
    Michael Coles
    156 · 1% · 302
  • Score
    2

    NO, for performance sake

    Replied on Jul 2 2011 9:37AM  . 
    Taha Amin
    2271 · 0% · 5
  • Score
    4

    According to me its depends on requiremnt 1.Yes we should use if we are having reporting or daily anlaysis or data mining based on this Data base for fastern our process we should have 2.No if i am not using my DB frequently we strait away remove FK

    In a way i mean to say we can have both approch include foreing key in table which are higly used nut less loading is there and remove if we are already passing it through set of rule by joining

    Replied on Jul 6 2011 3:58AM  . 
    Vishal Pawar
    24 · 7% · 2229
  • Score
    3

    Absolutely YES. Though OLAP Databases are not having such complexity in JOINS and all like OLTP but it needs some level of relational integrity. My Suggestion is you should have FKeys in OLAP.

    Thanks Manoj

    Replied on Jul 7 2011 12:05AM  . 
    Manoj Bhadiyadra
    153 · 1% · 315
  • Score

    Yes. You shouild have them in order to have data integrity and avoid issues in cube processing.

    Replied on Jul 14 2011 2:07AM  . 
    ATif-ullah Sheikh
    129 · 1% · 391
Previous 1 | 2 | 3 Next

Your Answer


Sign Up or Login to post an answer.
Please note that your answer will not be considered as part of the contest because the competition is over. You can still post an answer to share your knowledge with the community.

Copyright © Rivera Informatic Private Ltd.