![]() | |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Performance-wise, how should referential integrity be enforced when dealing with a modest data warehouse? If millions of records are inserted into the database, daily, triggers don't seem to be the answer. Are constraints (i.e. foreign key constraints) the answer? I feel foreign key constraints should not break the boundaries of a database meaning tables in DB1 should not have any constraints on tables in DB2. Crossing DB boundaries doesn't make sense. |
#3
| |||
| |||
|
|
On 16-Oct-2011 16:30, Douglas Gregor wrote: Performance-wise, how should referential integrity be enforced when dealing with a modest data warehouse? If millions of records are inserted into the database, daily, triggers don't seem to be the answer. Are constraints (i.e. foreign key constraints) the answer? I feel foreign key constraints should not break the boundaries of a database meaning tables in DB1 should not have any constraints on tables in DB2. Crossing DB boundaries doesn't make sense. I would say that having FK constraints cross-DB boundary *can* make sense. It just depends on how your system has been designed. Apart from that, I think RI constraints would be a good way to go; trigegrs are probably not for this type of use case. You can also avoid usign constraints or triggers if you know that in most cases the RI will be fine, and just run a query after loading the data to verify that the RI is OK (and then delete/fix the rows that do not match). -- HTH, Rob V. ----------------------------------------------------------------- Rob Verschoor Certified Sybase Professional DBA for ASE 15.0/12.5/12.0/11.5/11.0 and Replication Server 15.0.1/12.5 // TeamSybase Author of Sybase books (order online atwww.sypron.nl/shop): "Tips, Tricks& *Recipes for Sybase ASE" (ASE 15 edition) "The Complete Sybase ASE Quick Reference Guide" "The Complete Sybase Replication Server Quick Reference Guide" r... (AT) NO (DOT) SPAM.sypron.nl |www.sypron.nl*| Twitter: @rob_verschoor Sypron B.V., The Netherlands *| *Chamber of Commerce 27138666 ----------------------------------------------------------------- |
#4
| |||
| |||
|
|
Performance-wise, how should referential integrity be enforced when dealing with a modest data warehouse? If millions of records are inserted into the database, daily, triggers don't seem to be the answer. Are constraints (i.e. foreign key constraints) the answer? |
|
I feel foreign key constraints should not break the boundaries of a database meaning tables in DB1 should not have any constraints on tables in DB2. Crossing DB boundaries doesn't make sense. |
#5
| |||
| |||
|
|
If a table in another database has a constraint on a table in another database, the data couldn't be deleted until the data is deleted in the other database; therefore, one has to find all the tentacle. Clean up could be a pain. |
#6
| |||
| |||
|
|
On Oct 17, 1:30*am, Douglas Gregor <1dg... (AT) gmail (DOT) com> wrote: Performance-wise, how should referential integrity be enforced when dealing with a modest data warehouse? If millions of records are inserted into the database, daily, triggers don't seem to be the answer. Are constraints (i.e. foreign key constraints) the answer? Yes, definitely. *Triggers are inappropriate for many reasons, definitely inappropriate for a DW. Declarative Referential Integrity is essential. *Both for REFERENCES and CHECK CONSTRAINTS. *Sybase does some clever things under the covers, that enhances performance of inserts/updates, at both the child and the parent level. It is far more preferable to prevent corrupt data from getting into the db, than to look for it and clean it up after it gets there. Sybase is set up for that. *Eg. *You can stream data into a table using "fast" bcp, which means dropping the indices and disabling the triggers on the table. *bcp honours DEFAULTS but unfortunately not RI CONSTRAINTS. *So the trick that most of us use is: 1 create a staging_db with truncate_log_on_chkpt, select_into/ bulk_copy/Parallel_sort db_ptions set 2 that allows you to *not* set those options on the main target_db, which remains fully recoverable 3 create a copy of the table in the staging_db, with no indices and triggers 4 fast bcp the data into the staging_db..table 5 execute DELETE WHERE to remove bad data (eg. that does not satisfy FK constraints) 7 insert target_db..table/select from source_db..table If you want even more speed, and you are willing to backup the target_db *after* loading, rather than at some arbitrary time: 6 set the select_into/bulk_copy/Parallel_sort db_option on target_db 7 select into target_db..table from staging_db..table 8 optional: drop/create clustered index to de-fragment the table and maintain speed 9 clear the select_into/bulk_copy/Parallel_sort db_option on target_db 10 dump target_db I feel foreign key constraints should not break the boundaries of a database meaning tables in DB1 should not have any constraints on tables in DB2. Crossing DB boundaries doesn't make sense. Well, it depends on how you have set them up, and what you are using them for. If you take the concept that "a db is a recoverable unit" and completely independent, strictly, then yes, you are correct. And it keeps life nice and simple. But most of us have mature systems and databases, and we can control the order of db recovery. *Having one database for trading, market info, and data feeds in/out of the sever, then having one database each for front office and back office, which have FK constraints REFERENCING the trading database, is normal. *We really do not want to duplicate the trading database; and front and back office have to be clinically separated; but they both source data from trading. *Sybase supports that beautifully. *On recovery, ensure you load trading_db first; then front_db or back_db. Regards Derek |
#7
| |||
| |||
|
|
On Oct 17, 1:30*am, Douglas Gregor <1dg... (AT) gmail (DOT) com> wrote: Performance-wise, how should referential integrity be enforced when dealing with a modest data warehouse? If millions of records are inserted into the database, daily, triggers don't seem to be the answer. Are constraints (i.e. foreign key constraints) the answer? Yes, definitely. *Triggers are inappropriate for many reasons, definitely inappropriate for a DW. |
|
Declarative Referential Integrity is essential. *Both for REFERENCES and CHECK CONSTRAINTS. *Sybase does some clever things under the covers, that enhances performance of inserts/updates, at both the child and the parent level. It is far more preferable to prevent corrupt data from getting into the db, than to look for it and clean it up after it gets there. Sybase is set up for that. *Eg. *You can stream data into a table using "fast" bcp, which means dropping the indices and disabling the triggers on the table. *bcp honours DEFAULTS but unfortunately not RI CONSTRAINTS. *So the trick that most of us use is: 1 create a staging_db with truncate_log_on_chkpt, select_into/ bulk_copy/Parallel_sort db_ptions set 2 that allows you to *not* set those options on the main target_db, which remains fully recoverable 3 create a copy of the table in the staging_db, with no indices and triggers 4 fast bcp the data into the staging_db..table 5 execute DELETE WHERE to remove bad data (eg. that does not satisfy FK constraints) 7 insert target_db..table/select from source_db..table If you want even more speed, and you are willing to backup the target_db *after* loading, rather than at some arbitrary time: 6 set the select_into/bulk_copy/Parallel_sort db_option on target_db 7 select into target_db..table from staging_db..table 8 optional: drop/create clustered index to de-fragment the table and maintain speed 9 clear the select_into/bulk_copy/Parallel_sort db_option on target_db 10 dump target_db I feel foreign key constraints should not break the boundaries of a database meaning tables in DB1 should not have any constraints on tables in DB2. Crossing DB boundaries doesn't make sense. Well, it depends on how you have set them up, and what you are using them for. If you take the concept that "a db is a recoverable unit" and completely independent, strictly, then yes, you are correct. And it keeps life nice and simple. But most of us have mature systems and databases, and we can control the order of db recovery. *Having one database for trading, market info, and data feeds in/out of the sever, then having one database each for front office and back office, which have FK constraints REFERENCING the trading database, is normal. *We really do not want to duplicate the trading database; and front and back office have to be clinically separated; but they both source data from trading. *Sybase supports that beautifully. *On recovery, ensure you load trading_db first; then front_db or back_db. Regards Derek |
#8
| |||
| |||
|
|
On Oct 16, 7:20*pm, Derek Asirvadem <derek.asirva... (AT) gmail (DOT) com> wrote: On Oct 17, 1:30*am, Douglas Gregor <1dg... (AT) gmail (DOT) com> wrote: Performance-wise, how should referential integrity be enforced when dealing with a modest data warehouse? If millions of records are inserted into the database, daily, triggers don't seem to be the answer. Are constraints (i.e. foreign key constraints) the answer? Yes, definitely. *Triggers are inappropriate for many reasons, definitely inappropriate for a DW. But REFERENCES and CHECK CONSTRAINTS are not wise if they span databases? Is there anyway to speed up the building of or dropping of an index on a very large table? Declarative Referential Integrity is essential. *Both for REFERENCES and CHECK CONSTRAINTS. *Sybase does some clever things under the covers, that enhances performance of inserts/updates, at both the child and the parent level. It is far more preferable to prevent corrupt data from getting into the db, than to look for it and clean it up after it gets there. Sybase is set up for that. *Eg. *You can stream data into a table using "fast" bcp, which means dropping the indices and disabling the triggers on the table. *bcp honours DEFAULTS but unfortunately not RI CONSTRAINTS. *So the trick that most of us use is: 1 create a staging_db with truncate_log_on_chkpt, select_into/ bulk_copy/Parallel_sort db_ptions set 2 that allows you to *not* set those options on the main target_db, which remains fully recoverable 3 create a copy of the table in the staging_db, with no indices and triggers 4 fast bcp the data into the staging_db..table 5 execute DELETE WHERE to remove bad data (eg. that does not satisfy FK constraints) 7 insert target_db..table/select from source_db..table If you want even more speed, and you are willing to backup the target_db *after* loading, rather than at some arbitrary time: 6 set the select_into/bulk_copy/Parallel_sort db_option on target_db 7 select into target_db..table from staging_db..table 8 optional: drop/create clustered index to de-fragment the table and maintain speed 9 clear the select_into/bulk_copy/Parallel_sort db_option on target_db 10 dump target_db I feel foreign key constraints should not break the boundaries of a database meaning tables in DB1 should not have any constraints on tables in DB2. Crossing DB boundaries doesn't make sense. Well, it depends on how you have set them up, and what you are using them for. If you take the concept that "a db is a recoverable unit" and completely independent, strictly, then yes, you are correct. And it keeps life nice and simple. But most of us have mature systems and databases, and we can control the order of db recovery. *Having one database for trading, market info, and data feeds in/out of the sever, then having one database each for front office and back office, which have FK constraints REFERENCING the trading database, is normal. *We really do not want to duplicate the trading database; and front and back office have to be clinically separated; but they both source data from trading. *Sybase supports that beautifully. *On recovery, ensure you load trading_db first; then front_db or back_db. Regards Derek- Hide quoted text - - Show quoted text - |
#9
| ||||||
| ||||||
|
|
Thank you for your reply filled with awesome information. |
|
Yes, definitely. *Triggers are inappropriate for many reasons, definitely inappropriate for a DW. But REFERENCES and CHECK CONSTRAINTS are not wise if they span databases? |
|
Is there anyway to speed up the building of or dropping of an index on a very large table? |
|
If I have the following columns" updated_by DEFAULT SUSER_NAME() updated_date DEFAULT GETDATE() created_by DEFAULT SUSER_NAME() created_date DEFAULT GETDATE() When the records are inserted the default will populate the columns. |
|
Then I can have an update trigger to populate the updated_by and updated_date. Is this a good idea? |
|
Will the defualt on the insert trigger the update trigger? |
#10
| |||
| |||
|
|
On Oct 18, 2:58*am, 1dg618 <1dg... (AT) gmail (DOT) com> wrote: Thank you for your reply filled with awesome information. My pleasure. Yes, definitely. *Triggers are inappropriate for many reasons, definitely inappropriate for a DW. But REFERENCES and CHECK CONSTRAINTS are not wise if they span databases? I have detailed the answer to that above: some times they are wise, other times unwise. *Is there anything specific that you find confusing ? Is there anyway to speed up the building of or dropping of an index on a very large table? (it is best that you open a new thread for each separate question.) Drop. *It is very fast and cannot be mode faster. Create Index. Yes. *Make sure you do the following (in descending order of returns): - use the WITH SORTED_DATA option - have enough sort buffers - have enough proc cache - that the index will fit into the data cache (default or named) Composite keys are no problem, but of course it helps if the columns in the index are (a) narrow and (b) fixed. If I have the following columns" updated_by * DEFAULT SUSER_NAME() updated_date DEFAULT GETDATE() created_by * DEFAULT SUSER_NAME() created_date DEFAULT GETDATE() When the records are inserted the default will populate the columns. If they are not supplied in the INSERT Then I can have an update trigger to populate the updated_by and updated_date. Is this a good idea? Yes, it is the common method. An even better method. Those of us who do not use triggers, ensure that (a) the udpate is transactional (b) set updated_by and updated_date in the transaction. Will the defualt on the insert trigger the update trigger? No. The insert trigger (if there is one) is executed on insert. Separately, the update trigger (if there is one) is executed on update of any column. The default does not come into play. Regards Derek |
![]() |
| Thread Tools | |
| Display Modes | |
| |