![]() | |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
2) Break up the monolithic table(s) in to many tables with a smaller numbers of columns, where the columns in each table are more 'tightly related' in usage then simply being related to the same primary key, and then do a single SQL UPDATE for all columns that could have changed, even if they haven't all changed. This way it's more likely that no columns for a particular table require updating, and thus an update to that particular table would not be required. |
#3
| |||
| |||
|
|
With mainframe indexed files (VSAM) and DL/I (IMS) databases segments, which are somewhat analogous to rows in a table, are pretty much just, umm, large chunks of data. If you want to replace a segment with new values for one or more fields you must replace the entire segment. Obviously this is not the case for a relational database. Currently we have some DL/I databases with segments that contain over 400 individual 'fields'. When migrating to DB2, I'm guessing that having an SQL UPDATE with SET column-name = :host_var 400+ times (one for each column) is probably not something that one would want to do. What are the alternatives? Here's what I can think of, though none of them seem to me to be ideal... 1) Update one column at a time, and only if it has changed. EG: EXEC SQL SELECT column1, column2, column3, -- etc. INTO :column1, :column2, :column3 -- etc. FROM A_TABLE WHERE primary_key = rimary-keyIF column1_new not = column1 EXEC SQL UPDATE A_TABLE SET column1 = :column1_new END-EXEC END-IF IF column2_new not = column2 EXEC SQL UPDATE A_TABLE SET column2 = :column2_new END-EXEC END-IF 2) Break up the monolithic table(s) in to many tables with a smaller numbers of columns, where the columns in each table are more 'tightly related' in usage then simply being related to the same primary key, and then do a single SQL UPDATE for all columns that could have changed, even if they haven't all changed. This way it's more likely that no columns for a particular table require updating, and thus an update to that particular table would not be required. 3) Can't think of a number three. I'm guessing that number 2 is more likely the way to go, but I'm really looking for that 'perfect world' third alternative. Thanks, Frank |
#4
| |||
| |||
|
|
On 8/14/2007 at 9:29 AM, in message 1187105352.766137.141000 (AT) i38g20...oglegroups.com>, Ed |
|
On Aug 13, 7:57 pm, "Frank Swarbrick" <Frank.Swarbr... (AT) efirstbank (DOT) com wrote: With mainframe indexed files (VSAM) and DL/I (IMS) databases segments, which are somewhat analogous to rows in a table, are pretty much just, umm, large chunks of data. If you want to replace a segment with new values for one or more fields you must replace the entire segment. Obviously this is not the case for a relational database. Currently we have some DL/I databases with segments that contain over 400 individual 'fields'. When migrating to DB2, I'm guessing that having an SQL UPDATE with SET column-name = :host_var 400+ times (one for each column) is probably not something that one would want to do. What are the alternatives? Here's what I can think of, though none of them seem to me to be ideal... 1) Update one column at a time, and only if it has changed. EG: EXEC SQL SELECT column1, column2, column3, -- etc. INTO :column1, :column2, :column3 -- etc. FROM A_TABLE WHERE primary_key = rimary-keyIF column1_new not = column1 EXEC SQL UPDATE A_TABLE SET column1 = :column1_new END-EXEC END-IF IF column2_new not = column2 EXEC SQL UPDATE A_TABLE SET column2 = :column2_new END-EXEC END-IF 2) Break up the monolithic table(s) in to many tables with a smaller numbers of columns, where the columns in each table are more 'tightly related' in usage then simply being related to the same primary key, and then do a single SQL UPDATE for all columns that could have changed, even if they haven't all changed. This way it's more likely that no columns for a particular table require updating, and thus an update to that particular table would not be required. 3) Can't think of a number three. I'm guessing that number 2 is more likely the way to go, but I'm really looking for that 'perfect world' third alternative. Thanks, Frank David gave you some good advice. I'll just suggest you get some help from an experienced Relational Db designer. Meanwhile as you experiment, you might look at your data for some common things such as: several fields all containing the same data. For example do you have several phone number fields? daytime_phone, fax_number, nighttime_phone, dialin_number, ... Those should go into a table of their own and become multiple rows in that table. TABLE PHONE owner phone_number type /* FAX, HOME, WORK, MODEM, CELL */ the Primary key would be the combined (owner, phone_number). I just thought a specific example might help you right away. |
#5
| |||
| |||
|
|
On 8/14/2007 at 9:29 AM, in message 1187105352.766137.141000 (AT) i38g20...oglegroups.com>, Ed Prochak<edprochak (AT) gmail (DOT) com> wrote: On Aug 13, 7:57 pm, "Frank Swarbrick" <Frank.Swarbr... (AT) efirstbank (DOT) com wrote: With mainframe indexed files (VSAM) and DL/I (IMS) databases segments, which are somewhat analogous to rows in a table, are pretty much just, umm, large chunks of data. If you want to replace a segment with new values for one or more fields you must replace the entire segment. Obviously this is not the case for a relational database. Currently we have some DL/I databases with segments that contain over 400 individual 'fields'. When migrating to DB2, I'm guessing that having an SQL UPDATE with SET column-name = :host_var 400+ times (one for each column) is probably not something that one would want to do. What are the alternatives? Here's what I can think of, though none of them seem to me to be ideal... 1) Update one column at a time, and only if it has changed. EG: EXEC SQL SELECT column1, column2, column3, -- etc. INTO :column1, :column2, :column3 -- etc. FROM A_TABLE WHERE primary_key = rimary-keyIF column1_new not = column1 EXEC SQL UPDATE A_TABLE SET column1 = :column1_new END-EXEC END-IF IF column2_new not = column2 EXEC SQL UPDATE A_TABLE SET column2 = :column2_new END-EXEC END-IF 2) Break up the monolithic table(s) in to many tables with a smaller numbers of columns, where the columns in each table are more 'tightly related' in usage then simply being related to the same primary key, and then do a single SQL UPDATE for all columns that could have changed, even if they haven't all changed. This way it's more likely that no columns for a particular table require updating, and thus an update to that particular table would not be required. 3) Can't think of a number three. I'm guessing that number 2 is more likely the way to go, but I'm really looking for that 'perfect world' third alternative. Thanks, Frank David gave you some good advice. I'll just suggest you get some help from an experienced Relational Db designer. Meanwhile as you experiment, you might look at your data for some common things such as: several fields all containing the same data. For example do you have several phone number fields? daytime_phone, fax_number, nighttime_phone, dialin_number, ... Those should go into a table of their own and become multiple rows in that table. TABLE PHONE owner phone_number type /* FAX, HOME, WORK, MODEM, CELL */ the Primary key would be the combined (owner, phone_number). I just thought a specific example might help you right away. Thanks to you and David for your thoughts. I don't think, however, that our current structure is as 'denormalized' as you may think. There are simply *a lot* of individual attributes that are related, in this case, to an account. For instance there are fields that keep track of the numbers (counts and amounts) of different types of transactions, both since the last account statement, and during the last account statement. |
|
I work better with real world examples, so let me give you some field names: D-MSTR-ACH-DEBITS-NBR D-MSTR-ACH-CREDITS-NBR D-MSTR-WIRES-IN-NBR D-MSTR-WIRES-PCB-NBR D-MSTR-WIRES-PHN-NBR D-MSTR-WIRES-FGN-NBR D-MSTR-STOP-PMTS-NBR D-MSTR-LAST-ACH-DEBITS-NBR D-MSTR-LAST-ACH-DEBITS-AMT D-MSTR-LAST-ACH-CREDITS-NBR D-MSTR-LAST-ACH-CREDITS-AMT D-MSTR-LAST-WIRES-IN-NBR D-MSTR-LAST-WIRES-IN-AMT D-MSTR-LAST-WIRES-PCB-NBR D-MSTR-LAST-WIRES-PCB-AMT D-MSTR-LAST-WIRES-PHN-NBR D-MSTR-LAST-WIRES-PHN-AMT D-MSTR-LAST-WIRES-FGN-NBR D-MSTR-LAST-WIRES-FGN-AMT D-MSTR-LAST-STOP-PMTS-NBR D-MSTR-LAST-STOP-PMTS-AMT What happens here is, for example, as we're processing transactions for a day, for each ACH transaction we'll add 1 to D-MSTR-ACH-DEBITS-NBR or D-MSTR-ACH-CREDITS-NBR, depending on whether or not it's a debit or a credit. If it is a wire we'll add one to one of the four wire fields, etc. When the statement 'cycles' we'll take the first set of fields, multiply each one by the fee amount for that transaction type, and store the result in the related "LAST" AMT field, moving the NBR field to the related "LAST" NBR field. |
|
Now it probably would be true to say that, rather than storing these fields in and of themselves we could instead do some sort of transaction aggregation at statement cycle time in order to calculate the proper values. I'm sure we could. Should we? Hmm, that's another matter. The "LAST" fields are used not only at the time we create the account statement, but also in online transactions for the rest of the month where we may want to let a customer know exactly what counts and amounts made up his service charge last month. Is it better to recalculate this every time or simply to store the calculated results in a table? In the end, even if we do the latter, it is obvious to me that the "LAST" cycle fields belong in their own table, and not in the "account master" table, and that an UPDATE to this table would be done only if the account has cycled that day. The "current" cycle fields would belong in their own table as well (distinct from the LAST cycle table), but would be updated daily. Still, even doing so I think we would still be left with a 'large' number of fields that are simply "account attributes". For example, account type, service charge code, account open date, initials of officer assigned to account, tax withholding flag, current interest rate, current balance, account status, etc. Taking a brief look at it, though, does make me realize that at least two-thirds of the fields are what might be called "aggregation" fields. Not fields that would exist in and of themselves in other tables, but fields that could be calculated using information in other tables. I guess what you're getting at, though, is there may not actually be a need for all of these different columns at all, in that some may already exist in other tables, while others could simply be 'calculated' when needed. Would this be true to say? Still, in the end, I think my question still stands. Take, for example, an 'account status' field. During nightly batch processing we look at each account, and if the account has been at a zero balance with no transactions for 40 days we set the account status to 'closed'. 95% of the accounts would not have their status changed. So do we update the account status field for every account, possibly as part of a larger UPDATE statement (which might include the 'current balance', or do we do it as a separate UPDATE statement, and only if the value has actually been changed? To go back to your original thing about the phone numbers, I will say that we *don't* duplicate such things at an account level. Each account is related to one or more customer records, and that's where address information, etc. would be retrieved from. So at least its somewhat normalized at that level. Not quite down to a phone number table, though. Though that's not a bad idea, since we just recently added a 'cell phone' field to our customer master segment, in addition to the existing 'work phone' and 'home phone' fields. Anyway, I will definitely utilize the expertise of our (hopefully!) database experts here. But I'm trying to figure some things out on my own, so I can at least have some knowledge to start with. Thanks again, Frank |
#6
| |||
| |||
|
|
With mainframe indexed files (VSAM) and DL/I (IMS) databases segments, which are somewhat analogous to rows in a table, are pretty much just, umm, large chunks of data. If you want to replace a segment with new values for one or more fields you must replace the entire segment. Obviously this is not the case for a relational database. Currently we have some DL/I databases with segments that contain over 400 individual 'fields'. When migrating to DB2, I'm guessing that having an SQL UPDATE with SET column-name = :host_var 400+ times (one for each column) is probably not something that one would want to do. What are the alternatives? Here's what I can think of, though none of them seem to me to be ideal... 1) Update one column at a time, and only if it has changed. EG: EXEC SQL SELECT column1, column2, column3, -- etc. INTO :column1, :column2, :column3 -- etc. FROM A_TABLE WHERE primary_key = rimary-keyIF column1_new not = column1 EXEC SQL UPDATE A_TABLE SET column1 = :column1_new END-EXEC END-IF IF column2_new not = column2 EXEC SQL UPDATE A_TABLE SET column2 = :column2_new END-EXEC END-IF |
rimary-key
rimary-key
#7
| |||
| |||
|
|
On 8/14/2007 at 12:45 PM, in message f9st7o$7ot$1 (AT) registered (DOT) motzarella.org>, |
|
Frank Swarbrick wrote: With mainframe indexed files (VSAM) and DL/I (IMS) databases segments, which are somewhat analogous to rows in a table, are pretty much just, umm, large chunks of data. If you want to replace a segment with new values for one or more fields you must replace the entire segment. Obviously this is not the case for a relational database. Currently we have some DL/I databases with segments that contain over 400 individual 'fields'. When migrating to DB2, I'm guessing that having an SQL UPDATE with SET column-name = :host_var 400+ times (one for each column) is probably not something that one would want to do. What are the alternatives? Here's what I can think of, though none of them seem to me to be ideal... 1) Update one column at a time, and only if it has changed. EG: EXEC SQL SELECT column1, column2, column3, -- etc. INTO :column1, :column2, :column3 -- etc. FROM A_TABLE WHERE primary_key = rimary-keyIF column1_new not = column1 EXEC SQL UPDATE A_TABLE SET column1 = :column1_new END-EXEC END-IF IF column2_new not = column2 EXEC SQL UPDATE A_TABLE SET column2 = :column2_new END-EXEC END-IF I dont think that I understand this example. But what would be the effect of: EXEC SQL SELECT column1, column2, column3, -- etc. INTO :column1, :column2, :column3 -- etc. FROM A_TABLE WHERE primary_key = rimary-keyEXEC SQL UPDATE A_TABLE SET column1 = :column1_new, column2 = :column2_new, ... END-EXEC END-EXEC or EXEC SQL SELECT column1, column2, column3, -- etc. INTO :column1, :column2, :column3 -- etc. FROM A_TABLE WHERE primary_key = rimary-keyEXEC SQL UPDATE A_TABLE SET (column1, column2, ...) = (:column1_new, :column2_new, ...) END-EXEC END-EXEC |
#8
| |||
| |||
|
|
I'm not sure what you mean. I don't think you can nest one SQL statement inside another. |
|
If your thinking is that the update would be done immediately after the select, that is not the case (it only looks like it! <g>). In reality there would be quite a bit of 'programmatic calculations', if you will to determine if the value of a column is to be changed. As it is now there is no "column1_new"; there's just "column1", and it would, prior to the update statement, contain (possibly) a value other than the value it currently contains in the database. The only reason I had the 'new' columns above is so I could, if necessary, compare the old value to the new value, and only update if it has changed. |
|
Frank |
![]() |
| Thread Tools | |
| Display Modes | |
| |