![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||||
| |||||
|
|
I'm using FM server 7. I have a table that has 25,000 records in it, and I need to make related records in a join table. The two tables are related by a two-field match (ID and Date). I've come up with several ways to do it, but none of them work well, and I'm looking for hints on how to do it better. For purposes of discussion, assume the two tables are named "A" and "B". Also, table A has a many to one relationship with table B (i.e. multiple records in A for the same ID/Date combo), but I only want a single related record in table B. Method 1: Looping set field. Find all records in A with a child in B. Omit these, leaving only those w/o children. You have to do it this way because you can't find for "=" (blank) related records, which is annoying but another discussion. |
|
Loop through all records in table A, setting the related field in table B. Pros : works, and by design won't ever make multiple children in table B. Clean, as it doesn't require any calculated fields in either table. Cons: Slow. Very slow. Terribly slow! On a 100 base-T network, it runs at about 8 records per second, even when I freeze the window. There is also no progress bar to give an indication whether it's 1% done or 99% done. |
|
Method 2: Import Find for non-matching records as above. Import into Table B. Pros: Fast -- 10 x as fast as method 1. Progress bar shows how long until it's done. Cons: Doesn't handle the many-to-one relationship. It creates multiple child records in table B. Method 3: Import unique values only. Same as method 2, except we have a calculated field in table A Unique = If(Self::SerialNumber = SerialNumber,1,0), and we limit the records to only those unique ones before the import. Pros: The import is fast. Cons: Requires another relation, and the calculation of the Unique field is slow. |
|
So, in summary, an operation which should be fairly straight forward turns into a bit of a nightmare. You can do it the "right" way which is godawful slow and has no progress bar, or you can do it the hack way which requires you to add extra fields and relations to one of the tables. |
|
Ideas? |
There are a few ways of filtering out duplicates. Your
While it might seem that such a looper should be
#3
| ||||
| ||||
|
|
In article <1h0xquc.2fhjjmysvz40N%md03NOSPAM (AT) xochiNOSPAM (DOT) com.invalid>, md03NOSPAM (AT) xochiNOSPAM (DOT) com.invalid says... I'm using FM server 7. I have a table that has 25,000 records in it, and I need to make related records in a join table. The two tables are related by a two-field match (ID and Date). I've come up with several ways to do it, but none of them work well, and I'm looking for hints on how to do it better. For purposes of discussion, assume the two tables are named "A" and "B". Also, table A has a many to one relationship with table B (i.e. multiple records in A for the same ID/Date combo), but I only want a single related record in table B. Method 1: Looping set field. Find all records in A with a child in B. Omit these, leaving only those w/o children. You have to do it this way because you can't find for "=" (blank) related records, which is annoying but another discussion. You are searching for records with child records with "nothing" in a particular field, that's fine and works perfectly. But what you want is it to find records with no children. Anyhow your solution of finding records with something in a child records field and then omitting is of course the right way, but there is nothing illogical or annoying about it. |
|
Loop through all records in table A, setting the related field in table B. Pros : works, and by design won't ever make multiple children in table B. Clean, as it doesn't require any calculated fields in either table. Cons: Slow. Very slow. Terribly slow! On a 100 base-T network, it runs at about 8 records per second, even when I freeze the window. There is also no progress bar to give an indication whether it's 1% done or 99% done. Adding a 'progress bar' or other progress indicator of your own that updates each time a percent, or 10th percent or whatever is completed is not terribly difficult to do. |
|
Method 2: Import Find for non-matching records as above. Import into Table B. Pros: Fast -- 10 x as fast as method 1. Progress bar shows how long until it's done. Cons: Doesn't handle the many-to-one relationship. It creates multiple child records in table B. Method 3: Import unique values only. Same as method 2, except we have a calculated field in table A Unique = If(Self::SerialNumber = SerialNumber,1,0), and we limit the records to only those unique ones before the import. Pros: The import is fast. Cons: Requires another relation, and the calculation of the Unique field is slow. snipped Methods 4 & 5: "solutions" that don't work, and thus by definiation aren't actually solutions So, in summary, an operation which should be fairly straight forward turns into a bit of a nightmare. You can do it the "right" way which is godawful slow and has no progress bar, or you can do it the hack way which requires you to add extra fields and relations to one of the tables. Having to add extra fields and relations to meet a requirement of the database is not "hacking". Indeed it would be quite difficult to meet *any* requirement of a database without adding fields or relationships. While adding fields tends to denormalize a database, the objectives of satisfactory performance and perfection of form are always at odds with eachother. Striking a balance is the developers job. Ideas? Why do you need this? From experience its pretty unusual to need to do this with any sort of regularity. Perhaps if you let us know what problem you are actually trying to solve we can suggest a better approach.. However, if you really do need this, after doing this one time fix, why not just generate the related records in B on the fly, at the same time the parent record is created, where the 1/8th of a second overhead isn't going to really bother anyone interactively adding records. Failing that, if you must do it as a batch... the optimal solution will depend on the typical characteristics of the dataset. How many records in A are being dealt with? 25,000 you said. Actually that is the entire dataset, you already said you were only working with a part of it!! How large a part? (For the rest of my answer I'll just run with 25,000.) And of course how many records in B does this usually transform to? Obviously its less than 25,000... but that's quite a range: 150? 12,000? 24,400? |
|
If the number in B approaches A (25,000) (thus the duplicates are actually quite rare), then your 'method 2' makes the most sense. Let the duplicate records be generated in B, then find them, and clean them up. Its not terribly hard and the performance overhead of the cleanup would very quite minor -- provided there were only few many duplicates. Your method 3 is another reasonable approach as it actually works correctly. There are a few ways of filtering out duplicates. Yourself-join relationship is one. But a simple sort and loop (omitting duplicates) can also be highly effective, without 'perverting your perfect database' While it might seem that such a looper should bethe same speed as method 1 especially as you added a sort on top but this often isn't the case. Not too suprisingly database 25,000 updates that trigger related records to be created automatically are slower than 25,000 simple "is this a duplicate of the previous record?" and an indeterminate but <<25,000 requests to "omit record from found set". If the number of duplicates is VERY high, and thus the number of records in B is very low then another approach you might take would be to look at using an intermediate file... provided you sort by the foreign key field, you have the option to export by summary, which would export one record for each unique value. The trouble is that its available only on exports, so you'd have to import it back into B from the intermediate table after exporting it. (AFAIK You can't export directly into another *existing* FM file/table, at least you couldn't in 6 down, and I don't beleive that its changed for 7). HTH & good luck -Dave |
#4
| |||||
| |||||
|
|
42 <nospam (AT) nospam (DOT) com> wrote: In article <1h0xquc.2fhjjmysvz40N%md03NOSPAM (AT) xochiNOSPAM (DOT) com.invalid>, md03NOSPAM (AT) xochiNOSPAM (DOT) com.invalid says... I'm using FM server 7. I have a table that has 25,000 records in it, and I need to make related records in a join table. The two tables are related by a two-field match (ID and Date). I've come up with several ways to do it, but none of them work well, and I'm looking for hints on how to do it better. For purposes of discussion, assume the two tables are named "A" and "B". Also, table A has a many to one relationship with table B (i.e. multiple records in A for the same ID/Date combo), but I only want a single related record in table B. Method 1: Looping set field. Find all records in A with a child in B. Omit these, leaving only those w/o children. You have to do it this way because you can't find for "=" (blank) related records, which is annoying but another discussion. You are searching for records with child records with "nothing" in a particular field, that's fine and works perfectly. But what you want is it to find records with no children. Anyhow your solution of finding records with something in a child records field and then omitting is of course the right way, but there is nothing illogical or annoying about it. To be clear -- I think that the fact that finding for "=" in a related table returns no matches, whereas finding for "=" in a local table returns reocords with blanks, is inconsistent. |
.
|
Perhaps we need a "no-related record" symbol which could be used differently than the "=" symbol. |
|
More info. I determined why the loop is so slow on my system -- it turns out I had forgotton that Table B has a bunch of complex auto-enter calcs (that sum across related tables). So a simple "Set field" from table A to B is actually is triggering a bunch of sub-calculations when the record is created (at a time when I don't want the fields updated). |
|
In this case, I found the solution was to add a "Unique" field to table A, use a self-relation to ferret out the distinct values, and then only import the distinct values. The import ended up being much faster, because you can de-select the "Allow auto-enter options etc." checkbox, preventing all the auto-enter calcs. So, in summary, I think I may have unfairly maligned FileMaker Server 7 -- it's not as slow as I was claiming. |
|
Don't, however, ask me about the speed of deleting records... grr. |
#5
| |||
| |||
|
|
In article <1h137mn.6tnsql1807hoyN%md03NOSPAM (AT) xochiNOSPAM (DOT) com.invalid>, md03NOSPAM (AT) xochiNOSPAM (DOT) com.invalid says... 42 <nospam (AT) nospam (DOT) com> wrote: In article <1h0xquc.2fhjjmysvz40N%md03NOSPAM (AT) xochiNOSPAM (DOT) com.invalid>, md03NOSPAM (AT) xochiNOSPAM (DOT) com.invalid says... I'm using FM server 7. I have a table that has 25,000 records in it, and I need to make related records in a join table. The two tables are related by a two-field match (ID and Date). I've come up with several ways to do it, but none of them work well, and I'm looking for hints on how to do it better. For purposes of discussion, assume the two tables are named "A" and "B". Also, table A has a many to one relationship with table B (i.e. multiple records in A for the same ID/Date combo), but I only want a single related record in table B. Method 1: Looping set field. Find all records in A with a child in B. Omit these, leaving only those w/o children. You have to do it this way because you can't find for "=" (blank) related records, which is annoying but another discussion. You are searching for records with child records with "nothing" in a particular field, that's fine and works perfectly. But what you want is it to find records with no children. Anyhow your solution of finding records with something in a child records field and then omitting is of course the right way, but there is nothing illogical or annoying about it. To be clear -- I think that the fact that finding for "=" in a related table returns no matches, whereas finding for "=" in a local table returns reocords with blanks, is inconsistent. I disagree. You can use "=" in a related table and it *will* find related records that are blank in the specified field. That is consistent with local searches. It simply won't find records that don't exist ... which is technically (and trivially) consistent with local searches too .Essentially "=" doesn't work because if the related record doesn't exist its fields have no values, including "blank". A good example in the physical world is binders and pages. If you search for all your binders that contain blank pages, that's quite a different found set than searching for your empty binders. Indeed they would be completely different sets of binders. ![]() As an aside SQL is no different... This would be a simple local field search: select * from a where afield="" and returns records where afield is blank...and this bit of SQL is equivalent to finding an "=" on a related child record: select * from a,b where a.key = b.key and b.afield="" and like FM it will return each a record that has a child b with a blank afield... but WON'T return records from "a" that have no related b record at all. |
|
Perhaps we need a "no-related record" symbol which could be used differently than the "=" symbol. That would be quite handy... in any database. SQL doesnt make that search trivial either, unless I've forgotten something. |
|
More info. I determined why the loop is so slow on my system -- it turns out I had forgotton that Table B has a bunch of complex auto-enter calcs (that sum across related tables). So a simple "Set field" from table A to B is actually is triggering a bunch of sub-calculations when the record is created (at a time when I don't want the fields updated). Ah. That makes sense. In this case, I found the solution was to add a "Unique" field to table A, use a self-relation to ferret out the distinct values, and then only import the distinct values. The import ended up being much faster, because you can de-select the "Allow auto-enter options etc." checkbox, preventing all the auto-enter calcs. So, in summary, I think I may have unfairly maligned FileMaker Server 7 -- it's not as slow as I was claiming. Yeah, sounds like that would have been slow in any system. Don't, however, ask me about the speed of deleting records... grr. The number of indexed fields *heavily* impacts this. |
#6
| |||
| |||
|
|
42 <nospam (AT) nospam (DOT) com> wrote: In article <1h137mn.6tnsql1807hoyN%md03NOSPAM (AT) xochiNOSPAM (DOT) com.invalid>, md03NOSPAM (AT) xochiNOSPAM (DOT) com.invalid says... 42 <nospam (AT) nospam (DOT) com> wrote: In article <1h0xquc.2fhjjmysvz40N%md03NOSPAM (AT) xochiNOSPAM (DOT) com.invalid>, md03NOSPAM (AT) xochiNOSPAM (DOT) com.invalid says... I'm using FM server 7. I have a table that has 25,000 records in it, and I need to make related records in a join table. The two tables are related by a two-field match (ID and Date). I've come up with several ways to do it, but none of them work well, and I'm looking for hints on how to do it better. For purposes of discussion, assume the two tables are named "A" and "B". Also, table A has a many to one relationship with table B (i.e. multiple records in A for the same ID/Date combo), but I only want a single related record in table B. Method 1: Looping set field. Find all records in A with a child in B. Omit these, leaving only those w/o children. You have to do it this way because you can't find for "=" (blank) related records, which is annoying but another discussion. You are searching for records with child records with "nothing" in a particular field, that's fine and works perfectly. But what you want is it to find records with no children. Anyhow your solution of finding records with something in a child records field and then omitting is of course the right way, but there is nothing illogical or annoying about it. To be clear -- I think that the fact that finding for "=" in a related table returns no matches, whereas finding for "=" in a local table returns reocords with blanks, is inconsistent. I disagree. You can use "=" in a related table and it *will* find related records that are blank in the specified field. That is consistent with local searches. It simply won't find records that don't exist ... which is technically (and trivially) consistent with local searches too .Essentially "=" doesn't work because if the related record doesn't exist its fields have no values, including "blank". A good example in the physical world is binders and pages. If you search for all your binders that contain blank pages, that's quite a different found set than searching for your empty binders. Indeed they would be completely different sets of binders. ![]() As an aside SQL is no different... This would be a simple local field search: select * from a where afield="" and returns records where afield is blank...and this bit of SQL is equivalent to finding an "=" on a related child record: select * from a,b where a.key = b.key and b.afield="" and like FM it will return each a record that has a child b with a blank afield... but WON'T return records from "a" that have no related b record at all. I think to have "=" find blank or missing records in B, you would have to make a Caclulated field in A that is equal to the field in B. I'm pretty sure then it would work as I want it to. |
|
Perhaps we need a "no-related record" symbol which could be used differently than the "=" symbol. That would be quite handy... in any database. SQL doesnt make that search trivial either, unless I've forgotten something. Under SQL, I think you'd have to do a Right Outer join to get it to work, though I wouldn't be surprised if the behavior of finding nulls in such a scenario is manufacturer-specific. |

|
More info. I determined why the loop is so slow on my system -- it turns out I had forgotton that Table B has a bunch of complex auto-enter calcs (that sum across related tables). So a simple "Set field" from table A to B is actually is triggering a bunch of sub-calculations when the record is created (at a time when I don't want the fields updated). Ah. That makes sense. In this case, I found the solution was to add a "Unique" field to table A, use a self-relation to ferret out the distinct values, and then only import the distinct values. The import ended up being much faster, because you can de-select the "Allow auto-enter options etc." checkbox, preventing all the auto-enter calcs. So, in summary, I think I may have unfairly maligned FileMaker Server 7 -- it's not as slow as I was claiming. Yeah, sounds like that would have been slow in any system. Don't, however, ask me about the speed of deleting records... grr. The number of indexed fields *heavily* impacts this. Based on the poor performance, it appears to me that FM 7 Server updates the table index upon each record change. This makes imports and mass-deletions heinosously slow. I've got some tables where FM Server deletes at most 50-100 records per second when you attempt to delete all records in a table. I really wish they would add the ability to turn off reindexing manually via script, and automatically in some obvious cases (e.g. delete all records in a table). |
#7
| |||
| |||
|
|
I've got some tables where FM Server deletes at most 50-100 records per second |
![]() |
| Thread Tools | |
| Display Modes | |
| |