![]() | |
#161
| |||
| |||
|
|
"Ed Prochak" <edproc... (AT) gmail (DOT) com> wrote in message news:4b1507c9-207e-4555-bd27-42dd0e84de9d (AT) 59g2000hsb (DOT) googlegroups.com... On Jun 26, 12:31 pm, "David Cressey" <cresse... (AT) verizon (DOT) net> wrote: [] I can't tell you the number of times programmers have come to me for help with a query that produces wrong results, and when I try to change their "select" to "select distinct" to see if maybe that's the problem, they exclaim "Oh I never use 'select distinct'! It runs too slow!" I patiently explained to them that I first want to come up with a query that is logically correct, then if necessary, come up with one that's logically equivalent, but runs fast. When I see SELECT DISTINCT in anything but an ad hoc query, I know something is wrong with the query. Possibly it is joining to a wrong table (to a detail table instead of a header/parent table for example) or some filtering condition is missed (a column not in the select list needs to be used), or some join condition was missed (there is a compound key and one component was left out). Even in an ad hoc query I tend to use SELECT COUNT(*) since that gives me a little bit more information for the same cost as DISTINCT (i.e. it reads the same amount of data). Seeing this post, I'm actually a little disappointed in you, David. Your last sentence does show you mainly follow a good approach. It's just that DISTINCT should be left to ad hoc queries IMO. Have a good day. Ed Your comment doesn't agree with my experience. In general, there are two circumstances where SELECT DISTINCT is the right approach. Both involve situations where SELECT generates a bag, but the desired result is a set. The first is a situation where the database has been misdesigned. If the database has been frozen for a year or more, and there is a lot of production software that depends on the existing table design, it may simply be unfeasable to correct the design in order to make one query logically easy to write. |
|
The second is a situation where the query requires a subset of the data that does not include any candidate keys from the underlying tables. In that case, the generation of a query requires its own mechanism for eliminating duplicates. |
|
Perhaps the development of a new program, a long time after the database is in production, and not easily altered, has much the same flavor of what you have called an "ad hoc query". Using the database to organize data in a way that the designer did not contemplate. |
#162
| |||
| |||
|
|
"Ed Prochak" <edproc... (AT) gmail (DOT) com> wrote in message news:4b1507c9-207e-4555-bd27-42dd0e84de9d (AT) 59g2000hsb (DOT) googlegroups.com... On Jun 26, 12:31 pm, "David Cressey" <cresse... (AT) verizon (DOT) net> wrote: [] I can't tell you the number of times programmers have come to me for help with a query that produces wrong results, and when I try to change their "select" to "select distinct" to see if maybe that's the problem, they exclaim "Oh I never use 'select distinct'! It runs too slow!" I patiently explained to them that I first want to come up with a query that is logically correct, then if necessary, come up with one that's logically equivalent, but runs fast. When I see SELECT DISTINCT in anything but an ad hoc query, I know something is wrong with the query. Possibly it is joining to a wrong table (to a detail table instead of a header/parent table for example) or some filtering condition is missed (a column not in the select list needs to be used), or some join condition was missed (there is a compound key and one component was left out). Even in an ad hoc query I tend to use SELECT COUNT(*) since that gives me a little bit more information for the same cost as DISTINCT (i.e. it reads the same amount of data). Seeing this post, I'm actually a little disappointed in you, David. Your last sentence does show you mainly follow a good approach. It's just that DISTINCT should be left to ad hoc queries IMO. Have a good day. Ed Your comment doesn't agree with my experience. In general, there are two circumstances where SELECT DISTINCT is the right approach. Both involve situations where SELECT generates a bag, but the desired result is a set. The first is a situation where the database has been misdesigned. If the database has been frozen for a year or more, and there is a lot of production software that depends on the existing table design, it may simply be unfeasable to correct the design in order to make one query logically easy to write. |
|
The second is a situation where the query requires a subset of the data that does not include any candidate keys from the underlying tables. In that case, the generation of a query requires its own mechanism for eliminating duplicates. |
|
Perhaps the development of a new program, a long time after the database is in production, and not easily altered, has much the same flavor of what you have called an "ad hoc query". Using the database to organize data in a way that the designer did not contemplate. |
#163
| |||
| |||
|
|
"Ed Prochak" <edproc... (AT) gmail (DOT) com> wrote in message news:4b1507c9-207e-4555-bd27-42dd0e84de9d (AT) 59g2000hsb (DOT) googlegroups.com... On Jun 26, 12:31 pm, "David Cressey" <cresse... (AT) verizon (DOT) net> wrote: [] I can't tell you the number of times programmers have come to me for help with a query that produces wrong results, and when I try to change their "select" to "select distinct" to see if maybe that's the problem, they exclaim "Oh I never use 'select distinct'! It runs too slow!" I patiently explained to them that I first want to come up with a query that is logically correct, then if necessary, come up with one that's logically equivalent, but runs fast. When I see SELECT DISTINCT in anything but an ad hoc query, I know something is wrong with the query. Possibly it is joining to a wrong table (to a detail table instead of a header/parent table for example) or some filtering condition is missed (a column not in the select list needs to be used), or some join condition was missed (there is a compound key and one component was left out). Even in an ad hoc query I tend to use SELECT COUNT(*) since that gives me a little bit more information for the same cost as DISTINCT (i.e. it reads the same amount of data). Seeing this post, I'm actually a little disappointed in you, David. Your last sentence does show you mainly follow a good approach. It's just that DISTINCT should be left to ad hoc queries IMO. Have a good day. Ed Your comment doesn't agree with my experience. In general, there are two circumstances where SELECT DISTINCT is the right approach. Both involve situations where SELECT generates a bag, but the desired result is a set. The first is a situation where the database has been misdesigned. If the database has been frozen for a year or more, and there is a lot of production software that depends on the existing table design, it may simply be unfeasable to correct the design in order to make one query logically easy to write. |
|
The second is a situation where the query requires a subset of the data that does not include any candidate keys from the underlying tables. In that case, the generation of a query requires its own mechanism for eliminating duplicates. |
|
Perhaps the development of a new program, a long time after the database is in production, and not easily altered, has much the same flavor of what you have called an "ad hoc query". Using the database to organize data in a way that the designer did not contemplate. |
#164
| |||
| |||
|
|
I can't help wishing that the optimizer could figure it out. But that's probably way down on the list of things an optimizer should be able to do. |
#165
| |||
| |||
|
|
I can't help wishing that the optimizer could figure it out. But that's probably way down on the list of things an optimizer should be able to do. |
#166
| |||
| |||
|
|
I can't help wishing that the optimizer could figure it out. But that's probably way down on the list of things an optimizer should be able to do. |
![]() |
| Thread Tools | |
| Display Modes | |
| |