dbTalk Databases Forums  

"code" tables?

comp.databases comp.databases


Discuss "code" tables? in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #161  
Old   
Ed Prochak
 
Posts: n/a

Default Re: "code" tables? - 06-30-2008 , 07:22 AM






On Jun 30, 7:33 am, "David Cressey" <cresse... (AT) verizon (DOT) net> wrote:
Quote:
"Ed Prochak" <edproc... (AT) gmail (DOT) com> wrote in message

news:4b1507c9-207e-4555-bd27-42dd0e84de9d (AT) 59g2000hsb (DOT) googlegroups.com...



On Jun 26, 12:31 pm, "David Cressey" <cresse... (AT) verizon (DOT) net> wrote:
[]
I can't tell you the number of times programmers have come to me for
help
with a query
that produces wrong results, and when I try to change their "select" to
"select distinct" to see if maybe that's the problem, they exclaim "Oh
I
never use 'select distinct'! It runs too slow!"
I patiently explained to them that I first want to come up with a query
that
is logically correct, then if necessary, come up with one that's
logically
equivalent, but runs fast.

When I see
SELECT DISTINCT
in anything but an ad hoc query, I know something is wrong with the
query. Possibly it is joining to a wrong table (to a detail table
instead of a header/parent table for example) or some filtering
condition is missed (a column not in the select list needs to be
used), or some join condition was missed (there is a compound key and
one component was left out). Even in an ad hoc query I tend to use
SELECT COUNT(*)
since that gives me a little bit more information for the same cost as
DISTINCT (i.e. it reads the same amount of data).

Seeing this post, I'm actually a little disappointed in you, David.
Your last sentence does show you mainly follow a good approach. It's
just that DISTINCT should be left to ad hoc queries IMO.

Have a good day.
Ed

Your comment doesn't agree with my experience. In general, there are two
circumstances where SELECT DISTINCT is the right approach. Both involve
situations where SELECT generates a bag, but the desired result is a set.

The first is a situation where the database has been misdesigned. If the
database has been frozen for a year or more, and there is a lot of
production software that depends on the existing table design, it may
simply be unfeasable to correct the design in order to make one query
logically easy to write.
Okay, you have a case where the design error is not in the query. We
agree there is still something wrong here.

Quote:
The second is a situation where the query requires a subset of the data that
does not include any candidate keys from the underlying tables. In that
case, the generation of a query requires its own mechanism for eliminating
duplicates.
I'd like to see an example of this case.
Then the question would be: is this case a rare exception?
I think it is very rare, since honestly I haven't seen it.
You seem to be saying it is not so rare (especially in your other
post). If so, an example or two should be easy to show.

Maybe I am making some queries more complicated than they need to be.
Please show me the error of my ways.

Quote:
Perhaps the development of a new program, a long time after the database is
in production, and not easily altered, has much the same flavor of what you
have called an "ad hoc query". Using the database to organize data in a way
that the designer did not contemplate.
I am thinking of ad hoc as in a throwaway query. One you might do to
research an issue in the system. Occasionally it might be a business
request, but most are developer queries as part of new development
(especially when you just started a new job and there is little or no
documentation). And I would say it includes organizing data in ways
the designer did not anticipate. Whether these new ways are useful or
not is yet to be determined. So I guess, yes, in a few cases these
queries might work their way into a new program.

Ed


Reply With Quote
  #162  
Old   
Ed Prochak
 
Posts: n/a

Default Re: "code" tables? - 06-30-2008 , 07:22 AM






On Jun 30, 7:33 am, "David Cressey" <cresse... (AT) verizon (DOT) net> wrote:
Quote:
"Ed Prochak" <edproc... (AT) gmail (DOT) com> wrote in message

news:4b1507c9-207e-4555-bd27-42dd0e84de9d (AT) 59g2000hsb (DOT) googlegroups.com...



On Jun 26, 12:31 pm, "David Cressey" <cresse... (AT) verizon (DOT) net> wrote:
[]
I can't tell you the number of times programmers have come to me for
help
with a query
that produces wrong results, and when I try to change their "select" to
"select distinct" to see if maybe that's the problem, they exclaim "Oh
I
never use 'select distinct'! It runs too slow!"
I patiently explained to them that I first want to come up with a query
that
is logically correct, then if necessary, come up with one that's
logically
equivalent, but runs fast.

When I see
SELECT DISTINCT
in anything but an ad hoc query, I know something is wrong with the
query. Possibly it is joining to a wrong table (to a detail table
instead of a header/parent table for example) or some filtering
condition is missed (a column not in the select list needs to be
used), or some join condition was missed (there is a compound key and
one component was left out). Even in an ad hoc query I tend to use
SELECT COUNT(*)
since that gives me a little bit more information for the same cost as
DISTINCT (i.e. it reads the same amount of data).

Seeing this post, I'm actually a little disappointed in you, David.
Your last sentence does show you mainly follow a good approach. It's
just that DISTINCT should be left to ad hoc queries IMO.

Have a good day.
Ed

Your comment doesn't agree with my experience. In general, there are two
circumstances where SELECT DISTINCT is the right approach. Both involve
situations where SELECT generates a bag, but the desired result is a set.

The first is a situation where the database has been misdesigned. If the
database has been frozen for a year or more, and there is a lot of
production software that depends on the existing table design, it may
simply be unfeasable to correct the design in order to make one query
logically easy to write.
Okay, you have a case where the design error is not in the query. We
agree there is still something wrong here.

Quote:
The second is a situation where the query requires a subset of the data that
does not include any candidate keys from the underlying tables. In that
case, the generation of a query requires its own mechanism for eliminating
duplicates.
I'd like to see an example of this case.
Then the question would be: is this case a rare exception?
I think it is very rare, since honestly I haven't seen it.
You seem to be saying it is not so rare (especially in your other
post). If so, an example or two should be easy to show.

Maybe I am making some queries more complicated than they need to be.
Please show me the error of my ways.

Quote:
Perhaps the development of a new program, a long time after the database is
in production, and not easily altered, has much the same flavor of what you
have called an "ad hoc query". Using the database to organize data in a way
that the designer did not contemplate.
I am thinking of ad hoc as in a throwaway query. One you might do to
research an issue in the system. Occasionally it might be a business
request, but most are developer queries as part of new development
(especially when you just started a new job and there is little or no
documentation). And I would say it includes organizing data in ways
the designer did not anticipate. Whether these new ways are useful or
not is yet to be determined. So I guess, yes, in a few cases these
queries might work their way into a new program.

Ed


Reply With Quote
  #163  
Old   
Ed Prochak
 
Posts: n/a

Default Re: "code" tables? - 06-30-2008 , 07:22 AM



On Jun 30, 7:33 am, "David Cressey" <cresse... (AT) verizon (DOT) net> wrote:
Quote:
"Ed Prochak" <edproc... (AT) gmail (DOT) com> wrote in message

news:4b1507c9-207e-4555-bd27-42dd0e84de9d (AT) 59g2000hsb (DOT) googlegroups.com...



On Jun 26, 12:31 pm, "David Cressey" <cresse... (AT) verizon (DOT) net> wrote:
[]
I can't tell you the number of times programmers have come to me for
help
with a query
that produces wrong results, and when I try to change their "select" to
"select distinct" to see if maybe that's the problem, they exclaim "Oh
I
never use 'select distinct'! It runs too slow!"
I patiently explained to them that I first want to come up with a query
that
is logically correct, then if necessary, come up with one that's
logically
equivalent, but runs fast.

When I see
SELECT DISTINCT
in anything but an ad hoc query, I know something is wrong with the
query. Possibly it is joining to a wrong table (to a detail table
instead of a header/parent table for example) or some filtering
condition is missed (a column not in the select list needs to be
used), or some join condition was missed (there is a compound key and
one component was left out). Even in an ad hoc query I tend to use
SELECT COUNT(*)
since that gives me a little bit more information for the same cost as
DISTINCT (i.e. it reads the same amount of data).

Seeing this post, I'm actually a little disappointed in you, David.
Your last sentence does show you mainly follow a good approach. It's
just that DISTINCT should be left to ad hoc queries IMO.

Have a good day.
Ed

Your comment doesn't agree with my experience. In general, there are two
circumstances where SELECT DISTINCT is the right approach. Both involve
situations where SELECT generates a bag, but the desired result is a set.

The first is a situation where the database has been misdesigned. If the
database has been frozen for a year or more, and there is a lot of
production software that depends on the existing table design, it may
simply be unfeasable to correct the design in order to make one query
logically easy to write.
Okay, you have a case where the design error is not in the query. We
agree there is still something wrong here.

Quote:
The second is a situation where the query requires a subset of the data that
does not include any candidate keys from the underlying tables. In that
case, the generation of a query requires its own mechanism for eliminating
duplicates.
I'd like to see an example of this case.
Then the question would be: is this case a rare exception?
I think it is very rare, since honestly I haven't seen it.
You seem to be saying it is not so rare (especially in your other
post). If so, an example or two should be easy to show.

Maybe I am making some queries more complicated than they need to be.
Please show me the error of my ways.

Quote:
Perhaps the development of a new program, a long time after the database is
in production, and not easily altered, has much the same flavor of what you
have called an "ad hoc query". Using the database to organize data in a way
that the designer did not contemplate.
I am thinking of ad hoc as in a throwaway query. One you might do to
research an issue in the system. Occasionally it might be a business
request, but most are developer queries as part of new development
(especially when you just started a new job and there is little or no
documentation). And I would say it includes organizing data in ways
the designer did not anticipate. Whether these new ways are useful or
not is yet to be determined. So I guess, yes, in a few cases these
queries might work their way into a new program.

Ed


Reply With Quote
  #164  
Old   
--CELKO--
 
Posts: n/a

Default Re: "code" tables? - 06-30-2008 , 09:50 AM



Quote:
I can't help wishing that the optimizer could figure it out. But that's probably way down on the list of things an optimizer should be able to do.
Actually, it is not too hard to detect in many cases. If the result
set has a key in it without a cross join, then the rows are UNIQUE.
Perfect hashing and bit vector indexing can also spot and reject
redundant duplicates during processing.





Reply With Quote
  #165  
Old   
--CELKO--
 
Posts: n/a

Default Re: "code" tables? - 06-30-2008 , 09:50 AM



Quote:
I can't help wishing that the optimizer could figure it out. But that's probably way down on the list of things an optimizer should be able to do.
Actually, it is not too hard to detect in many cases. If the result
set has a key in it without a cross join, then the rows are UNIQUE.
Perfect hashing and bit vector indexing can also spot and reject
redundant duplicates during processing.





Reply With Quote
  #166  
Old   
--CELKO--
 
Posts: n/a

Default Re: "code" tables? - 06-30-2008 , 09:50 AM



Quote:
I can't help wishing that the optimizer could figure it out. But that's probably way down on the list of things an optimizer should be able to do.
Actually, it is not too hard to detect in many cases. If the result
set has a key in it without a cross join, then the rows are UNIQUE.
Perfect hashing and bit vector indexing can also spot and reject
redundant duplicates during processing.





Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.