dbTalk Databases Forums  

How does a correlated subquery process?

comp.databases comp.databases


Discuss How does a correlated subquery process? in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Suzanne Dorman
 
Posts: n/a

Default How does a correlated subquery process? - 07-31-2003 , 10:18 AM






I'm trying to understand how correlated subqueries are processed. It
seems especially confusing when there are multiple conditions in the
outer where clause.

I have the following data:

TaxName Gender State BandAmount EffectiveDate
-----------------------------------------------------
cs080 Male CT 28000 1961/01/01
cs080 Female DE 35000 1950/01/01
tax Male CT 37000 1980/01/01
cs080 Male CT 28000 1981/01/01
cs080 Male FL 28000 2001/01/01

I want to get back the max EffectiveDate that is less than 1999/01/01
for those rows whose TaxName is cs080.

SELECT *
FROM Table2 t1
WHERE TaxName = 'cs080'
AND EffectiveDate = (SELECT MAX(t2.EffectiveDate) FROM Table2 t2
WHERE t2.TaxName = t1.TaxName AND
t2.EffectiveDate <= '1999/01/01')


This works fine, returning:
cs080 Male CT 28000 1981/01/01


If I want to get back the max EffectiveDate that is less than
1999/01/01 for those rows whose TaxName is cs080 and whose Gender is
male:

SELECT *
FROM Table2 t1
WHERE TaxName = 'cs080'
AND Gender = 'Male'
AND EffectiveDate = (SELECT MAX(t2.EffectiveDate) FROM Table2 t2
WHERE t2.TaxName = t1.TaxName AND
t2.EffectiveDate <= '1999/01/01')

This works fine too, returning the same line:
cs080 Male CT 28000 1981/01/01


If I change the gender to female, I thought I would get the one female
that is in there

SELECT *
FROM Table2 t1
WHERE TaxName = 'cs080'
AND Gender = 'Female'
AND EffectiveDate = (SELECT MAX(t2.EffectiveDate) FROM Table2 t2
WHERE t2.TaxName = t1.TaxName AND
t2.EffectiveDate <= '1999/01/01')

But I get nothing!

I thought the outer clause (including its WHERE) would get processed
and then each resultant row would be used to determine the max
effective date.

If I add Gender as a correlated variable, I do get the one female row.

SELECT *
FROM Table2 t1
WHERE TaxName = 'cs080'
AND Gender = 'Female'
AND EffectiveDate = (SELECT MAX(t2.EffectiveDate) FROM Table2 t2
WHERE t2.TaxName = t1.TaxName AND t2.Gender =
t1.Gender AND t2.EffectiveDate <= '1999/01/01')

Result:
cs080 Female DE 35000 1950/01/01

Is it only performing the outer WHERE for those variables that are
specified as correlation variables? And then performing the subquery
on that result set and applying the rest of the outer WHERE at the
end?

If so, what if I have 8-10 values that are tested in the outer WHERE?
Do I have to include all of those columns as correlation variables in
the subquery?

Sorry if I'm way off base but every time I see something about
correlation subqueries, only one item is ever shown in the outer
WHERE.

Reply With Quote
  #2  
Old   
--CELKO--
 
Posts: n/a

Default Re: How does a correlated subquery process? - 07-31-2003 , 06:12 PM






Quote:
I have the following data:
TaxName Gender State BandAmount EffectiveDate
-----------------------------------------------------
cs080 Male CT 28000 1961/01/01
cs080 Female DE 35000 1950/01/01
tax Male CT 37000 1980/01/01
cs080 Male CT 28000 1981/01/01
cs080 Male FL 28000 2001/01/01 <<

What is the key? It looks like it might be (tax_name, gender, state,
effective_date), but without any DDL who knows? In fact, you don't
even give us the name of this quasi-table; I guess that "Table2" is
the industry standard, meaningful name? Thre is no way to really
answer your question without the DDL. Do you have to work from specs
this bad at your job?

Quote:
I want to get back the max EffectiveDate that is less than [sic:
earlier than] 1999-01-01 for those rows whose tax_name is cs080.

As per your specs:

SELECT MAX (effective_date)
FROM Foobar
WHERE tax_name = 'cs080'
AND effective_date < '1999-01-01';

Quote:
If I want to get back the max EffectiveDate that is less than [sic:
earlier than] 1999-01-01 for those rows whose tax_name is cs080 and
whose gender is male: <<

As per your specs:

SELECT MAX(effective_date)
FROM Foobar
WHERE TaxName = 'cs080'
AND effective_date < '1999-01-01'
AND gender = 'male';

Quote:
If I change the gender to female, I thought I would get the one
female that is in there

As per your specs:

SELECT MAX(effective_date)
FROM Foobar
WHERE TaxName = 'cs080'
AND effective_date < '1999-01-01'
AND gender = 'female';

Quote:
I thought the outer clause (including its WHERE) would get
processed and then each resultant row would be used to determine the
max effective date. <<

No, like all block structured languages, the innermost query is done
first, and (in effect) converted into a table constant. You work
outward to the containing queries.

Quote:
Sorry if I'm way off base, but every time I see something about
correlated subqueries, only one column is ever shown in the outer
WHERE. <<

Why do you go out of your way to do correlations, when you have
constants? I don't think you gave the right specs. Let's try this
example instead: what is the diffrence between

1) Find the tallest fat man.
2) Find the fatest tall man.

SELECT *
FROM Personnel AS P0
WHERE height
= (SELECT MAX(height)
FROM Personnel AS P1
WHERE gender = 'male'
AND weight
= (SELECT MAX(weight)
FROM Personnel AS P2
WHERE gender = 'male'));

In this query, we find the heaviest guys in the personnel in the
innermost query. Using that weight, we next find the tallest among
them.

SELECT *
FROM Personnel AS P0
WHERE weight
= (SELECT MAX(weight)
FROM Personnel AS P1
WHERE gender = 'male'
AND height
= (SELECT MAX(height)
FROM Personnel AS P2
WHERE gender = 'male'));

In this query, we find the tallest guys in the personnel in the
innermost query. Using that height, we next find the heaviest among
them. The trick is in nesting the subqueries to get the priority
ordering right. (It might be the same guy or it might not; you might
have ties).

If I wanted to make this more general as to gender, then I could use
corelation:

SELECT *
FROM Personnel AS P0
WHERE weight
= (SELECT MAX(weight)
FROM Personnel AS P1
WHERE P1.gender = P0.gender
AND height
= (SELECT MAX(height)
FROM Personnel AS P2
WHERE P2.gender = P0.gender));

Here is how a SELECT works in SQL ... at least in theory. Real
products will optimize things when they can.

a) Start in the FROM clause and build a working table from all of the
joins, unions, intersections, and whatever other table constructors
are there. The table expression> AS <correlation name> option allows
you give a name to this working table which you then have to use for
the rest of the containing query.

b) Go to the WHERE clause and remove rows that do not pass criteria;
that is, that do not test to TRUE (reject UNKNOWN and FALSE). The
WHERE clause is applied to the working in the FROM clause.

c) Go to the optional GROUP BY clause, make groups and reduce each
group to a single row, replacing the original working table with the
new grouped table. The rows of a grouped table must be group
characteristics: (1) a grouping column (2) a statistic about the group
(i.e. aggregate functions) (3) a function or (4) an expression made up
of the those three items.

d) Go to the optional HAVING clause and apply it against the grouped
working table; if there was no GROUP BY clause, treat the entire table
as one group.

e) Go to the SELECT clause and construct the expressions in the list.
This means that the scalar subqueries, function calls and expressions
in the SELECT are done after all the other clauses are done. The AS
operator can give a name to expressions in the SELECT list, too.
These new names come into existence all at once, but after the WHERE
clause, GROUP BY clause and HAVING clause has been executed; you
cannot use them in the SELECT list or the WHERE clause for that
reason.

If there is a SELECT DISTINCT, then redundant duplicate rows are
removed. For purposes of defining a duplicate row, NULLs are treated
as matching (just like in the GROUP BY).

f) Nested query expressions follow the usual scoping rules you would
expect from a block structured language like C, Pascal, Algol, etc.
Namely, the innermost queries can reference columns and tables in the
queries in which they are contained.


Reply With Quote
  #3  
Old   
Malcolm Dew-Jones
 
Posts: n/a

Default Re: How does a correlated subquery process? - 07-31-2003 , 08:19 PM



Suzanne Dorman (suzanne.dorman (AT) segllc (DOT) com) wrote:
: I'm trying to understand how correlated subqueries are processed. It
: seems especially confusing when there are multiple conditions in the
: outer where clause.

: I have the following data:

: TaxName Gender State BandAmount EffectiveDate
: -----------------------------------------------------
: cs080 Male CT 28000 1961/01/01
: cs080 Female DE 35000 1950/01/01
: tax Male CT 37000 1980/01/01
: cs080 Male CT 28000 1981/01/01
: cs080 Male FL 28000 2001/01/01

: I want to get back the max EffectiveDate that is less than 1999/01/01
: for those rows whose TaxName is cs080.

: SELECT *
: FROM Table2 t1
: WHERE TaxName = 'cs080'
: AND EffectiveDate = (SELECT MAX(t2.EffectiveDate) FROM Table2 t2
: WHERE t2.TaxName = t1.TaxName AND
: t2.EffectiveDate <= '1999/01/01')


: This works fine, returning:
: cs080 Male CT 28000 1981/01/01


: If I want to get back the max EffectiveDate that is less than
: 1999/01/01 for those rows whose TaxName is cs080 and whose Gender is
: male:

: SELECT *
: FROM Table2 t1
: WHERE TaxName = 'cs080'
: AND Gender = 'Male'
: AND EffectiveDate = (SELECT MAX(t2.EffectiveDate) FROM Table2 t2
: WHERE t2.TaxName = t1.TaxName AND
: t2.EffectiveDate <= '1999/01/01')

: This works fine too, returning the same line:
: cs080 Male CT 28000 1981/01/01


: If I change the gender to female, I thought I would get the one female
: that is in there

: SELECT *
: FROM Table2 t1
: WHERE TaxName = 'cs080'
: AND Gender = 'Female'
: AND EffectiveDate = (SELECT MAX(t2.EffectiveDate) FROM Table2 t2
: WHERE t2.TaxName = t1.TaxName AND
: t2.EffectiveDate <= '1999/01/01')

: But I get nothing!

: I thought the outer clause (including its WHERE) would get processed
: and then each resultant row would be used to determine the max
: effective date.

As you see, it doesn't.

The inner query will always select all the rows from the entire table,
subject to the conditions you provide. In this case the only condition is
that the inner query must select rows that have the same Taxname as the
"potentially selected" row from outer query. There are lots of rows with
the same taxname as the female's row , but the max effective date of those
rows belongs to a Male. Therefore his date is selected in the inner
query, and then her row is only selected if her date happens to be the
same as his, which it isn't, so her row is not selected.

Apparently this isn't what you want, but there is nothing inherently wrong
with this. You might very well want only those females who have dates
later than anyone else, male or female.


Reply With Quote
  #4  
Old   
Suzanne Dorman
 
Posts: n/a

Default Re: How does a correlated subquery process? - 08-01-2003 , 09:38 AM



yf110 (AT) vtn1 (DOT) victoria.tc.ca (Malcolm Dew-Jones) wrote in message news:<3f29b227 (AT) news (DOT) victoria.tc.ca>...
Quote:
Suzanne Dorman (suzanne.dorman (AT) segllc (DOT) com) wrote:
: I'm trying to understand how correlated subqueries are processed. It
: seems especially confusing when there are multiple conditions in the
: outer where clause.

: I have the following data:

: TaxName Gender State BandAmount EffectiveDate
: -----------------------------------------------------
: cs080 Male CT 28000 1961/01/01
: cs080 Female DE 35000 1950/01/01
: tax Male CT 37000 1980/01/01
: cs080 Male CT 28000 1981/01/01
: cs080 Male FL 28000 2001/01/01

: I want to get back the max EffectiveDate that is less than 1999/01/01
: for those rows whose TaxName is cs080.

: SELECT *
: FROM Table2 t1
: WHERE TaxName = 'cs080'
: AND EffectiveDate = (SELECT MAX(t2.EffectiveDate) FROM Table2 t2
: WHERE t2.TaxName = t1.TaxName AND
: t2.EffectiveDate <= '1999/01/01')


: This works fine, returning:
: cs080 Male CT 28000 1981/01/01


: If I want to get back the max EffectiveDate that is less than
: 1999/01/01 for those rows whose TaxName is cs080 and whose Gender is
: male:

: SELECT *
: FROM Table2 t1
: WHERE TaxName = 'cs080'
: AND Gender = 'Male'
: AND EffectiveDate = (SELECT MAX(t2.EffectiveDate) FROM Table2 t2
: WHERE t2.TaxName = t1.TaxName AND
: t2.EffectiveDate <= '1999/01/01')

: This works fine too, returning the same line:
: cs080 Male CT 28000 1981/01/01


: If I change the gender to female, I thought I would get the one female
: that is in there

: SELECT *
: FROM Table2 t1
: WHERE TaxName = 'cs080'
: AND Gender = 'Female'
: AND EffectiveDate = (SELECT MAX(t2.EffectiveDate) FROM Table2 t2
: WHERE t2.TaxName = t1.TaxName AND
: t2.EffectiveDate <= '1999/01/01')

: But I get nothing!

: I thought the outer clause (including its WHERE) would get processed
: and then each resultant row would be used to determine the max
: effective date.

As you see, it doesn't.

The inner query will always select all the rows from the entire table,
subject to the conditions you provide. In this case the only condition is
that the inner query must select rows that have the same Taxname as the
"potentially selected" row from outer query. There are lots of rows with
the same taxname as the female's row , but the max effective date of those
rows belongs to a Male. Therefore his date is selected in the inner
query, and then her row is only selected if her date happens to be the
same as his, which it isn't, so her row is not selected.

Apparently this isn't what you want, but there is nothing inherently wrong
with this. You might very well want only those females who have dates
later than anyone else, male or female.

Thanks. Using the table and not the results of the outer select is
quite different than my understanding.

So how would I do what I want? I want to select all rows that have a
TaxName of cs080 and a Gender of Female and then, from that set, I
want it to only return the row (or rows) with the maximum effective
date. Do I have to write 2 separate queries? If I use a correlated
subquery, do I have to specify every variable in the outer WHERE as
part of the inner WHERE?

Obviously, I have supplied a very simple test case. My table and my
query are more complicate with the query having 8 conditions in the
WHERE which must be met before it determines the max effective date
(which is why the correlated subquery seemed so messy - and slow!)


Reply With Quote
  #5  
Old   
Tokunaga T.
 
Posts: n/a

Default Re: How does a correlated subquery process? - 08-01-2003 , 09:03 PM



suzanne.dorman (AT) segllc (DOT) com (Suzanne Dorman) wrote in message
Quote:
As you see, it doesn't.

The inner query will always select all the rows from the entire table,
subject to the conditions you provide. In this case the only condition is
that the inner query must select rows that have the same Taxname as the
"potentially selected" row from outer query. There are lots of rows with
the same taxname as the female's row , but the max effective date of those
rows belongs to a Male. Therefore his date is selected in the inner
query, and then her row is only selected if her date happens to be the
same as his, which it isn't, so her row is not selected.

Apparently this isn't what you want, but there is nothing inherently wrong
with this. You might very well want only those females who have dates
later than anyone else, male or female.


Thanks. Using the table and not the results of the outer select is
quite different than my understanding.

So how would I do what I want? I want to select all rows that have a
TaxName of cs080 and a Gender of Female and then, from that set, I
want it to only return the row (or rows) with the maximum effective
date. Do I have to write 2 separate queries? If I use a correlated
subquery, do I have to specify every variable in the outer WHERE as
part of the inner WHERE?

Obviously, I have supplied a very simple test case. My table and my
query are more complicate with the query having 8 conditions in the
WHERE which must be met before it determines the max effective date
(which is why the correlated subquery seemed so messy - and slow!)
How about to add a condition in subquery?
(reformatted with my taste)
SELECT *
FROM Table2 t1
WHERE TaxName = 'cs080'
AND Gender = 'Female'
AND EffectiveDate
= (SELECT MAX(t2.EffectiveDate)
FROM Table2 t2
WHERE t2.TaxName = t1.TaxName
AND t2.Gender = t1.Gender -- Add this condition
AND t2.EffectiveDate <= '1999/01/01'
)
;


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.