dbTalk Databases Forums  

use a generated column in a where clause

comp.databases.ibm-db2 comp.databases.ibm-db2


Discuss use a generated column in a where clause in the comp.databases.ibm-db2 forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Arun Srini
 
Posts: n/a

Default use a generated column in a where clause - 04-26-2010 , 07:28 AM






I have a huge table and I have the following requirement :
Table1
---------
client
day_of_sale
qty

Table2
---------
same ddl

Now table2 needs to have the same data except for the first
day_of_sale value for each client.

So I have the query
select client,day_of_sale,qty from table(select
client,day_of_sale,qty, row_number() over(partition by client order by
day_of_sale) as rownum ) from table1 where rownum>1

This gives me a huge cost since its going to build up the inner table
and then select the 'rest' of the columns in the outer table.
To stop doing the two scans (use lesser memory to store the whole
inner table in buffer and then loop through it), what we did was
1. add the row number column in table 2
2. put in a check constraint in the table 2 that has only >1
3. load to table 2
4. set integrity

This eats up the concurrency of our project since the loaded table
can't be used while the load happens.
So I would be very happy if someone can give me a single sql without
CTE to do the above.
We are playing with the OLAP and windowing functions, but no joy till
now.

Any help or suggestions plz

Reply With Quote
  #2  
Old   
Lennart Jonsson
 
Posts: n/a

Default Re: use a generated column in a where clause - 04-26-2010 , 09:04 AM






On 2010-04-26 14:28, Arun Srini wrote:
Quote:
I have a huge table and I have the following requirement :
Table1
---------
client
day_of_sale
qty

Table2
---------
same ddl

Now table2 needs to have the same data except for the first
day_of_sale value for each client.

So I have the query
select client,day_of_sale,qty from table(select
client,day_of_sale,qty, row_number() over(partition by client order by
day_of_sale) as rownum ) from table1 where rownum>1

This gives me a huge cost since its going to build up the inner table
and then select the 'rest' of the columns in the outer table.
To stop doing the two scans (use lesser memory to store the whole
inner table in buffer and then loop through it), what we did was
1. add the row number column in table 2
2. put in a check constraint in the table 2 that has only >1
3. load to table 2
4. set integrity

This eats up the concurrency of our project since the loaded table
can't be used while the load happens.
So I would be very happy if someone can give me a single sql without
CTE to do the above.
We are playing with the OLAP and windowing functions, but no joy till
now.

Any help or suggestions plz
Just a thought, if I get it right the set of data NOT included in
"table2" is fairly stable, it only grows when a new client is added,
correct?

create table table3 (
client ...,
day_of_sale ...,

primary key (client, day_of_sale)
);

insert into table3 (client,day_of_sale)
select client, min( day_of_sale )
from table1
group by client;

If you can maintain table3 somehow (triggers for example), table 2 would be:

select client,day_of_sale,qty
from table1 x
left join table3 y
on (x.client, x.day_of_sale)
= (y.client, y.day_of_sale)
where y.client is null;

since table3 should be much smaller than table1, it might be an
improvement. As mentioned, just a thought.


/Lennart

Reply With Quote
  #3  
Old   
--CELKO--
 
Posts: n/a

Default Re: use a generated column in a where clause - 04-26-2010 , 09:56 PM



Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. If you know how, follow ISO-11179 data element naming
conventions and formatting rules. Temporal data should use ISO-8601
formats. Code should be in Standard SQL as much as possible and not
local dialect.

Sample data is also a good idea, along with clear specifications. It
is very hard to debug code when you do not let us see it. If you want
to learn how to ask a question on a Newsgroup, look at:
http://www.catb.org/~esr/faqs/smart-questions.html

But the real question is why such a pair of tables, loaded with
redundant data, exist at all!! Just make a VIEW on the larger table
to get the trimmed table for whatever reason you need it.

Reply With Quote
  #4  
Old   
Arun Srini
 
Posts: n/a

Default Re: use a generated column in a where clause - 04-27-2010 , 06:07 AM



Table1
---------
create table table1(client char(5), day_of_sale date, qty bigint);

Table2
---------
Same as table 1

Requirement - table1 is a staging table, gets loaded using an ETL.
Table 2 is the production table.
Due to a problem with client data, the first record for everymonth,
every client,ex.
('Belgium','01/01/2010',-10)

This will always have a negative value, and is junk data. We have
asked the client to rectify this, but while we are waiting for them to
do it, we are planning to put a data cleanse solution ourselves.
So I need to just leave the first row for every month per client
alone, and move the rest
Source
----------
('Belgium','01/01/2010',-10)
('Belgium','01/07/2010',1)
('Belgium','01/04/2010',3)
('Megallen','01/01/2010',-10)
('Megallen','01/21/2010',10)
('Megallen','01/02/2010',4)
('Megallen','01/14/2010',9)
Target
--------

('Belgium','01/07/2010',1)
('Belgium','01/04/2010',3)

('Megallen','01/21/2010',10)
('Megallen','01/02/2010',4)
('Megallen','01/14/2010',9)
This is not only needed for moving the data but also reporting how
many rows are there per month per client. Like

Counts
----------
Jan -2010 Belgium - 2
Jan -2010 Megallen -3

What we do is

select * from table
(select row_number() over(partition by client order by day_of_sale) as
rownum, client, qty from table1) as x
where x.rownum !=1;

This gives me a huge cost since there is two reads here.
I want to do something that would require one read , like select <>
from table1;
Hope the additional data helps.

Reply With Quote
  #5  
Old   
Lennart Jonsson
 
Posts: n/a

Default Re: use a generated column in a where clause - 04-27-2010 , 08:59 AM



On 2010-04-27 13:07, Arun Srini wrote:
Quote:
Table1
---------
create table table1(client char(5), day_of_sale date, qty bigint);

Table2
---------
Same as table 1

Requirement - table1 is a staging table, gets loaded using an ETL.
Table 2 is the production table.
Due to a problem with client data, the first record for everymonth,
every client,ex.
('Belgium','01/01/2010',-10)

This will always have a negative value, and is junk data.
Are these the only ones with a negative value?

/Lennart

[...]

Reply With Quote
  #6  
Old   
Ian
 
Posts: n/a

Default Re: use a generated column in a where clause - 04-27-2010 , 01:56 PM



On 4/27/10 4:07 AM, Arun Srini wrote:
Quote:
Table1
---------
create table table1(client char(5), day_of_sale date, qty bigint);

Table2
---------
Same as table 1

Requirement - table1 is a staging table, gets loaded using an ETL.
Table 2 is the production table.
Due to a problem with client data, the first record for everymonth,
every client,ex.
('Belgium','01/01/2010',-10)

This will always have a negative value, and is junk data.
Can qty ever be negative? If not, then just a simple predicate for
qty >= 0 sounds like it should suffice.

If not, have you created any indexes on the staging table to help
with the query (i.e. to try and avoid a sort)?

Reply With Quote
  #7  
Old   
Arun Srini
 
Posts: n/a

Default Re: use a generated column in a where clause - 04-27-2010 , 11:22 PM



On Apr 27, 6:59*pm, Lennart Jonsson <erik.lennart.jons... (AT) gmail (DOT) com>
wrote:
Quote:
On 2010-04-27 13:07, Arun Srini wrote:







Table1
---------
create table table1(client char(5), day_of_sale date, qty bigint);

Table2
---------
Same as table 1

Requirement - table1 is a staging table, gets loaded using an ETL.
Table 2 is the production table.
Due to a problem with client data, the first record for everymonth,
every client,ex.
('Belgium','01/01/2010',-10)

This will always have a negative value, and is junk data.

Are these the only ones with a negative value?

/Lennart

[...]
sorry the data tended to be misleading. The qty can be negative, this
is because the company would use negative values to give additional
data about the 'returns' of their products from customers.

Reply With Quote
  #8  
Old   
situ
 
Posts: n/a

Default Re: use a generated column in a where clause - 05-04-2010 , 01:35 AM



On Apr 28, 9:22*am, Arun Srini <arunro... (AT) gmail (DOT) com> wrote:
Quote:
On Apr 27, 6:59*pm, Lennart Jonsson <erik.lennart.jons... (AT) gmail (DOT) com
wrote:



On 2010-04-27 13:07, Arun Srini wrote:

Table1
---------
create table table1(client char(5), day_of_sale date, qty bigint);

Table2
---------
Same as table 1

Requirement - table1 is a staging table, gets loaded using an ETL.
Table 2 is the production table.
Due to a problem with client data, the first record for everymonth,
every client,ex.
('Belgium','01/01/2010',-10)

This will always have a negative value, and is junk data.

Are these the only ones with a negative value?

/Lennart

[...]

sorry the data tended to be misleading. The qty can be negative, this
is because the company would use negative values to give additional
data about the 'returns' of their products from customers.
Hi Arun,

Can't we tackle this situation when your ETL tool reads the data and
loads the staging table. create one more staging table and have ETL
tool to write the records to this as well( here business logic should
be adapted at ETL level), i assume it won't cause much system
overhead; also its going to be just normal insert and concurrency
should not be a problem.
also have you tried constructing the MQT, which we used in our old
days
hope this helps.

Regards,
Sri

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.