dbTalk Databases Forums  

perform expensive calculation on a number of rows without locking allrows until all complete

comp.databases.oracle.misc comp.databases.oracle.misc


Discuss perform expensive calculation on a number of rows without locking allrows until all complete in the comp.databases.oracle.misc forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Martin
 
Posts: n/a

Default perform expensive calculation on a number of rows without locking allrows until all complete - 09-28-2011 , 01:55 PM






I have a table entry which contains a column called recalc (value 0 or
1). Sometimes code sets the recalc column to 1 indicating to a job
which runs at regular intervals it should update another table history
with expensive calculations for the entry. When the calculation is
complete (or when the rows with entry.recalc=1 are selected so we know
which entries to perform the calculation on) the entry.recalc column
needs to be reset to 0. As the calculation is expensive I do not want
to keep the rows in the entry table locked whilst the calculation is
performed on each entry although keeping one entry locked whilst its
calculation is performed is not so bad. Additionally, other code might
want to set entry.recalc to 1 at any time.

my first (working but probably ora 01555 snapshot out of date, error
prone) was:

for arow in (select entry.id from entry where recalc = 1) loop
perform expensive calculation
update entry set recalc = 0 where entry.id = arow.id;
commit; -- potential snapshot out of date problem
end loop

as although the calculation is expensive per entry it is not so bad
but I do
not want to lock all entries needing recalc whilst all entries are
recalculated.

Ideally I think I need to select the entry columns with recalc = 1
into a list (say a global temp table) for update of the recalc column,
reset entry.recalc = 0 for all entries in one go, commit, then perform
the calculations but I cannot work out a way to do this e.g. (yes, I
know this is invalid sql, just here to demonstrate the problem),

insert into gtt (select id from entry where recalc = 1 for update of
recalc);
update entry set recalc = 0 where entry.id in (select id from gtt);
commit; -- nothing locked for write now
loop through gtt performing calculation which is independent of the
entry

of course, you cannot do this.

or

update all entry.recalc = 0 where entry.recalc=1 returning entry.id
(for all ids)

Any ideas?

Thanks

Reply With Quote
  #2  
Old   
Andreas Leitgeb
 
Posts: n/a

Default Re: perform expensive calculation on a number of rows withoutlocking all rows until all complete - 09-28-2011 , 02:54 PM






Martin <martin.j.evans (AT) gmail (DOT) com> wrote:
Quote:
I have a table entry which contains a column called recalc (value 0 or
1). Sometimes code sets the recalc column to 1 indicating to a job
which runs at regular intervals it should update another table history
with expensive calculations for the entry. When the calculation is
complete (or when the rows with entry.recalc=1 are selected so we know
which entries to perform the calculation on) the entry.recalc column
needs to be reset to 0. As the calculation is expensive I do not want
to keep the rows in the entry table locked whilst the calculation is
performed on each entry although keeping one entry locked whilst its
calculation is performed is not so bad. Additionally, other code might
want to set entry.recalc to 1 at any time.
Some idea:
let the job run something like the following pseudocode:
loop
update ... set recalc=0 where recalc=1 AND ROWNUM=1 returning id
commit
if none found then exit from loop
perform longish task for id
end loop

It will then find only one row, process it, then search anew
for another such row. From your description, it seems like
the longish task would run long enough, that the cost of redoing
the select for recalc=1 from scratch each time doesn't really
matter all that much. (you do have an index on recalc, don't you?)

Reply With Quote
  #3  
Old   
Martin
 
Posts: n/a

Default Re: perform expensive calculation on a number of rows without lockingall rows until all complete - 09-29-2011 , 03:18 AM



On Sep 28, 8:54*pm, Andreas Leitgeb <a... (AT) gamma (DOT) logic.tuwien.ac.at>
wrote:
Quote:
Martin <martin.j.ev... (AT) gmail (DOT) com> wrote:
I have a table entry which contains a column called recalc (value 0 or
1). Sometimes code sets the recalc column to 1 indicating to a job
which runs at regular intervals it should update another table history
with expensive calculations for the entry. When the calculation is
complete (or when the rows with entry.recalc=1 are selected so we know
which entries to perform the calculation on) the entry.recalc column
needs to be reset to 0. As the calculation is expensive I do not want
to keep the rows in the entry table locked whilst the calculation is
performed on each entry although keeping one entry locked whilst its
calculation is performed is not so bad. Additionally, other code might
want to set entry.recalc to 1 at any time.

Some idea:
let the job run something like the following pseudocode:
* *loop
* * * update ... set recalc=0 where recalc=1 * AND ROWNUM=1* returning id
* * * commit
* * * if none found then exit from loop
* * * perform longish task for id
* *end loop

It will then find only one row, process it, then search anew
for another such row. *From your description, it seems like
the longish task would run long enough, that the cost of redoing
the select for recalc=1 from scratch each time doesn't really
matter all that much. (you do have an index on recalc, don't you?)
Interesting idea and thanks for answering but the problem is that
whilst the calculations are taking place per entry other processes
will be setting the recalc=1 in other entries so there is a
possibility my job would never finish. I could count how many recalc=1
rows there are first (and only loop at most that many times) then
follow your suggestion but I've no way of ensuring that if more rows
are set reclac=1 whilst I'm looping then I might keep missing ones at
the end. I guess I could fix that by changing recalc from an integer
to a timestamp and order by oldest first but then other processes
might reset the recalc column to a newer timestamp. To prevent that I
could change the code which sets recalc to only set it if it is null.

So, maybe there is something to look in to here.

If any one else has any ideas I'd appreciate it.

Thanks.

Martin

Reply With Quote
  #4  
Old   
joel garry
 
Posts: n/a

Default Re: perform expensive calculation on a number of rows without lockingall rows until all complete - 09-29-2011 , 11:26 AM



On Sep 29, 1:18*am, Martin <martin.j.ev... (AT) gmail (DOT) com> wrote:
Quote:
On Sep 28, 8:54*pm, Andreas Leitgeb <a... (AT) gamma (DOT) logic.tuwien.ac.at
wrote:



Martin <martin.j.ev... (AT) gmail (DOT) com> wrote:
I have a table entry which contains a column called recalc (value 0 or
1). Sometimes code sets the recalc column to 1 indicating to a job
which runs at regular intervals it should update another table history
with expensive calculations for the entry. When the calculation is
complete (or when the rows with entry.recalc=1 are selected so we know
which entries to perform the calculation on) the entry.recalc column
needs to be reset to 0. As the calculation is expensive I do not want
to keep the rows in the entry table locked whilst the calculation is
performed on each entry although keeping one entry locked whilst its
calculation is performed is not so bad. Additionally, other code might
want to set entry.recalc to 1 at any time.

Some idea:
let the job run something like the following pseudocode:
* *loop
* * * update ... set recalc=0 where recalc=1 * AND ROWNUM=1 * returning id
* * * commit
* * * if none found then exit from loop
* * * perform longish task for id
* *end loop

It will then find only one row, process it, then search anew
for another such row. *From your description, it seems like
the longish task would run long enough, that the cost of redoing
the select for recalc=1 from scratch each time doesn't really
matter all that much. (you do have an index on recalc, don't you?)

Interesting idea and thanks for answering but the problem is that
whilst the calculations are taking place per entry other processes
will be setting the recalc=1 in other entries so there is a
possibility my job would never finish. I could count how many recalc=1
rows there are first (and only loop at most that many times) then
follow your suggestion but I've no way of ensuring that if more rows
are set reclac=1 whilst I'm looping then I might keep missing ones at
the end. I guess I could fix that by changing recalc from an integer
to a timestamp and order by oldest first but then other processes
might reset the recalc column to a newer timestamp. To prevent that I
could change the code which sets recalc to only set it if it is null.

So, maybe there is something to look in to here.

If any one else has any ideas I'd appreciate it.

Thanks.

Martin
I think if multiple recalcs can happen on a row, you need some kind of
queuing. Perhaps add one for each calculation, setting to zero when
the last one is done. I hesitate to say more, else we'll be writing a
full-blown transaction processor.

jg
--
@home.com is bogus.
Give a man a fish, you've fed him for a day.
Teach a man to fish, and you will have misleading database
aggregation. http://www.signonsandiego.com/news/2...falling-apart/

Reply With Quote
  #5  
Old   
Gerard H. Pille
 
Posts: n/a

Default Re: perform expensive calculation on a number of rows without lockingall rows until all complete - 09-29-2011 , 11:43 AM



Martin wrote:
Quote:
insert into gtt (select id from entry where recalc = 1 for update of
recalc);
update entry set recalc = 0 where entry.id in (select id from gtt);
commit; -- nothing locked for write now
loop through gtt performing calculation which is independent of the
entry

of course, you cannot do this.

or

update all entry.recalc = 0 where entry.recalc=1 returning entry.id
(for all ids)

Any ideas?

Thanks
You've never heard of PL/SQL ? Any other 3GL?

Fetch a list of all entries needing recalculation, store that list in memory, start
recalculating one by one.

Reply With Quote
  #6  
Old   
Martin
 
Posts: n/a

Default Re: perform expensive calculation on a number of rows without lockingall rows until all complete - 09-29-2011 , 12:00 PM



On Sep 29, 5:43*pm, "Gerard H. Pille" <g... (AT) skynet (DOT) be> wrote:
Quote:
Martin wrote:
insert into gtt (select id from entry where recalc = 1 for update of
recalc);
update entry set recalc = 0 where entry.id in (select id from gtt);
commit; -- nothing locked for write now
loop through gtt performing calculation which is independent of the
entry

of course, you cannot do this.

or

update all entry.recalc = 0 where entry.recalc=1 returning entry.id
(for all ids)

Any ideas?

Thanks

You've never heard of PL/SQL ? * Any other 3GL?
?

Quote:
Fetch a list of all entries needing recalculation, store that list in memory, start
recalculating one by one.
Cannot do that as each entry needs updating to set recalc = 0.

If I fetched the list for update of recalc field all rows are locked
until the processing is complete.

If I update all the rows so recalc = 0 I can no longer find which rows
to operate on since update does not return multiple rowids.

If I select all the rows with recalc = 1 "into memory" as you say then
run an update to set recalc = 0 I could be clearing recalc on an entry
which has since been re-set to 1 since there was no lock between the
select and the update.

Martin

Reply With Quote
  #7  
Old   
Martin
 
Posts: n/a

Default Re: perform expensive calculation on a number of rows without lockingall rows until all complete - 09-29-2011 , 12:11 PM



On Sep 29, 5:26*pm, joel garry <joel-ga... (AT) home (DOT) com> wrote:
Quote:
On Sep 29, 1:18*am, Martin <martin.j.ev... (AT) gmail (DOT) com> wrote:









On Sep 28, 8:54*pm, Andreas Leitgeb <a... (AT) gamma (DOT) logic.tuwien.ac.at
wrote:

Martin <martin.j.ev... (AT) gmail (DOT) com> wrote:
I have a table entry which contains a column called recalc (value 0or
1). Sometimes code sets the recalc column to 1 indicating to a job
which runs at regular intervals it should update another table history
with expensive calculations for the entry. When the calculation is
complete (or when the rows with entry.recalc=1 are selected so weknow
which entries to perform the calculation on) the entry.recalc column
needs to be reset to 0. As the calculation is expensive I do not want
to keep the rows in the entry table locked whilst the calculation is
performed on each entry although keeping one entry locked whilst its
calculation is performed is not so bad. Additionally, other code might
want to set entry.recalc to 1 at any time.

Some idea:
let the job run something like the following pseudocode:
* *loop
* * * update ... set recalc=0 where recalc=1 * AND ROWNUM=1 * returning id
* * * commit
* * * if none found then exit from loop
* * * perform longish task for id
* *end loop

It will then find only one row, process it, then search anew
for another such row. *From your description, it seems like
the longish task would run long enough, that the cost of redoing
the select for recalc=1 from scratch each time doesn't really
matter all that much. (you do have an index on recalc, don't you?)

Interesting idea and thanks for answering but the problem is that
whilst the calculations are taking place per entry other processes
will be setting the recalc=1 in other entries so there is a
possibility my job would never finish. I could count how many recalc=1
rows there are first (and only loop at most that many times) then
follow your suggestion but I've no way of ensuring that if more rows
are set reclac=1 whilst I'm looping then I might keep missing ones at
the end. I guess I could fix that by changing recalc from an integer
to a timestamp and order by oldest first but then other processes
might reset the recalc column to a newer timestamp. To prevent that I
could change the code which sets recalc to only set it if it is null.

So, maybe there is something to look in to here.

If any one else has any ideas I'd appreciate it.

Thanks.

Martin

I think if multiple recalcs can happen on a row, you need some kind of
queuing. *Perhaps add one for each calculation, setting to zero when
the last one is done. *I hesitate to say more, else we'll be writing a
full-blown transaction processor.

jg
Nice idea, why didn't I think of that.

If I increment recalc each time it needs a recalc then do:

select entry_id, recalc from entry where recalc > 0 into a global temp
table
update entry set recalc = recalc - (select recalc from global temp
table)
loop through global temp table doing expensive processing

Thanks

Martin

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.