Batch job slows down

comp.databases.informix

#1
Ulf
Batch job slows down - 01-26-2010, 04:49 AM

Hi !

IDS 10.00.FC8 on HP-UX 11.31, EMC SAN

We have been running the same batch job (transactions updating a
number of large tables, 20 million rows) for a number of years. This
batch usually takes about 1 hour; now it takes up to 5 hours. The number
of transactions is roughly the same. The batch does a number of
selects and, based on them, a number of inserts/updates/deletes.

We have not changed the batch program or any settings in IDS.

One idea is that the problem is connected to the statistics for the
tables. We were running "dostats" once a week. When we ran into the
problem, we did a new "dostats" and the problem vanished. The theory was
that the update statistics somehow failed sometimes, so we stopped using
"dostats".

After some major changes of values for indexed rows we decided to run
dostats again (it takes about 3 hours to run). We have now hit the problem
again, with no error messages from dostats. We did a manual update
statistics (script created by Server Studio), but the problem remains.

I have done a "set explain on" for the batch program; nothing
sticks out.

Any ideas are appreciated.

Another try with "dostats" ?

Rebuild indexes ?

#2
Art Kagel
Re: Batch job slows down - 01-26-2010, 06:58 AM

Is it possible that the tables that are being accessed have become
fragmented and need to be reorganized? Why did/do you think that dostats has
anything to do with the problem? What do you mean by 'did a new "dostats" and
the problem vanished'? If a manual update stats script created by Server
Studio neither improved nor hurt performance, why not continue to use dostats
or the script?

To really diagnose this, I'd want to connect to the server and poke around.
We'll try our best by remote control, but...

Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
IIUG Board of Directors (art (AT) iiug (DOT) org)

See you at the 2010 IIUG Informix Conference
April 25-28, 2010
Overland Park (Kansas City), KS
www.iiug.org/conf

Disclaimer: Please keep in mind that my own opinions are my own opinions and
do not reflect on my employer, Advanced DataTools, the IIUG, nor any other
organization with which I am associated either explicitly, implicitly, or by
inference. Neither do those opinions reflect those of other individuals
affiliated with any entity with which I am affiliated nor those of the
entities themselves.



_______________________________________________
Informix-list mailing list
Informix-list (AT) iiug (DOT) org
http://www.iiug.org/mailman/listinfo/informix-list

#3
Ulf
Re: Batch job slows down - 01-26-2010, 09:47 AM

Thank you for your answer and the questions. I will try to answer
them:

1. The tables may need a reorg, but as a new update statistics (using
dostats) solved the problem, I don't think so

2. Dostats itself does not cause the problem. I suspect that the
update statistics done by dostats somehow fails sometimes

3. When we first hit the problem we did a new dostats and the
problem was solved, until now, two months later, when we ran dostats
again and the problem appeared again

I know that it is not dostats that causes the problem. I mentioned it
because the type of update statistics it produces is known to a number
of people

#4
Lester Knutsen
Re: Batch job slows down - 01-26-2010, 10:37 AM

Hi,

Another shot in the dark... Maybe dostats failing and the batch job taking longer are both caused by the same underlying problem. Have you checked I/O stats and waits on the SAN? Could there be a slowly degrading disk problem or disk contention?

Regards - Lester



--
______________________________________________________________________
Lester Knutsen lester (AT) advancedatatools (DOT) com
Advanced DataTools Corporation Voice: 703-256-0267 x102
Visit our Web page: http://www.advancedatatools.com
______________________________________________________________________

#5
Ulf
Re: Batch job slows down - 01-26-2010, 11:05 AM

We are looking at the disk system and SAN. We have tested it
using 'dd' and it seems OK. EMC will have a look as well.

#6
Art Kagel
Re: Batch job slows down - 01-26-2010, 04:33 PM

See my notes below:

Art


On Tue, Jan 26, 2010 at 9:47 AM, Ulf <ulf.akerberg (AT) gmail (DOT) com> wrote:

> Thank you for your answer and the questions. I will try to answer
> them:
>
> 1. The tables may need a reorg, but as a new update statistics (using
> dostats) solved the problem, I don't think so

OK, so by 'new update statistics' and 'new dostats' you mean rerunning the
update statistics commands or rerunning dostats fixed the problem. How
often do you normally run dostats/update statistics?

> 2. Dostats itself does not cause the problem. I suspect that the
> update statistics done by dostats somehow fails sometimes

If the dostats run was failing in any way it would be noted in the output.
If you are capturing stderr you would see any error messages. Dostats
checks for, traps, and reports any errors it encounters, whether while
gathering intelligence or actually running the update statistics commands.

> 3. When we first hit the problem we did a new dostats and the
> problem was solved, until now, two months later, when we ran dostats
> again and the problem appeared again

If the problem recurs, run dbschema -d <database> -hd all and redirect it
to a file. Then run dostats (or another equivalent update stats run) and,
if all is well, run dbschema again as above to another file and compare
the two files. That may tell you what's been happening. If you can't
figure it out from there yourself, feel free to send me the two files (and
indicate which tables specifically are having problems) or call IBM, open a
PMR, and have the case engineer attach the two output files to the case to
see what IBM thinks.
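
A minimal sketch of that before/after comparison, offered as an illustration rather than anything posted in the thread. It assumes the two dbschema outputs have already been saved as plain-text files; the function names and the default file labels are hypothetical. Lines that record when a distribution was built will likely differ on every run, so the interesting differences are the ones in the distribution values themselves.

```python
import difflib
from pathlib import Path


def diff_stats_dumps(before_text, after_text,
                     before_name="stats_before.out",
                     after_name="stats_after.out"):
    """Return the unified-diff lines between two saved dbschema dumps.

    An empty list means the two dumps are identical.
    """
    return list(difflib.unified_diff(
        before_text.splitlines(),
        after_text.splitlines(),
        fromfile=before_name,
        tofile=after_name,
        lineterm="",
    ))


def diff_stats_files(before_path, after_path):
    """Read two dump files (paths chosen by the caller) and diff them."""
    before_text = Path(before_path).read_text()
    after_text = Path(after_path).read_text()
    return diff_stats_dumps(before_text, after_text, before_path, after_path)
```

Calling `print("\n".join(diff_stats_files("stats_before.out", "stats_after.out")))` would print only the changed lines with a little context, which is easier to scan than two multi-megabyte dumps side by side.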


> I know that it is not dostats that causes the problem, I mentioned it
> because the type of update statistics it produces is known to a number
> of people

No harm no foul! Just trying to understand.

#7
Fernando Nunes
Re: Batch job slows down - 01-26-2010, 05:49 PM

A blind guess. You talked about major changes in indexed values. Check
your btscanner activity and configuration. Maybe your indexes are giving
the index cleaners too much work. 10.00.FC8 is, to the best of my
knowledge, a good version. Previous versions (FC6 and FC7 if I recall
correctly) had serious issues.

Besides that, take a look at any kind of lock-wait issues... Maybe
other jobs are causing trouble for that one?

Regards.

#8
Ian Michael Gumby
RE: Batch job slows down - 01-27-2010, 07:33 AM

As Art says, it would be nice to be able to poke around, so take what you read with a grain of salt.

OTC is right. Take a look at your batch jobs and see if the indexes match the queries.

I think it was Lester who said to take a look at your SAN.

Disk failures don't just happen. There are usually signs that a disk is going to fail before it does. (Although I've seen some drives go poof without any signs.)
If you've been running this on a SAN for as long as you say, then unless you've been replacing drives you could have some drive issues. Because the SAN uses RAID, you may have a problem and not notice it.

Having said all of that, here are a couple of ideas...

First, consider migrating the data to a different part of the SAN. I mean create a new table space and a new index space on new disks and then rebuild the table.

Second, drop and rebuild the indexes. If they are not detached, detach them and rebuild them in a different table space.

If the number of rows in the table hasn't really changed over the years, meaning you have roughly 20 million rows that are being updated or replaced, what do you expect to happen when you run update statistics? I mean, sure, it doesn't hurt, but your stats aren't really changing.

I would agree with Lester that it could be the SAN and you have disk issues.
It could also be a hardware issue.

I'd say dropping and rebuilding your indexes would be a good place to start.

But hey! What do I know? My head is in the clouds. :-P

-G



#9
Ulf
Re: Batch job slows down - 01-28-2010, 03:40 AM

Thanks for your input !

Before we first ran into the problem we ran dostats once a week.

There are no error messages from dostats. What I meant by "it fails
somehow" is that some internal error in the server might
sometimes cause incorrect statistics.

We ran a new dostats last night; the problem remains.


There are no other jobs running at the same time.

Regarding the other suggestions, we are now doing a number of things:

- a case has been opened with EMC to analyze the SAN
- we will set up a copy of the database on another server and try the
things that you have suggested (reorg of tables, fragmenting and
others); this will also test the idea of creating the tables and
indexes on other disks
- we will also try to analyze each step and SQL statement in the
program to find which one uses the most time. The cost according to SQL
EXPLAIN is 10 for the statement with the highest cost and 1 for most of
the others, so we need to time them instead
- we will also upgrade the version in the new test environment; there
might be some fixes in later versions that help
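
The per-statement timing mentioned in the list above could look roughly like the sketch below. It is only an illustration: the thread does not say what language the batch program is written in, and `StatementTimer`, `measure`, and the statement labels are hypothetical names. The idea is to wrap each statement execution in a timed block and accumulate wall-clock time per label, so the slowest statement stands out even when the optimizer cost estimates are all small.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StatementTimer:
    """Accumulate wall-clock time and call counts per statement label."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the class can be tested deterministically.
        self._clock = clock
        self.total = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def measure(self, label):
        """Time the enclosed block and add the result to `label`."""
        start = self._clock()
        try:
            yield
        finally:
            # Record the time even if the statement raised an error.
            self.total[label] += self._clock() - start
            self.calls[label] += 1

    def report(self):
        """Return (label, total_seconds, calls) tuples, slowest first."""
        rows = [(label, self.total[label], self.calls[label])
                for label in self.total]
        return sorted(rows, key=lambda row: row[1], reverse=True)
```

In use, each execution would be wrapped as `with timer.measure("update_account"): ...` (the label is made up), and `timer.report()` printed at the end of the batch run.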


One idea that I have is that update statistics does indeed do what it is
supposed to, but as the distribution changed the optimizer chose a bad
access path in some cases. We were then lucky when we did a new
update statistics: the distribution at that time led to a faster
access path. Now we are constantly getting a bad access path because of
how the distribution looks right now.

I will get back as we proceed

#10
Ulf
Re: Batch job slows down - 02-03-2010, 02:57 AM

Some new information:

- we turned off the btscanner and it made no difference, so the
btscanner is not the problem

- a reorg of the table that all transactions are loaded into and then
deleted from lowered the time per transaction from 0.3 seconds to 0.15
seconds. (The average time before the problems started was 0.03
seconds.) Inspired by this, we reorganized a number of other tables, but
this has not reduced the transaction time further.

- EMC has checked and found that we do have a problem with the load on
the SAN. We will now schedule the batch so that we know
there is no other activity and see if the SAN can account for
the rest of the problem

- if this does not explain the problem we will look for other
solutions in a test environment
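
For scale, the per-transaction figures above can be put side by side. This is only an illustrative sketch with made-up function names, not something from the thread: relative to the 0.03 second baseline, 0.3 seconds is a tenfold slowdown and 0.15 seconds a fivefold one, so the first reorg removed a bit more than half of the excess time per transaction.

```python
def slowdown_factor(current_s, baseline_s):
    """How many times slower the current per-transaction time is."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return current_s / baseline_s


def share_of_excess_removed(before_s, after_s, baseline_s):
    """Fraction of the time above baseline that a change eliminated."""
    excess = before_s - baseline_s
    if excess <= 0:
        raise ValueError("no excess over baseline to remove")
    return (before_s - after_s) / excess


if __name__ == "__main__":
    baseline, before, after = 0.03, 0.3, 0.15  # seconds, from the post
    print(round(slowdown_factor(before, baseline), 2))
    print(round(slowdown_factor(after, baseline), 2))
    print(round(share_of_excess_removed(before, after, baseline), 2))
```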
