dbTalk Databases Forums  

Statistics during a complex load

comp.databases comp.databases


Discuss Statistics during a complex load in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Tom Anderson
 
Posts: n/a

Default Statistics during a complex load - 02-16-2011 , 01:24 PM






Hello,

We have an application that stores all its information in a database
(Oracle 11, as it happens). Between the application and the database sits
a third-party object-relational mapper. And not just any sort of
object-relational mapper, but one that implements a versioned object model
- think of it as source control for objects. A fully operational instance
of the application has about 300 000 objects in the database (according to
one way of counting them, at least), corresponding to a somewhat larger
number of rows (less than a million, i'd guess), spread over a few hundred
tables.

We like building fresh instances from scratch. We do it as part of our
build process, along with compiling the code and so on, to make sure that
we can always build a working system from the raw materials in CVS. This
process involves clearing out everything in the schema (dropping every
object and then purging the recyclebin), running all our DDL, and then
loading the 300 000 objects through the versioned mapper.

This process is not as fast as we'd like it to be - the DDL is fast, but
the data loading takes something like 45 minutes.

Let's say, for the sake of argument if nothing else, that we cannot
abandon building from scratch, or avoid the versioned mapper, or do
anything about the speed or behaviour of the mapper itself.

What could we try to make the load faster?

I don't have detailed statistics to hand, but one thing we've noticed is
that there is a very high ratio of selects to upserts during the load;
there are numerous (>10) queries made concerning each object before it is
inserted - i think the mapper is checking to see if there is an existing
version of the object, then inserting the loaded version, performing some
sort of merge or check-in operation, etc.

We're in the process of tuning our indexes to make them more useful during
the load, but a major concern is that the query planner is not making use
of them - we quite often see that the most time-consuming queries are ones
which should be able to make good use of an index, but are being planned
as table scans. Our theory is that this is because the statistics for the
tables have not been updated since they were created, and so it looks like
they're empty, even as they grow to some vast size. Does that seem
plausible? If so, what can we do about it? We could do a bit of bodging to
run dbms_stats.gather_table_stats periodically during the load; would that
be sensible? Is there some way we can arrange for statistics updates to be
triggered automatically as the tables grow? A colleague swears that
there's some way to tell the planner not to trust the statistics, so it
will make plans using indexes even when it doesn't know that's a good
idea; can anyone shed any light on that?

Thanks,
tom

--
Every moloch that tends the great Machine down here in the darkness of the
Lower Shafts has a number. If a moloch is destroyed or decommissioned,
his number is given to another who is sent down from Above to take his
empty place. This is the normal procedure.

Reply With Quote
  #2  
Old   
Robert Klemme
 
Posts: n/a

Default Re: Statistics during a complex load - 02-18-2011 , 02:07 PM






On 16.02.2011 20:24, Tom Anderson wrote:

Quote:
We have an application that stores all its information in a database
(Oracle 11, as it happens). Between the application and the database
sits a third-party object-relational mapper. And not just any sort of
object-relational mapper, but one that implements a versioned object
model - think of it as source control for objects. A fully operational
instance of the application has about 300 000 objects in the database
(according to one way of counting them, at least), corresponding to a
somewhat larger number of rows (less than a million, i'd guess), spread
over a few hundred tables.

We like building fresh instances from scratch. We do it as part of our
build process, along with compiling the code and so on, to make sure
that we can always build a working system from the raw materials in CVS.
This process involves clearing out everything in the schema (dropping
every object and then purging the recyclebin), running all our DDL, and
then loading the 300 000 objects through the versioned mapper.

This process is not as fast as we'd like it to be - the DDL is fast, but
the data loading takes something like 45 minutes.

Let's say, for the sake of argument if nothing else, that we cannot
abandon building from scratch, or avoid the versioned mapper, or do
anything about the speed or behaviour of the mapper itself.
Well, OR-Mappers are typically not very fast since they tend to handle
every single instance individually unless there is an explicit batch mode.

Quote:
What could we try to make the load faster?
If your OR-Mapper has a means of verifying a database's data it may be
quicker to data pump the data into the database and then check but I
guess this is not what you want.

Another variant may be to split up your dataset into independent parts
and do multiple concurrent loads.

Quote:
I don't have detailed statistics to hand, but one thing we've noticed is
that there is a very high ratio of selects to upserts during the load;
there are numerous (>10) queries made concerning each object before it
is inserted - i think the mapper is checking to see if there is an
existing version of the object, then inserting the loaded version,
performing some sort of merge or check-in operation, etc.

We're in the process of tuning our indexes to make them more useful
during the load, but a major concern is that the query planner is not
making use of them - we quite often see that the most time-consuming
queries are ones which should be able to make good use of an index, but
are being planned as table scans. Our theory is that this is because the
statistics for the tables have not been updated since they were created,
and so it looks like they're empty, even as they grow to some vast size.
Does that seem plausible?
Yes.

Quote:
If so, what can we do about it? We could do a
bit of bodging to run dbms_stats.gather_table_stats periodically during
the load; would that be sensible?
I'd rather take stats after the load, save them somewhere and load them
back after you have recreated your schema. Another option might be to
use 11g's new features for plan stability but I can't help you there.
There are also some pre 11g features like "stored outlines" which may help.

But maybe it's first reasonable to do a trace in order to find out why
the load is slow. If you determine that there are some slow queries
which use bad plans you need to tune them. If you find out it's the
massive amount of SQL statements fired against the DB the you need to
tune that (e.g. faster network, more CPU power on the server...).

Quote:
Is there some way we can arrange for
statistics updates to be triggered automatically as the tables grow? A
colleague swears that there's some way to tell the planner not to trust
the statistics, so it will make plans using indexes even when it doesn't
know that's a good idea; can anyone shed any light on that?
I believe he means you should switch to the RBO (rule based optimizer) -
which is ancient. IIRC you can enable it per session.

Docs are on

http://tahiti.oracle.com/

Kind regards

robert

F'up to comp.databases.oracle.server

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Reply With Quote
  #3  
Old   
Jasen Betts
 
Posts: n/a

Default Re: Statistics during a complex load - 02-19-2011 , 02:32 AM



On 2011-02-16, Tom Anderson <twic (AT) urchin (DOT) earth.li> wrote:

Quote:
What could we try to make the load faster?

I don't have detailed statistics to hand, but one thing we've noticed is
that there is a very high ratio of selects to upserts during the load;
If the OS also shows more physical disk reads than writes, add RAM
if it doesn't those reads aren't costing much.

[planner woes]
Quote:
Our theory is that this is because the statistics for the
tables have not been updated since they were created, and so it looks like
they're empty, even as they grow to some vast size. Does that seem
plausible?
sounds plausible, but I don't know Oracle.

Quote:
A colleague swears that there's some way to tell the planner not to
trust the statistics, so it will make plans using indexes even when
it doesn't know that's a good idea; can anyone shed any light on that?
Seeing as you can't rewrite the queries you can't use planner hints.
but there may be a anaylsis or planner setting you can change.

--
⚂⚃ 100% natural

Reply With Quote
  #4  
Old   
Tom Anderson
 
Posts: n/a

Default Re: Statistics during a complex load - 02-20-2011 , 07:44 AM



[fu'd to comp.databases.oracle.server]

On Sat, 19 Feb 2011, Jasen Betts wrote:

Quote:
On 2011-02-16, Tom Anderson <twic (AT) urchin (DOT) earth.li> wrote:

What could we try to make the load faster?

I don't have detailed statistics to hand, but one thing we've noticed is
that there is a very high ratio of selects to upserts during the load;

If the OS also shows more physical disk reads than writes, add RAM
if it doesn't those reads aren't costing much.
Good point. The numbers we've been looking at are mostly from the 'most
expensive queries' report in the management tool. I don't think we've made
a global survey of reads vs writes. We should do that.

tom

--
Tech - No Babble

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.