dbTalk Databases Forums  

[BUGS] BUG #1541: Unusually long INSERT times after fresh clean/CREATE TABLES

mailing.database.pgsql-bugs mailing.database.pgsql-bugs


Discuss [BUGS] BUG #1541: Unusually long INSERT times after fresh clean/CREATE TABLES in the mailing.database.pgsql-bugs forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
John Engelhart
 
Posts: n/a

Default [BUGS] BUG #1541: Unusually long INSERT times after fresh clean/CREATE TABLES - 03-13-2005 , 08:03 PM







The following bug has been logged online:

Bug reference: 1541
Logged by: John Engelhart
Email address: johne (AT) zang (DOT) com
PostgreSQL version: 8.0.1
Operating system: FreeBSD 5.3-p5
Description: Unusually long INSERT times after fresh clean/CREATE
TABLES
Details:

Recently upgraded from 7.4.7 to 8.0.1 and have discovered an oddity.

Since I'm developing an SQL based application, I routinely "start from
scratch" with a script that deletes all the tables in my database and
rebuilds them. A problem started when I upgraded from 7.4.7 to 8.0.1 in
that the first run after the clean takes an UNUSUALLY long time to complete,
on the order of a few hundred inserts/sec. A normal run takes ~30 seconds.
A from scratch run takes ~15 minutes, with the next run completing in 30
seconds with zero changes.

I believe I've eliminated any host or build anomolies due to a simple fact:
Starting fresh, running the data inserter, and then aborting a few seconds
later... then re-running the data-inserter runs at normal speed. It's as if
it need a few records in the DB in order to work at normal speeds. It
almost seems as if it's doing a worst case INSERT update if there are no
primary keys when the BEGIN transaction starts?

I have also dropped all CREATE INDEX statements from my rebuild script and
it behaves exactly the same. The only other 'unusual' thing remaining would
be the various SERIAL NOT NULL UNIQUE PRIMARY KEY statements.

The specific tables and data are quite lengthy, and not appropriate for
pasting to fully recreate the problem. I can send a tarball if requested.

As an example of the times involved (same average times, always
repeatable):

[johne@new] simp_sql> time simp_sql <test8.cpp
INFO: Opening database connection @136/initialize/main_sql.c
INFO: Begining parse @47/main/main_sql.c
INFO: GC- After parse: Heap 348Kb Free 144Kb @49/main/main_sql.c
INFO: Performing house keeping and index rebuild @58/main/main_sql.c
INFO: Marking all blocks @61/main/main_sql.c
INFO: Performing house keeping and index rebuild @76/main/main_sql.c
INFO: GC- Before exiting: Heap 348Kb Free 320Kb @79/main/main_sql.c
INFO: Closing database connection @95/cleanup/main_sql.c
3.861u 3.138s 17:21.25 0.6% 83+4732k 0+0io 0pf+0w
[johne@new] simp_sql>

[IMMEDIATLY after, no accesses to the database anywhere inbetween]

[johne@new] simp_sql> time simp_sql < test8.cpp
INFO: Opening database connection @136/initialize/main_sql.c
INFO: Begining parse @47/main/main_sql.c
INFO: GC- After parse: Heap 348Kb Free 24Kb @49/main/main_sql.c
INFO: Performing house keeping and index rebuild @58/main/main_sql.c
INFO: Marking all blocks @61/main/main_sql.c
INFO: Performing house keeping and index rebuild @76/main/main_sql.c
INFO: GC- Before exiting: Heap 348Kb Free 324Kb @79/main/main_sql.c
INFO: Closing database connection @95/cleanup/main_sql.c
2.990u 2.200s 0:40.00 12.9% 82+4398k 0+0io 0pf+0w
[johne@new] simp_sql>

FYI, my inserter uses the boehm garbage collection library, so the heap/free
numbers are non deterministic between runs. This should not be a
contributing issue at all. The "performing house cleaning" is a "vacuum
full analyze" command since it's just inserted a large amount data both
times.

All the INSERTS are wrapped in a BEGIN/COMMIT pair (per run, not per
statement).

Now a clean refresh, run the inserter with a ^C break, and immediatly re-run
it:

[johne@new] simp_sql> reload [drops the database, creates it, then recreates
the tables/functions]
[johne@new] simp_sql> time simp_sql < test8.cpp
INFO: Opening database connection @136/initialize/main_sql.c
INFO: Begining parse @47/main/main_sql.c
^C0.094u 0.075s 0:01.04 15.3% 36+361k 0+0io 0pf+0w
[johne@new] simp_sql> time simp_sql < test8.cpp
INFO: Opening database connection @136/initialize/main_sql.c
INFO: Begining parse @47/main/main_sql.c
INFO: GC- After parse: Heap 348Kb Free 136Kb @49/main/main_sql.c
INFO: Performing house keeping and index rebuild @58/main/main_sql.c
INFO: Marking all blocks @61/main/main_sql.c
INFO: Performing house keeping and index rebuild @76/main/main_sql.c
INFO: GC- Before exiting: Heap 348Kb Free 324Kb @79/main/main_sql.c
INFO: Closing database connection @95/cleanup/main_sql.c
2.943u 2.218s 0:33.20 15.5% 90+5051k 0+0io 0pf+0w
[johne@new] simp_sql>

Hopefully this is enough info to track down and recreate the problem
independently. If not, I can send you the program, db dumps, etc to help
you recreate it, it's a bit to big to include here.

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Reply With Quote
  #2  
Old   
Tom Lane
 
Posts: n/a

Default Re: [BUGS] BUG #1541: Unusually long INSERT times after fresh clean/CREATE TABLES - 03-13-2005 , 11:15 PM






"John Engelhart" <johne (AT) zang (DOT) com> writes:
Quote:
Since I'm developing an SQL based application, I routinely "start from
scratch" with a script that deletes all the tables in my database and
rebuilds them. A problem started when I upgraded from 7.4.7 to 8.0.1 in
that the first run after the clean takes an UNUSUALLY long time to complete,
on the order of a few hundred inserts/sec. A normal run takes ~30 seconds.
A from scratch run takes ~15 minutes, with the next run completing in 30
seconds with zero changes.
The 8.0 planner is intentionally sensitive to the current actual
physical sizes of tables. It sounds like you've managed to get it to
plan something on the assumption that the tables are tiny and keep
using that plan after they aren't tiny any more. The old planner had
the same kind of issue but it was far easier to hit, so "revert that
change" isn't an answer that I'm particularly interested in.

Quote:
Hopefully this is enough info to track down and recreate the problem
Not really.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
message can get through to the mailing list cleanly


Reply With Quote
  #3  
Old   
John Engelhart
 
Posts: n/a

Default Re: [BUGS] BUG #1541: Unusually long INSERT times after fresh clean/CREATE TABLES - 03-17-2005 , 11:22 PM




On Mar 14, 2005, at 12:13 AM, Tom Lane wrote:

Quote:
"John Engelhart" <johne (AT) zang (DOT) com> writes:
Since I'm developing an SQL based application, I routinely "start from
scratch" with a script that deletes all the tables in my database and
rebuilds them. A problem started when I upgraded from 7.4.7 to 8.0.1
in
that the first run after the clean takes an UNUSUALLY long time to
complete,
on the order of a few hundred inserts/sec. A normal run takes ~30
seconds.
A from scratch run takes ~15 minutes, with the next run completing in
30
seconds with zero changes.

The 8.0 planner is intentionally sensitive to the current actual
physical sizes of tables. It sounds like you've managed to get it to
plan something on the assumption that the tables are tiny and keep
using that plan after they aren't tiny any more. The old planner had
the same kind of issue but it was far easier to hit, so "revert that
change" isn't an answer that I'm particularly interested in.
I found another data point yesterday. It seems to be session related.
I tried various combinations of "after X number of statements, COMMIT,
ANALYZE, CHECKPOINT" and none of them helped. The one that did help is
after X number of statements, close the database handle and re-open it.
So, my program run does about 60K SQL statements, inserting about 22K
records. After 500 statements, I close the handle and re-open it.
Completes in 30 seconds.

Quote:
Hopefully this is enough info to track down and recreate the problem

Not really.
Let me know if there's anything that you need.

Quote:
regards, tom lane


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
message can get through to the mailing list cleanly


Reply With Quote
  #4  
Old   
Andrew - Supernews
 
Posts: n/a

Default Re: [BUGS] BUG #1541: Unusually long INSERT times after fresh clean/CREATE TABLES - 03-18-2005 , 05:56 PM



On 2005-03-14, Tom Lane <tgl (AT) sss (DOT) pgh.pa.us> wrote:
Quote:
The 8.0 planner is intentionally sensitive to the current actual
physical sizes of tables. It sounds like you've managed to get it to
plan something on the assumption that the tables are tiny and keep
using that plan after they aren't tiny any more. The old planner had
the same kind of issue but it was far easier to hit, so "revert that
change" isn't an answer that I'm particularly interested in.
On the irc channel we were just investigating a very similar case to this
(which will show up as a bug report here in due course, though not from me).

It turns out that the scenario above is trivial to hit in 8.0 using
referential constraints; RI triggers cache their plans, and on 8.0 the RI
query is planned as a seqscan if the tables are freshly created. (On 7.4
the plan is an index scan, thanks to the default 1000 rows / 10 pages stats.)

What this basically means is that you can't do even a modest size insert
of new data into a fresh schema without having to either trick the planner
(e.g. enable_seqscan=false before the first insert) or defer the addition of
constraints until after the data load (which can be quite inconvenient in
some cases).

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings


Reply With Quote
  #5  
Old   
Tom Lane
 
Posts: n/a

Default Re: [BUGS] BUG #1541: Unusually long INSERT times after fresh clean/CREATE TABLES - 03-18-2005 , 10:49 PM



Andrew - Supernews <andrew+nonews (AT) supernews (DOT) com> writes:
Quote:
It turns out that the scenario above is trivial to hit in 8.0 using
referential constraints; RI triggers cache their plans, and on 8.0 the RI
query is planned as a seqscan if the tables are freshly created. (On 7.4
the plan is an index scan, thanks to the default 1000 rows / 10 pages stats.)
Hm. One thing we could do is to throw in some default values when we
see the table has exactly zero pages --- perhaps ye olde traditional
1000/10, or possibly something else, but anyway not exactly 0/0.

The reason I thought we didn't need to do this sort of hack anymore
is that pg_dump loads the tables first and then creates the RI
constraints. What exactly is the common case where the wrong thing
happens?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.