Re: [BUGS] BUG #1015: Got a signal 11 while trying to create a temp table

Tom Lane
12-18-2003, 09:45 PM

"PostgreSQL Bugs List" <pgsql-bugs (AT) postgresql (DOT) org> writes:
> I tried to create a temp table and got my back-end restarting because of a signal 11.

Hmm. Can you get a debugger backtrace from the core dump?

> It seems to me, and please correct me if I’m wrong, that there is a limit to the size that a join can handle.

No (and certainly not on a measly 3-million-row case). This could be a
data corruption problem, or something more subtle, but it's not a join size limit.

One way of testing the data-corruption theory is to see if you can
select all the data from the tables involved, without any join.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
message can get through to the mailing list cleanly


Tom Lane
12-20-2003, 11:31 AM

"aarjan langereis" <a.j.langereis (AT) inter (DOT) nl.net> writes:
> How do I get a "debugger backtrace" ?

Find the "core" file left by the crashed backend --- it should be in
$PGDATA/base/yourdbnumber/core and have a file date equal to the time
of the crash. If you don't find one, it's likely that the postmaster
was started under "ulimit -c 0" which prevents core dumps. Restart
the postmaster under "ulimit -c unlimited" and reproduce the crash.

Once you have the core file, do
$ gdb /path/to/postgres-executable /path/to/core-file
gdb> bt
gdb> quit

If bt just produces a list of numbers without any names, it's not going
to be helpful. In that case you need to rebuild Postgres with debugging
symbols and start over.
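
As a rough sketch of the steps above, the shell functions below look for core files under the data directory and ask gdb for a backtrace of each one. The function names `find_cores` and `backtrace_cores` are made up for this example, the core file naming (`core` or `core.<pid>`) is an assumption that depends on kernel settings, and the gdb call assumes a gdb recent enough to accept `-batch` and `-ex`.

```shell
# Hypothetical helper: list core files below <datadir>/base.
find_cores() {
    # $1: data directory (for example "$PGDATA").
    # Matching "core" and "core.<pid>" is an assumption about the kernel's naming.
    find "$1/base" -type f \( -name core -o -name 'core.*' \) 2>/dev/null
}

# Hypothetical helper: print a backtrace for every core file found.
backtrace_cores() {
    # $1: path to the postgres executable, $2: data directory
    find_cores "$2" | while IFS= read -r corefile; do
        echo "== $corefile"
        gdb -batch -ex bt "$1" "$corefile"
    done
}

# Usage (paths are placeholders):
#   backtrace_cores /path/to/postgres-executable "$PGDATA"
```

If the output shows only addresses without function names, the advice above still applies: the executable needs to be rebuilt with debugging symbols.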

There is more info in the archives.


> Selecting all data from the tables involved, does that also include a 'count(*)', if so, they work:

Mmm, that really only proves that the page headers and tuple headers are
OK, not that there is not data corruption within some row, because
COUNT(*) won't try to extract any field values from any rows. I'd
suggest a SELECT * or COPY TO FILE operation to check whether there is
any data corruption.
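
One way to follow this suggestion is to dump each table with COPY and throw the output away, which makes the backend read every field of every row. This is a minimal sketch: the database name `mydb` and the helper name `copy_check_commands` are placeholders, and the table names are the ones that come up later in this thread.

```shell
# Print one psql invocation per table; each reads every column of every row.
# "mydb" and the function name are placeholders, not taken from the thread.
copy_check_commands() {
    db=$1
    shift
    for table in "$@"; do
        printf 'psql -d %s -c "COPY %s TO STDOUT" > /dev/null\n' "$db" "$table"
    done
}

# Review the generated commands first, then pipe them to sh to run them:
#   copy_check_commands mydb blocks hosts cpus | sh
```

If a backend crashes while one of these runs, that table would be the first place to look for a damaged row.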

regards, tom lane

aarjan langereis
12-21-2003, 10:16 AM

I'm using a RedHat machine. In my /etc/init.d/postgresql, this is the line
that starts up the postmaster:

su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl -D $PGDATA -p
/usr/bin/postmaster -o '-p ${PGPORT}' start > /var/log/pgsql.log 2>&1" <
/dev/null

I don't see "ulimit -c 0" anywhere in the script. Where do I have to
put the "ulimit -c unlimited"?
Because it is not dumping any core now...

In my first mail I spoke about two queries that were nearly the same: both
use the 'blocks' table (3.2M records) and join it with another one.
One join was with the 'cpus' table (17 records); this one worked perfectly.
The other join was with the 'hosts' table (205 records); this is the problem query.
From that and a full look at my 'hosts' table I conclude that the data
in the 'blocks' table is not corrupt.

OK, I did some other tests too:

select * from blocks;
  gave me the whole table (I didn't look at all records, but got a result in psql)
select hostid, sum(amount) from blocks group by hostid;
  crashed (3.2M records used)
select hostid, sum(amount) from blocks where blockdate::date between
'2003-01-01' and '2003-02-01' group by hostid;
  worked (also for all other months!!) (200K-450K records used)
select hostid, sum(amount) from blocks where blockdate::date between
'2003-01-01' and '2003-07-01' group by hostid;
  crashed (1390618 records used)
select hostid, sum(amount) from blocks where blockdate::date between
'2003-02-01' and '2003-07-01' group by hostid;
  worked (1202952 records used)

To me it seems to be the size of it all...

Yours,

Aarjan

Tom Lane
12-21-2003, 11:44 AM

"aarjan langereis" <a.j.langereis (AT) chello (DOT) nl> writes:
> I don't see that "ulimit -c 0".. nowhere in the script. Where do I have to
> put the "ulimit -c unlimited"?

I'd suggest adding it to the pg_ctl script.
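
A sketch of what that could look like: the limit is raised near the top of the script, before the postmaster is launched, so that the backends inherit it. Where exactly the line belongs in a given pg_ctl script is an assumption, and `core_dumps_enabled` is a made-up name for a quick sanity check.

```shell
# Near the top of the pg_ctl script (exact placement is an assumption):
#   ulimit -c unlimited

# Hypothetical sanity check: succeeds when the soft core-size limit is not 0.
core_dumps_enabled() {
    [ "$(ulimit -S -c)" != "0" ]
}

if core_dumps_enabled; then
    echo "core dumps enabled (limit: $(ulimit -S -c))"
else
    echo "core dumps disabled; try: ulimit -c unlimited"
fi
```

The check only reports the limit of the shell it runs in, so it is most useful when placed in the same script that starts the postmaster.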

> Select * from blocks; gave me the whole table (I didn't look at all records,
> but got a result in psql)
> select hostid, sum(amount) from blocks group by hostid; crashed (3.2M
> records used)

Hm. What do you have sort_mem set to? Does the EXPLAIN output show
different plans for these queries?
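
To gather what is being asked for here, the setting and the plans can be collected in one psql session. This is a minimal sketch: the database name `mydb` and the function name `plan_check_sql` are placeholders, and the two queries are the crashing one and one of the working ones reported earlier in the thread.

```shell
# Emit the SQL that answers both questions; pipe it into psql to run it.
# Note: sort_mem is the setting's name in this era; from 8.0 on it is work_mem.
plan_check_sql() {
    cat <<'EOF'
SHOW sort_mem;
EXPLAIN SELECT hostid, sum(amount) FROM blocks GROUP BY hostid;
EXPLAIN SELECT hostid, sum(amount) FROM blocks
  WHERE blockdate::date BETWEEN '2003-01-01' AND '2003-02-01'
  GROUP BY hostid;
EOF
}

# Usage (database name is hypothetical):
#   plan_check_sql | psql -d mydb
```

EXPLAIN only plans the query without running it, so this should be safe to run even for the query that crashes the backend.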

regards, tom lane

Tom Lane
12-24-2003, 09:19 AM

"aarjan langereis" <a.j.langereis (AT) chello (DOT) nl> writes:
> I reproduced the crash and got a good core-file (with the symbols, see
> below). The only thing is: it's 390Mb .. even after gzip it's big: 250Mb.
> Would that be a problem for you? (I have the bandwidth if you want)

No point in sending it to me, it's of no value on a different machine
(because the executable files are usually not identical).

> (gdb) bt
> #0 0x0819b684 in LogicalTapeWrite ()
> #1 0x0819da13 in ApplySortFunction ()
> #2 0x0819cf10 in tuplesort_getdatum ()
> #3 0x080ea656 in ExecSort ()

Hm. I'm afraid gdb is lying to you, because this stack trace is
impossible -- those functions don't call each other. I have seen
that happen when the compiler optimization level is too high; gdb's
stack tracing code gets confused. To get any useful information,
you'll need to rebuild with debug symbols enabled (--enable-debug
switch to configure).

regards, tom lane

aarjan langereis
12-25-2003, 01:07 PM

I got gdb from the RPMs for Red Hat 9. Would that RPM be wrong then?

Or do you mean that I have to recompile the database?

Aarjan

Ps. Merry Christmas!

Tom Lane
12-25-2003, 07:12 PM

"aarjan langereis" <a.j.langereis (AT) chello (DOT) nl> writes:
> I've got gdb from the RPM's from redhat 9. Would that RPM be wrong then?
> Or do you mean that I have to recompile the database?

You need to recompile Postgres. If you are using an RPM distribution of
Postgres, I think you can get the source RPM (SRPM) and rebuild it with
a debug option, but I don't know the details of that process. If you
built Postgres yourself, then it's just a matter of adding
--enable-debug to your configure switches and rebuilding.
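
For a self-built installation, one quick way to see whether the current build already has the switch is to look at the configure options it recorded. This is a sketch that assumes `pg_config` from the same installation is on the PATH; the helper name `has_enable_debug` is invented for this example.

```shell
# Succeeds when the given configure option string contains --enable-debug.
has_enable_debug() {
    case "$1" in
        *--enable-debug*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   has_enable_debug "$(pg_config --configure)" || echo "rebuild needed"
#
# Rebuilding from a source tree, keeping your usual configure switches:
#   ./configure --enable-debug
#   make
#   make install
```

The SRPM route mentioned above is not covered here, since its details are not given in this thread.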

regards, tom lane
