dbTalk Databases Forums  

Re: [BUGS] BUG #2033: Assertion Failure: File: "procarray.c",

mailing.database.pgsql-bugs mailing.database.pgsql-bugs


Discuss Re: [BUGS] BUG #2033: Assertion Failure: File: "procarray.c", in the mailing.database.pgsql-bugs forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Joel Stevenson
 
Posts: n/a

Default Re: [BUGS] BUG #2033: Assertion Failure: File: "procarray.c", - 11-09-2005 , 10:36 AM






Quote:
"Joel Stevenson" <joelstevenson (AT) mac (DOT) com> writes:
I'm running into an Assertion failure this morning w/8.1.0. I believe it is
related to using the NOWAIT flag. Here is the log message:

TRAP: FailedAssertion("!(serializable ? !((MyProc->xmin) != ((TransactionId)
0))
: ((MyProc->xmin) != ((TransactionId) 0)))", File: "procarray.c", Line:
492)

With no stack trace, and no information about how to reproduce the
error, I'm afraid this report is just about useless :-(
gdb's 'bt' on one of the core files produces:

#0 0x00138eff in ?? ()
#1 0x0017ec8d in ?? ()
#2 0x00246cd8 in ?? ()
#3 0x00000000 in ?? ()

the backtrace in various other of the 17 core files looks exactly the same.

I am not a C programmer but am happy to try to dig up any information
I can from the core files, but will need instruction from someone
wiser in the ways of gdb than I.

My second email provided more detail about what PG is being used for
and what was happening when the assertion failure occurred. A
further description:

* There is one process which is inserting work units into a work_queue table.
The rate of insertion varies but can become very high.

* There are between zero and 100 other processes which select from the
work_queue table using SELECT ... LIMIT 1 FOR UPDATE NOWAIT. Because of the
where clause conditions and the LIMIT 1, these processes can and do contend
for the same rows in the table. When a row is retrieved it is immediately
updated as 'claimed'.

* Without the NOWAIT flag, the contending processes operate as
expected: either
retrieving a row or waiting on a lock briefly and not retrieving a row
because the first lock resulted in an update causing the match to become
invalid.

* At the time of the assertion failure, there where approximately 20
work_queue
consumer processes running and the work_queue provider process was actively
inserting new data into the table.



Here are some more log lines at the error point:

<snip>
2005-11-09 07:00:06 PST 19681 ERROR: 55P03: could not obtain lock on
row in rel
ation "WORK_QUEUE"
2005-11-09 07:00:06 PST 19681 LOCATION: heap_lock_tuple, heapam.c:2157
2005-11-09 07:00:06 PST 19681 STATEMENT: SELECT ID, UNIT
FROM WORK_QUEUE
WHERE
blah...blah...blah
LIMIT 1
FOR UPDATE
NOWAIT
TRAP: FailedAssertion("!(serializable ? !((MyProc->xmin) !=
((TransactionId) 0))
: ((MyProc->xmin) != ((TransactionId) 0)))", File: "procarray.c", Line: 492)
2005-11-09 07:00:06 PST 14093 LOG: 00000: server process (PID 19681)
was termin
ated by signal 6
2005-11-09 07:00:06 PST 14093 LOCATION: LogChildExit, postmaster.c:2426
2005-11-09 07:00:06 PST 14093 LOG: 00000: terminating any other
active server p
rocesses
2005-11-09 07:00:06 PST 14093 LOCATION: HandleChildCrash, postmaster.c:2307
2005-11-09 07:00:06 PST 20323 WARNING: 57P02: terminating connection
because of
crash of another server process
</snip>

the ERROR: 55P03 seems expected given the description of how NOWAIT operates.

Let me know if I can provide more information.

-Joel

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
message can get through to the mailing list cleanly


Reply With Quote
  #2  
Old   
Joel Stevenson
 
Posts: n/a

Default Re: [BUGS] BUG #2033: Assertion Failure: File: "procarray.c", - 11-09-2005 , 12:57 PM






Sorry, I realize this bug is a little lean on details.

PG is set up to serve multiple concurrent processes in a work queue
fashion, therefore these processes are in contention for the same
resources as the queue is basically a FIFO and I'm using SELECT ...
FOR UPDATE NOWAIT in each client to grab the next work segment in
line.

I've been running the setup on 8.1b3 and 8.1RC1 without this
particular Assertion failure, but had not been using the NOWAIT flag
until today. During a period of perhaps 20 processes operating on
the queue it looks like postgres failed this assertion 17 times, the
first coming very shortly after the processes began and the rest
following in quick succession.

I do have the core files if further info is needed from them.

Thanks,
Joel


At 3:19 PM +0000 11/9/05, Joel Stevenson wrote:
Quote:
The following bug has been logged online:

Bug reference: 2033
Logged by: Joel Stevenson
Email address: joelstevenson (AT) mac (DOT) com
PostgreSQL version: 8.1.0
Operating system: RHEL 3 update 6
Description: Assertion Failure: File: "procarray.c", Line: 492
Details:

Hi,

I'm running into an Assertion failure this morning w/8.1.0. I believe it is
related to using the NOWAIT flag. Here is the log message:

TRAP: FailedAssertion("!(serializable ? !((MyProc->xmin) != ((TransactionId)
0))
: ((MyProc->xmin) != ((TransactionId) 0)))", File: "procarray.c", Line:
492)


Postgres was configured using both --enable-debug and --enable-cassert.
Full config options were:

./configure CFLAGS=-O2 -pipe --with-perl --with-openssl
--enable-thread-safety --enable-debug --enable-cassert
--with-includes=/usr/kerberos/include

Some non-default postgresql.conf params:
max_connections = 150
ssl = on
shared_buffers = 4000
work_mem = 102400
maintenance_work_mem = 131072
max_stack_depth = 4096
commit_delay = 100
checkpoint_segments = 5
effective_cache_size = 173015
stats_start_collector = on
stats_command_string = on
stats_block_level = on
stats_row_level = on
stats_reset_on_server_start = on
autovacuum = on
autovacuum_analyze_scale_factor = 0.1

I've removed the 'NOWAIT' for the time being, but thought I should mention
the issue.

Thanks,
Joel

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


Reply With Quote
  #3  
Old   
Jim C. Nasby
 
Posts: n/a

Default Re: [BUGS] BUG #2033: Assertion Failure: File: "procarray.c", - 11-09-2005 , 01:02 PM



A backtrace from one of the cores would be a good start.

On Wed, Nov 09, 2005 at 07:44:01AM -0800, Joel Stevenson wrote:
Quote:
Sorry, I realize this bug is a little lean on details.

PG is set up to serve multiple concurrent processes in a work queue
fashion, therefore these processes are in contention for the same
resources as the queue is basically a FIFO and I'm using SELECT ...
FOR UPDATE NOWAIT in each client to grab the next work segment in
line.

I've been running the setup on 8.1b3 and 8.1RC1 without this
particular Assertion failure, but had not been using the NOWAIT flag
until today. During a period of perhaps 20 processes operating on
the queue it looks like postgres failed this assertion 17 times, the
first coming very shortly after the processes began and the rest
following in quick succession.

I do have the core files if further info is needed from them.

Thanks,
Joel


At 3:19 PM +0000 11/9/05, Joel Stevenson wrote:
The following bug has been logged online:

Bug reference: 2033
Logged by: Joel Stevenson
Email address: joelstevenson (AT) mac (DOT) com
PostgreSQL version: 8.1.0
Operating system: RHEL 3 update 6
Description: Assertion Failure: File: "procarray.c", Line: 492
Details:

Hi,

I'm running into an Assertion failure this morning w/8.1.0. I believe it
is
related to using the NOWAIT flag. Here is the log message:

TRAP: FailedAssertion("!(serializable ? !((MyProc->xmin) !=
((TransactionId)
0))
: ((MyProc->xmin) != ((TransactionId) 0)))", File: "procarray.c", Line:
492)


Postgres was configured using both --enable-debug and --enable-cassert.
Full config options were:

./configure CFLAGS=-O2 -pipe --with-perl --with-openssl
--enable-thread-safety --enable-debug --enable-cassert
--with-includes=/usr/kerberos/include

Some non-default postgresql.conf params:
max_connections = 150
ssl = on
shared_buffers = 4000
work_mem = 102400
maintenance_work_mem = 131072
max_stack_depth = 4096
commit_delay = 100
checkpoint_segments = 5
effective_cache_size = 173015
stats_start_collector = on
stats_command_string = on
stats_block_level = on
stats_row_level = on
stats_reset_on_server_start = on
autovacuum = on
autovacuum_analyze_scale_factor = 0.1

I've removed the 'NOWAIT' for the time being, but thought I should mention
the issue.

Thanks,
Joel

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

--
Jim C. Nasby, Sr. Engineering Consultant jnasby (AT) pervasive (DOT) com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org


Reply With Quote
  #4  
Old   
Joel Stevenson
 
Posts: n/a

Default Re: [BUGS] BUG #2033: Assertion Failure: File: "procarray.c", - 11-09-2005 , 01:16 PM



At 11:42 AM -0500 11/9/05, Tom Lane wrote:
Quote:
Joel Stevenson <joelstevenson (AT) mac (DOT) com> writes:
gdb's 'bt' on one of the core files produces:

#0 0x00138eff in ?? ()
#1 0x0017ec8d in ?? ()
#2 0x00246cd8 in ?? ()
#3 0x00000000 in ?? ()

This looks like you have a "stripped" executable, which is a bit odd
considering you said that you built with --enable-debug. Please
check that the installed executable is what you think it is ...
Hmmm, I've moved all previous (beta, RC) PG base directories and
re-installed 8.1.0 again with --enable-debug and --enable-cassert.
During the make process I saw the cc lines floating past with the -g
option.

The commands were:

$ ./configure --with-perl --with-openssl --enable-thread-safety
--enable-debug --enable-cassert --with-includes=/usr/kerberos/include
$ make -j4
$ sudo make install
$ cp /usr/local/pgsql.old/data /usr/local/pgsql
$ sudo su postgres
$ /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data start
$ pkill -ABRT postmaster

This produced a bunch of core files that show the following backtrace:

#0 0x001ea038 in ?? ()
#1 0xbfffa4d8 in ?? ()
#2 0xbfffa5e0 in ?? ()
#3 0xbfffa560 in ?? ()
#4 0x08180844 in ?? ()
#5 0x00000005 in ?? ()
#6 0xbfffa4e0 in ?? ()
#7 0x00000000 in ?? ()

I'm at a loss as to what to do about it, really; is there a hidden
configure flag or something that could be in my environment that's
causing the executable to be stripped? I see various strip related
things within the Makefile, but I definitely didn't make the
install-strip target.

Quote:
Is there any chance of putting together a self-contained test case
that other people could try?
The simple test case so far is not causing the assertion failure,
perhaps because it is just a simple approximation of the case. At
this point I'm more concerned about getting a non-stripped PG
installed so that should this error or another occur the core file
will have perhaps helpful information.

Thanks in advance for any pointers,

Joel

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match


Reply With Quote
  #5  
Old   
Joel Stevenson
 
Posts: n/a

Default Re: [BUGS] BUG #2033: Assertion Failure: File: "procarray.c", - 11-09-2005 , 02:06 PM



Thanks Tom, and apologies for the confusion; not having worked w/gdb
before I was invoking it incorrectly. Here's what the core file
produced from the assertion failure has for a backtrace:

$ gdb ./bin/postgres data/core.20633
(gdb) bt
#0 0x00138eff in raise () from /lib/tls/libc.so.6
#1 0x0013a705 in abort () from /lib/tls/libc.so.6
#2 0x0820dc66 in ExceptionalCondition (
conditionName=0x6 <Address 0x6 out of bounds>,
errorType=0x5099 <Address 0x5099 out of bounds>, fileName=0x0,
lineNumber=492)
at assert.c:51
#3 0x081987bd in GetSnapshotData (snapshot=0x8303f04, serializable=0 '\0')
at procarray.c:514
#4 0x082293df in GetTransactionSnapshot () at tqual.c:1265
#5 0x081a8eb8 in PortalStart (portal=0x9b09a24, params=0x9b1c5dc,
snapshot=0x0)
at pquery.c:375
#6 0x081a6286 in exec_bind_message (input_message=0xbfffa2e0) at
postgres.c:1643
#7 0x081a7fb2 in PostgresMain (argc=4, argv=0x9aaa50c,
username=0x9aaa4e4 "joels")
at postgres.c:3205
#8 0x08182a45 in BackendRun (port=0x9ac8d58) at postmaster.c:2854
#9 0x08182529 in BackendStartup (port=0x9ac8d58) at postmaster.c:2498
#10 0x08180a57 in ServerLoop () at postmaster.c:1231
#11 0x0817fee5 in PostmasterMain (argc=3, argv=0x9aa8478) at postmaster.c:943
#12 0x08147a37 in main (argc=3, argv=0x9aa8478) at main.c:256


-Joel

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
message can get through to the mailing list cleanly

Reply With Quote
  #6  
Old   
Joel Stevenson
 
Posts: n/a

Default Re: [BUGS] BUG #2033: Assertion Failure: File: "procarray.c", - 11-09-2005 , 02:31 PM



At 3:24 PM -0500 11/9/05, Tom Lane wrote:
Quote:
Joel Stevenson <joelstevenson (AT) mac (DOT) com> writes:
Thanks Tom, and apologies for the confusion; not having worked w/gdb
before I was invoking it incorrectly. Here's what the core file
produced from the assertion failure has for a backtrace:

OK, now that that's straightened out, there are some more things to
look at in the core file. Please try these gdb commands:

p *MyProc
p *SerializableSnapshot
p *CurrentTransactionState

regards, tom lane
Ok, here are the results:

(gdb) p *MyProc
$1 = {links = {prev = 4294967295, next = 4294967295}, sem = {semId = 8880136,
semNum = 10}, waitStatus = 0, xid = 0, xmin = 0, pid = 20633,
databaseId = 16385, roleId = 16384, lwWaiting = 0 '\0', lwExclusive
= 1 '\001',
lwWaitLink = 0x0, waitLock = 0x0, waitProcLock = 0x0, waitLockMode = 0,
heldLocks = 0, procLocks = {prev = 34618656, next = 34618656}, subxids = {
overflowed = 0 '\0', nxids = 0, xids = {0 <repeats 64 times>}}}

(gdb) p *SerializableSnapshot
$2 = {xmin = 1068152, xmax = 1068155, xcnt = 2, xip = 0x9b142b8, curcid = 0}

(gdb) p *CurrentTransactionState
$3 = {transactionId = 1068154, subTransactionId = 1, name = 0x0,
savepointLevel = 0,
state = TRANS_ABORT, blockState = TBLOCK_ABORT, nestingLevel = 1,
curTransactionContext = 0x9b06a9c, curTransactionOwner = 0x9ab0294,
childXids = 0x0, currentUser = 0, prevXactReadOnly = 0 '\0', parent = 0x0}

-Joel

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
message can get through to the mailing list cleanly


Reply With Quote
  #7  
Old   
Tom Lane
 
Posts: n/a

Default Re: [BUGS] BUG #2033: Assertion Failure: File: "procarray.c", - 11-09-2005 , 03:11 PM



Joel Stevenson <joelstevenson (AT) mac (DOT) com> writes:
Quote:
I've been running the setup on 8.1b3 and 8.1RC1 without this
particular Assertion failure, but had not been using the NOWAIT flag
until today. During a period of perhaps 20 processes operating on
the queue it looks like postgres failed this assertion 17 times, the
first coming very shortly after the processes began and the rest
following in quick succession.
BTW, I imagine that the apparent correlation to NOWAIT is occurring
because your client-side code was not checking to see if the NOWAIT
query had failed before trying to launch another query in the same
transaction.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match


Reply With Quote
  #8  
Old   
Joel Stevenson
 
Posts: n/a

Default Re: [BUGS] BUG #2033: Assertion Failure: File: "procarray.c", - 11-09-2005 , 03:41 PM



At 4:09 PM -0500 11/9/05, Tom Lane wrote:
Quote:
Joel Stevenson <joelstevenson (AT) mac (DOT) com> writes:
I've been running the setup on 8.1b3 and 8.1RC1 without this
particular Assertion failure, but had not been using the NOWAIT flag
until today. During a period of perhaps 20 processes operating on
the queue it looks like postgres failed this assertion 17 times, the
first coming very shortly after the processes began and the rest
following in quick succession.

BTW, I imagine that the apparent correlation to NOWAIT is occurring
because your client-side code was not checking to see if the NOWAIT
query had failed before trying to launch another query in the same
transaction.
That is indeed what the code was doing. I'd added NOWAIT to the
SELECT statement and was catching the error that was being thrown
from the NOWAIT behavior, but didn't think that the error would
invalidate the current transaction (which was not a prepared
transaction.) The fact that it does kill the transaction makes
perfect sense. I'd been thinking of NOWAIT all wrong: as a "return
on block", but it's an "error on block".

Thanks again, Tom.

-Joel

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.