[BUGS] BUG #6200: standby bad memory allocations on SELECT

  #21  
Tom Lane, 01-31-2012, 03:25 PM

Robert Haas <robertmhaas (AT) gmail (DOT) com> writes:
> On Tue, Jan 31, 2012 at 12:05 AM, Tom Lane <tgl (AT) sss (DOT) pgh.pa.us> wrote:
>> BTW, after a bit more reflection it occurs to me that it's not so much
>> that the data is necessarily *bad*, as that it seemingly doesn't match
>> the tuple descriptor that the backend's trying to interpret it with.
>
> Hmm. Could this be caused by the recovery process failing to obtain a
> sufficiently strong lock on a buffer before replaying some WAL record?

Well, I was kinda speculating that inadequate locking could result in
use of a stale (or too-new?) tuple descriptor, and that would be as good
a candidate as any if the basic theory were right. But Bridget says
they are not doing any DDL, so it's hard to see how there'd be any tuple
descriptor mismatch at all. Still baffled ...

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs (AT) postgresql (DOT) org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

  #22  
Bridget Frey, 01-31-2012, 07:36 PM

So here's a better stack trace for the segfault issue. Again, just to
summarize, since this is a long thread, we're seeing two issues: 1) alloc
errors that do not crash the DB (although we modified postgres to panic
when this happens in our test environment, and posted a stack earlier), and
2) a postgres segfault that happens once every couple of days on our slaves.
We're still not sure if these are the same issue or not. This stack is not
perfect because it still has some things optimized out (this came from our
production database), but it's much more detailed than the last one we
posted for a segfault. Hope this helps get closer to an answer on this.
I'd also be interested in knowing if the postgres experts think these two
symptoms are likely related, or totally separate issues.

Thanks!
-B

#0 0x0000000000455dc1 in slot_deform_tuple (slot=0x53cfc20, natts=70) at heaptuple.c:1090
1090 off = att_align_pointer(off, thisatt->attalign, -1,
(gdb) bt
#0 0x0000000000455dc1 in slot_deform_tuple (slot=0x53cfc20, natts=70) at heaptuple.c:1090
#1 0x0000000000455fbd in slot_getallattrs (slot=0x53cfc20) at heaptuple.c:1253
#2 0x0000000000458ac7 in printtup (slot=0x53cfc20, self=0x534f1e0) at printtup.c:300
#3 0x000000000055bd69 in ExecutePlan (queryDesc=0x5515978, direction=<value optimized out>, count=0) at execMain.c:1464
#4 standard_ExecutorRun (queryDesc=0x5515978, direction=<value optimized out>, count=0) at execMain.c:313
#5 0x0000000000623594 in PortalRunSelect (portal=0x5394f10, forward=<value optimized out>, count=0, dest=0x534f1e0) at pquery.c:943
#6 0x0000000000624ae0 in PortalRun (portal=0x5394f10, count=9223372036854775807, isTopLevel=1 '\001', dest=0x534f1e0, altdest=0x534f1e0, completionTag=0x7fff014e0640 "") at pquery.c:787
#7 0x00000000006220f2 in exec_execute_message (argc=<value optimized out>, argv=<value optimized out>, username=<value optimized out>) at postgres.c:1963
#8 PostgresMain (argc=<value optimized out>, argv=<value optimized out>, username=<value optimized out>) at postgres.c:3983
#9 0x00000000005e6ba4 in ServerLoop () at postmaster.c:3601
#10 0x00000000005e791c in PostmasterMain (argc=5, argv=0x524cab0) at postmaster.c:1116
#11 0x000000000058b9ae in main (argc=5, argv=<value optimized out>) at main.c:199

  #23  
Robert Haas, 02-01-2012, 09:38 AM

On Tue, Jan 31, 2012 at 4:25 PM, Tom Lane <tgl (AT) sss (DOT) pgh.pa.us> wrote:
> Robert Haas <robertmhaas (AT) gmail (DOT) com> writes:
>> On Tue, Jan 31, 2012 at 12:05 AM, Tom Lane <tgl (AT) sss (DOT) pgh.pa.us> wrote:
>>> BTW, after a bit more reflection it occurs to me that it's not so much
>>> that the data is necessarily *bad*, as that it seemingly doesn't match
>>> the tuple descriptor that the backend's trying to interpret it with.
>>
>> Hmm. Could this be caused by the recovery process failing to obtain a
>> sufficiently strong lock on a buffer before replaying some WAL record?
>
> Well, I was kinda speculating that inadequate locking could result in
> use of a stale (or too-new?) tuple descriptor, and that would be as good
> a candidate as any if the basic theory were right. But Bridget says
> they are not doing any DDL, so it's hard to see how there'd be any tuple
> descriptor mismatch at all. Still baffled ...

No, I wasn't thinking about a tuple descriptor mismatch. I was
imagining that the page contents themselves might be in flux while
we're trying to read from it. Off the top of my head I don't see how
that can happen, but it would be awfully interesting to be able to see
which WAL record last touched the relevant heap page, and how long
before the error that happened.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

  #24  
Tom Lane, 02-01-2012, 10:19 AM

Robert Haas <robertmhaas (AT) gmail (DOT) com> writes:
> No, I wasn't thinking about a tuple descriptor mismatch. I was
> imagining that the page contents themselves might be in flux while
> we're trying to read from it.

Oh, gotcha. Yes, that's a horribly plausible idea. All it'd take is
one WAL replay routine that hasn't been upgraded to acquire sufficient
buffer locks. Pre-hot-standby, there was no reason for them to be
careful about locking.

On the other hand, if that were the cause, you'd expect the symptoms
to be a bit more variable...
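
To make the "page in flux" idea concrete, here is a toy model, not PostgreSQL code. A writer updates a length header and then the payload in two separate steps, and a reader that takes no lock runs in between. The page layout and the names `write_tuple_steps` and `read_tuple` are invented for illustration.

```python
import struct

PAGE_SIZE = 32  # toy page size, unrelated to PostgreSQL's real block size


def write_tuple_steps(page, payload):
    """Hypothetical writer that updates the page in two separate steps.

    Written as a generator so a caller can run a reader between the steps.
    """
    struct.pack_into("<H", page, 0, len(payload))  # step 1: length header
    yield "header written"
    page[2:2 + len(payload)] = payload             # step 2: payload bytes
    yield "payload written"


def read_tuple(page):
    """Hypothetical reader that trusts the length header it finds."""
    (length,) = struct.unpack_from("<H", page, 0)
    return bytes(page[2:2 + length])


page = bytearray(PAGE_SIZE)

# Store an initial value with both steps completed.
for _ in write_tuple_steps(page, b"old"):
    pass

# Start a second write, but let the reader run after step 1 only.
writer = write_tuple_steps(page, b"new-longer-value")
next(writer)              # header now says 16 bytes, payload is still "old"
torn = read_tuple(page)   # unlocked read of a half-updated page
next(writer)              # writer finishes
clean = read_tuple(page)

print(torn)
print(clean)
```

The torn read returns the old bytes padded out to the new length, which is the kind of length/data disagreement a tuple decoder could trip over. Holding a lock across both writer steps would make the intermediate state unobservable.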

regards, tom lane

  #25  
Robert Haas, 02-01-2012, 11:58 AM

On Wed, Feb 1, 2012 at 11:19 AM, Tom Lane <tgl (AT) sss (DOT) pgh.pa.us> wrote:
> Robert Haas <robertmhaas (AT) gmail (DOT) com> writes:
>> No, I wasn't thinking about a tuple descriptor mismatch. I was
>> imagining that the page contents themselves might be in flux while
>> we're trying to read from it.
>
> Oh, gotcha. Yes, that's a horribly plausible idea. All it'd take is
> one WAL replay routine that hasn't been upgraded to acquire sufficient
> buffer locks. Pre-hot-standby, there was no reason for them to be
> careful about locking.
>
> On the other hand, if that were the cause, you'd expect the symptoms
> to be a bit more variable...

Well, OP has two: crash, and invalid memory allocation. Both share
the common thread that they happen while trying to decode a tuple.

It would be nice to get a dump of what PostgreSQL thought the entire
block looked like at the time the crash happened. That information is
presumably already in the core dump, but I'm not sure if there's a
nice way to extract it using gdb.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

  #26  
Tom Lane, 02-01-2012, 03:06 PM

Robert Haas <robertmhaas (AT) gmail (DOT) com> writes:
> No, I wasn't thinking about a tuple descriptor mismatch. I was
> imagining that the page contents themselves might be in flux while
> we're trying to read from it.
>
> It would be nice to get a dump of what PostgreSQL thought the entire
> block looked like at the time the crash happened. That information is
> presumably already in the core dump, but I'm not sure if there's a
> nice way to extract it using gdb.

It probably would be possible to get the page out of the dump, but
I'd be really surprised if that proved much. By the time the
crash-dump-making code gets around to examining the shared memory, the
other process that's hypothetically changing the page will have done its
work and moved on. A crash in process X doesn't freeze execution in
process Y, at least not in any Unixoid system I've ever heard of.

regards, tom lane

  #27  
Alvaro Herrera, 02-01-2012, 07:32 PM

Excerpts from Tom Lane's message of Wed Feb 01 18:06:27 -0300 2012:
> Robert Haas <robertmhaas (AT) gmail (DOT) com> writes:
>> No, I wasn't thinking about a tuple descriptor mismatch. I was
>> imagining that the page contents themselves might be in flux while
>> we're trying to read from it.
>>
>> It would be nice to get a dump of what PostgreSQL thought the entire
>> block looked like at the time the crash happened. That information is
>> presumably already in the core dump, but I'm not sure if there's a
>> nice way to extract it using gdb.
>
> It probably would be possible to get the page out of the dump, but
> I'd be really surprised if that proved much. By the time the
> crash-dump-making code gets around to examining the shared memory, the
> other process that's hypothetically changing the page will have done its
> work and moved on. A crash in process X doesn't freeze execution in
> process Y, at least not in any Unixoid system I've ever heard of.

Maybe you can do something like send SIGSTOP to every other backend,
then attach to them and find which one was touching the same buffer,
then peek at what it was doing.
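
A rough sketch of the "stop everything, then look" step, assuming the PIDs of the other backends have already been collected by some other means (for example from `ps`). The helper names are hypothetical, and the `send` parameter exists only so the logic can be exercised without signalling real processes.

```python
import os
import signal


def signal_all(pids, sig, send=os.kill):
    """Send `sig` to each PID and return the PIDs that were signalled.

    PIDs that have already exited are skipped rather than treated as errors.
    """
    signalled = []
    for pid in pids:
        try:
            send(pid, sig)
        except ProcessLookupError:
            continue  # process is already gone
        signalled.append(pid)
    return signalled


def freeze(pids, send=os.kill):
    """Suspend the given processes with SIGSTOP."""
    return signal_all(pids, signal.SIGSTOP, send)


def thaw(pids, send=os.kill):
    """Resume previously suspended processes with SIGCONT."""
    return signal_all(pids, signal.SIGCONT, send)
```

After `freeze(pids)`, each stopped backend can be inspected with `gdb -p <pid>`, and `thaw(pids)` lets them continue. Whether the stop lands soon enough to catch the other process mid-change is still subject to the timing concern Tom raised.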

--
Álvaro Herrera <alvherre (AT) commandprompt (DOT) com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

  #28  
Duncan Rance, 02-02-2012, 03:23 AM

I recently raised "BUG #6425: Bus error in slot_deform_tuple". During the last reproduction of the problem I saw this:

Client 2 aborted in state 0: ERROR: invalid memory alloc request size 18446744073709551613

So like Tom said, these two issues could well be related. I just wanted to mention it here in this thread, FYI.
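
For what it's worth, that request size is just under 2^64. A quick check, assuming the size is an unsigned 64-bit value (as `size_t` would be on a 64-bit build), shows what it looks like when read as a signed number. The helper name `as_signed_64` is made up for this example.

```python
REPORTED_SIZE = 18446744073709551613  # value from the error message above


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    if value >= 2**63:
        return value - 2**64
    return value


signed_size = as_signed_64(REPORTED_SIZE)
print(signed_size)  # -3
```

So the request is what a length of -3 looks like once it is treated as an unsigned size. This says nothing about where the bad length came from, only that the number is more likely a small negative value than a genuinely huge request.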

