[BUGS] My investigations of the postmaster Bus error

Martin Pitt - 10-11-2005, 02:22 PM

Hi PostgreSQL developers!

There have already been some reports about the mysterious Bus error
that postmaster dies with on some architectures. Since that bites
pretty hard, I did some investigations and tests on various
architectures with various configurations.

As background, Debian currently builds with gcc 4.0.2 by default, and
I use the latest 7.4.9 and 8.0.4 PostgreSQL versions. The default is
to build with -O2.

Here are the results:

* On i386, PowerPC, AMD 64, S/390, arm, and Alpha all versions work
fine with all tested compiler versions (gcc 3.3.3 and 4.0.2).

* On IA 64, HP PARISC, and sparc postmaster 7.4 and 8.0 fail with a
bus error when run from initdb. It works fine as soon as I

- build with gcc 3.3 or
- build with -O0 or
- run postmaster through initdb under gdb (grumpf) or
- run postmaster through initdb under strace or
- run postmaster directly (not through initdb).

Yay Heisenbugs. :-/

Also, at least 8.1 on sparc works well with gcc 4.0 and -O2.

* And then there is MIPS, which really sucks. It constantly crashes
in all configurations I tried it with:

8.0 with gcc-4.0 -O2
8.0 with gcc-4.0 -O0
8.0 with gcc-3.3 -O2
8.0 with gcc-3.3 -O2 and --disable-spinlocks
7.4 with gcc-4.0 -O2 original without any patches
7.4 with gcc-3.3 -O2 with recent MIPS spinlock patch

This also produces a usable backtrace:

Starting program:
/home/mpitt/8.0/postgresql-8.0-8.0.3/debian/tmp/usr/lib/postgresql/8.0/bin/postmaster

Program received signal SIGBUS, Bus error.
0x006e4f80 in InitializeGUCOptions () at guc.c:2360
2360 *conf->variable = conf->reset_val;
(gdb) bt
#0 0x006e4f80 in InitializeGUCOptions () at guc.c:2360
#1 0x005c7f68 in PostmasterMain (argc=1, argv=0x100539e0) at postmaster.c:439
#2 0x0056f874 in main (argc=1, argv=0x100539e0) at main.c:268

Some weeks ago I tracked down the particular variable it fails on
(some float variable; unfortunately I forgot the name, but if it is
important, I can redo the research), but I did not find any
datatype mismatch or similar obvious things.
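
Since the crash is a SIGBUS on a store through a pointer, one cheap check is whether the target address is suitably aligned for its type. Under gdb, `print conf->variable` in the faulting frame shows the address being written. The following is only a minimal sketch and not PostgreSQL code: `is_aligned` and `misalignment` are hypothetical helper names, and the idea that the target is a `double` rests solely on the "some float variable" remark above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of PostgreSQL): returns true when the
 * address p is a multiple of align.
 */
static bool
is_aligned(const void *p, size_t align)
{
    assert(align != 0);
    return ((uintptr_t) p % align) == 0;
}

/*
 * Hypothetical helper: number of bytes by which p lies past the previous
 * align boundary (0 means p is aligned).
 */
static size_t
misalignment(const void *p, size_t align)
{
    assert(align != 0);
    return (size_t) ((uintptr_t) p % align);
}
```

A temporary debugging line such as `if (!is_aligned(conf->variable, sizeof(double))) ...` placed just before the store would show whether alignment is involved at all; if every address turns out to be aligned, the cause has to be looked for elsewhere.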

Does anybody have an idea about these bus errors? Also, if somebody
wants to track down the MIPS bug: I can offer temporary ssh access to
a Debian sid with all required build dependencies, gdb, and the like
for debugging.

Thanks and have a nice day!

Martin

--
Martin Pitt http://www.piware.de
Ubuntu Developer http://www.ubuntu.com
Debian Developer http://www.debian.org

In a world without walls and fences, who needs Windows and Gates?


Jim C. Nasby - 10-11-2005, 06:15 PM

gerbil started failing with bus errors some time ago. We were finally
able to 'fix it' by clearing out the CVS checkout, but the first
failure could have been legitimate. See
http://pgbuildfarm.org/cgi-bin/show_...-26%2009:18:41

Hope this helps...

--
Jim C. Nasby, Sr. Engineering Consultant jnasby (AT) pervasive (DOT) com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


Alvaro Herrera - 12-22-2005, 07:56 AM

[Sorry for copying -patches in my last email, I actually meant to send
it to pgsql-bugs]

Alvaro Herrera wrote:

Quote:
I've been playing with the MIPS machine a little and still haven't found
any _obvious_ cause for the problem. However I suspect that it may be
related to unaligned memory access, which _I think_ results in a SIGBUS
on MIPS.
However, this may turn out to be a red herring, because the variables
are allocated in the data segment and not by malloc, so I think it's
pretty hard to believe there's any unaligned access. A small program
that simulates what Postgres is doing here is attached, and it doesn't
fail with SIGBUS, which is rather what I'd expect. There may be
something different in the way Postgres does things, but I haven't been
able to find what. Suggestions welcome.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachment: sigbus.c

#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-ins for the structs used by InitializeGUCOptions(). */
struct config_generic
{
    const char *name;
};

struct config_real
{
    struct config_generic gen;  /* first member, so a pointer to it can be cast back */
    double     *var;
    double      reset_val;
};

static double dbl_val = 4.0;

static struct config_real test = {
    { "generic name" },
    &dbl_val,
    2.0
};

static struct config_generic **guc_variables;

int main(int argc, char **argv)
{
    struct config_generic *generic_var;
    struct config_real *double_var;

    guc_variables = (struct config_generic **) malloc(sizeof(struct config_generic *));
    if (guc_variables == NULL)
        return 1;
    guc_variables[0] = &test.gen;

    generic_var = guc_variables[0];
    double_var = (struct config_real *) generic_var;

    /* sizeof yields size_t, so print it with %zu rather than %d */
    printf("sizeof(double) = %zu, sizeof(double *) = %zu\n",
           sizeof(double),
           sizeof(double *));

    printf("var is %f\n", *double_var->var);

    printf("storing reset_val into var\n");
    /* same shape as the failing store at guc.c:2360 */
    *double_var->var = double_var->reset_val;

    printf("var is %f\n", *double_var->var);

    free(guc_variables);
    return 0;
}
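
For comparison, if a store ever did turn out to hit a misaligned address, the usual portable way to read or write a double there is to copy its bytes with memcpy instead of dereferencing a `double *`. The sketch below only illustrates that technique; `store_double_any` and `load_double_any` are hypothetical names, and nothing here is a proposed fix for guc.c, since variables in the data segment should already be aligned.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: write a double to dst without assuming that dst is
 * aligned for double. memcpy copies byte by byte as far as the language is
 * concerned, so no aligned double store is required.
 */
static void
store_double_any(void *dst, double value)
{
    assert(dst != NULL);
    memcpy(dst, &value, sizeof value);
}

/*
 * Hypothetical helper: read a double from src without assuming alignment.
 */
static double
load_double_any(const void *src)
{
    double value;

    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}
```

Swapping the direct assignment in a test program for `store_double_any(double_var->var, double_var->reset_val)` is one way to see whether a crash depends on the alignment of the target at all.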
