[BUGS] BUG #2246: Bad malloc interactions: ecpg, openssl

mailing.database.pgsql-bugs

#1  Andy Klosterman - 02-08-2006, 06:31 AM

The following bug has been logged online:

Bug reference: 2246
Logged by: Andy Klosterman
Email address: andrew5 (AT) ece (DOT) cmu.edu
PostgreSQL version: 8.1.0
Operating system: Debian testing: Linux nc3 2.4.27-2-386 #1 Wed Nov 30 21:38:51 JST 2005 i686 GNU/Linux
Description: Bad malloc interactions: ecpg, openssl
Details:

Before going into a full description and figuring out some example code for
this situation, I'm fishing for interest in tracking it down and fixing
it (or not).

On a program that I (pre-)compile with ecpg and connect to a remote Postgres
instance over an SSL connection (as set up in pg_hba.conf with appropriate
certificates installed) my application prematurely terminates with the
following error:
*** glibc detected *** corrupted double-linked list: 0x0807c830 ***
Abort.

(Without an SSL connection, as set in pg_hba.conf, the program executes just
fine. This leads me to cast suspicion on the SSL libraries.)

The back trace from gdb looks like this (which doesn't appear to be too
informative, but looks like an exception stack):
#0 0x401bc851 in kill () from /lib/libc.so.6
#1 0x4014a309 in pthread_kill () from /lib/libpthread.so.0
#2 0x4014a6c0 in raise () from /lib/libpthread.so.0
#3 0x401bc606 in raise () from /lib/libc.so.6
#4 0x401bd971 in abort () from /lib/libc.so.6
#5 0x401ef930 in __fsetlocking () from /lib/libc.so.6
#6 0x401f52b9 in malloc_usable_size () from /lib/libc.so.6
#7 0x401f5395 in malloc_usable_size () from /lib/libc.so.6
#8 0x401f5a43 in malloc_trim () from /lib/libc.so.6
#9 0x401f5d51 in free () from /lib/libc.so.6
#10 0x4052ce6c in zcfree () from /usr/lib/libz.so.1
#11 0x4052f83f in inflateEnd () from /usr/lib/libz.so.1
#12 0x4040f262 in COMP_rle () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#13 0x0807e680 in ?? ()
#14 0x00000000 in ?? ()

After a bit of digging around online, I discovered the MALLOC_CHECK_
environment variable and how it changes the behavior of malloc (man 3
malloc). The above back trace was without MALLOC_CHECK_ in the environment
(e.g., unsetenv MALLOC_CHECK_).

Running with MALLOC_CHECK_ equal to 2 or 1 allows my program to run to
completion.

With MALLOC_CHECK_ set to 0 (which is supposed to ignore corruption), I get
a segfault. Running inside gdb gets me the following back trace:
#0 0x403d6f73 in ASN1_template_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#1 0x403d6e0d in ASN1_primitive_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#2 0x403d7023 in ASN1_item_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#3 0x403d0c07 in X509_CERT_AUX_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#4 0x403d077a in X509_CINF_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#5 0x403d6e35 in ASN1_primitive_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#6 0x403d7023 in ASN1_item_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#7 0x403d0927 in X509_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#8 0x402d16f3 in pqsecure_destroy () from /usr/lib/libpq.so.4
#9 0x402c387a in PQconninfoFree () from /usr/lib/libpq.so.4
#10 0x402c39c3 in PQfinish () from /usr/lib/libpq.so.4
#11 0x4002f41b in ECPGget_connection () from /usr/lib/libecpg.so.5
#12 0x40030223 in ECPGdisconnect () from /usr/lib/libecpg.so.5
#13 0x0804a113 in DBDisconnect (arg_connection=0x8054faf "client_correctness") at client_test.pgcc:215
#14 0x0804a64e in DoCorrectnessChecks () at client_test.pgcc:278
#15 0x0804aaa1 in main (argc=7, argv=0xbffffa84) at client_test.pgcc:523

PURE SPECULATION: It looks like there is either trouble in the interaction
between Postgres and the SSL library or just a bit of trouble within the SSL
library.
SPECULATION: Another possibility is that I misunderstand some aspect of
multi-threaded interactions with Postgres (I open uniquely named connections
to the DB for each thread of my test program). Maybe I need to have a
"lock" around the code that makes DB connections and make sure that only one
happens at a time (might be better handled within Postgres/SSL if that is
the case).
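
The "lock" idea described above could be sketched roughly as follows, assuming POSIX threads. `open_db_connection` is a hypothetical stand-in for the application's own connect code (for example a wrapper around an ecpg `EXEC SQL CONNECT` statement); it is not a libpq or ecpg function, and only the pthread calls are real APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Process-wide lock so that only one thread opens a connection at a time. */
static pthread_mutex_t connect_lock = PTHREAD_MUTEX_INITIALIZER;

/* Number of connections opened so far; modified only under connect_lock. */
static int connections_opened = 0;

/* Hypothetical placeholder for the real connect code. A real program
 * would run its ecpg CONNECT statement here; this stub only counts calls. */
int open_db_connection(const char *connection_name)
{
    (void)connection_name;
    connections_opened++;
    return 0;
}

/* Serialize connection startup: take the lock, connect, release the lock.
 * Returns whatever the underlying connect code returns. */
int locked_connect(const char *connection_name)
{
    int rc;

    assert(connection_name != NULL);

    pthread_mutex_lock(&connect_lock);
    rc = open_db_connection(connection_name);
    pthread_mutex_unlock(&connect_lock);
    return rc;
}
```

Each thread would call `locked_connect()` with its own unique connection name. The lock covers only connection startup; queries on already-open connections are not serialized by it.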

PROCEEDING FURTHER: If there is any desire on the part of any developers to
pursue this further, I'm open. As things stand right now, I have
workarounds:
1. Don't use an SSL connection to the DB.
2. Do a "setenv MALLOC_CHECK_ 1" (or 2) and it works.

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

#2  Alvaro Herrera - 02-08-2006, 07:47 AM

Andy Klosterman wrote:

Quote:
Before going into a full description and figuring out some example code for
this situation, I'm fishing for interest in tracking it down and fixing
it (or not).

Whenever there is a bug that causes a crash, there is interest in
tracking it down and fixing it. Please do provide a test case.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#3  Tom Lane - 02-08-2006, 07:10 PM

"Andy Klosterman" <andrew5 (AT) ece (DOT) cmu.edu> writes:
Quote:
SPECULATION: Another possibility is that I misunderstand some aspect of
multi-threaded interactions with Postgres (I open uniquely named connections
to the DB for each thread of my test program). Maybe I need to have a
"lock" around the code that makes DB connections and make sure that only one
happens at a time (might be better handled within Postgres/SSL if that is
the case).

There could be some re-entrancy problem in the SSL connection startup
code --- if you add such a lock, does it get more reliable? Also, did
you remember to build PG with --enable-thread-safety ?

regards, tom lane

#4  Tom Lane - 02-13-2006, 02:16 PM

Andrew Klosterman <andrew5 (AT) ece (DOT) cmu.edu> writes:
Quote:
I threw in a pthread mutex around the code making the database connections
for each of my threads. The problem is still there ("corrupted
double-linked list").

Even tuning things down and instructing my code to only run a single
pthread manifests the problem over an SSL connection.

Hmm. Based on that, the problem is starting to smell more like a
garden-variety memory clobber, for instance malloc'ing a chunk smaller
than the data that's later stuffed into it. It might be worth running
the program under something like ElectricFence, which will catch the
offender on-the-spot rather than only later when corruption of malloc's
private data structures is detected.

Looking back at your original message, I wonder if it could be the
combination of ecpg and SSL that triggers it? I'd have thought that
libpq/SSL alone would be pretty well wrung out, but ecpg is not so
widely used.

BTW, you did say this was i386 right? If it were a 64-bit architecture,
I'd be about ready to bet money on the wrong-malloc-size-calculation
theory.
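
As an illustration of the wrong-malloc-size-calculation theory, one classic form of the bug is sizing an array of pointers with `sizeof(int)`. That happens to work where `int` and pointers are both 4 bytes, as on i386, and under-allocates where pointers are 8 bytes. The sketch below shows the safe idiom; `alloc_name_table` is a hypothetical name and is not taken from libpq or ecpg.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate an array of n string pointers (n > 0), all set to NULL.
 *
 * The buggy variant of this code would be
 *     char **names = malloc(n * sizeof(int));
 * which allocates enough space only where sizeof(int) == sizeof(char *).
 * Where pointers are twice as wide as int, later stores run past the end
 * of the chunk and can clobber malloc's private bookkeeping. */
char **alloc_name_table(size_t n)
{
    char **names;
    size_t i;

    assert(n > 0);

    /* Size the request from the pointed-to type, not from a guessed type. */
    names = malloc(n * sizeof *names);
    if (names == NULL)
        return NULL;

    for (i = 0; i < n; i++)
        names[i] = NULL;
    return names;
}
```

Writing `sizeof *names` keeps the element size tied to the declaration, so the calculation stays right if the element type or the platform changes.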

Quote:
Tracking down exactly what's tickling the problem in this case could be
tricky...

Yeah :-(. If you aren't able to narrow it further by yourself, please
try to put together a self-contained test case.

regards, tom lane

#5  Tom Lane - 02-13-2006, 03:09 PM

Andrew Klosterman <andrew5 (AT) ece (DOT) cmu.edu> writes:
Quote:
(gdb) bt
#0 0x401c3851 in kill () from /lib/libc.so.6
#1 0x40139dd5 in EF_Abort () from /usr/lib/libefence.so.0
#2 0x40139823 in memalign () from /usr/lib/libefence.so.0
#3 0x401399ad in malloc () from /usr/lib/libefence.so.0
#4 0x40139a10 in calloc () from /usr/lib/libefence.so.0
#5 0x404a182f in krb5_set_default_tgs_ktypes () from /usr/lib/libkrb5.so.3
#6 0x402c8b3f in ?? () from /usr/lib/libpq.so.4
#7 0x402ded88 in ?? () from /usr/lib/libpq.so.4
#8 0x00000000 in ?? ()

Any chance of doing this with debug symbols? libpq does not call
krb5_set_default_tgs_ktypes directly, so I don't think I believe the
above backtrace. gdb is easily misled without debug symbols :-(

I'm not sure if Debian does things the way Red Hat does, but on RH
there are separate "debuginfo" RPMs corresponding to each regular
RPM --- if you install the ones matching your libpq and libkrb5
RPMs you should be able to get better info.

regards, tom lane

#6  Stephen Frost - 02-13-2006, 03:16 PM

* Tom Lane (tgl (AT) sss (DOT) pgh.pa.us) wrote:
Quote:
Andrew Klosterman <andrew5 (AT) ece (DOT) cmu.edu> writes:
(gdb) bt
#0 0x401c3851 in kill () from /lib/libc.so.6
#1 0x40139dd5 in EF_Abort () from /usr/lib/libefence.so.0
#2 0x40139823 in memalign () from /usr/lib/libefence.so.0
#3 0x401399ad in malloc () from /usr/lib/libefence.so.0
#4 0x40139a10 in calloc () from /usr/lib/libefence.so.0
#5 0x404a182f in krb5_set_default_tgs_ktypes () from /usr/lib/libkrb5.so.3
#6 0x402c8b3f in ?? () from /usr/lib/libpq.so.4
#7 0x402ded88 in ?? () from /usr/lib/libpq.so.4
#8 0x00000000 in ?? ()

Any chance of doing this with debug symbols? libpq does not call
krb5_set_default_tgs_ktypes directly, so I don't think I believe the
above backtrace. gdb is easily misled without debug symbols :-(

Hrmpf, I missed this bug-on-Debian report. I'll go check the archive
for the rest.

Quote:
I'm not sure if Debian does things the way Red Hat does, but on RH
there are separate "debuginfo" RPMs corresponding to each regular
RPM --- if you install the ones matching your libpq and libkrb5
RPMs you should be able to get better info.

We do have debugging .debs for some things. We don't have them for
everything and unfortunately we don't yet have them for Postgres. I'll
talk to Martin about building some though so that in the future it's
easier to debug these problems.

Thanks,

Stephen

#7  Tom Lane - 02-13-2006, 03:45 PM

Stephen Frost <sfrost (AT) snowman (DOT) net> writes:
Quote:
We do have debugging .debs for some things. We don't have them for
everything and unfortunately we don't yet have them for Postgres. I'll
talk to Martin about building some though so that in the future it's
easier to debug these problems.

Hmm. Andrew, it seems your choices are to rebuild the relevant
libraries from source, or to concentrate on developing a test case
that other people can try.

regards, tom lane

#8  Andrew Klosterman - 02-13-2006, 07:15 PM

On Mon, 13 Feb 2006, Tom Lane wrote:

Quote:
Andrew Klosterman <andrew5 (AT) ece (DOT) cmu.edu> writes:
I threw in a pthread mutex around the code making the database connections
for each of my threads. The problem is still there ("corrupted
double-linked list").

Even tuning things down and instructing my code to only run a single
pthread manifests the problem over an SSL connection.

Hmm. Based on that, the problem is starting to smell more like a
garden-variety memory clobber, for instance malloc'ing a chunk smaller
than the data that's later stuffed into it. It might be worth running
the program under something like ElectricFence, which will catch the
offender on-the-spot rather than only later when corruption of malloc's
private data structures is detected.

Looking back at your original message, I wonder if it could be the
combination of ecpg and SSL that triggers it? I'd have thought that
libpq/SSL alone would be pretty well wrung out, but ecpg is not so
widely used.

BTW, you did say this was i386 right? If it were a 64-bit architecture,
I'd be about ready to bet money on the wrong-malloc-size-calculation
theory.

Tracking down exactly what's tickling the problem in this case could be
tricky...

Yeah :-(. If you aren't able to narrow it further by yourself, please
try to put together a self-contained test case.

regards, tom lane

I just did the "electric fence" thing for you and this is what I get in
gdb...

Electric Fence 2.1 Copyright (C) 1987-1998 Bruce Perens.

ElectricFence Aborting: Allocating 0 bytes, probably a bug.

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 16384 (LWP 24753)]
0x401c3851 in kill () from /lib/libc.so.6
(gdb) bt
#0 0x401c3851 in kill () from /lib/libc.so.6
#1 0x40139dd5 in EF_Abort () from /usr/lib/libefence.so.0
#2 0x40139823 in memalign () from /usr/lib/libefence.so.0
#3 0x401399ad in malloc () from /usr/lib/libefence.so.0
#4 0x40139a10 in calloc () from /usr/lib/libefence.so.0
#5 0x404a182f in krb5_set_default_tgs_ktypes () from /usr/lib/libkrb5.so.3
#6 0x402c8b3f in ?? () from /usr/lib/libpq.so.4
#7 0x402ded88 in ?? () from /usr/lib/libpq.so.4
#8 0x00000000 in ?? ()

Looks like something fishy going on between libpq and libkrb5. I'm
especially suspicious since I'm not using kerberos for authentication at
all.
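
One caveat when reading the abort above: a zero-byte allocation request is legal C. Electric Fence traps it only because such requests are often a symptom of a bug, and its man page documents the `EF_ALLOW_MALLOC_0` environment variable for turning that trap off. The sketch below shows the kind of call that would produce "Allocating 0 bytes" without being a memory clobber by itself. The function names are hypothetical and are not taken from libkrb5 or libpq.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate a zero-filled list of `count` ints.
 * With count == 0 this calls calloc(0, sizeof(int)). The C standard
 * allows that: the result is either NULL or a pointer that must not be
 * dereferenced but may be passed to free(). */
int *alloc_int_list(size_t count)
{
    return calloc(count, sizeof(int));
}

/* Allocate and release an empty list. Returns 1 once both steps have
 * completed; free(NULL) is a no-op, so either calloc outcome is safe. */
int zero_length_list_is_harmless(void)
{
    int *list = alloc_int_list(0);

    free(list);
    return 1;
}
```

If the library call in frame #5 does something along these lines, rerunning with `EF_ALLOW_MALLOC_0` set to a non-zero value should let Electric Fence continue past it and stop at an actual out-of-bounds access, if there is one.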

I am developing on i386 (more or less).
# uname -m
i686

--Andrew J. Klosterman
andrew5 (AT) ece (DOT) cmu.edu


#9  Andrew Klosterman - 02-13-2006, 07:15 PM

On Mon, 13 Feb 2006, Tom Lane wrote:

Quote:
Andrew Klosterman <andrew5 (AT) ece (DOT) cmu.edu> writes:
(gdb) bt
#0 0x401c3851 in kill () from /lib/libc.so.6
#1 0x40139dd5 in EF_Abort () from /usr/lib/libefence.so.0
#2 0x40139823 in memalign () from /usr/lib/libefence.so.0
#3 0x401399ad in malloc () from /usr/lib/libefence.so.0
#4 0x40139a10 in calloc () from /usr/lib/libefence.so.0
#5 0x404a182f in krb5_set_default_tgs_ktypes () from /usr/lib/libkrb5.so.3
#6 0x402c8b3f in ?? () from /usr/lib/libpq.so.4
#7 0x402ded88 in ?? () from /usr/lib/libpq.so.4
#8 0x00000000 in ?? ()

Any chance of doing this with debug symbols? libpq does not call
krb5_set_default_tgs_ktypes directly, so I don't think I believe the
above backtrace. gdb is easily misled without debug symbols :-(

I'm not sure if Debian does things the way Red Hat does, but on RH
there are separate "debuginfo" RPMs corresponding to each regular
RPM --- if you install the ones matching your libpq and libkrb5
RPMs you should be able to get better info.

regards, tom lane

I thought about that and did some quick checks of how to get debug symbols
in libraries on Debian. I didn't come up with anything right away. I'll
poke around and see what I can come up with.

--Andrew J. Klosterman
andrew5 (AT) ece (DOT) cmu.edu

#10  Andrew Klosterman - 02-13-2006, 07:15 PM

On Wed, 8 Feb 2006, Tom Lane wrote:

Quote:
"Andy Klosterman" <andrew5 (AT) ece (DOT) cmu.edu> writes:
SPECULATION: Another possibility is that I misunderstand some aspect of
multi-threaded interactions with Postgres (I open uniquely named connections
to the DB for each thread of my test program). Maybe I need to have a
"lock" around the code that makes DB connections and make sure that only one
happens at a time (might be better handled within Postgres/SSL if that is
the case).

There could be some re-entrancy problem in the SSL connection startup
code --- if you add such a lock, does it get more reliable? Also, did
you remember to build PG with --enable-thread-safety ?

regards, tom lane

(I'm back after a bit of an illness. Much better now!)

I threw in a pthread mutex around the code making the database connections
for each of my threads. The problem is still there ("corrupted
double-linked list").

Even tuning things down and instructing my code to only run a single
pthread manifests the problem over an SSL connection. Everything is just
fine without SSL. Other code I've written works just fine with (and
without) threads connecting to the database with (and without) SSL.
Tracking down exactly what's tickling the problem in this case could be
tricky...

I'm using the pre-built debian testing packages, not self-compiled code,
for my postgres installation. From the information I can gather from the
debian build logs (http://buildd.debian.org/build.php), everything was
configured and built with threads enabled.

--Andrew J. Klosterman
andrew5 (AT) ece (DOT) cmu.edu
