#1
#2
Before going into a full description and working up some example code for this situation, I'm fishing for interest in tracking it down and fixing it (or not).
#3
SPECULATION: Another possibility is that I misunderstand some aspect of multi-threaded interactions with Postgres (I open uniquely named connections to the DB for each thread of my test program). Maybe I need to have a "lock" around the code that makes DB connections and make sure that only one happens at a time (might be better handled within Postgres/SSL if that is the case).
#4
I threw in a pthread mutex around the code making the database connections for each of my threads. The problem is still there ("corrupted double-linked list"). Even tuning things down and instructing my code to only run a single pthread manifests the problem over an SSL connection.

Tracking down exactly what's tickling the problem in this case could be tricky...
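
For readers following along, the mutex arrangement described above might look roughly like the sketch below. It is not the original test program: `open_connection`, `worker`, `run_workers`, and `MAX_WORKERS` are hypothetical names, and the connect call is a stub that only counts calls so the sketch runs without a database. In the real program that stub would be the ecpg or libpq connect call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define MAX_WORKERS 8

/* Hypothetical stand-in for the real connect call; it only counts calls. */
static int connections_opened = 0;

static void open_connection(const char *conn_name)
{
    (void) conn_name;
    connections_opened++;   /* safe only because callers hold connect_lock */
}

/* One process-wide lock that covers connection setup and nothing else. */
static pthread_mutex_t connect_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    const char *conn_name = arg;

    pthread_mutex_lock(&connect_lock);
    open_connection(conn_name);          /* at most one thread connects at a time */
    pthread_mutex_unlock(&connect_lock);

    /* Per-thread queries on the uniquely named connection would go here. */
    return NULL;
}

/* Starts nthreads workers, waits for them, and returns how many connections were opened. */
static int run_workers(int nthreads)
{
    pthread_t tids[MAX_WORKERS];
    char names[MAX_WORKERS][16];
    int started = 0;

    assert(nthreads >= 0 && nthreads <= MAX_WORKERS);
    connections_opened = 0;

    for (int i = 0; i < nthreads; i++) {
        snprintf(names[i], sizeof names[i], "conn%d", i);
        if (pthread_create(&tids[started], NULL, worker, names[i]) != 0)
            break;                       /* stop launching on failure */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return connections_opened;
}
```

Calling `run_workers(1)` corresponds to the single-pthread case mentioned above. Since the crash persists with one thread, a lock of this kind cannot be the whole story.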
#5
(gdb) bt
    #0 0x401c3851 in kill () from /lib/libc.so.6
    #1 0x40139dd5 in EF_Abort () from /usr/lib/libefence.so.0
    #2 0x40139823 in memalign () from /usr/lib/libefence.so.0
    #3 0x401399ad in malloc () from /usr/lib/libefence.so.0
    #4 0x40139a10 in calloc () from /usr/lib/libefence.so.0
    #5 0x404a182f in krb5_set_default_tgs_ktypes () from /usr/lib/libkrb5.so.3
    #6 0x402c8b3f in ?? () from /usr/lib/libpq.so.4
    #7 0x402ded88 in ?? () from /usr/lib/libpq.so.4
    #8 0x00000000 in ?? ()
#6
Andrew Klosterman <andrew5 (AT) ece (DOT) cmu.edu> writes:
> (gdb) bt
> #0 0x401c3851 in kill () from /lib/libc.so.6
> #1 0x40139dd5 in EF_Abort () from /usr/lib/libefence.so.0
> #2 0x40139823 in memalign () from /usr/lib/libefence.so.0
> #3 0x401399ad in malloc () from /usr/lib/libefence.so.0
> #4 0x40139a10 in calloc () from /usr/lib/libefence.so.0
> #5 0x404a182f in krb5_set_default_tgs_ktypes () from /usr/lib/libkrb5.so.3
> #6 0x402c8b3f in ?? () from /usr/lib/libpq.so.4
> #7 0x402ded88 in ?? () from /usr/lib/libpq.so.4
> #8 0x00000000 in ?? ()

Any chance of doing this with debug symbols? libpq does not call krb5_set_default_tgs_ktypes directly, so I don't think I believe the above backtrace. gdb is easily misled without debug symbols :-(

I'm not sure if Debian does things the way Red Hat does, but on RH there are separate "debuginfo" RPMs corresponding to each regular RPM --- if you install the ones matching your libpq and libkrb5 RPMs you should be able to get better info.

regards, tom lane
#7
We do have debugging .debs for some things. We don't have them for everything, and unfortunately we don't yet have them for Postgres. I'll talk to Martin about building some, though, so that it's easier to debug these problems in the future.
#8
Andrew Klosterman <andrew5 (AT) ece (DOT) cmu.edu> writes:
> I threw in a pthread mutex around the code making the database connections for each of my threads. The problem is still there ("corrupted double-linked list"). Even tuning things down and instructing my code to only run a single pthread manifests the problem over an SSL connection.

Hmm. Based on that, the problem is starting to smell more like a garden-variety memory clobber, for instance malloc'ing a chunk smaller than the data that's later stuffed into it. It might be worth running the program under something like ElectricFence, which will catch the offender on-the-spot rather than only later when corruption of malloc's private data structures is detected.

Looking back at your original message, I wonder if it could be the combination of ecpg and SSL that triggers it? I'd have thought that libpq/SSL alone would be pretty well wrung out, but ecpg is not so widely used.

BTW, you did say this was i386 right? If it were a 64-bit architecture, I'd be about ready to bet money on the wrong-malloc-size-calculation theory.

> Tracking down exactly what's tickling the problem in this case could be tricky...

Yeah :-(. If you aren't able to narrow it further by yourself, please try to put together a self-contained test case.

regards, tom lane
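
As an illustration of the "wrong-malloc-size-calculation" theory mentioned above, here is a minimal sketch of one common form of that mistake: sizing an array of pointers with `sizeof(int)`. It is a generic example, not code from ecpg or libpq, and every function name in it is hypothetical. On i386 an `int` and a pointer are both 4 bytes, so the mistake is harmless there. On a typical 64-bit platform the buffer comes out half the needed size and later writes clobber the heap. The sketch only computes the sizes and never performs the bad write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Buggy calculation: element size taken from int although the elements are pointers. */
static size_t wrong_bytes(size_t n)
{
    return n * sizeof(int);
}

/* Correct calculation: element size taken from the real element type. */
static size_t right_bytes(size_t n)
{
    return n * sizeof(char *);
}

/* Bytes by which filling all n slots would overrun a buffer sized with wrong_bytes(). */
static size_t overrun_bytes(size_t n)
{
    return right_bytes(n) - wrong_bytes(n);   /* 0 on i386, 4 * n on typical 64-bit */
}

/* Correctly sized allocation of n string slots, all set to NULL; caller frees it. */
static char **alloc_string_array(size_t n)
{
    char **arr = malloc(n * sizeof *arr);     /* sizeof *arr follows the element type */

    if (arr != NULL) {
        for (size_t i = 0; i < n; i++)
            arr[i] = NULL;
    }
    return arr;
}
```

Writing the size as `n * sizeof *arr` ties it to the pointer's own element type, which avoids this class of mistake. ElectricFence is useful here because it faults at the first out-of-bounds write rather than when malloc later notices the damage.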
#10
"Andy Klosterman" <andrew5 (AT) ece (DOT) cmu.edu> writes:
> SPECULATION: Another possibility is that I misunderstand some aspect of multi-threaded interactions with Postgres (I open uniquely named connections to the DB for each thread of my test program). Maybe I need to have a "lock" around the code that makes DB connections and make sure that only one happens at a time (might be better handled within Postgres/SSL if that is the case).

There could be some re-entrancy problem in the SSL connection startup code --- if you add such a lock, does it get more reliable?

Also, did you remember to build PG with --enable-thread-safety ?

regards, tom lane