#1
The nature of the bug is that an 'lo_read' operation performed with DBD::Pg caused a segfault with postgresql-libs-7.3.2 and "hangs" on files >= 32768 bytes with postgresql-libs-7.3.4. The hang is actually a read() loop on the socket generating an EAGAIN error on each read().
#2
> Can you get us a gdb stack trace from the segfault cases?

    #36 0x40421712 in pqsecure_read () from /usr/lib/libpq.so.3
    #37 0x40421712 in pqsecure_read () from /usr/lib/libpq.so.3
    #38 0x40421712 in pqsecure_read () from /usr/lib/libpq.so.3
    #39 0x40421712 in pqsecure_read () from /usr/lib/libpq.so.3
    ... cut thousands of lines; you get the idea
#3
Is it just me, or are both sides reading waiting for the other side to send data?
#4
In 7.3.2, pqsecure_read calls itself recursively when SSL_read returns SSL_ERROR_WANT_READ. I changed the recursion to a loop in 7.3.4. Evidently, in 7.3.2 it's possible for the recursion to overflow your allotted stack space before the process uses up its timeslice :-(. In 7.3.4 the loop simply doesn't exit.

I don't understand why, though. What I expected to happen is that the process would busy-wait until more data became available from the connection. That's not ideal, but it seemed safer to postpone the full fix to 7.4. Can you set up this situation, then attach with gdb to the connected backend and see what it thinks it's doing? A stack trace from that side of the connection might shed some light.
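
For readers following along, here is a minimal sketch of the general pattern being discussed: retrying a non-blocking read in a loop, but sleeping in poll() until the descriptor is readable rather than recursing or busy-waiting. It is not the libpq code or the eventual fix. It uses plain read() and EAGAIN, as seen in the hang, instead of SSL_read and SSL_ERROR_WANT_READ, and the names `read_with_wait`, `set_nonblocking`, and `READ_TIMED_OUT` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical return code: no data arrived within timeout_ms. */
#define READ_TIMED_OUT (-2)

/* Put a descriptor into non-blocking mode; returns 0 on success, -1 on error. */
int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Read from a non-blocking descriptor. When read() reports EAGAIN, sleep in
 * poll() until the descriptor is readable, then retry in a loop (no
 * recursion, no spinning). Returns the number of bytes read, 0 on EOF,
 * -1 on error, or READ_TIMED_OUT. The timeout applies to each wait, not to
 * the call as a whole.
 */
ssize_t
read_with_wait(int fd, void *buf, size_t len, int timeout_ms)
{
    assert(buf != NULL);

    for (;;)
    {
        ssize_t n = read(fd, buf, len);

        if (n >= 0)
            return n;           /* got data, or EOF when n == 0 */
        if (errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;          /* genuine error */

        /* The read would block: wait for readability instead of spinning. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int rc = poll(&pfd, 1, timeout_ms);

        if (rc == 0)
            return READ_TIMED_OUT;
        if (rc < 0 && errno != EINTR)
            return -1;
    }
}
```

With an SSL connection the same shape would presumably apply, with the wait triggered by SSL_ERROR_WANT_READ instead of EAGAIN. The timeout is there so that a peer which never sends anything produces a visible error rather than an endless loop.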
#5
Kevin Houle <kjh (AT) cert (DOT) org> writes:
> Is it just me, or are both sides reading waiting for the other side to send data?

Sure looks like it. Could it be an OpenSSL bug?
#7
Tom Lane wrote:
> Kevin Houle <kjh (AT) cert (DOT) org> writes:
>> Is it just me, or are both sides reading waiting for the other side to send data?
>
> Sure looks like it. Could it be an OpenSSL bug?

One more data point. The DBD::Pg 'lo_extract' function works fine across SSL. There is no issue with large objects >= 32K using 'lo_extract'. So that casts doubt on it being an OpenSSL issue.

Is there a different code path within libpq.so to move data from the server to the client via SSL for lo_extract than for lo_read that we can learn from? I'm looking at the code, but for the first time.
#8
One more data point. The DBD::Pg 'lo_extract' function works fine across SSL. There is no issue with large objects >= 32K using 'lo_extract'. So that casts doubt on it being an OpenSSL issue.
#10
Kevin Houle <kjh (AT) cert (DOT) org> writes:
> The nature of the bug is that an 'lo_read' operation performed with DBD::Pg caused a segfault with postgresql-libs-7.3.2 and "hangs" on files >= 32768 bytes with postgresql-libs-7.3.4. The hang is actually a read() loop on the socket generating an EAGAIN error on each read().

I finally realized what's going on here. 7.3 branch CVS tip should fix it.