[BUGS] BUG #1736: endless loop in PQconnectdb

#1
Karsten Desler
[BUGS] BUG #1736: endless loop in PQconnectdb - 06-28-2005, 03:41 PM

The following bug has been logged online:

Bug reference: 1736
Logged by: Karsten Desler
Email address: pgsql (AT) soohrt (DOT) org
PostgreSQL version: 7.4.7
Operating system: debian sarge
Description: endless loop in PQconnectdb
Details:

I've got a pretty flaky TCP/IP connection to a Postgres 7.4.7 database server,
and oftentimes (once or twice a day) my program gets stuck in an endless
busy-loop in PQconnectdb.

An excerpt from a strace:
poll([{fd=389, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1
recv(389, "", 1, 0) = 0
poll([{fd=389, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1
recv(389, "", 1, 0) = 0
....

SSL is not involved. Sadly I can't say how far along in the connection
process the bug is triggered, but I could install a libpq3 with debugging
symbols and add a few strategically placed gdb watch/break points, if
needed.
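The strace pattern above (poll() reporting POLLIN|POLLHUP, then recv() returning 0, repeated forever) is what a read loop looks like when it treats end-of-file as "no data yet". The sketch below is not libpq code and makes no claim about where libpq 7.4.7 actually loops; it only illustrates the usual rule that recv() returning 0 on a stream socket means the peer has closed the connection, so the caller has to stop polling. The helper name read_one_byte() is hypothetical.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not libpq code): wait for fd to become readable,
 * then read one byte into *out.
 * Returns 1 if a byte was read, 0 if the peer closed the connection,
 * -1 on error or timeout. */
static int read_one_byte(int fd, char *out, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;    /* POLLERR and POLLHUP are always reported in revents */
    pfd.revents = 0;

    for (;;) {
        int n = poll(&pfd, 1, timeout_ms);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: wait again */
            return -1;
        }
        if (n == 0)
            return -1;      /* timed out (only possible if timeout_ms >= 0) */

        ssize_t r = recv(fd, out, 1, 0);
        if (r > 0)
            return 1;       /* got a byte */
        if (r == 0)
            return 0;       /* EOF: polling again would return immediately,
                               as in the strace excerpt above */
        if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
            continue;       /* transient condition: wait again */
        return -1;
    }
}
```

For example, with a socketpair(), closing one end makes read_one_byte() on the other end return 0 instead of spinning.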

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
message can get through to the mailing list cleanly

#2
Tom Lane
Re: [BUGS] BUG #1736: endless loop in PQconnectdb - 06-28-2005, 05:33 PM

"Karsten Desler" <pgsql (AT) soohrt (DOT) org> writes:
> I've got a pretty flaky TCP/IP connection to a Postgres 7.4.7 database server,
> and oftentimes (once or twice a day) my program gets stuck in an endless
> busy-loop in PQconnectdb.

Hmm. Maybe you have a test case for the proposed patch for bug #1467?
Please see the patch posted in pgsql-patches a couple of days ago, and let
us know if it helps.

regards, tom lane

#3
Bruce Momjian
Re: [BUGS] BUG #1736: endless loop in PQconnectdb - 06-28-2005, 07:23 PM

Yes, please --- the patch is at:

http://archives.postgresql.org/pgsql...6/msg00486.php

---------------------------------------------------------------------------

Tom Lane wrote:
> "Karsten Desler" <pgsql (AT) soohrt (DOT) org> writes:
>> I've got a pretty flaky TCP/IP connection to a Postgres 7.4.7 database server,
>> and oftentimes (once or twice a day) my program gets stuck in an endless
>> busy-loop in PQconnectdb.
>
> Hmm. Maybe you have a test case for the proposed patch for bug #1467?
> Please see the patch posted in pgsql-patches a couple of days ago, and let
> us know if it helps.
>
> regards, tom lane

--
Bruce Momjian | http://candle.pha.pa.us
pgman (AT) candle (DOT) pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

#4
Karsten Desler
Re: [BUGS] BUG #1736: endless loop in PQconnectdb - 06-29-2005, 01:07 PM

* Bruce Momjian wrote:
> Yes, please --- the patch is at:
>
> http://archives.postgresql.org/pgsql...6/msg00486.php

Thanks.
I've applied the patch now, and I'll keep you posted.

Best regards,
Karsten Desler

#5
Karsten Desler
Re: [BUGS] BUG #1736: endless loop in PQconnectdb - 06-30-2005, 08:21 AM

* Karsten Desler wrote:
> * Bruce Momjian wrote:
>> Yes, please --- the patch is at:
>>
>> http://archives.postgresql.org/pgsql...6/msg00486.php
>
> Thanks.
> I've applied the patch now, and I'll keep you posted.

It doesn't seem to have helped, and while poking around a little, I
found another annoyance: libpq seems to leak memory if I pass a DNS name as
the host in conninfo. It doesn't leak when I do the getaddrinfo() lookup
myself and pass an IP address.

root 16580 0.0 0.2 4004 1304 ? S 10:07 0:00 monitor xxx.xxx.xxx.xxx
root 22434 0.0 1.9 47328 9240 ? S Jun07 5:42 monitor xxx.xxx.xxx.yyy

==9980== 4648 bytes in 166 blocks are definitely lost in loss record 4 of 4
==9980== at 0x1B90459D: malloc (vg_replace_malloc.c:130)
==9980== by 0x1BC39E3B: (within /lib/tls/i686/cmov/libresolv-2.3.2.so)
==9980== by 0x1BC38B92: __libc_res_nquery (in /lib/tls/i686/cmov/libresolv-2.3.2.so)
==9980== by 0x1BC39289: (within /lib/tls/i686/cmov/libresolv-2.3.2.so)
==9980== by 0x1BC38E8F: __libc_res_nsearch (in /lib/tls/i686/cmov/libresolv-2.3.2.so)
==9980== by 0x1BDA907D: ???
==9980== by 0x1B9F0A65: (within /lib/tls/i686/cmov/libc-2.3.2.so)
==9980== by 0x1B9F1673: getaddrinfo (in /lib/tls/i686/cmov/libc-2.3.2.so)
==9980== by 0x1B9259A1: getaddrinfo_all (in /usr/lib/libpq.so.3.1)
==9980== by 0x1B916F3B: (within /usr/lib/libpq.so.3.1)
==9980== by 0x1B9164E9: PQconnectStart (in /usr/lib/libpq.so.3.1)
==9980== by 0x1B916471: PQconnectdb (in /usr/lib/libpq.so.3.1)
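The approach of resolving the name outside libpq and passing a numeric address can be sketched as follows. This is a minimal illustration, assuming that resolving once at startup is acceptable (it will not follow later DNS changes); resolve_numeric() and build_conninfo() are hypothetical helpers, and the dbname/user values are the same placeholders used in this thread. libpq also documents a hostaddr connection parameter that takes a numeric address and skips the host-name lookup.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve `host` once with getaddrinfo() and write
 * its first address in numeric form into `out`.
 * Returns 0 on success, nonzero on failure. */
static int resolve_numeric(const char *host, char *out, size_t outlen)
{
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    /* NI_NUMERICHOST: format the address only, no reverse lookup. */
    rc = getnameinfo(res->ai_addr, res->ai_addrlen,
                     out, (socklen_t)outlen, NULL, 0, NI_NUMERICHOST);
    freeaddrinfo(res);
    return rc;
}

/* Hypothetical helper: build a conninfo string that carries the numeric
 * address, so each PQconnectdb() call does not repeat the DNS lookup.
 * Returns 0 on success, -1 if the buffer is too small. */
static int build_conninfo(const char *numeric_host, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen,
                     "host=%s port=5432 dbname=xxx user=xxx connect_timeout=30",
                     numeric_host);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

The resulting string would be passed to PQconnectdb() in place of the one containing the host name.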

Best regards,
Karsten Desler

#6
Bruce Momjian
Re: [BUGS] BUG #1736: endless loop in PQconnectdb - 07-02-2005, 06:08 PM

Karsten Desler wrote:
> * Karsten Desler wrote:
>> * Bruce Momjian wrote:
>>> Yes, please --- the patch is at:
>>>
>>> http://archives.postgresql.org/pgsql...6/msg00486.php
>>
>> Thanks.
>> I've applied the patch now, and I'll keep you posted.
>
> It doesn't seem to have helped, and while poking around a little, I
> found another annoyance: libpq seems to leak memory if I pass a DNS name as
> the host in conninfo. It doesn't leak when I do the getaddrinfo() lookup
> myself and pass an IP address.
>
> [...]

I think what you are seeing is that the getaddrinfo memory is placed in
the PGconn structure that isn't freed until PQclear is called. Does
your test call PQclear()?

#7
Karsten Desler
Re: [BUGS] BUG #1736: endless loop in PQconnectdb - 07-03-2005, 10:24 AM

* Bruce Momjian wrote:
> I think what you are seeing is that the getaddrinfo memory is placed in
> the PGconn structure that isn't freed until PQclear is called. Does
> your test call PQclear()?

s/PQclear/PQfinish/
It does call PQclear on the result, and PQfinish on the connection.
The code is attached.

With libpq doing the DNS lookup:
fubar:~# while true; do ps aux|grep -v grep|grep test; sleep 30; done
root 3245 3.6 0.2 4056 1352 pts/3 S+ 10:37 0:01 ./test
root 3245 3.6 0.3 4056 1456 pts/3 S+ 10:37 0:02 ./test
root 3245 3.7 0.3 4184 1560 pts/3 S+ 10:37 0:03 ./test
root 3245 3.7 0.3 4312 1668 pts/3 R+ 10:37 0:04 ./test
root 3245 3.6 0.3 4440 1760 pts/3 S+ 10:37 0:05 ./test

with an output of:
called PQconnectdb: 0x804a008
called PQexec: 0x80dcbe0
calling PQclear: 0x80dcbe0
calling PQfinish: 0x804a008
....
called PQconnectdb: 0x804a008
called PQexec: 0x80dcea0
calling PQclear: 0x80dcea0
calling PQfinish: 0x804a008
....
called PQconnectdb: 0x804a008
called PQexec: 0x80dd620
calling PQclear: 0x80dd620
calling PQfinish: 0x804a008
....

and valgrind complaining about lost blocks:
==3290== 35224 bytes in 1258 blocks are definitely lost in loss record 8 of 8
==3290== at 0x1B90459D: malloc (vg_replace_malloc.c:130)
==3290== by 0x1BC38E3B: (within /lib/tls/i686/cmov/libresolv-2.3.2.so)
==3290== by 0x1BC37B92: __libc_res_nquery (in /lib/tls/i686/cmov/libresolv-2.3.2.so)
==3290== by 0x1BC38289: (within /lib/tls/i686/cmov/libresolv-2.3.2.so)
==3290== by 0x1BC37E8F: __libc_res_nsearch (in /lib/tls/i686/cmov/libresolv-2.3.2.so)
==3290== by 0x1BDA307D: ???
==3290== by 0x1B9EFA65: (within /lib/tls/i686/cmov/libc-2.3.2.so)
==3290== by 0x1B9F0673: getaddrinfo (in /lib/tls/i686/cmov/libc-2.3.2.so)
==3290== by 0x1B925701: getaddrinfo_all (in /usr/lib/libpq.so.3.1)
==3290== by 0x1B916F0B: (within /usr/lib/libpq.so.3.1)
==3290== by 0x1B9164B9: PQconnectStart (in /usr/lib/libpq.so.3.1)
==3290== by 0x1B916441: PQconnectdb (in /usr/lib/libpq.so.3.1)


With the IP in the host field:
fubar:~# while true; do ps aux|grep -v grep|grep test; sleep 30; done
root 3312 1.4 0.2 3872 1092 pts/3 S+ 10:42 0:00 ./test
root 3312 1.6 0.2 3872 1092 pts/3 S+ 10:42 0:00 ./test
root 3312 1.9 0.2 3872 1092 pts/3 S+ 10:42 0:01 ./test
root 3312 2.0 0.2 3872 1092 pts/3 S+ 10:42 0:01 ./test
root 3312 2.0 0.2 3872 1092 pts/3 S+ 10:42 0:02 ./test

output:
called PQconnectdb: 0x804a008
called PQexec: 0x80525b8
calling PQclear: 0x80525b8
calling PQfinish: 0x804a008
....
called PQconnectdb: 0x804a008
called PQexec: 0x80525b8
calling PQclear: 0x80525b8
calling PQfinish: 0x804a008
....
called PQconnectdb: 0x804a008
called PQexec: 0x80525b8
calling PQclear: 0x80525b8
calling PQfinish: 0x804a008
....

and no leak reports from valgrind.

Best regards,
Karsten Desler

[Attachment: test.c]

#include <postgresql/libpq-fe.h>
#include <stdio.h>

// Build: gcc -o test test.c -lpq
// (-lpq goes after the source file so the linker can resolve libpq symbols)

static const char *conninfo = "host=xxx.xxx.xx.xx port=5432 dbname=xxx user=xxx password=xxx connect_timeout=30";
// static const char *conninfo = "host=db.xxx.de port=5432 dbname=xxx user=xxx password=xxx connect_timeout=30";

static int fetch_from_database(void)
{
    PGconn *conn;
    PGresult *res;
    char sql[32768];
    int c_username, c_ip;
    int ret = -1;
    int max, i;

    conn = PQconnectdb(conninfo);
    printf("called PQconnectdb: %p\n", (void *)conn);
    // PQconnectdb() returns NULL only when it cannot allocate the PGconn;
    // a failed connection attempt has to be detected with PQstatus().
    if (conn == NULL || PQstatus(conn) != CONNECTION_OK) {
        printf("connection failed: %s",
               conn ? PQerrorMessage(conn) : "out of memory\n");
        goto out_finish;
    }

    snprintf(sql, sizeof(sql), "SELECT username,ip FROM extras WHERE server=(SELECT id FROM servers WHERE IP='xxx.xxx.xx.xx') ORDER BY username");
    res = PQexec(conn, sql);
    printf("called PQexec: %p\n", (void *)res);
    if (PQresultStatus(res) != PGRES_TUPLES_OK) {
        printf("couldn't get data\n");
        goto out;
    }

    max = PQntuples(res);
    if (max == 0) {
        ret = 0;
        goto out;
    }

    if (max > 64) {
        printf("too many results\n");
        goto out;
    }

    c_username = PQfnumber(res, "username");
    c_ip = PQfnumber(res, "ip");

    if (c_username == -1 || c_ip == -1) {
        printf("weird table structure found\n");
        goto out;
    }

    // Touch every value; the data itself is not needed for the leak test.
    for (i = 0; i < max; i++) {
        (void)PQgetvalue(res, i, c_username);
        (void)PQgetvalue(res, i, c_ip);
    }

    ret = 0;
out:
    printf("calling PQclear: %p\n", (void *)res);
    PQclear(res);           // frees the result (a NULL result is ignored)
out_finish:
    printf("calling PQfinish: %p\n", (void *)conn);
    PQfinish(conn);         // frees the PGconn, also after a failed connection

    return ret;
}

int main(void)
{
    // Connect, query and disconnect repeatedly until a call fails,
    // so that any per-connection leak shows up as steady growth.
    while (fetch_from_database() == 0)
        ;
    return 0;
}

#8
Tom Lane
Re: [BUGS] BUG #1736: endless loop in PQconnectdb - 07-03-2005, 10:59 AM

Karsten Desler <kdesler (AT) soohrt (DOT) org> writes:
> * Bruce Momjian wrote:
>> I think what you are seeing is that the getaddrinfo memory is placed in
>> the PGconn structure that isn't freed until PQclear is called. Does
>> your test call PQclear()?
>
> s/PQclear/PQfinish/
> It does call PQclear on the result, and PQfinish on the connection.

In that case I think there is no doubt that you've found a bug in
getaddrinfo/freeaddrinfo, and you ought to be reporting it to your
libc provider. We do call freeaddrinfo on the result of getaddrinfo,
so if not everything is cleaned up, that's a library bug, not ours.

You could check this by reducing the test case to getaddrinfo()
then freeaddrinfo() using the same parameters that fe-connect.c
passes.
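The suggested reduction can be sketched as a small loop that calls getaddrinfo() and immediately freeaddrinfo(), to be run under valgrind --leak-check=full. The hints used here (AF_UNSPEC, SOCK_STREAM) are an assumption about what fe-connect.c passes, not copied from it; check the libpq source before drawing conclusions. lookup_loop() is a hypothetical name.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical repro helper: resolve host:port `iterations` times,
 * freeing every result. ASSUMPTION: these hints approximate the ones
 * libpq's fe-connect.c uses; they are not taken from its source. */
static int lookup_loop(const char *host, const char *port, int iterations)
{
    struct addrinfo hints;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    for (int i = 0; i < iterations; i++) {
        struct addrinfo *res = NULL;
        int rc = getaddrinfo(host, port, &hints, &res);
        if (rc != 0) {
            fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
            return -1;
        }
        /* Per the API contract, this releases everything getaddrinfo allocated. */
        freeaddrinfo(res);
    }
    return 0;
}
```

Called from a small main() with the server's host name, port "5432" and a few thousand iterations, any "definitely lost" blocks that valgrind still attributes to getaddrinfo would point at the C library rather than at libpq.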

regards, tom lane

#9
Karsten Desler
Re: [BUGS] BUG #1736: endless loop in PQconnectdb - 07-04-2005, 08:14 AM

* Tom Lane wrote:
> Karsten Desler <kdesler (AT) soohrt (DOT) org> writes:
>> * Bruce Momjian wrote:
>>> I think what you are seeing is that the getaddrinfo memory is placed in
>>> the PGconn structure that isn't freed until PQclear is called. Does
>>> your test call PQclear()?
>>
>> s/PQclear/PQfinish/
>> It does call PQclear on the result, and PQfinish on the connection.
>
> In that case I think there is no doubt that you've found a bug in
> getaddrinfo/freeaddrinfo, and you ought to be reporting it to your
> libc provider. We do call freeaddrinfo on the result of getaddrinfo,
> so if not everything is cleaned up, that's a library bug, not ours.
>
> You could check this by reducing the test case to getaddrinfo()
> then freeaddrinfo() using the same parameters that fe-connect.c
> passes.

Indeed. Sorry for the noise.
The GNU libc 2.3.2 leaks ai->ai_canonname for every struct addrinfo
in the result list.
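To make the affected structure concrete: getaddrinfo() returns a linked list of struct addrinfo nodes chained through ai_next, each carrying an ai_canonname pointer, and freeaddrinfo() is expected to release all of it. The sketch below only walks such a list and counts nodes and non-NULL canonical names; count_entries() is a hypothetical helper. Per POSIX, when AI_CANONNAME is requested the canonical name is reported in the first node.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve `host` with the given ai_flags, then count
 * the nodes in the result list and how many have ai_canonname set.
 * Returns the node count, or -1 if getaddrinfo() fails. */
static int count_entries(const char *host, int flags, int *with_canon)
{
    struct addrinfo hints, *res = NULL, *ai;
    int n = 0;

    *with_canon = 0;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = flags;

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        n++;
        if (ai->ai_canonname != NULL)
            (*with_canon)++;
    }

    /* freeaddrinfo() must free every node and every ai_canonname string;
     * per the report above, glibc 2.3.2 left the canonname strings behind. */
    freeaddrinfo(res);
    return n;
}
```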

The original problem hasn't happened again (it seems the faulty Ethernet
switch that caused the flaky connection was finally replaced). Anyway, if
it happens again, I'll notify you.

Regards,
Karsten Desler
