#1
Hi all,

In the last couple of days, we've been bitten (a couple of times, on different servers) by an apparent glitch or bad interaction in the Windows implementation of rename().

The relevant log message is:

[2006-04-17 16:49:22.583 ] 2252 LOG: could not rename file "pg_xlog/000000010000010A000000BD" to "pg_xlog/000000010000010A000000D7", continuing to try

It apparently just keeps on looping indefinitely. The "completed rename" message from port/dirmod.c never shows up.

Shortly thereafter, Postgres becomes unresponsive. Attempts to make a new connection just block. Autovacuums block. A "pg_ctl ... stop -m fast" doesn't work. Only "pg_ctl ... stop -m immediate" does.

With the last occurrence, I saved off the output of "handle -a" and "pslist -x" in case that's helpful.

Any thoughts on what might be going wrong? If it happens again, what other clues should I be looking for?
#4
Bruce Momjian <pgman (AT) candle (DOT) pha.pa.us> 04/18/06 2:58 am
Yes, the comment I added to dirmod.c gives a hint:
#6
-----Original Message-----
From: Peter Brant [mailto:Peter.Brant (AT) wicourts (DOT) gov]
Sent: Tuesday, April 18, 2006 4:15 PM
To: Bruce Momjian; Qingqing Zhou <zhouqq (AT) cs (DOT) toronto.edu>; Magnus Hagander <mha (AT) sollentuna (DOT) net>
Cc: pgsql-bugs (AT) postgresql (DOT) org
Subject: Re: [BUGS] [Win32] Problem with rename()

Unfortunately, it's not that simple. It would be straightforward to track down if it were.

In response to other questions:

It's Postgres 8.1.3 running on Windows 2003 Server. No anti-virus software is installed. The servers are essentially bare except for the OS and Postgres.

We have "handle -a" output from two occurrences (different servers):

For the first one:

LOG: could not rename file "pg_xlog/000000010000010A000000BD" to "pg_xlog/000000010000010A000000D7", continuing to try

Only one process (postgres.exe) is holding a handle to pg_xlog/000000010000010A000000BD:

    F84: Event \BaseNamedObjects\pgident: postgres: bigbird bigbird 127.0.0.1(3306) BIND
    FF4: File G:\pgsql\data\pg_xlog\000000010000010A000000BD

Nothing has the target file open.

The second is similar, except that two postgres.exe processes (and nothing else) have the file open:

LOG: could not rename file "pg_xlog/000000010000010A0000006E" to "pg_xlog/000000010000010A00000087", continuing to try

    Process #1:
    F84: Event \BaseNamedObjects\pgident: postgres: bigbird bigbird 127.0.0.1(2367) SELECT
    EFC: File G:\pgsql\data\pg_xlog\000000010000010A0000006E

    Process #2:
    F84: Event \BaseNamedObjects\pgident: postgres: bigbird bigbird 127.0.0.1(2420) SELECT
    FF4: File G:\pgsql\data\pg_xlog\000000010000010A0000006E

Nothing has the target file open.

Pete

Bruce Momjian <pgman (AT) candle (DOT) pha.pa.us> 04/18/06 2:58 am
Yes, the comment I added to dirmod.c gives a hint:

    /*
     * We need these loops because even though PostgreSQL uses flags that
     * allow rename while the file is open, other applications might have
     * these files open without those flags.
     */

so someone else has the file opened, but didn't use the required flags.

As to what could have it open, I don't know.
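For anyone following along, the symptom in the log (the same "could not rename" line, then nothing) is what an unbounded retry loop looks like when the rename never succeeds. Below is a minimal sketch of a bounded variant in portable C. It is not the code in port/dirmod.c; the function name `rename_with_retry` and the `max_attempts` parameter are made up for illustration, and it does not sleep between attempts, which a real loop presumably would.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL's rename wrapper): try rename() up to
 * max_attempts times. Returns the attempt number that succeeded, or -1 if
 * every attempt failed, so the caller can give up instead of spinning.
 */
static int
rename_with_retry(const char *from, const char *to, int max_attempts)
{
    int attempt;

    assert(max_attempts > 0);

    for (attempt = 1; attempt <= max_attempts; attempt++)
    {
        if (rename(from, to) == 0)
            return attempt;

        /* Log every failure with the OS error, not just the first one. */
        fprintf(stderr,
                "could not rename file \"%s\" to \"%s\" (attempt %d of %d): %s\n",
                from, to, attempt, max_attempts, strerror(errno));
    }
    return -1;
}
```

Logging the error string on each attempt would also answer the "what other clues" question, since a sharing violation and an access-denied error point at different causes.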
#7
LOG: could not rename file "pg_xlog/000000010000010A000000BD" to "pg_xlog/000000010000010A000000D7", continuing to try
...
Only one process (postgres.exe) is holding a handle to pg_xlog/000000010000010A000000BD:
...
The second is similar, except that two postgres.exe processes (and nothing else) have the file open:
#8
Looking at our code, we have the comment:

    /* These flags allow concurrent rename/unlink */
    (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE),

But I'm not sure that those flags actually guarantee that. They do allow concurrent unlink, but not necessarily rename. I read elsewhere that it should work, but can't find backing docs on MSDN. Seems it works in most cases, but perhaps there are some where it doesn't?
#9
Looking at our code, we have the comment:

    /* These flags allow concurrent rename/unlink */
    (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE),

But I'm not sure that those flags actually guarantee that. They do allow concurrent unlink, but not necessarily rename. I read elsewhere that it should work, but can't find backing docs on MSDN. Seems it works in most cases, but perhaps there are some where it doesn't?

I think there are two different cases involved in rename:

1. Someone has a handle for the file-to-be-renamed;
2. Someone has a handle for the file that is to be deleted (i.e. currently has the name being renamed to).

If #2 doesn't work then we've got serious problems. I think though that #1 can only occur in the context of WAL segment recycling, so we can probably work around it if that doesn't work.
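The two cases can be checked separately with a small probe. This is only a sketch in portable C: `probe_rename` and the `held_file` enum are hypothetical names, and it uses plain fopen() and rename(). On a POSIX system both cases are expected to succeed. On Windows a faithful test would need to open the file with the share flags quoted above and use a rename call that replaces an existing target, since the C runtime's rename() refuses to overwrite one; that part is not shown here.

```c
#include <assert.h>
#include <stdio.h>

/* Which file is held open while the rename is attempted (hypothetical names). */
enum held_file { HOLD_SOURCE, HOLD_TARGET };

/* Create a small file; returns 0 on success. */
static int
write_file(const char *path)
{
    FILE *f = fopen(path, "w");

    if (f == NULL)
        return -1;
    fputs("x", f);
    return fclose(f);
}

/*
 * Returns 1 if rename(src, dst) succeeded while the chosen file was open,
 * 0 if the rename was refused, and -1 if the probe could not be set up.
 */
static int
probe_rename(const char *src, const char *dst, enum held_file which)
{
    FILE *held;
    int   renamed;

    assert(src != NULL && dst != NULL);

    if (write_file(src) != 0 || write_file(dst) != 0)
        return -1;

    /* Case 1 holds the file being renamed; case 2 holds the file being replaced. */
    held = fopen(which == HOLD_SOURCE ? src : dst, "r");
    if (held == NULL)
        return -1;

    renamed = (rename(src, dst) == 0);

    fclose(held);
    remove(src);                /* still present only if the rename failed */
    remove(dst);
    return renamed;
}
```

Running it once with HOLD_SOURCE and once with HOLD_TARGET shows which of the two cases, if either, the platform refuses.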
#10
"Harald Armin Massa" <haraldarminmassa (AT) gmail (DOT) com> 04/18/06 4:35 pm
"G" - is that really a LOCAL drive at that server, or rather some NAS