Re: [BUGS] BUG #2712: could not fsync segment: Permission

Forum: mailing.database.pgsql-bugs

#1 Peter Brant - 10-23-2006, 12:05 PM

The same problem exists in 8.1 too. See this thread

http://archives.postgresql.org/pgsql...4/msg00177.php

Tom and Magnus tracked down a cause, but I don't think a fix was ever
implemented.

FWIW, we were bitten by the fsync problem which you noticed too.
Unfortunately we were never able to track down a cause (see the mailing
list archives). They are separate problems though.

Pete

Quote:
"Thomas H." <me (AT) alternize (DOT) com> 23.10.2006 18:21
there is definitely something terribly wrong in the windows 8.2b1 regarding
file access/locking. 2nd total db lockup today due to file access locks
(all held by postmaster):

2006-10-23 17:48:10 LOCATION: exec_simple_query, postgres.c:1007
2006-10-23 17:48:14 LOG: 00000: could not rename file
"pg_xlog/00000001000000040000002E" to
"pg_xlog/000000010000000400000037", continuing to try


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq


#2 Tom Lane - 10-23-2006, 12:50 PM

"Peter Brant" <Peter.Brant (AT) wicourts (DOT) gov> writes:
Quote:
The same problem exists in 8.1 too. See this thread
http://archives.postgresql.org/pgsql...4/msg00177.php
Tom and Magnus tracked down a cause, but I don't think a fix was ever
implemented.

Thomas seems to have two different issues there: the "could not rename
file" problem on the pg_xlog file is probably explained by the mechanism
we identified back then (and I'm not sure why no fix has been
installed), but there is no known reason other than antivirus software
for the "could not fsync" problem.

As for fixing the problem we do understand: ISTM it's just an awful idea
for pgrename and pgunlink to be willing to loop forever. I think they
should time out and report the failure after some reasonable period
(say between 10 sec and a minute).
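
The bounded-retry idea above can be sketched as follows. This is an illustrative Python sketch, not PostgreSQL's C code for pgrename: the function name, the 30-second default (picked from within the 10 to 60 second range suggested above) and the retry interval are all assumptions.

```python
import os
import time


def rename_with_timeout(src, dst, timeout=30.0, interval=0.1):
    """Retry os.replace(src, dst) until it succeeds or `timeout` seconds pass.

    Returns True on success and False once the deadline has passed, so the
    caller can report the failure instead of looping forever.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.replace(src, dst)
            return True
        except PermissionError:
            # Typically raised on Windows while another process still
            # holds the file open; keep retrying until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Any error other than a permission failure (for example a missing source file) is deliberately not retried and propagates to the caller.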

If we simply made that change, then the behavior when there's an idle
backend sitting on a filehandle for an old xlog segment would be that
checkpoints would fail at the MoveOfflineLogs stage, which would not
be fatal, but it'd be annoying. We'd probably want to further tweak
InstallXLogFileSegment so that rename failure isn't an ERROR, at least
not on Windows. (I think we could just make it return false, which'd
cause the caller to try to delete the xlog segment, which should work
even though rename doesn't.)
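
A rough sketch of that control flow, again in Python rather than the real C code: `install_segment` and `recycle_or_remove` are hypothetical stand-ins for InstallXLogFileSegment and its caller, and the sketch simply assumes (as suggested above) that the delete succeeds where the rename does not.

```python
import os


def install_segment(old_path, new_path):
    """Rename a segment into place; return False instead of raising on failure."""
    try:
        os.replace(old_path, new_path)
        return True
    except OSError:
        return False


def recycle_or_remove(old_path, new_path):
    """Recycle old_path as new_path, or delete it if the rename fails."""
    if install_segment(old_path, new_path):
        return "recycled"
    # Fallback: the segment could not be reused, so drop it.
    os.remove(old_path)
    return "removed"
```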

I'm not in a position to test this though. Magnus or Bruce?

regards, tom lane

#3 Thomas H. - 10-23-2006, 01:01 PM

Quote:
The same problem exists in 8.1 too. See this thread

it's only appearing in 8.2 here, i've just rechecked our logs...
is there any workaround? how did you get around that problem of having a
total lockdown?

thanks,
thomas

#4 Peter Brant - 10-23-2006, 02:01 PM

That might be one cause (or it might otherwise exacerbate the problem),
but it isn't the only cause. We weren't running anti-virus software and
neither is Thomas. Unfortunately with the last go around, we
collectively ran out of ideas before an underlying cause could be
identified.

Pete

Quote:
Tom Lane <tgl (AT) sss (DOT) pgh.pa.us> 23.10.2006 19:49
installed), but there is no known reason other than antivirus software
for the "could not fsync" problem.


#5 Peter Brant - 10-23-2006, 02:26 PM

Move to Linux. :-) In our case, everything but the database servers
were already Linux so it was an easy choice. Things have been rock
solid since then.

Once things get stuck, I don't think there is an alternative besides
"pg_ctl stop -m immediate". However, since the problem is caused by an idle
backend holding onto an old WAL segment, maybe having your middle
tier/connection pool close and reopen the connections to the database
every so often would function as a workaround. Somebody with more
knowledge of PG internals than I would have to define "every so often"
though (if the idea is viable at all).
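
A minimal sketch of that workaround, assuming a middle tier written in Python. The class name, the `connect` factory and the 300-second default are hypothetical; nothing in this thread establishes what a safe interval would be.

```python
import time


class RecyclingConnection:
    """Hold one connection and reopen it once it is older than max_age seconds."""

    def __init__(self, connect, max_age=300.0, clock=time.monotonic):
        self._connect = connect    # hypothetical factory; result must have .close()
        self._max_age = max_age    # placeholder value, not a tested recommendation
        self._clock = clock
        self._conn = None
        self._opened_at = 0.0

    def get(self):
        """Return a live connection, replacing it first if it is too old."""
        now = self._clock()
        if self._conn is not None and now - self._opened_at >= self._max_age:
            # Closing ends the backend, which releases any file handles
            # it was still holding.
            self._conn.close()
            self._conn = None
        if self._conn is None:
            self._conn = self._connect()
            self._opened_at = now
        return self._conn
```

Callers would ask for `get()` before each unit of work rather than caching the connection object themselves.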

Pete

Quote:
"Thomas H." <me (AT) alternize (DOT) com> 23.10.2006 20:00
is there any workaround? how did you get around that problem of having a
total lockdown?


#6 Tom Lane - 10-23-2006, 02:48 PM

"Peter Brant" <Peter.Brant (AT) wicourts (DOT) gov> writes:
Quote:
FWIW, we were bitten by the fsync problem which you noticed too.
Unfortunately we were never able to track down a cause (see the mailing
list archives). They are separate problems though.

Actually, now that I look back in the archives, I think we had theorized
that the fsync errors come from attempting to fsync a file that's
already been deleted but some backend still has a reference to.
Apparently that leads to EACCES instead of ENOENT (which the code is
already prepared to expect).

http://archives.postgresql.org/pgsql...4/msg00215.php
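
To illustrate the theory, here is a Python sketch (not the PostgreSQL source) of an fsync wrapper that already tolerates ENOENT and can optionally treat EACCES the same way. The function name and the `eacces_means_deleted` flag are hypothetical.

```python
import errno
import os


def fsync_if_present(path, eacces_means_deleted=False):
    """fsync `path`; return False if the file appears to be deleted already."""
    try:
        fd = os.open(path, os.O_RDWR)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
    except OSError as exc:
        gone = {errno.ENOENT}
        if eacces_means_deleted:
            # Assumption from the theory above: a deleted file that is
            # still held open elsewhere reports EACCES, not ENOENT.
            gone.add(errno.EACCES)
        if exc.errno in gone:
            return False
        raise
    return True
```

The flag is off by default because ignoring EACCES everywhere would also hide genuine permission problems.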

regards, tom lane

#7 Thomas H. - 10-23-2006, 03:51 PM

Quote:
Actually, now that I look back in the archives, I think we had theorized
that the fsync errors come from attempting to fsync a file that's
already been deleted but some backend still has a reference to.
Apparently that leads to EACCES instead of ENOENT (which the code is
already prepared to expect).

with process explorer i can actually check which postgres.exe instance (in
all cases i've checked it's just 1 instance, and always just 1 file) holds
the lock for the file in question. but will that help in determining why it
is still holding a reference?
the postgres instance that holds the lock eventually closes the filehandle
after some minutes. the process itself is not killed but continues
thereafter.

let me know if i can be of any assistance. since we regularly reindex one
table whose index size keeps growing despite frequent vacuuming, the
fsync problem happens about 4-5 times per hour.

regards,
thomas



#8 Tom Lane - 10-23-2006, 04:00 PM

"Thomas H." <me (AT) alternize (DOT) com> writes:
Quote:
with process explorer i can actually check which postgres.exe instance (in
all cases i've checked it's just 1 instance, and always just 1 file) holds
the lock for the file in question.

So which one is it?

Quote:
the postgres instance that holds the lock eventually closes the filehandle
after some minutes. the process itself is not killed but continues
thereafter.

That sounds a bit like what I'd expect the bgwriter to do, but the
bgwriter is also the one trying to issue the fsync.

regards, tom lane

#9 Thomas H. - 10-23-2006, 04:09 PM

Quote:
with process explorer i can actually check which postgres.exe instance (in
all cases i've checked it's just 1 instance, and always just 1 file) holds
the lock for the file in question.

So which one is it?

it's always one of the db-"slaves" and not "logger process", "stats
collector process" or "writer process":

right now it's PID 4844 ("\BaseNamedObjects\pgident: postgres: db_outnow
outnow1 127.0.0.1(2122) idle") that tries to write
"D:\DB\PostgreSQL-8.2\data\base\3964774\6422331"

can i somehow check what object that file-OID belong(ed/s) to?

- thomas



#10 Tom Lane - 10-23-2006, 04:22 PM

"Thomas H." <me (AT) alternize (DOT) com> writes:
Quote:
right now it's PID 4844 ("\BaseNamedObjects\pgident: postgres: db_outnow
outnow1 127.0.0.1(2122) idle") that tries to write
"D:\DB\PostgreSQL-8.2\data\base\3964774\6422331"

Do you actually mean it's trying to write that file? Or is it just
sitting there holding the open filehandle?

Quote:
can i somehow check what object that file-OID belong(ed/s) to?

You can check in pg_class.relfilenode and pg_class.oid of that database
to see if you get a match. But our theory is that this table has been
deleted ...
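
As a sketch of that lookup: data files live under base/<database OID>/<file name>, so the path quoted above can be split into the two numbers to search for. The Python helpers below are hypothetical conveniences that only build the query text; the second query has to be run while connected to the database named by the first. If the relation really has been deleted, finding no matching row is the expected result.

```python
import re


def parse_relation_path(path):
    """Split '...base/<database OID>/<file name>' into (db_oid, filenode)."""
    match = re.search(r"base[\\/](\d+)[\\/](\d+)", path)
    if match is None:
        raise ValueError(f"not a base/<db>/<file> path: {path!r}")
    return int(match.group(1)), int(match.group(2))


def lookup_queries(path):
    """Return (database lookup SQL, relation lookup SQL) for a data file path."""
    db_oid, filenode = parse_relation_path(path)
    return (
        f"SELECT datname FROM pg_database WHERE oid = {db_oid};",
        "SELECT oid, relname, relkind FROM pg_class "
        f"WHERE relfilenode = {filenode} OR oid = {filenode};",
    )
```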

regards, tom lane
