dbTalk Databases Forums  

[BUGS] strange disappearence of postgres file

mailing.database.pgsql-bugs mailing.database.pgsql-bugs


Discuss [BUGS] strange disappearence of postgres file in the mailing.database.pgsql-bugs forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Harald Armin Massa
 
Posts: n/a

Default [BUGS] strange disappearence of postgres file - 11-21-2005 , 08:41 AM






------=_Part_26006_25833118.1132584037478
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

I ran into a very strange disappearance of a postgresql data file.

The environment:

Windows XP professional

select version();
PostgreSQL 8.0.1 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC)
3.4.2(mingw-special)

All datafiles are within c:\ghum\data

that part of the harddrive is only accessable to user account postgres (=3D
the one running the service) and Administrator.

the logentries:

postgresql-2005-11-21_080758.log
2005-11-21 08:08:05 LOG: startup process (PID 1220) exited with unexpected
status 128
2005-11-21 08:08:05 LOG: aborting startup due to startup process failure
2005-11-21 08:08:06 LOG: logger shutting down

<file://///Svs00004/K__Anw_Allg/massa/haase/pglog/postgresql-2005-11-21_084=
022.log>
postgresql-2005-11-21_084022.log
2005-11-21 08:40:24 LOG: database system was shut down at 2005-11-20
16:46:29 Westeurop=E4ische Normalzeit
2005-11-21 08:40:24 LOG: checkpoint record is at 4/2C880BC0
2005-11-21 08:40:24 LOG: redo record is at 4/2C880BC0; undo record is at
0/0; shutdown TRUE
2005-11-21 08:40:24 LOG: next transaction ID: 421189; next OID: 381050
2005-11-21 08:40:24 LOG: database system is ready
2005-11-21 09:45:50 ERROR: could not open relation 1663/17253/43471: No suc=
h
file or directory
2005-11-21 09:45:52 ERROR: could not open relation 1663/17253/43471: No suc=
h
file or directory
2005-11-21 09:45:59 ERROR: could not open relation 1663/17253/43471: No suc=
h
file or directory
2005-11-21 09:46:09 ERROR: could not open relation 1663/17253/43471: No suc=
h
file or directory
2005-11-21 09:46:30 ERROR: could not open relation 1663/17253/43471: No suc=
h
file or directory

.... and on and on and on...

My checking showed that... within directory data/base/17253 there is indeed
no file named 43471; but ones named 43470,43472,43473 .....

Additional Information: that is a laptop-computer, and it is quite possible
that it was restarted between 8:08 and 8:40 this morning.

But ... I see no line mentioning a "killing of a file". Can I do something
more to find out WHY this file disappeared? (and esp. make sure that no mor=
e
files disappear )

Harald


--
GHUM Harald Massa
persuasion python postgresql
Harald Armin Massa
Reinsburgstra=DFe 202b
70197 Stuttgart
0173/9409607

------=_Part_26006_25833118.1132584037478
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

I ran into a very strange disappearance of a postgresql data file.<br><br>T=
he environment:<br><br>Windows XP professional<br><br>select version();<br>=
&nbsp;PostgreSQL 8.0.1 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.=
4.2
(mingw-special)<br><br>All datafiles are within c:\ghum\data<br><br>that p=
art of the harddrive is only accessable to user account postgres (=3D the o=
ne running the service) and Administrator.<br><br>the logentries:<br><br>
postgresql-2005-11-21_080758.log<br>2005-11-21 08:08:05 LOG:&nbsp; startup =
process (PID 1220) exited with unexpected status 128<br>2005-11-21 08:08:05=
LOG:&nbsp; aborting startup due to startup process failure<br>2005-11-21 0=
8:08:06 LOG:&nbsp; logger shutting down
<br><br><a href=3D"file://///Svs00004/K__Anw_Allg/massa/haase/pglog/postgre=
sql-2005-11-21_084022.log"></a>postgresql-2005-11-21_084022.log<br>2005-11-=
21 08:40:24 LOG:&nbsp; database system was shut down at 2005-11-20 16:46:29=
Westeurop=E4ische Normalzeit
<br>2005-11-21 08:40:24 LOG:&nbsp; checkpoint record is at 4/2C880BC0<br>20=
05-11-21 08:40:24 LOG:&nbsp; redo record is at 4/2C880BC0; undo record is a=
t 0/0; shutdown TRUE<br>2005-11-21 08:40:24 LOG:&nbsp; next transaction ID:=
421189; next OID: 381050
<br>2005-11-21 08:40:24 LOG:&nbsp; database system is ready<br>2005-11-21 0=
9:45:50 ERROR:&nbsp; could not open relation 1663/17253/43471: No such file=
or directory<br>2005-11-21 09:45:52 ERROR:&nbsp; could not open relation 1=
663/17253/43471: No such file or directory
<br>2005-11-21 09:45:59 ERROR:&nbsp; could not open relation 1663/17253/434=
71: No such file or directory<br>2005-11-21 09:46:09 ERROR:&nbsp; could not=
open relation 1663/17253/43471: No such file or directory<br>2005-11-21 09=
:46:30 ERROR:&nbsp; could not open relation 1663/17253/43471: No such file =
or directory
<br><br>... and on and on and on...<br><br>My checking showed that... withi=
n directory data/base/17253 there is indeed no file named 43471; but ones n=
amed 43470,43472,43473 .....<br><br>Additional Information: that is a lapto=
p-computer, and it is quite possible that it was restarted between 8:08 and=
8:40 this morning.
<br><br>But ... I see no line mentioning a &quot;killing of a file&quot;. C=
an I do something more to find out WHY this file disappeared?&nbsp; (and es=
p. make sure that no more files disappear )<br><br>Harald<br><br clear=
=3D"all">
<br>-- <br>GHUM Harald Massa<br>persuasion python postgresql<br>Harald Armi=
n Massa<br>Reinsburgstra=DFe 202b<br>70197 Stuttgart<br>0173/9409607

------=_Part_26006_25833118.1132584037478--

Reply With Quote
  #2  
Old   
Qingqing Zhou
 
Posts: n/a

Default Re: [BUGS] strange disappearence of postgres file - 11-22-2005 , 07:05 PM







"Harald Armin Massa" <haraldarminmassa (AT) gmail (DOT) com> wrote
Quote:
2005-11-21 09:45:50 ERROR: could not open relation 1663/17253/43471: No
such
file or directory

My checking showed that... within directory data/base/17253 there is
indeed
no file named 43471; but ones named 43470,43472,43473 .....

This may be not a direct step to solve the problem, but let's get some
feelings first:

Is 43471 a temp table? To find out this: "select relname, nspname from
pg_class, pg_namespace where pg_namespace.oid=pg_class.relnamespace and
relfilenode = 43471;"

Regards,
Qingqing



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match


Reply With Quote
  #3  
Old   
Harald Armin Massa
 
Posts: n/a

Default Re: [BUGS] strange disappearence of postgres file - 11-23-2005 , 01:25 AM



------=_Part_318_13305032.1132730692237
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Quingqing,

no, it is definitely NOT a temp table. I know which table it is, because:

that "could not open relation ..." error message was exactly provokable wit=
h
"select * from repofeld", which is one of my tables in that database.

(And that was the way it was passed on to me: my application put that error
on screen)


Harald

--
GHUM Harald Massa
persuasion python postgresql
Harald Armin Massa
Reinsburgstra=DFe 202b
70197 Stuttgart
0173/9409607

------=_Part_318_13305032.1132730692237
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Quingqing,<br>
<br>
no, it is definitely NOT a temp table. I know which table it is, because:<b=
r>
<br>
that &quot;could not open relation ...&quot; error message was exactly prov=
okable
with &quot;select * from repofeld&quot;, which is one of my tables in that
database.<br>
<br>
(And that was the way it was passed on to me: my application put that error=
on screen)<br>
<br>
<br>
Harald<br>
<br>-- <br>GHUM Harald Massa<br>persuasion python postgresql<br>Harald Armi=
n Massa<br>Reinsburgstra=DFe 202b<br>70197 Stuttgart<br>0173/9409607

------=_Part_318_13305032.1132730692237--

Reply With Quote
  #4  
Old   
Qingqing Zhou
 
Posts: n/a

Default Re: [BUGS] strange disappearence of postgres file - 11-23-2005 , 02:09 AM




On Wed, 23 Nov 2005, Harald Armin Massa wrote:
Quote:
no, it is definitely NOT a temp table. I know which table it is,
because: that "could not open relation ..." error message was exactly
provokable with "select * from repofeld", which is one of my tables in
that database.

What is your file system, NTFS or FAT32? Is that table newly created?

So there is a valid record in pg_class but the representing data file is
lost ... a possible theory of what's happened could be:

1) create the table;
2) a checkpoint happens;
3) lost power;
[ restar the machine and database ]
4) file system recovery - unable to recovery your data file;
5) database recovery - don't play WAL and recreate your data file because
of the checkpoint;

Is that possible?

Regards,
Qingqing

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
message can get through to the mailing list cleanly


Reply With Quote
  #5  
Old   
Harald Armin Massa
 
Posts: n/a

Default Re: [BUGS] strange disappearence of postgres file - 11-23-2005 , 03:31 AM



------=_Part_779_19106343.1132738172885
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Hello QuingQing,

Quote:
What is your file system, NTFS or FAT32? Is that table newly created?

File System is NTFS. That table was created with database installation,
which was short after after release of PostgreSQL 8.0 - so that database wa=
s
in use for > 9 months. That table is a central table of the application, an=
d
is used very often ... the application was running for the whole 9 months :=
)

So there is a valid record in pg_class but the representing data file is
Quote:
lost ... a possible theory of what's happened could be:

1) create the table;
2) a checkpoint happens;
3) lost power;
[ restar the machine and database ]
4) file system recovery - unable to recovery your data file;
5) database recovery - don't play WAL and recreate your data file because
of the checkpoint;

Is that possible?
Quote:
Rather not ... that table was there from the beginning and for some months.
(it gets created with initial database install)

The only strange thing is that short 3 lines log before the big error. As t=
o
power loss: that is a laptop; so "loosing power" would need to take out the
accumulator. Which is quite possible, but rather unlikely given the
technical level of the user.

postgresql-2005-11-21_080758.log
2005-11-21 08:08:05 LOG: startup process (PID 1220) exited with unexpected
status 128
2005-11-21 08:08:05 LOG: aborting startup due to startup process failure
2005-11-21 08:08:06 LOG: logger shutting down

postgresql-2005-11-21_084022.log
2005-11-21 08:40:24 LOG: database system was shut down at 2005-11-20
16:46:29 Westeurop=E4ische Normalzeit

--> can that exiting process with 128 be a sign of "system gets killed whil=
e
postgres is starting up"? (and taking a file with it????)

As much as I learned, PostgreSQL only APPENDS to the data files; and only a
vacuum full can truncate them ... and on that machine there is no vacuum
full happening.

It still keeps being a mystery.

Harald

--
GHUM Harald Massa
persuasion python postgresql
Harald Armin Massa
Reinsburgstra=DFe 202b
70197 Stuttgart
0173/9409607

------=_Part_779_19106343.1132738172885
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Hello QuingQing,<br><div><blockquote class=3D"gmail_quote" style=3D"border-=
left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left=
: 1ex;"><br>What is your file system, NTFS or FAT32? Is that table newly cr=
eated?
</blockquote><div><br></div><div>File System is NTFS. That table was create=
d with database installation, which was short after after release of Postgr=
eSQL 8.0 - so that database was in use for &gt; 9 months. That table is a c=
entral table of the application, and is used very often ... the application=
was running for the whole 9 months
<br><br></div><blockquote class=3D"gmail_quote" style=3D"border-left: 1px s=
olid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">So =
there is a valid record in pg_class but the representing data file is<br>lo=
st ... a possible theory of what's happened could be:
<br><br>1) create the table;<br>2) a checkpoint happens;<br>3) lost power;<=
br>[ restar the machine and database ]<br>4) file system recovery - unable =
to recovery your data file;<br>5) database recovery - don't play WAL and re=
create your data file because
<br>of the checkpoint;</blockquote><br><blockquote class=3D"gmail_quote" st=
yle=3D"border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex=
; padding-left: 1ex;">Is that possible?<br></blockquote></div>Rather not ..=
.. that table was there from the beginning and for some months. (it gets cre=
ated with initial database install)
<br><br>The only strange thing is that short 3 lines log before the big err=
or. As to power loss: that is a laptop; so &quot;loosing power&quot; would =
need to take out the accumulator. Which is quite possible, but rather unlik=
ely given the technical level of the user.
<br><br>postgresql-2005-11-21_080758<div id=3D"mb_0">.log<br>2005-11-21 08:=
08:05 LOG:&nbsp; startup process (PID 1220) exited with unexpected status 1=
28<br>2005-11-21 08:08:05 LOG:&nbsp; aborting startup due to startup proces=
s failure
<br>2005-11-21 08:08:06 LOG:&nbsp; logger shutting down
<br><br>postgresql-2005-11-21_084022.log<br>2005-11-21 08:40:24 LOG:&nbsp; =
database system was shut down at 2005-11-20 16:46:29 Westeurop=E4ische Norm=
alzeit
</div><br>--&gt; can that exiting process with 128 be a sign of &quot;syste=
m gets killed while postgres is starting up&quot;?&nbsp; (and taking a file=
with it????)<br><br>As much as I learned, PostgreSQL only APPENDS to the d=
ata files; and only a vacuum full can truncate them ... and on that machine=
there is no vacuum full happening.=20
<br><br>It still keeps being a mystery. <br><br>Harald<br clear=3D"all"><br=
Quote:
-- <br>GHUM Harald Massa<br>persuasion python postgresql<br>Harald Armin M=
assa<br>Reinsburgstra=DFe 202b<br>70197 Stuttgart<br>0173/9409607

------=_Part_779_19106343.1132738172885--


Reply With Quote
  #6  
Old   
Alvaro Herrera
 
Posts: n/a

Default Re: [BUGS] strange disappearence of postgres file - 11-23-2005 , 06:15 AM



Harald Armin Massa wrote:
Quote:
Hello QuingQing,

What is your file system, NTFS or FAT32? Is that table newly created?

File System is NTFS. That table was created with database installation,
which was short after after release of PostgreSQL 8.0 - so that database was
in use for > 9 months. That table is a central table of the application, and
is used very often ... the application was running for the whole 9 months
How many database-wide vacuums did you run during these 9 months? I'm
smelling transaction Id wraparound in pg_class or some other system
catalog.

This has been known to happen. Please see the archives.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings


Reply With Quote
  #7  
Old   
Harald Armin Massa
 
Posts: n/a

Default Re: [BUGS] strange disappearence of postgres file - 11-23-2005 , 07:11 AM



------=_Part_1500_2133182.1132751496000
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Alvaro,

How many database-wide vacuums did you run during these 9 months? I'm
Quote:
smelling transaction Id wraparound in pg_class or some other system
catalog.

This has been known to happen. Please see the archives.

"database wide" vacuums have been very very seldom; I fear the only one was
at initial db bulk load. at the moment I am on a way to upgrade to 8.1, wit=
h
autovacuum enabled ... so in the long run THAT possible reason should
vanish.

And I understand correctly that your suspicion is: file gets killed because
of transaction id wraparound in system table?

I will scan teh archives (as soon as postgresql.org provices access to them
again

Harald

--
GHUM Harald Massa
persuasion python postgresql
Harald Armin Massa
Reinsburgstra=DFe 202b
70197 Stuttgart
0173/9409607

------=_Part_1500_2133182.1132751496000
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Alvaro,<br><div><span class=3D"gmail_quote"></span><div><br></div><blockquo=
te class=3D"gmail_quote" style=3D"border-left: 1px solid rgb(204, 204, 204)=
; margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">How many database-wide vac=
uums did you run during these 9 months?&nbsp;&nbsp;I'm
<br>smelling transaction Id wraparound in pg_class or some other system<br>=
catalog.<br><br>This has been known to happen.&nbsp;&nbsp;Please see the ar=
chives.<br></blockquote></div><br>&quot;database wide&quot; vacuums have be=
en very very seldom; I fear the only one was at initial db bulk load. at th=
e moment I am on a way to upgrade to=20
8.1, with autovacuum enabled ... so in the long run THAT possible reason sh=
ould vanish.<br><br>And I understand correctly that your suspicion is: file=
gets killed because of transaction id wraparound in system table? <br>
<br>I will scan teh archives (as soon as <a href=3D"http://postgresql.org">=
postgresql.org</a> provices access to them again <br><br>Harald<br><br>--=
<br>GHUM Harald Massa<br>persuasion python postgresql<br>Harald Armin Mass=
a
<br>Reinsburgstra=DFe 202b<br>70197 Stuttgart<br>0173/9409607

------=_Part_1500_2133182.1132751496000--


Reply With Quote
  #8  
Old   
Alvaro Herrera
 
Posts: n/a

Default Re: [BUGS] strange disappearence of postgres file - 11-23-2005 , 04:43 PM



Harald Armin Massa wrote:
Quote:
Alvaro,

How many database-wide vacuums did you run during these 9 months? I'm
smelling transaction Id wraparound in pg_class or some other system
catalog.

"database wide" vacuums have been very very seldom; I fear the only one was
at initial db bulk load. at the moment I am on a way to upgrade to 8.1, with
autovacuum enabled ... so in the long run THAT possible reason should
vanish.

And I understand correctly that your suspicion is: file gets killed because
of transaction id wraparound in system table?
No, the file doesn't get killed, but an old tuple on the pg_class system
catalog magically appears because it was deleted and marked with a very
old Xmax, that after the wraparound is seen as future; and thus the
tuple that was deleted is suddenly visible. For this to happen you need
to have failed to vacuum pg_class in a very long time.

There may be other explanations, but we'd need more data, like how much
tuples for that table you got on pg_class and what do they look like.
Try something like

set enable_indexscan to off;
select xmin, xmax, relname from pg_class where relname = 'the table';

You can search the archives using http://www.pgsql.ru

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.