dbTalk Databases Forums  

[Info-ingres] Corrupted tables

comp.databases.ingres comp.databases.ingres


Discuss [Info-ingres] Corrupted tables in the comp.databases.ingres forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Rob McKenzie
 
Posts: n/a

Default [Info-ingres] Corrupted tables - 09-06-2005 , 09:33 AM






I've been luck enough to walk into, and inherit, an site that is running an unsupported version of Ingres. My first week here has handed me three major tables in the production system that are btree and all are having headaches. CA tech support was nice enough to advise me to run verifydb which will probably patch the tables, but we may loose data.

The vesion on Ingres is: Ingres Alpha VMS Version II 2.0/9808 (axp.vms/00)

The errors we are getting are as follows:
E_DM93A7_BAD_FILE_PAGE_ADDR Page 3832 in table a, owner: proddba, database: prodacw, has an incorrect page number: 4567.
Other page fields: page_stat 00000150,
page_log_address (00000000,00000000),
page_tran_id (0000000000000000).
Corrupted page cannot be read into the server cache.
E_DM9206_BM_BAD_PAGE_NUMBER Page number on page doesn't match its location.
E_DM920C_BM_BAD_FAULT_PAGE Error faulting a page.
E_DM9C83_DM0P_CACHEFIX_PAGE An error occurred while fixing a page in the buffer manager.

There are a few others, but they are all similar in nature to the above. I ran verifydb in report mode and it finds references to missing pages. What are peoples experiences with problems like this? Has verifydb in run mode fixed these problems without data loss and total corruption? I WILL begin working on upgrading Ingres, but I can't do it today, so in the interim I'd like to get the users running again.

Thanks,



Rob McKenzie
Phone: (905) 989-1750
Cell: (905) 715-9593
Email: rob.mckenzie (AT) rogers (DOT) com


Reply With Quote
  #2  
Old   
martin.bowes@ctsu.ox.ac.uk
 
Posts: n/a

Default Re: [Info-ingres] Corrupted tables - 09-06-2005 , 10:13 AM






Hi Rob,

I've tended to shy away from verifydb in run mode unless
specifically instructed by a CA Tech Rep.

And even when instructed to do so, I'd look at some
alternatives first - no good reason, I just scared easily. This looks like a
corrupted FMAP, FHDR.

So what I would try first is to use copydb to copy the data - in
ascii if possible (does the table contain binary columns(byte, byte
varying or long byte))- to disk. Then load it into a dummy database. See
what happens. Check for erros in the frontend and in the errlog.log.
Check the number of rows copyied out and the number loaded. Does
this agree with a select count(*) from a? Does verifydb in report mode
run clean on the new tables?

If so then I would suggest trying a modify on the tables in
question on the prod database. That should probably fix up the FMAP
and FHDRs.Check if a verifydb in report mode likes the modified tables.

Martin Bowes

On 6 Sep 2005 at 10:33, Rob McKenzie wrote:

Quote:
I've been luck enough to walk into, and inherit, an site that is
running an unsupported version of Ingres. My first week here has
handed me three major tables in the production system that are btree
and all are having headaches. CA tech support was nice enough to
advise me to run verifydb which will probably patch the tables, but we
may loose data.

The vesion on Ingres is: Ingres Alpha VMS Version II 2.0/9808
(axp.vms/00)

The errors we are getting are as follows:
E_DM93A7_BAD_FILE_PAGE_ADDR Page 3832 in table a, owner: proddba,
database: prodacw, has an incorrect page number: 4567. Other page
fields: page_stat 00000150, page_log_address (00000000,00000000),
page_tran_id (0000000000000000).
Corrupted page cannot be read into the server cache.
E_DM9206_BM_BAD_PAGE_NUMBER Page number on page doesn't match its
location.E_DM920C_BM_BAD_FAULT_PAGE Error faulting a page.
E_DM9C83_DM0P_CACHEFIX_PAGE An error occurred while fixing a page in
the buffer manager.

There are a few others, but they are all similar in nature to the
above. I ran verifydb in report mode and it finds references to
missing pages. What are peoples experiences with problems like this?
Has verifydb in run mode fixed these problems without data loss and
total corruption? I WILL begin working on upgrading Ingres, but I
can't do it today, so in the interim I'd like to get the users running
again.

Thanks,



Rob McKenzie
Phone: (905) 989-1750
Cell: (905) 715-9593
Email: rob.mckenzie (AT) rogers (DOT) com



Reply With Quote
  #3  
Old   
Laframboise André
 
Posts: n/a

Default RE: [Info-ingres] Corrupted tables - 09-06-2005 , 10:41 AM



We've had similar problems and most times, verifydb was able to fix them.

Verifydb basicaly ignores page headers and turns the table to heap.



It's usually a last resort effort to save the table. If possible, run a
select count

before and after to see if any rows are gone.



Is there a copy of the table in good shape on old checkpoints ?



Andre



_____

From: info-ingres-admin (AT) cariboulake (DOT) com
[mailto:info-ingres-admin (AT) cariboulake (DOT) com] On Behalf Of Rob McKenzie
Sent: Tuesday, September 06, 2005 10:34 AM
To: info-ingres (AT) cariboulake (DOT) com
Subject: [Info-ingres] Corrupted tables



I've been luck enough to walk into, and inherit, an site that is running an
unsupported version of Ingres. My first week here has handed me three major
tables in the production system that are btree and all are having headaches.
CA tech support was nice enough to advise me to run verifydb which will
probably patch the tables, but we may loose data.



The vesion on Ingres is: Ingres Alpha VMS Version II 2.0/9808 (axp.vms/00)



The errors we are getting are as follows:

E_DM93A7_BAD_FILE_PAGE_ADDR Page 3832 in table a, owner: proddba,
database: prodacw, has an incorrect page number: 4567.
Other page fields: page_stat 00000150,

page_log_address (00000000,00000000),

page_tran_id (0000000000000000).
Corrupted page cannot be read into the server cache.
E_DM9206_BM_BAD_PAGE_NUMBER Page number on page doesn't match its
location.

E_DM920C_BM_BAD_FAULT_PAGE Error faulting a page.
E_DM9C83_DM0P_CACHEFIX_PAGE An error occurred while fixing a page in the
buffer manager.



There are a few others, but they are all similar in nature to the above. I
ran verifydb in report mode and it finds references to missing pages. What
are peoples experiences with problems like this? Has verifydb in run mode
fixed these problems without data loss and total corruption? I WILL begin
working on upgrading Ingres, but I can't do it today, so in the interim I'd
like to get the users running again.



Thanks,





Rob McKenzie
Phone: (905) 989-1750
Cell: (905) 715-9593
Email: rob.mckenzie (AT) rogers (DOT) com



Reply With Quote
  #4  
Old   
Armand Pirvu
 
Posts: n/a

Default Re: [Info-ingres] Corrupted tables - 09-06-2005 , 10:57 AM



Hi Rob,

You may want to make some first hand checkings.

1. If the table has secondary indexes, do a select count(coln) from index, coln being a key column of the secondary index. Then do a select count(cola) from table, cola NOT being part of the table structure key nor any secondary index attached to that table. If they match , that means you could traverse the whole table (in the second case) and most likely you have a verifydb bug (not sure if there is any in 2.0 though, but ran into one in 2.6 where verifydb reported bogus messages).
2. If you are sure the secondary index is ok, you may want to rebuild using the secondary index.
2.1. Create tbl_patch with the identical structure as tbl
2.2. Insert into tbl_patch (
col1, col2...)
select
a.col1, a.col2,...
from tbl a, tbl_index b
where
a.tid=b.tidp
and
a.col1 = b.col1
....

The columns in the where clause are those that defined the secondary index
Then compare the two of them.


3. Or, you may want to restore the table.

By the way, these errors are they the subsequent result of a rollforwarddb ?


Armand

Rob McKenzie <rob.mckenzie (AT) rogers (DOT) com> wrote:
I've been luck enough to walk into, and inherit, an site that is running an unsupported version of Ingres. My first week here has handed me three major tables in the production system that are btree and all are having headaches. CA tech support was nice enough to advise me to run verifydb which will probably patch the tables, but we may loose data.

The vesion on Ingres is: Ingres Alpha VMS Version II 2.0/9808 (axp.vms/00)

The errors we are getting are as follows:
E_DM93A7_BAD_FILE_PAGE_ADDR Page 3832 in table a, owner: proddba, database: prodacw, has an incorrect page number: 4567.
Other page fields: page_stat 00000150,
page_log_address (00000000,00000000),
page_tran_id (0000000000000000).
Corrupted page cannot be read into the server cache.
E_DM9206_BM_BAD_PAGE_NUMBER Page number on page doesn't match its location.
E_DM920C_BM_BAD_FAULT_PAGE Error faulting a page.
E_DM9C83_DM0P_CACHEFIX_PAGE An error occurred while fixing a page in the buffer manager.

There are a few others, but they are all similar in nature to the above. I ran verifydb in report mode and it finds references to missing pages. What are peoples experiences with problems like this? Has verifydb in run mode fixed these problems without data loss and total corruption? I WILL begin working on upgrading Ingres, but I can't do it today, so in the interim I'd like to get the users running again.

Thanks,



Rob McKenzie
Phone: (905) 989-1750
Cell: (905) 715-9593
Email: rob.mckenzie (AT) rogers (DOT) com


================================================== =====
Armand

"If I had only known, I would have been a locksmith."
Albert Einstein
================================================== =====

Reply With Quote
  #5  
Old   
martin.bowes@ctsu.ox.ac.uk
 
Posts: n/a

Default Re: [Info-ingres] Corrupted tables - 09-07-2005 , 02:54 AM



Hi Rob

Quote:
I've tried to modify and copy out the tables - niether will work. A
select count(*) fails also. A select * of the table will run to a
certain point the crap when it apparently hits the bad addresses.
SOOOOO.... I'm fearful that my only options may be to try the
verifydb.
Yep. Sounds like the remaining option.

Quote:
Karl suggested DM801 (Ithink) - what do you know about this?
dm801 allows the DMF cache to skip bad rows. This will allow
the modify to proceed. But I think this would mean the loss of the bad
rows - I'm not sure.

dm802 would allow the DMF cache to return garbage(??) for
bad rows. Which could allow them to be kept and deleted. It may allow
the modify to work.

My suggestion would be that if you are time constrained on
recovering these tables then the verifydb would be the remaining best
bet.

Furthermore, if you have some time to play later - make a copy
of the underlying unix file that is associated with the table. From
iirelation get a note of the values of relfhdr, relstat, reltups, relpages for
the table. With this information we can do a file swap on a dummy
database and recreate the problem table there.

If you have time to play with this then it might be worthwhile
investigating old checkpoints - recovered to a foreign host - and see at
what stage the thing becomes unreadable.

Best of luck,

Marty
--
Random Duckman Quote #17:
Cornfed - No one else could have created the family you did... I mean
that in a good way.



Reply With Quote
  #6  
Old   
Peter Gale
 
Posts: n/a

Default RE: [Info-ingres] Corrupted tables - 09-07-2005 , 03:31 AM



Hi Rob,

Not sure if this will help at all but here goes.

SELECT COUNT(non key column)

I pretty sure this will do a table scan and possibly return a valid count.
This might help to determine if any methods used to extract the rows have
actually got them all. It might also reveal if the end product of verifydb
includes all the rows.

Does the table have any secondary indexes? You could query the index
directly to get a row count and even use this to access the base table
without going near the index. You would need to apply restrictions that
would force the secondary to be used such as

SELECT t.*
FROM index i, table t
WHERE i.some_column between min_column_value and max_column_value
AND i.tidp = t.tid

Note: Use of tids is not recommended but if you can be sure this table is
not being manipulated by anyone else you should be OK>

SET LOCKMODE SESSION WHERE READLOCK=NOLOCK

This might allow you to copy out everything except the bad rows but probably
produces the same result as DM801, that is if either of them actually allows
a copy or select to run to an error free conclusion.

Do you have the option to recover from ckpt prior to the corruption (if you
know when that was) and rfwd the journals?

One other point on DM802 is that if you can get an old copy of the table
(from a previous ckpt) this might allow some reconstruction of rows that
return garbage when this trace point is used. Nothing guaranteed here and I
strongly advise you don't use the TIDS to try and join old table to current
table as they are highly likely to be different.

HTH

Peter

T: +44 (0)1398 341777
PGale (AT) Comp-Soln (DOT) co.uk


-----Original Message-----
From: info-ingres-admin (AT) cariboulake (DOT) com
[mailto:info-ingres-admin (AT) cariboulake (DOT) com] On Behalf Of
martin.bowes (AT) ctsu (DOT) ox.ac.uk
Sent: 07 September 2005 08:54
To: rob.mckenzie (AT) rogers (DOT) com
Cc: info-ingres (AT) cariboulake (DOT) com
Subject: Re: [Info-ingres] Corrupted tables

Hi Rob

Quote:
I've tried to modify and copy out the tables - niether will work. A
select count(*) fails also. A select * of the table will run to a
certain point the crap when it apparently hits the bad addresses.
SOOOOO.... I'm fearful that my only options may be to try the
verifydb.
Yep. Sounds like the remaining option.

Quote:
Karl suggested DM801 (Ithink) - what do you know about this?
dm801 allows the DMF cache to skip bad rows. This will allow
the modify to proceed. But I think this would mean the loss of the bad
rows - I'm not sure.

dm802 would allow the DMF cache to return garbage(??) for
bad rows. Which could allow them to be kept and deleted. It may allow
the modify to work.

My suggestion would be that if you are time constrained on
recovering these tables then the verifydb would be the remaining best
bet.

Furthermore, if you have some time to play later - make a copy
of the underlying unix file that is associated with the table. From
iirelation get a note of the values of relfhdr, relstat, reltups, relpages
for
the table. With this information we can do a file swap on a dummy
database and recreate the problem table there.

If you have time to play with this then it might be worthwhile
investigating old checkpoints - recovered to a foreign host - and see at
what stage the thing becomes unreadable.

Best of luck,

Marty
--
Random Duckman Quote #17:
Cornfed - No one else could have created the family you did... I mean
that in a good way.

_______________________________________________
Info-ingres mailing list
Info-ingres (AT) cariboulake (DOT) com
http://mailman.cariboulake.com/mailm...py/info-ingres



Reply With Quote
  #7  
Old   
Gibson Jonathan
 
Posts: n/a

Default RE: [Info-ingres] Corrupted tables - 09-07-2005 , 04:06 AM



Hi Rob,

We've suffered from this in the past and it was caused by having an
incorrectly configured cache. Basically, the cache was slightly bigger
that the memory available (it took tech support several days to come up
with the necessary information on how to configure a large cache on AIX
and even then I wasn't convinced it was right). The other bad news is
that verifydb did not work. It was unable to address the bad pages and
consequently couldn't fix the problem. Even if it had, I'd say it would
be 99.9% certain of losing the data "associated" with the bad pages. We
were forced to restore from a backup that was valid before the
corruption occurred. We managed via auditdb to work out what new
policies were entered in the period the corruption occurred in but
everything else had to manually re-keyed. I'm sure given enough time
and patience, you could replay all your transactions but if it's an
important production database, you won't be able to afford the downtime.
If you have a spare machine, then you could take a ckp now and roll that
onto the spare machine so you can post process the data whilst you get
your system up and running.

If it's just showing one table affected, you might be able to restore
that by itself in some way (OS, table recovery or full recovery to
another database and do a table copy) but you'll need to work out the
logical implications regarding referential integrity of your database.

Regards
Jon

-----Original Message-----
From: info-ingres-admin (AT) cariboulake (DOT) com
[mailto:info-ingres-admin (AT) cariboulake (DOT) com] On Behalf Of Rob McKenzie
Sent: 06 September 2005 15:34
To: info-ingres (AT) cariboulake (DOT) com
Subject: [Info-ingres] Corrupted tables


I've been luck enough to walk into, and inherit, an site that is
running an unsupported version of Ingres. My first week here has handed
me three major tables in the production system that are btree and all
are having headaches. CA tech support was nice enough to advise me to
run verifydb which will probably patch the tables, but we may loose
data.

The vesion on Ingres is: Ingres Alpha VMS Version II 2.0/9808
(axp.vms/00)

The errors we are getting are as follows:
E_DM93A7_BAD_FILE_PAGE_ADDR Page 3832 in table a, owner:
proddba, database: prodacw, has an incorrect page number: 4567.
Other page fields: page_stat 00000150,
page_log_address (00000000,00000000),

page_tran_id (0000000000000000).
Corrupted page cannot be read into the server cache.
E_DM9206_BM_BAD_PAGE_NUMBER Page number on page doesn't
match its location.
E_DM920C_BM_BAD_FAULT_PAGE Error faulting a page..
E_DM9C83_DM0P_CACHEFIX_PAGE An error occurred while fixing a
page in the buffer manager.

There are a few others, but they are all similar in nature to
the above. I ran verifydb in report mode and it finds references to
missing pages. What are peoples experiences with problems like this?
Has verifydb in run mode fixed these problems without data loss and
total corruption? I WILL begin working on upgrading Ingres, but I can't
do it today, so in the interim I'd like to get the users running again.

Thanks,



Rob McKenzie
Phone: (905) 989-1750
Cell: (905) 715-9593
Email: rob.mckenzie (AT) rogers (DOT) com


__________________________________________________ ___________________
This message has been checked for all known viruses by
bluesource. For further information visit www.blue-source.com

powered by Messagelabs




************************************************** ********************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. No one else is authorised to distribute, forward,
print, copy or act upon any information contained in this email.
If you have received this email in error, please notify the sender.

Hiscox Syndicates Limited, Hiscox Insurance Company Limited, Hiscox
Connect Limited, Hiscox Underwriting Limited and Hiscox Investment Management Limited are authorised and regulated by the Financial
Services Authority. Hiscox plc is a company registered in England
and Wales under company registration number 2837811 and registered
office at 1 Great St Helen's, London EC3A 6HX
************************************************** ********************


__________________________________________________ ___________________
This message has been checked for all known viruses by blue-source. For further information visit www.blue-source.com

powered by Messagelabs

Reply With Quote
  #8  
Old   
Gibson Jonathan
 
Posts: n/a

Default RE: [Info-ingres] Corrupted tables - 09-07-2005 , 04:16 AM



Hi Peter,

I doubt a select count will work as the table will fall over on anything
that requires a table scan. Been here before and it's a nightmare.
Using a secondary index could help in rebuilding the table, especially
in conjunction with DM802. Like Marty, I wouldn't trust verifydb in
this situation. The best I think it will do is to lose the bad data but
I doubt it will even get that far. Copydb is not an option, the table
is screwed too much for this work.

If I was Rob, I'd be checking all my settings in config.dat as I'll bet
something is configured beyond the system resources (talking from
experience...)

Cheers
Jon

-----Original Message-----
From: info-ingres-admin (AT) cariboulake (DOT) com
[mailto:info-ingres-admin (AT) cariboulake (DOT) com] On Behalf Of Peter Gale
Sent: 07 September 2005 09:32
To: martin.bowes (AT) ctsu (DOT) ox.ac.uk; rob.mckenzie (AT) rogers (DOT) com
Cc: info-ingres (AT) cariboulake (DOT) com
Subject: RE: [Info-ingres] Corrupted tables


Hi Rob,

Not sure if this will help at all but here goes.

SELECT COUNT(non key column)

I pretty sure this will do a table scan and possibly return a valid
count.
This might help to determine if any methods used to extract the rows
have
actually got them all. It might also reveal if the end product of
verifydb
includes all the rows.

Does the table have any secondary indexes? You could query the index
directly to get a row count and even use this to access the base table
without going near the index. You would need to apply restrictions that
would force the secondary to be used such as

SELECT t.*
FROM index i, table t
WHERE i.some_column between min_column_value and max_column_value
AND i.tidp = t.tid

Note: Use of tids is not recommended but if you can be sure this table
is
not being manipulated by anyone else you should be OK>

SET LOCKMODE SESSION WHERE READLOCK=NOLOCK

This might allow you to copy out everything except the bad rows but
probably
produces the same result as DM801, that is if either of them actually
allows
a copy or select to run to an error free conclusion.

Do you have the option to recover from ckpt prior to the corruption (if
you
know when that was) and rfwd the journals?

One other point on DM802 is that if you can get an old copy of the table
(from a previous ckpt) this might allow some reconstruction of rows that
return garbage when this trace point is used. Nothing guaranteed here
and I
strongly advise you don't use the TIDS to try and join old table to
current
table as they are highly likely to be different.

HTH

Peter

T: +44 (0)1398 341777
PGale (AT) Comp-Soln (DOT) co.uk


-----Original Message-----
From: info-ingres-admin (AT) cariboulake (DOT) com
[mailto:info-ingres-admin (AT) cariboulake (DOT) com] On Behalf Of
martin.bowes (AT) ctsu (DOT) ox.ac.uk
Sent: 07 September 2005 08:54
To: rob.mckenzie (AT) rogers (DOT) com
Cc: info-ingres (AT) cariboulake (DOT) com
Subject: Re: [Info-ingres] Corrupted tables

Hi Rob

Quote:
I've tried to modify and copy out the tables - niether will work. A
select count(*) fails also. A select * of the table will run to a
certain point the crap when it apparently hits the bad addresses.
SOOOOO.... I'm fearful that my only options may be to try the
verifydb.
Yep. Sounds like the remaining option.

Quote:
Karl suggested DM801 (Ithink) - what do you know about this?
dm801 allows the DMF cache to skip bad rows. This will allow
the modify to proceed. But I think this would mean the loss of the bad
rows - I'm not sure.

dm802 would allow the DMF cache to return garbage(??) for
bad rows. Which could allow them to be kept and deleted. It may allow
the modify to work.

My suggestion would be that if you are time constrained on
recovering these tables then the verifydb would be the remaining best
bet.

Furthermore, if you have some time to play later - make a copy
of the underlying unix file that is associated with the table. From
iirelation get a note of the values of relfhdr, relstat, reltups,
relpages
for
the table. With this information we can do a file swap on a dummy
database and recreate the problem table there.

If you have time to play with this then it might be worthwhile
investigating old checkpoints - recovered to a foreign host - and see at

what stage the thing becomes unreadable.

Best of luck,

Marty
--
Random Duckman Quote #17:
Cornfed - No one else could have created the family you did... I mean
that in a good way.

_______________________________________________
Info-ingres mailing list
Info-ingres (AT) cariboulake (DOT) com
http://mailman.cariboulake.com/mailm...py/info-ingres

_______________________________________________
Info-ingres mailing list
Info-ingres (AT) cariboulake (DOT) com
http://mailman.cariboulake.com/mailm...py/info-ingres

__________________________________________________ ___________________
This message has been checked for all known viruses by bluesource. For
further information visit www.blue-source.com

powered by Messagelabs


************************************************** ********************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. No one else is authorised to distribute, forward,
print, copy or act upon any information contained in this email.
If you have received this email in error, please notify the sender.

Hiscox Syndicates Limited, Hiscox Insurance Company Limited, Hiscox
Connect Limited, Hiscox Underwriting Limited and Hiscox Investment Management Limited are authorised and regulated by the Financial
Services Authority. Hiscox plc is a company registered in England
and Wales under company registration number 2837811 and registered
office at 1 Great St Helen's, London EC3A 6HX
************************************************** ********************


__________________________________________________ ___________________
This message has been checked for all known viruses by blue-source. For further information visit www.blue-source.com

powered by Messagelabs



Reply With Quote
  #9  
Old   
Peter Gale
 
Posts: n/a

Default RE: [Info-ingres] Corrupted tables - 09-07-2005 , 04:37 AM



Hi Jon,

My point about select count(non key column) is that it does not use the
index whereas select count(*) does, hence it might (bold, uppercase,
underlined) work.

Also READLOCK=NOLOCK can cause bad rows to be skipped. I have seen this in
the past because (I believe) the logic goes that when running
READLOCK=NOLOCK you are enabling dirty reads, therefore it is possible that
index pointers and data pages don't match up, therefore skip any errors that
are encountered. Therefore READLOCK=NOLOCK might allow retrieval of all but
the corrupted rows using either COPY or SELECT.

Have a good one.

Peter

T: +44 (0)1398 341777
PGale (AT) Comp-Soln (DOT) co.uk


-----Original Message-----
From: info-ingres-admin (AT) cariboulake (DOT) com
[mailto:info-ingres-admin (AT) cariboulake (DOT) com] On Behalf Of Gibson Jonathan
Sent: 07 September 2005 10:17
To: PGale (AT) comp-soln (DOT) co.uk; martin.bowes (AT) ctsu (DOT) ox.ac.uk;
rob.mckenzie (AT) rogers (DOT) com
Cc: info-ingres (AT) cariboulake (DOT) com
Subject: RE: [Info-ingres] Corrupted tables

Hi Peter,

I doubt a select count will work as the table will fall over on anything
that requires a table scan. Been here before and it's a nightmare.
Using a secondary index could help in rebuilding the table, especially
in conjunction with DM802. Like Marty, I wouldn't trust verifydb in
this situation. The best I think it will do is to lose the bad data but
I doubt it will even get that far. Copydb is not an option, the table
is screwed too much for this work.

If I was Rob, I'd be checking all my settings in config.dat as I'll bet
something is configured beyond the system resources (talking from
experience...)

Cheers
Jon

-----Original Message-----
From: info-ingres-admin (AT) cariboulake (DOT) com
[mailto:info-ingres-admin (AT) cariboulake (DOT) com] On Behalf Of Peter Gale
Sent: 07 September 2005 09:32
To: martin.bowes (AT) ctsu (DOT) ox.ac.uk; rob.mckenzie (AT) rogers (DOT) com
Cc: info-ingres (AT) cariboulake (DOT) com
Subject: RE: [Info-ingres] Corrupted tables


Hi Rob,

Not sure if this will help at all but here goes.

SELECT COUNT(non key column)

I pretty sure this will do a table scan and possibly return a valid
count.
This might help to determine if any methods used to extract the rows
have
actually got them all. It might also reveal if the end product of
verifydb
includes all the rows.

Does the table have any secondary indexes? You could query the index
directly to get a row count and even use this to access the base table
without going near the index. You would need to apply restrictions that
would force the secondary to be used such as

SELECT t.*
FROM index i, table t
WHERE i.some_column between min_column_value and max_column_value
AND i.tidp = t.tid

Note: Use of tids is not recommended but if you can be sure this table
is
not being manipulated by anyone else you should be OK>

SET LOCKMODE SESSION WHERE READLOCK=NOLOCK

This might allow you to copy out everything except the bad rows but
probably
produces the same result as DM801, that is if either of them actually
allows
a copy or select to run to an error free conclusion.

Do you have the option to recover from ckpt prior to the corruption (if
you
know when that was) and rfwd the journals?

One other point on DM802 is that if you can get an old copy of the table
(from a previous ckpt) this might allow some reconstruction of rows that
return garbage when this trace point is used. Nothing guaranteed here
and I
strongly advise you don't use the TIDS to try and join old table to
current
table as they are highly likely to be different.

HTH

Peter

T: +44 (0)1398 341777
PGale (AT) Comp-Soln (DOT) co.uk


-----Original Message-----
From: info-ingres-admin (AT) cariboulake (DOT) com
[mailto:info-ingres-admin (AT) cariboulake (DOT) com] On Behalf Of
martin.bowes (AT) ctsu (DOT) ox.ac.uk
Sent: 07 September 2005 08:54
To: rob.mckenzie (AT) rogers (DOT) com
Cc: info-ingres (AT) cariboulake (DOT) com
Subject: Re: [Info-ingres] Corrupted tables

Hi Rob

Quote:
I've tried to modify and copy out the tables - niether will work. A
select count(*) fails also. A select * of the table will run to a
certain point the crap when it apparently hits the bad addresses.
SOOOOO.... I'm fearful that my only options may be to try the
verifydb.
Yep. Sounds like the remaining option.

Quote:
Karl suggested DM801 (Ithink) - what do you know about this?
dm801 allows the DMF cache to skip bad rows. This will allow
the modify to proceed. But I think this would mean the loss of the bad
rows - I'm not sure.

dm802 would allow the DMF cache to return garbage(??) for
bad rows. Which could allow them to be kept and deleted. It may allow
the modify to work.

My suggestion would be that if you are time constrained on
recovering these tables then the verifydb would be the remaining best
bet.

Furthermore, if you have some time to play later - make a copy
of the underlying unix file that is associated with the table. From
iirelation get a note of the values of relfhdr, relstat, reltups,
relpages
for
the table. With this information we can do a file swap on a dummy
database and recreate the problem table there.

If you have time to play with this then it might be worthwhile
investigating old checkpoints - recovered to a foreign host - and see at

what stage the thing becomes unreadable.

Best of luck,

Marty
--
Random Duckman Quote #17:
Cornfed - No one else could have created the family you did... I mean
that in a good way.

_______________________________________________
Info-ingres mailing list
Info-ingres (AT) cariboulake (DOT) com
http://mailman.cariboulake.com/mailm...py/info-ingres

_______________________________________________
Info-ingres mailing list
Info-ingres (AT) cariboulake (DOT) com
http://mailman.cariboulake.com/mailm...py/info-ingres

__________________________________________________ ___________________
This message has been checked for all known viruses by bluesource. For
further information visit www.blue-source.com

powered by Messagelabs


************************************************** ********************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. No one else is authorised to distribute, forward,
print, copy or act upon any information contained in this email.
If you have received this email in error, please notify the sender.

Hiscox Syndicates Limited, Hiscox Insurance Company Limited, Hiscox
Connect Limited, Hiscox Underwriting Limited and Hiscox Investment
Management Limited are authorised and regulated by the Financial
Services Authority. Hiscox plc is a company registered in England
and Wales under company registration number 2837811 and registered
office at 1 Great St Helen's, London EC3A 6HX
************************************************** ********************


__________________________________________________ ___________________
This message has been checked for all known viruses by blue-source. For
further information visit www.blue-source.com

powered by Messagelabs

_______________________________________________
Info-ingres mailing list
Info-ingres (AT) cariboulake (DOT) com
http://mailman.cariboulake.com/mailm...py/info-ingres



Reply With Quote
  #10  
Old   
Oscar Carlés
 
Posts: n/a

Default Re: [Info-ingres] Corrupted tables - 09-07-2005 , 09:25 AM



Jon:
As I understand, this situation (having an incorrectly configured
cache...) could be prevent using cache_sharing=ON. I mean, bassically if
you use cache_sharing=OFF (the default), Ingres use dynamic memory to
allocate cache space on iidbms process environment, note that others
Ingres memory resources like opf, psf, qsf..., use dynamic memory also.
This could overwhelm the available memory and produce the table
corruption. On the other hand, if you turn cache_sharing ON, Ingres use
pre allocate memory (and it checks on ingstart) from OS shared memory.

Regards

Oscar

Gibson Jonathan wrote:

Quote:
Hi Rob,

We've suffered from this in the past and it was caused by having an
incorrectly configured cache. Basically, the cache was slightly
bigger that the memory available (it took tech support several days to
come up with the necessary information on how to configure a large
cache on AIX and even then I wasn't convinced it was right). The
other bad news is that verifydb did not work. It was unable to
address the bad pages and consequently couldn't fix the problem. Even
if it had, I'd say it would be 99.9% certain of losing the data
"associated" with the bad pages. We were forced to restore from a
backup that was valid before the corruption occurred. We managed via
auditdb to work out what new policies were entered in the period the
corruption occurred in but everything else had to manually re-keyed.
I'm sure given enough time and patience, you could replay all your
transactions but if it's an important production database, you won't
be able to afford the downtime. If you have a spare machine, then you
could take a ckp now and roll that onto the spare machine so you can
post process the data whilst you get your system up and running.

If it's just showing one table affected, you might be able to restore
that by itself in some way (OS, table recovery or full recovery to
another database and do a table copy) but you'll need to work out the
logical implications regarding referential integrity of your database.

Regards
Jon

-----Original Message-----
*From:* info-ingres-admin (AT) cariboulake (DOT) com
[mailto:info-ingres-admin (AT) cariboulake (DOT) com] *On Behalf Of *Rob McKenzie
*Sent:* 06 September 2005 15:34
*To:* info-ingres (AT) cariboulake (DOT) com
*Subject:* [Info-ingres] Corrupted tables

I've been luck enough to walk into, and inherit, an site that is
running an unsupported version of Ingres. My first week here has
handed me three major tables in the production system that are
btree and all are having headaches. CA tech support was nice
enough to advise me to run verifydb which will probably patch the
tables, but we may loose data.

The vesion on Ingres is: Ingres Alpha VMS Version II 2.0/9808
(axp.vms/00)

The errors we are getting are as follows:
E_DM93A7_BAD_FILE_PAGE_ADDR Page 3832 in table a, owner:
proddba, database: prodacw, has an incorrect page number: 4567.
Other page fields: page_stat 00000150,
page_log_address (00000000,00000000),
page_tran_id (0000000000000000).
Corrupted page cannot be read into the server cache.
E_DM9206_BM_BAD_PAGE_NUMBER Page number on page doesn't match
its location.
E_DM920C_BM_BAD_FAULT_PAGE Error faulting a page.
E_DM9C83_DM0P_CACHEFIX_PAGE An error occurred while fixing a
page in the buffer manager.

There are a few others, but they are all similar in nature to the
above. I ran verifydb in report mode and it finds references to
missing pages. What are peoples experiences with problems like
this? Has verifydb in run mode fixed these problems without data
loss and total corruption? I WILL begin working on upgrading
Ingres, but I can't do it today, so in the interim I'd like to get
the users running again.

Thanks,



Rob McKenzie
Phone: (905) 989-1750
Cell: (905) 715-9593
Email: rob.mckenzie (AT) rogers (DOT) com

__________________________________________________ ___________________
This message has been checked for all known viruses by bluesource.
For further information visit www.blue-source.com

powered by Messagelabs



************************************************** ********************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. No one else is authorised to distribute, forward,
print, copy or act upon any information contained in this email.
If you have received this email in error, please notify the sender.

Hiscox Syndicates Limited, Hiscox Insurance Company Limited, Hiscox
Connect Limited, Hiscox Underwriting Limited and Hiscox Investment
Management Limited are authorised and regulated by the Financial
Services Authority. Hiscox plc is a company registered in England
and Wales under company registration number 2837811 and registered
office at 1 Great St Helen's, London EC3A 6HX
************************************************** ********************

__________________________________________________ ___________________
This message has been checked for all known viruses by blue-source.
For further information visit www.blue-source.com

powered by Messagelabs


--
Lic. Oscar Carlés Barriocanal
Director de Servicios al Cliente
Íntegra S.R.L.
Telefax: (595 21) 424473 RA




Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.