[Info-ingres] Problem with online checkpoints for r3 and Linux

comp.databases.ingres

#1
Michael Leo
[Info-ingres] Problem with online checkpoints for r3 and Linux - 03-15-2005, 05:40 PM

Gang,

Working with one of our clients, we have discovered that Ingres r3 online
checkpoints on Linux (RH Enterprise 3.0, supported) can occasionally fail
if a table's file is extended while it is being "tarred" up.

We are using the default checkpoint template, so we aren't doing
anything "funny" in cktmpl.def.

Good news here is that you really have to work to make this happen.
You have to insert a ton of records into a fairly large table while
the online checkpoint is backing up that table.

Bad news is the checkpoint doesn't appear to be recorded in the config
file and doesn't show up in the "infodb" output. The failed "dump"
attempt, however, does seem to be logged there. Very odd.

Here are the scripts we used to do this:

ftp://ftp.cariboulake.com/test_olcp/test_olcp.zip

You may need to run the script several times to get the error to
occur, but thre

We used a pretty small system and a pretty big system to test this:

Small: AMD Athlon 1GHz single processor, 1.5GB RAM, 70GB IDE disk
Big: Pentium Xeon 2.3GHz dual HT processor, 4GB RAM, 160GB USCSI disk

Last year we saw this error on Solaris with Ingres II 2.6, where it
can be mitigated by using the 'correct' version of tar. That
was worked through Ingres technical support under CA Startrak Issue 12399005.

Here is the output of the checkpoint attempt:

Tue Mar 15 18:17:07 2005 CPP: Preparing to checkpoint database: test_olcp
Tue Mar 15 18:17:07 2005 CPP: Preparing stall of database, active xact cnt: 0
Tue Mar 15 18:17:07 2005 CPP: Finished stall of database
beginning checkpoint to disk /opt/CA/IngresII/ingres/ckp/default/test_olcp of 1 locations
Tue Mar 15 18:17:08 2005 CPP: Start checkpoint of location: ii_database to disk:
path = '/opt/CA/IngresII/ingres/ckp/default/test_olcp'
file = 'c0002001.ckp'
executing checkpoint to disk
/bin/tar: aaaaaaop.t00: file changed as we read it
/bin/tar: Error exit delayed from previous errors
Tue Mar 15 18:17:09 2005 E_DM1101_CPP_WRITE_ERROR Error writing checkpoint.
Tue Mar 15 18:17:09 2005 E_DM110B_CPP_FAILED Error occurred checkpointing the database.
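
One way to catch this failure in a scheduled job is to scan the captured checkpoint output for the two error codes shown above. This is only a minimal sketch, assuming the checkpoint output has been saved to a file; ckp_log_failed is a hypothetical helper name, not an Ingres utility.

```shell
#!/bin/sh
# ckp_log_failed LOGFILE
# Hypothetical helper (not shipped with Ingres).
# Returns 0 (true) when the captured checkpoint output contains
# either of the error codes seen in the failed run above.
ckp_log_failed() {
    grep -q -e 'E_DM1101_CPP_WRITE_ERROR' \
            -e 'E_DM110B_CPP_FAILED' "$1"
}
```

A nightly job could call it as `if ckp_log_failed /path/to/ckp.log; then ...; fi` and raise an alert, rather than relying on infodb, which in the case above showed no record of the failed checkpoint.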

Here is what infodb says:

==================Tue Mar 15 18:19:54 2005 Database Information=================

Database : (test_olcp,mal) ID : 0x42376CB8 Collation : default
Extents : 5 Last Table Id : 239
Config File Version Id : 0x00060001 Database Version Id : 7
Mode : DDL ALLOWED, ONLINE CHECKPOINT ENABLED
Status : VALID,JOURNAL,DUMP,CFG_BACKUP

The Database has been Checkpointed.
The Database is Journaled.

Journals are valid from checkpoint sequence : 1

----Journal information---------------------------------------------------------
Checkpoint sequence : 2 Journal sequence : 1
Current journal block : 161 Journal block size : 16384
Initial journal size : 4 Target journal size : 512
Last Log Address Journaled : <1110927916:68089:2560>
----Dump information------------------------------------------------------------
Checkpoint sequence : 2 Dump sequence : 1
Current dump block : 0 Dump block size : 16384
Initial dump size : 4 Target dump size : 512
Last Log Address Dumped : <0:0:0>
----Checkpoint History for Journal----------------------------------------------
Date Ckp_sequence First_jnl Last_jnl valid mode
----------------------------------------------------------------------------
Tue Mar 15 18:17:06 2005 1 1 1 1 OFFLINE
----Checkpoint History for Dump-------------------------------------------------
Date Ckp_sequence First_dmp Last_dmp valid mode
----------------------------------------------------------------------------
Tue Mar 15 18:17:07 2005 2 0 0 0 ONLINE
----Cluster Journal History-----------------------------------------------------
Node ID Current Journal Current Block Last Log Address
------------------------------------------------------------
None.
----Extent directory------------------------------------------------------------
Location Flags Physical_path
------------------------------------------------------------------
ii_database ROOT,DATA /opt/CA/IngresII/ingres/data/default/test_olcp
ii_journal JOURNAL /opt/CA/IngresII/ingres/jnl/default/test_olcp
ii_checkpoint CHECKPOINT /opt/CA/IngresII/ingres/ckp/default/test_olcp
ii_dump DUMP /opt/CA/IngresII/ingres/dmp/default/test_olcp
ii_work WORK /opt/CA/IngresII/ingres/work/default/test_olcp
================================================================================

You can see that it doesn't record the status of checkpoint #2, but
it does record the dump status of checkpoint #2.

Anyone else seen this? Got a better suggestion than using tar for
backup on RH 3.0?

Cheers,

--
Michael Leo Java, J2EE, BEA WebLogic,
Caribou Lake LLC Oracle, Open Source, Ingres,
mleo (AT) cariboulake (DOT) com Real Enterprise Applications

Climate is what we expect. Weather is what we get.
- Mark Twain


#2
Armand Pirvu (yahoo)
RE: [Info-ingres] Problem with online checkpoints for r3 and Linux - 03-15-2005, 06:03 PM

Hi Mike,

I ran into something similar, on AIX though.
Here is what might be worth trying.
In cktmpl.def, I changed the template this way:

BSTD: /bin/echo beginning checkpoint to tape %C of %N locations
BSDD: /bin/echo beginning checkpoint to disk %C of %N locations
ESTD: /bin/echo ending checkpoint to tape %C of %N locations
ESDD: /bin/echo ending checkpoint to disk %C of %N locations
PSTD: /bin/echo mount tape %M and press return; read foo;
WSTD: cd %D; iitar cbf 20 %C .
PSDD: /bin/echo executing checkpoint to disk
WSDD: cd %D; iitar cf %A .


And in $II_SYSTEM/ingres/utility there is iitar, a wrapper that works
around tar's annoying message saying the file has changed size.

#!/usr/bin/ksh
#
# Wrapper that runs tar and treats the "file changed size"
# warning as success; any other tar failure is passed through.
#
# Set locale to C so the error text will be found
#
LC_ALL=C
LANG=C
export LC_ALL LANG
#
# Set source directories for commands
#
TARSOURCE=/usr/bin/tar
GREPSOURCE=/usr/bin/grep
#
# File for errors to be written to (one per process)
#
ERRORLOG=/tmp/errorlog.$$
#
# Start tar command with errors redirected to log file
#
$TARSOURCE "$@" 2> $ERRORLOG
STATUS=$?
# Pass tar's messages on so they still show up in the checkpoint output
cat $ERRORLOG >&2
# Only a failed tar needs excusing; note this also excuses any
# other error that tar reported in the same run
if [ $STATUS -ne 0 ] && $GREPSOURCE "file changed size" $ERRORLOG > /dev/null ; then
STATUS=0
fi
rm -f $ERRORLOG
exit $STATUS

Maybe it works for you as well.
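
The wrapper above looks for "file changed size", which is the wording seen on AIX; the tar output earlier in this thread, from /bin/tar on Linux, says "file changed as we read it" followed by "Error exit delayed from previous errors". The following is a minimal sketch of a stricter check that accepts either wording but refuses to excuse a log containing any other message. tar_errors_benign is a hypothetical name, not part of Ingres, and the message strings are taken only from what appears in this thread.

```shell
#!/bin/sh
# tar_errors_benign LOGFILE
# Hypothetical helper (not part of Ingres).
# Returns 0 when every non-blank line in LOGFILE is either a
# "file changed" warning or tar's trailing "Error exit delayed"
# summary line. Returns 1 for an empty log or any other message.
tar_errors_benign() {
    log=$1
    # Nothing was logged, so there is nothing to excuse.
    [ -s "$log" ] || return 1
    # Require at least one "file changed" warning.
    grep -q -e 'file changed as we read it' \
            -e 'file changed size' "$log" || return 1
    # Any line left after filtering out the known messages is a real error.
    if grep -v -e 'file changed as we read it' \
               -e 'file changed size' \
               -e 'Error exit delayed from previous errors' \
               -e '^[[:space:]]*$' "$log" | grep -q . ; then
        return 1
    fi
    return 0
}
```

In a wrapper like iitar, this could replace the single grep: run tar with stderr sent to a log, and override a non-zero exit status only when `tar_errors_benign` succeeds on that log.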

Cheers,
Armand

_______________________________________________
Info-ingres mailing list
Info-ingres (AT) cariboulake (DOT) com
http://mailman.cariboulake.com/mailm...py/info-ingres



#3
Michael Leo
RE: [Info-ingres] Problem with online checkpoints for r3 and Linux - 03-15-2005, 07:03 PM

Armand,

That is the perfect solution, except that iitar is not shipped
with Ingres II r3 on Linux.

Methinks I will need to open a call with CA and see if they can
slip that file into an official version so the customer is
comfortable it will be supported.

Thanks!

Mike Leo






#4
Armand Pirvu (yahoo)
RE: [Info-ingres] Problem with online checkpoints for r3 and Linux - 03-15-2005, 07:47 PM

Hi Mike,

Glad it works.
Personally I am not that thrilled with tar and I would use cpio.
I found it more reliable.
I could send you a checkpoint template file based on cpio if you need one.

Cheers,
Armand





#5
Paul White
RE: [Info-ingres] Problem with online checkpoints for r3 and Linux - 03-15-2005, 10:02 PM

I think this is what you need:
TEC317591

Paul

-----Original Message-----
From: Michael Leo [mailto:mleo (AT) cariboulake (DOT) com]
Sent: Wednesday, 16 March 2005 12:04 PM
To: Armand Pirvu (yahoo); 'Michael Leo'; info-ingres (AT) cariboulake (DOT) com
Subject: RE: [Info-ingres] Problem with online checkpoints for r3 and
Linux



Armand,

That is the perfect solution, except the iitar is not shipped
with Ingres II r3 on Linux.

Methinks I will need to open a call with CA and see if they can
slip that file into an official version so the customer is
comfortable it will be supported.

Thanks!

Mike Leo


At 6:03 PM -0600 3/15/05, Armand Pirvu \(yahoo\) wrote:
Quote:
Hi Mike,

I got into some similar things, on AIX though.
Here is what it would be worthwhile trying.
In ckptmpl.def, I changed this way

BSTD: /bin/echo beginning checkpoint to tape %C of %N locations
BSDD: /bin/echo beginning checkpoint to disk %C of %N locations
ESTD: /bin/echo ending checkpoint to tape %C of %N locations
ESDD: /bin/echo ending checkpoint to disk %C of %N locations
PSTD: /bin/echo mount tape %M and press return; read foo;
WSTD: cd %D; iitar cbf 20 %C .
PSDD: /bin/echo executing checkpoint to disk
WSDD: cd %D; iitar cf %A .


And in $II_SYSTEM/ingres/utility it is iitar which overcomes a tar
stupid message that says file has changed size.

#!/usr/bin/ksh
#
# Script to set the exit code on tar
# when a file size change event occurs
#
# Set locale to C so error will be found
#
LC_ALL=C
LANG=C
#
# Set source directories for commands
#
TARSOURCE=/usr/bin/tar
GREPSOURCE=/usr/bin/grep
#
# Create file for errors to be written too.
#
touch /tmp/errorlog
#
# Start tar command with errors redirected to log file
#
$TARSOURCE "$@" 2> /tmp/errorlog
if [ -f /tmp/errorlog ] ;then
$GREPSOURCE "file changed size" /tmp/errorlog
echo $? > /tmp/status
if [ -f /tmp/status -a /tmp/status = 0 ] ;then
exit 0
fi
fi
exit

Maybe it works for you as well.

Cheers,
Armand

-----Original Message-----
From: info-ingres-admin (AT) cariboulake (DOT) com
[mailto:info-ingres-admin (AT) cariboulake (DOT) com] On Behalf Of Michael Leo
Sent: Tuesday, March 15, 2005 5:40 PM
To: info-ingres (AT) cariboulake (DOT) com
Subject: [Info-ingres] Problem with online checkpoints for r3 and Linux

Gang,

Working with one of our clients, we have discovered that Ingres r3
online
checkpoints on Linux (RH Enterprise 3.0, supported) can occasionally
fail
if a table's file is extended while it is being "tarred" up.

We are using the default checkpoint template, so we aren't doing
anything "funny" in cktmpl.def.

Good news here is that you really have to work to make this happen.
You have to insert a ton of records into a fairly large table while
the online checkpoint is backing up that table.

Bad news is the checkpoint doesn't appear to be recorded in the config
file and doesn't show up in the "infodb" output. The failed "dump"
attempt seems to be logged there. Very odd.

Here are the scripts we used to do this:

ftp://ftp.cariboulake.com/test_olcp/test_olcp.zip

You may need to run the script several times to get the error to
occur, but thre

We used a pretty small system and pretty big system to test this:

Small AMD Athon 1GHz single processor, 1.5GB RAM, 70GB IDE disk
Small Pentium Xeon 2.3Ghz dual HT processor, 4GB RAM, 160GB USCSI
disk

Last year we saw this error in Solaris with Ingres II 2.6, which
can be mitigated by using the 'correct' version of tar. This
was worked through Ingres technical support under CA Startrak Issue
12399005.

Here is the output of the checkpoint attempt:

Tue Mar 15 18:17:07 2005 CPP: Preparing to checkpoint database:
test_olcp
Tue Mar 15 18:17:07 2005 CPP: Preparing stall of database, active xact
cnt: 0
Tue Mar 15 18:17:07 2005 CPP: Finished stall of database
beginning checkpoint to disk
/opt/CA/IngresII/ingres/ckp/default/test_olcp of 1 locations
Tue Mar 15 18:17:08 2005 CPP: Start checkpoint of location:
ii_database to disk:
path = '/opt/CA/IngresII/ingres/ckp/default/test_olcp'
file = 'c0002001.ckp'
executing checkpoint to disk
/bin/tar: aaaaaaop.t00: file changed as we read it
/bin/tar: Error exit delayed from previous errors
Tue Mar 15 18:17:09 2005 E_DM1101_CPP_WRITE_ERROR Error writing
checkpoint.
Tue Mar 15 18:17:09 2005 E_DM110B_CPP_FAILED Error occurred
checkpointing the database.

Here is what infodb says:

==================Tue Mar 15 18:19:54 2005 Database Information=================

Database : (test_olcp,mal) ID : 0x42376CB8 Collation : default
Extents : 5 Last Table Id : 239
Config File Version Id : 0x00060001 Database Version Id : 7
Mode : DDL ALLOWED, ONLINE CHECKPOINT ENABLED
Status : VALID,JOURNAL,DUMP,CFG_BACKUP

The Database has been Checkpointed.
The Database is Journaled.

Journals are valid from checkpoint sequence : 1

----Journal information---------------------------------------------------------
Checkpoint sequence : 2 Journal sequence : 1
Current journal block : 161 Journal block size : 16384
Initial journal size : 4 Target journal size : 512
Last Log Address Journaled : <1110927916:68089:2560
----Dump information------------------------------------------------------------
Checkpoint sequence : 2 Dump sequence : 1
Current dump block : 0 Dump block size : 16384
Initial dump size : 4 Target dump size : 512
Last Log Address Dumped : <0:0:0
----Checkpoint History for Journal----------------------------------------------
Date                      Ckp_sequence  First_jnl  Last_jnl  valid  mode
----------------------------------------------------------------------------
Tue Mar 15 18:17:06 2005             1          1         1      1  OFFLINE
----Checkpoint History for Dump-------------------------------------------------
Date                      Ckp_sequence  First_dmp  Last_dmp  valid  mode
----------------------------------------------------------------------------
Tue Mar 15 18:17:07 2005             2          0         0      0  ONLINE
----Cluster Journal History-----------------------------------------------------
Node ID Current Journal Current Block Last Log Address
------------------------------------------------------------
None.
----Extent directory------------------------------------------------------------
Location       Flags       Physical_path
------------------------------------------------------------------
ii_database    ROOT,DATA   /opt/CA/IngresII/ingres/data/default/test_olcp
ii_journal     JOURNAL     /opt/CA/IngresII/ingres/jnl/default/test_olcp
ii_checkpoint  CHECKPOINT  /opt/CA/IngresII/ingres/ckp/default/test_olcp
ii_dump        DUMP        /opt/CA/IngresII/ingres/dmp/default/test_olcp
ii_work        WORK        /opt/CA/IngresII/ingres/work/default/test_olcp
================================================================================

You can see that it doesn't record the status of checkpoint #2, but
it does record the dump status of checkpoint #2.
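
Since the failed checkpoint leaves no entry in the journal checkpoint history, a script that drives ckpdb probably cannot rely on infodb alone to notice the failure. Here is a minimal sketch that scans captured ckpdb output instead. The function name and the idea of saving the output to a file are assumptions for illustration; the two error codes are the ones from the log above.

```shell
#!/bin/sh
# Sketch only: decide from captured ckpdb output whether the checkpoint failed.
# checkpoint_failed is a hypothetical helper, not an Ingres utility.

# checkpoint_failed LOGFILE
# Returns 0 (true) if either CPP error code from the log above is present.
checkpoint_failed() {
    LC_ALL=C grep -q \
        -e 'E_DM1101_CPP_WRITE_ERROR' \
        -e 'E_DM110B_CPP_FAILED' \
        "$1"
}
```

A caller could redirect the ckpdb output to a file, run `checkpoint_failed` on it, and raise an alert when it returns true, rather than trusting the checkpoint history.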

Anyone else seen this? Got a better suggestion than using tar for
backup on RH 3.0?

Cheers,

--
Michael Leo Java, J2EE, BEA WebLogic,
Caribou Lake LLC Oracle, Open Source, Ingres,
mleo (AT) cariboulake (DOT) com Real Enterprise Applications

Climate is what we expect. Weather is what we get.
- Mark Twain
_______________________________________________
Info-ingres mailing list
Info-ingres (AT) cariboulake (DOT) com
http://mailman.cariboulake.com/mailm...py/info-ingres

  #6  
Old   
Michael Leo
 
Posts: n/a

Default RE: [Info-ingres] Problem with online checkpoints for r3 and Linux - 03-16-2005 , 08:51 AM



Gang,

Thanks to Paul White and Armand Pirvu, I was led by the nose to this
document:

http://supportconnect.ca.com/sc/kb/t...reen=kbresults

I would suggest you worry about this on your platform if you use
online checkpoints.

Cheers,

Mike Leo

At 3:02 PM +1100 3/16/05, Paul White wrote:
Quote:
I think this is what you need
TEC317591

Paul

-----Original Message-----
From: Michael Leo [mailto:mleo (AT) cariboulake (DOT) com]
Sent: Wednesday, 16 March 2005 12:04 PM
To: Armand Pirvu (yahoo); 'Michael Leo'; info-ingres (AT) cariboulake (DOT) com
Subject: RE: [Info-ingres] Problem with online checkpoints for r3 and
Linux



Armand,

That is the perfect solution, except the iitar is not shipped
with Ingres II r3 on Linux.

Methinks I will need to open a call with CA and see if they can
slip that file into an official version so the customer is
comfortable it will be supported.

Thanks!

Mike Leo


At 6:03 PM -0600 3/15/05, Armand Pirvu (yahoo) wrote:
Hi Mike,

I ran into something similar, on AIX though.
Here is what might be worth trying.
In cktmpl.def, I made these changes:
BSTD: /bin/echo beginning checkpoint to tape %C of %N locations
BSDD: /bin/echo beginning checkpoint to disk %C of %N locations
ESTD: /bin/echo ending checkpoint to tape %C of %N locations
ESDD: /bin/echo ending checkpoint to disk %C of %N locations
PSTD: /bin/echo mount tape %M and press return; read foo;
WSTD: cd %D; iitar cbf 20 %C .
PSDD: /bin/echo executing checkpoint to disk
WSDD: cd %D; iitar cf %A .


And in $II_SYSTEM/ingres/utility there is iitar, which works around
tar's annoying message that says a file has changed size.

#!/usr/bin/ksh
#
# Script to set the exit code on tar
# when a file size change event occurs
#
# Set locale to C so error will be found
#
LC_ALL=C
LANG=C
export LC_ALL LANG
#
# Set source directories for commands
#
TARSOURCE=/usr/bin/tar
GREPSOURCE=/usr/bin/grep
#
# File for errors to be written to
#
ERRORLOG=/tmp/errorlog
#
# Start tar command with errors redirected to log file
# and save tar's exit code
#
$TARSOURCE "$@" 2> $ERRORLOG
TARSTATUS=$?
#
# Exit 0 if tar reported a file size change,
# otherwise pass on tar's own exit code
#
if $GREPSOURCE "file changed size" $ERRORLOG > /dev/null ;then
    exit 0
fi
exit $TARSTATUS

Maybe it works for you as well.
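
On Linux the wording differs: the checkpoint log in this thread shows GNU tar printing "file changed as we read it", followed by "Error exit delayed from previous errors". Here is a rough sketch of the same idea adapted to those two messages. It is not the iitar that CA ships; TAR_BIN and the function names are made up for illustration, and other tar versions may word their messages differently.

```shell
#!/bin/sh
# Sketch of a tar wrapper for Linux, modelled on the iitar idea above.
# TAR_BIN, only_benign_tar_errors and run_tar are hypothetical names.

TAR_BIN=${TAR_BIN:-/bin/tar}

# only_benign_tar_errors LOGFILE
# Returns 0 only if the log has at least one "file changed" warning and
# nothing else apart from tar's trailing summary line.
only_benign_tar_errors() {
    LC_ALL=C grep -q 'file changed as we read it' "$1" || return 1
    # grep -v succeeds when some line matches neither pattern,
    # which means tar reported a different kind of problem.
    if LC_ALL=C grep -v \
        -e 'file changed as we read it' \
        -e 'Error exit delayed from previous errors' \
        "$1" > /dev/null; then
        return 1
    fi
    return 0
}

# run_tar ARGS...
# Runs tar, shows its messages, and reports success when the only
# complaint was that a file changed while being read.
run_tar() {
    errlog=$(mktemp) || return 2
    rc=0
    LC_ALL=C "$TAR_BIN" "$@" 2> "$errlog" || rc=$?
    cat "$errlog" >&2
    if [ "$rc" -ne 0 ] && only_benign_tar_errors "$errlog"; then
        rc=0
    fi
    rm -f "$errlog"
    return "$rc"
}

# As a standalone script, the last line would be:  run_tar "$@"
```

Unlike a wrapper that returns 0 whenever the warning appears, this one still fails the checkpoint if tar also reports something else, such as an unreadable file.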

Cheers,
Armand

  #7  
Old   
Armand Pirvu
 
Posts: n/a

Default RE: [Info-ingres] Problem with online checkpoints for r3 and Linux - 03-16-2005 , 09:25 AM



Mike,

The iitar script works around tar's "file changed size" error when running an online checkpoint.
I made the following changes to cktmpl.def:
1. Use . instead of *
2. Replace tar with iitar

To be fair, I use it on Ingres II 2.0 on AIX. So far it runs OK. I can only assume it works fine on 2.6 as well, but that would have to be tested.

Cheers,
Armand

Michael Leo <mleo (AT) cariboulake (DOT) com> wrote:
Gang,

Thanks to Paul White and Armand Pirvu, I was led by the nose to this
document:

http://supportconnect.ca.com/sc/kb/t...reen=kbresults

I would suggest you worry about this on your platform if you use
online checkpoints.

Cheers,

Mike Leo


=======================================================
Armand

"If I had only known, I would have been a locksmith."
Albert Einstein
=======================================================


  #8  
Old   
Paul White
 
Posts: n/a

Default RE: [Info-ingres] Problem with online checkpoints for r3 and Linux - 03-16-2005 , 06:21 PM



Armand's iitar works fine on Ingres 2.6 SP2 with Solaris 8.

I was worried that ignoring the tar warning would invalidate the checkpoint
but after hammering Mike's test script and restoring to different points, I
found I could not break it.

Ok, I am curious though. I was under the impression that the writes to the
tables were delayed until after the checkpoint was complete. I thought the
dumps recorded active transactions. The documentation in TEC317591 says:
"Note: Advantage Ingres properly handles files that are changing during
backup. All changes to files are logged in the dump location." It seems
that Ingres continues to update the physical tables during the online
checkpoints.

I wonder what happens when tar gets to a long running transaction on a big
table. While I'd be unhappy if my checkpoint to disk failed with a tar
warning after running for an hour - I'd prefer not to ignore messages if
there is a chance the snapshot is inconsistent. Take this scenario for
instance:

9.00   Checkpoint     Little Txns      Big Txn
9.05   table 1
9.06                  update table3    update table3
9.07                  commit
9.08   table 2        update table2    update table2   (tar warning ignored/skipped)
9.09                  commit
9.10   table 3
9.11                  update table1    update table1
9.12                  commit           commit
9.13   finish ckpdb

The timestamps on the tar file in the checkpoint are:
table1 9.11
table2 9.08
table3 9.06
Infodb reports the checkpoint and dumps as at 9.00.

What am I going to end up with when this checkpoint is restored? Which
transactions are going to be restored? I found that with -j (no journals) I get a
copy of the database as at 9:00, so this is cool. Using +j I get a copy of
the database as at 9:13.

So, this is the interesting bit, the rollforward of an online checkpoint did
not bring back an exact replica of the database tables as at 9:00. It
restored a copy of each table it found at backup time then massaged the
tables with the dump files. (Can someone fill in the gaps if I've got this
wrong).

My rollforward log shows:
Thu Mar 17 10:37:29 2005 RFP: Start restore of location: ii_database from disk:
path = '/data3/ingtest/ingres/ckp/default/test_olcp'
file = 'c0004001.ckp'
executing restore to disk /data3/ingtest/ingres/ckp/default/test_olcp of 1 locations
done with disk operations
beginning restore of dumps
Thu Mar 17 10:37:40 2005 RFP: Start processing dump file sequence 3.
done with disk operations
Thu Mar 17 10:37:40 2005 Rollforward completed successfully.

It confirms the dumps are critical to recovery and should be included in
your backups along with the journals, configs etc.
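
As a rough illustration of that point, the sketch below lists the per-database directories a restore would draw on, so they can be handed to whatever backup tool is in use. It assumes the default layout shown in the infodb extent directory earlier in the thread ($II_SYSTEM/ingres/<area>/default/<database>); installations with relocated areas would differ, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: print the checkpoint, dump and journal directories for one
# database, assuming the default location layout seen in the infodb output.
# recovery_dirs is a hypothetical helper, not an Ingres utility.

# recovery_dirs II_SYSTEM DBNAME
recovery_dirs() {
    ii_system=$1
    db=$2
    for area in ckp dmp jnl; do
        printf '%s\n' "$ii_system/ingres/$area/default/$db"
    done
}
```

For the test database in this thread, `recovery_dirs /opt/CA/IngresII test_olcp` prints the ckp, dmp and jnl paths in that order.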


Paul


-----Original Message-----
From: Michael Leo [mailto:mleo (AT) cariboulake (DOT) com]
Sent: Thursday, 17 March 2005 1:51 AM
To: info-ingres (AT) cariboulake (DOT) com
Cc: Paul White; Armand Pirvu (yahoo)
Subject: RE: [Info-ingres] Problem with online checkpoints for r3 and
Linux


Gang,

Thanks to Paul White and Armand Pirvu, I was led by the nose to this
document:

http://supportconnect.ca.com/sc/kb/t...317591&docid=317591&bypass=yes&fromscreen=kbresults

I would suggest you worry about this on your platform if you use
online checkpoints.

Cheers,

Mike Leo

  #9  
Old   
Karl & Betty Schendel
 
Posts: n/a

Default RE: [Info-ingres] Problem with online checkpoints for r3 and Linux - 03-16-2005 , 07:40 PM



At 11:21 AM +1100 3/17/2005, Paul White wrote:
Quote:
Ok, I am curious though. I was under the impression that the writes to the
tables were delayed until after the checkpoint was complete.

No, that's not how it works. When a checkpoint is running, Ingres
logs page before-images as well as the usual log records, and all
that stuff gets copied to the dumps. When you do a rollforwarddb, it
restores the table files (which might be in some random state), and
then in essence reads the dumps backwards, dropping in before-images
and thus restoring the table to the state it was in at the start of
the checkpoint.
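
The mechanism can be mimicked with a toy model. In this sketch each "page" is a small file and the "dump" is a directory of numbered before-images; that layout and the function names are invented for illustration and have nothing to do with the real Ingres dump format.

```shell
#!/bin/sh
# Toy model only: pages are files, the dump is a directory of numbered
# before-images. update_page and apply_dumps_backwards are hypothetical.

# update_page TABLE_DIR DUMP_DIR PAGE NEW_VALUE
# Save the page's current contents as a before-image, then overwrite it.
update_page() {
    table_dir=$1
    dump_dir=$2
    page=$3
    new_value=$4
    n=$(ls "$dump_dir" | wc -l)
    n=$((n + 1))
    cp "$table_dir/$page" "$dump_dir/$(printf '%04d' "$n").$page"
    printf '%s\n' "$new_value" > "$table_dir/$page"
}

# apply_dumps_backwards TABLE_DIR DUMP_DIR
# Walk the before-images newest first and drop each one back into place,
# which leaves every page as it was when the checkpoint started.
apply_dumps_backwards() {
    table_dir=$1
    dump_dir=$2
    for image in $(ls "$dump_dir" | sort -r); do
        page=${image#*.}
        cp "$dump_dir/$image" "$table_dir/$page"
    done
}
```

If a page is copied to the backup after it has been updated twice, restoring the backup and then calling `apply_dumps_backwards` puts back the newer before-image first and the older one last, so the page ends up with its contents from the start of the checkpoint.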

Your conclusion that the dumps are critical is entirely correct,
of course.

Karl

