#3
Hi guys,

we are running RDB (Oracle Rdb V7.0-61) and SQLSERVICES (V7.1-59) on a 5-node cluster across 2 sites, on OpenVMS V7.3:

site 1:
    MP1  sys$sysroot = DSA200:[SYS0.]
    MP2  sys$sysroot = DSA200:[SYS1.]
site 2:
    OP1  sys$sysroot = DSA100:[SYS0.]
    OP2  sys$sysroot = DSA100:[SYS1.]
    QRM  sys$sysroot = DSA300:[SYS0.]

Our application runs on node MP1 and uses the database DSA618:[DB_DISK001.DB]DB.RDB. For a failover scenario we need to do an RMU/OPEN DSA618:[DB_DISK001.DB]DB.RDB on node OP1. Are there any problems doing this?

Rdb is started on nodes MP1 and OP1, but in normal operation the database DSA618:[DB_DISK001.DB]DB.RDB is opened only on node MP1.

Thanks for your answers,
N. Manser

    SYSMAN> do rmu/show system
    %SYSMAN-I-OUTPUT, command execution on node QRM
    %DCL-W-IVVERB, unrecognized command verb - check validity and spelling
     \RMU\
    %SYSMAN-I-OUTPUT, command execution on node OP2
    %DCL-W-ACTIMAGE, error activating image RDMPRV
    -CLI-E-IMGNAME, image file DSA100:[SYS1.SYSCOMMON.][SYSLIB]RDMPRV.EXE;8
    -SYSTEM-F-PROTINSTALL, protected images must be installed
    %SYSMAN-I-OUTPUT, command execution on node OP1
    Oracle Rdb V7.0-61 on node OP1 7-NOV-2006 17:40:14.74
     - monitor started 8-APR-2006 22:29:10.31 (uptime 212 19:11:04)
     - monitor log filename is "SYS$SYSROOT:[SYSEXE]RDMMON.LOG;107"
    database DSA0:[DDAL.DATABASE]DDAL$TR_DB.RDB;1
     - first opened 8-APR-2006 22:30:00.82 (elapsed 212 19:10:13)
     - current after-image journal file is DB_DISKA01:[AIJ]DDAL_AIJ001.AIJ;1575
     - AIJ Log Server is active
     - 2 active database users
     - database also open on these nodes: MP1
    %SYSMAN-I-OUTPUT, command execution on node MP2
    %DCL-W-ACTIMAGE, error activating image RDMPRV
    -CLI-E-IMGNAME, image file DSA200:[SYS1.SYSCOMMON.][SYSLIB]RDMPRV.EXE;8
    -SYSTEM-F-PROTINSTALL, protected images must be installed
    %SYSMAN-I-OUTPUT, command execution on node MP1
    Oracle Rdb V7.0-61 on node MP1 7-NOV-2006 17:40:13.59
     - monitor started 2-NOV-2006 07:36:25.92 (uptime 5 10:03:47)
     - monitor log filename is "SYS$SYSROOT:[SYSEXE]RDMMON.LOG;79"
    database DSA618:[DB_DISK001.DB]DB.RDB;1
     - first opened 2-NOV-2006 08:32:01.47 (elapsed 5 09:08:12)
     * database is opened by an operator
     - current after-image journal file is DB_DISKA01:[AIJ]AIJ25.AIJ;1
     - global buffer count is 30000; 22250 global buffers free
     - maximum global buffer count per user is 100
     - global section resides in system space
     - AIJ Log Server is active
     - 156 active database users
    database DSA0:[DDAL.DATABASE]DDAL$TR_DB.RDB;1
     - first opened 2-NOV-2006 07:37:17.02 (elapsed 5 10:02:56)
     - current after-image journal file is DB_DISKA01:[AIJ]DDAL_AIJ001.AIJ;1578
     - AIJ Log Server is active
     - 2 active database users
     - database also open on these nodes: OP1
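When SYSMAN output like the listing above is captured to a text file, a small script can split it per node and show which nodes could not run RMU at all (the IVVERB and ACTIMAGE errors) and which databases each monitor reports as also open elsewhere. This is a minimal sketch, assuming the output was copied off the cluster and processed with Python; the helper names (`split_by_node`, `rmu_status`, `open_databases`) are hypothetical and only match the message texts that appear in this thread.

```python
import re
from typing import Dict, List, Optional

# Banner SYSMAN prints before each node's output, as seen in the listing above.
NODE_BANNER = re.compile(r"%SYSMAN-I-OUTPUT, command execution on node (\S+)")


def split_by_node(text: str) -> Dict[str, List[str]]:
    """Group captured SYSMAN output lines under the node that produced them."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        match = NODE_BANNER.search(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def rmu_status(lines: List[str]) -> str:
    """Classify one node's output using only the message codes seen in this thread."""
    body = "\n".join(lines)
    if "%DCL-W-IVVERB" in body:
        return "RMU verb not defined"
    if "%DCL-W-ACTIMAGE" in body:
        return "RMU image could not be activated"
    if "Oracle Rdb" in body:
        return "Rdb monitor responding"
    return "unrecognized output"


def open_databases(lines: List[str]) -> Dict[str, List[str]]:
    """Map each database a node reports to the other nodes it is also open on."""
    result: Dict[str, List[str]] = {}
    current_db: Optional[str] = None
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("database "):
            # e.g. "database DSA618:[DB_DISK001.DB]DB.RDB;1"
            current_db = stripped.split()[1]
            result[current_db] = []
        elif current_db and "database also open on these nodes:" in stripped:
            # Everything after the first colon is the node list.
            result[current_db] = stripped.split(":", 1)[1].split()
    return result


if __name__ == "__main__":
    captured = "\n".join([
        "%SYSMAN-I-OUTPUT, command execution on node QRM",
        "%DCL-W-IVVERB, unrecognized command verb - check validity and spelling",
        "%SYSMAN-I-OUTPUT, command execution on node MP1",
        "Oracle Rdb V7.0-61 on node MP1 7-NOV-2006 17:40:13.59",
        "database DSA618:[DB_DISK001.DB]DB.RDB;1",
        " - 156 active database users",
    ])
    for node, lines in split_by_node(captured).items():
        print(node, "->", rmu_status(lines), open_databases(lines))
```

An empty node list for a database, as for DB.RDB on MP1 here, means the monitor reported no other node with that database open.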
#4
Nazim wrote:
> Hi guys, we are running RDB (Oracle Rdb V7.0-61), SQLSERVICES (v7.1-59) on a 5 node cluster on 2 sites.

Make sure that you've executed RMUSTART70 (or RMONSTART) on all the nodes. If you are using multi-version Rdb, then you'll probably need to execute SYS$SHARE:RDB$SETVER prior to using RMU. If this is the case, you could build a little DCL procedure to do:

    $ @SYS$SHARE:RDB$SETVER 70
    $ RMU/SHOW SYSTEM

and execute that procedure from SYSMAN.
DAL$START_TR_MON.COM DISK$DTC_COMMON:[DDAL.DATABASE] -
-- 
Opinions expressed here are mine and mine alone and certainly are not intended in any way to express or represent any opinions or commitment of Oracle Corporation.
Norman Lastovica / Oracle Rdb Engineering
#6
Hi Nazim,

If this system runs anything other than "Mom and Dad's corner Deli VAT return", then I suspect that you (or the company you support) are in big trouble! Get yourself a professional DBA and pay them what they ask to do the job properly. The questions about Rdb turning up here (and more so in the ITRC) are truly frightening. I wish I could find out who these companies are and turn up at their next risk-assessment or shareholders' meeting :-(

Anyway, no one can answer your question directly unless they know a bit more about MP and OP. I suggest "yes", but if you've never tried a failover before, then what are the extra machines there for? The fact that you appear to be running Data Distributor raises an eyebrow, but my advice is to open the database on *all* nodes and use them *all*, *all* of the time, possibly in a wide-area cluster configuration.
Rdb engineering hates clusters 'cos Norm doesn't get to use his beloved Row-Ca$h, but don't let that bother you.

Regards,
Richard Maher

"Nazim" <nmanser (AT) progis (DOT) de> wrote in message news:1162918356.022076.305610 (AT) b28g2000cwb (DOT) googlegroups.com...
#7
Richard Maher wrote:
> If this system runs anything other than "Mom and Dad's corner Deli VAT return", then I suspect that you (or the company you support) are in big trouble! Get yourself a professional DBA and pay them what they ask to do the job properly.

That is why I was assigned the task of ensuring a correct failover strategy.

> Anyway, no one can answer your question directly unless they know a bit more about MP and OP. I suggest "yes", but if you've never tried a failover before, then what are the extra machines there for? The fact that you appear to be running Data Distributor raises an eyebrow, but my advice is to open the database on *all* nodes and use them *all*, *all* of the time, possibly in a wide-area cluster configuration.

MP1 and OP1 are on 2 sites but share the same file system. To be precise, the file layout of the Rdb files is as follows:

    root file location: dsa618:[db_disk001.db]
    RDA & SNP files:    dsa618:[db_disk001.db]
                        dsa618:[db_disk002.db]
                        dsa618:[db_disk003.db]
    AIJ files:          dsa616:[db_diskA01.db]
                        dsa616:[db_diskA02.db]
    RUJ files:          dsa617:[rdms$ruj]

    MP1>sh dev dsa618
    Device                 Device           Error    Volume         Free  Trans Mnt
     Name                  Status           Count     Label        Blocks Count Cnt
    DSA618:                Mounted              0  DMG_DB       32582436  7696   4
    $1$DGA230:      (OP2)  ShadowSetMember      0  (member of DSA618:)
    $1$DGA430:      (MP1)  ShadowSetMember      0  (member of DSA618:)

    MP1>sh dev dsa621
    Device                 Device           Error    Volume         Free  Trans Mnt
     Name                  Status           Count     Label        Blocks Count Cnt
    DSA621:                Mounted              0  DMG_DB2      12936924     5   4
    $1$DGA214:      (OP2)  ShadowSetMember      0  (member of DSA621:)
    $1$DGA414:      (MP1)  ShadowSetMember      0  (member of DSA621:)

    MP1>sh dev dsa616
    Device                 Device           Error    Volume         Free  Trans Mnt
     Name                  Status           Count     Label        Blocks Count Cnt
    DSA616:                Mounted              0  DMG_AIJ       8673228   100   4
    $1$DGA210:      (OP2)  ShadowSetMember      0  (member of DSA616:)
    $1$DGA410:      (MP1)  ShadowSetMember      0  (member of DSA616:)

    MP1>sh dev dsa617
    Device                 Device           Error    Volume         Free  Trans Mnt
     Name                  Status           Count     Label        Blocks Count Cnt
    DSA617:                Mounted              0  DMG_RUJ      17359776   165   4
    $1$DGA211:      (OP2)  ShadowSetMember      0  (member of DSA617:)
    $1$DGA411:      (MP1)  ShadowSetMember      0  (member of DSA617:)

Usually an RMU/OPEN on that DB is done only on MP1. I would like to know what happens if, in case of failover (MP1 crashes), I do an RMU/OPEN on node OP1. As it is a mission-critical production DB, I want to be 100% sure before updating our documentation.

So, as you say, the RMU/OPEN should be done on both MP1 and OP1 as soon as they reboot. Correct?

I am new (2 months) and I inherited the task of supporting the application and its underlying Rdb database.

Regards,
Nazim Manser
#8
Hi Nazim,

> that is why i was assigned the task to ensure correct failover strategy.

And you're a contractor, right? (Or your boss is a contractor?) Let's hope the customer's not reading this, eh :-) I'd love to know how much the contract's for, but then it's Cologne and not Munich, and it's none of my business.

Anyway, is there not a UAT or other test environment that this can be tested in first?

I'll assume not. When you do an RMU/DUMP/HEADER on the database(s), do they say the number of cluster nodes is "1"? If they do, then you'll have to make sure the databases are closed on MP1 before trying to open them on OP1. If not, just open them up on both nodes, fire up the application on both nodes (if it's cluster tolerant), and get the application testing people involved.
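The node-count check described above can be applied to a header dump saved as a text file, like the ddal_dump.txt and dmg_dump.txt files that get searched for "node" further down the thread. This is a minimal sketch under that assumption: it only looks for the "Maximum node count is N" line quoted in the thread, and the rule it encodes (a count of 1 means close on MP1 before opening on OP1) is the rule of thumb from this reply, not an exhaustive failover procedure. The function names are hypothetical.

```python
import re
from typing import Optional

# Matches the header line quoted in this thread, e.g. "Maximum node count is 16".
NODE_COUNT = re.compile(r"Maximum node count is (\d+)")


def max_node_count(header_dump: str) -> Optional[int]:
    """Return the first 'Maximum node count' value found in a header dump, if any."""
    match = NODE_COUNT.search(header_dump)
    return int(match.group(1)) if match else None


def second_node_advice(header_dump: str) -> str:
    """Apply the rule of thumb from the reply to one database's header dump."""
    count = max_node_count(header_dump)
    if count is None:
        return "no node count found; check the header dump manually"
    if count == 1:
        return "close the database on MP1 before opening it on OP1"
    return "open on both nodes (if the application is cluster tolerant) and test"
```

For example, `second_node_advice(open("dmg_dump.txt").read())` would apply the rule to the production database's dump. Only the first match is used, so a dump containing both "Maximum node count is 16" and a "WARNING: ... instead of 1" line is read as 16.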
> so as you say, the RMU/open should be done on both MP1 and OP1 as soon as they reboot. correct ?

No, I was suggesting that the beauty of VMS clusters and Rdb is that you don't have to "fail over" because, personally, I would open the database and the application on all of the nodes all of the time. If MP1 goes down, there would be a pregnant pause followed by MP1 users having to log in again, but that's it. The cluster took a lickin' but it kept on tickin'. With Rdb partitioned lock trees and all the work VMS engineering has been doing with the DLM *and* the new interconnect stuff coming along, I see no point in restricting a database to one node. (Never have :-)

The fact that you're using Data Distributor (why?) leads me to suspect that not all disks are accessible cluster-wide or there's something dodgy with the application. Our DR used to be copying RBFs over to the mirror machine, restoring them and rolling forward AIJs. Once every couple of years we'd be forced to run in DR for a week and then switch back with no loss of data. They were *never* able to get the Unix systems to achieve the same thing! (They'd just get someone to log on and that would be that, i.e. production never shifted.) The VMS guys were moving to a Disaster Tolerant setup when I left.

My *guess* is everything will be OK except for DNS cache flushes and hard-coded SQL/Services server names. (But then, if I was getting paid to do it, I'd make sure :-)
Regards,
Richard Maher

"Nazim" <nmanser (AT) progis (DOT) de> wrote in message news:1162981579.381769.299700 (AT) k70g2000cwa (DOT) googlegroups.com...
#9
> it is neither cologne nor munich.

Frankfurt? How's the Spring workload shaping up? :-)
Richard Maher wrote:
> > that is why i was assigned the task to ensure correct failover strategy.
> And you're a contractor, right? (Or your boss is a contractor?) Let's hope the customer's not reading this, eh :-) I'd love to know how much the contract's for, but then it's Cologne and not Munich, and it's none of my business.

It is neither Cologne nor Munich. Yes, I am a contractor; my boss is permanent but has only been here for a year, so he inherited the system as it is. My role is to implement the failover scenario for our application, including the underlying Rdb database. The Rdb setup was implemented a long time ago, the team has since left, and the handover to my boss was not done properly. (Since he has been here everything has worked fine; the last time the DB was opened was over a year ago.)

    MP1>rmu/show system sql$database
    Oracle Rdb V7.0-61 on node EVAMP1 10-OCT-2006 11:22:47.97
     - monitor started 6-NOV-2004 11:12:09.67 (uptime 703 00:10:38)
     - monitor log filename is "SYS$SYSROOT:[SYSEXE]RDMMON.LOG;77"
    database DSA0:[DDAL.DATABASE]DDAL$TR_DB.RDB;1
     - first opened 11-NOV-2005 14:18:37.78 (elapsed 332 21:04:10)
     * database is opened by an operator
     - current after-image journal file is TPZH_DAL_DB_DISKA01:[AIJ]DDAL_AIJ001.AIJ;1559
     - AIJ Log Server is active
     - 2 active database users
     - database also open on these nodes: OP1

This is our prod DB:

    database DSA618:[DB_DISK001.DB]DMG_DB.RDB;1
     - first opened 15-MAY-2005 07:08:43.83 (elapsed 513 04:14:04)
     * database is opened by an operator
     - current after-image journal file is DB_DISKA02:[AIJ]AIJ18.AIJ;1
     - global buffer count is 30000; 20550 global buffers free
     - maximum global buffer count per user is 100
     - global section resides in system space
     - AIJ Log Server is active
     - 190 active database users

> Anyway, is there not a UAT or other test environment that this can be tested in first?

Unfortunately, the UAT environment is on a standalone VMS machine.

> I'll assume not. When you do an RMU/DUMP/HEADER on the database(s), do they say the number of cluster nodes is "1"? If they do, then you'll have to make sure the databases are closed on MP1 before trying to open them on OP1. If not, just open them up on both nodes, fire up the application on both nodes (if it's cluster tolerant), and get the application testing people involved.

In the RMU/DUMP/HEADER output there is no reference to cluster nodes as such; the only reference is the node count, which differs for the DDAL database (the one that is also open on OP1):

    MP1>search ddal_dump.txt node
        Maximum node count is 16
        - WARNING: Maximum node count is 16 instead of 1
    MP1>search dmg_dump.txt node
        Maximum node count is 1

----> Yes. But what if MP1 crashes? Is there any danger in opening the database on the other node? Our application is designed to run on only one node at a time, but the Rdb database can also be opened on OP1 as a standby solution. OK, so before doing this I must close the DB on MP1, then open it on MP1 and OP1.

I have to implement the application failover scenario on the VMS side, and testing can only be done in a very restricted window at the weekend. I first have to work out the theory, then schedule a test plan.

> so as you say, the RMU/open should be done on both MP1 and OP1 as soon as they reboot. correct ?
> No, I was suggesting that the beauty of VMS clusters and Rdb is that you don't have to "fail over" because, personally, I would open the database and the application on all of the nodes all of the time.

This was done by the other team, and they did not document why they did it this way.

> The fact that you're using Data Distributor (why?) leads me to suspect that not all disks are accessible cluster-wide or there's something dodgy with the application.

By Data Distributor, do you mean DDAL$TR_DB.RDB? All the DSAn disks are accessible cluster-wide.

> My *guess* is everything will be OK except for DNS cache flushes and hard-coded SQL/Services server names.

The application-specific SQL/Services services are set up identically on MP1 and OP1. The DNS cache switch also needs to be checked with the downstream applications that connect to our Rdb database, but that's another story.

Regards,
Nazim Manser
The fact that you appear to be running Data Distributor raises an eyebrow, but my advice is to open the database on *all* nodes and use them *all* of the time, possibly in a wide-area cluster configuration.

MP1 and OP1 are on two sites but share the same file system. To be precise, the file layout of the Rdb files is as follows:
root file location: dsa618:[db_disk001.db]
RDA & SNP files: dsa618:[db_disk001.db], dsa618:[db_disk002.db], dsa618:[db_disk003.db]
AIJ files: dsa616:[db_diskA01.db], dsa616:[db_diskA02.db]
RUJ files: dsa617:[rdms$ruj]

MP1>sh dev dsa618
Device                  Device           Error  Volume          Free  Trans Mnt
 Name                   Status           Count   Label        Blocks  Count Cnt
DSA618:                 Mounted              0  DMG_DB     32582436   7696   4
$1$DGA230:      (OP2)   ShadowSetMember      0  (member of DSA618:)
$1$DGA430:      (MP1)   ShadowSetMember      0  (member of DSA618:)

MP1>sh dev dsa621
Device                  Device           Error  Volume          Free  Trans Mnt
 Name                   Status           Count   Label        Blocks  Count Cnt
DSA621:                 Mounted              0  DMG_DB2    12936924      5   4
$1$DGA214:      (OP2)   ShadowSetMember      0  (member of DSA621:)
$1$DGA414:      (MP1)   ShadowSetMember      0  (member of DSA621:)

MP1>sh dev dsa616
Device                  Device           Error  Volume          Free  Trans Mnt
 Name                   Status           Count   Label        Blocks  Count Cnt
DSA616:                 Mounted              0  DMG_AIJ     8673228    100   4
$1$DGA210:      (OP2)   ShadowSetMember      0  (member of DSA616:)
$1$DGA410:      (MP1)   ShadowSetMember      0  (member of DSA616:)

MP1>sh dev dsa617
Device                  Device           Error  Volume          Free  Trans Mnt
 Name                   Status           Count   Label        Blocks  Count Cnt
DSA617:                 Mounted              0  DMG_RUJ    17359776    165   4
$1$DGA211:      (OP2)   ShadowSetMember      0  (member of DSA617:)
$1$DGA411:      (MP1)   ShadowSetMember      0  (member of DSA617:)

Usually an RMU/OPEN on that DB is done only on MP1. I would like to know what happens when, in case of failover (MP1 crashes), I do an RMU/OPEN on the OP1 node. As it is a mission-critical production DB, I want to be 100% sure before updating our documentation.

So as you say, the RMU/OPEN should be done on both MP1 and OP1 as soon as they reboot. Correct?
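Before relying on OP1 for failover, it is worth confirming the "two sites, one file system" claim: every shadow set that holds Rdb files should have a member on each site. The Python sketch below only illustrates that check. It hard-codes the member lists from the SHOW DEVICE output above and the site layout from the original post. It also *assumes* that the node shown in parentheses next to each member (OP2 or MP1) sits on the same site as that disk, so check that assumption against your storage layout. `SITE_OF_NODE`, `SHADOW_SETS` and `sites_missing` are hypothetical names and are not part of any VMS or Rdb tool.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Site layout from the original post. ASSUMPTION: the node shown in
# parentheses next to a shadow-set member is on the same site as that disk.
SITE_OF_NODE: Dict[str, str] = {
    "MP1": "site 1", "MP2": "site 1",
    "OP1": "site 2", "OP2": "site 2", "QRM": "site 2",
}
ALL_SITES: Set[str] = set(SITE_OF_NODE.values())

# (member device, node in parentheses) pairs copied from the SHOW DEVICE output.
SHADOW_SETS: Dict[str, List[Tuple[str, str]]] = {
    "DSA618:": [("$1$DGA230:", "OP2"), ("$1$DGA430:", "MP1")],  # DMG_DB
    "DSA621:": [("$1$DGA214:", "OP2"), ("$1$DGA414:", "MP1")],  # DMG_DB2
    "DSA616:": [("$1$DGA210:", "OP2"), ("$1$DGA410:", "MP1")],  # DMG_AIJ
    "DSA617:": [("$1$DGA211:", "OP2"), ("$1$DGA411:", "MP1")],  # DMG_RUJ
}


def sites_missing(members: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the sites that have no member in the given shadow set."""
    covered = {SITE_OF_NODE[node] for _device, node in members}
    return ALL_SITES - covered


if __name__ == "__main__":
    for dsa, members in SHADOW_SETS.items():
        missing = sites_missing(members)
        if missing:
            print(f"{dsa} has no member on {', '.join(sorted(missing))}")
        else:
            print(f"{dsa} has a member on every site")
```

If a set ever reports a missing site, losing that site would take the only copy of those Rdb files with it, and opening the database on the surviving node would not help.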
I am new (2 months) and I inherited the task of supporting the application and its underlying Rdb.
regards,
Nazim Manser

Rdb engineering hates clusters 'cos Norm doesn't get to use his beloved Row-Ca$h, but don't let that bother you.

Regards Richard Maher

"Nazim" <nmanser (AT) progis (DOT) de> wrote in message news:1162918356.022076.305610 (AT) b28g2000cwb (DOT) googlegroups.com...
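The rule of thumb in this exchange is simple. If RMU/DUMP/HEADER reports "Maximum node count is 1", close the database on MP1 before opening it on OP1; otherwise it can be open on both. That rule is easy to apply to saved dump output such as ddal_dump.txt and dmg_dump.txt. The Python sketch below is a hypothetical helper, not an Rdb utility. It only looks for the "Maximum node count is N" line quoted in the thread and applies Richard's rule, so treat its answer as a prompt to check, not as an authority.

```python
import re

# Matches the header line quoted in this thread, e.g. "Maximum node count is 16".
NODE_COUNT_RE = re.compile(r"Maximum node count is (\d+)")


def max_node_count(header_text: str) -> int:
    """Return the first 'Maximum node count' value in RMU/DUMP/HEADER text."""
    match = NODE_COUNT_RE.search(header_text)
    if match is None:
        raise ValueError("no 'Maximum node count' line found")
    return int(match.group(1))


def open_advice(header_text: str) -> str:
    """Apply the rule of thumb from the thread to one database header."""
    if max_node_count(header_text) == 1:
        return "close on MP1 before opening on OP1"
    return "may be open on MP1 and OP1 at the same time"


if __name__ == "__main__":
    # Output of the two SEARCH commands as posted in the thread.
    headers = {
        "dmg_dump.txt": "Maximum node count is 1",
        "ddal_dump.txt": "Maximum node count is 16\n"
                         "- WARNING: Maximum node count is 16 instead of 1",
    }
    for name, text in headers.items():
        print(f"{name}: {open_advice(text)}")
```

Because the WARNING line repeats the same value, the helper simply takes the first match.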
#10
Hi Nazim,

it is neither cologne nor munich.

Frankfurt? How's the Spring workload shaping up? :-)

Anyway, what downtime window do you have?

All I can suggest, on the information that you've given, is that you go in early one Sunday morning and shut down the application on MP1, followed by a close of all the databases. Then, *before anything else*, do a full off-line backup of all databases (probably followed by a complete RMU/VERIFY if you haven't been doing them).

On second thoughts, best not to ask too many questions eh :-) Then open the database(s) and applications up on OP1 and let the testers do their work. If the System Startups/UAFs/logicals/configs and specs are the same, then I foresee no problems.

Does the RDMS$RUJ logical point to the same place on all nodes? Anything in SYS$SPECIFIC?

In summary, Nazim, apart from the suck-it-and-see approach, I see no way forward. The one question to be sure to ask yourself before attempting the fail-over is "when was the last time I had to do a production restore in anger?". If the answer ends up "Buggered if I know!" then I suggest that you practice restoring the database to the test box, maybe rolling forward AIJs and enabling AIJs again.

Are you running circular AIJs or single/extensible?

ALS?

You don't say you're running hot standby, but you are running DDAL; what transfers will stop when you switch over?

Do you have a support contract?
If so, call Oracle Rdb support for help. If not, someone should bring this to the attention of the manager of the dickhead that made that decision! Probably the same dickhead that sacked all the real DBAs in the first place :-( You're on your own. Good luck.

Regards Richard Maher

$ pipe rmu/dump/head mf_personnel | sea sys$pipe node
Maximum node count is 16
- WARNING: Maximum node count is 16 instead of 1

"Nazim" <nmanser (AT) progis (DOT) de> wrote in message news:1162990172.492625.213000 (AT) f16g2000cwb (DOT) googlegroups.com...
Richard Maher wrote:
Hi Nazim,
That is why I was assigned the task of ensuring a correct failover strategy.

And you're a contractor, right? (Or your boss is a contractor?) Let's hope the customer's not reading this eh :-) I'd love to know how much the contract's for, but then it's Cologne and not Munich and it's none of my business.

It is neither Cologne nor Munich. Yes, I am a contractor; my boss is permanent but has only been here for a year, so he inherited the stuff as it is. My role is to implement the failover scenario for our app, including the underlying Rdb. The Rdb setup was implemented a long time ago, the team has since left, and the handover to my boss was not done properly. (Since he has been here everything has worked fine; the last time the DB was opened was over a year ago.)
MP1>rmu/show system sql$database
Oracle Rdb V7.0-61 on node EVAMP1 10-OCT-2006 11:22:47.97
 - monitor started 6-NOV-2004 11:12:09.67 (uptime 703 00:10:38)
 - monitor log filename is "SYS$SYSROOT:[SYSEXE]RDMMON.LOG;77"
database DSA0:[DDAL.DATABASE]DDAL$TR_DB.RDB;1
 - first opened 11-NOV-2005 14:18:37.78 (elapsed 332 21:04:10)
 * database is opened by an operator
 - current after-image journal file is TPZH_DAL_DB_DISKA01:[AIJ]DDAL_AIJ001.AIJ;1559
 - AIJ Log Server is active
 - 2 active database users
 - database also open on these nodes: OP1
this is our prod DB
database DSA618:[DB_DISK001.DB]DMG_DB.RDB;1
 - first opened 15-MAY-2005 07:08:43.83 (elapsed 513 04:14:04)
 * database is opened by an operator
 - current after-image journal file is DB_DISKA02:[AIJ]AIJ18.AIJ;1
 - global buffer count is 30000; 20550 global buffers free
 - maximum global buffer count per user is 100
 - global section resides in system space
 - AIJ Log Server is active
 - 190 active database users

Anyway, is there not a UAT or other test environment that this can be tested in first?

Unfortunately the UAT environment is on a standalone VMS machine.

I'll assume not. When you do an RMU/DUMP/HEADER on the database(s), do they say number of cluster nodes is "1"? If they do, then you'll have to make sure the databases are closed on MP1 before trying to open them on OP1. If not, just open them up on both nodes, fire up the application on both nodes (if it's cluster tolerant) and get the application testing people involved.

On the RMU/DUMP/HEADER there is no reference to cluster nodes; the only reference is the node count (the DDAL database is the one that is also open on OP1):

MP1>search ddal_dump.txt node
Maximum node count is 16
- WARNING: Maximum node count is 16 instead of 1
MP1>search dmg_dump.txt node
Maximum node count is 1   ----> yes.

But what if MP1 crashes? Is there any danger in opening the database on the other node?
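Before a failover it helps to know which databases are currently open on which nodes. The Python sketch below pulls that out of captured RMU/SHOW SYSTEM text like the MP1 listing above. For each "database <file>" entry it reports the nodes named in a "database also open on these nodes:" line, or an empty list if there is none. This is an illustration built on stated assumptions. `open_databases` is a hypothetical helper, and it *assumes* that several node names in that line would be comma-separated, which the output in this thread does not confirm.

```python
import re
from typing import Dict, List

# "database <file spec>.RDB;<version>" starts a database entry in RMU/SHOW SYSTEM output.
DB_RE = re.compile(r"database (\S+\.RDB;\d+)", re.IGNORECASE)
# ASSUMPTION: several node names in this line would be comma-separated.
ALSO_OPEN_RE = re.compile(
    r"database also open on these nodes:\s*([A-Z0-9_$]+(?:\s*,\s*[A-Z0-9_$]+)*)",
    re.IGNORECASE,
)


def open_databases(show_system_text: str) -> Dict[str, List[str]]:
    """Map each database in one node's RMU/SHOW SYSTEM output to the other
    nodes that also have it open (an empty list means this node only)."""
    result: Dict[str, List[str]] = {}
    entries = list(DB_RE.finditer(show_system_text))
    for i, entry in enumerate(entries):
        # An entry's details run until the next "database <file>" entry.
        end = entries[i + 1].start() if i + 1 < len(entries) else len(show_system_text)
        details = show_system_text[entry.end():end]
        also = ALSO_OPEN_RE.search(details)
        nodes = [n.strip().upper() for n in also.group(1).split(",")] if also else []
        result[entry.group(1)] = nodes
    return result


if __name__ == "__main__":
    # Abridged from the MP1 listing above.
    sample = """database DSA0:[DDAL.DATABASE]DDAL$TR_DB.RDB;1
 - AIJ Log Server is active
 - 2 active database users
 - database also open on these nodes: OP1
database DSA618:[DB_DISK001.DB]DMG_DB.RDB;1
 - AIJ Log Server is active
 - 190 active database users
"""
    for db, nodes in open_databases(sample).items():
        if nodes:
            print(f"{db}: also open on {', '.join(nodes)}")
        else:
            print(f"{db}: open on this node only")
```

Run against each node's output, the "this node only" entries are the databases that would need an RMU/OPEN elsewhere if that node went down.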