chunks down

comp.databases.informix

Ljubica Cica
 
chunks down - 10-14-2003, 11:33 AM

Hello,

I am using Solaris 7 (sun4u sparc SUNW,Ultra-4) and running Informix Dynamic
Server 2000 Version 9.21.UC1.

Between 22 September and 3 October, 6 chunks went down. Fortunately, Informix
mirroring switched seamlessly to the mirrored chunks (which are now online).
The chunks that are down include the physical dbspace and logdbs (with the
logical logs), so I have to start mirroring again.

The unusable-chunk error in the af* file:
20:15:16 mt_aio_wait(3057f770)
20:15:16 chunkio(35, 0x1c6a5800, 0x20, 1, 0x311aa180, 34, 1) errno 5
AIOREQ *a
30c07e94: 00000000 00000000 00000000 00000000 ........ ........
30c07ea4: 00000000 00000000 3057f8e8 3057f5f8 ........ 0W..0W..
30c07eb4: 00001345 41494f72 65710000 00000000 ...EAIOr eq......
30c07ec4: 07d10000 00000000 00000003 00000501 ........ ........
30c07ed4: 0000000b 00000002 00000001 00000000 ........ ........
30c07ee4: 00000001 3057f7e4 00000000 00000001 ....0W.. ........
30c07ef4: 00000005 00000001 00000000 00000000 ........ ........
30c07f04: 00000000 ....
20:15:16
20:15:16 Informix Dynamic Server 2000 Version 9.21.UC1 Software Serial Number AAC#J888444

20:15:16 Assert Failed: I/O error, Primary Chunk '/pathto/cms5' -- Offline
20:15:16 Who: Thread(27, flush_sub(6), 301139a8, 3)
File: rsbuff.c Line: 4032
20:15:16 Results: Chunk is now unusable
20:15:16 Action: Repair and restore from mirror or archive
20:15:16 Stack for thread: 27 flush_sub(6)

which suggests a hardware error.
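
The `errno 5` at the end of the chunkio line is what points at the device: on Solaris, as on most Unix platforms, errno 5 is EIO (I/O error). A minimal sketch for pulling the number out of an af* line and looking it up follows. The helper name `errno_from_af_line` is hypothetical, and Python's `errno` table reflects the machine the script runs on, so treating it as equivalent to the Solaris 7 table is an assumption.

```python
import errno
import os
import re

# Line copied from the af* excerpt in this post.
AF_LINE = "20:15:16 chunkio(35, 0x1c6a5800, 0x20, 1, 0x311aa180, 34, 1) errno 5"


def errno_from_af_line(line):
    """Return the integer that follows 'errno' in the line, or None if absent."""
    match = re.search(r"\berrno\s+(\d+)\b", line)
    return int(match.group(1)) if match else None


code = errno_from_af_line(AF_LINE)
# Symbolic name from the local errno table (assumed to match Solaris here).
name = errno.errorcode.get(code, "unknown")
print(code, name, os.strerror(code))
```

The message text printed by `os.strerror` depends on the local C library, so only the number and symbolic name should be relied on.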

On the same day it started, I got these messages in /var/adm/messages:

Sep 22 17:13:48 dbsmuk1 Cmd (0x723f090) dump for Target 11 Lun 0:
Sep 22 17:13:48 dbsmuk1 unix: /pci@1f,4000/scsi@4 (glm8):
Sep 22 17:13:48 dbsmuk1 cdb=[ 0x2a 0x0 0x0 0x40 0x10 0x80 0x0 0x0 0x4 0x0 ]
Sep 22 17:13:48 dbsmuk1 unix: /pci@1f,4000/scsi@4 (glm8):
Sep 22 17:13:48 dbsmuk1 pkt_flags=0x4000 pkt_statistics=0x61 pkt_state=0x7

Sep 22 17:13:48 dbsmuk1 unix: WARNING: /pci@1f,4000/scsi@4 (glm8):
Sep 22 17:13:48 dbsmuk1 Disconnected tagged cmd(s) (1) timeout for Target 11.0
Sep 22 17:13:48 dbsmuk1 unix: WARNING: ID[SUNWpd.glm.cmd_timeout.6018]
Sep 22 17:13:48 dbsmuk1 unix: WARNING: /pci@1f,4000/scsi@4/sd@b,0 (sd130):
Sep 22 17:13:48 dbsmuk1 SCSI transport failed: reason 'timeout': retrying command

There were also messages with SCSI transport failed: reason 'incomplete'.

However, the chunk only went down a couple of hours later. All raw devices
(chunks) on the disk at target 11 went down, but not at the same time. Two
other chunks on the other disk also went down. Since 3 October the instance
has been running with the mirrored chunks online, and there have been no
other problems.
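
The cdb=[ ... ] line in the /var/adm/messages excerpt is the SCSI command that timed out, and it can be decoded by hand. Opcode 0x2a is WRITE(10), whose layout is: opcode, flags, a 4-byte big-endian logical block address, a group byte, a 2-byte transfer length in blocks, and a control byte. The sketch below assumes the line-wrapped fragment in the log reads as `0x0 0x0`; the function name `decode_cdb10` is hypothetical.

```python
import struct

# Bytes from the cdb=[ ... ] log line, assuming the wrapped "0x" + "0" is 0x0.
CDB = bytes([0x2A, 0x00, 0x00, 0x40, 0x10, 0x80, 0x00, 0x00, 0x04, 0x00])

# Only the two 10-byte opcodes relevant here; anything else is shown as hex.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_cdb10(cdb):
    """Decode a 10-byte READ(10)/WRITE(10) style command descriptor block."""
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    # Big-endian: opcode, flags, 32-bit LBA, group, 16-bit length, control.
    opcode, flags, lba, group, blocks, control = struct.unpack(">BBIBHB", cdb)
    return {
        "opcode": OPCODES.get(opcode, hex(opcode)),
        "lba": lba,
        "blocks": blocks,
    }


info = decode_cdb10(CDB)
print(info)
```

So the command that timed out was a 4-block write at block 0x401080 on target 11. If the drive uses 512-byte blocks (an assumption, the log does not say), that is a 2 kB write, which would be one page at the 2 kB page size implied by "2 pages, 4 kB".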

Running prtvtoc on the raw devices that are down gives "Unable to read the
disk geometry". (The offset for the chunks was 2 pages, 4 kB.)

I will try to "recover mirrored chunks" tomorrow. I would greatly appreciate
your opinions: would it succeed? That is, are those hardware errors temporary
or fatal, and should I consider the whole disk (actually 2 disks) flaky and
try something else (and if so, what)?

Many thanks for any advice


Ljubica Cica - Informix DBA
_____



Art S. Kagel
 
Re: chunks down - 10-14-2003, 05:14 PM

In reply to Ljubica Cica's post of Tue, 14 Oct 2003 12:33:37 -0400:

Informix cannot mark a chunk offline until it tries to access the disk and
fails, so yes, because of the buffer cache there will often be a delay between
a hardware problem and IDS reporting a chunk I/O problem.
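
The delay can be illustrated with a toy write-back cache. This is only a sketch of the general idea, not how IDS is implemented: the `Disk` and `BufferCache` classes and all their members are hypothetical. Modified pages sit in memory, so a disk that has already failed goes unnoticed until a flush actually issues the write.

```python
class Disk:
    """Toy device: writes raise an I/O error once the disk is unhealthy."""

    def __init__(self):
        self.healthy = True
        self.blocks = {}

    def write(self, page, data):
        if not self.healthy:
            raise OSError(5, "I/O error")
        self.blocks[page] = data


class BufferCache:
    """Toy write-back cache: modifications stay in memory until flush()."""

    def __init__(self, disk):
        self.disk = disk
        self.dirty = {}
        self.chunk_online = True

    def modify(self, page, data):
        # Only memory is touched here, so a dead disk is not noticed yet.
        self.dirty[page] = data

    def flush(self):
        # The failure surfaces only when the write reaches the device.
        try:
            for page, data in sorted(self.dirty.items()):
                self.disk.write(page, data)
        except OSError:
            self.chunk_online = False
            return False
        self.dirty.clear()
        return True


disk = Disk()
cache = BufferCache(disk)
disk.healthy = False              # hardware fails first
cache.modify(1, b"row")           # update still "succeeds" in memory
status_before_flush = cache.chunk_online
flush_ok = cache.flush()          # first real I/O hits the bad disk
status_after_flush = cache.chunk_online
print(status_before_flush, flush_ok, status_after_flush)
```

This is consistent with the assert in the af* file being raised by a flush_sub thread: the error was reported when a flush wrote to the chunk, not when the SCSI warnings first appeared.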

On your other question: SCSI timeouts can have several causes, but in
combination with the incomplete operations they point to fundamental problems
with the drives. Perhaps the bad-sector remap tables have been completely used
up for most or all cylinders, and the timeouts are caused by seeking to other
cylinders to find a usable sector to remap a flaky one. I'd definitely take an
archive for safety, then trash the questionable drives and replace/rebuild
them with new ones.

Art S. Kagel

Paul Watson
 
Re: chunks down - 10-17-2003, 06:29 PM

Assuming you are patched to the latest Solaris level, the errors/warnings
are for real and I'd change the disk(s).

--
Paul Watson #
Oninit Ltd # Growing old is mandatory
Tel: +44 1436 672201 # Growing up is optional
Fax: +44 1436 678693 #
Mob: +44 7818 003457 #
www.oninit.com #

