
Corrupt records

comp.databases.pick


#1
MAV
Corrupt records - 08-04-2011, 09:38 AM






Hi all!

We are finding corrupt records in some files. These records contain
characters below ASCII code 32, and some attributes are displaced.
For example, the data that belongs in attribute 4 ends up in
attribute 5 because the data in attribute 3 contains bad characters
that push the rest of the data into the next attribute.

We are working with D3NT 7.5.4 on Windows 2003 Server. The files are
sized correctly and there is no program that affects these
attributes.

Any idea what could be corrupting the data?

(We have verified that no antivirus accesses the database directories.)

Thanks,

Marcos Alonso Vega

#2
Tony Gravagno
Re: Corrupt records - 08-04-2011, 10:21 AM






I believe many of us with experience in this area would agree that
there could be many reasons for the errors, and these issues are often
very hard to diagnose. Only more in-depth diagnosis will
determine the cause. Here are some anecdotes:

Back in the 80's I was working on an Ultimate system that had the
letter 'e' appearing randomly throughout the system, not enough to be
catastrophic, but enough to be very annoying. This went on for about
a month. The 'e' error even got the name "Ernie". I believe the
error was an occasional but very consistent voltage spike in the disk
controller. That had to be replaced to isolate the issue.

A few years later, after 3 months of replacing hardware to diagnose
similar issues in a GA system, we found the NY Port Authority had
built a radio tower next to our client's office and was bathing them
with RF noise.

At another site we found disk corruption occurred at specific times of
day - when the building air conditioners would turn on.

At another site corruption only occurred at night - when the cleaning
lady plugged her vacuum cleaner into the same wall socket as the
server.

On my development system I started experiencing weird software issues.
At some point I decided it seemed to be a hardware issue so I opened
the case. The capacitors on my video card had blown open, probably
when the room got too hot. Replacing the card fixed the issues. The
worst problem to diagnose is a bad motherboard because that's the last
device you're inclined to replace. IMO a bad PC power supply is in
the same category.

So....

- It could be bad power, voltage drops or spikes, new radio equipment,
an air conditioner that zaps the power line, a failing power supply.
- It could be a bad memory stick. Reboot the system with BIOS set to
test RAM.
- It could be periodic disk errors. Use chkdsk and similar tools.
- It could be a user hitting the arrow keys to navigate a character
screen.

Try to find a pattern with regard to location, time, specific
characters, etc. Consistent patterns point to human issues and not
random hardware failings - but that's not always the case as we saw
with Ernie.

If you fix the data, are records corrupted again later? If not then
this might have been a single, isolated incident.

How often does this happen? Only after some specific time during the
day? Only on specific days? If you're not sure, you may need to run
file scans more often to see how often this is happening.
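A periodic scan of this kind could be sketched roughly as follows. This is a minimal Python sketch, not D3 code: it assumes the items have already been exported as raw byte strings keyed by item ID, and the helper names `find_control_bytes` and `scan_records` are hypothetical. The Pick system delimiters (attribute, value and subvalue marks, characters 254, 253 and 252) are above 32, so they are not flagged.

```python
def find_control_bytes(record: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for every byte below ASCII 32."""
    return [(offset, value) for offset, value in enumerate(record) if value < 32]


def scan_records(records: dict[str, bytes]) -> dict[str, list[tuple[int, int]]]:
    """Map each affected item ID to the control bytes found in it.

    `records` is assumed to be an export of the file: item ID -> raw bytes.
    Items without any control bytes are left out of the report.
    """
    report = {}
    for item_id, data in records.items():
        hits = find_control_bytes(data)
        if hits:
            report[item_id] = hits
    return report


if __name__ == "__main__":
    # Two made-up items: the second contains an Escape character (27).
    sample = {"1001": b"SMITH\xfeJOHN", "1002": b"JONES\x1b\xfeMARY"}
    for item_id, hits in scan_records(sample).items():
        print(item_id, hits)
```

Running the scan on a schedule and keeping each report makes it easier to see whether the same items, offsets or byte values keep coming back.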

If you fix the data, are the same records corrupted, or is it always
different data? If different data is corrupted at random times,
chances are that you have a hardware error that's just spraying the
data indiscriminately. But is anything else in the Windows file
system getting corrupted?

You said no program affects that data, so users navigating with arrow
keys, while common, is almost certainly not the issue. However, if
the same records get corrupted, then chances are the issue is either
programmatic, or there is a bad sector of disk under that data.
Rename that file, don't delete it because you want to isolate that
disk area, then re-create the file and copy in the new, fixed items.
If the error occurs in the same place, chances are that this is not a
hardware issue.

Did you recently load any ABS patches for D3? Any other software or
hardware changes?

I hope that helps you to diagnose the nature of the issue.

T


#3
Jeff Caspari
Re: Corrupt records - 08-04-2011, 11:16 AM



Hi Marcos,
We have found this is mostly caused by users hitting the Escape or Tab key
in a data entry field. You can usually tell by using a program to convert
the character to its ASCII code and checking whether it matches a common
keyboard key.

If that's true then some terminal emulators, such as AccuTerm, have the
ability to strip out control characters before 'feeding' them to your
program.
Jeff
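The check Jeff describes, turning a stray character into its code and comparing it with common keys, could look something like this. It is a minimal Python sketch with hypothetical helper names (`describe_control_char`, `strip_control_chars`). The key table only covers the standard ASCII control codes; arrow keys are not listed because on many terminal emulations they send a multi-character sequence that starts with Escape, which is an assumption to verify against the emulation in use.

```python
# Standard ASCII control codes that map directly to a keyboard key.
KEY_NAMES = {
    0: "NUL",
    8: "Backspace (Ctrl-H)",
    9: "Tab (Ctrl-I)",
    10: "Line feed (Ctrl-J)",
    13: "Enter / carriage return (Ctrl-M)",
    27: "Escape",
}


def describe_control_char(code: int) -> str:
    """Name the key that most plausibly produced a byte below ASCII 32."""
    if not 0 <= code < 32:
        raise ValueError("not a control character below ASCII 32")
    if code in KEY_NAMES:
        return KEY_NAMES[code]
    if 1 <= code <= 26:
        # Codes 1-26 are Ctrl-A through Ctrl-Z.
        return "Ctrl-" + chr(code + 64)
    return f"control code {code}"


def strip_control_chars(data: bytes) -> bytes:
    """Drop every byte below ASCII 32; system delimiters (252-254) are kept."""
    return bytes(value for value in data if value >= 32)
```

A NUL (code 0) is not something a user would normally type into a data entry field, so a record full of NULs points away from the keyboard explanation.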

#4
MAV
Re: Corrupt records - 08-09-2011, 04:05 AM



Thank you Tony and Jeff, and sorry for not answering sooner. I'm now
reading this group with "SeaMonkey" because I cannot read it with
Google Groups. I thought no one had responded.

I found no disk errors. Nor do I find a common pattern.

When I fix the data, different records get corrupted the next time, and
nothing is corrupt in the Windows file system. No program affects that
data, and if the cause were users navigating with the arrow keys (or the
Tab or Esc key), the affected data would be different. We filter these
keys and check for strange characters (below ASCII 32) before writing
the record.


Original record

034: 123454]124332
035:
036:
037: H
038: 941


Corrupted record

034: 123454]124.......
035: 941

Each "." is an ASCII 0. There is one "." (ASCII 0) for each character
lost, including attribute marks. The affected attributes are random, as
is the number of lost characters.
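The example above can be checked byte by byte. This is a minimal Python sketch, assuming the "]" in the listing is the usual display of a value mark (character 253) and that attributes are separated by attribute marks (character 254). The helper `nul_runs` is hypothetical, and the two byte strings are rebuilt by hand from the listing of attributes 34 to 38.

```python
AM = b"\xfe"  # attribute mark, character 254
VM = b"\xfd"  # value mark, character 253 (assumed to be the "]" in the listing)


def nul_runs(data: bytes) -> list[tuple[int, int]]:
    """Return (start offset, length) for each run of consecutive NUL bytes."""
    runs = []
    start = None
    for offset, value in enumerate(data):
        if value == 0:
            if start is None:
                start = offset
        elif start is not None:
            runs.append((start, offset - start))
            start = None
    if start is not None:
        runs.append((start, len(data) - start))
    return runs


# Attributes 34-38 from the post, rebuilt as raw bytes.
original = AM.join([b"123454" + VM + b"124332", b"", b"", b"H", b"941"])
corrupted = AM.join([b"123454" + VM + b"124" + b"\x00" * 7, b"941"])

for start, length in nul_runs(corrupted):
    # Show which original bytes sit where the NULs are now.
    print(start, length, original[start:start + length])
# prints: 10 7 b'332\xfe\xfe\xfeH'
```

Under these assumptions both strings are 21 bytes long, and the seven NULs cover "332", three attribute marks and "H". Losing those three attribute marks is what moves "941" from attribute 38 to attribute 35.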

Some of these files are also affected by "Remote File Errors" that we
are not able to solve. I don't know what the main cause of a
"Remote File Error" is. This is the other problem with this
system (7.5.4, 145 users, Windows 2003 server).

Thank you very much for the help

Marcos Alonso Vega

#5
Tony Gravagno
Re: Corrupt records - 08-09-2011, 10:38 AM



MAV wrote:
> ...Nor do I find a common pattern.
>
> Original record
>
> 034: 123454]124332
> 035:
> 036:
> 037: H
> 038: 941
>
> Corrupted record
>
> 034: 123454]124.......
> 035: 941
>
> The "." character is the ASCII 0. There is a "point" (ASCII 0) for each
> character lost, including also marks attribute. The affected attributes
> are random as the number of lost characters.

It looks like there is some pattern there. Do you always lose 7
characters?

Reasoning: char0 is used in C/C++ to initialize and terminate strings.
This could be a byte count issue in code, or maybe a buffer overrun
situation, rather than random splattering of bytes. If the byte count
is different from the original it would indicate to me that the D3
item write routine was corrupted. One-for-one corruption could also
point to that, but a different kind of error, where the address of a
pointer is wrong when it's trying to create or pad a buffer somewhere.
A consistent number of bytes could identify a specific routine: for
example, which mode nulls a field of 7 bytes before it does something?
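If a known-good copy of a damaged item is available (from a file-save, for instance), the distinction between a changed byte count and a one-for-one overwrite can be checked mechanically. The following is a minimal Python sketch with a hypothetical helper, `classify_damage`. It only compares two byte strings and does not know anything about D3 internals.

```python
def classify_damage(original: bytes, corrupted: bytes) -> str:
    """Describe how a corrupted item differs from its known-good copy."""
    if original == corrupted:
        return "identical"
    if len(original) != len(corrupted):
        # The item grew or shrank: bytes were inserted or dropped.
        return "length changed"
    changed = [
        offset
        for offset, (good, bad) in enumerate(zip(original, corrupted))
        if good != bad
    ]
    if all(corrupted[offset] == 0 for offset in changed):
        # Same length, and every differing byte is now NUL.
        contiguous = changed[-1] - changed[0] + 1 == len(changed)
        kind = "single run" if contiguous else "scattered"
        return f"one-for-one NUL overwrite ({kind})"
    return "one-for-one, mixed bytes"
```

Collecting this classification, plus the run length, over many damaged items would show whether the number of overwritten bytes is consistent or variable.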

> Some of these files are also affected by "Remote file Errors" that we
> are not able to solve.

I'd guess the file control block or item headers are getting zapped,
making the file inaccessible, in whole or in part.


> ... (7.5.4, 145 users, Windows 2003 server).

Since this is only occurring in D3 and not in the Windows file system,
it does seem the issue is in the DBMS code. However:
- Are both D3 VME and FSI files affected? (Obviously FSI is, given the
"remote file error", but VME too?)
- Is the D3 VME and/or FSI stored on a different physical or virtual
disk from the primary Windows system? In other words, if you have
C:\Program Files, etc, is D3 in D: or E:?

Do you have JET, perhaps ported from an older release?
I'm wondering whether %malloc, %write, or similar % functions are
being used where there is a bug in the current D3 release.

Can you turn off Flash to see if the problems go away? Try Flashed
code for a while, then Unflashed. Hint: Rather than recompiling
everything, change all application catalog entries to set atb1 to VR1.
That turns flash off. Change back to VR to turn flash on again. If
the main entry point to the system is a menu in a login proc/macro,
just set the very first program to VR1 and everything called through
that chain will be unflashed.

Was anything recently changed? An upgrade? Patch? Some new process
implemented? Have you recently started testing something new like
FlashCONNECT, ODBC, or another connectivity product?

Reasoning: Software written for a different release might have some
incompatibility that corrupts workspace.

What does TL support say on this so far?

HTH
T

#6
MAV
Re: Corrupt records - 08-09-2011, 11:08 AM



Tony Gravagno wrote:

>> The "." character is the ASCII 0. There is a "point" (ASCII 0) for each
>> character lost, including also marks attribute. The affected attributes
>> are random as the number of lost characters.
>
> It looks like there is some pattern there. Do you always lose 7
> characters?

No, we don't always lose 7 characters. The number is variable.

> I'd guess the file control block or item headers are getting zapped to
> make the file inaccessible, in whole or in part.

Interesting! We always suspected that some software (like antivirus)
could be doing this.

>> ... (7.5.4, 145 users, Windows 2003 server).
>
> Since this is only occurring in D3 and not in the Windows file system,
> it does seem the issue is in the DBMS code. However:
> - Are both D3 VME and FSI files affected? (Obviously FSI is with the
> "remote file error", but VME too?)

All files are in FSI. We have no files in VME.

> - Is the D3 VME and/or FSI stored on a different physical or virtual
> disk from the primary Windows system? In other words, if you have
> C:\Program Files, etc, is D3 in D: or E:?

Program Files\D3 is on E:, and E: is a second partition of the same
physical disk. We usually install D3 on a second partition.

> Do you have JET, perhaps ported from an older release?
> I'm wondering if there is a %malloc or %write, or similar %functions
> which are being used where a bug in the current D3 release.

We do not have JET. However, some programs use the %socket functions,
although these programs do not touch the files with corrupt records.
Only the programs with socket functions are flashed.

We will add more checks to try to find out exactly when the data
corruption occurs. Another problem will be finding out why the
"Remote File Error" occurs.
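One possible form for such a check is a repeated scan that records when an item is first seen with NUL bytes, which narrows the time window in which the damage happened. This is a minimal Python sketch under stated assumptions: the items are available as raw bytes keyed by item ID, and `items_with_nul` and `log_new_corruption` are hypothetical helpers, not D3 facilities.

```python
from datetime import datetime


def items_with_nul(records: dict[str, bytes]) -> set[str]:
    """Return the IDs of items that contain at least one NUL byte."""
    return {item_id for item_id, data in records.items() if b"\x00" in data}


def log_new_corruption(
    seen: set[str], records: dict[str, bytes], now: datetime
) -> list[str]:
    """Return timestamped log lines for items first seen corrupted in this scan.

    `seen` is updated in place, so each item ID is reported only once.
    """
    new_ids = sorted(items_with_nul(records) - seen)
    seen.update(new_ids)
    stamp = now.isoformat(timespec="seconds")
    return [f"{stamp} {item_id}" for item_id in new_ids]


# Two scans fifteen minutes apart over made-up data.
seen: set[str] = set()
first = log_new_corruption(
    seen, {"A1": b"ok", "B2": b"bad\x00\x00"}, datetime(2011, 8, 9, 11, 0)
)
second = log_new_corruption(
    seen, {"A1": b"o\x00", "B2": b"bad\x00\x00"}, datetime(2011, 8, 9, 11, 15)
)
print(first)   # ['2011-08-09T11:00:00 B2']
print(second)  # ['2011-08-09T11:15:00 A1']
```

The shorter the interval between scans, the easier it is to match a new entry against what users, phantoms or backups were doing at that time.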

Thank you very much Tony!

Marcos Alonso Vega

#7
Frank Winans
Re: Corrupt records - 08-09-2011, 06:27 PM

"MAV" wrote:

> Some of these files are also affected by "Remote file Errors" that we are
> not able to solve. I don´t know what is the main cause for the occurence
> of a "Remote File Error". This is the other problem with this system
> (7.5.4, 145 users, Windows 2003 server).

I seem to recall there is a similar catchall message that you get when
Windows security has the file locked by another user, as for instance
at TCL

CT C:/ foo.txt

while WordPad is editing the c:\foo.txt file. (I think that is a locked
situation.) If not, try

copy myfile foo
to: c:d:/

where the d:\ drive is your CD-ROM, which, unless you've done something
intricate, is a read-only device. I guess you could tell Windows to map
a drive letter Q: to some Windows disk share on your LAN that you don't
have write access to, and tell TCL to: c:q:/ instead of to: c:d:/

You might wonder why not just do

to: d:/

but I am not sure there is a d: set up in TCL. I'm certain you cannot do
to: q:/ because dm's DEVICES (or DEVS?) file only had a handful of
entries, of which one or two were c and d, last time I was on D3/NT.

Though it looks clunky, the c:d:/ syntax means use the general c: driver
of TCL to access the Windows d: drive letter, and the trailing / means
the root directory of that drive letter. D3 will _not_ accept \ here;
you must use / instead.

Heck, Windows 7 has a strong tendency to deny even Administrator write
access to their own c:\ directory. You may be refused, or may have to
answer a popup that yes, you really really want to write to that
directory. How paranoid can ya get, anyway...?

#8
Frank Winans
Re: Corrupt records - 08-09-2011, 06:31 PM

"Frank Winans" wrote:

> you might wonder why not just do
> to: d:/
> but I am not sure there is a d: setup in tcl.
> I'm certain you cannot do to: q:/
> 'cause dm's DEVICES {? or DEVS?} file
> only had a handfull of entries, of which one or
> two were c and d last time I was on d3/nt.

Ah! Neither DEVICES nor DEVS: the dm account HOSTS file has those TCL
prefixes, like c: and dos: and bin:, as item IDs.

#9
Tony Gravagno
Re: Corrupt records - 08-09-2011, 09:07 PM

"Frank Winans" wrote:

>> you might wonder why not just do
>> to: d:/
>> but I am not sure there is a d: setup in tcl.
>> I'm certain you cannot do to: q:/
>> 'cause dm's DEVICES {? or DEVS?} file
>> only had a handfull of entries, of which one or
>> two were c and d last time I was on d3/nt.
>
> Ah! neither Devices nor Devs,
> the dm account HOSTS file has those tcl
> prefixes like c: and dos: and bin:
> as item ids.

Uh, they're just items. You can create an item to represent any
drive: E, F, Q, R, Z... See the documentation on the HOSTS file for
details.

T

#10
Tony Gravagno
Re: Corrupt records - 08-09-2011, 09:28 PM



MAV wrote:

> Tony wrote:
>> It looks like there is some pattern there. Do you always lose 7
>> characters?
>
> No, we don't always lose 7 caracteres. The number is variable.

OK, that doesn't solve the problem but it helps to define it better.

>> Since this is only occurring in D3 and not in the Windows file system,
>> it does seem the issue is in the DBMS code. However:
>> - Are both D3 VME and FSI files affected? (Obviously FSI is with the
>> "remote file error", but VME too?)
>
> All files are in FSI. We have no files in VME

You mean you have no data files for your application there, but you DO
have a VME, and there ARE files in it. So I understand that you have
found no corruption in the VME, perhaps when you do a file-save?

>> - Is the D3 VME and/or FSI stored on a different physical or virtual
>> disk from the primary Windows system? In other words, if you have
>> C:\Program Files, etc, is D3 in D: or E:?
>
> Program Files\D3 is in E: and E: is a second partition of the same
> physical disk. We usually install D3 in a second partition.

D3 data can be in a separate location from D3 program files. It
sounds like both program files and data are in E:. You said other
Windows programs aren't affected. Have you done a chkdsk or other
diagnostics on the E drive?

Please respond to the other questions. It's important for you to
determine what new event triggered the problem, whether a patch,
an upgrade, etc.

So far, without complete answers yet, it sounds like there is a
localized problem in the D3 FSI, which means you need to get TL
involved. That may change with more information. But at this point
I'd really like to know if you've called support, what they said, and
what they are doing to help diagnose this.

T
