dbTalk Databases Forums  

[BUGS] Urgent: Key constraints behaving weirdly

mailing.database.pgsql-bugs mailing.database.pgsql-bugs


Discuss [BUGS] Urgent: Key constraints behaving weirdly in the mailing.database.pgsql-bugs forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Russell Garrett
 
Posts: n/a

Default [BUGS] Urgent: Key constraints behaving weirdly - 12-18-2003 , 05:55 AM






Your name : Russ Garrett
Your email address : russ (AT) last (DOT) fm

System Configuration
---------------------
Architecture (example: Intel Pentium) : P4 Xeon
Operating System (example: Linux 2.0.26 ELF) : Linux 2.4.22 ELF
PostgreSQL version (example: PostgreSQL-7.4) : PostgreSQL-7.4
Compiler used (example: gcc 2.95.2) : gcc version 3.2.2 20030222
(Red Hat Linux 3.2.2-5)

Please enter a FULL description of your problem:
------------------------------------------------

Constraints are being weird. The reproduction instructions speak for
themselves.
We're using this on a production database and we're understandably getting a
bit edgy.


Please describe a way to repeat the problem. Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------

last=> \d profile
....
Indexes:
"profile_pkey" primary key, btree (id)

last=> \di+ profile_pkey
List of relations
Schema | Name | Type | Owner | Description | Table
--------+--------------+-------+--------+-------------+---------
public | profile_pkey | index | lastfm | | profile
(1 row)

last=> UPDATE profile SET lastsongtime=now(), r1=r1+0, r2=r2+0, r3=r3+0,
r4=r4+0, r5=r5+0, scrobcount=scrobcount+1 WHERE id=1017057;
ERROR: duplicate key violates unique constraint "profile_pkey"

last=> SELECT * FROM profile WHERE id=1017057;
id | user | name | description | songspaminterval |
lastsongtime | entropy | newbie | creationdate | r1 | r2 |
r3 | r4 | r5 | scrobcount | newbielistenas | e1 | e2 | e3 | e4 | e5 |
numdisconnections | totalstreamtime
---------+---------+---------+-------------+------------------+-------------
--------+---------+--------+----------------------------+----+----+----+----
+----+------------+----------------+----+----+----+-----+----+--------------
-----+-----------------
1017057 | 1017075 | Default | | 60 | 2003-12-18
04:32:34 | 1 | f | 2003-12-17 23:57:01.582757 | 0 | 0 | 0 | 0
Quote:
0 | 23 | | 2 | 0 | 0 | 219 | 0 |
0 | 0
(1 row)

last=>


If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------

No idea .


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo (AT) postgresql (DOT) org)


Reply With Quote
  #2  
Old   
Tom Lane
 
Posts: n/a

Default Re: [BUGS] Urgent: Key constraints behaving weirdly - 12-18-2003 , 09:52 AM






"Russell Garrett" <rg (AT) tcslon (DOT) com> writes:
Quote:
Constraints are being weird. The reproduction instructions speak for
themselves.
You haven't really provided enough info to let anyone do anything about
this. Certainly no one else is going to be able to reproduce the
problem based on what you've provided.

I can think of a number of theories:

1. There's a rule or trigger ON UPDATE that is modifying the behavior of
the update you showed, and is causing the duplicate-key error (ie,
there's a bug in the rule or trigger logic and the error report is
legitimate).

2. The index is corrupt, possibly due to a hardware glitch. (This seems
unlikely because the SELECT result appears normal, but I can't rule it
out entirely.)

3. You've stumbled across a corner-case bug that no one has seen before.

If the problem is #2 then the likely recovery method would be a REINDEX,
hopefully followed by some intensive hardware testing.

If the problem is #3 then it's fairly likely that a REINDEX would make
the failure go away, but we'd lose this opportunity to find the bug.

If you want to pursue theory #3, I would recommend the following
procedure:
A. Shut down postmaster for long enough to make a tarball copy
of the whole $PGDATA directory tree. You should then be able
to investigate the problem at leisure by loading the tarball
on a playpen machine.
B. REINDEX to (hopefully) make the problem go away on the
production machine, so it can get back to work.
C. Build a debug-enabled Postgres installation on your playpen
machine, and start tracing through the failure with gdb.

I'd be willing to help out if you don't feel competent to carry through
step C.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


Reply With Quote
  #3  
Old   
Russell Garrett
 
Posts: n/a

Default Re: [BUGS] Urgent: Key constraints behaving weirdly - 12-18-2003 , 06:17 PM



Quote:
2. The index is corrupt, possibly due to a hardware glitch. (This
seems unlikely because the SELECT result appears normal, but I can't
rule it out entirely.)

We had a table error a few weeks back, however we re-imported the
table from scratch. So it may well be this. I doubt it's a actually a
Postgres bug now, since we aren't doing anything particularly
unusual, and it's been working fine for several weeks. Still, I'm not
ruling it out.
OK, I'm not sure whether this is index corruption or a postgres bug or both:

last=> select id, count(id) AS a FROM profile GROUP BY id ORDER BY a DESC;
id | a
---------+---
1017057 | 2
1018316 | 2
0 | 1
80 | 1
....

last=> SELECT id from profile WHERE id = 1017057;
id
---------
1017057
(1 row)

last=> SELECT id from profile WHERE id = 1018316;
id
---------
1018316
(1 row)

last=> reindex index profile_pkey;
ERROR: could not create unique index
DETAIL: Table contains duplicated values.


Does REINDEX use the current index to check for duplicates?

Many thanks for all the help,

Russ


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
message can get through to the mailing list cleanly


Reply With Quote
  #4  
Old   
Tom Lane
 
Posts: n/a

Default Re: [BUGS] Urgent: Key constraints behaving weirdly - 12-18-2003 , 09:57 PM



"Russell Garrett" <rg (AT) tcslon (DOT) com> writes:
Quote:
last=> reindex index profile_pkey;
ERROR: could not create unique index
DETAIL: Table contains duplicated values.
Okay, it looks like the table actually contains duplicate rows but the
index has managed to lose track of that fact. You can see the
duplicates in the GROUP BY query (which isn't using the index) but
not when you do "select * from profile where id = 1017057", because that
query will use the index.

If you did "set enable_indexscan = off" then the "select * from profile
where id = 1017057" query would probably show you two rows. I'd be
interested to see the results of

select ctid, oid, xmin, xmax, cmin, cmax from profile where id = 1017057;

with enable_indexscan off.

Quote:
Does REINDEX use the current index to check for duplicates?
No, it's building a new index from scratch, and so it notices the
duplicates.

What you've got here is definitely a case of index corruption that has
led to logical corruption of the table (ie, duplicate rows). To get
back to a valid state you will need to delete whichever of the
duplicates seems to be out-of-date, and then do a REINDEX. But I think
it is important first to try to determine what caused the corruption
(software error or hardware?). Again, if you can take a physical dump
of the data directory, that would provide an opportunity to study the
problem later after you get the production machine back on its feet.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html


Reply With Quote
  #5  
Old   
Russell Garrett
 
Posts: n/a

Default Re: [BUGS] Urgent: Key constraints behaving weirdly - 12-19-2003 , 07:36 AM



Tom Lane wrote:
Quote:
"Russell Garrett" <rg (AT) tcslon (DOT) com> writes:
last=> reindex index profile_pkey;
ERROR: could not create unique index
DETAIL: Table contains duplicated values.

Okay, it looks like the table actually contains duplicate rows but the
index has managed to lose track of that fact. You can see the
duplicates in the GROUP BY query (which isn't using the index) but
not when you do "select * from profile where id = 1017057", because
that query will use the index.
Ah, it went so quick I was sure it was using the index .

Quote:
If you did "set enable_indexscan = off" then the "select * from
profile where id = 1017057" query would probably show you two rows.
I'd be interested to see the results of

select ctid, oid, xmin, xmax, cmin, cmax from profile where id =
1017057;
with enable_indexscan off.
last=> set enable_indexscan = off;
SET
last=> select ctid, oid, xmin, xmax, cmin, cmax from profile where id =
1017057;
ctid | oid | xmin | xmax | cmin | cmax
-----------+----------+----------+----------+----------+------
(482,27) | 65196187 | 21095783 | 21953819 | 21953819 | 0
(3095,56) | 64140531 | 20037571 | 20037571 | 545 | 555
(2 rows)

Quote:
Does REINDEX use the current index to check for duplicates?

No, it's building a new index from scratch, and so it notices the
duplicates.
Yeah, I see now.

Quote:
What you've got here is definitely a case of index corruption that has
led to logical corruption of the table (ie, duplicate rows). To get
back to a valid state you will need to delete whichever of the
duplicates seems to be out-of-date, and then do a REINDEX. But I
think it is important first to try to determine what caused the
corruption (software error or hardware?). Again, if you can take a
physical dump of the data directory, that would provide an
opportunity to study the problem later after you get the production
machine back on its feet.
OK, I've kept a copy of the db and then fixed the problem.

We had an incidence of table corruption a few weeks back, but we just
ignored it (wishful thinking, maybe). Postgres is compiled normally, with
gcc3. OS is Red Hat 9.

Hardware is a Dell Poweredge 2.4Ghz Xeon (can't remember the model number),
DB is using XFS on a MegaRaid U320 controller, running the MegaRaid 1.18j
drivers. Naturally, the drivers don't report any errors. I doubt it's memory
corruption as the system has been rock-solid. So I'm guessing it's probably
MegaRaid, or XFS, or the kernel I suppose (although I'm using the latest).

Problem is, it's quite hard to reproduce the corruption, since it seems to
be fairly rare.

Maybe this should be moved to pgsql-general now?

Russ


---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.