#1
I've found out that this error occurs in the getRelationDescription(StringInfo buffer, Oid relid) function in dependency.c:

2004-04-26 11:09:34 ERROR: dependency.c 1621: cache lookup of relation 149064743 failed
2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist
2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist

Any ideas what can cause these errors?
#2
But I suspect that it's a race condition with the new background writer code.

psql:test-end2.sql:3: ERROR: cache lookup failed for relation 398033
CONTEXT: SQL query "SELECT TRUE FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relname = 'tmptbl'::TEXT AND c.relkind = 'r'::TEXT AND pg_catalog.pg_table_is_visible(c.oid)"
#3
But I suspect that it's a race condition with the new background writer code.

Why? Have you demonstrated that the failure does not occur in 7.4?

psql:test-end2.sql:3: ERROR: cache lookup failed for relation 398033
CONTEXT: SQL query "SELECT TRUE FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relname = 'tmptbl'::TEXT AND c.relkind = 'r'::TEXT AND pg_catalog.pg_table_is_visible(c.oid)"

I think that pg_table_is_visible() inspects the catalogs using SnapshotNow semantics, while the query above feeds it OIDs that were valid under a start-of-query snapshot. An OID can therefore be found by the outer query and already be gone by the time pg_table_is_visible() looks it up. So I'd expect failures in any recent PG version if tables are being dynamically created and deleted by concurrent transactions.
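To make the snapshot mismatch described above concrete, here is a toy model in Python. It is only a sketch of the reasoning, not PostgreSQL's actual code: ToyCatalog, find_visible, and CacheLookupError are hypothetical names invented for illustration, and a plain dict stands in for pg_class. The outer query picks its candidate OIDs from a frozen snapshot, while the visibility check consults the live catalog.

```python
class CacheLookupError(Exception):
    """Stands in for the 'cache lookup failed for relation' error."""


class ToyCatalog:
    """Hypothetical, heavily simplified pg_class: maps OID -> relation name."""

    def __init__(self):
        self.live = {}  # current committed state (the "SnapshotNow" view)

    def create(self, oid, relname):
        self.live[oid] = relname

    def drop(self, oid):
        del self.live[oid]

    def snapshot(self):
        # Frozen copy, like the snapshot a query takes when it starts.
        return dict(self.live)

    def table_is_visible(self, oid):
        # Consults the live catalog, not the caller's snapshot.
        if oid not in self.live:
            raise CacheLookupError(f"cache lookup failed for relation {oid}")
        return True


def find_visible(catalog, relname, concurrent_action=None):
    """Mimic the failing query: scan a snapshot, then check each OID live."""
    snap = catalog.snapshot()
    candidates = [oid for oid, name in snap.items() if name == relname]
    if concurrent_action is not None:
        # Another session commits a change between the scan and the check.
        concurrent_action()
    return [oid for oid in candidates if catalog.table_is_visible(oid)]


catalog = ToyCatalog()
catalog.create(398033, "tmptbl")
try:
    find_visible(catalog, "tmptbl", lambda: catalog.drop(398033))
except CacheLookupError as exc:
    print("ERROR:", exc)
```

In this model nothing about buffers or disk writes is involved: the error appears purely because the OID came from an older view of the catalog than the one the visibility check reads.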
#4
I've found out that this error occurs in the getRelationDescription(StringInfo buffer, Oid relid) function in dependency.c:

2004-04-26 11:09:34 ERROR: dependency.c 1621: cache lookup of relation 149064743 failed
2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist
2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist

Any ideas what can cause these errors?

<aol>Me too.</aol>

But I suspect that it's a race condition with the new background writer code. I've started testing a new database design and was able to reproduce this on my laptop nearly 90% of the time, but could only reproduce it about 10% of the time on my production databases until I figured out what the difference was: fsync.

Temp tables don't use the shared buffer cache, so how can this be related to the BG writer?
#5
Temp tables don't use the shared buffer cache, so how can this be related to the BG writer?

Don't the system catalogs use the shared buffer cache?

BEGIN;
SELECT create_temp_table_func();
-- Inserts a row into pg_class via CREATE TEMP TABLE
-- Do other stuff
COMMIT;
-- After the commit, the row is now visible to other backends
-- disconnect
-- If the delay between the disconnect and reconnect is small enough
-- reconnect
-- It's as though there is a race condition that allows the function
-- pg_table_is_visible() to assert the "cache lookup of relation"
-- error.
BEGIN;
SELECT create_temp_table_func();
-- Before the CREATE TEMP TABLE, I call
/* SELECT TRUE
     FROM pg_catalog.pg_class c
          LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relname = ''footmp''::TEXT
      AND c.relkind = ''r''::TEXT
      AND pg_catalog.pg_table_is_visible(c.oid); */
-- But the query fails

My guess was that the series of events went something like:

proc 0) COMMITs and the row in pg_class is committed
proc 1) bgwriter code removes a page from the cache
proc 2) queries for the page[*]
proc 1) writes it to disk
proc 2) queries for the page[*]
proc 1) syncs the fd

[*] proc 2 queries for the page at either of these points

In 7.4, there is no bgwriter or background process mucking with cache,

Except for the checkpoint process, which does exactly the same as the bgwriter does, and ALL concurrent backends whenever they feel the need to evict a dirty buffer.

If it makes a difference to the visibility rules of the tuples in a pg_class page whether that page is dirty in the buffer or has been copied out to disk, then the whole thing is a way larger bug than the one in MIB. First of all, committed or not, a temp object from one session should NEVER be visible in any other.
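The argument above, that a reader should see the same page contents whether the page is dirty in the buffer, flushed, or evicted and re-read, can be sketched with a toy write-back cache in Python. This is an illustrative assumption only and not PostgreSQL's buffer manager: ToyBufferPool and its methods are hypothetical names, and locking, pinning, and WAL are left out entirely.

```python
class ToyBufferPool:
    """Hypothetical write-back cache: readers always go through the pool."""

    def __init__(self):
        self.disk = {}     # page_id -> contents as last written out
        self.buffers = {}  # page_id -> (contents, dirty flag)

    def write(self, page_id, contents):
        # A backend modifies the page in memory; it is now dirty.
        self.buffers[page_id] = (contents, True)

    def flush(self, page_id):
        # What a bgwriter or checkpoint does: copy out, keep the page cached.
        contents, _ = self.buffers[page_id]
        self.disk[page_id] = contents
        self.buffers[page_id] = (contents, False)

    def evict(self, page_id):
        # What any backend may do when it needs a free buffer:
        # write the page out first if it is dirty, then drop it.
        contents, dirty = self.buffers[page_id]
        if dirty:
            self.disk[page_id] = contents
        del self.buffers[page_id]

    def read(self, page_id):
        # On a miss the page is loaded from disk before being returned.
        if page_id not in self.buffers:
            self.buffers[page_id] = (self.disk[page_id], False)
        return self.buffers[page_id][0]


pool = ToyBufferPool()
pool.write("pg_class:0", "row for temp table")
while_dirty = pool.read("pg_class:0")
pool.flush("pg_class:0")
after_flush = pool.read("pg_class:0")
pool.evict("pg_class:0")
after_evict = pool.read("pg_class:0")
print(while_dirty == after_flush == after_evict)
```

Under these assumptions, who writes the page out, and when, changes nothing a reader can observe, which is why a visibility difference tied to flush timing would point to a much deeper bug.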
#6
I've found out that this error occurs in the getRelationDescription(StringInfo buffer, Oid relid) function in dependency.c:

2004-04-26 11:09:34 ERROR: dependency.c 1621: cache lookup of relation 149064743 failed
2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist
2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist

Any ideas what can cause these errors?

<aol>Me too.</aol>

But I suspect that it's a race condition with the new background writer code. I've started testing a new database design and was able to reproduce this on my laptop nearly 90% of the time, but could only reproduce it about 10% of the time on my production databases until I figured out what the difference was: fsync.
#7
I've found out that this error occurs in the getRelationDescription(StringInfo buffer, Oid relid) function in dependency.c:

2004-04-26 11:09:34 ERROR: dependency.c 1621: cache lookup of relation 149064743 failed
2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist
2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist

Any ideas what can cause these errors?

<aol>Me too.</aol>

But I suspect that it's a race condition with the new background writer code. I've started testing a new database design and was able to reproduce this on my laptop nearly 90% of the time, but could only reproduce it about 10% of the time on my production databases until I figured out what the difference was: fsync.

Temp tables don't use the shared buffer cache, so how can this be related to the BG writer?

Don't the system catalogs use the shared buffer cache?

BEGIN;
SELECT create_temp_table_func();
-- Inserts a row into pg_class via CREATE TEMP TABLE
-- Do other stuff
COMMIT;
-- After the commit, the row is now visible to other backends
-- disconnect
-- If the delay between the disconnect and reconnect is small enough
-- reconnect
-- It's as though there is a race condition that allows the function
-- pg_table_is_visible() to assert the "cache lookup of relation"
-- error.
BEGIN;
SELECT create_temp_table_func();
-- Before the CREATE TEMP TABLE, I call
/* SELECT TRUE
     FROM pg_catalog.pg_class c
          LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relname = ''footmp''::TEXT
      AND c.relkind = ''r''::TEXT
      AND pg_catalog.pg_table_is_visible(c.oid); */
-- But the query fails

My guess was that the series of events went something like:

proc 0) COMMITs and the row in pg_class is committed
proc 1) bgwriter code removes a page from the cache
proc 2) queries for the page[*]
proc 1) writes it to disk
proc 2) queries for the page[*]
proc 1) syncs the fd

[*] proc 2 queries for the page at either of these points

In 7.4, there is no bgwriter or background process mucking with cache, which is why this works 100% of the time. In 7.5, however, there's a 200ms gap where a race condition appears and pg_table_is_visible() fails its PointerIsValid() check. If I put a sleep in, the sleep gives the bgwriter enough time to commit the pages to disk, so that the queries for the page happen after the fd has been sync()'ed.

I have no other clue as to why this would be happening, though, so believe me when I say I could very well be quite wrong... but this is my best quasi-educated, grep(1)'ed guess.

-sc