dbTalk Databases Forums  

merge two big tables

comp.databases.postgresql comp.databases.postgresql


Discuss merge two big tables in the comp.databases.postgresql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Pet
 
Posts: n/a

Default merge two big tables - 06-18-2009 , 03:11 AM






I have one table A, which should be updated sometimes. I.e. I download
copy of A with some new records. I cannot just drop table A, because
there may be records from other sources. What is the best way to do
such update?

One option could be to create temporary table and do UNION on this two
tables. That would append new records without duplicates. Than create
all indexes on temp table, drop original table and rename temporary
table to original one.

I wonder if there is better way doing this.

Thanks, Pet

Reply With Quote
  #2  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: merge two big tables - 06-18-2009 , 03:46 AM






Pet wrote:
Quote:
I have one table A, which should be updated sometimes. I.e. I download
copy of A with some new records. I cannot just drop table A, because
there may be records from other sources. What is the best way to do
such update?

One option could be to create temporary table and do UNION on this two
tables. That would append new records without duplicates. Than create
all indexes on temp table, drop original table and rename temporary
table to original one.

I wonder if there is better way doing this.
Let's call the original table which you want to update "A" and
the copy of A that has some additional records "B".
You want to copy the additional records from B to A.

A manual way to do it would be:
- Load B into the same database as A.
- INSERT INTO a
(SELECT b.* FROM b LEFT OUTER JOIN
a USING(id)
WHERE a.id IS NULL);

Here "id" is the primary key.

If the amount of data is too large, or if you want it
in a more automated fashion, you might consider alternate
solutions, e.g. storing an update timestamp in each row
and extracting only the newly changed ones, or keeping a
boolean flag per row that is TRUE if the row was already
exported for import in A.

For the latter idea, you could use a statement like
UPDATE b SET exported=TRUE WHERE exported=FALSE RETURNING b.*
to export the new rows in B.

The really tough problem in a setup like yours is dealing
with conflicts: what if there is already a row in A coming
from some other source that has the same primary key as the
new row from B? Have you thought of that?

Yours,
Laurenz Albe

Reply With Quote
  #3  
Old   
Tep
 
Posts: n/a

Default Re: merge two big tables - 06-18-2009 , 04:27 AM



On 18 Jun., 10:46, "Laurenz Albe" <inv... (AT) spam (DOT) to.invalid> wrote:
Quote:
Pet wrote:
I have one table A, which should be updated sometimes. I.e. I download
copy of A with some new records. I cannot just drop table A, because
there may be records from other sources. What is the best way to do
such update?

One option could be to create temporary table and do UNION on this two
tables. That would append new records without duplicates. Than create
all indexes on temp table, drop original table and rename temporary
table to original one.

I wonder if there is better way doing this.

Let's call the original table which you want to update "A" and
the copy of A that has some additional records "B".
You want to copy the additional records from B to A.

A manual way to do it would be:
- Load B into the same database as A.
- INSERT INTO a
* * *(SELECT b.* FROM b LEFT OUTER JOIN
* * * * * * * * * * * a USING(id)
* * * WHERE a.id IS NULL);

Here "id" is the primary key.
That's it probably what I am looking for

Quote:
If the amount of data is too large, or if you want it
in a more automated fashion, you might consider alternate
solutions, e.g. storing an update timestamp in each row
and extracting only the newly changed ones, or keeping a
boolean flag per row that is TRUE if the row was already
exported for import in A.
I have no control on table B. There are no timestamps

Quote:
For the latter idea, you could use a statement like
UPDATE b SET exported=TRUE WHERE exported=FALSE RETURNING b.*
to export the new rows in B.

The really tough problem in a setup like yours is dealing
with conflicts: what if there is already a row in A coming
from some other source that has the same primary key as the
new row from B? Have you thought of that?
Yes, it is really a problem. Table A is not so big and wouldn't grow
fast. So, Id's or records are all below say 5 50000000. I insert
records from other sources with ID's above this upper bound. It is
very unlikely, that table to be imported ever reaches this upper bound
of rows. I have no idea how to solve this in other way...


Thanks for your help!!!

Pet

Quote:
Yours,
Laurenz Albe

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.