dbTalk Databases Forums  

sql to delete rows from joining a table to itself

comp.databases.oracle.misc comp.databases.oracle.misc


Discuss sql to delete rows from joining a table to itself in the comp.databases.oracle.misc forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Jeff Calico
 
Posts: n/a

Default sql to delete rows from joining a table to itself - 06-28-2009 , 07:51 PM






Hello everyone

I am trying to delete rows from a massive data table that are "almost"
duplicates. I can do a nice query
to find them when I join the table with itself, but I can't figure out
how to form a delete statement out
of this.

Query:

SELECT a.*
FROM Trafficdata a, Trafficdata b
WHERE a.id = b.id
AND a.date = b.date
AND a.flag = 'Q'
AND b.flag = 'Q'
AND a.dstop = b.dstop
AND (a.batch != b.batch OR (a.batch = b.batch AND a.pos_marker <
b.pos_marker) );

One idea I tried after searching the web was
DELETE FROM ( <above query> ); but that gives
ORA-01752 "Cannot delete from view without exactly one key preserved
table"

Any help would be greatly appreciated!
Jeff C.

Reply With Quote
  #2  
Old   
Ed Prochak
 
Posts: n/a

Default Re: sql to delete rows from joining a table to itself - 06-29-2009 , 08:03 AM






On Jun 28, 7:51*pm, Jeff Calico <jeffcal... (AT) gmail (DOT) com> wrote:
Quote:
Hello everyone

I am trying to delete rows from a massive data table that are "almost"
duplicates. I can do a nice query
to find them when I join the table with itself, but I can't figure out
how to form a delete statement out
of this.

Query:

SELECT a.*
FROM Trafficdata a, Trafficdata b
WHERE a.id = b.id
AND a.date = b.date
AND a.flag = 'Q'
AND b.flag = 'Q'
AND a.dstop = b.dstop
AND (a.batch != b.batch OR (a.batch = b.batch AND a.pos_marker
b.pos_marker) );

One idea I tried after searching the web was
DELETE FROM ( <above query> ); but that gives
ORA-01752 "Cannot delete from view without exactly one key preserved
table"

Any help would be greatly appreciated!
Jeff C.
Do you have a primary key on this Trafficdata table?
Assuming your results are the rows your really want to delete then a
delete of the form:

DELETE FROM Trafficdata WHERE pk IN (SELECT a.pk from ...) ;

I suggest you do this AFTER doing a backup.

Ed

Reply With Quote
  #3  
Old   
Mark D Powell
 
Posts: n/a

Default Re: sql to delete rows from joining a table to itself - 06-29-2009 , 09:27 AM



On Jun 29, 8:03*am, Ed Prochak <edproc... (AT) gmail (DOT) com> wrote:
Quote:
On Jun 28, 7:51*pm, Jeff Calico <jeffcal... (AT) gmail (DOT) com> wrote:





Hello everyone

I am trying to delete rows from a massive data table that are "almost"
duplicates. I can do a nice query
to find them when I join the table with itself, but I can't figure out
how to form a delete statement out
of this.

Query:

SELECT a.*
FROM Trafficdata a, Trafficdata b
WHERE a.id = b.id
AND a.date = b.date
AND a.flag = 'Q'
AND b.flag = 'Q'
AND a.dstop = b.dstop
AND (a.batch != b.batch OR (a.batch = b.batch AND a.pos_marker
b.pos_marker) );

One idea I tried after searching the web was
DELETE FROM ( <above query> ); but that gives
ORA-01752 "Cannot delete from view without exactly one key preserved
table"

Any help would be greatly appreciated!
Jeff C.

Do you have a primary key on this Trafficdata table?
Assuming your results are the rows your really want to delete then a
delete of the form:

DELETE FROM Trafficdata WHERE pk IN (SELECT a.pk from ...) ;

I suggest you do this AFTER doing a backup.

Ed- Hide quoted text -

- Show quoted text -
Assuming you want to keep one copy of each row the following will work

delete from table_name
where (key_list, rowid) in
( select keys, rowid from table_name
minus
select keys, min(rowid) from table_name
group by keys
)
/

HTH -- Mark D Powell --

Reply With Quote
  #4  
Old   
Jeff Calico
 
Posts: n/a

Default Re: sql to delete rows from joining a table to itself - 06-29-2009 , 12:41 PM



On Jun 29, 7:03*am, Ed Prochak <edproc... (AT) gmail (DOT) com> wrote:
Quote:
On Jun 28, 7:51*pm, Jeff Calico <jeffcal... (AT) gmail (DOT) com> wrote:



Hello everyone

I am trying to delete rows from a massive data table that are "almost"
duplicates. I can do a nice query
to find them when I join the table with itself, but I can't figure out
how to form a delete statement out
of this.

Query:

SELECT a.*
FROM Trafficdata a, Trafficdata b
WHERE a.id = b.id
AND a.date = b.date
AND a.flag = 'Q'
AND b.flag = 'Q'
AND a.dstop = b.dstop
AND (a.batch != b.batch OR (a.batch = b.batch AND a.pos_marker
b.pos_marker) );

One idea I tried after searching the web was
DELETE FROM ( <above query> ); but that gives
ORA-01752 "Cannot delete from view without exactly one key preserved
table"

Any help would be greatly appreciated!
Jeff C.

Do you have a primary key on this Trafficdata table?
Assuming your results are the rows your really want to delete then a
delete of the form:

DELETE FROM Trafficdata WHERE pk IN (SELECT a.pk from ...) ;

I suggest you do this AFTER doing a backup.

Ed
Thanks for your reply. No, there isn't don't have a primary key on
this data table. I also can't make a backup of the TrafficData table,
due to its huge size, but I did make a smaller table to experiment on.

Jeff

Reply With Quote
  #5  
Old   
Peter Nilsson
 
Posts: n/a

Default Re: sql to delete rows from joining a table to itself - 06-30-2009 , 12:55 AM



On Jun 29, 9:51*am, Jeff Calico <jeffcal... (AT) gmail (DOT) com> wrote:
Quote:
Hello everyone

I am trying to delete rows from a massive data table that
are "almost" duplicates. I can do a nice query to find
them when I join the table with itself, but I can't
figure out how to form a delete statement out of this.

Query:

SELECT a.*
FROM Trafficdata a, Trafficdata b
WHERE a.id = b.id
AND a.date = b.date
Sic?

Quote:
AND a.flag = 'Q'
AND b.flag = 'Q'
AND a.dstop = b.dstop
AND (a.batch != b.batch OR (a.batch = b.batch AND
a.pos_marker < b.pos_marker) );
Note that if there are rows with the same (id, date,
flag = 'Q', dstop) tuple, but different batch numbers,
then you'll get every row for that tuple.

Is that really what you want? Or do you only want to
remove all but the the top pos_markers for every (id,
date, flag = 'Q', dstop, batch) tuple? Or something
else...?

Note also that this will potentially give you multiple
copies of some rows from a.

Quote:
One idea I tried after searching the web was
DELETE FROM ( <above query> ); but that gives
ORA-01752 "Cannot delete from view without exactly
one key preserved table"
Hint: A better select query might be...

select a.*
from trafficdata a
where a.flag = 'Q'
and exists
(select 1
from trafficdata b
where b.id = a.id
and b.date = a.date
and b.flag = a.flag
and b.dstop = a.dstop
and ( b.batch != a.batch
or b.batch = a.batch
and b.pos_marker > a.pos_marker));

--
Peter

Reply With Quote
  #6  
Old   
Shakespeare
 
Posts: n/a

Default Re: sql to delete rows from joining a table to itself - 06-30-2009 , 01:25 AM



Peter Nilsson schreef:
Quote:
On Jun 29, 9:51 am, Jeff Calico <jeffcal... (AT) gmail (DOT) com> wrote:
Hello everyone

I am trying to delete rows from a massive data table that
are "almost" duplicates. I can do a nice query to find
them when I join the table with itself, but I can't
figure out how to form a delete statement out of this.

Query:

SELECT a.*
FROM Trafficdata a, Trafficdata b
WHERE a.id = b.id
AND a.date = b.date

Sic?

AND a.flag = 'Q'
AND b.flag = 'Q'
AND a.dstop = b.dstop
AND (a.batch != b.batch OR (a.batch = b.batch AND
a.pos_marker < b.pos_marker) );

Note that if there are rows with the same (id, date,
flag = 'Q', dstop) tuple, but different batch numbers,
then you'll get every row for that tuple.

Is that really what you want? Or do you only want to
remove all but the the top pos_markers for every (id,
date, flag = 'Q', dstop, batch) tuple? Or something
else...?

Note also that this will potentially give you multiple
copies of some rows from a.

One idea I tried after searching the web was
DELETE FROM ( <above query> ); but that gives
ORA-01752 "Cannot delete from view without exactly
one key preserved table"

Hint: A better select query might be...

select a.*
from trafficdata a
where a.flag = 'Q'
and exists
(select 1
from trafficdata b
where b.id = a.id
and b.date = a.date
and b.flag = a.flag
and b.dstop = a.dstop
and ( b.batch != a.batch
or b.batch = a.batch
and b.pos_marker > a.pos_marker));

--
Peter
I took a quick look, but if there are corresponding rows on (id, date,
flag, dstop) where a.batch <> b.batch it looks like BOTH rows will be
deleted. I don't know if that is what you really wanted...

Shakespeare

Reply With Quote
  #7  
Old   
Mark D Powell
 
Posts: n/a

Default Re: sql to delete rows from joining a table to itself - 06-30-2009 , 09:26 AM



On Jun 29, 12:41*pm, Jeff Calico <jeffcal... (AT) gmail (DOT) com> wrote:
Quote:
On Jun 29, 7:03*am, Ed Prochak <edproc... (AT) gmail (DOT) com> wrote:





On Jun 28, 7:51*pm, Jeff Calico <jeffcal... (AT) gmail (DOT) com> wrote:

Hello everyone

I am trying to delete rows from a massive data table that are "almost"
duplicates. I can do a nice query
to find them when I join the table with itself, but I can't figure out
how to form a delete statement out
of this.

Query:

SELECT a.*
FROM Trafficdata a, Trafficdata b
WHERE a.id = b.id
AND a.date = b.date
AND a.flag = 'Q'
AND b.flag = 'Q'
AND a.dstop = b.dstop
AND (a.batch != b.batch OR (a.batch = b.batch AND a.pos_marker
b.pos_marker) );

One idea I tried after searching the web was
DELETE FROM ( <above query> ); but that gives
ORA-01752 "Cannot delete from view without exactly one key preserved
table"

Any help would be greatly appreciated!
Jeff C.

Do you have a primary key on this Trafficdata table?
Assuming your results are the rows your really want to delete then a
delete of the form:

DELETE FROM Trafficdata WHERE pk IN (SELECT a.pk from ...) ;

I suggest you do this AFTER doing a backup.

Ed

Thanks for your reply. No, there isn't don't have a primary key on
this data table. I also can't make a backup of the TrafficData table,
due to its huge size, but I did make a smaller table to experiment on.

Jeff- Hide quoted text -

- Show quoted text -
The code I post will work when no PK or UK exists. The key_list is
just the list of columns that qualify the data to be considered
duplicate. If could be one column or all the columns in the row.

HTH -- Mark D Powell --

Reply With Quote
  #8  
Old   
Jeff Calico
 
Posts: n/a

Default Re: sql to delete rows from joining a table to itself - 06-30-2009 , 10:42 AM



On Jun 30, 8:26*am, Mark D Powell <Mark.Pow... (AT) eds (DOT) com> wrote:
Quote:
On Jun 29, 12:41*pm, Jeff Calico <jeffcal... (AT) gmail (DOT) com> wrote:



On Jun 29, 7:03*am, Ed Prochak <edproc... (AT) gmail (DOT) com> wrote:

On Jun 28, 7:51*pm, Jeff Calico <jeffcal... (AT) gmail (DOT) com> wrote:

Hello everyone

I am trying to delete rows from a massive data table that are "almost"
duplicates. I can do a nice query
to find them when I join the table with itself, but I can't figure out
how to form a delete statement out
of this.

Query:

SELECT a.*
FROM Trafficdata a, Trafficdata b
WHERE a.id = b.id
AND a.date = b.date
AND a.flag = 'Q'
AND b.flag = 'Q'
AND a.dstop = b.dstop
AND (a.batch != b.batch OR (a.batch = b.batch AND a.pos_marker
b.pos_marker) );

One idea I tried after searching the web was
DELETE FROM ( <above query> ); but that gives
ORA-01752 "Cannot delete from view without exactly one key preserved
table"

Any help would be greatly appreciated!
Jeff C.

Do you have a primary key on this Trafficdata table?
Assuming your results are the rows your really want to delete then a
delete of the form:

DELETE FROM Trafficdata WHERE pk IN (SELECT a.pk from ...) ;

I suggest you do this AFTER doing a backup.

Ed

Thanks for your reply. No, there isn't don't have a primary key on
this data table. I also can't make a backup of the TrafficData table,
due to its huge size, but I did make a smaller table to experiment on.

Jeff- Hide quoted text -

- Show quoted text -

The code I post will work when no PK or UK exists. *The key_list is
just the list of columns that qualify the data to be considered
duplicate. *If could be one column or all the columns in the row.

HTH -- Mark D Powell --
Thanks, Mark. I was wondering about exactly that. I'll have another
go at it today.

--Jeff

Reply With Quote
  #9  
Old   
Shakespeare
 
Posts: n/a

Default Re: sql to delete rows from joining a table to itself - 06-30-2009 , 11:00 AM



Mark D Powell schreef:
Quote:
On Jun 29, 8:03 am, Ed Prochak <edproc... (AT) gmail (DOT) com> wrote:
On Jun 28, 7:51 pm, Jeff Calico <jeffcal... (AT) gmail (DOT) com> wrote:





Hello everyone
I am trying to delete rows from a massive data table that are "almost"
duplicates. I can do a nice query
to find them when I join the table with itself, but I can't figure out
how to form a delete statement out
of this.
Query:
SELECT a.*
FROM Trafficdata a, Trafficdata b
WHERE a.id = b.id
AND a.date = b.date
AND a.flag = 'Q'
AND b.flag = 'Q'
AND a.dstop = b.dstop
AND (a.batch != b.batch OR (a.batch = b.batch AND a.pos_marker
b.pos_marker) );
One idea I tried after searching the web was
DELETE FROM ( <above query> ); but that gives
ORA-01752 "Cannot delete from view without exactly one key preserved
table"
Any help would be greatly appreciated!
Jeff C.
Do you have a primary key on this Trafficdata table?
Assuming your results are the rows your really want to delete then a
delete of the form:

DELETE FROM Trafficdata WHERE pk IN (SELECT a.pk from ...) ;

I suggest you do this AFTER doing a backup.

Ed- Hide quoted text -

- Show quoted text -

Assuming you want to keep one copy of each row the following will work

delete from table_name
where (key_list, rowid) in
( select keys, rowid from table_name
minus
select keys, min(rowid) from table_name
group by keys
)
/

HTH -- Mark D Powell --
Looks like you are skipping the condition that the record with the
largest pos_marker should be preserved, not the lowest rowid.
Furthermore, if a.batch != b.batch, both records would be deleted in the
original query (if it worked).

And is there no mutating table problem here? Won't this lead to
'snapshot too old' problems with large tables? (I'm not sure about that)

Shakespeare

Reply With Quote
  #10  
Old   
Sebastian Kolski
 
Posts: n/a

Default Re: sql to delete rows from joining a table to itself - 06-30-2009 , 03:22 PM



Shakespeare wrote:
Quote:
Mark D Powell schreef:
On Jun 29, 8:03 am, Ed Prochak <edproc... (AT) gmail (DOT) com> wrote:
On Jun 28, 7:51 pm, Jeff Calico <jeffcal... (AT) gmail (DOT) com> wrote:





Hello everyone
I am trying to delete rows from a massive data table that are "almost"
duplicates. I can do a nice query
to find them when I join the table with itself, but I can't figure out
how to form a delete statement out
of this.
Query:
SELECT a.*
FROM Trafficdata a, Trafficdata b
WHERE a.id = b.id
AND a.date = b.date
AND a.flag = 'Q'
AND b.flag = 'Q'
AND a.dstop = b.dstop
AND (a.batch != b.batch OR (a.batch = b.batch AND a.pos_marker
b.pos_marker) );
One idea I tried after searching the web was
DELETE FROM ( <above query> ); but that gives
ORA-01752 "Cannot delete from view without exactly one key preserved
table"
Any help would be greatly appreciated!
Jeff C.
Do you have a primary key on this Trafficdata table?
Assuming your results are the rows your really want to delete then a
delete of the form:

DELETE FROM Trafficdata WHERE pk IN (SELECT a.pk from ...) ;

I suggest you do this AFTER doing a backup.

Ed- Hide quoted text -

- Show quoted text -

Assuming you want to keep one copy of each row the following will work

delete from table_name
where (key_list, rowid) in
( select keys, rowid from table_name
minus
select keys, min(rowid) from table_name
group by keys
)
/

HTH -- Mark D Powell --

Looks like you are skipping the condition that the record with the
largest pos_marker should be preserved, not the lowest rowid.
Furthermore, if a.batch != b.batch, both records would be deleted in the
original query (if it worked).

And is there no mutating table problem here? Won't this lead to
'snapshot too old' problems with large tables? (I'm not sure about that)

Shakespeare
I had similar problem. Large table with many rows to delete.
My solution was to write a short program which would connect to db, open full
table scan, reading all important columns and rowids. While reading i was making
in memory list of good rows to keep (actualy it was hash map) and for each row i
read i would check if i had row with similar important columns on my list, if no
i would add it to list, if yes i would check if it should replace row on list or
if it should be deleted. On second list i kept rowids rows to be deleted and
every time when its size reached 10k i was sending rowids from it to separate
deleter thread which would just loop through that rowids and issue delete from x
where rowid = y for each one.
So total cost of that solution was one full table scan plus numer of deletes by
rowid.
Since i was commiting every 10k deletes, snapshot too old was not a problem
because i was able to restart that job any time, and after each restart there
would be less rows in table.
Anyway in my case "cleaning" job duration droped from 120h (it was writen in
PL/SQL loop on the beginging) to 5h for table with ~150m rows (25m after
cleanup) and the bottleneck is now at redo log writers, how fast they are able
to log changes.

Sebastian K.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.