dbTalk Databases Forums  

sql to delete rows from joining a table to itself

comp.databases.oracle.misc comp.databases.oracle.misc


Discuss sql to delete rows from joining a table to itself in the comp.databases.oracle.misc forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
Peter Nilsson
 
Posts: n/a

Default Re: sql to delete rows from joining a table to itself - 06-30-2009 , 07:45 PM






Shakespeare <what... (AT) xs4all (DOT) nl> wrote:
Quote:
Peter Nilsson schreef:
...
Note that if there are rows with the same (id, date,
flag = 'Q', dstop) tuple, but different batch numbers,
then you'll get every row for that tuple.

Is that really what you want?...
snip
Hint: A better select query might be...
select a.*
from trafficdata a
where a.flag = 'Q'
and exists
(select 1
from trafficdata b
where b.id = a.id
and b.date = a.date
and b.flag = a.flag
and b.dstop = a.dstop
and ( b.batch != a.batch
or b.batch = a.batch
and b.pos_marker > a.pos_marker));

I took a quick look, but if there are corresponding rows
on (id, date, flag, dstop) where a.batch <> b.batch it
looks like BOTH rows will be deleted. I don't know if
that is what you really wanted...
Is there an echo in here?! :-)

The query I posted replicates the semantics of the OP's
original select, except that it doesn't repeat matching
rows from the target table.

The hint about my script maybe being better was that it's
a one line change to make it delete instead of select rows.

That sort of select is useful in situations like formal
data change requests where you have to confirm the affected
rows with pre and post selects. Performing a review of the
change script prior to running in production is much easier
if the select and delete (or update) are all but identical.

Of course, it's important to get the select right in the
first place. ;-)

--
Peter

Reply With Quote
  #12  
Old   
Ed Prochak
 
Posts: n/a

Default Re: sql to delete rows from joining a table to itself - 06-30-2009 , 08:14 PM






On Jun 29, 12:41*pm, Jeff Calico <jeffcal... (AT) gmail (DOT) com> wrote:
Quote:
On Jun 29, 7:03*am, Ed Prochak <edproc... (AT) gmail (DOT) com> wrote:



On Jun 28, 7:51*pm, Jeff Calico <jeffcal... (AT) gmail (DOT) com> wrote:

Hello everyone

I am trying to delete rows from a massive data table that are "almost"
duplicates. I can do a nice query
to find them when I join the table with itself, but I can't figure out
how to form a delete statement out
of this.

Query:

SELECT a.*
FROM Trafficdata a, Trafficdata b
WHERE a.id = b.id
AND a.date = b.date
AND a.flag = 'Q'
AND b.flag = 'Q'
AND a.dstop = b.dstop
AND (a.batch != b.batch OR (a.batch = b.batch AND a.pos_marker
b.pos_marker) );

One idea I tried after searching the web was
DELETE FROM ( <above query> ); but that gives
ORA-01752 "Cannot delete from view without exactly one key preserved
table"

Any help would be greatly appreciated!
Jeff C.

Do you have a primary key on this Trafficdata table?
Assuming your results are the rows your really want to delete then a
delete of the form:

DELETE FROM Trafficdata WHERE pk IN (SELECT a.pk from ...) ;

I suggest you do this AFTER doing a backup.

Ed

Thanks for your reply. No, there isn't don't have a primary key on
this data table. I also can't make a backup of the TrafficData table,
due to its huge size, but I did make a smaller table to experiment on.

Jeff
As you saw, ROWID works for this case.

But about backup, you are saying you (a developer) cannot back up the
table, right? I meant for the DBA to do a backup before changing a
table that is so massive. (And if the DBA cannot do the backup you
have bigger problems than duplicates!)

Good luck.
Ed

Reply With Quote
  #13  
Old   
Shakespeare
 
Posts: n/a

Default Re: sql to delete rows from joining a table to itself - 07-01-2009 , 01:30 AM



Peter Nilsson schreef:
Quote:
Shakespeare <what... (AT) xs4all (DOT) nl> wrote:
Peter Nilsson schreef:
...
Note that if there are rows with the same (id, date,
flag = 'Q', dstop) tuple, but different batch numbers,
then you'll get every row for that tuple.

Is that really what you want?...
snip
Hint: A better select query might be...
select a.*
from trafficdata a
where a.flag = 'Q'
and exists
(select 1
from trafficdata b
where b.id = a.id
and b.date = a.date
and b.flag = a.flag
and b.dstop = a.dstop
and ( b.batch != a.batch
or b.batch = a.batch
and b.pos_marker > a.pos_marker));
I took a quick look, but if there are corresponding rows
on (id, date, flag, dstop) where a.batch <> b.batch it
looks like BOTH rows will be deleted. I don't know if
that is what you really wanted...

Is there an echo in here?! :-)
No, not in here, but in my other post. I noticed the deletion of too
many rows in your query, and later found out that the original query did
the same, and luckily did not work!

Quote:
The query I posted replicates the semantics of the OP's
original select, except that it doesn't repeat matching
rows from the target table.

The hint about my script maybe being better was that it's
a one line change to make it delete instead of select rows.

That sort of select is useful in situations like formal
data change requests where you have to confirm the affected
rows with pre and post selects. Performing a review of the
change script prior to running in production is much easier
if the select and delete (or update) are all but identical.

Of course, it's important to get the select right in the
first place. ;-)
Right, especially when there is no space for backups!

Quote:
--
Peter
Shakespeare

Reply With Quote
  #14  
Old   
Mark D Powell
 
Posts: n/a

Default Re: sql to delete rows from joining a table to itself - 07-03-2009 , 03:12 PM



On Jun 30, 11:00*am, Shakespeare <what... (AT) xs4all (DOT) nl> wrote:
Quote:
Mark D Powell schreef:





On Jun 29, 8:03 am, Ed Prochak <edproc... (AT) gmail (DOT) com> wrote:
On Jun 28, 7:51 pm, Jeff Calico <jeffcal... (AT) gmail (DOT) com> wrote:

Hello everyone
I am trying to delete rows from a massive data table that are "almost"
duplicates. I can do a nice query
to find them when I join the table with itself, but I can't figure out
how to form a delete statement out
of this.
Query:
SELECT a.*
FROM Trafficdata a, Trafficdata b
WHERE a.id = b.id
AND a.date = b.date
AND a.flag = 'Q'
AND b.flag = 'Q'
AND a.dstop = b.dstop
AND (a.batch != b.batch OR (a.batch = b.batch AND a.pos_marker
b.pos_marker) );
One idea I tried after searching the web was
DELETE FROM ( <above query> ); but that gives
ORA-01752 "Cannot delete from view without exactly one key preserved
table"
Any help would be greatly appreciated!
Jeff C.
Do you have a primary key on this Trafficdata table?
Assuming your results are the rows your really want to delete then a
delete of the form:

DELETE FROM Trafficdata WHERE pk IN (SELECT a.pk from ...) ;

I suggest you do this AFTER doing a backup.

Ed- Hide quoted text -

- Show quoted text -

Assuming you want to keep one copy of each row the following will work

delete from table_name
* where (key_list, rowid) in
* * * * ( select *keys, rowid from table_name
* * * * * minus
* * * * * select *keys, min(rowid) from table_name
* * * * * group by keys
* * * * )
/

HTH -- Mark D Powell --

Looks like you are skipping the condition that the record with the
largest pos_marker should be preserved, not the lowest rowid.
Furthermore, if a.batch != b.batch, both records would be deleted in the
original query (if it worked).

And is there no mutating table problem here? Won't this lead to
'snapshot too old' problems with large tables? (I'm not sure about that)

Shakespeare- Hide quoted text -

- Show quoted text -
The delete is a generic template. It would be very easy to modify it
to keep the max whatever instead of the min rowid.

Mutating tables happen only when a trigger is involved and you attempt
to modify the base table from the trigger. There is no mention of a
trigger; however there is mention of a view error. The delete should
probably be aimed at one or more of the base tables in the view.
Remove the extra rows from the base table then the view will display
the right results since the data has been fixed. Either that or the
duplcate data should exist and it is the view definition which needs
fixing to filter out what is being called duplicate. I have seen both
issues over my career.

HTH -- Mark D Powell --

Reply With Quote
  #15  
Old   
Shakespeare
 
Posts: n/a

Default Re: sql to delete rows from joining a table to itself - 07-04-2009 , 03:32 AM



Mark D Powell schreef:
Quote:
On Jun 30, 11:00 am, Shakespeare <what... (AT) xs4all (DOT) nl> wrote:
Mark D Powell schreef:





On Jun 29, 8:03 am, Ed Prochak <edproc... (AT) gmail (DOT) com> wrote:
On Jun 28, 7:51 pm, Jeff Calico <jeffcal... (AT) gmail (DOT) com> wrote:
Hello everyone
I am trying to delete rows from a massive data table that are "almost"
duplicates. I can do a nice query
to find them when I join the table with itself, but I can't figure out
how to form a delete statement out
of this.
Query:
SELECT a.*
FROM Trafficdata a, Trafficdata b
WHERE a.id = b.id
AND a.date = b.date
AND a.flag = 'Q'
AND b.flag = 'Q'
AND a.dstop = b.dstop
AND (a.batch != b.batch OR (a.batch = b.batch AND a.pos_marker
b.pos_marker) );
One idea I tried after searching the web was
DELETE FROM ( <above query> ); but that gives
ORA-01752 "Cannot delete from view without exactly one key preserved
table"
Any help would be greatly appreciated!
Jeff C.
Do you have a primary key on this Trafficdata table?
Assuming your results are the rows your really want to delete then a
delete of the form:
DELETE FROM Trafficdata WHERE pk IN (SELECT a.pk from ...) ;
I suggest you do this AFTER doing a backup.
Ed- Hide quoted text -
- Show quoted text -
Assuming you want to keep one copy of each row the following will work
delete from table_name
where (key_list, rowid) in
( select keys, rowid from table_name
minus
select keys, min(rowid) from table_name
group by keys
)
/
HTH -- Mark D Powell --
Looks like you are skipping the condition that the record with the
largest pos_marker should be preserved, not the lowest rowid.
Furthermore, if a.batch != b.batch, both records would be deleted in the
original query (if it worked).

And is there no mutating table problem here? Won't this lead to
'snapshot too old' problems with large tables? (I'm not sure about that)

Shakespeare- Hide quoted text -

- Show quoted text -

The delete is a generic template. It would be very easy to modify it
to keep the max whatever instead of the min rowid.

Mutating tables happen only when a trigger is involved and you attempt
to modify the base table from the trigger.
Right, missed that part.
But how about the snapshot too old issue? I think to remember that a
'delete from a where exists (select .. from a)' with large tables can
cause that error.. or was that only with use of cursors?


Quote:
There is no mention of a
trigger; however there is mention of a view error. The delete should
probably be aimed at one or more of the base tables in the view.
Yes. Because there is not exactly one key preserved table, it can not
determined from which table the row should be deleted (eventhough they
are actually the same one)

Quote:
Remove the extra rows from the base table then the view will display
the right results since the data has been fixed. Either that or the
duplcate data should exist and it is the view definition which needs
fixing to filter out what is being called duplicate. I have seen both
issues over my career.

HTH -- Mark D Powell --



Shakespeare

Reply With Quote
  #16  
Old   
Jeff Calico
 
Posts: n/a

Default Re: sql to delete rows from joining a table to itself - 07-08-2009 , 02:28 PM



On Jun 29, 11:55*pm, Peter Nilsson <ai... (AT) acay (DOT) com.au> wrote:
Quote:
On Jun 29, 9:51*am, Jeff Calico <jeffcal... (AT) gmail (DOT) com> wrote:

Hello everyone

I am trying to delete rows from a massive data table that
are "almost" duplicates. I can do a nice query to find
them when I join the table with itself, but I can't
figure out how to form a delete statement out of this.

Query:

SELECT a.*
FROM Trafficdata a, Trafficdata b
WHERE a.id = b.id
AND a.date = b.date

Sic?

AND a.flag = 'Q'
AND b.flag = 'Q'
AND a.dstop = b.dstop
AND (a.batch != b.batch OR (a.batch = b.batch AND
a.pos_marker < b.pos_marker) );

Note that if there are rows with the same (id, date,
flag = 'Q', dstop) tuple, but different batch numbers,
then you'll get every row for that tuple.

Is that really what you want? Or do you only want to
remove all but the the top pos_markers for every (id,
date, flag = 'Q', dstop, batch) tuple? Or something
else...?

Note also that this will potentially give you multiple
copies of some rows from a.

One idea I tried after searching the web was
DELETE FROM ( <above query> ); but that gives
ORA-01752 "Cannot delete from view without exactly
one key preserved table"

Hint: A better select query might be...

* select a.*
* * from trafficdata a
* *where a.flag = 'Q'
* * *and exists
* * * * * *(select 1
* * * * * * * from trafficdata b
* * * * * * *where b.id = a.id
* * * * * * * *and b.date = a.date
* * * * * * * *and b.flag = a.flag
* * * * * * * *and b.dstop = a.dstop
* * * * * * * *and ( * b.batch != a.batch
* * * * * * * * * * or b.batch *= a.batch
* * * * * * * * * * * * *and b.pos_marker > a.pos_marker));

--
Peter
Thanks for your reply, Peter (and thanks to everyone else who
responded!). Sorry for the delay in getting back to this thread, but
as you may have experienced yourselves, an issue that is really
important to management one day can be replaced by another hot potato
issue the next day.

You are correct that the original query was flawed, as I saw by
creating a small table and using rowids to pinpoint exactly which row
I was returning. I made the query work by getting all duplicate rows
with rowid created than the min(rowid) for that set of data. It works
great on a small table, but for 30+ million rows, it churns away for
many hours and isn't practical to do in the production enviroment.

--Jeff

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.