dbTalk Databases Forums  

Deleting duplcated rows in Oracle

comp.database.oracle comp.database.oracle


Discuss Deleting duplcated rows in Oracle in the comp.database.oracle forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
gjeff@chariot.net.au
 
Posts: n/a

Default Deleting duplcated rows in Oracle - 08-26-2006 , 06:39 AM






Hi,

I have a problem and would invite any assistance available.

I am working on a database table, with approx 2.5 m rows, the
structure of the table is :


cntl_data varchar2(15),
cntl_no varchar2(10),
field1 varchar2(60),
field2 varchar2(60),
ETC..
field20 varchar2(60).


cntl_data is a date & time stamp - yy-mm-dd-hh:mm:ss format and is
unique.
cntl_no contains an integer value, and is non unique.

field1 to field20 contains text data, and could contain nulls in the
entire field.

What I would like to do is to delete all duplicated rows, with same
cntl_no, and field1 to fields20 having he same values, but retaining
the earliest row based on the value in cntl_data.

I have been trying different ways to do it but is unable to get the
correct result, and I as hoping that someone more knowledgable would
point me in the right direction.

I could be contacted via email for any questions, and my email adress
is gjeff (AT) chariot (DOT) net.au.

Thanking you all in anticipation.

gjeff.

Reply With Quote
  #2  
Old   
Ariel
 
Posts: n/a

Default Re: Deleting duplcated rows in Oracle - 09-07-2006 , 03:47 AM






gjeff (AT) chariot (DOT) net.au wrote:

Quote:
cntl_data varchar2(15),
cntl_no varchar2(10),
field1 varchar2(60),
field2 varchar2(60),
ETC..
field20 varchar2(60).

cntl_data is a date & time stamp - yy-mm-dd-hh:mm:ss format and is
unique.
unique - that's good.

Quote:
cntl_no contains an integer value, and is non unique.

field1 to field20 contains text data, and could contain nulls in the
entire field.
NULLs - possible problem.

Quote:
What I would like to do is to delete all duplicated rows, with same
cntl_no, and field1 to fields20 having he same values, but retaining
the earliest row based on the value in cntl_data.
Ignoring NULLs for now:

Method 1:

DELETE FROM table WHERE cntl_data IN

(SELECT b.cntl_data from table a, table b
WHERE a.cntl_data < b.cntl_data AND
a.cntl_no = b.cntl_no AND
a.field1 = b.field1 AND
etc....
)

What it does is join to itself, and then match all records having
identical data fields, as you mentioned, then it only returns the rows
where the a.cntl_data does not equal the b.cntl_data (to avoid a row
matching itself).

By using a.cntl_data < b.cntl_data I always make sure that the date in
the 'a' column will be less the one in the 'b' column, then I use the b
column in the select, and use that to delete values.

If cntl_data is not globally unique, then you can also return b.rowid
and delete based on that.

Now to deal with NULL's:

You can maybe use the NVL function to 'de-null' all the data. This is
assuming you have some value you can use as a substitute for null, that
will not otherwise be in the data.

Or, if not, then do this:

( a.field1 = b.field1 OR (a.field1 IS NULL AND b.field1 IS NULL) ) AND
( a.field2 = b.field2 OR (a.field2 IS NULL AND b.field2 IS NULL) ) AND
etc...

Method 2:

SELECT MAX(cntl_data) FROM table
GROUP BY cntl_no, field1, field2, etc.
HAVING COUNT(*) > 1

Then use this as a subquery for the delete. But the problem with this is
that if you have 3 duplicates for a row, only one of them will be
deleted. I suppose you can reverse the logic: use MIN(cntl_data), and
use that list as a list of rows to KEEP (NOT IN (subquery) assuming
cntl_data is never NULL). Or you can run the query multiple times till
it stops deleting stuff.

The main benefit of the first method is more flexibility in what counts
as a duplicate (use whatever logic you like for matching the columns),
with the second method you have to have an exact match for the group by.

-Ariel

PS. I didn't double check the SQL here, so please do some manual sanity
checking before running the code.


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.