![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
I need to delete the dupes across bu's. *The records are unique based on bu+ndc but there are 400+ duplicates.. There is a sequential id that is unique. *I am trying to remove these 400+ records. 1. *SELECT * FROM ci GROUP BY bu,ndc HAVING COUNT(concat(bu,ndc) )>1 This query yields 400+ dupes which I need to remove. 2. I could do this query in Oracle but not in MySQL. You cannot delete from the same table as in the subquery. *DELETE FROM ci *b WHERE EXISTS (( SELECT bu,ndc, count(1) FROM ci GROUP BY bu,ndc HAVING *COUNT(1)>1 ) a where *a.bu=b.bu and a.ndc = b.ndc 3. * I need to create a temp table with the unique ids in order to compare the ci table to the duplicates. I create the temp table with the SELECT from above that now has the duplicate records in it with the id. CREATE TABLE *ci_tmp AS (SELECT c.id,c.bu,c.zonenm,c.ndc FROM ci c GROUP BY c.bu,c.ndc HAVING *COUNT(concat(c.bu,c.ndc) )>1) 4. *Next I have to run my delete query matching against the ids in the temp table. AT this point I am not sure exactly how to do this or what I need to delete. This is what I have but it isn't right because it doesn't delete any records. DELETE FROM cart_inventory WHERE id IN (select id FROM cart_inventory_tmp GROUP BY bu,nationaldrugcode HAVING COUNT(concat(bu,nationaldrugcode) )>1) I also tried * deleting using max(id) but I don't know if it requires a self-join and I think the above process might be easier? delete *FROM cart_inventory as as a max(id) where *concat(a.bu, a.ndc)=concat(b.bu,b.ndc) If you could explain the process of Query#4 what I am trying to match against that might help but I basically understand that you need to get the id of one of the duplicates from the temp table and match it against the ci table and delete that id. It seems like it should work but it doesn't because I run query # 4 and do the select for the dupes and they are all still there.. thanks, |
#3
| |||
| |||
|
|
On Jan 3, 7:37*pm, jr <jlro... (AT) yahoo (DOT) com> wrote: I need to delete the dupes across bu's. *The records are unique based on bu+ndc but there are 400+ duplicates.. There is a sequential id that is unique. *I am trying to remove these 400+ records. 1. *SELECT * FROM ci GROUP BY bu,ndc HAVING COUNT(concat(bu,ndc) )>1 This query yields 400+ dupes which I need to remove. 2. I could do this query in Oracle but not in MySQL. You cannot delete from the same table as in the subquery. *DELETE FROM ci *b WHERE EXISTS (( SELECT bu,ndc, count(1) FROM ci GROUP BY bu,ndc HAVING *COUNT(1)>1 ) a where *a.bu=b.bu and a.ndc = b.ndc 3. * I need to create a temp table with the unique ids in order to compare the ci table to the duplicates. I create the temp table with the SELECT from above that now has the duplicate records in it with the id. CREATE TABLE *ci_tmp AS (SELECT c.id,c.bu,c.zonenm,c.ndc FROM ci c GROUP BY c.bu,c.ndc HAVING *COUNT(concat(c.bu,c.ndc) )>1) 4. *Next I have to run my delete query matching against the ids in the temp table. AT this point I am not sure exactly how to do this or what I need to delete. This is what I have but it isn't right because it doesn't delete any records. DELETE FROM cart_inventory WHERE id IN (select id FROM cart_inventory_tmp GROUP BY bu,nationaldrugcode HAVING COUNT(concat(bu,nationaldrugcode) )>1) I also tried * deleting using max(id) but I don't know if it requires a self-join and I think the above process might be easier? delete *FROM cart_inventory as as a max(id) where *concat(a.bu, a.ndc)=concat(b.bu,b.ndc) If you could explain the process of Query#4 what I am trying to match against that might help but I basically understand that you need to get the id of one of the duplicates from the temp table and match it against the ci table and delete that id. It seems like it should work but it doesn't because I run query # 4 and do the select for the dupes and they are all still there.. thanks, 1) are the rows truely DUPLICATE or do you just have a bu,ndc duplicate - are there other columns and are they also duplicate or are they unique? *If "unique", which one is correct? See:http://dev.mysql.com/doc/refman/5.5/...-table.htmland search for IGNORE "IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key, The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value." ALTER IGNORE TABLE ci ADD UNIQUE KEY (bu,ndc ) ; How it knows which one is "first" I do not know... I would test this before actually executingto ensure you are purging the correct data. Next step - figure out why you are getting unwanted dupes and fix the problem. |
#4
| |||
| |||
|
|
On Jan 3, 7:37*pm, jr <jlro... (AT) yahoo (DOT) com> wrote: I need to delete the dupes across bu's. *The records are unique based on bu+ndc but there are 400+ duplicates.. There is a sequential id that is unique. *I am trying to remove these 400+ records. 1. *SELECT * FROM ci GROUP BY bu,ndc HAVING COUNT(concat(bu,ndc) )>1 This query yields 400+ dupes which I need to remove. 2. I could do this query in Oracle but not in MySQL. You cannot delete from the same table as in the subquery. *DELETE FROM ci *b WHERE EXISTS (( SELECT bu,ndc, count(1) FROM ci GROUP BY bu,ndc HAVING *COUNT(1)>1 ) a where *a.bu=b.bu and a.ndc = b.ndc 3. * I need to create a temp table with the unique ids in order to compare the ci table to the duplicates. I create the temp table with the SELECT from above that now has the duplicate records in it with the id. CREATE TABLE *ci_tmp AS (SELECT c.id,c.bu,c.zonenm,c.ndc FROM ci c GROUP BY c.bu,c.ndc HAVING *COUNT(concat(c.bu,c.ndc) )>1) 4. *Next I have to run my delete query matching against the ids in the temp table. AT this point I am not sure exactly how to do this or what I need to delete. This is what I have but it isn't right because it doesn't delete any records. DELETE FROM cart_inventory WHERE id IN (select id FROM cart_inventory_tmp GROUP BY bu,nationaldrugcode HAVING COUNT(concat(bu,nationaldrugcode) )>1) I also tried * deleting using max(id) but I don't know if it requires a self-join and I think the above process might be easier? delete *FROM cart_inventory as as a max(id) where *concat(a.bu, a.ndc)=concat(b.bu,b.ndc) If you could explain the process of Query#4 what I am trying to match against that might help but I basically understand that you need to get the id of one of the duplicates from the temp table and match it against the ci table and delete that id. It seems like it should work but it doesn't because I run query # 4 and do the select for the dupes and they are all still there.. thanks, 1) are the rows truely DUPLICATE or do you just have a bu,ndc duplicate - are there other columns and are they also duplicate or are they unique? *If "unique", which one is correct? See:http://dev.mysql.com/doc/refman/5.5/...-table.htmland search for IGNORE "IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key, The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value." ALTER IGNORE TABLE ci ADD UNIQUE KEY (bu,ndc ) ; How it knows which one is "first" I do not know... I would test this before actually executingto ensure you are purging the correct data. Next step - figure out why you are getting unwanted dupes and fix the problem. |
#5
| |||
| |||
|
|
2. I could do this query in Oracle but not in MySQL. You cannot delete from the same table as in the subquery. DELETE FROM ci b WHERE EXISTS (( SELECT bu,ndc, count(1) FROM ci GROUP BY bu,ndc HAVING COUNT(1)>1 ) a where a.bu=b.bu and a.ndc = b.ndc |
#6
| |||
| |||
|
|
On 2011-01-04 01:37, jr wrote: [...] 2. I could do this query in Oracle but not in MySQL. You cannot delete from the same table as in the subquery. *DELETE FROM ci *b WHERE EXISTS (( SELECT bu,ndc, count(1) FROM ci GROUP BY bu,ndc HAVING *COUNT(1)>1 ) a where *a.bu=b.bu and a.ndc = b.ndc You can hide the table at a deeper nesting level, but it will probably be very slow. Another option is to delete from a join (non-standard) like (untested): delete b.* from ci as b join ( * SELECT bu,ndc * FROM ci * GROUP BY bu,ndc * HAVING *COUNT(1)>1 ) as a * * on a.bu=b.bu * *and a.ndc = b.ndc If bu, ndu is a candidate key, you might want to add a unique constraint: * * alter table ci add constraint ... unique (bu, ndc); to enforce this. [...] /Lennart |
#7
| |||||
| |||||
|
|
The query is beautiful although it deleted 1085 rows. When I did the first query to locate duplicates it found only 487. Why is that? ????? Where did the extra This is the query I used to find the dupes. Is that wrong or is your query wrong? Did it delete the dupe and the original record as well? select bu,ndc, count(*) from ci group by bu,ndc having COUNT(concat(bu,nationaldrugcode)) > 1 |
|
I wouldn't hide the table at a deeper level because users are using the web page and it is somewhat slow already because of a UNION query. I could try it if all else fails. |
|
id | bu | ndc | +----+----+-----+ 1 | 1 | 1 | 2 | 1 | 1 | 4 | 2 | 2 | 5 | 2 | 2 | 6 | 1 | 1 | +----+----+-----+ |
|
id | bu | ndc | +----+----+-----+ 2 | 1 | 1 | 5 | 2 | 2 | 6 | 1 | 1 | 6 | 1 | 1 | +----+----+-----+ |
|
id | bu | ndc | +----+----+-----+ 1 | 1 | 1 | 3 | 2 | 1 | 4 | 2 | 2 | +----+----+-----+ |
#8
| |||
| |||
|
|
SELECT * FROM ci b WHERE EXISTS ( SELECT bu,ndc, count(1) FROM ci GROUP BY bu,ndc HAVING COUNT(1)>1 ) a where a.bu=b.bu and a.ndc = b.ndc |
|
id | bu | ndc | +----+----+-----+ 1 | 1 | 1 | 2 | 1 | 1 | 4 | 2 | 2 | 5 | 2 | 2 | 6 | 1 | 1 | +----+----+-----+ |
#9
| |||
| |||
|
|
On 2011-01-04 22:19, Lennart Jonsson wrote: [...] SELECT * FROM ci b WHERE EXISTS ( SELECT bu,ndc, count(1) FROM ci GROUP BY bu,ndc HAVING COUNT(1)>1 ) a where a.bu=b.bu and a.ndc = b.ndc I get an syntax error here |
#10
| |||
| |||
|
|
On 2011-01-04 22:38, Lennart Jonsson wrote: On 2011-01-04 22:19, Lennart Jonsson wrote: [...] SELECT * FROM ci *b WHERE EXISTS ( * * SELECT bu,ndc, count(1) * * FROM ci * * GROUP BY bu,ndc * * HAVING COUNT(1)>1 ) a where a.bu=b.bu and a.ndc = b.ndc I get an syntax error here Never mind, there where several other reasons for the syntax error, I just did not notice them. |
![]() |
| Thread Tools | |
| Display Modes | |
| |