![]() | |
![]() |
| | Thread Tools | Display Modes |
#11
| |||
| |||
|
|
On 2011-01-04 22:19, Lennart Jonsson wrote: [...] SELECT * FROM ci *b WHERE EXISTS ( * * SELECT bu,ndc, count(1) * * FROM ci * * GROUP BY bu,ndc * * HAVING COUNT(1)>1 ) a where a.bu=b.bu and a.ndc = b.ndc I get an syntax error here, so I guess uncorrelated exists subqueries are not allowed either. You can rewrite this to: SELECT b.* FROM ci as b where exists ( select bu,ndc from ci as a where a.bu = b.bu and a.ndc = b.ndc group by *bu,ndc having count(1) > 1); +----+----+-----+ | id | bu | ndc | +----+----+-----+ | *1 | *1 | * 1 | | *2 | *1 | * 1 | | *4 | *2 | * 2 | | *5 | *2 | * 2 | | *6 | *1 | * 1 | +----+----+-----+ /Lennart |
#12
| |||
| |||
|
|
The 3rd query, I did get a syntax error on the select: select * from ci b where exists( select bu,ndc, count(1) from ci group by bu, ndc having count(1)>1 )a where a.bu=b.bu and a.ndc=b.ndc [...] I got syntax errors. You mentioned some other reason other than a syntax error? |
#13
| |||
| |||
|
|
On 2011-01-04 16:37, jr wrote: [...] The query is beautiful although it deleted 1085 rows. * When I did the first query to locate duplicates it found only 487. *Why is that? ????? Where did the extra This is the query I used to find the dupes. *Is that wrong or is your query wrong? *Did it delete the dupe and the original record as well? select bu,ndc, count(*) from ci group by bu,ndc having COUNT(concat(bu,nationaldrugcode)) > 1 I should have look more carefully, I just translated your attempt: DELETE FROM ci *b WHERE EXISTS (( SELECT bu,ndc, count(1) FROM ci GROUP BY bu,ndc HAVING *COUNT(1)>1 ) a where *a.bu=b.bu and a.ndc = b.ndc into delete b.* from ci as b join ( * SELECT bu,ndc * FROM ci * GROUP BY bu,ndc * HAVING *COUNT(1)>1 ) as a * * on a.bu=b.bu * *and a.ndc = b.ndc It is probably not what you want because it will delete *all* rows that are duplicated (including the first occurrence). Compare the result from: SELECT * FROM ci *b WHERE EXISTS ( * * SELECT bu,ndc, count(1) * * FROM ci * * GROUP BY bu,ndc * * HAVING COUNT(1)>1 ) a where a.bu=b.bu and a.ndc = b.ndc with: select b.* from ci as b join ( * SELECT bu,ndc * FROM ci * GROUP BY bu,ndc * HAVING *COUNT(1)>1 ) as a * * on a.bu=b.bu * *and a.ndc = b.ndc I wouldn't hide the table at a deeper level because users are using the web page and it is somewhat slow already because of a UNION query. *I could try it if all else fails. No, you misunderstood me. What I meant was that you can hide the table inside the delete stmt as in: SELECT * FROM ci *b WHERE EXISTS ( * * select * FROM ( * * * * SELECT bu,ndc, count(1) * * * * FROM ci * * * * GROUP BY bu,ndc * * * * HAVING COUNT(1)>1 * * ) AS X ) a where a.bu=b.bu and a.ndc = b.ndc but this will also delete more rows than you want (I guess). You will have to enumerate the duplicate rows somehow (the sequential unique id is probably your best option). Here's one attempt: delete b.* from ci as b where exists ( * * select 1 * * from ci as c * * where c.bu = b.bu * * * and c.ndc = b.ndc * * * and c.id < b.id ) But mysql does not allow this, so we'll rewrite that as a join: delete b.* from ci as b join ci as c * * on c.bu = b.bu * * and c.ndc = b.ndc * * and c.id < b.id Example: create table ci ( * * id int not null primary key, * * bu int not null, * * ndc int not null ) engine = innodb; insert into ci (id, bu, ndc) values (1,1,1),(2,1,1),(3,2,1),(4,2,2),(5,2,2),(6,1,1); The first query will delete *all* duplicates, i.e. all rows but the one with id = 3: mysql> select b.* * * -> from ci as b * * -> join ( * * -> * SELECT bu,ndc * * -> * FROM ci * * -> * GROUP BY bu,ndc * * -> * HAVING *COUNT(1)>1 * * -> ) as a * * -> * * on a.bu=b.bu * * -> * *and a.ndc = b.ndc; +----+----+-----+ | id | bu | ndc | +----+----+-----+ | *1 | *1 | * 1 | | *2 | *1 | * 1 | | *4 | *2 | * 2 | | *5 | *2 | * 2 | | *6 | *1 | * 1 | +----+----+-----+ The rewritten query leaves the duplicates with the lowest id: mysql> select b.* from ci as b * * -> join ci as c * * -> * * on c.bu = b.bu * * -> * * and c.ndc = b.ndc * * -> * * and c.id < b.id; +----+----+-----+ | id | bu | ndc | +----+----+-----+ | *2 | *1 | * 1 | | *5 | *2 | * 2 | | *6 | *1 | * 1 | | *6 | *1 | * 1 | +----+----+-----+ 4 rows in set (0.05 sec) mysql> delete b.* * * -> from ci as b * * -> join ci as c * * -> * *on c.bu = b.bu * * -> * and c.ndc = b.ndc * * -> * and c.id < b.id; Query OK, 3 rows affected (0.05 sec) mysql> select * from ci; +----+----+-----+ | id | bu | ndc | +----+----+-----+ | *1 | *1 | * 1 | | *3 | *2 | * 1 | | *4 | *2 | * 2 | +----+----+-----+ 3 rows in set (0.00 sec) Don't just take the code and execute it, try understand the similarities and differences from the first attempt. HTH /Lennart |
#14
| |||
| |||
|
|
On 2011-01-13 00:36, jr wrote: [...] The 3rd query, *I did get a syntax error on the select: select * from ci *b where exists( select bu,ndc, count(1) from ci group by bu, ndc having count(1)>1 )a where a.bu=b.bu and a.ndc=b.ndc [...] I got syntax errors. *You mentioned some other reason other than a syntax error? You already mentioned the two where clauses, fixing that leaves us with: ... exists ( * * select bu,ndc, count(1) * * from ci * * group by bu, ndc * * having count(1)>1 ) a exists is a predicate ( sometimes referred to as propositional function, you can think of it as an anonymous function returning true/false ) and you can't give it a name ( there is no point in doing so either ). Furthermore bu, ndc, etc inside the predicate is not visible outside the predicate, so you can't reference them like you do in: * where a.bu=b.bu * * and a.ndc=b.ndc You can however reference variables outside of the predicate from within the predicate, so select * from ci *b where exists( * * select bu,ndc, count(1) * * from ci as a * * where a.bu=b.bu * * and a.ndc=b.ndc * * group by bu, ndc * * having count(1)>1 ) is syntactically correct (will still give you the wrong result). /Lennart |
#15
| |||
| |||
|
|
Lennart, Regarding this last delete query, it worked on your small sample database but I tried it on my db of 46825 records and it didn't work. It left me with only 24,571 records. I did a query to get the num of duplicates and there were only 9996. Most of those were only duplicates, but there were some triplicates etc. So it deleted way too many and I had to kill the server eventually. I came up with 3 other queries that does the deleting in passes by comparing a table with the duplicate ids with a table containing the max(id)s. It seemed to work except something is wrong there also. It deleted all the duplicates, triplcates, quadruplicates, etc. It should have only deleted one of the duplicates with the max(id) and left the original duplicate. 1.create table ci_tmp1 as (select id,bu,ndc, count(*) from ci group by bu,ndc having COUNT(concat(bu,ndc)) > 1 ) |
#16
| |||
| |||
|
|
On 2011-01-17 22:41, jr wrote: [...] Lennart, Regarding this last delete query, it worked on your small sample database but *I tried it on my db of 46825 records and it didn't work.. It left me with only 24,571 records. *I did a query to get the num of duplicates and there were only 9996. *Most of those were only duplicates, but there were some triplicates etc. *So it deleted way too many and I had to kill the server eventually. * I came up with 3 other queries that does the deleting in passes by comparing a table with the duplicate ids with a table containing the max(id)s. *It seemed to work except something is wrong there also. It deleted all the duplicates, triplcates, quadruplicates, etc. *It should have only deleted one of the duplicates with the max(id) and left the original duplicate. 1.create table ci_tmp1 as (select id,bu,ndc, count(*) from ci group by bu,ndc having COUNT(concat(bu,ndc)) > 1 ) This query is illegal and will give you random results. It will only work if id is functionally dependent of bu, ndc. You can prevent this misbehaviour by enabling ONLY_FULL_GROUP_BY in sql_mode, mysql will then return an error instead of a wrong result. /Lennart [...] |
#17
| |||
| |||
|
|
This query is illegal and will give you random results. It will only work if id is functionally dependent of bu, ndc. You can prevent this misbehaviour by enabling ONLY_FULL_GROUP_BY in sql_mode, mysql will then return an error instead of a wrong result. /Lennart [...] thanks for explaining it. It was very useful. |
|
id | bu | ndc | count(1) | +----+----+-----+----------+ 1 | 1 | 1 | 3 | +----+----+-----+----------+ |
![]() |
| Thread Tools | |
| Display Modes | |
| |