#5
1) Could you explain the rules for judging "near duplicates"? I couldn't work out the rules from your examples.

2) You showed:

near:
301 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT BLAU in L (eBay Shop jeanspoint74)
482 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT SCHWARZ in L (eBay Shop jeanspoint74)

near:
482 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT SCHWARZ in L (eBay Shop jeanspoint74)
482 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT WEISS in M (eBay Shop jeanspoint74)

The following text is common to both examples:

482 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT SCHWARZ in L (eBay Shop jeanspoint74)

Does that mean the following two texts are also near?

near:
301 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT BLAU in L (eBay Shop jeanspoint74)
482 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT WEISS in M (eBay Shop jeanspoint74)

yes they are near: same article in different sizes and colors (even in

3) Anyhow, SOUNDEX was the first thing that came to mind. But SOUNDEX may not be a solution, because it returns different results for the two texts in your second case (R212 and S632 below), even though you marked them as near.

Like this:

------------------------------ Commands Entered ------------------------------
SELECT k
     , CAST( CASE FLOOR(k)
             WHEN k THEN text
             ELSE SOUNDEX(text)
             END
             AS VARCHAR(10)
           ) "Judgement"
  FROM (VALUES
         (1.0, 'not near:' )
        ,(1.1, 'Rugby Polo - black/white - S; (Angebot von Kabelmeister)' )
        ,(1.2, 'Rugby Shirt Striped - aqua/white - S; (Angebot von Kabelmeister)' )
        ,(2.0, 'near:' )
        ,(2.1, 'Rugby Shirt Striped - aqua/white - S; (Angebot von Kabelmeister)' )
        ,(2.2, 'Shirt Striped - aqua/white - S; (Angebot von)' )
        ,(3.0, 'near:' )
        ,(3.1, '301 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT BLAU in L (eBay Shop jeanspoint74)' )
        ,(3.2, '482 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT SCHWARZ in L (eBay Shop jeanspoint74)' )
        ,(4.0, 'near:' )
        ,(4.1, '482 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT SCHWARZ in L (eBay Shop jeanspoint74)' )
        ,(4.2, '482 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT WEISS in M (eBay Shop jeanspoint74)' )
       ) T(k, text)
;
------------------------------------------------------------------------------

K    Judgement
---- ----------
 1.0 not near:
 1.1 R211
 1.2 R212
 2.0 near:
 2.1 R212
 2.2 S632
 3.0 near:
 3.1 L621
 3.2 L621
 4.0 near:
 4.1 L621
 4.2 L621

  12 record(s) selected.
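
To see why the codes come out this way, here is a rough Python sketch of the classic Soundex rules. It is only an illustration: I am assuming DB2's SOUNDEX follows the usual American Soundex algorithm and skips digits, blanks and punctuation, and the function name `soundex` and the `CODES` table are my own, not anything from DB2. Under those assumptions it gives the same codes as the query above.

```python
# Classic Soundex: first letter plus three digits for the following consonants.
# Assumption: non-letters (digits, blanks, punctuation) are simply skipped.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(text):
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate two equal codes
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break  # everything after the fourth character is ignored
        prev = code  # a vowel resets prev, so the same code may repeat
    return result.ljust(4, "0")


for s in ("Rugby Shirt Striped - aqua/white - S; (Angebot von Kabelmeister)",
          "Shirt Striped - aqua/white - S; (Angebot von)"):
    print(soundex(s), s)
```

The code is settled after the first few consonants, so the rest of the text never matters. Dropping the leading word "Rugby" changes the code completely (R212 versus S632), while the BLAU, SCHWARZ and WEISS variants all stop at "LA RUGB" and get L621.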