dbTalk Databases Forums  

Partial Fuzzy Lookup?

microsoft.public.sqlserver.dts microsoft.public.sqlserver.dts


Discuss Partial Fuzzy Lookup? in the microsoft.public.sqlserver.dts forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
dtbascent
 
Posts: n/a

Default Partial Fuzzy Lookup? - 06-03-2008 , 12:02 PM






I need to do a partial fuzzy lookup in SSIS. This means that some columns
need to match exactly, while others are fuzzy matched. Picture an incoming
data stream T1 with two columns T1_C1 and T1_C2, and a lookup table T2 with
columns T2_C1 and T2_C2. Stream T1 has 16,000 rows, and lookup T2 has
215,000 rows. I want to tell the system to fuzzy match T1_C2 to T2_C2 with a
threshold of 80%, but only where T1_C1 and T2_C1 match exactly (100%).

That's the basic problem. To complicate this slightly, T1_C1 and T2_C1
actually will be matched using a SQL "LIKE" operator. T1_C1 contains values,
and T2_C1 contains a pattern which we store. FYI, I'm trying this in SQL
2008, though I don't believe there's been any change in fuzzy capabilities.

Reply With Quote
  #2  
Old   
jhofmeyr@googlemail.com
 
Posts: n/a

Default Re: Partial Fuzzy Lookup? - 06-04-2008 , 04:18 AM






Hi dtbascent,

The best method would be to create a view (V1) on top of T2, which
decomposed the T2_C1 "pattern" into a column that will exactly match
T1_C1. Then you can simply do a fuzzy lookup with exact match on
T1_C1 = V1_C1

If this is not possible I think you'll probably have to split this
into two tasks - 1 regular lookup and 1 fuzzy lookup. The bad news is
the only way to do "LIKE" checks (afaik) is to turn on memory
restrictions in the regular lookup task, and write your own SQL
statement. This will mean that your lookup matches are done row-at-a-
time instead of buffered and done in memory. The number of rows
you're working with isn't too high though, so you might be alright.

Once you've retrieved the T2_C1 match rows, you can feed them into
your fuzzy lookup and do an exact match on that column.

Good luck!
J

Reply With Quote
  #3  
Old   
jhofmeyr@googlemail.com
 
Posts: n/a

Default Re: Partial Fuzzy Lookup? - 06-04-2008 , 04:18 AM



Hi dtbascent,

The best method would be to create a view (V1) on top of T2, which
decomposed the T2_C1 "pattern" into a column that will exactly match
T1_C1. Then you can simply do a fuzzy lookup with exact match on
T1_C1 = V1_C1

If this is not possible I think you'll probably have to split this
into two tasks - 1 regular lookup and 1 fuzzy lookup. The bad news is
the only way to do "LIKE" checks (afaik) is to turn on memory
restrictions in the regular lookup task, and write your own SQL
statement. This will mean that your lookup matches are done row-at-a-
time instead of buffered and done in memory. The number of rows
you're working with isn't too high though, so you might be alright.

Once you've retrieved the T2_C1 match rows, you can feed them into
your fuzzy lookup and do an exact match on that column.

Good luck!
J

Reply With Quote
  #4  
Old   
jhofmeyr@googlemail.com
 
Posts: n/a

Default Re: Partial Fuzzy Lookup? - 06-04-2008 , 04:18 AM



Hi dtbascent,

The best method would be to create a view (V1) on top of T2, which
decomposed the T2_C1 "pattern" into a column that will exactly match
T1_C1. Then you can simply do a fuzzy lookup with exact match on
T1_C1 = V1_C1

If this is not possible I think you'll probably have to split this
into two tasks - 1 regular lookup and 1 fuzzy lookup. The bad news is
the only way to do "LIKE" checks (afaik) is to turn on memory
restrictions in the regular lookup task, and write your own SQL
statement. This will mean that your lookup matches are done row-at-a-
time instead of buffered and done in memory. The number of rows
you're working with isn't too high though, so you might be alright.

Once you've retrieved the T2_C1 match rows, you can feed them into
your fuzzy lookup and do an exact match on that column.

Good luck!
J

Reply With Quote
  #5  
Old   
jhofmeyr@googlemail.com
 
Posts: n/a

Default Re: Partial Fuzzy Lookup? - 06-04-2008 , 04:18 AM



Hi dtbascent,

The best method would be to create a view (V1) on top of T2, which
decomposed the T2_C1 "pattern" into a column that will exactly match
T1_C1. Then you can simply do a fuzzy lookup with exact match on
T1_C1 = V1_C1

If this is not possible I think you'll probably have to split this
into two tasks - 1 regular lookup and 1 fuzzy lookup. The bad news is
the only way to do "LIKE" checks (afaik) is to turn on memory
restrictions in the regular lookup task, and write your own SQL
statement. This will mean that your lookup matches are done row-at-a-
time instead of buffered and done in memory. The number of rows
you're working with isn't too high though, so you might be alright.

Once you've retrieved the T2_C1 match rows, you can feed them into
your fuzzy lookup and do an exact match on that column.

Good luck!
J

Reply With Quote
  #6  
Old   
jhofmeyr@googlemail.com
 
Posts: n/a

Default Re: Partial Fuzzy Lookup? - 06-04-2008 , 04:18 AM



Hi dtbascent,

The best method would be to create a view (V1) on top of T2, which
decomposed the T2_C1 "pattern" into a column that will exactly match
T1_C1. Then you can simply do a fuzzy lookup with exact match on
T1_C1 = V1_C1

If this is not possible I think you'll probably have to split this
into two tasks - 1 regular lookup and 1 fuzzy lookup. The bad news is
the only way to do "LIKE" checks (afaik) is to turn on memory
restrictions in the regular lookup task, and write your own SQL
statement. This will mean that your lookup matches are done row-at-a-
time instead of buffered and done in memory. The number of rows
you're working with isn't too high though, so you might be alright.

Once you've retrieved the T2_C1 match rows, you can feed them into
your fuzzy lookup and do an exact match on that column.

Good luck!
J

Reply With Quote
  #7  
Old   
jhofmeyr@googlemail.com
 
Posts: n/a

Default Re: Partial Fuzzy Lookup? - 06-04-2008 , 04:18 AM



Hi dtbascent,

The best method would be to create a view (V1) on top of T2, which
decomposed the T2_C1 "pattern" into a column that will exactly match
T1_C1. Then you can simply do a fuzzy lookup with exact match on
T1_C1 = V1_C1

If this is not possible I think you'll probably have to split this
into two tasks - 1 regular lookup and 1 fuzzy lookup. The bad news is
the only way to do "LIKE" checks (afaik) is to turn on memory
restrictions in the regular lookup task, and write your own SQL
statement. This will mean that your lookup matches are done row-at-a-
time instead of buffered and done in memory. The number of rows
you're working with isn't too high though, so you might be alright.

Once you've retrieved the T2_C1 match rows, you can feed them into
your fuzzy lookup and do an exact match on that column.

Good luck!
J

Reply With Quote
  #8  
Old   
jhofmeyr@googlemail.com
 
Posts: n/a

Default Re: Partial Fuzzy Lookup? - 06-04-2008 , 04:18 AM



Hi dtbascent,

The best method would be to create a view (V1) on top of T2, which
decomposed the T2_C1 "pattern" into a column that will exactly match
T1_C1. Then you can simply do a fuzzy lookup with exact match on
T1_C1 = V1_C1

If this is not possible I think you'll probably have to split this
into two tasks - 1 regular lookup and 1 fuzzy lookup. The bad news is
the only way to do "LIKE" checks (afaik) is to turn on memory
restrictions in the regular lookup task, and write your own SQL
statement. This will mean that your lookup matches are done row-at-a-
time instead of buffered and done in memory. The number of rows
you're working with isn't too high though, so you might be alright.

Once you've retrieved the T2_C1 match rows, you can feed them into
your fuzzy lookup and do an exact match on that column.

Good luck!
J

Reply With Quote
  #9  
Old   
jhofmeyr@googlemail.com
 
Posts: n/a

Default Re: Partial Fuzzy Lookup? - 06-04-2008 , 04:18 AM



Hi dtbascent,

The best method would be to create a view (V1) on top of T2, which
decomposed the T2_C1 "pattern" into a column that will exactly match
T1_C1. Then you can simply do a fuzzy lookup with exact match on
T1_C1 = V1_C1

If this is not possible I think you'll probably have to split this
into two tasks - 1 regular lookup and 1 fuzzy lookup. The bad news is
the only way to do "LIKE" checks (afaik) is to turn on memory
restrictions in the regular lookup task, and write your own SQL
statement. This will mean that your lookup matches are done row-at-a-
time instead of buffered and done in memory. The number of rows
you're working with isn't too high though, so you might be alright.

Once you've retrieved the T2_C1 match rows, you can feed them into
your fuzzy lookup and do an exact match on that column.

Good luck!
J

Reply With Quote
  #10  
Old   
dtbascent
 
Posts: n/a

Default RE: Partial Fuzzy Lookup? - 06-04-2008 , 03:54 PM



I discovered that the basic functionality I need is supported in the fuzzy
lookup task. You can specify requiring an exact match on some columns, and,
at the same time, a fuzzy match on others. To handle the "Like", I
essentially did create a view that did the like matching first and tied the
matching key identifiers in both tables. Then in the fuzzy look up I exact
matched on the keys and fuzzy matched on the remaining columns.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.