![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Operating System: Windows 2000 Server Server Brand: DELL (Intel Penium 111, 1133 MHz) Sql Server Standadrd Edition Ram: 2GB Processor: 1 Fulltext Index Size: 52MB Number Of Rows In Table: 1.2 Million Aprox SQL SERVER MEMORY SETTINGS: FIXED, 1945 MB MINIMUM QUERY MEMORY SETTING: 1048576 KB Average Records Processing Per Minute : 7.2 Average Records Processing Per Hour : 72 Target NUmber Of Rows: 1.1 Million stored procedure i am using is used to remove duplications from the contacts list, I altered the few parts of stored procedure and checked every statement using profiler,the only one taking long time to execute is the FULL TEXT SEARCHES one, SELECT Contact_ID From Contact22 con WHERE con.Gender = @vOuterGender AND (Con.Reported = 0 or con.reported is null) AND Contains(con.zip,@vOuterZip) AND Freetext(Address, @vOuterAddress) AND Freetext(Con.Full_Name,@vOuterName) as i mentioned that I have 1.1 million rows table on which fulltext is enabled on three cloumns Address,zip and Fullname (total size after full population is 52MB) so what all my stored procedure does is take every record from the contact table and matches it against others presnet in the table(1.1 million rows) using fulltext searches, now these full text searches are taking time!!! Currrenlty with all the settings I mentioned above my virtual memory is set to 1.5 times then the actual RAM, my database and full text indexes are on the same drive(thats the only choice) and it is execting at the speed of 20000 rows in 24 hours which means it required almost 55 days to finish, unacceptable..... can any one tell me that how to improve the performance in a senario like mine for full text seaches? or is it the standard speed every one gets from full text searches ? |
#3
| |||
| |||
|
|
Thanks for the reply Hilary, Yes I am using this procedure to catch possible duplicates, as our web site allows a customer to enter its details more then one time, this leads to duplicate records, and they can have transactions related against different customers ID's, My goal is to give marketing department a clean mailing list for their mail shots and merge related records of customers (as a customer can be involved in differ rent processes under different ID's) so they can query using different criteria's, for this purpose I have created Full Text Indexes on 3 columns of the table in which My Contact information is present, Address Zip FullName Now this table contains multiple entries for one customer ( as it is allowed by design and according to business requirements) Example: CUTOMER ID NAME ADDRESS ZIP 1 xyz 88 Corp Road SE4 1FG 2 xyz 88 Corp RD SE4 1 FG Note: Road and RD, SE4 1NF and SE4 1 NF so My try is to catch these matched and other records through full text search, query through their related records in other tables Like their orders, brochure requests and their participations in different events and create two final tables FinalContact (Master) DenormalizedContactDetails (Detail) With FinalContact populated with unique customers(no duplications) and DenormalizedContactDetails with the activity related detail along with their duplicates ID's, I have tried this stored procedure with sample records and it works really fine and I am getting my desired output, the only problem is that full text search under cursor loop is taking to much time. As you can see in the store procedure that I consolidates the matches as well. As I mentioned in records example that data quality for Zip is not good so I have to use Contain for it We are already dealing with a company to remove duplication from our CRM, but I am just working on providing mailing list so I have to deal with it as quick as possible STORE PROCEDURE: IF EXISTS (SELECT name FROM sysobjects WHERE name = 'stp_PopulateDedupedContactsAlteredWithDetail' AND type = 'P') DROP PROCEDURE stp_PopulateDedupedContactsAlteredWithDetail GO CREATE PROCEDURE stp_PopulateDedupedContactsAlteredWithDetail AS SET QUOTED_IDENTIFIER OFF SET NOCOUNT ON DECLARE @vOuterContactID binary(8) DECLARE @vOuterAddress varchar(50) DECLARE @vOuterZip varchar(14) DECLARE @vOuterName varchar(30) DECLARE @vOuterGender varchar(6) DECLARE @vOuterZipTemp varchar(14) DECLARE @vContactID binary(8) DECLARE @vPriorityCode int DECLARE @vTitle varchar(9) DECLARE @vFirstName varchar(30) DECLARE @vLastName varchar(30) DECLARE @vCity varchar(40) DECLARE @vState varchar(30) DECLARE @vCountry varchar(35) DECLARE @vEmail varchar(50) DECLARE @vPhone varchar(25) DECLARE @vCell varchar(25) DECLARE @vDoNotMail tinyint DECLARE @vDoNotMobile tinyint DECLARE @vDoNotAll tinyint DECLARE @vDoNotEmail tinyint DECLARE @vDPAConsent tinyint DECLARE @vDPAConsentTP tinyint DECLARE @vDPAQasked tinyint Create Table #tTemp ( DedupId int IDENTITY (1, 1) NOT NULL, Address varchar(50), Zip varchar(12), FullName varchar(30), NTimes int, Gender varchar(10), Reported tinyint NULL ) Insert into #tTemp select con.Address, con.ZIP, con.full_name As "Full Name", COUNT(con1.contact_Id) as "Number of Times In Contact Table", con.Gender from contact22 con FUll Outer JOIN Contact22 con1 ON con.zip = con1.zip and con.address = con1.address and con.full_name = con1.full_name WHERE con.contact_id = con1.contact_Id AND con.reported = 0 group by con.Address,con.zip,con.full_name,Con.Gender order by Con.Address,COUNT(con1.contact_Id) desc Update #tTemp SET Reported = 0 CREATE NONCLUSTERED INDEX Temp_dedup_Index ON #tTemp (DedupId) WITH FILLFACTOR = 100 CREATE NONCLUSTERED INDEX Temp_Address_Index ON #tTemp (Address) WITH FILLFACTOR = 100 CREATE NONCLUSTERED INDEX Temp_Zip_Index ON #tTemp (Zip) WITH FILLFACTOR = 100 CREATE NONCLUSTERED INDEX Temp_FullName_Index ON #tTemp (FullName) WITH FILLFACTOR = 100 CREATE NONCLUSTERED INDEX Temp_Reported_Index ON #tTemp (Reported) WITH FILLFACTOR = 100 DECLARE @vLoop tinyint SET @vLoop = 0 WHILE @vLoop =0 BEGIN IF EXISTS (select count(contact_id) from contact22 where Reported= 0) BEGIN DECLARE cr_OuterLoop CURSOR FOR SELECT TOP 1 Address,ZIP,FullName,Gender FROM #tTemp where Reported = 0 OPEN cr_OuterLoop FETCH NEXT FROM cr_OuterLoop INTO @vOuterAddress,@vOuterZip,@vOuterName,@vOuterGende r WHILE (@@FETCH_STATUS <> -1) BEGIN SET @vOuterZipTemp = @vOuterZip SET @vOuterZip = '"' + @vOuterZip + '"' IF (@@FETCH_STATUS <> -2) -- SET @vTemp =(select count(DedupId) from #tTemp where Reported reported = 0) -- Update DedupProcessesExecutionTime SET TimesExecuted = TimesExecuted + 1 --, Reported = @vTemp DELETE FROM TempDdpSup INSERT INTO TempDdpSup SELECT Contact_ID From Contact22 con WHERE con.Gender = @vOuterGender AND (Con.Reported = 0 or con.reported is null) AND Contains(con.zip,@vOuterZip) AND Freetext(Address, @vOuterAddress) AND Freetext(Con.Full_Name,@vOuterName) SELECT TOP 1 @vContactID = Con.Contact_Id,@vPriorityCode = Con.Priority_Code,@vTitle = Con.Title,@vFirstName = Con.First_Name, @vLastName = Con.Last_Name,@vCity = Con.City,@vState = Con.State_,@vCountry=Con.Country, @vEmail = Con.Email,@vPhone = Con.Phone,@vCell = Con.Cell,@vDoNotEmail = Con.Do_Not_Use_Email, @vDoNotMobile = Con.Do_Not_Use_Mobile, @vDoNotAll = Con.Do_Not_Use_All, @vDoNotMail = Con.Do_Not_Mail,@vDPAConsent = Con.DPA_Consent, @vDPAConsentTP = Con.DPA_Consent_TP,@vDPAQasked = Con.DPA_QAsked FROM Contact22 Con WHERE Con.Contact_id In (SELECT contact_id FROm TempDdpSup) ORDER BY Con.Rn_Create_Date DESC INSERT INTO FinalContact SELECT TOP 1 @vContactID,@vPriorityCode,@vTitle,@vFirstName,@vL astName,@vOuterName,@vOute |
| @vState,@vCountry,@vEmail,@vPhone,@vCell,@vDoNotEm ail,@vDoNotMobile,@vDoNotA |
|
@vDPAConsentTP,@vDPAQasked, NULL,NULL,0, MAX(Ord.Order_Date) AS RecentPurchaseDate, CASE Ord.Status WHEN 'Payment Taken' THEN Count(Ord.Order__Id) ELSE 0 END AS NumberOfOrders, -- MAX(Bro.Sent_Date) AS RecentBrochureDate, MAX(Book.Start_Date) AS RecentparticipantDate, MAX(Bro.Request_Date) AS RecentBrochureRequestDate, 0 FROM TempDdpSup Con LEFT OUTER JOIN Order_ Ord ON Con.Contact_Id = Ord.Bill_To_Contact_Id LEFT OUTER JOIN Brochure Bro ON Con.Contact_Id = Bro.Contact_Id LEFT OUTER JOIN Booking Book ON Con.Contact_Id = Book.Contact_Id GROUP BY Ord.Status ORDER BY MAX(Ord.Order_Date) DESC, -- MAX(Bro.Sent_Date) DESC, MAX(Book.Start_Date), MAX(Bro.Request_Date) --update #tTemp so next time looped row will not be selected UPDATE #tTemp SET Reported = 1 WHERE zip = @vOuterZipTemp AND Address=@vOuterAddress AND FullName =@vOuterName AND Gender = @vOuterGender --Update contact2 so next time these rows will not be selected UPDATE Contact22 SET Reported = 1 WHERE Contact_Id IN (SELECT contact_id FROm TempDdpSup) --CODE FOR CONTACT DEATIL STARTS HERE INSERT INTO DenormalizedContactDetails SELECT @vContactID ,TempCon.contact_Id,Ord.Order_Date,Cha.Channel_Ide ntifier, NULL,NULL,Ord.Sub_Total FROM TempDdpSup TempCon INNER JOIN Order_ Ord ON TempCon.Contact_Id = Ord.Bill_To_Contact_Id INNER JOIN RLD_Order_Channels Cha ON Ord.Order_Channel_Id = Cha.RLD_Order_Channels_Id --insert related Brochure INSERT INTO DenormalizedContactDetails SELECT @vContactID,TempCon.contact_Id,NULL,NULL,NULL, Bro.Request_Date,0 FROM TempDdpSup TempCon INNER JOIN Brochure Bro ON TempCon.Contact_Id = Bro.Contact_Id --insert related bookings (participent) INSERT INTO DenormalizedContactDetails SELECT @vContactID,TempCon.contact_Id,NULL,NULL, Book.Start_Date,NULL,0 FROM TempDdpSup TempCon INNER JOIN Booking Book ON TempCon.Contact_Id = Book.Contact_Id /* --insert related recipient dates INSERT INTO DenormalizedContactDetails SELECT @vContactID,TempCon.contact_Id,NULL,NULL, NULL,NULL,OrdD.Rn_Create_Date,0 FROM TempDdpSup TempCon INNER JOIN _Order_Detail OrdD ON TempCon.Contact_Id = OrdD.Recipient */ --CODE FOR CONTACT DEATIL ENDS HERE FETCH NEXT FROM cr_OuterLoop INTO @vOuterAddress,@vOuterZip,@vOuterName,@vOuterGende r END CLOSE cr_OuterLoop DEALLOCATE cr_OuterLoop END ELSE BEGIN SET @vLoop = 1 END END GO I open for all the suggestions Ahsan "Hilary Cotter" wrote: I'm not sure if full text search is the best tool for what you are doing. The type of fuzzy search that SQL FTS uses while doing FreeText searches will do linguistic stemming and expand the search to singular and plural forms of the search phrase. So it seems to me you are doing a freetext search to match Mr. John Q. Public with J.Q.Public, J. Q. Public, John Public, etc. It simply won't work for address and name. It also seems that you are executing the query once for each entry you have in your table. After you return matches, do you prune/consolidate these matches from the table? I also don't understand why you are doing a Contains query on the zip code? Surely this should be an equality match or perhaps a match with left(zip,5) =left(@vOuterZip, 5), or perhaps a case statement so you could avoid the left(zip,5) and merely search on zip. Lastly I strongly urge you to have a look at commercial products which do exactly what you are looking for and have logic built into them to handle all the possible cases of street name combination, or name variants, as well as compensating for spelling errors, like the product in the below link. http://www.name-searching.com/Correct_Address.html -- Hilary Cotter Looking for a book on SQL Server replication? http://www.nwsu.com/0974973602.html "Ahsan" <Ahsan (AT) discussions (DOT) microsoft.com> wrote in message news:3DB4A82E-023B-43D5-BE38-042B41846B71 (AT) microsoft (DOT) com... Operating System: Windows 2000 Server Server Brand: DELL (Intel Penium 111, 1133 MHz) Sql Server Standadrd Edition Ram: 2GB Processor: 1 Fulltext Index Size: 52MB Number Of Rows In Table: 1.2 Million Aprox SQL SERVER MEMORY SETTINGS: FIXED, 1945 MB MINIMUM QUERY MEMORY SETTING: 1048576 KB Average Records Processing Per Minute : 7.2 Average Records Processing Per Hour : 72 Target NUmber Of Rows: 1.1 Million stored procedure i am using is used to remove duplications from the contacts list, I altered the few parts of stored procedure and checked every statement using profiler,the only one taking long time to execute is the FULL TEXT SEARCHES one, SELECT Contact_ID From Contact22 con WHERE con.Gender = @vOuterGender AND (Con.Reported = 0 or con.reported is null) AND Contains(con.zip,@vOuterZip) AND Freetext(Address, @vOuterAddress) AND Freetext(Con.Full_Name,@vOuterName) as i mentioned that I have 1.1 million rows table on which fulltext is enabled on three cloumns Address,zip and Fullname (total size after full population is 52MB) so what all my stored procedure does is take every record from the contact table and matches it against others presnet in the table(1.1 million rows) using fulltext searches, now these full text searches are taking time!!! Currrenlty with all the settings I mentioned above my virtual memory is set to 1.5 times then the actual RAM, my database and full text indexes are on the same drive(thats the only choice) and it is execting at the speed of 20000 rows in 24 hours which means it required almost 55 days to finish, unacceptable..... can any one tell me that how to improve the performance in a senario like mine for full text seaches? or is it the standard speed every one gets from full text searches ? |
#4
| |||
| |||
|
|
this is absolutely the wrong way to do this. It will never match rd with road, or street with st or steert with street, ave with avenue, John Smith, with J Smith, etc. The strategy for handling this sort of a problem is to first clean up your data, by changing all occurences of St. or st, to Street, similar with Road, Avenue, Highway, etc. You will have to do something to put the names into a common format. After you have done this, you should start to group data by zip code, and then examine the groupings to scrub further and remove duplicates. Then group by city, examine the groupings and scrub further and remove duplicates. Then group by street and scrub further and remove duplicates. Then you can do the same procedure on last name to try to find people with the same names, but who have moved. Again I believe SQL FTS is the wrong tool for this, and there are commercial programs out there which can do much better jobs faster then you could. -- Hilary Cotter Looking for a book on SQL Server replication? http://www.nwsu.com/0974973602.html "Ahsan" <Ahsan (AT) discussions (DOT) microsoft.com> wrote in message news:CB88EA56-710F-43E8-AFC1-BEDD35CD4A50 (AT) microsoft (DOT) com... Thanks for the reply Hilary, Yes I am using this procedure to catch possible duplicates, as our web site allows a customer to enter its details more then one time, this leads to duplicate records, and they can have transactions related against different customers ID's, My goal is to give marketing department a clean mailing list for their mail shots and merge related records of customers (as a customer can be involved in differ rent processes under different ID's) so they can query using different criteria's, for this purpose I have created Full Text Indexes on 3 columns of the table in which My Contact information is present, Address Zip FullName Now this table contains multiple entries for one customer ( as it is allowed by design and according to business requirements) Example: CUTOMER ID NAME ADDRESS ZIP 1 xyz 88 Corp Road SE4 1FG 2 xyz 88 Corp RD SE4 1 FG Note: Road and RD, SE4 1NF and SE4 1 NF so My try is to catch these matched and other records through full text search, query through their related records in other tables Like their orders, brochure requests and their participations in different events and create two final tables FinalContact (Master) DenormalizedContactDetails (Detail) With FinalContact populated with unique customers(no duplications) and DenormalizedContactDetails with the activity related detail along with their duplicates ID's, I have tried this stored procedure with sample records and it works really fine and I am getting my desired output, the only problem is that full text search under cursor loop is taking to much time. As you can see in the store procedure that I consolidates the matches as well. As I mentioned in records example that data quality for Zip is not good so I have to use Contain for it We are already dealing with a company to remove duplication from our CRM, but I am just working on providing mailing list so I have to deal with it as quick as possible STORE PROCEDURE: IF EXISTS (SELECT name FROM sysobjects WHERE name = 'stp_PopulateDedupedContactsAlteredWithDetail' AND type = 'P') DROP PROCEDURE stp_PopulateDedupedContactsAlteredWithDetail GO CREATE PROCEDURE stp_PopulateDedupedContactsAlteredWithDetail AS SET QUOTED_IDENTIFIER OFF SET NOCOUNT ON DECLARE @vOuterContactID binary(8) DECLARE @vOuterAddress varchar(50) DECLARE @vOuterZip varchar(14) DECLARE @vOuterName varchar(30) DECLARE @vOuterGender varchar(6) DECLARE @vOuterZipTemp varchar(14) DECLARE @vContactID binary(8) DECLARE @vPriorityCode int DECLARE @vTitle varchar(9) DECLARE @vFirstName varchar(30) DECLARE @vLastName varchar(30) DECLARE @vCity varchar(40) DECLARE @vState varchar(30) DECLARE @vCountry varchar(35) DECLARE @vEmail varchar(50) DECLARE @vPhone varchar(25) DECLARE @vCell varchar(25) DECLARE @vDoNotMail tinyint DECLARE @vDoNotMobile tinyint DECLARE @vDoNotAll tinyint DECLARE @vDoNotEmail tinyint DECLARE @vDPAConsent tinyint DECLARE @vDPAConsentTP tinyint DECLARE @vDPAQasked tinyint Create Table #tTemp ( DedupId int IDENTITY (1, 1) NOT NULL, Address varchar(50), Zip varchar(12), FullName varchar(30), NTimes int, Gender varchar(10), Reported tinyint NULL ) Insert into #tTemp select con.Address, con.ZIP, con.full_name As "Full Name", COUNT(con1.contact_Id) as "Number of Times In Contact Table", con.Gender from contact22 con FUll Outer JOIN Contact22 con1 ON con.zip = con1.zip and con.address = con1.address and con.full_name = con1.full_name WHERE con.contact_id = con1.contact_Id AND con.reported = 0 group by con.Address,con.zip,con.full_name,Con.Gender order by Con.Address,COUNT(con1.contact_Id) desc Update #tTemp SET Reported = 0 CREATE NONCLUSTERED INDEX Temp_dedup_Index ON #tTemp (DedupId) WITH FILLFACTOR = 100 CREATE NONCLUSTERED INDEX Temp_Address_Index ON #tTemp (Address) WITH FILLFACTOR = 100 CREATE NONCLUSTERED INDEX Temp_Zip_Index ON #tTemp (Zip) WITH FILLFACTOR = 100 CREATE NONCLUSTERED INDEX Temp_FullName_Index ON #tTemp (FullName) WITH FILLFACTOR = 100 CREATE NONCLUSTERED INDEX Temp_Reported_Index ON #tTemp (Reported) WITH FILLFACTOR = 100 DECLARE @vLoop tinyint SET @vLoop = 0 WHILE @vLoop =0 BEGIN IF EXISTS (select count(contact_id) from contact22 where Reported= 0) BEGIN DECLARE cr_OuterLoop CURSOR FOR SELECT TOP 1 Address,ZIP,FullName,Gender FROM #tTemp where Reported = 0 OPEN cr_OuterLoop FETCH NEXT FROM cr_OuterLoop INTO @vOuterAddress,@vOuterZip,@vOuterName,@vOuterGende r WHILE (@@FETCH_STATUS <> -1) BEGIN SET @vOuterZipTemp = @vOuterZip SET @vOuterZip = '"' + @vOuterZip + '"' IF (@@FETCH_STATUS <> -2) -- SET @vTemp =(select count(DedupId) from #tTemp where Reported reported = 0) -- Update DedupProcessesExecutionTime SET TimesExecuted = TimesExecuted + 1 --, Reported = @vTemp DELETE FROM TempDdpSup INSERT INTO TempDdpSup SELECT Contact_ID From Contact22 con WHERE con.Gender = @vOuterGender AND (Con.Reported = 0 or con.reported is null) AND Contains(con.zip,@vOuterZip) AND Freetext(Address, @vOuterAddress) AND Freetext(Con.Full_Name,@vOuterName) SELECT TOP 1 @vContactID = Con.Contact_Id,@vPriorityCode = Con.Priority_Code,@vTitle = Con.Title,@vFirstName = Con.First_Name, @vLastName = Con.Last_Name,@vCity = Con.City,@vState = Con.State_,@vCountry=Con.Country, @vEmail = Con.Email,@vPhone = Con.Phone,@vCell = Con.Cell,@vDoNotEmail = Con.Do_Not_Use_Email, @vDoNotMobile = Con.Do_Not_Use_Mobile, @vDoNotAll = Con.Do_Not_Use_All, @vDoNotMail = Con.Do_Not_Mail,@vDPAConsent = Con.DPA_Consent, @vDPAConsentTP = Con.DPA_Consent_TP,@vDPAQasked = Con.DPA_QAsked FROM Contact22 Con WHERE Con.Contact_id In (SELECT contact_id FROm TempDdpSup) ORDER BY Con.Rn_Create_Date DESC INSERT INTO FinalContact SELECT TOP 1 @vContactID,@vPriorityCode,@vTitle,@vFirstName,@vL astName,@vOuterName,@vOute rGender,@vOuterAddress,@vOuterZipTemp,@vCity, @vState,@vCountry,@vEmail,@vPhone,@vCell,@vDoNotEm ail,@vDoNotMobile,@vDoNotA ll,@vDoNotMail,@vDPAConsent, @vDPAConsentTP,@vDPAQasked, NULL,NULL,0, MAX(Ord.Order_Date) AS RecentPurchaseDate, CASE Ord.Status WHEN 'Payment Taken' THEN Count(Ord.Order__Id) ELSE 0 END AS NumberOfOrders, -- MAX(Bro.Sent_Date) AS RecentBrochureDate, MAX(Book.Start_Date) AS RecentparticipantDate, MAX(Bro.Request_Date) AS RecentBrochureRequestDate, 0 FROM TempDdpSup Con LEFT OUTER JOIN Order_ Ord ON Con.Contact_Id = Ord.Bill_To_Contact_Id LEFT OUTER JOIN Brochure Bro ON Con.Contact_Id = Bro.Contact_Id LEFT OUTER JOIN Booking Book ON Con.Contact_Id = Book.Contact_Id GROUP BY Ord.Status ORDER BY MAX(Ord.Order_Date) DESC, -- MAX(Bro.Sent_Date) DESC, MAX(Book.Start_Date), MAX(Bro.Request_Date) --update #tTemp so next time looped row will not be selected UPDATE #tTemp SET Reported = 1 WHERE zip = @vOuterZipTemp AND Address=@vOuterAddress AND FullName =@vOuterName AND Gender = @vOuterGender --Update contact2 so next time these rows will not be selected UPDATE Contact22 SET Reported = 1 WHERE Contact_Id IN (SELECT contact_id FROm TempDdpSup) --CODE FOR CONTACT DEATIL STARTS HERE INSERT INTO DenormalizedContactDetails SELECT @vContactID ,TempCon.contact_Id,Ord.Order_Date,Cha.Channel_Ide ntifier, NULL,NULL,Ord.Sub_Total FROM TempDdpSup TempCon INNER JOIN Order_ Ord ON TempCon.Contact_Id = Ord.Bill_To_Contact_Id INNER JOIN RLD_Order_Channels Cha ON Ord.Order_Channel_Id = Cha.RLD_Order_Channels_Id --insert related Brochure INSERT INTO DenormalizedContactDetails SELECT @vContactID,TempCon.contact_Id,NULL,NULL,NULL, Bro.Request_Date,0 FROM TempDdpSup TempCon INNER JOIN Brochure Bro ON TempCon.Contact_Id = Bro.Contact_Id --insert related bookings (participent) INSERT INTO DenormalizedContactDetails SELECT @vContactID,TempCon.contact_Id,NULL,NULL, Book.Start_Date,NULL,0 FROM TempDdpSup TempCon INNER JOIN Booking Book ON TempCon.Contact_Id = Book.Contact_Id /* --insert related recipient dates INSERT INTO DenormalizedContactDetails SELECT @vContactID,TempCon.contact_Id,NULL,NULL, NULL,NULL,OrdD.Rn_Create_Date,0 FROM TempDdpSup TempCon INNER JOIN _Order_Detail OrdD ON TempCon.Contact_Id = OrdD.Recipient */ --CODE FOR CONTACT DEATIL ENDS HERE FETCH NEXT FROM cr_OuterLoop INTO @vOuterAddress,@vOuterZip,@vOuterName,@vOuterGende r END CLOSE cr_OuterLoop DEALLOCATE cr_OuterLoop END ELSE BEGIN SET @vLoop = 1 END END GO I open for all the suggestions Ahsan "Hilary Cotter" wrote: I'm not sure if full text search is the best tool for what you are doing. The type of fuzzy search that SQL FTS uses while doing FreeText search es will do linguistic stemming and expand the search to singular and plural forms of the search phrase. So it seems to me you are doing a freetext search to match Mr. John Q. Public with J.Q.Public, J. Q. Public, John Public, etc. It simply won't work for address and name. It also seems that you are executing the query once for each entry you have in your table. After you return matches, do you prune/consolidate these matches from the table? I also don't understand why you are doing a Contains query on the zip code? Surely this should be an equality match or perhaps a match with left(zip,5) =left(@vOuterZip, 5), or perhaps a case statement so you could avoid the left(zip,5) and merely search on zip. Lastly I strongly urge you to have a look at commercial products which do exactly what you are looking for and have logic built into them to handle all the possible cases of street name combination, or name variants, as well as compensating for spelling errors, like the product in the below link. http://www.name-searching.com/Correct_Address.html -- Hilary Cotter Looking for a book on SQL Server replication? http://www.nwsu.com/0974973602.html "Ahsan" <Ahsan (AT) discussions (DOT) microsoft.com> wrote in message news:3DB4A82E-023B-43D5-BE38-042B41846B71 (AT) microsoft (DOT) com... Operating System: Windows 2000 Server Server Brand: DELL (Intel Penium 111, 1133 MHz) Sql Server Standadrd Edition Ram: 2GB Processor: 1 Fulltext Index Size: 52MB Number Of Rows In Table: 1.2 Million Aprox SQL SERVER MEMORY SETTINGS: FIXED, 1945 MB MINIMUM QUERY MEMORY SETTING: 1048576 KB Average Records Processing Per Minute : 7.2 Average Records Processing Per Hour : 72 Target NUmber Of Rows: 1.1 Million stored procedure i am using is used to remove duplications from the contacts list, I altered the few parts of stored procedure and checked every statement using profiler,the only one taking long time to execute is the FULL TEXT SEARCHES one, SELECT Contact_ID From Contact22 con WHERE con.Gender = @vOuterGender AND (Con.Reported = 0 or con.reported is null) AND Contains(con.zip,@vOuterZip) AND Freetext(Address, @vOuterAddress) AND Freetext(Con.Full_Name,@vOuterName) as i mentioned that I have 1.1 million rows table on which fulltext is enabled on three cloumns Address,zip and Fullname (total size after full population is 52MB) so what all my stored procedure does is take every record from the contact table and matches it against others presnet in the table(1.1 million rows) using fulltext searches, now these full text searches are taking time!!! Currrenlty with all the settings I mentioned above my virtual memory is set to 1.5 times then the actual RAM, my database and full text indexes are on the same drive(thats the only choice) and it is execting at the speed of 20000 rows in 24 hours which means it required almost 55 days to finish, unacceptable..... can any one tell me that how to improve the performance in a senario like mine for full text seaches? or is it the standard speed every one gets from full text searches ? |
#5
| |||
| |||
|
|
Ahsan, While I agree with Hilary that SQL FTS is not the best solution for this task of data cleansing, you can use 3rd party software to clean-up and match your SQL based textual data, for example: Intelligent Search Technology - Searching and Matching Software http://www.name-searching.com/ "NameSearch® - Search and matching software for enterprise systems." "CorrectAddress® - Is (US CASS Certified) software that validates, corrects, and enhances address information" "MerlinMerge® Is a sophisticated list management tool that performs duplicate record detection, merge/purge operations and can produce house hold link determination." While I've not (yet) tested this software, you can call their 800 number and get a free 60-day trial to test it out... Regards, John "Hilary Cotter" <hilaryk (AT) att (DOT) net> wrote in message news:ODRfjTriEHA.2848 (AT) TK2MSFTNGP10 (DOT) phx.gbl... this is absolutely the wrong way to do this. It will never match rd with road, or street with st or steert with street, ave with avenue, John Smith, with J Smith, etc. The strategy for handling this sort of a problem is to first clean up your data, by changing all occurences of St. or st, to Street, similar with Road, Avenue, Highway, etc. You will have to do something to put the names into a common format. After you have done this, you should start to group data by zip code, and then examine the groupings to scrub further and remove duplicates. Then group by city, examine the groupings and scrub further and remove duplicates. Then group by street and scrub further and remove duplicates. Then you can do the same procedure on last name to try to find people with the same names, but who have moved. Again I believe SQL FTS is the wrong tool for this, and there are commercial programs out there which can do much better jobs faster then you could. -- Hilary Cotter Looking for a book on SQL Server replication? http://www.nwsu.com/0974973602.html "Ahsan" <Ahsan (AT) discussions (DOT) microsoft.com> wrote in message news:CB88EA56-710F-43E8-AFC1-BEDD35CD4A50 (AT) microsoft (DOT) com... Thanks for the reply Hilary, Yes I am using this procedure to catch possible duplicates, as our web site allows a customer to enter its details more then one time, this leads to duplicate records, and they can have transactions related against different customers ID's, My goal is to give marketing department a clean mailing list for their mail shots and merge related records of customers (as a customer can be involved in differ rent processes under different ID's) so they can query using different criteria's, for this purpose I have created Full Text Indexes on 3 columns of the table in which My Contact information is present, Address Zip FullName Now this table contains multiple entries for one customer ( as it is allowed by design and according to business requirements) Example: CUTOMER ID NAME ADDRESS ZIP 1 xyz 88 Corp Road SE4 1FG 2 xyz 88 Corp RD SE4 1 FG Note: Road and RD, SE4 1NF and SE4 1 NF so My try is to catch these matched and other records through full text search, query through their related records in other tables Like their orders, brochure requests and their participations in different events and create two final tables FinalContact (Master) DenormalizedContactDetails (Detail) With FinalContact populated with unique customers(no duplications) and DenormalizedContactDetails with the activity related detail along with their duplicates ID's, I have tried this stored procedure with sample records and it works really fine and I am getting my desired output, the only problem is that full text search under cursor loop is taking to much time. As you can see in the store procedure that I consolidates the matches as well. As I mentioned in records example that data quality for Zip is not good so I have to use Contain for it We are already dealing with a company to remove duplication from our CRM, but I am just working on providing mailing list so I have to deal with it as quick as possible STORE PROCEDURE: IF EXISTS (SELECT name FROM sysobjects WHERE name = 'stp_PopulateDedupedContactsAlteredWithDetail' AND type = 'P') DROP PROCEDURE stp_PopulateDedupedContactsAlteredWithDetail GO CREATE PROCEDURE stp_PopulateDedupedContactsAlteredWithDetail AS SET QUOTED_IDENTIFIER OFF SET NOCOUNT ON DECLARE @vOuterContactID binary(8) DECLARE @vOuterAddress varchar(50) DECLARE @vOuterZip varchar(14) DECLARE @vOuterName varchar(30) DECLARE @vOuterGender varchar(6) DECLARE @vOuterZipTemp varchar(14) DECLARE @vContactID binary(8) DECLARE @vPriorityCode int DECLARE @vTitle varchar(9) DECLARE @vFirstName varchar(30) DECLARE @vLastName varchar(30) DECLARE @vCity varchar(40) DECLARE @vState varchar(30) DECLARE @vCountry varchar(35) DECLARE @vEmail varchar(50) DECLARE @vPhone varchar(25) DECLARE @vCell varchar(25) DECLARE @vDoNotMail tinyint DECLARE @vDoNotMobile tinyint DECLARE @vDoNotAll tinyint DECLARE @vDoNotEmail tinyint DECLARE @vDPAConsent tinyint DECLARE @vDPAConsentTP tinyint DECLARE @vDPAQasked tinyint Create Table #tTemp ( DedupId int IDENTITY (1, 1) NOT NULL, Address varchar(50), Zip varchar(12), FullName varchar(30), NTimes int, Gender varchar(10), Reported tinyint NULL ) Insert into #tTemp select con.Address, con.ZIP, con.full_name As "Full Name", COUNT(con1.contact_Id) as "Number of Times In Contact Table", con.Gender from contact22 con FUll Outer JOIN Contact22 con1 ON con.zip = con1.zip and con.address = con1.address and con.full_name = con1.full_name WHERE con.contact_id = con1.contact_Id AND con.reported = 0 group by con.Address,con.zip,con.full_name,Con.Gender order by Con.Address,COUNT(con1.contact_Id) desc Update #tTemp SET Reported = 0 CREATE NONCLUSTERED INDEX Temp_dedup_Index ON #tTemp (DedupId) WITH FILLFACTOR = 100 CREATE NONCLUSTERED INDEX Temp_Address_Index ON #tTemp (Address) WITH FILLFACTOR = 100 CREATE NONCLUSTERED INDEX Temp_Zip_Index ON #tTemp (Zip) WITH FILLFACTOR = 100 CREATE NONCLUSTERED INDEX Temp_FullName_Index ON #tTemp (FullName) WITH FILLFACTOR = 100 CREATE NONCLUSTERED INDEX Temp_Reported_Index ON #tTemp (Reported) WITH FILLFACTOR = 100 DECLARE @vLoop tinyint SET @vLoop = 0 WHILE @vLoop =0 BEGIN IF EXISTS (select count(contact_id) from contact22 where Reported= 0) BEGIN DECLARE cr_OuterLoop CURSOR FOR SELECT TOP 1 Address,ZIP,FullName,Gender FROM #tTemp where Reported = 0 OPEN cr_OuterLoop FETCH NEXT FROM cr_OuterLoop INTO @vOuterAddress,@vOuterZip,@vOuterName,@vOuterGende r WHILE (@@FETCH_STATUS <> -1) BEGIN SET @vOuterZipTemp = @vOuterZip SET @vOuterZip = '"' + @vOuterZip + '"' IF (@@FETCH_STATUS <> -2) -- SET @vTemp =(select count(DedupId) from #tTemp where Reported reported = 0) -- Update DedupProcessesExecutionTime SET TimesExecuted = TimesExecuted + 1 --, Reported = @vTemp DELETE FROM TempDdpSup INSERT INTO TempDdpSup SELECT Contact_ID From Contact22 con WHERE con.Gender = @vOuterGender AND (Con.Reported = 0 or con.reported is null) AND Contains(con.zip,@vOuterZip) AND Freetext(Address, @vOuterAddress) AND Freetext(Con.Full_Name,@vOuterName) SELECT TOP 1 @vContactID = Con.Contact_Id,@vPriorityCode = Con.Priority_Code,@vTitle = Con.Title,@vFirstName = Con.First_Name, @vLastName = Con.Last_Name,@vCity = Con.City,@vState = Con.State_,@vCountry=Con.Country, @vEmail = Con.Email,@vPhone = Con.Phone,@vCell = Con.Cell,@vDoNotEmail = Con.Do_Not_Use_Email, @vDoNotMobile = Con.Do_Not_Use_Mobile, @vDoNotAll = Con.Do_Not_Use_All, @vDoNotMail = Con.Do_Not_Mail,@vDPAConsent = Con.DPA_Consent, @vDPAConsentTP = Con.DPA_Consent_TP,@vDPAQasked = Con.DPA_QAsked FROM Contact22 Con WHERE Con.Contact_id In (SELECT contact_id FROm TempDdpSup) ORDER BY Con.Rn_Create_Date DESC INSERT INTO FinalContact SELECT TOP 1 @vContactID,@vPriorityCode,@vTitle,@vFirstName,@vL astName,@vOuterName,@vOute rGender,@vOuterAddress,@vOuterZipTemp,@vCity, @vState,@vCountry,@vEmail,@vPhone,@vCell,@vDoNotEm ail,@vDoNotMobile,@vDoNotA ll,@vDoNotMail,@vDPAConsent, @vDPAConsentTP,@vDPAQasked, NULL,NULL,0, MAX(Ord.Order_Date) AS RecentPurchaseDate, CASE Ord.Status WHEN 'Payment Taken' THEN Count(Ord.Order__Id) ELSE 0 END AS NumberOfOrders, -- MAX(Bro.Sent_Date) AS RecentBrochureDate, MAX(Book.Start_Date) AS RecentparticipantDate, MAX(Bro.Request_Date) AS RecentBrochureRequestDate, 0 FROM TempDdpSup Con LEFT OUTER JOIN Order_ Ord ON Con.Contact_Id = Ord.Bill_To_Contact_Id LEFT OUTER JOIN Brochure Bro ON Con.Contact_Id = Bro.Contact_Id LEFT OUTER JOIN Booking Book ON Con.Contact_Id = Book.Contact_Id GROUP BY Ord.Status ORDER BY MAX(Ord.Order_Date) DESC, -- MAX(Bro.Sent_Date) DESC, MAX(Book.Start_Date), MAX(Bro.Request_Date) --update #tTemp so next time looped row will not be selected UPDATE #tTemp SET Reported = 1 WHERE zip = @vOuterZipTemp AND Address=@vOuterAddress AND FullName =@vOuterName AND Gender = @vOuterGender --Update contact2 so next time these rows will not be selected UPDATE Contact22 SET Reported = 1 WHERE Contact_Id IN (SELECT contact_id FROm TempDdpSup) --CODE FOR CONTACT DEATIL STARTS HERE INSERT INTO DenormalizedContactDetails SELECT @vContactID ,TempCon.contact_Id,Ord.Order_Date,Cha.Channel_Ide ntifier, NULL,NULL,Ord.Sub_Total FROM TempDdpSup TempCon INNER JOIN Order_ Ord ON TempCon.Contact_Id = Ord.Bill_To_Contact_Id INNER JOIN RLD_Order_Channels Cha ON Ord.Order_Channel_Id = Cha.RLD_Order_Channels_Id --insert related Brochure INSERT INTO DenormalizedContactDetails SELECT @vContactID,TempCon.contact_Id,NULL,NULL,NULL, Bro.Request_Date,0 FROM TempDdpSup TempCon INNER JOIN Brochure Bro ON TempCon.Contact_Id = Bro.Contact_Id --insert related bookings (participent) INSERT INTO DenormalizedContactDetails SELECT @vContactID,TempCon.contact_Id,NULL,NULL, Book.Start_Date,NULL,0 FROM TempDdpSup TempCon INNER JOIN Booking Book ON TempCon.Contact_Id = Book.Contact_Id /* --insert related recipient dates INSERT INTO DenormalizedContactDetails SELECT @vContactID,TempCon.contact_Id,NULL,NULL, NULL,NULL,OrdD.Rn_Create_Date,0 FROM TempDdpSup TempCon INNER JOIN _Order_Detail OrdD ON TempCon.Contact_Id = OrdD.Recipient */ --CODE FOR CONTACT DEATIL ENDS HERE FETCH NEXT FROM cr_OuterLoop INTO @vOuterAddress,@vOuterZip,@vOuterName,@vOuterGende r END CLOSE cr_OuterLoop DEALLOCATE cr_OuterLoop END ELSE BEGIN SET @vLoop = 1 END END GO I open for all the suggestions Ahsan "Hilary Cotter" wrote: I'm not sure if full text search is the best tool for what you are doing. The type of fuzzy search that SQL FTS uses while doing FreeText search es will do linguistic stemming and expand the search to singular and plural forms of the search phrase. So it seems to me you are doing a freetext search to match Mr. John Q. Public with J.Q.Public, J. Q. Public, John Public, etc. It simply won't work for address and name. It also seems that you are executing the query once for each entry you have in your table. After you return matches, do you prune/consolidate these matches from the table? I also don't understand why you are doing a Contains query on the zip code? Surely this should be an equality match or perhaps a match with left(zip,5) =left(@vOuterZip, 5), or perhaps a case statement so you could avoid the left(zip,5) and merely search on zip. Lastly I strongly urge you to have a look at commercial products which do exactly what you are looking for and have logic built into them to handle all the possible cases of street name combination, or name variants, as well as compensating for spelling errors, like the product in the below link. http://www.name-searching.com/Correct_Address.html -- Hilary Cotter Looking for a book on SQL Server replication? http://www.nwsu.com/0974973602.html "Ahsan" <Ahsan (AT) discussions (DOT) microsoft.com> wrote in message news:3DB4A82E-023B-43D5-BE38-042B41846B71 (AT) microsoft (DOT) com... Operating System: Windows 2000 Server Server Brand: DELL (Intel Penium 111, 1133 MHz) Sql Server Standadrd Edition Ram: 2GB Processor: 1 Fulltext Index Size: 52MB Number Of Rows In Table: 1.2 Million Aprox SQL SERVER MEMORY SETTINGS: FIXED, 1945 MB MINIMUM QUERY MEMORY SETTING: 1048576 KB Average Records Processing Per Minute : 7.2 Average Records Processing Per Hour : 72 Target NUmber Of Rows: 1.1 Million stored procedure i am using is used to remove duplications from the contacts list, I altered the few parts of stored procedure and checked every statement using profiler,the only one taking long time to execute is the FULL TEXT SEARCHES one, SELECT Contact_ID From Contact22 con WHERE con.Gender = @vOuterGender AND (Con.Reported = 0 or con.reported is null) AND Contains(con.zip,@vOuterZip) AND Freetext(Address, @vOuterAddress) AND Freetext(Con.Full_Name,@vOuterName) as i mentioned that I have 1.1 million rows table on which fulltext is enabled on three cloumns Address,zip and Fullname (total size after full population is 52MB) so what all my stored procedure does is take every record from the contact table and matches it against others presnet in the table(1.1 million rows) using fulltext searches, now these full text searches are taking time!!! Currrenlty with all the settings I mentioned above my virtual memory is set to 1.5 times then the actual RAM, my database and full text indexes are on the same drive(thats the only choice) and it is execting at the speed of 20000 rows in 24 hours which means it required almost 55 days to finish, unacceptable..... can any one tell me that how to improve the performance in a senario like mine for full text seaches? or is it the standard speed every one gets from full text searches ? |
#6
| |||
| |||
|
strange, that's the same link I provided in my first response to Ahsan![]() -- Hilary Cotter Looking for a book on SQL Server replication? http://www.nwsu.com/0974973602.html "John Kane" <jt-kane (AT) comcast (DOT) net> wrote in message news:OhiyyNxiEHA.140 (AT) TK2MSFTNGP12 (DOT) phx.gbl... Ahsan, While I agree with Hilary that SQL FTS is not the best solution for this task of data cleansing, you can use 3rd party software to clean-up and match your SQL based textual data, for example: Intelligent Search Technology - Searching and Matching Software http://www.name-searching.com/ "NameSearch® - Search and matching software for enterprise systems." "CorrectAddress® - Is (US CASS Certified) software that validates, corrects, and enhances address information" "MerlinMerge® Is a sophisticated list management tool that performs duplicate record detection, merge/purge operations and can produce house hold link determination." While I've not (yet) tested this software, you can call their 800 number and get a free 60-day trial to test it out... Regards, John "Hilary Cotter" <hilaryk (AT) att (DOT) net> wrote in message news:ODRfjTriEHA.2848 (AT) TK2MSFTNGP10 (DOT) phx.gbl... this is absolutely the wrong way to do this. It will never match rd with road, or street with st or steert with street, ave with avenue, John Smith, with J Smith, etc. The strategy for handling this sort of a problem is to first clean up your data, by changing all occurences of St. or st, to Street, similar with Road, Avenue, Highway, etc. You will have to do something to put the names into a common format. After you have done this, you should start to group data by zip code, and then examine the groupings to scrub further and remove duplicates. Then group by city, examine the groupings and scrub further and remove duplicates. Then group by street and scrub further and remove duplicates. Then you can do the same procedure on last name to try to find people with the same names, but who have moved. Again I believe SQL FTS is the wrong tool for this, and there are commercial programs out there which can do much better jobs faster then you could. -- Hilary Cotter Looking for a book on SQL Server replication? http://www.nwsu.com/0974973602.html "Ahsan" <Ahsan (AT) discussions (DOT) microsoft.com> wrote in message news:CB88EA56-710F-43E8-AFC1-BEDD35CD4A50 (AT) microsoft (DOT) com... Thanks for the reply Hilary, Yes I am using this procedure to catch possible duplicates, as our web site allows a customer to enter its details more then one time, this leads to duplicate records, and they can have transactions related against different customers ID's, My goal is to give marketing department a clean mailing list for their mail shots and merge related records of customers (as a customer can be involved in differ rent processes under different ID's) so they can query using different criteria's, for this purpose I have created Full Text Indexes on 3 columns of the table in which My Contact information is present, Address Zip FullName Now this table contains multiple entries for one customer ( as it is allowed by design and according to business requirements) Example: CUTOMER ID NAME ADDRESS ZIP 1 xyz 88 Corp Road SE4 1FG 2 xyz 88 Corp RD SE4 1 FG Note: Road and RD, SE4 1NF and SE4 1 NF so My try is to catch these matched and other records through full text search, query through their related records in other tables Like their orders, brochure requests and their participations in different events and create two final tables FinalContact (Master) DenormalizedContactDetails (Detail) With FinalContact populated with unique customers(no duplications) and DenormalizedContactDetails with the activity related detail along with their duplicates ID's, I have tried this stored procedure with sample records and it works really fine and I am getting my desired output, the only problem is that full text search under cursor loop is taking to much time. As you can see in the store procedure that I consolidates the matches as well. As I mentioned in records example that data quality for Zip is not good so I have to use Contain for it We are already dealing with a company to remove duplication from our CRM, but I am just working on providing mailing list so I have to deal with it as quick as possible STORE PROCEDURE: IF EXISTS (SELECT name FROM sysobjects WHERE name = 'stp_PopulateDedupedContactsAlteredWithDetail' AND type = 'P') DROP PROCEDURE stp_PopulateDedupedContactsAlteredWithDetail GO CREATE PROCEDURE stp_PopulateDedupedContactsAlteredWithDetail AS SET QUOTED_IDENTIFIER OFF SET NOCOUNT ON DECLARE @vOuterContactID binary(8) DECLARE @vOuterAddress varchar(50) DECLARE @vOuterZip varchar(14) DECLARE @vOuterName varchar(30) DECLARE @vOuterGender varchar(6) DECLARE @vOuterZipTemp varchar(14) DECLARE @vContactID binary(8) DECLARE @vPriorityCode int DECLARE @vTitle varchar(9) DECLARE @vFirstName varchar(30) DECLARE @vLastName varchar(30) DECLARE @vCity varchar(40) DECLARE @vState varchar(30) DECLARE @vCountry varchar(35) DECLARE @vEmail varchar(50) DECLARE @vPhone varchar(25) DECLARE @vCell varchar(25) DECLARE @vDoNotMail tinyint DECLARE @vDoNotMobile tinyint DECLARE @vDoNotAll tinyint DECLARE @vDoNotEmail tinyint DECLARE @vDPAConsent tinyint DECLARE @vDPAConsentTP tinyint DECLARE @vDPAQasked tinyint Create Table #tTemp ( DedupId int IDENTITY (1, 1) NOT NULL, Address varchar(50), Zip varchar(12), FullName varchar(30), NTimes int, Gender varchar(10), Reported tinyint NULL ) Insert into #tTemp select con.Address, con.ZIP, con.full_name As "Full Name", COUNT(con1.contact_Id) as "Number of Times In Contact Table", con.Gender from contact22 con FUll Outer JOIN Contact22 con1 ON con.zip = con1.zip and con.address = con1.address and con.full_name = con1.full_name WHERE con.contact_id = con1.contact_Id AND con.reported = 0 group by con.Address,con.zip,con.full_name,Con.Gender order by Con.Address,COUNT(con1.contact_Id) desc Update #tTemp SET Reported = 0 CREATE NONCLUSTERED INDEX Temp_dedup_Index ON #tTemp (DedupId) WITH FILLFACTOR = 100 CREATE NONCLUSTERED INDEX Temp_Address_Index ON #tTemp (Address) WITH FILLFACTOR = 100 CREATE NONCLUSTERED INDEX Temp_Zip_Index ON #tTemp (Zip) WITH FILLFACTOR = 100 CREATE NONCLUSTERED INDEX Temp_FullName_Index ON #tTemp (FullName) WITH FILLFACTOR = 100 CREATE NONCLUSTERED INDEX Temp_Reported_Index ON #tTemp (Reported) WITH FILLFACTOR = 100 DECLARE @vLoop tinyint SET @vLoop = 0 WHILE @vLoop =0 BEGIN IF EXISTS (select count(contact_id) from contact22 where Reported= 0) BEGIN DECLARE cr_OuterLoop CURSOR FOR SELECT TOP 1 Address,ZIP,FullName,Gender FROM #tTemp where Reported = 0 OPEN cr_OuterLoop FETCH NEXT FROM cr_OuterLoop INTO @vOuterAddress,@vOuterZip,@vOuterName,@vOuterGende r WHILE (@@FETCH_STATUS <> -1) BEGIN SET @vOuterZipTemp = @vOuterZip SET @vOuterZip = '"' + @vOuterZip + '"' IF (@@FETCH_STATUS <> -2) -- SET @vTemp =(select count(DedupId) from #tTemp where Reported reported = 0) -- Update DedupProcessesExecutionTime SET TimesExecuted = TimesExecuted + 1 --, Reported = @vTemp DELETE FROM TempDdpSup INSERT INTO TempDdpSup SELECT Contact_ID From Contact22 con WHERE con.Gender = @vOuterGender AND (Con.Reported = 0 or con.reported is null) AND Contains(con.zip,@vOuterZip) AND Freetext(Address, @vOuterAddress) AND Freetext(Con.Full_Name,@vOuterName) SELECT TOP 1 @vContactID = Con.Contact_Id,@vPriorityCode = Con.Priority_Code,@vTitle = Con.Title,@vFirstName = Con.First_Name, @vLastName = Con.Last_Name,@vCity = Con.City,@vState = Con.State_,@vCountry=Con.Country, @vEmail = Con.Email,@vPhone = Con.Phone,@vCell = Con.Cell,@vDoNotEmail = Con.Do_Not_Use_Email, @vDoNotMobile = Con.Do_Not_Use_Mobile, @vDoNotAll = Con.Do_Not_Use_All, @vDoNotMail = Con.Do_Not_Mail,@vDPAConsent = Con.DPA_Consent, @vDPAConsentTP = Con.DPA_Consent_TP,@vDPAQasked = Con.DPA_QAsked FROM Contact22 Con WHERE Con.Contact_id In (SELECT contact_id FROm TempDdpSup) ORDER BY Con.Rn_Create_Date DESC INSERT INTO FinalContact SELECT TOP 1 @vContactID,@vPriorityCode,@vTitle,@vFirstName,@vL astName,@vOuterName,@vOute rGender,@vOuterAddress,@vOuterZipTemp,@vCity, @vState,@vCountry,@vEmail,@vPhone,@vCell,@vDoNotEm ail,@vDoNotMobile,@vDoNotA ll,@vDoNotMail,@vDPAConsent, @vDPAConsentTP,@vDPAQasked, NULL,NULL,0, MAX(Ord.Order_Date) AS RecentPurchaseDate, CASE Ord.Status WHEN 'Payment Taken' THEN Count(Ord.Order__Id) ELSE 0 END AS NumberOfOrders, -- MAX(Bro.Sent_Date) AS RecentBrochureDate, MAX(Book.Start_Date) AS RecentparticipantDate, MAX(Bro.Request_Date) AS RecentBrochureRequestDate, 0 FROM TempDdpSup Con LEFT OUTER JOIN Order_ Ord ON Con.Contact_Id = Ord.Bill_To_Contact_Id LEFT OUTER JOIN Brochure Bro ON Con.Contact_Id = Bro.Contact_Id LEFT OUTER JOIN Booking Book ON Con.Contact_Id = Book.Contact_Id GROUP BY Ord.Status ORDER BY MAX(Ord.Order_Date) DESC, -- MAX(Bro.Sent_Date) DESC, MAX(Book.Start_Date), MAX(Bro.Request_Date) --update #tTemp so next time looped row will not be selected UPDATE #tTemp SET Reported = 1 WHERE zip = @vOuterZipTemp AND Address=@vOuterAddress AND FullName =@vOuterName AND Gender = @vOuterGender --Update contact2 so next time these rows will not be selected UPDATE Contact22 SET Reported = 1 WHERE Contact_Id IN (SELECT contact_id FROm TempDdpSup) --CODE FOR CONTACT DEATIL STARTS HERE INSERT INTO DenormalizedContactDetails SELECT @vContactID ,TempCon.contact_Id,Ord.Order_Date,Cha.Channel_Ide ntifier, NULL,NULL,Ord.Sub_Total FROM TempDdpSup TempCon INNER JOIN Order_ Ord ON TempCon.Contact_Id = Ord.Bill_To_Contact_Id INNER JOIN RLD_Order_Channels Cha ON Ord.Order_Channel_Id = Cha.RLD_Order_Channels_Id --insert related Brochure INSERT INTO DenormalizedContactDetails SELECT @vContactID,TempCon.contact_Id,NULL,NULL,NULL, Bro.Request_Date,0 FROM TempDdpSup TempCon INNER JOIN Brochure Bro ON TempCon.Contact_Id = Bro.Contact_Id --insert related bookings (participent) INSERT INTO DenormalizedContactDetails SELECT @vContactID,TempCon.contact_Id,NULL,NULL, Book.Start_Date,NULL,0 FROM TempDdpSup TempCon INNER JOIN Booking Book ON TempCon.Contact_Id = Book.Contact_Id /* --insert related recipient dates INSERT INTO DenormalizedContactDetails SELECT @vContactID,TempCon.contact_Id,NULL,NULL, NULL,NULL,OrdD.Rn_Create_Date,0 FROM TempDdpSup TempCon INNER JOIN _Order_Detail OrdD ON TempCon.Contact_Id = OrdD.Recipient */ --CODE FOR CONTACT DEATIL ENDS HERE FETCH NEXT FROM cr_OuterLoop INTO @vOuterAddress,@vOuterZip,@vOuterName,@vOuterGende r END CLOSE cr_OuterLoop DEALLOCATE cr_OuterLoop END ELSE BEGIN SET @vLoop = 1 END END GO I open for all the suggestions Ahsan "Hilary Cotter" wrote: I'm not sure if full text search is the best tool for what you are doing. The type of fuzzy search that SQL FTS uses while doing FreeText search es will do linguistic stemming and expand the search to singular and plural forms of the search phrase. So it seems to me you are doing a freetext search to match Mr. John Q. Public with J.Q.Public, J. Q. Public, John Public, etc. It simply won't work for address and name. It also seems that you are executing the query once for each entry you have in your table. After you return matches, do you prune/consolidate these matches from the table? I also don't understand why you are doing a Contains query on the zip code? Surely this should be an equality match or perhaps a match with left(zip,5) =left(@vOuterZip, 5), or perhaps a case statement so you could avoid the left(zip,5) and merely search on zip. Lastly I strongly urge you to have a look at commercial products which do exactly what you are looking for and have logic built into them to handle all the possible cases of street name combination, or name variants, as well as compensating for spelling errors, like the product in the below link. http://www.name-searching.com/Correct_Address.html -- Hilary Cotter Looking for a book on SQL Server replication? http://www.nwsu.com/0974973602.html "Ahsan" <Ahsan (AT) discussions (DOT) microsoft.com> wrote in message news:3DB4A82E-023B-43D5-BE38-042B41846B71 (AT) microsoft (DOT) com... Operating System: Windows 2000 Server Server Brand: DELL (Intel Penium 111, 1133 MHz) Sql Server Standadrd Edition Ram: 2GB Processor: 1 Fulltext Index Size: 52MB Number Of Rows In Table: 1.2 Million Aprox SQL SERVER MEMORY SETTINGS: FIXED, 1945 MB MINIMUM QUERY MEMORY SETTING: 1048576 KB Average Records Processing Per Minute : 7.2 Average Records Processing Per Hour : 72 Target NUmber Of Rows: 1.1 Million stored procedure i am using is used to remove duplications from the contacts list, I altered the few parts of stored procedure and checked every statement using profiler,the only one taking long time to execute is the FULL TEXT SEARCHES one, SELECT Contact_ID From Contact22 con WHERE con.Gender = @vOuterGender AND (Con.Reported = 0 or con.reported is null) AND Contains(con.zip,@vOuterZip) AND Freetext(Address, @vOuterAddress) AND Freetext(Con.Full_Name,@vOuterName) as i mentioned that I have 1.1 million rows table on which fulltext is enabled on three cloumns Address,zip and Fullname (total size after full population is 52MB) so what all my stored procedure does is take every record from the contact table and matches it against others presnet in the table(1.1 million rows) using fulltext searches, now these full text searches are taking time!!! Currrenlty with all the settings I mentioned above my virtual memory is set to 1.5 times then the actual RAM, my database and full text indexes are on the same drive(thats the only choice) and it is execting at the speed of 20000 rows in 24 hours which means it required almost 55 days to finish, unacceptable..... can any one tell me that how to improve the performance in a senario like mine for full text seaches? or is it the standard speed every one gets from full text searches ? |
![]() |
| Thread Tools | |
| Display Modes | |
| |