dbTalk Databases Forums  

Preventing double-loading of input files

microsoft.public.sqlserver.dts microsoft.public.sqlserver.dts


Discuss Preventing double-loading of input files in the microsoft.public.sqlserver.dts forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
DavidStallard
 
Posts: n/a

Default Preventing double-loading of input files - 05-16-2005 , 01:28 PM






I have a situation where I'm loading text files into a database. Filenames
are reused, so two files with the same name might hold different data. I
need to be able to uniquely identify files using a method other than filename
so that I know if I've loaded the file already or not. One thought was to
generate a hash key for each file, but I don't know if that's possible in
DTS/TSQL. Any suggestions?

Reply With Quote
  #2  
Old   
Darren Green
 
Posts: n/a

Default Re: Preventing double-loading of input files - 05-16-2005 , 01:55 PM






In message <D1DD4FB2-A74A-4C3C-A447-A4A848BBD5EE (AT) microsoft (DOT) com>,
DavidStallard <DavidStallard (AT) discussions (DOT) microsoft.com> writes
Quote:
I have a situation where I'm loading text files into a database. Filenames
are reused, so two files with the same name might hold different data. I
need to be able to uniquely identify files using a method other than filename
so that I know if I've loaded the file already or not. One thought was to
generate a hash key for each file, but I don't know if that's possible in
DTS/TSQL. Any suggestions?
No it isn't directly support but there are tools out there you could
use. Saw a cmd line executable that would generate a CRC for a file, so
run that and capture it perhaps.

Or

Why not move or rename the files once loaded, then any file in the
original folder or any file not renamed, you know is new and needs
loading.


--
Darren Green (SQL Server MVP)
DTS - http://www.sqldts.com

PASS - the definitive, global community for SQL Server professionals
http://www.sqlpass.org



Reply With Quote
  #3  
Old   
DavidStallard
 
Posts: n/a

Default Re: Preventing double-loading of input files - 05-16-2005 , 02:49 PM



The latter won't work because what if the source accidentally sends the same
file twice? It would get loaded twice since the original has already been
renamed/moved.

FYI, since I posted this question, I came across an Extended Stored Proc MD5
implementation. You can check it out here:

http://www.thecodeproject.com/databa...lect=10542 59


Reply With Quote
  #4  
Old   
Dave
 
Posts: n/a

Default RE: Preventing double-loading of input files - 05-16-2005 , 03:51 PM



Have you considered using the file system object to get the text file's last
modified date and comparing subsequent files last modified date. Only allow
import if the new file's last modified date is greater (later) than the last
update.

"DavidStallard" wrote:

Quote:
I have a situation where I'm loading text files into a database. Filenames
are reused, so two files with the same name might hold different data. I
need to be able to uniquely identify files using a method other than filename
so that I know if I've loaded the file already or not. One thought was to
generate a hash key for each file, but I don't know if that's possible in
DTS/TSQL. Any suggestions?

Reply With Quote
  #5  
Old   
Darren Green
 
Posts: n/a

Default Re: Preventing double-loading of input files - 05-16-2005 , 04:44 PM



In message <4D042BE8-E162-4DBF-90A5-494A0E06D7FD (AT) microsoft (DOT) com>,
DavidStallard <DavidStallard (AT) discussions (DOT) microsoft.com> writes
Quote:
The latter won't work because what if the source accidentally sends the same
file twice? It would get loaded twice since the original has already been
renamed/moved.

FYI, since I posted this question, I came across an Extended Stored Proc MD5
implementation. You can check it out here:

http://www.thecodeproject.com/databa...orumid=32460&e
xp=0&select=1054259

You said the filename was reused, not the file itself was reused or
rather resent. Isn't this a bit stupid on the part of the file supplier?
Why not tackle that?

Either way I wouldn't use an xp, partly because unless they are 100%
perfect they could destabilise your SQL Server, and also transferring
the entire file into SQL to do the hash seems excessive.

Try some of these articles on file CRCs, seems more appropriate since
you start with a file-

File Verification Using CRC
(http://www.dogma.net/markn/articles/crcman/crcman.htm)

32 Bit CRC File Calculation
(http://www.createwindow.com/programm...32/crcfile.htm)


FSUM - File Integrity Checker
(http://www.slavasoft.com/fsum/)

Free Download Command Line SFV Checker 0.1 - Generates and checks the
integrity of a SFV.
(http://www.vadino.com/windows/utilit...sfv-checker.ht
ml)

MD5: Command Line Message Digest Utility
(http://www.fourmilab.ch/md5/)


--
Darren Green (SQL Server MVP)
DTS - http://www.sqldts.com

PASS - the definitive, global community for SQL Server professionals
http://www.sqlpass.org



Reply With Quote
  #6  
Old   
Panch
 
Posts: n/a

Default Re: Preventing double-loading of input files - 05-16-2005 , 05:22 PM



Hi
My suggestion is

1. create acticex script
2. set object ("scripting.FileSystemObject")
3. get the file created date ---

(save it in a table every time you load this file)

4. check against this date

I hope this helps





Darren Green wrote:
Quote:
In message <4D042BE8-E162-4DBF-90A5-494A0E06D7FD (AT) microsoft (DOT) com>,
DavidStallard <DavidStallard (AT) discussions (DOT) microsoft.com> writes
The latter won't work because what if the source accidentally sends
the same
file twice? It would get loaded twice since the original has
already been
renamed/moved.

FYI, since I posted this question, I came across an Extended Stored
Proc MD5
implementation. You can check it out here:


http://www.thecodeproject.com/databa...orumid=32460&e
xp=0&select=1054259


You said the filename was reused, not the file itself was reused or
rather resent. Isn't this a bit stupid on the part of the file
supplier?
Why not tackle that?

Either way I wouldn't use an xp, partly because unless they are 100%
perfect they could destabilise your SQL Server, and also transferring

the entire file into SQL to do the hash seems excessive.

Try some of these articles on file CRCs, seems more appropriate since

you start with a file-

File Verification Using CRC
(http://www.dogma.net/markn/articles/crcman/crcman.htm)

32 Bit CRC File Calculation
(http://www.createwindow.com/programm...32/crcfile.htm)


FSUM - File Integrity Checker
(http://www.slavasoft.com/fsum/)

Free Download Command Line SFV Checker 0.1 - Generates and checks the

integrity of a SFV.

(http://www.vadino.com/windows/utilit...sfv-checker.ht
ml)

MD5: Command Line Message Digest Utility
(http://www.fourmilab.ch/md5/)


--
Darren Green (SQL Server MVP)
DTS - http://www.sqldts.com

PASS - the definitive, global community for SQL Server professionals
http://www.sqlpass.org


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.