dbTalk Databases Forums  

Bad data

microsoft.public.sqlserver.dts microsoft.public.sqlserver.dts


Discuss Bad data in the microsoft.public.sqlserver.dts forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Mike
 
Posts: n/a

Default Bad data - 03-01-2004 , 09:49 AM






I am creating a DTS process that will extract data from a
text file and place it into sql server. The data comes
from a Paradox database and outputted into a text file.
The file is delimited with a pipe (|) and text has double
quotes "".

Issue number 1.
One problem I'm noticing, there are many dates in the file
that have invalid dates such as 02-04-0203. So when I try
to load them into a datetime field in SQL it gives an
error. How can that be avoided, temporarily I changed the
field type to be a varchar, but that doesn't seem
efficient or like the right thing to do.

Issue number 2.
The second problem is that somewhere in the file, the
double quotes must be off, because when I try to tell sql
server that the text is delimited by a | and "" it gives
an error. So my work around that issue is to allow quotes
to be inserted into the fields then I have to run a
routine to remove the quotes. That again seems like an
inefficient way of handling the data, but it does work.
Are there any suggestions on if I should alter this
process? What is a better way to handle this data which
seems to have bad data in it? Is there any way to pre-
clean the data from the textfile so it will go into the
table without quotes?

Reply With Quote
  #2  
Old   
Allan Mitchell
 
Posts: n/a

Default Re: Bad data - 03-01-2004 , 11:32 AM






#1 is a classic problem when moving date data. Everybody has there own
limits and none of them ever seem to meet :-(

Have you tried an Active Script transform and DEFAULT the data in this
column if it is Out Of Range.


#2 The Quotes should not be entered into the table as well. Have you a
sample row from the text file and your table structure?

--

Allan Mitchell MCSE,MCDBA, (Microsoft SQL Server MVP)
www.allisonmitchell.com - Expert SQL Server Consultancy.
www.SQLDTS.com - The site for all your DTS needs.
I support PASS - the definitive, global community
for SQL Server professionals - http://www.sqlpass.org


"Mike" <anonymous (AT) discussions (DOT) microsoft.com> wrote

Quote:
I am creating a DTS process that will extract data from a
text file and place it into sql server. The data comes
from a Paradox database and outputted into a text file.
The file is delimited with a pipe (|) and text has double
quotes "".

Issue number 1.
One problem I'm noticing, there are many dates in the file
that have invalid dates such as 02-04-0203. So when I try
to load them into a datetime field in SQL it gives an
error. How can that be avoided, temporarily I changed the
field type to be a varchar, but that doesn't seem
efficient or like the right thing to do.

Issue number 2.
The second problem is that somewhere in the file, the
double quotes must be off, because when I try to tell sql
server that the text is delimited by a | and "" it gives
an error. So my work around that issue is to allow quotes
to be inserted into the fields then I have to run a
routine to remove the quotes. That again seems like an
inefficient way of handling the data, but it does work.
Are there any suggestions on if I should alter this
process? What is a better way to handle this data which
seems to have bad data in it? Is there any way to pre-
clean the data from the textfile so it will go into the
table without quotes?



Reply With Quote
  #3  
Old   
Mike
 
Posts: n/a

Default Re: Bad data - 03-01-2004 , 12:57 PM



Table Structure:

CREATE TABLE [dbo].[INET_Sessions] (
[Course] [varchar] (8) ,
[Type] [varchar] (2) ,
[SessionDate] [smalldatetime] NOT NULL ,
[Session] [varchar] (2) ,
[Provider] [varchar] (7) ,
[SessionDescription] [varchar] (60) ,
[DateRange] [varchar] (40) ,
[OEPACertType] [varchar] (4) ,
[OEPACrs] [varchar] (15) ,
[OEPAExpDate] [smalldatetime] NULL ,
[CourseDescription] [varchar] (60) ,
[Contact] [varchar] (35) ,
[ProviderName] [varchar] (60)
) ON [PRIMARY]
GO

Sample data:
"A-1"|"SC"|8/30/2001|"A0"|"AWWA00"|"Strategic Plan for the
Water industry AWWA
2001Conference"||"D"|"104"|8/23/2003|"Strategic Plan for
the Water industry AWWA 2001Conference"|"Ray
Shawn"|"American Water Works Association"
"A-2"|"SC"|8/30/2001|"A0"|"AWWA00"|"Solving the Cooper
Corrosion Prob. AWWA 2001
Conference"||"D"|"109"|8/23/2003|"Solving the Cooper
Corrosion Prob. AWWA 2001 Conference"|"Ray Shawn"|"US
Association"
"A-3"|"SC"|8/30/2001|"A0"|"AWWA00"|"Standardization of
Hach PO4 Kit Data AWWA 2001
Conference"||"D"|"115"|8/23/2003|"Standardization of Hach
PO4 Kit Data AWWA 2001 Conference"|"Ray Shawn"|"SS
Association"

Hope that helps, by the way, where do you begin to learn
how to set up an Active Script transform to DEFAULT the
data in this column if it is Out Of Range?


Quote:
-----Original Message-----
#1 is a classic problem when moving date data. Everybody
has there own
limits and none of them ever seem to meet :-(

Have you tried an Active Script transform and DEFAULT the
data in this
column if it is Out Of Range.


#2 The Quotes should not be entered into the table as
well. Have you a
sample row from the text file and your table structure?

--

Allan Mitchell MCSE,MCDBA, (Microsoft SQL Server MVP)
www.allisonmitchell.com - Expert SQL Server Consultancy.
www.SQLDTS.com - The site for all your DTS needs.
I support PASS - the definitive, global community
for SQL Server professionals - http://www.sqlpass.org


"Mike" <anonymous (AT) discussions (DOT) microsoft.com> wrote in
message
news:488201c3ffa4$c3ae5f10$a501280a (AT) phx (DOT) gbl...
I am creating a DTS process that will extract data from
a
text file and place it into sql server. The data comes
from a Paradox database and outputted into a text file.
The file is delimited with a pipe (|) and text has
double
quotes "".

Issue number 1.
One problem I'm noticing, there are many dates in the
file
that have invalid dates such as 02-04-0203. So when I
try
to load them into a datetime field in SQL it gives an
error. How can that be avoided, temporarily I changed
the
field type to be a varchar, but that doesn't seem
efficient or like the right thing to do.

Issue number 2.
The second problem is that somewhere in the file, the
double quotes must be off, because when I try to tell
sql
server that the text is delimited by a | and "" it gives
an error. So my work around that issue is to allow
quotes
to be inserted into the fields then I have to run a
routine to remove the quotes. That again seems like an
inefficient way of handling the data, but it does work.
Are there any suggestions on if I should alter this
process? What is a better way to handle this data which
seems to have bad data in it? Is there any way to pre-
clean the data from the textfile so it will go into the
table without quotes?


.


Reply With Quote
  #4  
Old   
Jay Grubb
 
Posts: n/a

Default Re: Bad data - 03-01-2004 , 06:09 PM



Bad data is always a problem, and using delimited files can really
cause problems if the data isn't processed before extraction.

I'd reccomend using an ODBC driver and going directly after the data.
I beleive Paradox has an ODBC packaged with. DTS can apply validation
on a row by row basis, and it's more likely to be repeatable.

Using an ODBC driver would let you, for example, default a bad value
(failing a conversion test) or null it.

You can either run it directly from DTS, or build an app in the
language of your choice and fetch and insert the records. The big
hang up would be if the Paradox data is on another server.
(Disclaimer, I work for an ODBC Driver vendor)

Openlink can provide a bridge, (sorry, we don't have native paradox
drivers) where you have a generic client for the DTS, for example,
and it connects to a Broker on the system where the paradox files are
hosted. It then proxies the requests and returns the results.

The benefit is that there would be almost no escaping problem for
embeded meta characters, like " or / or' etc. Simpler interface,
far less special cases. You'd just have to worry about things like
the bad dates.

Good luck, feel free to contact me.




"Mike" <anonymous (AT) discussions (DOT) microsoft.com> wrote

Quote:
I am creating a DTS process that will extract data from a
text file and place it into sql server. The data comes
from a Paradox database and outputted into a text file.
The file is delimited with a pipe (|) and text has double
quotes "".

Issue number 1.
One problem I'm noticing, there are many dates in the file
that have invalid dates such as 02-04-0203. So when I try
to load them into a datetime field in SQL it gives an
error. How can that be avoided, temporarily I changed the
field type to be a varchar, but that doesn't seem
efficient or like the right thing to do.

Issue number 2.
The second problem is that somewhere in the file, the
double quotes must be off, because when I try to tell sql
server that the text is delimited by a | and "" it gives
an error. So my work around that issue is to allow quotes
to be inserted into the fields then I have to run a
routine to remove the quotes. That again seems like an
inefficient way of handling the data, but it does work.
Are there any suggestions on if I should alter this
process? What is a better way to handle this data which
seems to have bad data in it? Is there any way to pre-
clean the data from the textfile so it will go into the
table without quotes?

Reply With Quote
  #5  
Old   
Allan Mitchell
 
Posts: n/a

Default Re: Bad data - 03-02-2004 , 01:22 AM



You data imports good into my table using a datapump task.

As for dates. If you are not interested in time then personally like to
enter textual dates using this

Formatting Character Data into Datetime fields
(http://www.sqldts.com/default.aspx?249)

Is it out of range?

Check like this

You column datatype is a smalldatetime = Date and time data from January 1,
1900, through June 6, 2079

so let's test some failures

(value = "01-01-0203")

if Right( DTSSource("MyPossibleBadData"), 4 ) < "1900" then
msgbox "OOR - defaulted to 19000101"
end if



--

----------------------------

Allan Mitchell MCSE,MCDBA, (Microsoft SQL Server MVP)
www.allisonmitchell.com - Expert SQL Server Consultancy.
www.SQLDTS.com - The site for all your DTS needs.
I support PASS - the definitive, global community
for SQL Server professionals - http://www.sqlpass.org


"Mike" <anonymous (AT) discussions (DOT) microsoft.com> wrote

Quote:
Table Structure:

CREATE TABLE [dbo].[INET_Sessions] (
[Course] [varchar] (8) ,
[Type] [varchar] (2) ,
[SessionDate] [smalldatetime] NOT NULL ,
[Session] [varchar] (2) ,
[Provider] [varchar] (7) ,
[SessionDescription] [varchar] (60) ,
[DateRange] [varchar] (40) ,
[OEPACertType] [varchar] (4) ,
[OEPACrs] [varchar] (15) ,
[OEPAExpDate] [smalldatetime] NULL ,
[CourseDescription] [varchar] (60) ,
[Contact] [varchar] (35) ,
[ProviderName] [varchar] (60)
) ON [PRIMARY]
GO

Sample data:
"A-1"|"SC"|8/30/2001|"A0"|"AWWA00"|"Strategic Plan for the
Water industry AWWA
2001Conference"||"D"|"104"|8/23/2003|"Strategic Plan for
the Water industry AWWA 2001Conference"|"Ray
Shawn"|"American Water Works Association"
"A-2"|"SC"|8/30/2001|"A0"|"AWWA00"|"Solving the Cooper
Corrosion Prob. AWWA 2001
Conference"||"D"|"109"|8/23/2003|"Solving the Cooper
Corrosion Prob. AWWA 2001 Conference"|"Ray Shawn"|"US
Association"
"A-3"|"SC"|8/30/2001|"A0"|"AWWA00"|"Standardization of
Hach PO4 Kit Data AWWA 2001
Conference"||"D"|"115"|8/23/2003|"Standardization of Hach
PO4 Kit Data AWWA 2001 Conference"|"Ray Shawn"|"SS
Association"

Hope that helps, by the way, where do you begin to learn
how to set up an Active Script transform to DEFAULT the
data in this column if it is Out Of Range?


-----Original Message-----
#1 is a classic problem when moving date data. Everybody
has there own
limits and none of them ever seem to meet :-(

Have you tried an Active Script transform and DEFAULT the
data in this
column if it is Out Of Range.


#2 The Quotes should not be entered into the table as
well. Have you a
sample row from the text file and your table structure?

--

Allan Mitchell MCSE,MCDBA, (Microsoft SQL Server MVP)
www.allisonmitchell.com - Expert SQL Server Consultancy.
www.SQLDTS.com - The site for all your DTS needs.
I support PASS - the definitive, global community
for SQL Server professionals - http://www.sqlpass.org


"Mike" <anonymous (AT) discussions (DOT) microsoft.com> wrote in
message
news:488201c3ffa4$c3ae5f10$a501280a (AT) phx (DOT) gbl...
I am creating a DTS process that will extract data from
a
text file and place it into sql server. The data comes
from a Paradox database and outputted into a text file.
The file is delimited with a pipe (|) and text has
double
quotes "".

Issue number 1.
One problem I'm noticing, there are many dates in the
file
that have invalid dates such as 02-04-0203. So when I
try
to load them into a datetime field in SQL it gives an
error. How can that be avoided, temporarily I changed
the
field type to be a varchar, but that doesn't seem
efficient or like the right thing to do.

Issue number 2.
The second problem is that somewhere in the file, the
double quotes must be off, because when I try to tell
sql
server that the text is delimited by a | and "" it gives
an error. So my work around that issue is to allow
quotes
to be inserted into the fields then I have to run a
routine to remove the quotes. That again seems like an
inefficient way of handling the data, but it does work.
Are there any suggestions on if I should alter this
process? What is a better way to handle this data which
seems to have bad data in it? Is there any way to pre-
clean the data from the textfile so it will go into the
table without quotes?


.




Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.