#1
#2
In my brute force port, I just bulk copied the date fields into temporary tables and then did a to_timestamp(field, 'Mon DD YYYY HH:MI:SS:MSAM'). Again, I created a temporary table and did a decode(field, 'hex') to the real table.
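The temporary-table approach described above can be scripted so the conversion statement is generated rather than typed by hand. The following is only a sketch in Python: the table names, column names, and the `staging_insert_sql` helper are hypothetical, and the only pieces taken from the thread are the `to_timestamp` format string and the `decode(field, 'hex')` call.

```python
# Format string quoted in the thread for Sybase bcp timestamps.
TS_FORMAT = "Mon DD YYYY HH:MI:SS:MSAM"


def staging_insert_sql(target, staging, columns):
    """Build an INSERT ... SELECT that converts staged text columns.

    columns is a list of (name, kind) pairs, where kind is one of
    "plain", "timestamp" or "bytea". Identifiers are assumed to be
    trusted and are not quoted (hypothetical helper, illustration only).
    """
    exprs = []
    for name, kind in columns:
        if kind == "timestamp":
            exprs.append("to_timestamp(%s, '%s')" % (name, TS_FORMAT))
        elif kind == "bytea":
            exprs.append("decode(%s, 'hex')" % name)
        elif kind == "plain":
            exprs.append(name)
        else:
            raise ValueError("unknown column kind: %r" % kind)
    names = ", ".join(name for name, _ in columns)
    return "INSERT INTO %s (%s) SELECT %s FROM %s;" % (
        target, names, ", ".join(exprs), staging)


# Illustrative table and column names, not from the original post.
print(staging_insert_sql(
    "events", "events_stage",
    [("id", "plain"), ("created", "timestamp"), ("payload", "bytea")]))
```

The staging table would hold the affected columns as plain text, so COPY accepts the bcp output unchanged and the conversion happens in a single statement afterwards.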
#3
Sybase bulk copies the date fields out in this format: Mar 4 1973 10:28:00:000AM. PostgreSQL's COPY (or psql \copy) doesn't like that format.

I have a similarish problem with another field type. In Sybase it's a binary format. In postgres it is a binary format (bytea). But Sybase bcps the data out in ASCII. Sybase recognizes that when it is a binary field and auto-converts the ASCII back to binary. Postgres doesn't. Again, I created a temporary table and did a decode(field, 'hex') to the real table.
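For the timestamp format quoted above, a pre-load filter could rewrite each value into ISO form, which COPY accepts. This is a minimal sketch in Python, assuming the bcp output always uses an English three-letter month, a 12-hour clock, and a colon before a three-digit millisecond field; the function name `sybase_to_iso` is made up for illustration.

```python
import re
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches e.g. "Mar  4 1973 10:28:00:000AM" (one or more spaces between parts).
SYBASE_TS = re.compile(
    r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{4})\s+"
    r"(\d{1,2}):(\d{2}):(\d{2}):(\d{3})([AP]M)$")


def sybase_to_iso(text):
    """Convert a Sybase bcp timestamp string to 'YYYY-MM-DD HH:MM:SS.mmm'."""
    match = SYBASE_TS.match(text.strip())
    if match is None:
        raise ValueError("not a Sybase bcp timestamp: %r" % text)
    mon, day, year, hour, minute, second, millis, ampm = match.groups()
    hour = int(hour)
    if not 1 <= hour <= 12:
        raise ValueError("hour out of range for a 12-hour clock: %r" % text)
    # 12AM is midnight and 12PM is noon on a 12-hour clock.
    hour24 = hour % 12 + (12 if ampm == "PM" else 0)
    stamp = datetime(int(year), MONTHS[mon], int(day),
                     hour24, int(minute), int(second), int(millis) * 1000)
    return stamp.isoformat(sep=" ", timespec="milliseconds")


print(sybase_to_iso("Mar  4 1973 10:28:00:000AM"))  # 1973-03-04 10:28:00.000
```

The regular expression is anchored to the whole field on purpose, so a value that only resembles a date is rejected instead of being silently rewritten.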
#4
On Wed, Oct 13, 2004 at 10:06:58AM -0400, David Rysdam wrote:
Sybase bulk copies the date fields out in this format: Mar 4 1973 10:28:00:000AM. PostgreSQL's COPY (or psql \copy) doesn't like that format.

You could filter the data through a script that reformats certain fields, then feed the reformatted data to PostgreSQL. This is usually a trivial task for Perl, awk, sed, or the like.

Right, I *can* do this. But then I have to build knowledge into that script so it can find each of these date fields (there's like 20 of them across 10 different files) and then update that knowledge each time it changes.

I have a similarish problem with another field type. In Sybase it's a binary format. In postgres it is a binary format (bytea). But Sybase bcps the data out in ASCII. Sybase recognizes that when it is a binary field and auto-converts the ASCII back to binary. Postgres doesn't. Again, I created a temporary table and did a decode(field, 'hex') to the real table.

Sounds like Sybase is dumping in hex, whereas PostgreSQL expects octal. If you can't change the dump format, then again, filtering the data through a script might work.

Oh, so I can load binary data into PG if it's ASCII-encoded octal? Why not the user-defined type with associated user-defined input function?
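The hex-versus-octal mismatch mentioned above could be bridged in a filter script. Here is a rough Python sketch, assuming the dump holds plain hex digits (an optional leading 0x is tolerated, which is a guess about the bcp output) and that the data is loaded through COPY in text format, where the backslash of each bytea octal escape has to be doubled. The name `hex_to_copy_bytea` is hypothetical.

```python
def hex_to_copy_bytea(hex_text):
    """Convert a hex dump of a binary field to COPY-ready bytea escape text.

    Every byte is written as a doubled backslash plus three octal digits,
    because COPY's text format consumes one level of backslashes before
    the bytea input function sees the value.
    """
    digits = hex_text.strip()
    if digits[:2] in ("0x", "0X"):  # assumed optional prefix
        digits = digits[2:]
    raw = bytes.fromhex(digits)  # raises ValueError on malformed hex
    return "".join("\\\\%03o" % byte for byte in raw)


print(hex_to_copy_bytea("dead00"))  # \\336\\255\\000
```

Escaping every byte, printable or not, makes the output roughly eight times the size of the binary data, but it avoids having to reason about which characters are safe to pass through.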
#5
David Rysdam <drysdam (AT) ll (DOT) mit.edu> writes:
In my brute force port, I just bulk copied the date fields into temporary tables and then did a to_timestamp(field, 'Mon DD YYYY HH:MI:SS:MSAM'). Again, I created a temporary table and did a decode(field, 'hex') to the real table.

This is the standard approach. You're rather lucky these are the only data representation changes you've had to do so far. I fear you'll run into more and more complex changes over time and trying to avoid the temporary table will get harder and harder.

No, I think I'm OK there. These are programmatically-generated values

If it were me I would consider processing the files in Perl. It should be pretty easy to do both of these modifications very quickly.

Very quick and easy to do one time. A little trickier to handle in an

If you really want to go with custom C code then you might be able to just grab the byteain/byteaout functions from src/backend/util/adt/varlena into a separate module and create new functions with modified names. Load it with CREATE FUNCTION byteain ... AS 'my_bytea_funcs.so' 'my_byteain'; Or maybe create the function as my_byteain in postgres and then update the catalog entries somehow. I'm not sure how to do that but it shouldn't be too hard. And it might make it easier to do the substitution for the data load and then undo the change afterwards.

Why not create a type and then define the load function to be the

Doing the same for timestamp is a bit trickier but you could copy ParseDateTime from datetime.c as a static function for your module. Be careful though, test this out thoroughly on a test database. I'm not sure of all the impacts of altering the in/out functions for data types. I expect it would break pg_dump, for example. And I would worry about the statistics tables too.

This is kind of a hybrid of my suggestions and the problems are a hybrid

#6
Michael Fuhr wrote:
You could filter the data through a script that reformats certain fields, then feed the reformatted data to PostgreSQL. This is usually a trivial task for Perl, awk, sed, or the like.

Right, I *can* do this. But then I have to build knowledge into that script so it can find each of these date fields (there's like 20 of them across 10 different files) and then update that knowledge each time it changes.

I'm still leaning towards just making postgres accept a ':' delimiter for milliseconds.

Also, how much would a secondary script slow down the bulk copy, if any?

Sounds like Sybase is dumping in hex, whereas PostgreSQL expects octal. If you can't change the dump format, then again, filtering the data through a script might work.

Oh, so I can load binary data into PG if it's ASCII-encoded octal?

Why not the user-defined type with associated user-defined input function?
#7
Right, I *can* do this. But then I have to build knowledge into that script so it can find each of these date fields (there's like 20 of them across 10 different files) and then update that knowledge each time it changes.

In your case that's a reasonable argument against filtering the data with a script. Using a regular expression in the script might reduce or eliminate the need for some of the logic, but then you'd run the risk of reformatting data that shouldn't have been touched.
#8
On Wed, Oct 13, 2004 at 01:32:01PM -0400, David Rysdam wrote:
Michael Fuhr wrote:
You could filter the data through a script that reformats certain fields, then feed the reformatted data to PostgreSQL. This is usually a trivial task for Perl, awk, sed, or the like.

Right, I *can* do this. But then I have to build knowledge into that script so it can find each of these date fields (there's like 20 of them across 10 different files) and then update that knowledge each time it changes.

In your case that's a reasonable argument against filtering the data with a script. Using a regular expression in the script might reduce or eliminate the need for some of the logic, but then you'd run the risk of reformatting data that shouldn't have been touched.

I'm still leaning towards just making postgres accept a ':' delimiter for milliseconds.

Based on your requirements, that might indeed be a better solution. I'd probably choose to extend PostgreSQL rather than hack what already exists, though. Doing the latter might break something else and you have to remember to add the hack every time you upgrade the server software. That can cause headaches for whoever inherits the system from you unless it's well-documented.

By "extend PostgreSQL" do you mean create a custom input_function for timestamp?

Why not the user-defined type with associated user-defined input function?

If filtering the data is awkward, then that might be a better way to go.

I think I will, when I get to that point.
#9
You can have your script query the database to fetch the data types of the fields, so that it knows which ones are to be transformed and how. The script would take as arguments a dump file, a database, and a schema.table, would read the file and pipe the transformed data into psql with a COPY FROM stdin command... That could save you a lot of work, no?
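The type-driven script suggested above might look roughly like the Python sketch below. Several things are assumptions rather than details from the thread: the dump is tab-separated with one row per line, empty timestamp or bytea fields stand for NULL, the column types arrive as the `data_type` strings from `information_schema.columns`, and all function names are invented. The catalog query is shown only as a string; running it and piping the output into psql are left to the caller.

```python
import io
from datetime import datetime

# Query a driver script could run to learn the column types (not executed here).
COLUMN_TYPES_SQL = (
    "SELECT column_name, data_type FROM information_schema.columns "
    "WHERE table_schema = %s AND table_name = %s ORDER BY ordinal_position")


def convert_timestamp(text):
    # strptime's %b and %p depend on the locale; an English/C locale is assumed.
    stamp = datetime.strptime(" ".join(text.split()), "%b %d %Y %I:%M:%S:%f%p")
    return stamp.isoformat(sep=" ", timespec="milliseconds")


def convert_bytea(text):
    # Hex digits -> doubled-backslash octal escapes for COPY's text format.
    return "".join("\\\\%03o" % byte for byte in bytes.fromhex(text))


CONVERTERS = {
    "timestamp without time zone": convert_timestamp,
    "timestamp with time zone": convert_timestamp,
    "bytea": convert_bytea,
}


def transform(lines, column_types, sep="\t"):
    """Yield COPY-ready lines; column_types lists each column's data_type."""
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        if len(fields) != len(column_types):
            raise ValueError("line %d: expected %d fields, got %d"
                             % (number, len(column_types), len(fields)))
        out = []
        for value, data_type in zip(fields, column_types):
            convert = CONVERTERS.get(data_type)
            if convert is None:
                out.append(value)      # other types pass through untouched
            elif value == "":
                out.append("\\N")      # assumed: empty field means NULL
            else:
                out.append(convert(value))
        yield sep.join(out) + "\n"


sample = io.StringIO("1\tMar  4 1973 10:28:00:000AM\tdead00\n")
types = ["integer", "timestamp without time zone", "bytea"]
for row in transform(sample, types):
    print(row, end="")
```

Because the conversion is chosen by column type rather than by pattern matching, text columns that merely look like dates are left alone, and a new date column is picked up without editing the script. Pass-through fields that contain backslashes would still need escaping for COPY, which this sketch does not attempt.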
#10
Michael Fuhr wrote:
I'd probably choose to extend PostgreSQL rather than hack what already exists, though.

By "extend PostgreSQL" do you mean create a custom input_function for timestamp? Are there docs that give hints for replacing the input function of an existing type? Someone else replied similarly, but I'm afraid I'm not familiar enough with PG to decipher it all.