![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
|
I have a somewhat similar situation that I have to deal with routinely. My approach: Create a separate FileMaker file specifically to clean up the data, then import from that into my main database. Call it Cleanup.fp7 for the sake of discussion. The first step is to get rid of the first rows of garbage in the input file, so that Cleanup.fp7 is presented with a consistent set of fields for the first record. Then I import the data into Cleanup.fp7. Cleanup.fp7 has a series of scripts that do several things: Find all the records that contain garbage, based on suitable criteria, and delete those. Parse out names and other data fields into more rational structure, using scripts with Text functions. Normalize data to a consistent set of naming conventions, using Find and Replace (for example, make Street, St. and St into a standard abbreviation St) And so on. Before I could design the Cleanup.fp7 file I had to know the final form for import of data into my main database, and I also had to know the structure of the "good" data and the garbage in the input file. Armed with that, I could define fields and scripts in Cleanup.fp7 to accomplish the job. There was a lot of trial and error to get it right. I did not import anything into my main database until i was confident that the data was right in Cleanup.fp7. Once I had it right, I could use Cleanup.fp7 over and over to clean up sets of new data as they came to me. Hope this helps. |
#2
| |||
| |||
|
|
I'm surprised no one has mentioned the web viewer functionality yet. This could eliminate much of the hassle of getting the data from the web site. You can point the web viewer at the protected site and log in manually. Or, you could figure out a way to include the credentials within the URL, if possible. Once the data you're interested in is showing in the web viewer, you can click a button to grab the content of that page as the HTML source text. The button would be something like: Set Field [YourField, GetLayoutObjectAttribute ( "wv"; "content" )] Where "wv" is the name you gave the web viewer object in layout mode (using the Object Inspector palette), and YourField is the place where you'd like to put the data. Now with several clicks eliminated, you can direct your calculation skills toward parsing the data. Without knowing the precise format of the information it is very difficult for me to recommend a specific strategy. But it is often easier than you think. For example, if there is something unique about each non-garbage line, you could use this characteristic in your formula to quicken the chase. Perhaps there is always a <TD> in there? The HTML code looks more complicated to human eyes, but it's often more helpful for parsing out relevant bits of data. If you have FileMaker 8.5 Advanced, you can write a spiffy recursive function to parse out the content. If you have the non-Advanced version, you can write a script. And finally you'd use a script to transform the cleaned-up lines into tidy new records in a table. |
|
By the way, you say you're on a Mac, but that you have to log into IE? Does the web site somehow not work with Safari? |
#3
| |||
| |||
|
|
On 2007-01-02 08:48:30 -0500, "Bill Marriott" <wjm (AT) wjm (DOT) org> said: I'm surprised no one has mentioned the web viewer functionality yet. This could eliminate much of the hassle of getting the data from the web site. You can point the web viewer at the protected site and log in manually. Or, you could figure out a way to include the credentials within the URL, if possible. Once the data you're interested in is showing in the web viewer, you can click a button to grab the content of that page as the HTML source text. The button would be something like: Set Field [YourField, GetLayoutObjectAttribute ( "wv"; "content" )] Where "wv" is the name you gave the web viewer object in layout mode (using the Object Inspector palette), and YourField is the place where you'd like to put the data. Now with several clicks eliminated, you can direct your calculation skills toward parsing the data. Without knowing the precise format of the information it is very difficult for me to recommend a specific strategy. But it is often easier than you think. For example, if there is something unique about each non-garbage line, you could use this characteristic in your formula to quicken the chase. Perhaps there is always a <TD> in there? The HTML code looks more complicated to human eyes, but it's often more helpful for parsing out relevant bits of data. If you have FileMaker 8.5 Advanced, you can write a spiffy recursive function to parse out the content. If you have the non-Advanced version, you can write a script. And finally you'd use a script to transform the cleaned-up lines into tidy new records in a table. Thanks to all... I had hoped to so something similar to using a perlmod to extract the data, but I looked at the raw html and unfortunately it's not even a table - it's just preformatted text with spaces. By the way, you say you're on a Mac, but that you have to log into IE? Does the web site somehow not work with Safari? Sorry, it must have been confusing. The system that I have to use to extract the data is XP, then I will be (somehow) getting the html or text files to my personal machine (Mac). Our company doesn't allow connecting computers to their net, or even a hard drive, or installing any software, etc. I will be emailing the files home. Also I can't read the stuff from the web from my Mac, as the site is only available from within the company. hence I'm stuck with reading a file. More klunk than I really want. Perhaps the approach to take is to simply test the assumption that the preformatting is consistent, and use brute force to find the "rows" I want to parse. |
#4
| |||
| |||
|
|
The beauty of it is that you wouldn't need to install any software or "connect" your computer to the network!! The web viewer is web compatible... if they allow you to browse an outside site like MSN or FileMaker.com, you can do this. If you want proof-of-concept, send me an email and I'll give you a web link to try. "Mike" <exceptionsTakeMeOut (AT) earthlink (DOT) net> wrote in message news:2007010212500975249-exceptionsTakeMeOut (AT) earthlinknet (DOT) .. On 2007-01-02 08:48:30 -0500, "Bill Marriott" <wjm (AT) wjm (DOT) org> said: I'm surprised no one has mentioned the web viewer functionality yet. This could eliminate much of the hassle of getting the data from the web site. You can point the web viewer at the protected site and log in manually. Or, you could figure out a way to include the credentials within the URL, if possible. Once the data you're interested in is showing in the web viewer, you can click a button to grab the content of that page as the HTML source text. The button would be something like: Set Field [YourField, GetLayoutObjectAttribute ( "wv"; "content" )] Where "wv" is the name you gave the web viewer object in layout mode (using the Object Inspector palette), and YourField is the place where you'd like to put the data. Now with several clicks eliminated, you can direct your calculation skills toward parsing the data. Without knowing the precise format of the information it is very difficult for me to recommend a specific strategy. But it is often easier than you think. For example, if there is something unique about each non-garbage line, you could use this characteristic in your formula to quicken the chase. Perhaps there is always a <TD> in there? The HTML code looks more complicated to human eyes, but it's often more helpful for parsing out relevant bits of data. If you have FileMaker 8.5 Advanced, you can write a spiffy recursive function to parse out the content. If you have the non-Advanced version, you can write a script. And finally you'd use a script to transform the cleaned-up lines into tidy new records in a table. Thanks to all... I had hoped to so something similar to using a perlmod to extract the data, but I looked at the raw html and unfortunately it's not even a table - it's just preformatted text with spaces. By the way, you say you're on a Mac, but that you have to log into IE? Does the web site somehow not work with Safari? Sorry, it must have been confusing. The system that I have to use to extract the data is XP, then I will be (somehow) getting the html or text files to my personal machine (Mac). Our company doesn't allow connecting computers to their net, or even a hard drive, or installing any software, etc. I will be emailing the files home. Also I can't read the stuff from the web from my Mac, as the site is only available from within the company. hence I'm stuck with reading a file. More klunk than I really want. Perhaps the approach to take is to simply test the assumption that the preformatting is consistent, and use brute force to find the "rows" I want to parse. |
#5
| |||
| |||
|
|
Sorry, it must have been confusing. The system that I have to use to extract the data is XP, then I will be (somehow) getting the html or text files to my personal machine (Mac). Our company doesn't allow connecting computers to their net, or even a hard drive, or installing any software, etc. I will be emailing the files home. Also I can't read the stuff from the web from my Mac, as the site is only available from within the company. hence I'm stuck with reading a file. More klunk than I really want. Perhaps the approach to take is to simply test the assumption that the preformatting is consistent, and use brute force to find the "rows" I want to parse. |
#6
| |||
| |||
|
|
text files to my personal machine (Mac) I've had a similar case once. I was able to get html file as you have |
#7
| |||
| |||
|
|
I've had a similar case once. I was able to get html file as you have mentioned. At home I opened Safari, opened desired file in Safari. Table rows and columns were preserved. |
#8
| |||
| |||
|
|
tomislav <tomislav (AT) htnet (DOT) hr> wrote: I've had a similar case once. I was able to get html file as you have mentioned. At home I opened Safari, opened desired file in Safari. Table rows and columns were preserved. if you can, that seems a good solution. and if you need to parse, you can do some things with combinations of Applescript and FMP scripting slow, but can be simple. |
![]() |
| Thread Tools | |
| Display Modes | |
| |