dbTalk Databases Forums  

Help Parsing

comp.databases.filemaker comp.databases.filemaker


Discuss Help Parsing in the comp.databases.filemaker forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
squeed2000@yahoo.com
 
Posts: n/a

Default Help Parsing - 01-31-2007 , 08:40 PM






I am trying to extract some web based directory data. The data is
fairly consistent. An example of the data is posted below. I want
the end results to be.

11th Frame Restaurant & Lounge
1400 Bowling Lane
GRIFFIN
GA
30233
7702292695

If I could see how to parse out just one of the fields based on the
pattern, I could figure out the rest. Any ideas ?

-S



<br><b>Detailed info:</b>
</td>
</tr>
<tr>
<td colspan="4" valign="top"><table width="100%" border="0"
cellspacing="1" cellpadding="3">

<tr>
<td width="18%" class="inputfield">Name:</td>
<td width="82%" class="inputfield">11th Frame Restaurant &
Lounge</td>
</tr>
<tr>
<td class="inputfield">Address:</td>
<td class="inputfield">1400 Bowling Lane</td>

</tr>
<tr>
<td class="inputfield">City:</td>
<td class="inputfield">GRIFFIN</td>
</tr>
<tr>
<td class="inputfield">State:</td>
<td class="inputfield">GA</td>

</tr>
<tr>
<td class="inputfield">Zip:</td>
<td class="inputfield">30223</td>
</tr>
<tr>
<td class="inputfield">Phone:</td>
<td class="inputfield">7702292695</td>

</tr>
</table></td>
</tr>


Reply With Quote
  #2  
Old   
Howard Schlossberg
 
Posts: n/a

Default Re: Help Parsing - 01-31-2007 , 08:53 PM






Here's a custom function that I wrote, which may work for you. Using
this function, you would get the address with:

Parse (WebDataField, "Address:</td> <td
class=\"inputfield\">", "</td>" )

The problem will be in whether the number of blank spaces ever varies.

The function is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~

Parse (OrigText; FindText; EndText) =

Let([
EndText = Substitute(Endtext; "<BR>"; "¶");
Start = Position(Origtext; FindText; 1; 1) + Length(FindText);
End = Case(IsEmpty(EndText); Length(OrigText); Position(OrigText & ¶;
EndText; Start - 1; 1));
Diff = End - Start];

Case(
not PatternCount(OrigText; FindText); "";
TrimFull(Middle(OrigText; Start; Diff)))
)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~

squeed2000 (AT) yahoo (DOT) com wrote:
Quote:
I am trying to extract some web based directory data. The data is
fairly consistent. An example of the data is posted below. I want
the end results to be.

11th Frame Restaurant & Lounge
1400 Bowling Lane
GRIFFIN
GA
30233
7702292695

If I could see how to parse out just one of the fields based on the
pattern, I could figure out the rest. Any ideas ?

-S



<br><b>Detailed info:</b
</td
/tr
tr
td colspan="4" valign="top"><table width="100%" border="0"
cellspacing="1" cellpadding="3"

<tr
td width="18%" class="inputfield">Name:</td
td width="82%" class="inputfield">11th Frame Restaurant &
Lounge</td
/tr
<tr
td class="inputfield">Address:</td
td class="inputfield">1400 Bowling Lane</td

/tr
<tr
td class="inputfield">City:</td
td class="inputfield">GRIFFIN</td
/tr
<tr
td class="inputfield">State:</td
td class="inputfield">GA</td

/tr
<tr
td class="inputfield">Zip:</td
td class="inputfield">30223</td
/tr
<tr
td class="inputfield">Phone:</td
td class="inputfield">7702292695</td

/tr
</table></td
/tr

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Howard Schlossberg (818) 883-2846
FM Professional Solutions, Inc. Los Angeles

FileMaker 8 Certified Developer
Associate Member, FileMaker Solutions Alliance


Reply With Quote
  #3  
Old   
squeed2000@yahoo.com
 
Posts: n/a

Default Re: Help Parsing - 01-31-2007 , 09:12 PM



Howard, thanks, I think this is a little over my head. I'm not even
sure how to implement a custom function. Is there any
other easier way to do this or can you explain more about how to use
this custom function ?

-S


Reply With Quote
  #4  
Old   
Howard Schlossberg
 
Posts: n/a

Default Re: Help Parsing - 01-31-2007 , 09:26 PM



Hmmm -- I think I see you participate so much here that I've assumed you
have FM Pro Advanced. You would need FMPA to create custom functions.

My other assumption is that you are doing this in FM8. If so, then you
can still use my prior Let() statement as a calc in a script, with some
minor modifications, even without custom functions:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~

Let([
OrigText = YourWebDataField;
Findtext = "Address:</td> <td class=\"inputfield\">";
EndText = "</td>";
Start = Position(Origtext; FindText; 1; 1) + Length(FindText);
End = Case(IsEmpty(EndText); Length(OrigText); Position(OrigText & ¶;
EndText; Start - 1; 1));
Diff = End - Start];

Case(
not PatternCount(OrigText; FindText); "";
Middle(OrigText; Start; Diff)
))

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~


squeed2000 (AT) yahoo (DOT) com wrote:
Quote:
Howard, thanks, I think this is a little over my head. I'm not even
sure how to implement a custom function. Is there any
other easier way to do this or can you explain more about how to use
this custom function ?
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Howard Schlossberg (818) 883-2846
FM Professional Solutions, Inc. Los Angeles

FileMaker 8 Certified Developer
Associate Member, FileMaker Solutions Alliance


Reply With Quote
  #5  
Old   
squeed2000@yahoo.com
 
Posts: n/a

Default Re: Help Parsing - 01-31-2007 , 09:50 PM



Howard, I'm running 8.5 Advanced. But I have never used custom
functions before. I've tried cutting and pasting your solution above
into my calc field but Filemaker is trying to understand what "Find
Text" means.

-S


Reply With Quote
  #6  
Old   
Howard Schlossberg
 
Posts: n/a

Default Re: Help Parsing - 01-31-2007 , 10:06 PM



That is a function parameter. You'll need to add the three parameters
to the function definition. I also realized after I'd posted the
original that it includes a TrimFull() function, which is another of my
custom functions. You can probably get away without it, by changing:

TrimFull(Middle(OrigText; Start; Diff))
to
Middle(OrigText; Start; Diff)


squeed2000 (AT) yahoo (DOT) com wrote:
Quote:
Howard, I'm running 8.5 Advanced. But I have never used custom
functions before. I've tried cutting and pasting your solution above
into my calc field but Filemaker is trying to understand what "Find
Text" means.

-S

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Howard Schlossberg (818) 883-2846
FM Professional Solutions, Inc. Los Angeles

FileMaker 8 Certified Developer
Associate Member, FileMaker Solutions Alliance


Reply With Quote
  #7  
Old   
Helpful Harry
 
Posts: n/a

Default Re: Help Parsing - 02-01-2007 , 05:29 PM



In article <1170297630.662771.192600 (AT) s48g2000cws (DOT) googlegroups.com>,
squeed2000 (AT) yahoo (DOT) com wrote:

Quote:
I am trying to extract some web based directory data. The data is
fairly consistent. An example of the data is posted below. I want
the end results to be.

11th Frame Restaurant & Lounge
1400 Bowling Lane
GRIFFIN
GA
30233
7702292695

If I could see how to parse out just one of the fields based on the
pattern, I could figure out the rest. Any ideas ?

snip

Parsing a block of text is usually fairly easy, as long as the text is
ALWAYS standardised - you just have to look for the established markers
(also called delimiters) - but it can get tricky getting it to work
properly since the calculations become so long. These markers /
delimiters should be something that does not appear in the wanted text
itself.

In this case you can see that every piece of data you want is preceeded
by:
class="inputfield">

and followed by:

</td>

As long as none of the wanted text contains these strings we can use
them to chop out what we do want. Using the Position function on these
you can find and extract the appropriate data between the two.

There's a slight hiccup here in that the preceeding text includes quote
/ speech marks which are a little difficult to enter into calculations.
To get around this I've used a Global field called g_InputFieldText
into which I've type the text:

class="inputfield">


The start of the data you want is always given by:

Start = Position(IncomingHTML, g_InputFieldText, 1, (X*2)) + 19

This gives you the position of the first character after the preceeding
text, ie. the position of the start of the preceeding text plus the 19
characters to skip over the preceeding text itself.

The X is the line number of the data line you want (eg. 1 for Name, 2
for Address, 5 for Zip).

Similarly we can find the position of the following </td> text by using
the position of the preceeding text as the starting point and finding
the first </td> we come to:

End = Position(IncomingHTML, "</td>", Position(IncomingHTML,
g_InputFieldText, 1, (X*2)) + 19, 1)

Now that we know the start and end positions of the wanted text we can
use the Middle function to extract the wanted characters by a
calculation of:

Middle (IncomingHTML, Start, End - Start)

which when putting in the Start and End calculation from above becomes
the complicated:

Middle(IncomingHTML, Position(IncomingHTML, g_InputFieldText,
1, (X*2)) + 19, Position(IncomingHTML, "</td>",
Position(IncomingHTML, g_InputFieldText, 1, (X*2)) + 19,
1) - Position(IncomingHTML, g_InputFieldText, 1, (X*2))
- 19)

where again the three X's are the number of the required line.

This works, but there is another slight hiccup here. The IncomingHTML
may have extra unwanted carriage returns in it. To get rid of these we
need to wrap the above calculation inside a Substitute function that
replaces the carriage returns with spaces.
eg.
Substitute(Middle(IncomingHTML, Position(IncomingHTML,
g_InputFieldText, 1, (X*2)) + 19,
Position(IncomingHTML, "</td>",
Position(IncomingHTML, g_InputFieldText, 1,
(X*2)) +19, 1) - Position(IncomingHTML,
g_InputFieldText, 1, (X*2)) - 19), "ret", " ")

where the 'ret' near the end is the backwards P carriage return
character on a button in the define calculation window (you still need
the quote / speech marks around it).


Now, that calculation gives you the wanted text for line X. Whether you
use that in six separate Calculation fields or concatenate them
altogether into one large Calculation field will depend on what you
want to do with the data. Personally I would use six separate fields
since that is more flexible for reporting and makes it easier to spot
where a mistake might be - you can always put them together using
another Calculation field if needed.
ie.
Name & "ret" & Address & "ret" & City & "ret"
& State & "ret" & Zip & "ret" & Phone



Helpful Harry
Hopefully helping harassed humans happily handle handiwork hardships ;o)


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.