#11
From Newsgroup: comp.databases.pick

> We are using AIX/D3 7.4. I saved a 25k-row xls file to a tab-delimited txt file and copied it to AIX, converting the CR/LF to AM. Even though the rows are less than 50 characters, it took approximately 20 seconds to extract a small amount of data (with the field command) from each line using a non-dim'd read. I thought this process could be done quicker. Thanks, guys, for all of your responses.

You might want to look into using %fgets() to read the file one line at a time.
#13
Excalibur21 wrote:
> TL support assure me that 20k is a very short file. In fact I often use much larger items.

You needed TL support to tell you that? Seriously, it's not just about line count, it's about how wide the lines are too.

> In D3 Windows the COPY Dos: followed by Count, DIM and Matparse is exceptionally easy and very quick

There's a number of things I'd take issue with there.

1) Why copy data into hashed space only to open and read it, when you can just open and read it from OS space? You've virtually doubled the processing time for no reason, especially if your hashed files are only 4k (default) and you need to go through the pain of frame linkage.

2) Why DIM then matparse into a dynamic array when you can just read an item into a dynamic array?

3) Maybe you meant MatRead, in which case you can skip the count and related Dim and just do this:

    DIM BLOCK()
    MATREAD BLOCK FROM ...

The MatRead will automatically dimension the block array for you.

For reference, I've used %read on files that are tens of gigabytes in size with hundreds of thousands of wide lines, blocking with small buffers in a manner similar to what I described in my other post on this topic. (Be sure to convert (CR)LF to @AM.)

The only real way to work out the "best" method of processing files like this is to try a few methods and see if the performance is reasonable for your specific application. You might find that a simple Read with a For/Next through attributes is fine. You will certainly find that with large strings, FlashBASIC is MUCH better than non-flashed code, maybe good enough to preclude anything but the most simple approach.

T
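For anyone wanting a baseline to time against, here is a minimal sketch of the "simple Read with a For/Next" approach. It is an illustration, not tested code from this thread: "DOS:/PATH" and "FILE.EXT" are placeholder names, the tab delimiter is assumed from the original question, and the field number and the variable FLD2 are hypothetical.

```basic
* Sketch only: placeholder path and item name, assumed tab-delimited lines.
OPEN "DOS:/PATH" TO FV ELSE STOP
READ ITEM FROM FV,"FILE.EXT" ELSE STOP
CT = DCOUNT(ITEM,@AM)
FOR I = 1 TO CT
   LINE = ITEM<I>                  ;* extract attribute I from the whole item
   FLD2 = FIELD(LINE,CHAR(9),2)    ;* hypothetical: second tab-delimited field
   * ... use FLD2 here ...
NEXT I
```

Each ITEM<I> extract has to scan from the start of the item, so this is the version that slows down as the line count grows. Running it both flashed and non-flashed shows whether anything more elaborate is needed.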
#14
As usual a totally ill-conceived rant.

When are you going to get your head around the DOS: operation which converts CRLF to AM at lightning speed?

Why use DIM? The fact that a dimensioned matrix operation is vastly faster than an extract.

Why bring it into D3? To save it for permanent record and possible further analysis, of course.

As for %open, that only became reliable in version 9.1. I had to downgrade it in a hurry when it started bombing a client's user count.

Why specify the DIM? For clarity for future maintenance, plus one needs the count to process the array.

As for 25 seconds to process the request, that is abysmal and indicates something else is astray.
#15
Excalibur21 wrote:
> As usual a totally ill-conceived rant.

Your recommendation:

    COPY DOS:/PATH/FILE.EXT
    TO: (HASHED.FILE
    OPEN "HASHED.FILE" TO FV ELSE STOP
    READ ITEM FROM FV,"FILE.EXT" ELSE STOP
    DIM ARRAY()
    CT = DCOUNT(ITEM,@AM)
    DIM ARRAY(CT)
    MATPARSE ARRAY FROM ITEM

My recommendation:

    OPEN "DOS:/PATH" TO FV ELSE STOP
    DIM ARRAY()
    MATREAD ARRAY FROM FV,"FILE.EXT" ELSE STOP

Your TCL command plus 6-line program just got reduced to 3 lines. Now, if that doesn't work, say so, but "totally ill-conceived rant"?

> When are you going to get your head around the DOS: operation which converts CRLF to AM at lightning speed?

Get my head around it? I use and recommend using OSFI all the time. I'm the only one here who has tried for years to get PS/RD/TL to make better use of it. You didn't notice that my recommendation to "read it from OS space" implied using DOS: as shown above.

> Why use DIM? The fact that a dimensioned matrix operation is vastly faster than an extract.

I didn't say "Why use DIM?". My sentence was longer. Read what I wrote, then look at my code. I'm happy to be corrected, but please correct something I actually said.

> Why bring it into D3? To save it for permanent record and possible further analysis, of course.

Of course? You just changed the definition of the task and invalidated most of the suggestions in this thread. And pulling a large item into frame space could just create unnecessary burden on the system, especially for file-saves. If that were a part of the task definition, you could consider just leaving it in the host OS "for permanent record and possible further analysis" ... of course.

> As for %open, that only became reliable in version 9.1. I had to downgrade it in a hurry when it started bombing a client's user count.

If you've found an environment-specific bug, thanks for reporting it here. But we're talking about how to use the technology. A release or platform-specific issue doesn't negate the concept. "Only became reliable in version 9.1"? Seriously? %open has been a part of the system for over 15 years. Oh yeah, you do realize that 9.1 doesn't exist yet, right?

> Why specify the DIM? For clarity for future maintenance, plus one needs the count to process the array.

Again, you didn't read the rest of my sentence, and no, you don't need a count; see my code above.

> As for 25 seconds to process the request, that is abysmal and indicates something else is astray.

Well, you're not responding to my post there. I think you're responding to Rob. It sounds to me like the following factors are affecting his performance:

1) Not flashed.

2) 20 seconds (not 25, according to Rob) seems to include item read time, which should be separated from processing time.

3) A sequential index through 20 thousand attributes is always going to be painful. Try the Delete(var,1) trick and always operate on atb1, or use one of the other methods discussed here.

4) Rather than an extract and multiple field statements on each line, consider converting delimiters (commas?) to system delimiters (@vm). The pain of parsing might be reduced by referencing dynamic array elements.

One could also read in the whole block, then maybe for/next through the block to break it into something like 5 blocks of 4000 lines (watching not to break lines). The pain of reaching out from atb1 to 4001, 4002, 4003 ... 19999... will be reduced because there will only be 4000 attributes per block. The overhead of busting up the block is trivial compared to parsing every single time through the entire block.

There are so many ways to skin this cat. But citing any more could be perceived as a totally ill-conceived rant, as usual...

T
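Points 3 and 4 above could be combined roughly as follows. This is a hedged sketch rather than tested code: it assumes ITEM already holds the file with lines separated by @AM, that the fields are tab-delimited as in the original question, and the field positions and the names F1 and F3 are hypothetical.

```basic
* Sketch only: ITEM is assumed to be read already, with @AM between lines.
CONVERT CHAR(9) TO @VM IN ITEM     ;* tabs become value marks, once, up front
LOOP WHILE ITEM # "" DO
   LINE = ITEM<1>                  ;* always operate on attribute 1
   ITEM = DELETE(ITEM,1)           ;* drop the line just taken
   F1 = LINE<1,1>                  ;* hypothetical: first field of the line
   F3 = LINE<1,3>                  ;* hypothetical: third field of the line
   * ... process F1 and F3 here ...
REPEAT
```

The loop never indexes deep into the item, and the single CONVERT replaces the repeated field calls on every line. Whether it beats MATREAD into a dimensioned array depends on the data and on whether the program is flashed, so time both.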