dbTalk Databases Forums  

Sorting a dynamic array

comp.databases.pick comp.databases.pick


Discuss Sorting a dynamic array in the comp.databases.pick forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
michael@preece.net
 
Posts: n/a

Default Sorting a dynamic array - 09-28-2005 , 08:12 PM






Hi

I'm interested in finding out the quickest and simplest way to sort a
dynamic array. Googling cdp has been interesting but inconclusive. I
did find this by Robert Herbin : http://tinyurl.com/8d72g which was
interesting.

I'd like to write a subroutine that can be called with 3 input
arguments:
ARG(1) = Array to be sorted (in which each attribute represents a line
to be sorted, each consisting of a number of fields delimited by value
marks - column 1 being mv 1 in all atts, etc.)
ARG(2) = Multivalued list of columns on which sort is to operate
ARG(3) = Associated mv'd list of sort types
("AR"/"ARS"/"AL"/"DR"/"DRS"/"DL")
and one output argument:
ARG(4) = Sorted array

Note the requirement to sort BY...BY...BY...etc..

If we work together on this hopefully we can come up with a generic
subroutine that can be called with confidence on any MV platform to
work most efficiently and simply whatever the platform and size of the
array. Everybody wins.

There are a number of ways to go.

1. Write code using the common or garden nested
LOCATE...BY...ELSE...INSERT construct. I believe this would be slow -
especially if the array is large.

2. Write an alternative using a variation on the QuickSort code posted
several times before to this ng (google for "QuickSort"). Quicker than
LOCATEs I believe.

3. Write a null item for each line out to a hashed work-file, with an
item id composed of just the columns to be sorted, each column
delimited by a character that is not present in the original array,
create dictionary items for each partial id, and do
SSELECT...LOOP...READNEXT etc. code to build a sorted array.
Cumbersome, probably quicker for large arrays - but probably not
quickest. Note that Doug Dumitru had some good thoughts on this
approach a while back : http://tinyurl.com/cky94

4. Swap value marks to tabs and attribute marks for line-feeds and
write the array out to an OS file, construct and execute a unix sort
command, read the sorted array/file in and pass it back. I suspect this
would be quick (possibly quickest) with large arrays. (Btw - My Unix
skills are not great I'm afraid. From my interpretation of the man
pages I would construct a command something like this :
"sort -T /udttmp -o myDIR/mySortedFileForPortnnnn -k1,1 -k2,2 -k3,3
-k4,4n myDIR/myWorkFileForPortnnnn"
It looks pretty ugly to me (as do most *nix commands imo) and I would
welcome suggestions on how to improve it.) Thinking more about this,
maybe it would be better to write the data to disk before calling the
subroutine and just pass the file-path instead, and read the sorted
list after returning. I'm thinking that the array could potentially be
huge and it might be inefficient to pass it back and forth. Efficiency
vs ease-of-use??

Has anyone done benchmarks to find out which of these alternatives is
best with arrays of various sizes? It would also be interesting to find
out which variation performs best on different platforms. I'm thinking
of the efficiency of the LOOP...REMOVE construct in U2 in particular.

Maybe if we come up with the perfect subroutine we could post the
resultant source to PickSource or somewhere.

Cheers
Mike.


Reply With Quote
  #2  
Old   
Scott Ballinger
 
Posts: n/a

Default Re: Sorting a dynamic array - 09-28-2005 , 08:29 PM






D3 already has a basic sort function. Very fast. On newer releases
(>7.0) it works like locate, with AR/AL/DR/DL options (despite the
WARNINGS in the REF). Here is the REF entry:

d3.doc basic.sort
token SORT
syntax SORT( string.expression )
SORT( string.expression, ac.expression, vc.expression,
sequence.expression )
category BASIC
type Function
terse Sorts a string of attributes or values.
desc sorts an attribute or value mark delimited string.expression in
ascending order.

If the expression contains both attribute and value marks,
this
function replaces all marks with the type first encountered.

On release 7.0 and above, a second sort function is available
that works in a manner similar to the "locate" statement. On
this expanded version, the user may specify in which
attribute,
and or value the list resides as well as the sort sequence to
use.
options
see.also statements & functions (BASIC)
u1072 (BASIC)

LOCATE (BASIC)
example equ vm to char(253)
list='zebra':vm:'albatross':vm:'gooney bird':vm:'elephant'
newlist=sort(list)

The variable "newlist" contains the string:

albatross]elephant]gooney bird]zebra

The bracket ( ] ) represents a value mark.
warnings The sort function performs a left-justified string sort only.
For example:

vm = char(253)
string = "1" : vm : "2" : vm : "11" : vm : "100"
crt "before " : string
string = sort(string)
crt "after " : string

displays:

before 1]2]11]100

string = "1" : vm : "2" : vm : "11" : vm : "100"
crt "before " : string
string = sort(string)
crt "after " : string

displays:

before 1]2]11]100
after 1]100]11]2

vm = char(253)
string = "-1": vm: "-21": vm: "-3"
crt "before ": string
string = sort(string)
crt "after ": string

displays:

before -1]-21]-3
after -1]-21]-3
compat D3 7.0
AP
print.rules



/Scott Ballinger
Pareto Corporation
Edmonds WA USA
206 713 6006

michael (AT) preece (DOT) net wrote:
Quote:
Hi

I'm interested in finding out the quickest and simplest way to sort a
dynamic array. Googling cdp has been interesting but inconclusive. I
did find this by Robert Herbin : http://tinyurl.com/8d72g which was
interesting.

I'd like to write a subroutine that can be called with 3 input
arguments:
ARG(1) = Array to be sorted (in which each attribute represents a line
to be sorted, each consisting of a number of fields delimited by value
marks - column 1 being mv 1 in all atts, etc.)
ARG(2) = Multivalued list of columns on which sort is to operate
ARG(3) = Associated mv'd list of sort types
("AR"/"ARS"/"AL"/"DR"/"DRS"/"DL")
and one output argument:
ARG(4) = Sorted array

Note the requirement to sort BY...BY...BY...etc..

If we work together on this hopefully we can come up with a generic
subroutine that can be called with confidence on any MV platform to
work most efficiently and simply whatever the platform and size of the
array. Everybody wins.

There are a number of ways to go.

1. Write code using the common or garden nested
LOCATE...BY...ELSE...INSERT construct. I believe this would be slow -
especially if the array is large.

2. Write an alternative using a variation on the QuickSort code posted
several times before to this ng (google for "QuickSort"). Quicker than
LOCATEs I believe.

3. Write a null item for each line out to a hashed work-file, with an
item id composed of just the columns to be sorted, each column
delimited by a character that is not present in the original array,
create dictionary items for each partial id, and do
SSELECT...LOOP...READNEXT etc. code to build a sorted array.
Cumbersome, probably quicker for large arrays - but probably not
quickest. Note that Doug Dumitru had some good thoughts on this
approach a while back : http://tinyurl.com/cky94

4. Swap value marks to tabs and attribute marks for line-feeds and
write the array out to an OS file, construct and execute a unix sort
command, read the sorted array/file in and pass it back. I suspect this
would be quick (possibly quickest) with large arrays. (Btw - My Unix
skills are not great I'm afraid. From my interpretation of the man
pages I would construct a command something like this :
"sort -T /udttmp -o myDIR/mySortedFileForPortnnnn -k1,1 -k2,2 -k3,3
-k4,4n myDIR/myWorkFileForPortnnnn"
It looks pretty ugly to me (as do most *nix commands imo) and I would
welcome suggestions on how to improve it.) Thinking more about this,
maybe it would be better to write the data to disk before calling the
subroutine and just pass the file-path instead, and read the sorted
list after returning. I'm thinking that the array could potentially be
huge and it might be inefficient to pass it back and forth. Efficiency
vs ease-of-use??

Has anyone done benchmarks to find out which of these alternatives is
best with arrays of various sizes? It would also be interesting to find
out which variation performs best on different platforms. I'm thinking
of the efficiency of the LOOP...REMOVE construct in U2 in particular.

Maybe if we come up with the perfect subroutine we could post the
resultant source to PickSource or somewhere.

Cheers
Mike.


Reply With Quote
  #3  
Old   
michael@preece.net
 
Posts: n/a

Default Re: Sorting a dynamic array - 09-29-2005 , 03:32 AM



Can you do me a favour (I don't have access to a D3 machine atm) and
check it for speed against SSELECT and *nix "sort" using arrays of
different sizes? It would be great for those working with D3 if it
proved to be fastest regardless of the size of the array. Also - has it
got the ARS/DRS functionality - for dealing with signed and floating
point numbers? I suppose it's faster than QuickSort - is it?

I'm working on UniData at the moment.

I should have some code to post for cutting/pasting/compiling for
benchmarks soon... Oh, and the code will cater for sorts down to 6
levels (SORT...BY...BY...BY...BY...BY...BY). I'll be looking forward
for people's criticisms.

Cheers,
Mike.


Reply With Quote
  #4  
Old   
Tony Gravagno
 
Posts: n/a

Default Re: Sorting a dynamic array - 09-29-2005 , 05:37 AM



Mike, if I may just offer one design criticism: I think it's best to
use attributes as columns, and values as rows, unless you have some
ungodly number of rows, in which case this doesn't belong in a single
item anyway. You'll recognize this as being more in line with how we
use attributes as columns, etc...

BTW, this is yet another great project for mvDevCentral.com.

You don't need source code to start a project. Just start a project,
create a forum, and you can discuss your project with other interested
parties in the forum. You can use your Project Home Page as a
reference to draw people's interest. You can post the latest code on
your homepage or you can use the CVS and Files functionality so
everyone can see and download the latest version of the project. Not
worth a real project? Use the Code Snippets function to just post and
forget some proggies that you love and want to share.

Ref your other thread - consider a series of links as a project and
let people contribute via a project forum. This is how Yahoo got
started. I remember when they were just a single page of "cool
links", and The Amazing FishCam was right at the top. I remember
using NCSA Mosaic, the predecessor to Netscape/Mozilla, on my Windows
3.1 over a 2400baud dialup (much better than the 110 baud dialups back
in school). I remember using Gopher and WAIS and ... oh, I'm sorry, I
drifted off. Anyway, lots of good reasons to use mvDevCentral.com.

T

michael preece wrote:
Quote:
I'm interested in finding out the quickest and simplest way to sort a
dynamic array. [snip]
Maybe if we come up with the perfect subroutine we could post the
resultant source to PickSource or somewhere.


Reply With Quote
  #5  
Old   
michael@preece.net
 
Posts: n/a

Default Re: Sorting a dynamic array - 09-29-2005 , 06:04 PM



Quote:
I think it's best to use attributes as columns, and values as rows, unless you have some ungodly number of rows, in which case this doesn't belong in a single item anyway.
You're right (again) of course, but...

The requirements in front of me atm are to deal with arrays like this:

London]John Smith]Closed]3
Sydney]Adam Jones]Open]1
Mumbai]Dipak Patel]Closed]5
Auckland]John Gray]Closed]1
....etc - and to sort by column 1 by column 2 by column 3

and...

220]Ian Jackson]Closed]A]Joe Bloggs
221]Ian Jackson]Open]C]Scott Ball
186]Mike Kezman]Closed]B]Joe Delany
429]Ian Jackson]Closed]D]Scott Ball
....etc - and to sort by column 3 by column 4

and other similarly structured arrays with various sort order
requirements.

The data doesn't actually originate from a Pick file. There is no limit
on the number of lines/rows I'll have to deal with. I had a very
similar requirement once before when, coincidentally, the data also
didn't originate from a Pick file. In the other situation the data
originated from scanned forms. This time it's coming from a front-end
VB app. In both situations the arrays could potentially be huge.

If the data was in the form:

London]Sydney]Mumbai]Auckland]...etc
John Smith]Adam Jones]Dipak Patel]John Gray]...etc
Closed]Open]Closed]Closed]...etc
3]1]5]1]...etc

which I agree would be the most likely form if the data came from a
Pick file, the task of sorting by city by name by status is also an
interesting problem - but one which I can't address just now.

I'll check out mvDevCentral.com soon, but right now I must get on.
http://groups.google.com.au/group/comp.databases.pick is my homepage on
my desktop here at work and after this quick visit I'm off to get some
work done. [sighs - Gee it would be nice to have a computer at home
again. One day soon].

Cheers
Mike.



Reply With Quote
  #6  
Old   
GVP
 
Posts: n/a

Default Re: Sorting a dynamic array - 09-30-2005 , 12:37 AM



Fof unique sorting, You can use quicSort method (Niterations ~ Ln(n)*n)
see on PickSource.com . You can modify it and use any order and
comparision. If array is not dimensioned and have huge elements or size
then You can using sorting by references array (i.e. actualy sorts
array that contain indexes to huge array). Other way for very huge
array(file) is sorting array of ItemsId in dimensioned array.

Regards, Grigory.


Reply With Quote
  #7  
Old   
Tony Gravagno
 
Posts: n/a

Default Re: Sorting a dynamic array - 09-30-2005 , 07:58 PM



I'm not too keen on solutions that require reformatting, but since
this data doesn't come from MV in the first place, why can't you
reformat the data and then use indexing to get your optimization?

Regarding your statement "There is no limit on the number of
lines/rows I'll have to deal with": that scares me because it means
anytime you read such a record you're going to face lots of issues
with frame chasing, memory shuffling, etc. I really recommend you
reorganize that data and then delete these unmanagable items. You're
trying to apply Pick data management structures to what is essentially
a blob of data. You can probably brute-force some solution but as you
well know, this isn't the sort of data that the system was designed to
work on.

Use entirely unique, non-numeric keys, with each MV attribute that you
document below containing the data from the MV's. Or (I don't like
this idea, but here it is) you can convert spaces to ! or something
similar and just create key-only files composed of all of your
delimited data. A simple index on all fields will allow very fast
sorting, and you can even create complex indexes of common fields so
you can get very fast By-B,By-C organization.

Sometimes the solution to a problem is avoiding the problem in the
first place. Maybe I'm being too simplistic, sorry.

T

michael (AT) preece (DOT) net wrote:

Quote:
I think it's best to use attributes as columns, and values as rows, unless you have some ungodly number of rows, in which case this doesn't belong in a single item anyway.

You're right (again) of course, but...

The requirements in front of me atm are to deal with arrays like this:

London]John Smith]Closed]3
Sydney]Adam Jones]Open]1
Mumbai]Dipak Patel]Closed]5
Auckland]John Gray]Closed]1
...etc - and to sort by column 1 by column 2 by column 3

and...

220]Ian Jackson]Closed]A]Joe Bloggs
221]Ian Jackson]Open]C]Scott Ball
186]Mike Kezman]Closed]B]Joe Delany
429]Ian Jackson]Closed]D]Scott Ball
...etc - and to sort by column 3 by column 4

and other similarly structured arrays with various sort order
requirements.

The data doesn't actually originate from a Pick file. There is no limit
on the number of lines/rows I'll have to deal with. I had a very
similar requirement once before when, coincidentally, the data also
didn't originate from a Pick file. In the other situation the data
originated from scanned forms. This time it's coming from a front-end
VB app. In both situations the arrays could potentially be huge.

If the data was in the form:

London]Sydney]Mumbai]Auckland]...etc
John Smith]Adam Jones]Dipak Patel]John Gray]...etc
Closed]Open]Closed]Closed]...etc
3]1]5]1]...etc

which I agree would be the most likely form if the data came from a
Pick file, the task of sorting by city by name by status is also an
interesting problem - but one which I can't address just now.

I'll check out mvDevCentral.com soon, but right now I must get on.
http://groups.google.com.au/group/comp.databases.pick is my homepage on
my desktop here at work and after this quick visit I'm off to get some
work done. [sighs - Gee it would be nice to have a computer at home
again. One day soon].

Cheers
Mike.


Reply With Quote
  #8  
Old   
michael@preece.net
 
Posts: n/a

Default Re: Sorting a dynamic array - 10-04-2005 , 01:41 AM



Yes - the format (line items - rows - delimited by CR containing fields
- columns - delimited by tabs) is not a natural fit in Pick but is
pretty much everywhere else. Not surprisingly, my benchmarks showed
what you (and I) would have expected - that a Unix !sort was by far and
away the most efficient way to deal with it. I didn't get around to
trying CREATE.INDEX (and BUILD.INDEX on UniData) for the file in the
SSELECT test yet. Maybe someone else will have a go - although I'd be
surprised if that beat the !sort. Here are the results I got:

SORT.BENCHMARK

No. of Sort Initialisation Method 1 Method 2 Method 3
Method 4
Items Depth LOCATEs QuickSort SSELECT
UNIX !sort


1000 1 2 115 153 252
34
1000 2 2 141 165 231
34
1000 3 2 157 170 258
36
--------------------------------------------------------------------------------------------
1000 4 2 187 181 282
36
1000 5 2 207 231 255
39
2000 1 2 414 298 379
45
2000 2 3 508 324 420
43
2000 3 3 568 334 404
49
2000 4 3 683 402 440
50
2000 5 3 747 373 428
58
3000 1 4 903 498 604
60
3000 2 4 1231 553 685
85
3000 3 4 1267 526 611
59
3000 4 4 1499 556 633
65
3000 5 4 1664 577 1089
167
4000 1 6 1554 667 821
79
4000 2 5 1940 722 852
69
4000 3 5 2185 756 851
74
4000 4 6 2615 783 948
79
4000 5 5 2904 804 948
96
5000 1 6 48614 865 1115
117
5000 2 7 2993 946 1106
81
5000 3 6 3447 979 1143
88
5000 4 7 4163 984 1137
93
5000 5 6 4863 1042 1282
109
6000 1 7 3546 1115 1402
115
6000 2 7 4347 1159 1414
93
6000 3 7 4916 1233 1459
98
6000 4 8 6257 1253 1495
111
6000 5 8 6461 1287 1514
135
7000 1 8 4866 1192 1791
133
7000 2 8 6056 1423 1757
110
7000 3 8 6587 1362 1873
155
7000 4 9 8085 1416 1762
120
7000 5 9 9479 1453 1822
144
8000 1 9 6281 1440 2347
151
8000 2 10 7463 1640 2030
113
8000 3 9 8898 1623 2167
123
8000 4 10 10196 1746 2090
132
8000 5 10 11695 1765 2321
156
9000 1 10 8189 1695 2434
166
9000 2 11 9807 1807 2411
126
9000 3 11 10773 1874 2581
137
9000 4 11 19348 2005 2540
146
9000 5 10 14811 1973 2569
175
10000 1 12 9748 1847 2782
263
10000 2 12 11545 2029 2807
134
10000 3 12 13153 2040 2886
146
10000 4 12 16000 2171 2930
154
10000 5 12 18282 2265 3081
190

The very first line here shows that with just 1000 items to sort - by
just a single column - the UNIX !sort was already the most efficient
method. By the time it got to sorting 10,000 items by 5 levels (BY COL1
BY COL2 BY COL3 BY COL4 BY COL5) it was somewhere around 100 times as
quick. QuickSort stood up pretty well in that it turned out to be
quicker than a SSELECT up to the limits of my tests.

Here's the code I used:

Note that it was written and tested on UniData - so expect to have to
make changes if you want to compile and run it on another Pick flavour.
Note also the technique I used to use LOCATEs down to 3 levels below
subvalue level. I welcome suggestions for improvements generally and
this might be one of many areas that can be improved upon. The modified
version of QUICK.SORT allows the dimensioned array to be sorted by any
or all of up to 6 columns (where each element represents a line
containing up to 6 line/column cells). The code used to create the OS
sequential file for the !sort is also specifically UniBasic and would
have to be modified according to MV/Pick flavour.

Oh.. and those of you that like to criticise what they see as bad code
or poor programming practice - don't hold back. Let me have it. We can
all learn and improve, except that if you're one of those that can't
stand the use of EXIT and CONTINUE in loops - I'm sorry to upset you
but I like them. The main point of this thread is so that we can end up
with the most efficient mechanism to sort lists. I realise the type of
lists I'm dealing with here are not the usual Pick structure - in that
I have up to 6 multivalues in a larger number of attributes, rather
than up to 6 attributes each containing a larger number of multivalues
- but I hope to get on to that next.

BP: SORT.BENCHMARK
===================
* This constructs the test array and calls the subroutine to test the
various sort methods:
*

* Construct a dynamic array containing 10,000 lines.

*

* Each line to have 5 multivalues.

* MV 1, representing column 1, will be a signed integer in the range
-9999 to 9999

* Column 2 will be an unsigned integer in the range 0 to 9999

* Column 3 will be a signed number in the range -9999.9999 to 9999.9999

* Column 4 will be an unsigned number in the range 0.1 to 9999.9999

* Column 5 will be a string of between 1 and 5 printable characters

*

POSNEG='-'

UNSORTED.ARRAY=''

DIM UNSORTED.MAT(10000)

MAT UNSORTED.MAT=''

FOR X=1 TO 10000

SIGN=RND(2)+1 ;* 1=negative / 2=positive (no sign)

UNSORTED.MAT(X)<1,1>=POSNEG[SIGN,1]:RND(10000) ;* Signed integer

UNSORTED.MAT(X)<1,2>=POSNEG[SIGN,1]:RND(10000):'.':RND(9999)+1 ;*
Signed numeric
UNSORTED.MAT(X)<1,3>=RND(10000) ;* Unsigned integer

UNSORTED.MAT(X)<1,4>=RND(10000):'.':RND(9999)+1 ;* Unsigned numeric

CHCNT=RND(5)+1 ;* Number of characters (1-5)

FOR Y=1 TO CHCNT

UNSORTED.MAT(X)<1,5>:=CHAR(32+RND(94)) ;* ASCII 32-125 (printable
chars)
NEXT Y

NEXT X

*

SORT.TYPES='AR'

SORT.TYPES<2>='DR'

SORT.TYPES<3>='AR'

SORT.TYPES<4>='DR'

SORT.TYPES<5>='AL'

* Note that some of the above codes will vary according to MV flavour -

* D3 supports ARS & DRS for sorting numerics with signs and/or decimals

* UV supports AN & DN (ditto)

*

CRT 'No. of Sort Initialisation Method 1 Method 2 Method 3
Method 4'

CRT 'Items Depth LOCATEs QuickSort SSELECT
UNIX !sort'
CRT

FOR ARRAY.SIZE=1000 TO 10000 STEP 1000

MATBUILD UNSORTED.ARRAY FROM UNSORTED.MAT,1,ARRAY.SIZE

COLS.TO.SORT.BY=''

SORT.BY=''

*

* Tests to be run for sorting by 1 column...

* then by column 1 by column 2...

* then by column 1 by column 2 by column 3...
etc.
*

FOR NO.OF.COLS.TO.SORT.BY=1 TO 5

COLS.TO.SORT.BY<1,NO.OF.COLS.TO.SORT.BY>=NO.OF.COL S.TO.SORT.BY

SORT.BY<1,NO.OF.COLS.TO.SORT.BY>=SORT.TYPES<NO.OF. COLS.TO.SORT.BY>

CALL TEST.SORT.METHODS(UNSORTED.ARRAY, COLS.TO.SORT.BY, SORT.BY,
SORTED.ARRAY)

NEXT NO.OF.COLS.TO.SORT.BY

NEXT ARRAY.SIZE

STOP

BP: TEST.SORT.METHODS
=====================

SUBROUTINE TEST.SORT.METHODS(UNSORTED.ARRAY, COLS.TO.SORT.BY, SORT.BY,
SORTED.ARRAY)
* UNSORTED.ARRAY is a dynamic array in which each attribute represents
a line and each multivalue a column/line cell
* COLS.TO.SORT.BY is a multivalued list of the columns UNSORTED.ARRAY
is to be sorted by
* - so to sort by column 4 by column 2 by column 3, it would be 4]2]3
* SORT.BY is an associated multivalued list of the type of sort for
each column (eg. AR]AL]AR)
* SORTED.ARRAY is the resultant sorted dynamic array to be returned
CHECKPOINT.0=SYSTEM(12)+1

*

SORTED.ARRAY=''

*

NO.OF.COLS=DCOUNT(COLS.TO.SORT.BY,@VM)

IF NO.OF.COLS>6 THEN

CRT 'This routine can only cope with sorts down to 6 levels. Sorry.'

DEBUG

RETURN

END

NO.OF.TYPES=DCOUNT(SORT.BY,@VM)

IF NO.OF.COLS#NO.OF.TYPES THEN

CRT 'Number of COLS.TO.SORT.BY (':NO.OF.COLS:') differs from Number of
SORT.BY (':NO.OF.TYPES:')'

DEBUG

RETURN

END

*
LINE.COUNT=DCOUNT(UNSORTED.ARRAY,@AM)

DIM UNSORTED.MAT(LINE.COUNT)

MATPARSE UNSORTED.MAT FROM UNSORTED.ARRAY,@AM

*

REFERENCE.ARRAY=''

*

DIM WORK.MAT(6)

MAT WORK.MAT=''

*************************

*

*

EXECUTE 'UDT.OPTIONS 85 ON' CAPTURING CRTOUT

CHECKPOINT.1=SYSTEM(12) ;* Time in milliseconds

CRT LINE.COUNT 'R#6':NO.OF.COLS 'R#8'CHECKPOINT.1-CHECKPOINT.0+1)
'R#17':
*

* Method 1 : Nested LOCATE...BY...ELSE...INSERT construct.

*

FOR LC=1 TO LINE.COUNT

ARRAY.LINE=UNSORTED.MAT(LC)

FOR COL.NO=1 TO NO.OF.COLS

FIELD.DATA=ARRAY.LINE<1,COLS.TO.SORT.BY<1,COL.NO>>

TYPE.OF.SORT=SORT.BY<1,COL.NO>

IF COL.NO=NO.OF.COLS THEN
MUST.INSERT=1

END ELSE

MUST.INSERT=0

END

NEW.FIELD=0

BEGIN CASE

CASE COL.NO=1

* LOCATE FIELD.DATA IN WORK.MAT(1),1 BY TYPE.OF.SORT SETTING AMPOS
ELSE NEW.FIELD=1

*** UniData version of LOCATE in use - change as reqd

LOCATE FIELD.DATA IN WORK.MAT(1)<1> BY TYPE.OF.SORT SETTING AMPOS
ELSE NEW.FIELD=1

IF MUST.INSERT OR NEW.FIELD THEN

INS FIELD.DATA BEFORE WORK.MAT(1)<AMPOS>

FOR OTHER.COLS=2 TO NO.OF.COLS

INS ARRAY.LINE<1,OTHER.COLS> BEFORE WORK.MAT(OTHER.COLS)<AMPOS>

NEXT OTHER.COLS

INS LC BEFORE REFERENCE.ARRAY<AMPOS>

EXIT

END

CASE COL.NO=2

* LOCATE FIELD.DATA IN WORK.MAT(2)<AMPOS>,1 BY TYPE.OF.SORT SETTING
VMPOS ELSE NEW.FIELD=1
*** UniData version of LOCATE in use - change as reqd

LOCATE FIELD.DATA IN WORK.MAT(2)<AMPOS,1> BY TYPE.OF.SORT SETTING
VMPOS ELSE NEW.FIELD=1

IF MUST.INSERT OR NEW.FIELD THEN

INS FIELD.DATA BEFORE WORK.MAT(2)<AMPOS,VMPOS>

FOR OTHER.COLS=3 TO NO.OF.COLS

INS ARRAY.LINE<1,OTHER.COLS> BEFORE
WORK.MAT(OTHER.COLS)<AMPOS,VMPOS>
NEXT OTHER.COLS

INS LC BEFORE REFERENCE.ARRAY<AMPOS,VMPOS>

EXIT

END

CASE COL.NO=3

* LOCATE FIELD.DATA IN WORK.MAT(3)<AMPOS,VMPOS>,1 BY TYPE.OF.SORT
SETTING SVMPOS ELSE NEW.FIELD=1

*** UniData version of LOCATE in use - change as reqd

LOCATE FIELD.DATA IN WORK.MAT(3)<AMPOS,VMPOS,1> BY TYPE.OF.SORT
SETTING SVMPOS ELSE NEW.FIELD=1

IF MUST.INSERT OR NEW.FIELD THEN

INS FIELD.DATA BEFORE WORK.MAT(3)<AMPOS,VMPOS,SVMPOS>

FOR OTHER.COLS=4 TO NO.OF.COLS

INS ARRAY.LINE<1,OTHER.COLS> BEFORE
WORK.MAT(OTHER.COLS)<AMPOS,VMPOS,SVMPOS>

NEXT OTHER.COLS
INS LC BEFORE REFERENCE.ARRAY<AMPOS,VMPOS,SVMPOS>

EXIT

END

CASE COL.NO=4

UNPACKED=CONVERT(CHAR(251),@AM,WORK.MAT(4)<AMPOS,V MPOS,SVMPOS>)

* LOCATE FIELD.DATA IN UNPACKED,1 BY TYPE.OF.SORT SETTING POS251 ELSE
NEW.FIELD=1

*** UniData version of LOCATE in use - change as reqd

LOCATE FIELD.DATA IN UNPACKED<1> BY TYPE.OF.SORT SETTING POS251 ELSE
NEW.FIELD=1

IF MUST.INSERT OR NEW.FIELD THEN

INS FIELD.DATA BEFORE UNPACKED<POS251>

WORK.MAT(4)<AMPOS,VMPOS,SVMPOS>=CONVERT(@AM,CHAR(2 51),UNPACKED)

FOR OTHER.COLS=5 TO NO.OF.COLS


UNPACKED=CONVERT(CHAR(251),@AM,WORK.MAT(OTHER.COLS )<AMPOS,VMPOS,SVMPOS>)

INS ARRAY.LINE<1,OTHER.COLS> BEFORE UNPACKED<POS251>


WORK.MAT(OTHER.COLS)<AMPOS,VMPOS,SVMPOS>=CONVERT(@ AM,CHAR(251),UNPACKED)

NEXT OTHER.COLS

UNPACKED=CONVERT(CHAR(251),@AM,REFERENCE.ARRAY<AMP OS,VMPOS,SVMPOS>)

INS LC BEFORE UNPACKED<POS251>

REFERENCE.ARRAY<AMPOS,VMPOS,SVMPOS>=CONVERT(@AM,CH AR(251),UNPACKED)

EXIT

END

UNPACKED=CONVERT(CHAR(251):CHAR(250),@AM:@VM,WORK. MAT(5)<AMPOS,VMPOS,SVMPOS>)


* LOCATE FIELD.DATA IN UNPACKED<POS251>,1 BY TYPE.OF.SORT SETTING
POS250 ELSE NEW.FIELD=1

*** UniData version of LOCATE in use - change as reqd

LOCATE FIELD.DATA IN UNPACKED<POS251,1> BY TYPE.OF.SORT SETTING
POS250 ELSE NEW.FIELD=1

IF MUST.INSERT OR NEW.FIELD THEN

INS FIELD.DATA BEFORE UNPACKED<POS251,POS250>


WORK.MAT(5)<AMPOS,VMPOS,SVMPOS>=CONVERT(@AM:@VM,CH AR(251):CHAR(250),UNPACKED)


FOR OTHER.COLS=6 TO NO.OF.COLS


UNPACKED=CONVERT(CHAR(251):CHAR(250),@AM:@VM,WORK. MAT(OTHER.COLS)<AMPOS,VMPOS,SVMPOS>)

INS ARRAY.LINE<1,OTHER.COLS> BEFORE UNPACKED<POS251,POS250>


WORK.MAT(OTHER.COLS)<AMPOS,VMPOS,SVMPOS>=CONVERT(@ AM:@VM,CHAR(251):CHAR(250),UNPACKED)

NEXT OTHER.COLS


UNPACKED=CONVERT(CHAR(251):CHAR(250),@AM:@VM,REFER ENCE.ARRAY<AMPOS,VMPOS,SVMPOS>)


INS LC BEFORE UNPACKED<POS251,POS250>


REFERENCE.ARRAY<AMPOS,VMPOS,SVMPOS>=CONVERT(@AM:@V M,CHAR(251):CHAR(250),UNPACKED)
EXIT

END

CASE COL.NO=6


UNPACKED=CONVERT(CHAR(251):CHAR(250):CHAR(249),@AM :@VM:@SVM,WORK.MAT(6)<AMPOS,VMPOS,SVMPOS>)

* LOCATE FIELD.DATA IN UNPACKED<POS251,POS250>,1 BY TYPE.OF.SORT
SETTING POS249 ELSE NEW.FIELD=1

*** UniData version of LOCATE in use - change as reqd

LOCATE FIELD.DATA IN UNPACKED<POS251,POS250,1> BY TYPE.OF.SORT
SETTING POS249 ELSE NEW.FIELD=1

IF MUST.INSERT OR NEW.FIELD THEN

INS FIELD.DATA BEFORE UNPACKED<POS251,POS250,POS249>


WORK.MAT(6)<AMPOS,VMPOS,SVMPOS>=CONVERT(@AM:@VM:@S VM,CHAR(251):CHAR(250):CHAR(249),UNPACKED)


UNPACKED=CONVERT(CHAR(251):CHAR(250):CHAR(249),@AM :@VM:@SVM,REFERENCE.ARRAY<AMPOS,VMPOS,SVMPOS>)

INS LC BEFORE UNPACKED<POS251,POS250,POS249>


REFERENCE.ARRAY<AMPOS,VMPOS,SVMPOS>=CONVERT(@AM:@V M:@SVM,CHAR(251):CHAR(250):CHAR(249),UNPACKED)

END

END CASE

NEXT COL.NO

NEXT LC
*

*

LOOP

REMOVE REF FROM REFERENCE.ARRAY SETTING MORE.DATA

SORTED.ARRAY<-1>=UNSORTED.MAT(REF)

WHILE MORE.DATA DO REPEAT

*

CRT (SYSTEM(12)-CHECKPOINT.1+1) 'R#11':

SORTED.ARRAY1=SORTED.ARRAY

CHECKPOINT.2=SYSTEM(12)

REFERENCE.ARRAY=''

DIM SORT.MAT(LINE.COUNT)

MAT SORT.MAT=MAT UNSORTED.MAT

*

* Method 2 : QuickSort

*

CALL QUICK.SORT.ASSOC(MAT SORT.MAT, COLS.TO.SORT.BY, 1, LINE.COUNT,
SORT.BY)
MATBUILD SORTED.ARRAY FROM SORT.MAT

CHECKPOINT.3=SYSTEM(12)

*

CRT (CHECKPOINT.3-CHECKPOINT.2+1) 'R#12':

CHECKPOINT.3=SYSTEM(12)

*
MODULO=INT(LEN(UNSORTED.ARRAY)/3000)

IF MODULO>LINE.COUNT THEN MODULO=LINE.COUNT

LOOP WHILE INT(MODULO/2)=MODULO/2 OR INT(MODULO/5)=MODULO/5 DO

MODULO-=1

REPEAT

FNAME='SORTWORK':@USERNO

PERFORM 'CREATE-FILE ':FNAME:' ':MODULO CAPTURING CRTOUT

OPEN 'DICT',FNAME TO FDICT ELSE DEBUG

OPEN FNAME TO FDATA ELSE DEBUG

CLEARFILE FDATA

*

* Method 3 : SSELECT

*

DELIM=''

FOR CH=126 TO 33 STEP -1

IF NOT(INDEX(UNSORTED.ARRAY,CHAR(CH),1)) THEN

DELIM=CHAR(CH)

EXIT

END

NEXT CH

IF DELIM='' THEN DEBUG

FOR X=1 TO LINE.COUNT

WRITE '' ON FDATA,CONVERT(@VM,DELIM,UNSORTED.MAT(X))
NEXT X

D='V'

CMD='SSELECT ':FNAME

FOR COL.NO=1 TO NO.OF.COLS

D<2>="FIELD(@ID,'"ELIM:"',":COL.NO:")"

D<4>="Column ":COL.NO

D<5>='10':SORT.BY<1,COL.NO>[2,1]

WRITE D ON FDICT,'F':COL.NO

IF SORT.BY<1,COL.NO>[1,1]='A' THEN

CMD:=' BY'

END ELSE

CMD:=' BY-DSND'

END

CMD:=' F':COL.NO

NEXT COL.NO

* Note : this syntax is correct for UniData - check for other MV
flavours
EXECUTE CMD CAPTURING CRTOUT

READLIST SORTED.ARRAY ELSE DEBUG

SORTED.ARRAY=CONVERT(DELIM,@VM,SORTED.ARRAY)

CRT (SYSTEM(12)-CHECKPOINT.3+1) 'R#11':

CHECKPOINT.4=SYSTEM(12)

*

* Method 4 : Unix sort
*

PERFORM 'CREATE.FILE DIR SORTDIR' CAPTURING CRTOUT

OSOPEN 'SORTDIR/UN':FNAME TO FVAR ELSE

OSWRITE '' ON 'SORTDIR/UN':FNAME

OSOPEN 'SORTDIR/UN':FNAME TO FVAR ELSE DEBUG

END

OSBWRITE CONVERT(@VM:@AM,CHAR(9):CHAR(10),UNSORTED.ARRAY) ON FVAR AT 0

OSCLOSE FVAR

CMD='sort -o SORTDIR/':FNAME

FOR COL.NO=1 TO NO.OF.COLS

CMD:=' -k':COLS.TO.SORT.BY<1,COL.NO>:',':COLS.TO.SORT.BY<1 ,COL.NO>

IF SORT.BY<1,COL.NO>[1,1]='D' THEN CMD:='r'

IF SORT.BY<1,COL.NO>[2,1]='R' THEN CMD:='n'

NEXT COL.NO

CMD:=' SORTDIR/UN':FNAME

PCPERFORM CMD CAPTURING CRTOUT

OSOPEN 'SORTDIR/':FNAME TO FVAR ELSE DEBUG

OSBREAD SORTED.ARRAY FROM FVAR AT 0 LENGTH LEN(UNSORTED.ARRAY)

OSCLOSE FVAR

OSDELETE 'SORTDIR/UN':FNAME

OSDELETE 'SORTDIR/':FNAME

SORTED.ARRAY=CONVERT(CHAR(9):CHAR(10),@VM:@AM,SORT ED.ARRAY)

CHECKPOINT.5=SYSTEM(12)
CRT (CHECKPOINT.5-CHECKPOINT.4+1) 'R#13'

RETURN

BP: QUICK.SORT.ASSOC
====================

SUBROUTINE QUICK.SORT.ASSOC(MAT LIST, COLS.TO.SORT.BY, SORT.FROM,
SORT.TO, SORT.BY)

*// quicksort using dim'd array

*// LIST(n) dimensioned list of elements to be sorted

*// COLS.TO.SORT.BY mulivalue nos. within elements to be sorted

*// SORT.FROM sort from this element number - usually 1

*// SORT.TO sort to this element number - usually n

*// SORT.BY 'AR' or 'DR'

DIM LIST(32000)

LOW.POINT = SORT.FROM

HIGH.POINT = SORT.TO

MID.POINT=INT((SORT.FROM+SORT.TO)/2)

MID.VALUES=''

NO.OF.COLS=DCOUNT(COLS.TO.SORT.BY,@VM)

FOR COL.NO=1 TO NO.OF.COLS

MID.VALUES<1,-1>=LIST(MID.POINT)<1,COLS.TO.SORT.BY<1,COL.NO>>

NEXT COL.NO

LOOP

LOOP

IF LOW.POINT >= SORT.TO THEN EXIT
INCREMENT.LOW=0

FOR COL.NO=1 TO NO.OF.COLS

IF MID.VALUES<1,COL.NO>=LIST(LOW.POINT)<1,COL.NO> THEN CONTINUE

IF SORT.BY<1,COL.NO>[1,1]='A' THEN

IF MID.VALUES<1,COL.NO> > LIST(LOW.POINT)<1,COL.NO> THEN

INCREMENT.LOW=1

END

END ELSE

IF MID.VALUES<1,COL.NO> < LIST(LOW.POINT)<1,COL.NO> THEN

INCREMENT.LOW=1

END

END

EXIT

NEXT COL.NO

IF INCREMENT.LOW THEN

LOW.POINT+=1

END ELSE

EXIT

END

REPEAT

LOOP

IF HIGH.POINT <= SORT.FROM THEN EXIT

DECREMENT.HIGH=0

FOR COL.NO=1 TO NO.OF.COLS

IF MID.VALUES<1,COL.NO>=LIST(HIGH.POINT)<1,COL.NO> THEN CONTINUE

IF SORT.BY<1,COL.NO>[1,1]='A' THEN

IF MID.VALUES<1,COL.NO> < LIST(HIGH.POINT)<1,COL.NO> THEN

DECREMENT.HIGH=1

END

END ELSE

IF MID.VALUES<1,COL.NO> > LIST(HIGH.POINT)<1,COL.NO> THEN

DECREMENT.HIGH=1

END

END

EXIT

NEXT COL.NO

IF DECREMENT.HIGH THEN

HIGH.POINT-=1

END ELSE

EXIT

END

REPEAT

IF (LOW.POINT < HIGH.POINT) THEN ;*// swap elements if required

SWAP.VALUE = LIST(LOW.POINT)

LIST(LOW.POINT) = LIST(HIGH.POINT)

LIST(HIGH.POINT) = SWAP.VALUE

END

IF (LOW.POINT <= HIGH.POINT) THEN

LOW.POINT += 1

HIGH.POINT -= 1

END

WHILE (LOW.POINT <= HIGH.POINT) DO REPEAT

*// sort remainder of the elements

IF (SORT.FROM < HIGH.POINT) THEN

CALL QUICK.SORT.ASSOC(MAT LIST, COLS.TO.SORT.BY, SORT.FROM,
HIGH.POINT, SORT.BY)

END

IF (LOW.POINT < SORT.TO) THEN

CALL QUICK.SORT.ASSOC(MAT LIST, COLS.TO.SORT.BY, LOW.POINT, SORT.TO,
SORT.BY)
END

RETURN

END


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.