![]() | |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
I have a disk file which contains 370,260 records of average length 60 ASCII text characters. I need to sort them on the first 9 characters, which are set off from the rest by a space. Once sorted the first record looks like 100100100 08:35-08:39 4 0 0 1 -115 -1 75 -100 dd17.txt and the last one looks like 500900900 12:41-13:25 44 1 885 0 0 1 925 -200 dd21.txt There is not enough memory in my Win2K/SP4 system to implement the traditional memory-based sorting algorithms, so I need to use a method that is disk-file based. I have plenty of disk space. I imagine that this method would open the input disk file and begin reading records one at a time and performing some kind of sort algorithm placing the result in a temporary output file. Once the original file has been swept, the process starts over and runs again and again for the required number of iterations until the entire list is sorted in ascending order. I will implement this method in standard DOS-based C (Microsoft Optimizing C Compiler "cl") using the Win2K DVM. Suggestions and references please. -- "Nothing in the world can take the place of perseverance. Talent will not; nothing is more common than unsuccessful men with talent. Genius will not; unrewarded genius is almost a proverb. Education will not; the world is full of educated derelicts. Persistence and determination alone are omnipotent." --Calvin Coolidge |
#3
| |||
| |||
|
|
Try using a bubble sort. |
#4
| |||
| |||
|
|
I imagine that this method would open the input disk file and begin reading records one at a time and performing some kind of sort algorithm placing the result in a temporary output file. Once the original file has been swept, the process starts over and runs again and again for the required number of iterations until the entire list is sorted in ascending order. I will implement this method in standard DOS-based C (Microsoft Optimizing C Compiler "cl") using the Win2K DVM. |
#5
| |||
| |||
|
|
Try using a bubble sort. Ehrm no, don't. I sure the OP knows not to use it, but there are others reading this. Bubble sort is a nice (because of easy visualization) sort algorithm for beginners. Nothing more. |
#6
| |||
| |||
|
|
Try using a bubble sort. It is not fast but it works. |
#7
| |||
| |||
|
|
I imagine that this method would open the input disk file and begin reading records one at a time and performing some kind of sort algorithm placing the result in a temporary output file. Once the original file has been swept, the process starts over and runs again and again for the required number of iterations until the entire list is sorted in ascending order. I will implement this method in standard DOS-based C (Microsoft Optimizing C Compiler "cl") using the Win2K DVM. This is a classic Computer Science problem that has been solved many times. An easy way to solve your problem is to just open the source file and bite off as much as you can sort in memory. Sort it. And write an output file. Take another bite, sort, and write to a different output file. Repeat until the entire original file has been bitten off and written to smaller (but internally sorted) temporary files. Now open all the temporary output files. Read the first record from each. Write to your final output file the one record which sorts first. Replenish it with the next record from the file it came from. Repeat until all records in the temporary files have been copied to the output file. If you end up with 1,000 output files and can not open all at once, bite off as many as you can, say 20, and merge them into a single file, and another 20. And then merge the merged files, generation after generation, until you are down to a single file. |
#8
| |||
| |||
|
|
An easy way to solve your problem is to just open the source file and bite off as much as you can sort in memory. Sort it. And write an output file. Take another bite, sort, and write to a different output file. Repeat until the entire original file has been bitten off and written to smaller (but internally sorted) temporary files. Now open all the temporary output files. Read the first record from each. Write to your final output file the one record which sorts first. Replenish it with the next record from the file it came from. Repeat until all records in the temporary files have been copied to the output file. If you end up with 1,000 output files and can not open all at once, bite off as many as you can, say 20, and merge them into a single file, and another 20. And then merge the merged files, generation after generation, until you are down to a single file. I chose the Radix Sort and it works beautifully. I only have to open 11 files at a time - the input or the output file and 10 temporary files. I discover the digit value in the key and use it as an index into an array of FILE pointers. FILE *fp_temp[10]; fprintf(fp_temp[key],"%s",record); |
#9
| |||
| |||
|
|
Citizen Bob wrote: An easy way to solve your problem is to just open the source file and bite off as much as you can sort in memory. Sort it. And write an output file. Take another bite, sort, and write to a different output file. Repeat until the entire original file has been bitten off and written to smaller (but internally sorted) temporary files. Now open all the temporary output files. Read the first record from each. Write to your final output file the one record which sorts first. Replenish it with the next record from the file it came from. Repeat until all records in the temporary files have been copied to the output file. If you end up with 1,000 output files and can not open all at once, bite off as many as you can, say 20, and merge them into a single file, and another 20. And then merge the merged files, generation after generation, until you are down to a single file. I chose the Radix Sort and it works beautifully. I only have to open 11 files at a time - the input or the output file and 10 temporary files. I discover the digit value in the key and use it as an index into an array of FILE pointers. FILE *fp_temp[10]; fprintf(fp_temp[key],"%s",record); Whatever floats your boat. But I think what you describe is clumsy and only barely works in the specific application you describe. Grow the application and it will break as your solution is delicate. You are counting on an even distribution of the data which is already too large to handle in memory (doesn't matter what you use to sort in memory). You are counting on being able to break it into only 10 segments based on a pattern, and that none of those segments will be too large to handle in memory. Your first pass deals the data into 10 different files. Your 2nd pass loads each file in memory to perform the actual sort. Then 3rd pass concatenates the files into one. What I suggested was to take as much as you can and sort it, then write a temporary file. Repeat until all data is now in multiple temporary files, the contents of each is sorted. Then open all of these files and your first data record will be the first data record from one of those files. Each temporary file is now ordered so your final sort only has to compare the current record from each. If you have more temporary files than you can open simultaneously then the procedure can recurse, merging temporary files until a single file remains. If you like having names for things, splitting into multiple sorted files and putting them back together is called a merge sort. |
#10
| |||
| |||
|
|
Citizen Bob wrote: An easy way to solve your problem is to just open the source file and bite off as much as you can sort in memory. Sort it. And write an output file. Take another bite, sort, and write to a different output file. Repeat until the entire original file has been bitten off and written to smaller (but internally sorted) temporary files. Now open all the temporary output files. Read the first record from each. Write to your final output file the one record which sorts first. Replenish it with the next record from the file it came from. Repeat until all records in the temporary files have been copied to the output file. If you end up with 1,000 output files and can not open all at once, bite off as many as you can, say 20, and merge them into a single file, and another 20. And then merge the merged files, generation after generation, until you are down to a single file. I chose the Radix Sort and it works beautifully. I only have to open 11 files at a time - the input or the output file and 10 temporary files. I discover the digit value in the key and use it as an index into an array of FILE pointers. FILE *fp_temp[10]; fprintf(fp_temp[key],"%s",record); Whatever floats your boat. But I think what you describe is clumsy and only barely works in the specific application you describe. Grow the application and it will break as your solution is delicate. You are counting on an even distribution of the data which is already too large to handle in memory (doesn't matter what you use to sort in memory). You are counting on being able to break it into only 10 segments based on a pattern, and that none of those segments will be too large to handle in memory. Your first pass deals the data into 10 different files. Your 2nd pass loads each file in memory to perform the actual sort. Then 3rd pass concatenates the files into one. What I suggested was to take as much as you can and sort it, then write a temporary file. Repeat until all data is now in multiple temporary files, the contents of each is sorted. Then open all of these files and your first data record will be the first data record from one of those files. Each temporary file is now ordered so your final sort only has to compare the current record from each. If you have more temporary files than you can open simultaneously then the procedure can recurse, merging temporary files until a single file remains. If you like having names for things, splitting into multiple sorted files and putting them back together is called a merge sort. The thing about sorting numbers is you can sort from highest to lowest |
![]() |
| Thread Tools | |
| Display Modes | |
| |