#21
On Mon, 23 Jan 2012 01:21:09 -0800 (PST) Kristoff <kristoff.pic... (AT) ingres (DOT) com> wrote:

> On Jan 21, 10:36 pm, "James K. Lowden" <jklow... (AT) schemamania (DOT) org> wrote:
> > the allkeys file, the Default Unicode Collation Element Table, which says:
> >
> > $ grep -E '^00.+(LOW LINE|SPACE)$' allkeys.txt | head
> > 0020 ; [*020A.0020.0002.0020] # SPACE
> > 005F ; [*021B.0020.0002.005F] # LOW LINE
> > ...
> > I'm not saying this is what Ingres is doing. I'm saying this is a valid
> > interpretation of what I understand to be the controlling standard.
>
> I'm not speaking about the actual default collation at unicode.org, but
> the old one which came with the 2.0 standard. Take a look at:
> http://unicode.org/Public/UCA/2.1.9/basekeys-2.1.9.txt
>
> $ egrep '^005F|^0020' basekeys-2.1.9.txt
> 0020 ; [*0209.0020.0002.0020] # SPACE
> 005F ; [*0209.0021.0002.005F] # LOW LINE; COMPATSEQ
>
> So in this old collation sequence the first weight of both chars is the same.

Thanks for the clarification, Kristoff. If I may, did the standard also change how/when the second field comes into play? AIUI the current standard, each level is a tie-breaker for the one before. I won't go into detail now in case it's irrelevant.

To the OP's question, is the Unicode collation user-modifiable (without a compiler), or is that contemplated for a future release?

Regards,

--jkl
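For anyone following along, the tie-breaker reading of the weights quoted above can be sketched in a few lines of Python. This is only an illustration of my understanding, not a description of what Ingres does. The function names (`parse_table`, `sort_key`, `deciding_level`) are made up for the example, the only table entries used are the four lines quoted in this thread, and the sketch treats the variable (`*`) elements as ordinary non-ignorable ones, which a real implementation may not.

```python
import re

# Entries copied verbatim from the two tables quoted in this thread.
ALLKEYS = """\
0020 ; [*020A.0020.0002.0020] # SPACE
005F ; [*021B.0020.0002.005F] # LOW LINE
"""
BASEKEYS_2_1_9 = """\
0020 ; [*0209.0020.0002.0020] # SPACE
005F ; [*0209.0021.0002.005F] # LOW LINE; COMPATSEQ
"""

# One code point, then one collation element in [.w1.w2.w3.w4] or [*w1...] form.
ENTRY = re.compile(r"^([0-9A-F]{4,6})\s*;\s*\[[.*]([0-9A-F.]+)\]")


def parse_table(text):
    """Map each character to its (primary, secondary, tertiary) weights."""
    table = {}
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            weights = tuple(int(w, 16) for w in m.group(2).split("."))
            table[chr(int(m.group(1), 16))] = weights[:3]
    return table


def sort_key(s, table):
    """All primaries first, then all secondaries, then all tertiaries.

    Assumption: variable weighting is ignored, so every element counts
    at every level.
    """
    elements = [table[ch] for ch in s]
    return tuple(tuple(e[level] for e in elements) for level in range(3))


def deciding_level(a, b, table):
    """Return the first level (1-based) at which a and b differ, or None."""
    for level, (wa, wb) in enumerate(
        zip(sort_key(a, table), sort_key(b, table)), start=1
    ):
        if wa != wb:
            return level
    return None


if __name__ == "__main__":
    new = parse_table(ALLKEYS)
    old = parse_table(BASEKEYS_2_1_9)
    print("allkeys:        decided at level", deciding_level(" ", "_", new))
    print("basekeys-2.1.9: decided at level", deciding_level(" ", "_", old))
```

With the allkeys entries the two characters already differ at the first level; with the 2.1.9 entries the first weights tie and the second field decides, which is the case my question is about.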