dbTalk Databases Forums  

[BUGS] BUG #2317: Wrong sorting order for (VW)

mailing.database.pgsql-bugs mailing.database.pgsql-bugs


Discuss [BUGS] BUG #2317: Wrong sorting order for (VW) in the mailing.database.pgsql-bugs forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
TomasKlockar
 
Posts: n/a

Default [BUGS] BUG #2317: Wrong sorting order for (VW) - 03-15-2006 , 05:02 PM







The following bug has been logged online:

Bug reference: 2317
Logged by: TomasKlockar
Email address: tomask (AT) omicron (DOT) se
PostgreSQL version: 7.3.2/7.4.7
Operating system: linux(fedora)
Description: Wrong sorting order for (VW)
Details:

SELECT cname FROM clients ORDER BY cname;

sorts W before V and that is a mistake.

the result is

V
W
WHI
Vi
Wi
Volvo

Correct order would be to place all vV before all wW.

I think the database was initialized with UTF8 but it might have been
ISO_8859-1, however the error is easily repeatable and ill be happy to
provide a dump.

I tested this on 7.3.2 and 7.4.7 and the error was repeatable.

regards,

/Tomas

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply With Quote
  #2  
Old   
Stephan Szabo
 
Posts: n/a

Default Re: [BUGS] BUG #2317: Wrong sorting order for (VW) - 03-15-2006 , 05:30 PM






On Mon, 13 Mar 2006, TomasKlockar wrote:

Quote:
The following bug has been logged online:

Bug reference: 2317
Logged by: TomasKlockar
Email address: tomask (AT) omicron (DOT) se
PostgreSQL version: 7.3.2/7.4.7
Operating system: linux(fedora)
Description: Wrong sorting order for (VW)
Details:

SELECT cname FROM clients ORDER BY cname;

sorts W before V and that is a mistake.

the result is

V
W
WHI
Vi
Wi
Volvo

Correct order would be to place all vV before all wW.

I think the database was initialized with UTF8 but it might have been
ISO_8859-1, however the error is easily repeatable and ill be happy to
provide a dump.

I tested this on 7.3.2 and 7.4.7 and the error was repeatable.
I think the most important thing we'd need to know is what locale the
database was initialized with in order to try to reproduce.

If you put similar data in a file and use the unix "sort" command with the
same locale, do you get the same order?


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend


Reply With Quote
  #3  
Old   
Tom Lane
 
Posts: n/a

Default Re: [BUGS] BUG #2317: Wrong sorting order for (VW) - 03-15-2006 , 07:01 PM



"TomasKlockar" <tomask (AT) omicron (DOT) se> writes:
Quote:
I think the database was initialized with UTF8 but it might have been
ISO_8859-1, however the error is easily repeatable and ill be happy to
provide a dump.
If the database encoding doesn't match the database locale (server
LC_COLLATE setting), all kinds of very strange sorting behavior can
ensue. I suspect you've been bit by this gotcha.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


Reply With Quote
  #4  
Old   
Magnus Hagander
 
Posts: n/a

Default Re: [BUGS] BUG #2317: Wrong sorting order for (VW) - 03-16-2006 , 03:11 AM



Quote:
The following bug has been logged online:

Bug reference: 2317
Logged by: TomasKlockar
Email address: tomask (AT) omicron (DOT) se
PostgreSQL version: 7.3.2/7.4.7
Operating system: linux(fedora)
Description: Wrong sorting order for (VW)
Details:

SELECT cname FROM clients ORDER BY cname;

sorts W before V and that is a mistake.

the result is

V
W
WHI
Vi
Wi
Volvo

Correct order would be to place all vV before all wW.

I think the database was initialized with UTF8 but it might=20
have been=20
ISO_8859-1, however the error is easily repeatable and ill=20
be happy to=20
provide a dump.

I tested this on 7.3.2 and 7.4.7 and the error was repeatable.
=20
I think the most important thing we'd need to know is what=20
locale the database was initialized with in order to try to reproduce.
=20
If you put similar data in a file and use the unix "sort"=20
command with the same locale, do you get the same order?
That looks like the typical swedish locale, which sorts V and W as the
same character. Yes, that can be very annoying for some of us :-), but
that's the way it's defined.

//Magnus

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq


Reply With Quote
  #5  
Old   
Magnus Hagander
 
Posts: n/a

Default Re: [BUGS] BUG #2317: Wrong sorting order for (VW) - 03-16-2006 , 04:09 AM



Quote:
The following bug has been logged online:
=20=09=09=09
Bug reference: 2317
Logged by: TomasKlockar
Email address: tomask (AT) omicron (DOT) se
PostgreSQL version: 7.3.2/7.4.7
Operating system: linux(fedora)
Description: Wrong sorting order for (VW)
Details:
=20=09=09=09
SELECT cname FROM clients ORDER BY cname;
=20=09=09=09
sorts W before V and that is a mistake.
=20=09=09=09
the result is
=20=09=09=09
V
W
WHI
Vi
Wi
Volvo
=20=09=09=09
Correct order would be to place all vV=20
before all wW.
=20=09=09=09
I think the database was initialized=20
with UTF8 but it might=20
=20=09=09=09=20=20=20=20=20=20
=20
have been=20
=20=09=09=20=20=20=20
=20
ISO_8859-1, however the error is easily=20
repeatable and ill=20
=20=09=09=09=20=20=20=20=20=20
=20
be happy to=20
=20=09=09=20=20=20=20
=20
provide a dump.
=20=09=09=09
I tested this on 7.3.2 and 7.4.7 and=20
the error was repeatable.
=20=09=09=09=20=20=20=20=20=20
=20
I think the most important thing we'd need to=20
know is what=20
locale the database was initialized with in=20
order to try to reproduce.
=20=09=09
If you put similar data in a file and use the=20
unix "sort"=20
command with the same locale, do you get the same order?
=20=09=09=20=20=20=20
=20
=20=09
That looks like the typical swedish locale, which sorts=20
V and W as the
same character. Yes, that can be very annoying for some=20
of us :-), but
that's the way it's defined.
=20=09
//Magnus
=20=09=20=20
=20
I would say that sorting V and W as the same character may=20
work in a dictionary or sometimes when sorting names, however=20
in that case why don't it sort C and K as the same character?=20
and in some cases C and S as the same.=20
=20
The best sorting algorithm should sort all characters as=20
separate characters, and if you want fancy sorting you should=20
need to turn it on for a table.
=20
Fancy sorting, like sorting Carlsson and Karlsson together=20
since they are equal in the same way as=20
Viktor=3DWiktor=3DVictor=3DWictor. Cesar and Sesar would also have=20
to be sorted together since they are pronounced the same in=20
swedish, and if you look in a phonebook you would find them=20
at the same place.
=20
Now I get the english words was and vas sorted together when=20
I need them separated.
=20
Currently the sortingfunction in postgreSQL have disqualified=20
itself from beeing used, and I do the sorting in java which=20
treat them as separate caracters.
Well, PostgreSQL uses the locale functionatlity provided by your OS, so you=
will need to talk to them. Perhaps they already ship a different locale de=
finition that is more suitable for your needs that you can change to?=20

If you select locale=3DC you will get the sort all chars as separate charac=
ters. It will break your sorting of =E5=E4=F6, because they are not in the =
correct sequence in LATIN1 (or UTF8 for that matter), but you can certainly=
do that. It's a matter of picking the correct locale when you initdb your =
database.

//Magnus

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match


Reply With Quote
  #6  
Old   
Tomas Zerolo
 
Posts: n/a

Default Re: [BUGS] BUG #2317: Wrong sorting order for (VW) - 03-16-2006 , 11:52 PM




--wxDdMuZNg1r63Hyj
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Mar 16, 2006 at 11:03:59AM +0100, Tomas Klockar wrote:
[...]
Quote:
I would say that sorting V and W as the same character may work in a=20
dictionary [...]

Currently the sortingfunction in postgreSQL have disqualified itself=20
from beeing used, and I do the sorting in java which treat them as=20
separate caracters.
=20
This is my 5 cent,
Tomas, as it has been said, PostgreSQL uses the operating system's
locale. So you can do two things: change it (e.b. to C , which is
neutral) or fix it. If you are on an unix-like OS, man 5 locale might
help with that.

Regards
-- tom=C3=A1s

--wxDdMuZNg1r63Hyj
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFEGk5bBcgs9XrR2kYRApU8AJ9qYVowY46ByjnCQwP1wL YrGsqtZACdHYYe
mERFVGkvj1cqPMQHBKv+Psc=
=sL7I
-----END PGP SIGNATURE-----

--wxDdMuZNg1r63Hyj--



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.