[BUGS] unicode strings are not sorted alphabetically

mailing.database.pgsql-bugs

Volodymyr Kostyrko
 
08-05-2004, 10:40 PM

This applies to non-English strings (in my case, Russian). I stumbled
upon it on version 7.3.4 (PostgreSQL 7.4.3 on
i386-portbld-freebsd5.2.1, compiled by GCC cc (GCC) 3.3.3 [FreeBSD]
20031106).

The attached files were created with:

pg_dump -U lib lib > database_dump
psql lib lib -c "select * from authors order by name" > result_of_query

The sort order seems to be incorrect. Alphabetically, the rows should be
sorted by id as:

1
2
3
5
4
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23

There is also another question here. The Russian alphabet differs from
the Ukrainian alphabet, so sorting should differ between the two. But the
order given by the Unicode charmap is not right for either of them. This
probably applies to any Cyrillic charset.
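
The claim about the Unicode charmap can be illustrated with a minimal Python sketch. It is not part of the original report; the two alphabet strings and the helper name `code_point_order` are assumptions introduced here for illustration. It sorts each alphabet by raw code point and prints the result next to the conventional order.

```python
# Uppercase letters in conventional alphabetical order (illustrative constants).
RUSSIAN = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
UKRAINIAN = "АБВГҐДЕЄЖЗИІЇЙКЛМНОПРСТУФХЦЧШЩЬЮЯ"


def code_point_order(alphabet: str) -> str:
    """Return the letters of `alphabet` sorted by Unicode code point."""
    return "".join(sorted(alphabet))


for name, alphabet in (("Russian", RUSSIAN), ("Ukrainian", UKRAINIAN)):
    print(name, "alphabet:   ", alphabet)
    print(name, "code points:", code_point_order(alphabet))
```

In code point order, Ё (U+0401) moves ahead of А, the Ukrainian letters Є, І and Ї move to the front, and Ґ (U+0490) falls to the end. The letters А through Я occupy one contiguous block, so for names that use only those letters the code point order and the Russian alphabetical order coincide.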

--
[WBR], Arcade. [SAT Astronomy/Think to survive!]

Attachment: database_dump

--
-- PostgreSQL database dump
--

SET client_encoding = 'UNICODE';
SET check_function_bodies = false;

SET SESSION AUTHORIZATION 'pgsql';

--
-- TOC entry 4 (OID 2200)
-- Name: public; Type: ACL; Schema: -; Owner: pgsql
--

REVOKE ALL ON SCHEMA public FROM PUBLIC;
GRANT ALL ON SCHEMA public TO PUBLIC;


SET SESSION AUTHORIZATION 'lib';

SET search_path = public, pg_catalog;

--
-- TOC entry 5 (OID 69028)
-- Name: authors; Type: TABLE; Schema: public; Owner: lib
--

CREATE TABLE authors (
id serial NOT NULL,
name character varying(256)
);


--
-- Data for TOC entry 9 (OID 69028)
-- Name: authors; Type: TABLE DATA; Schema: public; Owner: lib
--

COPY authors (id, name) FROM stdin;
1 Андерсон, Пол
2 Азимов, Айзек
3 Асприн, Роберт
4 Булгаков, Михаил
5 Брэдбери, Рей
6 Гамильтон, Эдмонд
7 Гаррисон, Гарри
8 Даррелл, Джеральд
9 Дойл, Артур Конан
10 Кинг, Стивен
11 Кларк, Артур
12 Лукьяненко, Сергей
13 Желязны, Роджер
14 Пол, Фредерик
15 Твен, Марк
16 Пирс, Энтони
17 Саймак, Клиффорд Дональд
18 Силверберг, Роберт
19 Фостер, Алан Дин
20 Фрай, Макс
21 Херберт, Фрэнк
22 Честертон, Гилберт Кийт
23 Энтони, Марк
\.


--
-- TOC entry 7 (OID 69031)
-- Name: authors_id; Type: INDEX; Schema: public; Owner: lib
--

CREATE INDEX authors_id ON authors USING btree (id);


--
-- TOC entry 8 (OID 69032)
-- Name: authors_name; Type: INDEX; Schema: public; Owner: lib
--

CREATE INDEX authors_name ON authors USING btree (name);


--
-- TOC entry 6 (OID 69026)
-- Name: authors_id_seq; Type: SEQUENCE SET; Schema: public; Owner: lib
--

SELECT pg_catalog.setval('authors_id_seq', 23, true);


SET SESSION AUTHORIZATION 'pgsql';

--
-- TOC entry 3 (OID 2200)
-- Name: SCHEMA public; Type: COMMENT; Schema: -; Owner: pgsql
--

COMMENT ON SCHEMA public IS 'Standard public schema';



Attachment: result_of_query

id | name
----+--------------------------
10 | Кинг, Стивен
11 | Кларк, Артур
16 | Пирс, Энтони
14 | Пол, Фредерик
23 | Энтони, Марк
19 | Фостер, Алан Дин
20 | Фрай, Макс
22 | Честертон, Гилберт Кийт
13 | Желязны, Роджер
6 | Гамильтон, Эдмонд
7 | Гаррисон, Гарри
12 | Лукьяненко, Сергей
17 | Саймак, Клиффорд Дональд
18 | Силверберг, Роберт
15 | Твен, Марк
21 | Херберт, Фрэнк
1 | Андерсон, Пол
2 | Азимов, Айзек
3 | Асприн, Роберт
5 | Брэдбери, Рей
4 | Булгаков, Михаил
8 | Даррелл, Джеральд
9 | Дойл, Артур Конан
(23 rows)


Stephan Szabo
 
08-05-2004, 11:59 PM

On Sat, 31 Jul 2004, Volodymyr Kostyrko wrote:

> This applies to non-English strings (in my case, Russian). I stumbled
> upon it on version 7.3.4 (PostgreSQL 7.4.3 on
> i386-portbld-freebsd5.2.1, compiled by GCC cc (GCC) 3.3.3 [FreeBSD]
> 20031106).

What locale and server encoding was the server configured with?

I get a different result using -E UNICODE and ru_RU.UTF8, but still not in
id order: (2,1,3,5,4,6,7,8,9,13,10,11,12,16,14,17,18,15,19,20,21,22,23).
Do you get a different order from the unix sort command?
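
As a cross-check that needs no particular locale, here is a minimal Python sketch that sorts the surnames from the dump with an explicit Russian alphabet as the key. It is not from the thread, and `RANK`, `AUTHORS` and `alphabet_key` are hypothetical names. Two assumptions apply: only surnames are compared (no two rows share one), and the surname of id 23, which is garbled in the posted dump, is taken to begin with Э.

```python
# Lowercase Russian alphabet in conventional order; a letter's index is its rank.
RUSSIAN = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
RANK = {ch: i for i, ch in enumerate(RUSSIAN)}

# Surnames keyed by id, copied from the COPY block of the dump.
# Assumption: id 23 starts with Э (the first letter is garbled in the dump).
AUTHORS = {
    1: "Андерсон", 2: "Азимов", 3: "Асприн", 4: "Булгаков", 5: "Брэдбери",
    6: "Гамильтон", 7: "Гаррисон", 8: "Даррелл", 9: "Дойл", 10: "Кинг",
    11: "Кларк", 12: "Лукьяненко", 13: "Желязны", 14: "Пол", 15: "Твен",
    16: "Пирс", 17: "Саймак", 18: "Силверберг", 19: "Фостер", 20: "Фрай",
    21: "Херберт", 22: "Честертон", 23: "Энтони",
}


def alphabet_key(word: str) -> list[int]:
    """Map a word to the list of alphabet ranks of its letters."""
    return [RANK[ch] for ch in word.lower()]


ids_in_order = sorted(AUTHORS, key=lambda i: alphabet_key(AUTHORS[i]))
print(ids_in_order)
```

Under these assumptions the sketch prints the same sequence as the ru_RU.UTF8 result quoted above. That suggests the ids in the table were not assigned in strict alphabetical order: for example, Азимов (2) sorts before Андерсон (1) because з precedes н.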

