dbTalk Databases Forums  

Case-insensitive select

comp.databases.postgresql comp.databases.postgresql


Discuss Case-insensitive select in the comp.databases.postgresql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
RVince
 
Posts: n/a

Default Case-insensitive select - 08-31-2011 , 07:29 AM






I'm using Postgres and the end-users are complaining that the search
features provided by the web app are case-sensitive (via Postgres). Is
there a way to do a case-insensitive version of select statements? Not
all statements are of the form "=" or "like" but often of the form
"...where lastname >= 'smith';" so the I don;t think the '~*' operator
will work here for me in all cases. Is there a way to do this type of
case-insensitive comparison? Is there a way to designate certain
tables or columns be compared on a case-insensitive basis ? Thanks
RVince

Reply With Quote
  #2  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: Case-insensitive select - 08-31-2011 , 08:19 AM






On Wed, 31 Aug 2011 05:29:13 -0700, RVince wrote:

Quote:
I'm using Postgres and the end-users are complaining that the search
features provided by the web app are case-sensitive (via Postgres). Is
there a way to do a case-insensitive version of select statements? Not
all statements are of the form "=" or "like" but often of the form
"...where lastname >= 'smith';" so the I don;t think the '~*' operator
will work here for me in all cases. Is there a way to do this type of
case-insensitive comparison? Is there a way to designate certain tables
or columns be compared on a case-insensitive basis ? Thanks RVince
First of all, PostgreSQL supports ILIKE operator, which is case
insensitive. Second, ~* will have a problem with indexes, which limits
its usefulness. Third, PostgreSQL support standard function like "upper"
and "lower". Your comparison can be rewritten like
where lower(lastname) >= 'smith', which will be case insensitive.
PostgreSQL also allows you to create an index on lower(lastname), which
will probably be used, unless the number of Smiths in your name
collection isn't too big for that.



--
http://mgogala.byethost5.com

Reply With Quote
  #3  
Old   
Matthew Woodcraft
 
Posts: n/a

Default Re: Case-insensitive select - 08-31-2011 , 02:42 PM



RVince <rvince99 (AT) gmail (DOT) com> wrote:
Quote:
Is there a way to designate certain tables or columns be compared on
a case-insensitive basis ? Thanks RVince
There's a case-insensitive string type in contrib (the "Additional
Supplied Modules"):

http://www.postgresql.org/docs/9.0/static/citext.html

-M-

Reply With Quote
  #4  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: Case-insensitive select - 08-31-2011 , 06:00 PM



On Wed, 31 Aug 2011 20:42:40 +0100, Matthew Woodcraft wrote:

Quote:
There's a case-insensitive string type in contrib (the "Additional
Supplied Modules"):
There are unpleasant surprises with that, like the fact that unique index
of that type is not case insensitive. Standard varchar data type,
combined with a function index or a trigger should be more than adequate.



--
http://mgogala.byethost5.com

Reply With Quote
  #5  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: Case-insensitive select - 09-01-2011 , 01:56 AM



Mladen Gogala wrote:
Quote:
There's a case-insensitive string type in contrib (the "Additional
Supplied Modules"):

There are unpleasant surprises with that, like the fact that unique index
of that type is not case insensitive. Standard varchar data type,
combined with a function index or a trigger should be more than adequate.
What you say seems to contradict citext's documentation:

The citext data type allows you to eliminate calls to lower in SQL
queries, and allows a primary key to be case-insensitive.

Can you explain?

Yours,
Laurenz Albe

Reply With Quote
  #6  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: Case-insensitive select - 09-01-2011 , 07:57 AM



On Thu, 01 Sep 2011 08:56:31 +0200, Laurenz Albe wrote:

Quote:
Mladen Gogala wrote:
There's a case-insensitive string type in contrib (the "Additional
Supplied Modules"):

There are unpleasant surprises with that, like the fact that unique
index of that type is not case insensitive. Standard varchar data type,
combined with a function index or a trigger should be more than
adequate.

What you say seems to contradict citext's documentation:

The citext data type allows you to eliminate calls to lower in SQL
queries, and allows a primary key to be case-insensitive.

Can you explain?

Yours,
Laurenz Albe
You skipped a passage in the documentation:

"If you declare a column as UNIQUE or PRIMARY KEY, the implicitly
generated index is case-sensitive. So it's useless for case-insensitive
searches, and it won't enforce uniqueness case-insensitively. "



--
http://mgogala.byethost5.com

Reply With Quote
  #7  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: Case-insensitive select - 09-01-2011 , 09:40 AM



Mladen Gogala wrote:
[about citext contrib]
Quote:
There are unpleasant surprises with that, like the fact that unique
index of that type is not case insensitive. Standard varchar data type,
combined with a function index or a trigger should be more than
adequate.

What you say seems to contradict citext's documentation:

The citext data type allows you to eliminate calls to lower in SQL
queries, and allows a primary key to be case-insensitive.

Can you explain?

You skipped a passage in the documentation:

"If you declare a column as UNIQUE or PRIMARY KEY, the implicitly
generated index is case-sensitive. So it's useless for case-insensitive
searches, and it won't enforce uniqueness case-insensitively. "
Oh, I see. I think you got confused here, because the sentence you
quote is a description of a drawback of the "standard approach" using
"WHERE lower(col) = LOWER(?)", *not* a drawback of citext.

Yours,
Laurenz Albe

Reply With Quote
  #8  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: Case-insensitive select - 09-01-2011 , 04:26 PM



On Thu, 01 Sep 2011 16:40:58 +0200, Laurenz Albe wrote:

Quote:
Oh, I see. I think you got confused here, because the sentence you quote
is a description of a drawback of the "standard approach" using "WHERE
lower(col) = LOWER(?)", *not* a drawback of citext.
Seems you're right, I was mistaken.

mgogala=# create temporary table t1(col1 citext, constraint t1_pk primary
key(col1));
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t1_pk"
for table "t1"
CREATE TABLE
mgogala=# insert into t1 values('Laurenz Albe');
INSERT 0 1
mgogala=# insert into t1 values('laurenz albe');
ERROR: duplicate key value violates unique constraint "t1_pk"
DETAIL: Key (col1)=(laurenz albe) already exists.
mgogala=# insert into t1 values('LAURENZ ALBE');
ERROR: duplicate key value violates unique constraint "t1_pk"
DETAIL: Key (col1)=(LAURENZ ALBE) already exists.
mgogala=#

However, the citext doesn't allow me to limit the column size:

mgogala=# create temporary table t1(col1 citext(10), constraint t1_pk
primary key(col1));
ERROR: type modifier is not allowed for type "citext"
LINE 1: create temporary table t1(col1 citext(10), constraint t1_pk ...

It seems to me that citext is a good replacement for the type text, not
for the type varchar(n). At any rate, the type citext is useful but
rather nonstandard. I would still use the standard varchar type for the
vast majority of cases and use trigger and/or functional indexes to
resolve the case insensitive search. Case insensitive search was the
problem for which functional indexes were invented. Cost based optimizers
were having trouble with the functional indexes, Oracle's until version
11 and the extended statistics, because the normal statistics gathering
procedures usually do not collect data on lower(col1) or something
similar to it. The trick to make it perform better was to add a column,
named something like lower_col1 and populate it by a trigger. That would
help the optimizer, at the expense of violating the relational rules and
storing the duplicate information. That column could then be indexed
normally. This method still provides the best performance, on all
databases I know of, with the notable exception of Oracle 11i.

--
http://mgogala.byethost5.com

Reply With Quote
  #9  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: Case-insensitive select - 09-02-2011 , 03:08 AM



Mladen Gogala wrote:
Quote:
It seems to me that citext is a good replacement for the type text, not
for the type varchar(n).
Well, it's called "citext" and not "civarchar" for a reason.
But you can easily

CREATE DOMAIN civarchar20 AS citext
CONSTRAINT civarchar20_len CHECK(length(VALUE) <= 20);

to get that.

Quote:
I would still use the standard varchar type for the
vast majority of cases and use trigger and/or functional indexes to
resolve the case insensitive search.
It's everybody's choice.

Quote:
The trick to make it perform better was to add a column,
named something like lower_col1 and populate it by a trigger. That would
help the optimizer, at the expense of violating the relational rules and
storing the duplicate information. That column could then be indexed
normally. This method still provides the best performance, on all
databases I know of, with the notable exception of Oracle 11i.
Of course it also wastes space space, and triggers don't come for
free either.

I doubt that using an index based on an expression would perform worse
in PostgreSQL, but I don't have the time to run a performance test.

At any rate statistics for such indexes are collected (look for rows
in pg_stats that belong to the index), and these statistice are exactly
the same as would be collected for the redundant column, so why should
there be a difference?

Yours,
Laurenz Albe

Reply With Quote
  #10  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: Case-insensitive select - 09-02-2011 , 03:08 AM



Mladen Gogala wrote:
Quote:
It seems to me that citext is a good replacement for the type text, not
for the type varchar(n).
Well, it's called "citext" and not "civarchar" for a reason.
But you can easily

CREATE DOMAIN civarchar20 AS citext
CONSTRAINT civarchar20_len CHECK(length(VALUE) <= 20);

to get that.

Quote:
I would still use the standard varchar type for the
vast majority of cases and use trigger and/or functional indexes to
resolve the case insensitive search.
It's everybody's choice.

Quote:
The trick to make it perform better was to add a column,
named something like lower_col1 and populate it by a trigger. That would
help the optimizer, at the expense of violating the relational rules and
storing the duplicate information. That column could then be indexed
normally. This method still provides the best performance, on all
databases I know of, with the notable exception of Oracle 11i.
Of course it also wastes space space, and triggers don't come for
free either.

I doubt that using an index based on an expression would perform worse
in PostgreSQL, but I don't have the time to run a performance test.

At any rate statistics for such indexes are collected (look for rows
in pg_stats that belong to the index), and these statistice are exactly
the same as would be collected for the redundant column, so why should
there be a difference?

Yours,
Laurenz Albe

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.