dbTalk Databases Forums  

Distinct values

comp.databases.oracle.server comp.databases.oracle.server


Discuss Distinct values in the comp.databases.oracle.server forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Mladen Gogala
 
Posts: n/a

Default Distinct values - 08-04-2010 , 10:44 AM






I have recently had a conversation with an Oracle support engineer who
told me that, when analyzing VARCHAR2 columns, Oracle only counts the
first 32 characters. I checked the statement and it is true:

SQL> create table test1(col varchar2(40));

Table created.

Elapsed: 00:00:00.07
SQL> insert into test1 values('01235678901234567890123456789012A');

1 row created.

Elapsed: 00:00:00.07
SQL> insert into test1 values('01235678901234567890123456789012B');

1 row created.

Elapsed: 00:00:00.08
SQL> commit;

Commit complete.

Elapsed: 00:00:00.06
SQL> analyze table test1 compute statistics
2 for table for all columns size 254;

Table analyzed.

Elapsed: 00:00:00.07
SQL> select column_name,num_distinct from user_tab_columns
2 where table_name='TEST1';

COLUMN_NAME NUM_DISTINCT
------------------------------ ------------
COL 1

Elapsed: 00:00:00.11
SQL>

In other words, if you are storing FS paths into the database and
analyzing the table, stats will show significantly smaller number of
the distinct values than there really are. Queries may be messed up
because of that. I tested on 10.2.0.5 and 11.2.0.1, the behavior is the
same. Did anybody else notice this?


--
http://mgogala.byethost5.com

Reply With Quote
  #2  
Old   
joel garry
 
Posts: n/a

Default Re: Distinct values - 08-04-2010 , 11:01 AM






On Aug 4, 8:44Â*am, Mladen Gogala <n... (AT) email (DOT) here.invalid> wrote:
Quote:
I have recently had a conversation with an Oracle support engineer who
told me that, when analyzing VARCHAR2 columns, Oracle only counts the
first 32 characters. I checked the statement and it is true:

SQL> create table test1(col varchar2(40));

Table created.

Elapsed: 00:00:00.07
SQL> insert into test1 values('01235678901234567890123456789012A');

1 row created.

Elapsed: 00:00:00.07
SQL> insert into test1 values('01235678901234567890123456789012B');

1 row created.

Elapsed: 00:00:00.08
SQL> commit;

Commit complete.

Elapsed: 00:00:00.06
SQL> analyze table test1 compute statistics
Â* 2 Â*for table for all columns size 254;

Table analyzed.

Elapsed: 00:00:00.07
SQL> select column_name,num_distinct from user_tab_columns
Â* 2 Â*where table_name='TEST1';

COLUMN_NAME Â* Â* Â* Â* Â* Â* Â* Â* Â* Â*NUM_DISTINCT
------------------------------ ------------
COL Â* Â* Â* Â* Â* Â* Â* Â* Â* Â*Â* Â* Â* Â* Â* Â* Â* Â* Â* 1

Elapsed: 00:00:00.11
SQL

In other words, if you are storing FS paths into the database and
analyzing the table, stats will show significantly smaller number of
the distinct values than there really are. Queries may be messed up
because of that. I tested on 10.2.0.5 and 11.2.0.1, the behavior is the
same. Did anybody else notice this?

--http://mgogala.byethost5.com
Interesting. Tell Jože: http://joze-senegacnik.blogspot.com/...lectivity.html

Same results as yours on 10.2.0.4

jg
--
@home.com is bogus.
http://www.signonsandiego.com/news/2...tc-settlement/

Reply With Quote
  #3  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: Distinct values - 08-04-2010 , 11:17 AM



On Wed, 04 Aug 2010 09:01:50 -0700, joel garry wrote:

Quote:
On Aug 4, 8:44Â*am, Mladen Gogala <n... (AT) email (DOT) here.invalid> wrote:
I have recently had a conversation with an Oracle support engineer who
told me that, when analyzing VARCHAR2 columns, Oracle only counts the
first 32 characters. I checked the statement and it is true:

SQL> create table test1(col varchar2(40));

Table created.

Elapsed: 00:00:00.07
SQL> insert into test1 values('01235678901234567890123456789012A');

1 row created.

Elapsed: 00:00:00.07
SQL> insert into test1 values('01235678901234567890123456789012B');

1 row created.

Elapsed: 00:00:00.08
SQL> commit;

Commit complete.

Elapsed: 00:00:00.06
SQL> analyze table test1 compute statistics
Â* 2 Â*for table for all columns size 254;

Table analyzed.

Elapsed: 00:00:00.07
SQL> select column_name,num_distinct from user_tab_columns
Â* 2 Â*where table_name='TEST1';

COLUMN_NAME Â* Â* Â* Â* Â* Â* Â* Â* Â* Â*NUM_DISTINCT
------------------------------ ------------ COL Â* Â* Â* Â* Â* Â* Â* Â* Â* Â* Â* Â*
Â* Â* Â* Â* Â* Â* Â* 1

Elapsed: 00:00:00.11
SQL

In other words, if you are storing FS paths into the database and
analyzing the table, stats will show significantly smaller number of
the distinct values than there really are. Queries may be messed up
because of that. I tested on 10.2.0.5 and 11.2.0.1, the behavior is the
same. Did anybody else notice this?

--http://mgogala.byethost5.com

Interesting. Tell Jože:
http://joze-senegacnik.blogspot.com/...-in-determing-
selectivity.html

Same results as yours on 10.2.0.4

jg

Thanks! I didn't know that Joze had a blog and we have worked together in
the past. Joze Senegacnik is a great guy, a member of the furniture store
named "Oak Table" or something like that. Apparently, Joze did a bit
more disciplined and systematic testing than me. This, however, is an
interesting anomaly. BTW, I got Tom's book, I am busy reading it, so I
may miss an interesting blog or two or even some interesting threads on
this group.


--
http://mgogala.byethost5.com

Reply With Quote
  #4  
Old   
Michel Cadot
 
Posts: n/a

Default Re: Distinct values - 08-04-2010 , 11:24 AM



"Mladen Gogala" <no (AT) email (DOT) here.invalid> a écrit dans le message de news: pan.2010.08.04.15.44.06 (AT) email (DOT) here.invalid...
Quote:
I have recently had a conversation with an Oracle support engineer who
told me that, when analyzing VARCHAR2 columns, Oracle only counts the
first 32 characters. I checked the statement and it is true:

<...>
Quote:
--
http://mgogala.byethost5.com
This has always been true in any version since CBO was introduced.
I even wonder if it was not less in the previous versions but I can't remember.

Regards
Michel

Reply With Quote
  #5  
Old   
Randolf Geist
 
Posts: n/a

Default Re: Distinct values - 08-05-2010 , 04:24 AM



On Aug 4, 6:17*pm, Mladen Gogala <n... (AT) email (DOT) here.invalid> wrote:
Quote:
Thanks! I didn't know that Joze had a blog and we have worked together in
the past. Joze Senegacnik is a great guy, a member of the furniture store
named "Oak Table" or something like that. *Apparently, Joze did a bit
more disciplined and systematic testing than me. This, however, is an
interesting anomaly. BTW, I got Tom's book, I am busy reading it, so I
may miss an interesting blog or two or even some interesting threads on
this group.
Mladen,

you've used the ANALYZE command for generating CBO statistics that has
been deprecated a long time ago for that purpose:

http://download.oracle.com/docs/cd/E...htm#SQLRF01105

DBMS_STATS has similar issues but is less limited than ANALYZE.

In particular the NUM_DISTINCT information will be correct with
DBMS_STATS even for strings longer than 32 bytes as long as you don't
generate histograms (SIZE 1 vs. SIZE > 1), whereas ANALYZE has this 32
bytes limitation regardless of the histogram generation.

I've done some time ago a general discussion on the limitations of
histograms with DBMS_STATS, since the issue is not limited to
character columns:

http://oracle-randolf.blogspot.com/2...-rounding.html

The blog post doesn't cover ANALYZE and it doesn't mention above point
that DBMS_STATS does the NUM_DISTINCT correct when not generating a
histogram.

Regards,
Randolf

Oracle related stuff blog:
http://oracle-randolf.blogspot.com/

Co-author of the "OakTable Expert Oracle Practices" book:
http://www.apress.com/book/view/1430226684
http://www.amazon.com/Expert-Oracle-.../dp/1430226684

Reply With Quote
  #6  
Old   
Rob Burton
 
Posts: n/a

Default Re: Distinct values - 08-05-2010 , 04:38 AM



On Aug 5, 10:24*am, Randolf Geist <mah... (AT) web (DOT) de> wrote:


Quote:
I've done some time ago a general discussion on the limitations of
histograms with DBMS_STATS, since the issue is not limited to
character columns:

http://oracle-randolf.blogspot.com/2...of-histograms-...
Good demonstartion on your blog. As you pointed out you can get some
bad behaviour if storing strings with a long common root at the start
(URL's for example.)

This behaviour combined with automatic histogram generation in 10G,
bind variable peeking and queries that re-parse frequently has on more
than one occasion lead to queries that repeatedly flipped between good
and ban plans due
to a very high skew (50% of data in one bucket) from the histograms
even though each 'full' URL was fairly distinct. I do really think the
10G default auto histogram generation combine with bind peeking in a
OLTP environment is a very bind combination and potentially very
unstable.

Reply With Quote
  #7  
Old   
Tim X
 
Posts: n/a

Default Re: Distinct values - 08-05-2010 , 08:02 PM



Mladen Gogala <no (AT) email (DOT) here.invalid> writes:

Quote:
I have recently had a conversation with an Oracle support engineer who
told me that, when analyzing VARCHAR2 columns, Oracle only counts the
first 32 characters. I checked the statement and it is true:

SQL> create table test1(col varchar2(40));

Table created.

Elapsed: 00:00:00.07
SQL> insert into test1 values('01235678901234567890123456789012A');

1 row created.

Elapsed: 00:00:00.07
SQL> insert into test1 values('01235678901234567890123456789012B');

1 row created.

Elapsed: 00:00:00.08
SQL> commit;

Commit complete.

Elapsed: 00:00:00.06
SQL> analyze table test1 compute statistics
2 for table for all columns size 254;

Table analyzed.

Elapsed: 00:00:00.07
SQL> select column_name,num_distinct from user_tab_columns
2 where table_name='TEST1';

COLUMN_NAME NUM_DISTINCT
------------------------------ ------------
COL 1

Elapsed: 00:00:00.11
SQL

In other words, if you are storing FS paths into the database and
analyzing the table, stats will show significantly smaller number of
the distinct values than there really are. Queries may be messed up
because of that. I tested on 10.2.0.5 and 11.2.0.1, the behavior is the
same. Did anybody else notice this?
Thanks for this. As soon as I say your post, I rememberd this fact from
way back when I was first learning about CBO. It is one of those
important points that is so easily forgotten. I can't even remember
where I read about it, but think it was buried in some Oracle docs
somewhere and then later totally forgotten.

Tim

--
tcross (at) rapttech dot com dot au

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.