dbTalk Databases Forums  

does select with sample clause return variable record count

comp.databases.oracle.misc comp.databases.oracle.misc


Discuss does select with sample clause return variable record count in the comp.databases.oracle.misc forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
tamsun
 
Posts: n/a

Default does select with sample clause return variable record count - 11-05-2007 , 05:06 AM






hi,

Database version: Oracle Database 10.2.0.2.0
OS: RedHat Enterprise 4

I find select statement with sample clause return random count records,
e.g.

select count(*) from ADS_DENDOC;
select count(*) from (select * from ADS_DENDOC sample(1)) a;

1st run:
-----------------
1302; -- the record count of table ADS_DENDOC
9; -- the record count of sample recordset

2nd run:
-----------------
1302;
12;

3rd run:
-----------------
1302;
14;

4th run:
-----------------
1302;
10;

As Oracle sql reference written,sample_percent is a number specifying
the percentage of the total row or block count to be included in the
sample. The value must be in the range .000001 to (but not including)
100. Why the record count of sample recordset is variable


Best regards
tamsun

Reply With Quote
  #2  
Old   
fitzjarrell@cox.net
 
Posts: n/a

Default Re: does select with sample clause return variable record count - 11-05-2007 , 08:22 AM






Comments embedded.
On Nov 5, 5:06 am, tamsun <tam... (AT) gmail (DOT) com> wrote:
Quote:
hi,

Database version: Oracle Database 10.2.0.2.0
OS: RedHat Enterprise 4

I find select statement with sample clause return random count records,
As it should.

Quote:
e.g.

select count(*) from ADS_DENDOC;
select count(*) from (select * from ADS_DENDOC sample(1)) a;

1st run:
-----------------
1302; -- the record count of table ADS_DENDOC
9; -- the record count of sample recordset

2nd run:
-----------------
1302;
12;

3rd run:
-----------------
1302;
14;

4th run:
-----------------
1302;
10;

As Oracle sql reference written,sample_percent is a number specifying
the percentage of the total row or block count to be included in the
sample. The value must be in the range .000001 to (but not including)
100. Why the record count of sample recordset is variable
You didn't read far enough into that paragraph, the rest of which
reads:

"This percentage indicates the probability of each row, or each
cluster of rows in the case of block sampling, being selected as part
of the sample. It does not mean that the database will retrieve
exactly sample_percent of the rows of table."

This is an ESTIMATE, not an absolute. Your 'understanding' is
actually misunderstanding, apparently based up on reading only what
you want to read, rather than what you should be reading. If you
think you understand a function, but it's behaviour isn't what you
expect, then it's highly likely you don't fully understand all that
you should about that function. Selective reading of the
documentation is a prime source for such confusion.

Quote:
Best regards
tamsun

David Fitzjarrell



Reply With Quote
  #3  
Old   
tamsun
 
Posts: n/a

Default Re: does select with sample clause return variable record count - 11-05-2007 , 07:53 PM



fitzjarrell (AT) cox (DOT) net wrote:
Quote:
Comments embedded.
On Nov 5, 5:06 am, tamsun <tam... (AT) gmail (DOT) com> wrote:
hi,

Database version: Oracle Database 10.2.0.2.0
OS: RedHat Enterprise 4

I find select statement with sample clause return random count records,

As it should.

e.g.

select count(*) from ADS_DENDOC;
select count(*) from (select * from ADS_DENDOC sample(1)) a;

1st run:
-----------------
1302; -- the record count of table ADS_DENDOC
9; -- the record count of sample recordset

2nd run:
-----------------
1302;
12;

3rd run:
-----------------
1302;
14;

4th run:
-----------------
1302;
10;

As Oracle sql reference written,sample_percent is a number specifying
the percentage of the total row or block count to be included in the
sample. The value must be in the range .000001 to (but not including)
100. Why the record count of sample recordset is variable

You didn't read far enough into that paragraph, the rest of which
reads:

"This percentage indicates the probability of each row, or each
cluster of rows in the case of block sampling, being selected as part
of the sample. It does not mean that the database will retrieve
exactly sample_percent of the rows of table."

This is an ESTIMATE, not an absolute. Your 'understanding' is
actually misunderstanding, apparently based up on reading only what
you want to read, rather than what you should be reading. If you
think you understand a function, but it's behaviour isn't what you
expect, then it's highly likely you don't fully understand all that
you should about that function. Selective reading of the
documentation is a prime source for such confusion.

Best regards
tamsun


David Fitzjarrell

thank you, David.



Reply With Quote
  #4  
Old   
Malcolm Dew-Jones
 
Posts: n/a

Default Re: does select with sample clause return variable record count - 11-06-2007 , 07:03 PM



fitzjarrell (AT) cox (DOT) net (fitzjarrell (AT) cox (DOT) net) wrote:
: Comments embedded.
: On Nov 5, 5:06 am, tamsun <tam... (AT) gmail (DOT) com> wrote:
: > hi,
: >
: > Database version: Oracle Database 10.2.0.2.0
: > OS: RedHat Enterprise 4
: >
: > I find select statement with sample clause return random count records,

: As it should.

: > e.g.
: >
: > select count(*) from ADS_DENDOC;
: > select count(*) from (select * from ADS_DENDOC sample(1)) a;
: >
: > 1st run:
: > -----------------
: > 1302; -- the record count of table ADS_DENDOC
: > 9; -- the record count of sample recordset
: >
: > 2nd run:
: > -----------------
: > 1302;
: > 12;
: >
: > 3rd run:
: > -----------------
: > 1302;
: > 14;
: >
: > 4th run:
: > -----------------
: > 1302;
: > 10;
: >
: > As Oracle sql reference written,sample_percent is a number specifying
: > the percentage of the total row or block count to be included in the
: > sample. The value must be in the range .000001 to (but not including)
: > 100. Why the record count of sample recordset is variable

: You didn't read far enough into that paragraph, the rest of which
: reads:

: "This percentage indicates the probability of each row, or each
: cluster of rows in the case of block sampling, being selected as part
: of the sample. It does not mean that the database will retrieve
: exactly sample_percent of the rows of table."

: This is an ESTIMATE, not an absolute. Your 'understanding' is
: actually misunderstanding, apparently based up on reading only what
: you want to read, rather than what you should be reading. If you
: think you understand a function, but it's behaviour isn't what you
: expect, then it's highly likely you don't fully understand all that
: you should about that function. Selective reading of the
: documentation is a prime source for such confusion.

: >
: > Best regards
: > tamsun


There are other sligthly subtle things to notice about sample.

If you choose a sample size that you expect will return about 1 row then
there is a large probability on any one run that you will get no rows, so
if you want to be sure to have at least one row then you need to ask for
more than one row. (Of course you can't ask for any specific number of
rows, what I mean is what you choose as a sample probability to get a
desired size.)

But if you choose a sample size that likely returns more than one row, for
the sake of example lets say 3 rows, then the first row in the sample will
almost always come from amongst the first 1/3 of the data (what ever
"first" means, but it doesn't seem to vary from run to run within a single
table). So choosing a sample of (e.g.) about three rows and taking the
first row will never be a good sample of the entire dataset.

However, sample does seem to run extremely quickly compared to other
methods of getting random selections of data, so its nice to try to use
it. To get a well chosen single row I think you can union a few
individual sampling queries each of which should return about one row.
That gives you a higher chance of having one row and it being from any
where in the data.

I assume there are better techniques, but sample and union are both super
easy to use - meaning I can type them at the sql prompt with a high chance
of no mistakes.


Reply With Quote
  #5  
Old   
William Robertson
 
Posts: n/a

Default Re: does select with sample clause return variable record count - 11-07-2007 , 01:24 PM



On Nov 7, 1:03 am, yf... (AT) vtn1 (DOT) victoria.tc.ca (Malcolm Dew-Jones)
wrote:
Quote:
fitzjarr... (AT) cox (DOT) net (fitzjarr... (AT) cox (DOT) net) wrote:

: Comments embedded.
: On Nov 5, 5:06 am, tamsun <tam... (AT) gmail (DOT) com> wrote:
: > hi,
:
: > Database version: Oracle Database 10.2.0.2.0
: > OS: RedHat Enterprise 4
:
: > I find select statement with sample clause return random count records,

: As it should.

: > e.g.
:
: > select count(*) from ADS_DENDOC;
: > select count(*) from (select * from ADS_DENDOC sample(1)) a;
:
: > 1st run:
: > -----------------
: > 1302; -- the record count of table ADS_DENDOC
: > 9; -- the record count of sample recordset
:
: > 2nd run:
: > -----------------
: > 1302;
: > 12;
:
: > 3rd run:
: > -----------------
: > 1302;
: > 14;
:
: > 4th run:
: > -----------------
: > 1302;
: > 10;
:
: > As Oracle sql reference written,sample_percent is a number specifying
: > the percentage of the total row or block count to be included in the
: > sample. The value must be in the range .000001 to (but not including)
: > 100. Why the record count of sample recordset is variable

: You didn't read far enough into that paragraph, the rest of which
: reads:

: "This percentage indicates the probability of each row, or each
: cluster of rows in the case of block sampling, being selected as part
: of the sample. It does not mean that the database will retrieve
: exactly sample_percent of the rows of table."

: This is an ESTIMATE, not an absolute. Your 'understanding' is
: actually misunderstanding, apparently based up on reading only what
: you want to read, rather than what you should be reading. If you
: think you understand a function, but it's behaviour isn't what you
: expect, then it's highly likely you don't fully understand all that
: you should about that function. Selective reading of the
: documentation is a prime source for such confusion.

:
: > Best regards
: > tamsun

There are other sligthly subtle things to notice about sample.

If you choose a sample size that you expect will return about 1 row then
there is a large probability on any one run that you will get no rows, so
if you want to be sure to have at least one row then you need to ask for
more than one row. (Of course you can't ask for any specific number of
rows, what I mean is what you choose as a sample probability to get a
desired size.)

But if you choose a sample size that likely returns more than one row, for
the sake of example lets say 3 rows, then the first row in the sample will
almost always come from amongst the first 1/3 of the data (what ever
"first" means, but it doesn't seem to vary from run to run within a single
table). So choosing a sample of (e.g.) about three rows and taking the
first row will never be a good sample of the entire dataset.

However, sample does seem to run extremely quickly compared to other
methods of getting random selections of data, so its nice to try to use
it. To get a well chosen single row I think you can union a few
individual sampling queries each of which should return about one row.
That gives you a higher chance of having one row and it being from any
where in the data.

I assume there are better techniques, but sample and union are both super
easy to use - meaning I can type them at the sql prompt with a high chance
of no mistakes.
You can also use a SEED(n) expression if you want to repeat the same
random sample, or attempt to at least:

"SEED seed_value Specify this clause to instruct the database to
attempt to return the same sample from one execution to the next. The
seed_value must be an integer between 0 and 4294967295. If you omit
this clause, then the resulting sample will change from one execution
to the next."



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.