![]() | |
#1
| |||
| |||
|
|
As a part of some experimental work I've been doing, I need to write what I think is best described as a stochastic quota query. I would like to write a query like SELECT RANDOM 5 COLUMN_FOO FROM TABLE_BAR where RANDOM is like TOP, except that for successive queries, I get a (possibly) different set of 5 COLUMN_FOO values from table TABLE_BAR. Has anyone seen such a beast? Thanks in advance, Joe |
#2
| |||
| |||
|
|
As a part of some experimental work I've been doing, I need to write what I think is best described as a stochastic quota query. |
#3
| |||
| |||
|
|
You can write some proprietary kludges, but frankly they have problems with skewed distributions. .... |
|
then load it with the particular random numbers I wanrted from a stat package. More work, but MUCH better results. |
#4
| |||
| |||
|
|
In the first place, this does not answer the question the OP asked. |
#5
| ||||
| ||||
|
|
I suggest the OP read "Massive Stochastic Testing of SQL" by Don Slutz of Microsoft. |
|
I am VERY interested in understanding why these 'proprietary kludges' (either to select random samples, or to generate random numbers) 'have problems' with skewed distributions. (Disclosure - I worked on one.) |
|
Repeated use of the same set of random numbers will generate an identical sample each time it's used. |
|
From a practical point of view, you would be required to generate a new set of random values for each query. I am also curious to know how you use this table to restrict the rows being returned in the general case to any kind of uniform random sample. |
#6
| |||||
| |||||
|
|
I suggest the OP read "Massive Stochastic Testing of SQL" by Don Slutz of Microsoft. I need to get a copy, too. |
|
You might want to look at a current article by Ben Gan on the use of RAND() and NEWID() and differences in SQL Server 2000 and 2005. He covers the problems with how a CASE expression uses RAND(), deterministic functions, duplicate values, etc. |
|
I am VERY interested in understanding why these 'proprietary kludges' (either to select random samples, or to generate random numbers) 'have problems' with skewed distributions. (Disclosure - I worked on one.) The SQL products I have worked with use traditional Linear Congruential algorithms, which they inherited from C and UNIX. As you get larger and larger samples, you get skewing and duplicate values. Knuth Vol #2, Chapter 3 has a good history and some remarks about the history. The best (worst?) horror story in the 1970's was the discovery that an IBM FORTRAN routine was not valid. It trashed quite a few PhD projects. That was the most popular tool in those days for research. |
|
However, consider that one of the advantages of RNG is that you can repeat an experiment. |
|
But if I want a fixed table, I think most statisticians would agree that the RAND corporation "Table of One Million Random Digits" has been tested in every possible way for mathematical correctness. That is a fixed table available on diskette! |
![]() |
| Thread Tools | |
| Display Modes | |
| |