dbTalk Databases Forums  

Re: Break Up Large Table Query Into Results of N Rows

comp.databases.ms-sqlserver comp.databases.ms-sqlserver


Discuss Re: Break Up Large Table Query Into Results of N Rows in the comp.databases.ms-sqlserver forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
pbd22
 
Posts: n/a

Default Re: Break Up Large Table Query Into Results of N Rows - 02-01-2012 , 09:34 AM






On Jan 31, 5:03*pm, Erland Sommarskog <esq... (AT) sommarskog (DOT) se> wrote:
Quote:
pbd22 (dush... (AT) gmail (DOT) com) writes:
On Jan 31, 9:57*am, pbd22 <dush... (AT) gmail (DOT) com> wrote:
I am working in SQL Server 2005 and want to break up a table of 1M
rows into distinct results of 80,0000.

I feel like this is basically

use [database]

select * from [table]
where email_address like '%gmail.com'
group by ????

I am messing up with the group by (if this is the right way to go).
How do achieve this?

Anybody? To people use this group any more?

I am here! However, I only look in here about once a day. The traffic
does not really warrant anymore. If you want speedy answers to your
questioms, this is a busy place:http://social.msdn.microsoft.com/For...actsql/threads
Although, a bit too many people answers questions that goes outside
what they really know.

Anyway, your problem is a little vague. What is the purpose with
this division? And must the batches be exactly 80000 rows in size
or can the number vary?

--
Erland Sommarskog, SQL Server MVP, esq... (AT) sommarskog (DOT) se

Links for SQL Server Books Online:
SQL 2008:http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005:http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
Hi Erland and Bob,

Thanks both a bundle for your responses. Looks like there is life on
planet
Google Groups after all, much appreciated!

The specifics is that we are doing email deployments but google is
moving
all of the email sent to gmail users to their spam boxes. As a result,
we have
to "chunk" the gmail users out of the total amount and send in
manageable
batches. We have figured that 80,000 per batch out of the total gmail
users
in the table is possible.

And, Erland, to answer your question, I would prefer to turn this into
a stored
procedure we can use for the purpose of "chunking" email addresses.
Accordingly,
the number per batch should be a variable parameter.

Bob, the table we are querying against is pretty simple. Essentially,
it has one
one column - "email_address" which is a varchar. Its data is about 1
million email
addresses (but that number changes often). The result set table(s)
should only have
two columns, the count (INT) and the email_address (varchar). Please
see below.

The query I am trying to write is supposed to dump each result set
(batch)
to a text file in some folder on the hard drive. Each result set
should have
a count and the email addresses as columns. Something like this:

RESULT 1:

[COUNT] [EMAIL_ADDRESS]
1 name (AT) gmail (DOT) com
2 name (AT) gmail (DOT) com
3 name (AT) gmail (DOT) com
4 name (AT) gmail (DOT) com
5 name (AT) gamil (DOT) com
....
80,000 name (AT) gmail (DOT) com

RESULT 2:

[COUNT] [EMAIL_ADDRESS]
80,001 name (AT) gmail (DOT) com
80,002 name (AT) gmail (DOT) com
80,003 name (AT) gmail (DOT) com
80,004 name (AT) gmail (DOT) com
80,005 name (AT) gamil (DOT) com
....
16,000 name (AT) gmail (DOT) com

And so on up to the to total amount of the gmail addresses out of the
original table.

So, the statement should read something like this (pseudo code):

select all distinct users
from the master table
where email_address like '%gmail.com'
return in batches of N (such as 80,000)
and write each batch to a text file on the
hard drive.

I hope I have explained myself well. Let me know if anything is
unclear.

Thanks a bundle for your help.

Reply With Quote
  #2  
Old   
Bob Barrows
 
Posts: n/a

Default Re: Break Up Large Table Query Into Results of N Rows - 02-01-2012 , 02:36 PM






pbd22 wrote:
Quote:
On Jan 31, 5:03 pm, Erland Sommarskog <esq... (AT) sommarskog (DOT) se> wrote:
pbd22 (dush... (AT) gmail (DOT) com) writes:
On Jan 31, 9:57 am, pbd22 <dush... (AT) gmail (DOT) com> wrote:
I am working in SQL Server 2005 and want to break up a table of 1M
rows into distinct results of 80,0000.

Thanks both a bundle for your responses. Looks like there is life on
planet
Google Groups after all, much appreciated!

The specifics is that we are doing email deployments but google is
moving
all of the email sent to gmail users to their spam boxes. As a result,
we have
to "chunk" the gmail users out of the total amount and send in
manageable
batches. We have figured that 80,000 per batch out of the total gmail
users
in the table is possible.

And, Erland, to answer your question, I would prefer to turn this into
a stored
procedure we can use for the purpose of "chunking" email addresses.
Accordingly,
the number per batch should be a variable parameter.

Bob, the table we are querying against is pretty simple. Essentially,
it has one
one column - "email_address" which is a varchar. Its data is about 1
million email
addresses (but that number changes often). The result set table(s)
should only have
two columns, the count (INT) and the email_address (varchar). Please
see below.

The query I am trying to write is supposed to dump each result set
(batch)
to a text file in some folder on the hard drive. Each result set
should have
a count and the email addresses as columns. Something like this:

RESULT 1:

[COUNT] [EMAIL_ADDRESS]
1 name (AT) gmail (DOT) com
2 name (AT) gmail (DOT) com
snip
And so on up to the to total amount of the gmail addresses out of the
original table.

So, the statement should read something like this (pseudo code):

select all distinct users
You keep making a point of saying "distinct". Does that imply that there are
duplicate email addresses in that 1-million row table?


Quote:
from the master table
where email_address like '%gmail.com'
return in batches of N (such as 80,000)
and write each batch to a text file on the
hard drive.
I assume you can handle this part, correct? It's only batching the data that
you need help with?

Quote:
I hope I have explained myself well. Let me know if anything is
unclear.

Thanks a bundle for your help.
SQL 2008 has some paging functionality builtin but you're using SQL 2005.
I can think of a couple approaches. Here's one:
1. Create a temp table (#batches) with an identity column (indentcol) and an
email column (email). Insert the distinct email addresses into it:
insert #batches (email)
select distinct email_address from master_table where email_address like
'%gmail.com'

Then use a WHILE loop to retrieve the batches, using a variable to keep
track of them.
declare @batchsize int --convert this to a parameter for your sproc
set @batchsize=80000

declare @lastrec int, @endrec int
set @lastrec=(select max(identcol) from #batches)
set @endrec=@batchsize
WHILE @endrec-@batchsize<=@lastrec
BEGIN
select email from #batches where identcol >=@lastrec
--process the batch
delete #batches where identcol >=@lastrec
set @endrec=@endrec + @batchsize
END

Reply With Quote
  #3  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: Break Up Large Table Query Into Results of N Rows - 02-01-2012 , 04:54 PM



pbd22 (dushkin (AT) gmail (DOT) com) writes:
Quote:
The specifics is that we are doing email deployments but google is
moving all of the email sent to gmail users to their spam boxes. As a
result, we have to "chunk" the gmail users out of the total amount and
send in manageable batches. We have figured that 80,000 per batch out of
the total gmail users in the table is possible.
...
Bob, the table we are querying against is pretty simple. Essentially,
it has one one column - "email_address" which is a varchar. Its data is
about 1 million email addresses (but that number changes often). The
result set table(s) should only have two columns, the count (INT) and
the email_address (varchar). Please see below.
The simple-minded solution would be:

WITH numbered AS (
SELECT email, row_number() OVER(ORDER BY email) AS rowno
FROM addresses
WHERE email LIKE '%@gmail.com'
)
SELECT email
FROM numbered
WHERE rowno > (@batchno - 1) * @batchsize
AND rowno <= (@batchno - 1) * @batchsize

But it would be far more efficient to do:

CREATE TABLE gmail_addresses (rowno int NOT NULL,
gmail nvarchar(255) NOT NULL,
CONSTRAINT pk_gmail PRIMARY KEY CLUSTERED (rowno),
CONSTRAINT pk_unique UNIQUE NONCLUSTERED(gmail))

INSERT gmail_addresses(rowno, gmail)
SELECT row_number() OVER(ORDER BY (SELECT 1)), gmail
FROM (SELECT DISTINCT email
FROM addresses
WHERE email like '%@gmail.com') AS x

Then you have materialised the row number once for all, and a selection
of 80000 accounts will be quick.




--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.