dbTalk Databases Forums  

Increased row size decreasing I/O throughput

microsoft.public.sqlserver.server microsoft.public.sqlserver.server


Discuss Increased row size decreasing I/O throughput in the microsoft.public.sqlserver.server forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Jeroen Mostert
 
Posts: n/a

Default Increased row size decreasing I/O throughput - 09-30-2011 , 06:09 PM






Here's a little puzzler. If anyone's encountered a similar issue or knows
what I'm doing wrong I'd love to hear it.

SQL Server 2008 R2. I've got a database housing a fact table for a data
warehouse. It has 722 million rows with 55 columns. Indexes are fully
rebuilt, statistics are good, etc. If I do this:

bcp "dbcc dropcleanbuffers; select one_column from bigtable;" queryout
nul -n -T

Then the I/O-subsystem is fully saturated. But if I do this:

bcp "dbcc dropcleanbuffers; select * from bigtable;" queryout nul -n -T

Then performance drops through the floor. Note that network I/O is not the
issue -- this is local (and this reflects production use).

I know what you're thinking -- "select one_column" uses an index while
"select *" does not. Good guess, but not this time. Both queries use a
clustered index scan, since there is no index on one_column.

Frankly, this is baffling to me. A clustered index scan is a clustered index
scan, right? In fact, gathering query statistics shows me SQL Server is
reporting the same number of the same types of reads, it just seems to read
a whole lot slower if more columns are involved. In fact, performance seems
to be roughly inversely proportional to the number of columns selected. The
position of the columns doesn't seem to be a factor.

Googling around gives no suggestions as to why a bigger row size should be
such a significant factor, but it is -- highly significant, to the tune of a
tenfold slowdown. Surely constructing the result set doesn't produce so much
overhead that I/O itself stalls... right? I'm open to ideas.

--
J.

Reply With Quote
  #2  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: Increased row size decreasing I/O throughput - 10-01-2011 , 11:58 AM






Jeroen Mostert (jmostert (AT) xs4all (DOT) nl) writes:
Quote:
SQL Server 2008 R2. I've got a database housing a fact table for a data
warehouse. It has 722 million rows with 55 columns. Indexes are fully
rebuilt, statistics are good, etc. If I do this:
And there are no LOB columns, all rows are safely below 8000 bytes in
size (so there is no overflow data)?

What is the average row size?




--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx

Reply With Quote
  #3  
Old   
Jeroen Mostert
 
Posts: n/a

Default Re: Increased row size decreasing I/O throughput - 10-01-2011 , 04:21 PM



On 2011-10-01 18:58, Erland Sommarskog wrote:
Quote:
Jeroen Mostert (jmostert (AT) xs4all (DOT) nl) writes:
SQL Server 2008 R2. I've got a database housing a fact table for a data
warehouse. It has 722 million rows with 55 columns. Indexes are fully
rebuilt, statistics are good, etc. If I do this:

And there are no LOB columns, all rows are safely below 8000 bytes in
size (so there is no overflow data)?

Sorry, forgot to mention that -- it would be a really obvious sinkhole of
course. No, there is no LOB or row-overflow data in the table at all.

Quote:
What is the average row size?

491 bytes. The optimizer thinks it's 1500, though. Manipulating this
assumption has no noticeable effect.

It gets more interesting. If I just leave out "DBCC DROPCLEANBUFFERS" and
limit it to a table that fits in memory, overall time goes down but the
speed difference between selecting one column or all remains. It seems I'm
hitting a CPU/memory issue, the reduced I/O was a red herring. SQL Server is
simply not pulling in more data than it can serve. Not what you'd expect
since being disk I/O-bound is so much more common.

The only way to work around this, it seems, is to start exploiting
parallelism. Executing multiple queries on distinct ranges in the table
simultaneously is faster than executing one that reads all of it (the
execution plans are the same for all queries regardless of range, and so is
their speed, but executing multiple in parallel improves things).

The optimizer doesn't seem to be clever enough to do something like a fully
parallel table scan with a nondeterministic gathering of rows (assuming such
a thing could overcome the bottleneck). I can force a parallel execution
plan by adding "WHERE SIGN(ID) = 1" to the query, but although it engages
multiple cores it actually slows things down a bit -- I guess it's only
effectively parallelizing the calculation.

Fortunately, querying the fact table needs to be fast for Analysis Services,
and that has no problem parallelizing queries if you use multiple partitions
in your cube. Partitioning was on the agenda anyway, I just moved it ahead.
Problem solved. The speedup is not quite linear with the number of
partitions, but it's respectable nevertheless.

I tried partitioning the table itself as well, incidentally (or rather a
copy of it containing fewer rows) but this does not seem to encourage
parallelism either. Perhaps you need to have the partitions in different
file groups for that -- I haven't tried yet.

--
J.

Reply With Quote
  #4  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: Increased row size decreasing I/O throughput - 10-02-2011 , 07:26 AM



Jeroen Mostert (jmostert (AT) xs4all (DOT) nl) writes:
Quote:
The only way to work around this, it seems, is to start exploiting
parallelism. Executing multiple queries on distinct ranges in the table
simultaneously is faster than executing one that reads all of it (the
execution plans are the same for all queries regardless of range, and so
is their speed, but executing multiple in parallel improves things).
And is not the likely explanation for this that the bottleneck is in
the client, rather than in SQL Server? Even if BCP is told to write
the data to the NUL device, it still has to receive the data, perform
the file-system operations etc. So you need multiple clients for speed.

--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx

Reply With Quote
  #5  
Old   
Jeroen Mostert
 
Posts: n/a

Default Re: Increased row size decreasing I/O throughput - 10-02-2011 , 11:06 AM



On 2011-10-02 14:26, Erland Sommarskog wrote:
Quote:
Jeroen Mostert (jmostert (AT) xs4all (DOT) nl) writes:
The only way to work around this, it seems, is to start exploiting
parallelism. Executing multiple queries on distinct ranges in the table
simultaneously is faster than executing one that reads all of it (the
execution plans are the same for all queries regardless of range, and so
is their speed, but executing multiple in parallel improves things).

And is not the likely explanation for this that the bottleneck is in
the client, rather than in SQL Server? Even if BCP is told to write
the data to the NUL device, it still has to receive the data, perform
the file-system operations etc. So you need multiple clients for speed.

Well poTAYto, poTAHto. If there is no client to receive data, you can't run
a query that returns a result set, so transferring data between SQL Server
and the client is the bottleneck, regardless of who is "to blame".

I could retest with an absolutely minimal application that really does
nothing but receive and discard rows as fast as possible to determine the
baseline speed of the process, but that is of little concern to me since no
actual process that does anything interesting should be slower than BCP
doing a copy to the NUL device. It may be of interest to an SQL Server
developer just in case there really is something in SQL Server that could be
improved, but that's not likely.

So yes, I fully agree with you that the client is the bottleneck, but then
any interesting client (for my purposes) would be.

--
J.

Reply With Quote
  #6  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: Increased row size decreasing I/O throughput - 10-02-2011 , 12:26 PM



Jeroen Mostert (jmostert (AT) xs4all (DOT) nl) writes:
Quote:
Well poTAYto, poTAHto. If there is no client to receive data, you can't
run a query that returns a result set, so transferring data between SQL
Server and the client is the bottleneck, regardless of who is "to
blame".
Yes, performance measurement is no by means difficult. The principles of
quantum mechanics very much apply. It's very easy to walk into the trap
where you measure something else than you intended to measure.

Quote:
So yes, I fully agree with you that the client is the bottleneck, but then
any interesting client (for my purposes) would be.
Depending how much data your process returns, as your test indicated.

It's a good demonstration that SELECT * just by routine is bad. I will
have to admit that I had not thought of it that way.


--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx

Reply With Quote
  #7  
Old   
Jeroen Mostert
 
Posts: n/a

Default Re: Increased row size decreasing I/O throughput - 10-02-2011 , 12:57 PM



On 2011-10-02 19:26, Erland Sommarskog wrote:
Quote:
Jeroen Mostert (jmostert (AT) xs4all (DOT) nl) writes:
So yes, I fully agree with you that the client is the bottleneck, but then
any interesting client (for my purposes) would be.

Depending how much data your process returns, as your test indicated.

It's a good demonstration that SELECT * just by routine is bad. I will
have to admit that I had not thought of it that way.


The difference between returning 55 columns and returning 1 is very much
measurable when you're dealing with great volumes, so yes, spelling out what
exactly you want is never a bad idea even from a performance standpoint (as
well as many others, of course). That said, I'm "only" selecting 35 columns
in the actual application. Thanks to your remark, though, I reexamined it
and found out I could actually drop one. I doubt this'll have a significant
influence on performance, but at this scale everything helps, so thanks. :-)

--
J.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.