dbTalk Databases Forums  

Any gotchas for STMM

comp.databases.ibm-db2

#11  Richard
Re: Any gotchas for STMM - 06-16-2010, 10:24 AM

B.U.M.P
Bring up my post. -Richard

#12  Mark A
Re: Any gotchas for STMM - 06-20-2010, 01:46 PM

"ajstorm (AT) ca (DOT) ibm.com" <ajstorm (AT) gmail (DOT) com> wrote

Quote:
> Mark,
>
> Thanks for the response. I can respond to some of your points here,
> but others we should probably discuss in more detail through email as
> they don't seem generally applicable to a broad audience.
>
> If you're willing, I'd be interested in learning more about these
> problems. You can send me these details via email.

Thanks for the offer, but after many months of desperation trying to get
STMM to work on a critical database server with multiple instances and
databases, we gave up and just hard-coded the memory values. It was not very
difficult. I don't think there is anything to discuss at this point.

Maybe you could contact someone in support and just go over every PMR opened
on STMM if you want more information on the problems customers have
encountered.

Quote:
> You make a valid point, but it's only partially correct. I agree that
> DB2's default configuration will not give you optimal performance.
> That being said, there are two things that you're not considering.
> First of all, DB2 defaults have changed substantially over the past 10
> years. For example, it's now the case that the DB2 Configuration
> Advisor will run automatically as part of database creation. After
> the Configuration Advisor completes, it will have set 36 of the
> database configuration parameters (including the size of the default
> buffer pool) based on the machine specifications. The result is a
> "default" configuration that is tailored to the environment in which
> the database will be run.
>
> The second thing that you may not be considering is that even
> competent DBAs do not always have the time to optimally configure each
> and every database created at their shop. I know many DBAs that will
> devote hours and hours to optimally configure some of their databases
> and yet, for their test environments, they're happy to let STMM do the
> heavy lifting. That being said, some of these same DBAs have enabled
> STMM on their most critical databases and found that its tuning
> outperformed their hand tuned configuration.

Very few default parameters were changed over the past 10 years, and none
of the important ones. DB2 Version 8.2 (which was used by many until about
two years ago) did not have STMM and had the following defaults:

LOCKLIST 100 (400 KB)
(Linux/UNIX) IBMDEFAULTBP bufferpool 1000 (4 MB)
(Windows) IBMDEFAULTBP bufferpool 250 (1 MB)
LOGBUFSZ 8 (32 KB)
etc.

These are the same exact values as OS/2 Database Manager circa 1990, when
the largest PC's had 16 MB of memory.
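
As a quick check of the figures above, the page counts can be converted to sizes, assuming (as the quoted values imply) that each parameter is expressed in 4 KB pages. This is only a minimal Python sketch; the helper name `pages_to_kb` is made up for illustration and is not part of DB2:

```python
# Assumption: every default quoted above is a count of 4 KB pages.
PAGE_SIZE_BYTES = 4 * 1024

def pages_to_kb(pages: int) -> int:
    """Convert a count of 4 KB pages to kilobytes."""
    return pages * PAGE_SIZE_BYTES // 1024

# DB2 8.2 defaults as listed in the post (values are page counts).
defaults = {
    "LOCKLIST": 100,
    "IBMDEFAULTBP (Linux/UNIX)": 1000,
    "IBMDEFAULTBP (Windows)": 250,
    "LOGBUFSZ": 8,
}

for name, pages in defaults.items():
    print(f"{name}: {pages} pages = {pages_to_kb(pages)} KB")
```

The bufferpool figures come out as 4000 KB and 1000 KB, which the post rounds to 4 MB and 1 MB.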

Only after 8.2, STMM was added (introduced in 9.1 but changed in 9.5) and in
9.5 auto-configure was made the default (but not before then).

If the DB2 documentation had been written so people could understand typical
values that should be used (in the Reference Guides, not some other manual)
based on the types of applications, then STMM would not have been necessary. The
auto-configure is nice, but that was not invoked automatically until 9.5 and
is not very easy to use properly IMO (using properly would force the
answering of the key questions about the intended database).

The problem of STMM is threefold:

1. When multiple bufferpools are desired to accommodate different priorities
for different tables, then STMM cannot know that, as it treats all SQL (and
the tables they go after) the same, giving equal weight to all of them. If
STMM is used for bufferpools (-2), I am not even sure if there is any
benefit to having multiple bufferpools.

2. STMM can cause severe database server problems, such as when STMM gives
up memory (as it frequently does when not needed at a particular moment) but
cannot get the memory back when it tries to get the memory back a few
seconds later. This obviously does not always happen, but when it does, it
can be catastrophic for an OLTP system with high transaction rates. When we
opened PMR's on this problem, IBM was not able to resolve it (9.5.4).

3. STMM has had problems in the past with being able to manage a large number of
databases with multiple instances, especially under Linux. IBM support flat
out told us that automatic instance memory would not work under Linux if
more than one instance existed since DB2 could not coordinate the multiple
instance memory (and the databases within those instances).

As I pointed out, some of these may have been fixed in 9.5.5 or 9.7.x, but they
were very serious problems. But even before these newer releases were
available, we were getting the same story from some in IBM who claimed
everything was fine with STMM, while we were told by other IBM'ers (correctly)
that there were still some problems with STMM that occurred in certain situations.
Even though improvements have been made in STMM in the most recent fixpacks
and in 9.7, I am skeptical about the claim now (from people who have not
admitted the past problems) that everything is now working perfectly.

Quote:
> I'd be interested to hear more about these situations, perhaps over
> email.
>
> I'd be interested in hearing more about the problems you've hit over
> email. While there have been some issues with multiple instances that
> we've fixed, they only affected a handful of customers.

I don't have time to do that. We already opened several PMR's and engaged
Lab Services to assist, but nothing worked. So we just hard-coded the values
(which was quite simple for any competent DBA to do).

As to how many people have been affected by the problems, I don't think you
have a good gauge on the real number. In our case we use DB2 Linux for
mission critical databases running moderate to high transaction rates, and
we cannot afford to have any problems. A lot of customers don't have such
mission critical systems, or don't use DB2 LUW for them. For example, IBM
doesn't even have any mission critical systems (where they could go out of
business if a database was down for 4 hours).

Surveys conducted by independent consultants have shown that at least 25% of
customers have had at least some problems with STMM. Those running a single
instance and single database have probably had the fewest problems. AIX and
Windows have probably had far fewer problems than Linux.

For example, a poll conducted on 2010-03-12 by DB2Night (file
20100312DB2Night14.wmv on www.DB2NightShow.com) revealed the following:

Are you using STMM in production?

Yes, and with good results                                17%
Yes, and uncertain of the measurable results              22%
No, we tried it but suffered with adverse consequences    28%
No, we are still on 8.2 or earlier                         6%
No, we are not ready to turn it on yet                    28%

BTW, IBM'ers are frequent presenters on the DB2NightShow sessions.

Quote:
> I think you may not completely understand how STMM works with multiple
> buffer pools. STMM works to optimize the configuration of multiple
> buffer pools not by trying to increase their hit rates, but instead by
> determining a configuration that will lead to the minimum possible
> amount of time spent retrieving pages from disk. With this model, it
> is valuable to treat all of the buffer pools the "same" since each of
> them is caching pages in an attempt to prevent disk reads/writes. If
> all you care about is overall database performance, the tuning that
> STMM provides for multiple bufferpools is extremely effective.

On the contrary, I do understand. As a DBA I don't necessarily want to treat
access to all tables and indexes with the same priority, especially with a
very large database where the data is many times the size of server memory.

"Overall" database performance treats every table and every SQL the same,
which is often not optimum IMO.

Granted, there are many DBA's who don't understand how to set up
bufferpools, but if IBM had provided some documentation and guidance on
this, it could be tuned manually in a matter of seconds.

Quote:
> That is correct. STMM is not recommended in the Balanced Warehouse
> because in DPF environments, STMM must be used only on partitions that
> have similar memory requirements. That being said, I know of several
> customers who are happily running STMM in DPF after exercising the
> necessary precautions. You can read more about the precautions here:
>
> http://publib.boulder.ibm.com/infoce.../c0023815.html

If you read your own doc carefully, it says that STMM can be used for DPF if
all partitions have the same characteristics as follows:

- All database partitions are on identical hardware, and there is an even
distribution of multiple logical database partitions to multiple physical
database partitions
- There is a perfect or near-perfect distribution of data
- Workloads are distributed evenly across database partitions, meaning that
no database partition has higher memory requirements for one or more heaps
than any of the others

For the Balanced Warehouse offerings from IBM, all of the above requirements
are true (since IBM has complete control over them). The reason why the
consultants who configured the Balanced Warehouse don't use STMM is because
they have had problems with it, and not because it does not meet the
requirements stated above.

If IBM solves all the problems with STMM, then that may change, but the
reason for not using it in 9.5 Balanced Warehouse was because of the many
problems they encountered. BTW, the IBM Balanced Warehouse config also says
that auto-configure must be turned off, since it creates havoc. I think this
recommendation to shut it off also applied to single partition databases in
9.7.0 (fixpack 0), but I don't recall.

Quote:
> That is unfortunate because that's not the official IBM position.

IBM'ers who make a living by giving consulting advice to customers for
hundreds of dollars per hour (and even more than that for classes) and who
actually go to customer sites to implement things, cannot worry about
"the official IBM position" of a bunch of marketing people who are trying to
cram stuff down our throats. The customer comes first, and keeping the
customer systems up and running is more important than your marketing goals.

This is surely the most troublesome comment I have heard from IBM in a long
time, because it shows that IBM is not listening to customers or their own
consultants while trying to force things into the marketplace before they
are fully tested.

Quote:
> I would strongly disagree with your argument. While memory may be
> cheap and plentiful in your shop, most of our customers are
> consolidating servers to the point where many databases are all
> fighting for the same small amount of memory. It is in these
> environments where STMM can be the most effective at managing the
> needs of the databases, especially if their peak workload requirements
> are at different times of the day. In general, I think you're greatly
> oversimplifying the configuration dilemma faced by a DBA in the
> absence of tools like STMM.

With the exception of bufferpools, the things that STMM controls use an
insignificant amount of memory in 99% of the cases. I have either 32 GB or
64 GB of memory on all my database servers and with the exception of
bufferpools, the other things that STMM controls don't amount to anything
even close to 1 GB (even with multiple databases). If IBM had set more
realistic defaults for these, or better yet just documented how to set them
for realistic scenarios likely to be encountered, then there would be very
little wasted memory and no need for STMM.

Quote:
> Mark, I've been personally involved in almost all of the STMM APARs to
> date.

There have been STMM APARs? I thought you clearly implied it has been
working quite well? In fact there have been many PMRs and many APARs and
many changes made to STMM without APARs, to fix the problems.

You are implying that the problems are all fixed now. I am sure there have
been improvements over time up to and including 9.7.2, but since you are not
exactly candid about the past problems with STMM, how can I trust you
when you say it works fine now? I can't risk my company on something
that only takes a few minutes to hardcode (about 5 parameters). Also, I
cannot migrate all my databases to 9.7 at this time due to the amount of
regression testing that would be needed on the application side.

If IBM had documented how to set the 5 parameters controlled by STMM (don't
recall the exact number) for various types of database scenarios, STMM would
not have been necessary. The one possible exception is bufferpools, but if
DBA's just allocate 50% of the server memory to bufferpools (the total of
all bufferpools for all databases on the server) then they wouldn't need
STMM to be constantly trying to adjust for them. You would be surprised how
many people try and use the default of 1000 4K pages per database and then
complain about performance.

Quote:
> There are a great many DBAs (one of which has already posted to this
> thread) who are quite happy with STMM. I think it's a gross
> misstatement of the facts to say that STMM was designed to sell DB2 to
> executives as opposed to helping out DBAs.

If IBM really cared about their current customers, they would not be
releasing code that has so many bugs. All of these changes are to sell DB2
to new customers who think DB2 is too complex. In some ways it is too
complex, but in reality if IBM manuals provided "how-to" documentation for
setting up the memory values (other than the default, min, and max values)
for various types of common database scenarios, STMM would not be
needed.

Quote:
> Again, I welcome feedback about STMM and am willing to help you
> through any issues you may be having. Please follow-up via email.
>
> Thanks,
> Adam

I appreciate your offer, but I am extremely busy. I don't need your help
since I solved the problems by hard-coding the STMM memory settings. If you
need my help to debug DB2, then you would have to pay my company for my
time, which I doubt you are willing to do.

One other point. I am not against automating the memory configurations for
DB2. Many of them are/were ridiculously complex. The use of automatic memory
was a great improvement, but STMM is a different story and I cannot risk my
company on it based on the problems we have already encountered versus a
payback that is questionable assuming a competent DBA is available (and no, it
doesn't take months to tune it, just minutes).

#13  Mark A
Re: Any gotchas for STMM - 06-20-2010, 01:50 PM

"Jean-Marc Blaise" <jmblaise (AT) hotmail (DOT) com> wrote

Quote:
> I have many customers using STMM in France in DB2 9.5 (from FP3 to FP5) and
> we have no problem with it.
> We have only deactivated the tuning of LOCKLIST, because of 1 application,
> that's all.
>
> Regards,
>
> JM

Well, if the application using that one database where you had a problem with
LOCKLIST was a mission critical application, the STMM problems encountered
could have bankrupted your customer. I need a database that stays up, and/or
doesn't hang, all the time, not just most of the time.

Also, I can't play Russian Roulette trying to figure out when it works and
when it doesn't work.

#14  Serge Rielau
Re: Any gotchas for STMM - 06-21-2010, 02:48 PM

Mark,

How about sending a note to Adam with your company affiliation so he can
drill down on the issues your company had on his own?
That's not much work for you and saves Adam from divining up what might
have been wrong in your case.

Cheers
Serge


--
Serge Rielau
SQL Architect DB2 for LUW
IBM Toronto Lab

#15  Mark A
Re: Any gotchas for STMM - 06-21-2010, 05:19 PM

"Serge Rielau" <srielau (AT) ca (DOT) ibm.com> wrote

Quote:
> Mark,
>
> How about sending a note to Adam with your company affiliation so he can
> drill down on the issues your company had on his own?
> That's not much work for you and saves Adam from divining up what might
> have been wrong in your case.
>
> Cheers
> Serge

Ok. I will do that.

#16  Frederik Engelen
Re: Any gotchas for STMM - 06-22-2010, 06:04 AM

Mark,

I'm not going into more detail regarding the whole STMM thing except
saying that it works pretty well for us, as long as we fix the
instance_memory parameter.

What I am curious about is why you would only assign 50% of server
memory to the bufferpools. Give or take a few gigs for OS and database
housekeeping, that would leave half of your server memory unused, no?

Kind regards,

Frederik

#17  Mark A
Re: Any gotchas for STMM - 06-22-2010, 08:45 AM

"Frederik Engelen" <engelenfrederik (AT) gmail (DOT) com> wrote

Quote:
> Mark,
>
> I'm not going into more detail regarding the whole STMM thing except
> saying that it works pretty well for us, as long as we fix the
> instance_memory parameter.
>
> What I am curious about is why you would only assign 50% of server
> memory to the bufferpools. Give or take a few gigs for OS and database
> housekeeping, that would leave half of your server memory unused, no?
>
> Kind regards,
>
> Frederik

I would normally assign more than 50% of total system memory to bufferpools.
But most DB2 novices used the defaults in 8.2 or very small amounts which
are closer to 1% or less, so 50% would be a huge improvement over that. One
might go as high as 75-80% depending on total server memory and other
factors, but in most situations one would not notice much difference between
50% and 75%. Also, at least with Linux, DB2 servers do tend to run out of
memory for various reasons.
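
To put those percentages in terms of pages, here is a rough Python sketch. It assumes 4 KB pages and applies the percentage to total server RAM; the function name and the sample server sizes (32 GB and 64 GB, the sizes mentioned earlier in the thread) are illustrative only and say nothing about how any particular server should be tuned:

```python
# Rough sizing sketch: how many 4 KB pages correspond to a given share of RAM.
# The function name and the sample server sizes are illustrative assumptions.

KB_PER_GB = 1024 * 1024

def bufferpool_pages(server_ram_gb: int, percent: int, page_size_kb: int = 4) -> int:
    """Total pages across all bufferpools for `percent` of server RAM."""
    ram_kb = server_ram_gb * KB_PER_GB
    return ram_kb * percent // 100 // page_size_kb

for ram_gb in (32, 64):
    for percent in (50, 75):
        pages = bufferpool_pages(ram_gb, percent)
        print(f"{ram_gb} GB server, {percent}%: {pages} pages")

# For comparison: the old default of 1000 pages as a share of a 32 GB server.
default_share = 1000 * 4 / (32 * KB_PER_GB) * 100
print(f"1000 pages is about {default_share:.3f}% of 32 GB")
```

The result is a total for all bufferpools in all databases on the server, so it would still have to be divided among them by hand.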

The documentation of Linux kernel parameters is sometimes contradictory in
the manuals, or sometimes has been completely lacking.

For example, although not mentioned anywhere in the official doc, some
Redbooks recommend:

vm.swappiness=0 (default for RHEL is 60)
vm.dirty_ratio=10
vm.dirty_background_ratio=5

Recommendations for SHMALL have been all over the place, from 90% of system
memory, 100% of system memory, to now apparently 200% of system memory. Any
changes to Linux Kernel Parm recommendations should be a Hiper doc APAR and
not slipstreamed in the InfoCenter.

I noticed in the 9.7 Fixpack 2 Info Center webpages, they are now enforcing
Linux kernel parms automatically that were not enforced even in 9.7.1. This
is obviously to fix the problems that customers have been experiencing with
memory on DB2 Linux systems, especially with STMM activated.

#18  Mark A
Re: Any gotchas for STMM - 06-22-2010, 09:13 AM

"Mark A" <noone (AT) nowhere (DOT) com> wrote

Quote:
> Recommendations for SHMALL have been all over the place, from 90% of
> system memory, 100% of system memory, to now apparently 200% of system
> memory. Any changes to Linux Kernel Parm recommendations should be a Hiper
> doc APAR and not slipstreamed in the InfoCenter.

Here is where it states SHMALL should be 200% (and now enforced that way in
9.7.2).
"2 * <size of RAM in bytes> (setting is in 4K pages)"
http://publib.boulder.ibm.com/infoce.../c0057140.html

Here is where it states that SHMALL should be 90%:
"...whereas the parameter SHMALL should be set to 90% of the available
memory on the database server."
http://publib.boulder.ibm.com/infoce.../c0054689.html
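
To see how far apart the two quoted recommendations are, a minimal Python sketch follows. It assumes the SHMALL value is expressed in 4 KB pages (as the first quote states) and uses a hypothetical 32 GB server; the helper name `shmall_pages` is invented for illustration:

```python
# Compare the two documented SHMALL recommendations for one server size.
# Assumptions: SHMALL is counted in 4 KB pages; the server has 32 GB of RAM.

PAGE_SIZE_BYTES = 4096

def shmall_pages(ram_bytes: int, percent_of_ram: int) -> int:
    """SHMALL value (in pages) for a given percentage of physical RAM."""
    return ram_bytes * percent_of_ram // 100 // PAGE_SIZE_BYTES

ram_bytes = 32 * 1024**3  # hypothetical 32 GB server

for percent in (90, 100, 200):
    print(f"{percent}% of RAM -> SHMALL = {shmall_pages(ram_bytes, percent)} pages")
```

On this example server the 200% setting is more than twice the value the 90% setting would give, so the two pages cannot both be followed.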

Is anyone at IBM awake these days?

#19  Serge Rielau
Re: Any gotchas for STMM - 06-22-2010, 11:25 AM

On 6/22/2010 9:13 AM, Mark A wrote:
Quote:
> "Mark A" <noone (AT) nowhere (DOT) com> wrote in message
> news:hvqba4$h5f$1 (AT) news (DOT) eternal-september.org...
>> Recommendations for SHMALL have been all over the place, from 90% of
>> system memory, 100% of system memory, to now apparently 200% of system
>> memory. Any changes to Linux Kernel Parm recommendations should be a Hiper
>> doc APAR and not slipstreamed in the InfoCenter.
> Here is where it states SHMALL should be 200% (and now enforced that way in
> 9.7.2).
> "2 * <size of RAM in bytes> (setting is in 4K pages)"
> http://publib.boulder.ibm.com/infoce.../c0057140.html
>
> Here is where it states that SHMALL should be 90%:
> "...whereas the parameter SHMALL should be set to 90% of the available
> memory on the database server."
> http://publib.boulder.ibm.com/infoce.../c0054689.html
>
> Is anyone at IBM awake these days?

We are under G20 lock down, tasered into unconsciousness...

The recommendation has been changed from 90% to 200%. So the 90% is
outdated.
I have used the Feedback button in the wrong doc to get this fixed
(Hint, hint, it does NOT require an IBM employee to use this
button...docs are big, mistakes happen)

Cheers
Serge
--
Serge Rielau
SQL Architect DB2 for LUW
IBM Toronto Lab

#20  The Boss
Re: Any gotchas for STMM - 06-22-2010, 12:46 PM

On Jun 22, 5:25 pm, Serge Rielau <srie... (AT) ca (DOT) ibm.com> wrote:
Quote:
> On 6/22/2010 9:13 AM, Mark A wrote:
>> "Mark A" <no... (AT) nowhere (DOT) com> wrote in message
>> news:hvqba4$h5f$1 (AT) news (DOT) eternal-september.org...
>>> Recommendations for SHMALL have been all over the place, from 90% of
>>> system memory, 100% of system memory, to now apparently 200% of system
>>> memory. Any changes to Linux Kernel Parm recommendations should be a Hiper
>>> doc APAR and not slipstreamed in the InfoCenter.
>> Here is where it states SHMALL should be 200% (and now enforced that way in
>> 9.7.2).
>> "2 * <size of RAM in bytes> (setting is in 4K pages)"
>> http://publib.boulder.ibm.com/infoce...pic/com.ibm.db...
>>
>> Here is where it states that SHMALL should be 90%:
>> "...whereas the parameter SHMALL should be set to 90% of the available
>> memory on the database server."
>> http://publib.boulder.ibm.com/infoce...pic/com.ibm.db...
>>
>> Is anyone at IBM awake these days?
>
> We are under G20 lock down, tasered into unconsciousness...
>
> The recommendation has been changed from 90% to 200%. So the 90% is
> outdated.
> I have used the Feedback button in the wrong doc to get this fixed
> (Hint, hint, it does NOT require an IBM employee to use this
> button...docs are big, mistakes happen)
>
> Cheers
> Serge

What I would like to see in the docs (and "Best Practice" documents)
is a rationale for these kinds of recommendations.
Why is 200% better than 90%?
And is this valid under all circumstances, like running (many)
virtualised Linux-boxes under zVM (or VMware)?
As is, figures like these are just 'silver bullets' and should be
handled with great caution.

--
Jeroen
