![]() | |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
This weekend, we upgraded from PSQL 2000i SP4+Hotfixes to Pervasive.SQL V8.1 + SRDE and MKDE Hotfixes on our production server (Win2k SP3, 2CPU, 2GB RAM, unlimited license). This was after upgrading and testing V8 on two other, smaller servers (Win2K SP3, 1 or 2 processors, ~1.3GB RAM, 50 user licenses each). After running for a period of time less than 8 hours, the server 'sort of' hung on us in an odd way. Existing Btrieve connections seemed fine, and it looked as if new btrieve connections could be created. However, relational connections could not be made, and there were problems with ??some?? existing connections. Since this was a production machine, I could not take an excessive amount of time to identify the issues. Also, the time was ~3:30am, and I am on Vicodan for pain when sleeping, so I was not at my sharpest (to say the least). We could not find a specific problem, but identified an 'issue' in the installation procedure that we thought might have been the cause (the server rolled back some changes the install program made because they were protected o/s files). To fix this, we got a later MDAC version (these were ODBC and other data access bits), and installed, then restarted the server. Less than two hours after this install and reboot, the problem happened again. Behavior was about the same as before, but then Btrieve connections started hanging. When trying to shut down services, I could stop the relational service, but the transactional service would not close. The Pervasive log file recorded a series of MKDE errors about the Invalidated Page List being full, starting some time after the problem occured (problem occured about 6:05am, errors started around 6:15am) (Unfortunately, I did not think to save off a copy of the log before rolling the server back to 2000i. Mea Culpa. The rollback was urgent on the production machine, and I was groggy.) Has anyone else seen *anything* like this before? We work in a relatively high-volume envoronment with mixed Btrieve and ODBC-through-ADO access, including a number of Stored Procedures. I *suspect* a problem in caching (the invalidated page message implies this to me), and noted that the DB process in NT had peaked at 1.8GB of memory in use both times, but was currently at about 1.3GB of RAM in use. Perhaps simply changing the max memory from 90% to (say) 75% would be sufficient to fix the problem, but I cannot afford to test on a live system supporting over 200 users / 1000 user connections, and the other systems we have have both (1) smaller total RAM, and (2) fewer licenses. Testing on these had not turned up any similar issues. I'm not even sure how to set up a test case. In both cases, different programs were running on multiple (between 12 and 60) machines over the network, using a combination of Btrieve and ODBC access. Other than the configuration values (which were pretty much standard/default except for the communications buffer at 16kB), I am not sure what more I can give you. I have an incident in at Pervasive, but just wanted to see if anyone here had seen anything like this before with 8.1 + Hotfixes, of with V8 any edition. Brendt Hess bhess-at-rsmedical-dot-com |
#3
| |||
| |||
|
|
Brendt: On a large system with that many users, I would recommend disabling the client-side cache and see if that helps. Also, we've found that on large systems, you may be better off with a smaller L2 cache (50-60% instead of 90%) and re-enabling the System Cache. (This is especially true if you have 3GB+ of memory.) I know that V8.5 does have a setting in the PCC to disable the client cache from the server, but I'm not sure about V8.1 w/Hotfixes. If not, the registry key should at least exist to disable it. If you can't find it, let me know and I'll dig it up for you. The most likely culprits will be low memory (causing the OS to complain, perhaps, or restricting the ability to build dynamic indices for the SQL engine) and a sudden influx of queries causing a timing issue. Getting rid of the CCE for a while may help initially, as would shrinking the cache size for the first issue. If the second issue is the culprit, then you'll need to see if the NTDBSMGR.EXE process crashes and get a core dump from it. Goldstar Software Inc. Building on Btrieve(R) for the Future(SM) Bill Bach BillBach (AT) goldstarsoftware (DOT) com http://www.goldstarsoftware.com *** Pervasive.SQL Service & Support Classes *** Chicago: March, 2004: See our web site for details! Brendt Hess wrote: This weekend, we upgraded from PSQL 2000i SP4+Hotfixes to Pervasive.SQL V8.1 + SRDE and MKDE Hotfixes on our production server (Win2k SP3, 2CPU, 2GB RAM, unlimited license). This was after upgrading and testing V8 on two other, smaller servers (Win2K SP3, 1 or 2 processors, ~1.3GB RAM, 50 user licenses each). After running for a period of time less than 8 hours, the server 'sort of' hung on us in an odd way. Existing Btrieve connections seemed fine, and it looked as if new btrieve connections could be created. However, relational connections could not be made, and there were problems with ??some?? existing connections. Since this was a production machine, I could not take an excessive amount of time to identify the issues. Also, the time was ~3:30am, and I am on Vicodan for pain when sleeping, so I was not at my sharpest (to say the least). We could not find a specific problem, but identified an 'issue' in the installation procedure that we thought might have been the cause (the server rolled back some changes the install program made because they were protected o/s files). To fix this, we got a later MDAC version (these were ODBC and other data access bits), and installed, then restarted the server. Less than two hours after this install and reboot, the problem happened again. Behavior was about the same as before, but then Btrieve connections started hanging. When trying to shut down services, I could stop the relational service, but the transactional service would not close. The Pervasive log file recorded a series of MKDE errors about the Invalidated Page List being full, starting some time after the problem occured (problem occured about 6:05am, errors started around 6:15am) (Unfortunately, I did not think to save off a copy of the log before rolling the server back to 2000i. Mea Culpa. The rollback was urgent on the production machine, and I was groggy.) Has anyone else seen *anything* like this before? We work in a relatively high-volume envoronment with mixed Btrieve and ODBC-through-ADO access, including a number of Stored Procedures. I *suspect* a problem in caching (the invalidated page message implies this to me), and noted that the DB process in NT had peaked at 1.8GB of memory in use both times, but was currently at about 1.3GB of RAM in use. Perhaps simply changing the max memory from 90% to (say) 75% would be sufficient to fix the problem, but I cannot afford to test on a live system supporting over 200 users / 1000 user connections, and the other systems we have have both (1) smaller total RAM, and (2) fewer licenses. Testing on these had not turned up any similar issues. I'm not even sure how to set up a test case. In both cases, different programs were running on multiple (between 12 and 60) machines over the network, using a combination of Btrieve and ODBC access. Other than the configuration values (which were pretty much standard/default except for the communications buffer at 16kB), I am not sure what more I can give you. I have an incident in at Pervasive, but just wanted to see if anyone here had seen anything like this before with 8.1 + Hotfixes, of with V8 any edition. Brendt Hess bhess-at-rsmedical-dot-com |
#4
| |||
| |||
|
|
More specifically set the L2 cache to no higher than 60% with 3GB ram and no higher than 45% with 4GB ram. Also be sure to apply the Pervasive.SQL V8 SP1 PLUS the Hot Fix which has a fix for this. The percentage settings are for default thread settings. If you raise the threads significantly those numbers would have to be set even lower. Leonard On 31 Jan 2004 12:56:51 EST, Bill Bach <bbach (AT) cncdsl (DOT) com> wrote: Brendt: On a large system with that many users, I would recommend disabling the client-side cache and see if that helps. Also, we've found that on large systems, you may be better off with a smaller L2 cache (50-60% instead of 90%) and re-enabling the System Cache. (This is especially true if you have 3GB+ of memory.) I know that V8.5 does have a setting in the PCC to disable the client cache from the server, but I'm not sure about V8.1 w/Hotfixes. If not, the registry key should at least exist to disable it. If you can't find it, let me know and I'll dig it up for you. The most likely culprits will be low memory (causing the OS to complain, perhaps, or restricting the ability to build dynamic indices for the SQL engine) and a sudden influx of queries causing a timing issue. Getting rid of the CCE for a while may help initially, as would shrinking the cache size for the first issue. If the second issue is the culprit, then you'll need to see if the NTDBSMGR.EXE process crashes and get a core dump from it. Goldstar Software Inc. Building on Btrieve(R) for the Future(SM) Bill Bach BillBach (AT) goldstarsoftware (DOT) com http://www.goldstarsoftware.com *** Pervasive.SQL Service & Support Classes *** Chicago: March, 2004: See our web site for details! Brendt Hess wrote: This weekend, we upgraded from PSQL 2000i SP4+Hotfixes to Pervasive.SQL V8.1 + SRDE and MKDE Hotfixes on our production server (Win2k SP3, 2CPU, 2GB RAM, unlimited license). This was after upgrading and testing V8 on two other, smaller servers (Win2K SP3, 1 or 2 processors, ~1.3GB RAM, 50 user licenses each). After running for a period of time less than 8 hours, the server 'sort of' hung on us in an odd way. Existing Btrieve connections seemed fine, and it looked as if new btrieve connections could be created. However, relational connections could not be made, and there were problems with ??some?? existing connections. Since this was a production machine, I could not take an excessive amount of time to identify the issues. Also, the time was ~3:30am, and I am on Vicodan for pain when sleeping, so I was not at my sharpest (to say the least). We could not find a specific problem, but identified an 'issue' in the installation procedure that we thought might have been the cause (the server rolled back some changes the install program made because they were protected o/s files). To fix this, we got a later MDAC version (these were ODBC and other data access bits), and installed, then restarted the server. Less than two hours after this install and reboot, the problem happened again. Behavior was about the same as before, but then Btrieve connections started hanging. When trying to shut down services, I could stop the relational service, but the transactional service would not close. The Pervasive log file recorded a series of MKDE errors about the Invalidated Page List being full, starting some time after the problem occured (problem occured about 6:05am, errors started around 6:15am) (Unfortunately, I did not think to save off a copy of the log before rolling the server back to 2000i. Mea Culpa. The rollback was urgent on the production machine, and I was groggy.) Has anyone else seen *anything* like this before? We work in a relatively high-volume envoronment with mixed Btrieve and ODBC-through-ADO access, including a number of Stored Procedures. I *suspect* a problem in caching (the invalidated page message implies this to me), and noted that the DB process in NT had peaked at 1.8GB of memory in use both times, but was currently at about 1.3GB of RAM in use. Perhaps simply changing the max memory from 90% to (say) 75% would be sufficient to fix the problem, but I cannot afford to test on a live system supporting over 200 users / 1000 user connections, and the other systems we have have both (1) smaller total RAM, and (2) fewer licenses. Testing on these had not turned up any similar issues. I'm not even sure how to set up a test case. In both cases, different programs were running on multiple (between 12 and 60) machines over the network, using a combination of Btrieve and ODBC access. Other than the configuration values (which were pretty much standard/default except for the communications buffer at 16kB), I am not sure what more I can give you. I have an incident in at Pervasive, but just wanted to see if anyone here had seen anything like this before with 8.1 + Hotfixes, of with V8 any edition. Brendt Hess bhess-at-rsmedical-dot-com |
#5
| |||
| |||
|
|
Question for you, Leonard: Obviously, the 60% of 3MB and 45% of 4MB is designed to keep the process space under 2GB. One would also assume that ~22% of 8GB would be the right answer. Does the engine stop allocating resources at the 2GB barrier, or does it try to keep going, resulting in the aforementioned problem? Goldstar Software Inc. Building on Btrieve(R) for the Future(SM) Bill Bach BillBach (AT) goldstarsoftware (DOT) com http://www.goldstarsoftware.com *** Pervasive.SQL Service & Support Classes *** Chicago: March, 2004: See our web site for details! Leonard wrote: More specifically set the L2 cache to no higher than 60% with 3GB ram and no higher than 45% with 4GB ram. Also be sure to apply the Pervasive.SQL V8 SP1 PLUS the Hot Fix which has a fix for this. The percentage settings are for default thread settings. If you raise the threads significantly those numbers would have to be set even lower. Leonard On 31 Jan 2004 12:56:51 EST, Bill Bach <bbach (AT) cncdsl (DOT) com> wrote: Brendt: On a large system with that many users, I would recommend disabling the client-side cache and see if that helps. Also, we've found that on large systems, you may be better off with a smaller L2 cache (50-60% instead of 90%) and re-enabling the System Cache. (This is especially true if you have 3GB+ of memory.) I know that V8.5 does have a setting in the PCC to disable the client cache from the server, but I'm not sure about V8.1 w/Hotfixes. If not, the registry key should at least exist to disable it. If you can't find it, let me know and I'll dig it up for you. The most likely culprits will be low memory (causing the OS to complain, perhaps, or restricting the ability to build dynamic indices for the SQL engine) and a sudden influx of queries causing a timing issue. Getting rid of the CCE for a while may help initially, as would shrinking the cache size for the first issue. If the second issue is the culprit, then you'll need to see if the NTDBSMGR.EXE process crashes and get a core dump from it. Goldstar Software Inc. Building on Btrieve(R) for the Future(SM) Bill Bach BillBach (AT) goldstarsoftware (DOT) com http://www.goldstarsoftware.com *** Pervasive.SQL Service & Support Classes *** Chicago: March, 2004: See our web site for details! Brendt Hess wrote: This weekend, we upgraded from PSQL 2000i SP4+Hotfixes to Pervasive.SQL V8.1 + SRDE and MKDE Hotfixes on our production server (Win2k SP3, 2CPU, 2GB RAM, unlimited license). This was after upgrading and testing V8 on two other, smaller servers (Win2K SP3, 1 or 2 processors, ~1.3GB RAM, 50 user licenses each). After running for a period of time less than 8 hours, the server 'sort of' hung on us in an odd way. Existing Btrieve connections seemed fine, and it looked as if new btrieve connections could be created. However, relational connections could not be made, and there were problems with ??some?? existing connections. Since this was a production machine, I could not take an excessive amount of time to identify the issues. Also, the time was ~3:30am, and I am on Vicodan for pain when sleeping, so I was not at my sharpest (to say the least). We could not find a specific problem, but identified an 'issue' in the installation procedure that we thought might have been the cause (the server rolled back some changes the install program made because they were protected o/s files). To fix this, we got a later MDAC version (these were ODBC and other data access bits), and installed, then restarted the server. Less than two hours after this install and reboot, the problem happened again. Behavior was about the same as before, but then Btrieve connections started hanging. When trying to shut down services, I could stop the relational service, but the transactional service would not close. The Pervasive log file recorded a series of MKDE errors about the Invalidated Page List being full, starting some time after the problem occured (problem occured about 6:05am, errors started around 6:15am) (Unfortunately, I did not think to save off a copy of the log before rolling the server back to 2000i. Mea Culpa. The rollback was urgent on the production machine, and I was groggy.) Has anyone else seen *anything* like this before? We work in a relatively high-volume envoronment with mixed Btrieve and ODBC-through-ADO access, including a number of Stored Procedures. I *suspect* a problem in caching (the invalidated page message implies this to me), and noted that the DB process in NT had peaked at 1.8GB of memory in use both times, but was currently at about 1.3GB of RAM in use. Perhaps simply changing the max memory from 90% to (say) 75% would be sufficient to fix the problem, but I cannot afford to test on a live system supporting over 200 users / 1000 user connections, and the other systems we have have both (1) smaller total RAM, and (2) fewer licenses. Testing on these had not turned up any similar issues. I'm not even sure how to set up a test case. In both cases, different programs were running on multiple (between 12 and 60) machines over the network, using a combination of Btrieve and ODBC access. Other than the configuration values (which were pretty much standard/default except for the communications buffer at 16kB), I am not sure what more I can give you. I have an incident in at Pervasive, but just wanted to see if anyone here had seen anything like this before with 8.1 + Hotfixes, of with V8 any edition. Brendt Hess bhess-at-rsmedical-dot-com |
![]() |
| Thread Tools | |
| Display Modes | |
| |