dbTalk Databases Forums  

Slow OS calls on one chip type

comp.databases.informix comp.databases.informix


Discuss Slow OS calls on one chip type in the comp.databases.informix forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
Cesar Inacio Martins
 
Posts: n/a

Default Re: Slow OS calls on one chip type - 05-03-2010 , 06:44 PM






I'm not expert in C so, I just can provide a little more information to you:

About wait4 : http://www.unix.com/apropos-man/All/0/wait4/

To identify who is calling this function, run replacing the -c to -r on strace.

I not sure, but I believed there some environment set* , like LD_LIBRARY , LD_PRELOAD what force the BSD library first in the path... but this is just a guess, some developer expert or linux suport maybe can give some light....

try compare the env / set output of the two environment...
but this is just shooting in the dark

Regards
César

--- Em seg, 3/5/10, Neil Truby <neil.truby (AT) ardenta (DOT) com> escreveu:

De: Neil Truby <neil.truby (AT) ardenta (DOT) com>
Assunto: Re: Slow OS calls on one chip type
Para: informix-list (AT) iiug (DOT) org
Data: Segunda-feira, 3 de Maio de 2010, 16:40




Quote:
Well, why not do this (as previously posted, but removing the database
connection) :

strace -o slow1.out -c -f dbaccess - slow1

where slow1 contains your 64 !echo "Hello World"

Also, just try it on the actual machine rather than via ssh
This is the output of* strace -o slow1.out -fc dbaccess sysmaster <
crap5.sql :
% time* ***seconds* usecs/call* ***calls* * errors syscall
------ ----------- ----------- --------- --------- ----------------
88.47* * 0.415934* * * * ***0***5760352***5759744 close
* 7.74* * 0.036405* * * ***560* * * * 65* * * * ***execve
* 1.46* * 0.006865* * * ***107* * * * 64* * * * ***clone
* 0.69* * 0.003256* * * * ***2* * * 1601* * * 1058 open
* 0.39* * 0.001811* * * * ***3* * ***572* * * * ***read
* 0.32* * 0.001496* * * * ***1* * * 1292* * ***842 stat
* 0.26* * 0.001223* * * * ***1* * * 1072* * * * ***mmap
* 0.10* * 0.000466* * * * ***1* * ***537* * * * ***fstat
* 0.09* * 0.000443* * * * ***1* * ***452* * * * ***rt_sigprocmask
* 0.09* * 0.000417* * * * ***1* * ***403* * * * ***mprotect
* 0.09* * 0.000401* * * * ***1* * ***327* * * * 67 access
* 0.07* * 0.000327* * * * ***2* * ***196* * * * ***brk
* 0.05* * 0.000241* * * * ***2* * ***141* * * * ***munmap
* 0.04* * 0.000211* * * * ***3* * * * 71* ** * ***write
* 0.03* * 0.000128* * * * ***2* * * * 64* ** * ***setgid
* 0.02* * 0.000114* * * * ***0* * ***917* * * * ***rt_sigaction
* 0.02* * 0.000072* * * * ***1* * * * 64* ** * ***setuid
* 0.01* * 0.000070* * * * ***1* * * * 65* ** * ***arch_prctl
* 0.01* * 0.000066* * * * ***1* * * * 64* ** * ***getppid
* 0.01* * 0.000055* * * * ***0* * ***132* * * * ***uname
* 0.01* * 0.000050* * * * ***0* * ***132* * * * ***getuid
* 0.01* * 0.000043* * * * ***1* * * * 64* ** * ***getpid
* 0.01* * 0.000039* * * * ***0* * ***284* * * * ***lseek
* 0.00* * 0.000000* * * * ***0* * * * 68* ** * 67 ioctl
* 0.00* * 0.000000* * * * ***0* * * ***1** * * ***shmat
* 0.00* * 0.000000* * * * ***0* * * ***2** * * ***socket
* 0.00* * 0.000000* * * * ***0* * * ***2** * ***2 connect
* 0.00* * 0.000000* * * * ***0* * * * 64* ** * 64 getpeername
* 0.00* * 0.000000* * * * ***0* * * * 64* ** * 64 wait4
* 0.00* * 0.000000* * * * ***0* * * * 25* ** * ***semop
* 0.00* * 0.000000* * * * ***0* * * ***1** * * ***semctl
* 0.00* * 0.000000* * * * ***0* * * ***1** * * ***shmdt
* 0.00* * 0.000000* * * * ***0* * * ***8** * * ***fcntl
* 0.00* * 0.000000* * * * ***0* * * ***1** * * ***getcwd
* 0.00* * 0.000000* * * * ***0* * * ***1** * * ***readlink
* 0.00* * 0.000000* * * * ***0* * * * 66* ** * ***getrlimit
* 0.00* * 0.000000* * * * ***0* * ***129* * * * ***getgid
* 0.00* * 0.000000* * * * ***0* * * * 64* ** * ***geteuid
* 0.00* * 0.000000* * * * ***0* * * * 64* ** * ***getegid
* 0.00* * 0.000000* * * * ***0* * * * 64* ** * ***getpgrp
* 0.00* * 0.000000* * * * ***0* * * ***1** * * ***setrlimit
* 0.00* * 0.000000* * * * ***0* * * ***2** * * ***futex
* 0.00* * 0.000000* * * * ***0* * * ***1** * * ***set_tid_address
* 0.00* * 0.000000* * * * ***0* * * ***1** * * ***set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00* * 0.470133* * * * * * ***5769561***5761908 total

And the output of strace -o slow2.out -c -f dbaccess - crap5.sql:
% time* ***seconds* usecs/call* ***calls* * errors syscall
------ ----------- ----------- --------- --------- ----------------
98.20***23.093423* * * 360835* * * * 64* * * ****wait4
* 1.47* * 0.346367* * * * ***0***5760348***5759680 close
* 0.23* * 0.054686* * * ***841* * * * 65* * * * ***execve
* 0.02* * 0.005324* * * * * 83* * * * 64* * ** ***clone
* 0.02* * 0.004632* * * * ***3* * * 1597* * * 1058 open
* 0.01* * 0.002853* * * * ***5* * ***562* * * * ***read
* 0.01* * 0.002439* * * * ***2* * * 1070* * * * ***mmap
* 0.01* * 0.001597* * * * ***1* * * 1292* * ***842 stat
* 0.00* * 0.000942* * * * ***2* * ***403* * * * ***mprotect
* 0.00* * 0.000836* * * * ***1* * ***790* * * * ***rt_sigaction
* 0.00* * 0.000502* * * * ***8* * * * 64* ** * ***write
* 0.00* * 0.000481* * * * ***1* * ***325* * * * 66 access
* 0.00* * 0.000417* * * * ***2* * ***265* * * * ***lseek
* 0.00* * 0.000341* * * * ***1* * ***535* * * * ***fstat
* 0.00* * 0.000309* * * * ***1* * ***387* * * * ***rt_sigprocmask
* 0.00* * 0.000237* * * * ***2* * ***140* * * * ***munmap
* 0.00* * 0.000182* * * * ***1* * ***129* * * * ***uname
* 0.00* * 0.000118* * * * ***1* * ***128* * * * ***getuid
* 0.00* * 0.000118* * * * ***2* * * * 64* ** * ***setgid
* 0.00* * 0.000115* * * * ***2* * * * 64* ** * ***geteuid
* 0.00* * 0.000090* * * * ***1* * * * 65* ** * ***arch_prctl
* 0.00* * 0.000079* * * * ***1* * * * 64* ** * ***setuid
* 0.00* * 0.000075* * * * ***1* * * * 64* ** * ***getpid
* 0.00* * 0.000071* * * * ***1* * * * 64* ** * ***getppid
* 0.00* * 0.000009* * * * ***0* * * * 66* ** * ***getrlimit
* 0.00* * 0.000000* * * * ***0* * ***196* * * * ***brk
* 0.00* * 0.000000* * * * ***0* * * ***1** * * ***ioctl
* 0.00* * 0.000000* * * * ***0* * * ***3** * * ***alarm
* 0.00* * 0.000000* * * * ***0* * * ***2** * * ***socket
* 0.00* * 0.000000* * * * ***0* * * ***2** * ***2 connect
* 0.00* * 0.000000* * * * ***0* * * * 64* ** * 64 getpeername
* 0.00* * 0.000000* * * * ***0* * * * 10* ** * ***fcntl
* 0.00* * 0.000000* * * * ***0* * * ***1** * * ***readlink
* 0.00* * 0.000000* * * * ***0* * ***128* * * * ***getgid
* 0.00* * 0.000000* * * * ***0* * * * 64* ** * ***getegid
* 0.00* * 0.000000* * * * ***0* * * * 64* ** * ***getpgrp
* 0.00* * 0.000000* * * * ***0* * * ***1** * * ***setrlimit
* 0.00* * 0.000000* * * * ***0* * * ***2** * * ***futex
* 0.00* * 0.000000* * * * ***0* * * ***1** * * ***set_tid_address
* 0.00* * 0.000000* * * * ***0* * * ***1** * * ***set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00***23.516243* * * * * * ***5769219***5761712 total

_______________________________________________
Informix-list mailing list
Informix-list (AT) iiug (DOT) org
http://www.iiug.org/mailman/listinfo/informix-list

Reply With Quote
  #12  
Old   
Bartlomiej Lidke
 
Posts: n/a

Default Re: Slow OS calls on one chip type - 05-05-2010 , 02:31 PM






Art Kagel <art.kagel (AT) gmail (DOT) com> wrote:
Quote:
The BIG problem are the 6.7MILLION failed calls to the OS close()
system call!
they propably come from a post fork/clone loop that is clearing
the file descriptors before a new shell is being called using execve
(something like for (i=3; i<65536; i++) close(i))

each '!echo "something"' line produces extra 65536-3 failed (EBADF)
'close' calls

'close' on non-valid file desciptors can be easily benchmarked on
both systems just to be sure that it is not a kernel issue

also the strace can show what happens between execve and write(1,..)
call (from "echo") on both systems (like scanning NFS directories
that are on the PATH or LD_LIBRARY_PATH or something like that)

--
Best Regards
butthead

Reply With Quote
  #13  
Old   
Neil Truby
 
Posts: n/a

Default Re: Slow OS calls on one chip type - 05-05-2010 , 05:38 PM



"Bartlomiej Lidke" <ohggurnq (AT) rcs (DOT) cy.rot13.invalid> wrote

Quote:
Art Kagel <art.kagel (AT) gmail (DOT) com> wrote:
The BIG problem are the 6.7MILLION failed calls to the OS close()
system call!

they propably come from a post fork/clone loop that is clearing
the file descriptors before a new shell is being called using execve
(something like for (i=3; i<65536; i++) close(i))

each '!echo "something"' line produces extra 65536-3 failed (EBADF)
'close' calls

'close' on non-valid file desciptors can be easily benchmarked on
both systems just to be sure that it is not a kernel issue

also the strace can show what happens between execve and write(1,..)
call (from "echo") on both systems (like scanning NFS directories
that are on the PATH or LD_LIBRARY_PATH or something like that)
Art's reply is nowhere present on comp.databases.informix. I STILL think
there's a problem with the feed ....

Anyway, could somcseone else with RHEL 5.3 on AMD procs try it?
Thanks!!!!:
$ time dbaccess sysmaster < crap5.sql

Database selected.

Hello World
Hello World
Hello World
....
Hello World
Hello World


Database closed.


real 0m6.714s
user 0m0.017s
sys 0m0.041s

$ cat crap5.sql
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"

Reply With Quote
  #14  
Old   
Art Kagel
 
Posts: n/a

Default Re: Slow OS calls on one chip type - 05-05-2010 , 05:59 PM



Closest I have quick access to is Ubuntu 9 on AMD Athlon 64 X2 4200+:

$ uname -a
Linux galadriel-ii 2.6.31-21-generic #59-Ubuntu SMP Wed Mar 24 07:28:27 UTC
2010 x86_64 GNU/Linux
$
$ time dbaccess sysmaster - <crap5.sql

Database selected.

Hello World
Hello World
....
Hello World
Hello World


Database closed.


real 0m0.570s
user 0m0.000s
sys 0m0.050s
$
$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 43
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
stepping : 1
cpu MHz : 1000.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm
3dnowext 3dnow rep_good pni la
hf_lm cmp_legacy
bogomips : 1990.15
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 43
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
stepping : 1
cpu MHz : 1000.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm
3dnowext 3dnow rep_good pni la
hf_lm cmp_legacy
bogomips : 1990.15
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
IIUG Board of Directors (art (AT) iiug (DOT) org)

See you at the 2010 IIUG Informix Conference
April 25-28, 2010
Overland Park (Kansas City), KS
www.iiug.org/conf

Disclaimer: Please keep in mind that my own opinions are my own opinions and
do not reflect on my employer, Advanced DataTools, the IIUG, nor any other
organization with which I am associated either explicitly, implicitly, or by
inference. Neither do those opinions reflect those of other individuals
affiliated with any entity with which I am affiliated nor those of the
entities themselves.



On Wed, May 5, 2010 at 6:38 PM, Neil Truby <neil.truby (AT) ardenta (DOT) com> wrote:

Quote:
"Bartlomiej Lidke" <ohggurnq (AT) rcs (DOT) cy.rot13.invalid> wrote in message
news:hrsh37$v2i$1 (AT) news (DOT) onet.pl...
Art Kagel <art.kagel (AT) gmail (DOT) com> wrote:
The BIG problem are the 6.7MILLION failed calls to the OS close()
system call!

they propably come from a post fork/clone loop that is clearing
the file descriptors before a new shell is being called using execve
(something like for (i=3; i<65536; i++) close(i))

each '!echo "something"' line produces extra 65536-3 failed (EBADF)
'close' calls

'close' on non-valid file desciptors can be easily benchmarked on
both systems just to be sure that it is not a kernel issue

also the strace can show what happens between execve and write(1,..)
call (from "echo") on both systems (like scanning NFS directories
that are on the PATH or LD_LIBRARY_PATH or something like that)

Art's reply is nowhere present on comp.databases.informix. I STILL think
there's a problem with the feed ....

Anyway, could somcseone else with RHEL 5.3 on AMD procs try it?
Thanks!!!!:
$ time dbaccess sysmaster < crap5.sql

Database selected.

Hello World
Hello World
Hello World
...
Hello World
Hello World


Database closed.


real 0m6.714s
user 0m0.017s
sys 0m0.041s

$ cat crap5.sql
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"
!echo "Hello World"






_______________________________________________
Informix-list mailing list
Informix-list (AT) iiug (DOT) org
http://www.iiug.org/mailman/listinfo/informix-list

Reply With Quote
  #15  
Old   
Neil Truby
 
Posts: n/a

Default Re: Slow OS calls on one chip type - resolved! - 05-19-2010 , 12:39 PM



"Neil Truby" <neil.truby (AT) ardenta (DOT) com> wrote

Quote:
IDS 11.5FC6 on RHEL 5.3

Here's a starange one. When running SQL which interacts with OS to output
messages, it's very, very much slower at a new customer site than it is at
our own or others. The difference seems to be the AMD processors the
customer has (we have Intel) - the OS and IDS versions are identical.
This turns out to have been caused by an RHEL kernel parameter setting being
made too tight.

Thanks to Uwe Weber from IBM, and several from c.d.i. but in particular Art
Kagel, for their help in getting to the solution.

Reply With Quote
  #16  
Old   
david@smooth1.co.uk
 
Posts: n/a

Default Re: Slow OS calls on one chip type - resolved! - 05-19-2010 , 04:57 PM



On 19 May, 18:39, "Neil Truby" <neil.tr... (AT) ardenta (DOT) com> wrote:
Quote:
"Neil Truby" <neil.tr... (AT) ardenta (DOT) com> wrote in message

news:82pgrnFo58U1 (AT) mid (DOT) individual.net...

IDS 11.5FC6 on RHEL 5.3

Here's a starange one. *When running SQL which interacts with OS to output
messages, it's very, very much slower at a new customer site than it isat
our own or others. *The difference seems to be the AMD processors the
customer has (we have Intel) - the OS and IDS versions are identical.

This turns out to have been caused by an RHEL kernel parameter setting being
made too tight.

Thanks to Uwe Weber from IBM, and several from c.d.i. but in particular Art
Kagel, for their help in getting to the solution.
Which kernel setting?

Reply With Quote
  #17  
Old   
Neil Truby
 
Posts: n/a

Default Re: Slow OS calls on one chip type - resolved! - 05-20-2010 , 03:57 PM



<david (AT) smooth1 (DOT) co.uk> wrote

On 19 May, 18:39, "Neil Truby" <neil.tr... (AT) ardenta (DOT) com> wrote:
Quote:
"Neil Truby" <neil.tr... (AT) ardenta (DOT) com> wrote in message

news:82pgrnFo58U1 (AT) mid (DOT) individual.net...

IDS 11.5FC6 on RHEL 5.3

Here's a starange one. When running SQL which interacts with OS to
output
messages, it's very, very much slower at a new customer site than it is
at
our own or others. The difference seems to be the AMD processors the
customer has (we have Intel) - the OS and IDS versions are identical.

This turns out to have been caused by an RHEL kernel parameter setting
being
made too tight.

Thanks to Uwe Weber from IBM, and several from c.d.i. but in particular
Art
Kagel, for their help in getting to the solution.

Which kernel setting?
A sysadmin writes ...

Hi Neil

I am not sure what parameter exactly was the problem but we commented out
the following lines from /etc/security/limits.conf - which in the end left
us with a default limits.conf file:

# - stack - max stack size (KB)
#* soft stack 32768
#* hard stack 32768


# - nofile - max number of open files
#* soft nofile 90000
#* hard nofile 90000


# - nproc - max number of processes
#* soft nproc 16384
#* hard nproc 16384

These were in place on recommendation from another vendor for other systems
and were mistakenly applied to the database servers. The setting with the
most impact in the case of a database I am thinking would be the number of
open files.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.