dbTalk Databases Forums  

Re: [BUGS] Re: [PATCH] Use CC atomic builtins if available [was:

mailing.database.pgsql-bugs mailing.database.pgsql-bugs


Discuss Re: [BUGS] Re: [PATCH] Use CC atomic builtins if available [was: in the mailing.database.pgsql-bugs forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Heikki Linnakangas
 
Posts: n/a

Default Re: [BUGS] Re: [PATCH] Use CC atomic builtins if available [was: - 12-19-2011 , 03:25 PM






On 19.12.2011 22:12, Tom Lane wrote:
Quote:
Noah Misch<noah (AT) leadboat (DOT) com> writes:
On Mon, Dec 19, 2011 at 05:09:11PM +0200, Heikki Linnakangas wrote:
That is not sufficient on platforms with a weak memory model, like Itanium.

Other processors may observe the lock as held after its release, but there's no
correctness problem.

How weak is the memory model, exactly?

A correctness problem would ensue if out-of-order stores are possible,
ie other processors could observe the lock being freed (and then acquire
it) before seeing changes to shared variables that had been made while
holding the lock.

I'm a little dubious that this applies to Itanium, because I don't see
any memory fence instruction in the TAS macro. If we needed one in
S_UNLOCK I'd rather expect there to be one in TAS as well.
There's a pretty good manual called "Implementing Spinlocks on Itanium
and PA-RISC" from HP at:
http://h21007.www2.hp.com/portal/dow.../spinlocks.pdf.
It says, in section "3.2.1 Try to get a spinlock":

Quote:
Note also, that the ‘xchg’ instruction always
has the ‘acquire’ semantics, so it enforces the correct memory ordering.
But the current implementation seems to be safe, anyway. In section
"3.2.3 Freeing a spinlock", that manual says:

Quote:
Note, that a simple assignment statement into a volatile variable
will not work, as we are
assuming that the +Ovolatile=__unordered compile option is being used.
So +Ovolatile=__unordered would allow the kind of optimization that I
thought was possible, and we would have a problem if we used it. But the
default is more conservative, and a simple assignment does work.

I compiled the attached test program on an HP Itanium box, using the
same flags you get from PostgreSQL's configure on that box. The relevant
assembler output is:

xchg4 r14 = [r15], r14 // M [slocktest.c: 66/3]
//file/line/col slocktest.c/67/3
ld1.acq r16 = [r11] // M [slocktest.c: 67/3]
nop.i 0 // I
//file/line/col slocktest.c/68/3
st1.rel [r11] = r10 ;; // M [slocktest.c: 68/3]
//file/line/col slocktest.c/69/3
st4.rel [r15] = r0 // M [slocktest.c: 69/3]
//file/line/col slocktest.c/70/1


The trick I missed is that the compiler attaches .rel to all the stores
and .acq to all the loads through a volatile pointer. gcc seems to do
the same. So we're safe.


It would be interesting to test whether using +Ovolatile=__unordered
would make a difference to performance. Tacking those memory fence
modifiers to all the volatile loads and stores might be expensive.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


--
Sent via pgsql-bugs mailing list (pgsql-bugs (AT) postgresql (DOT) org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Reply With Quote
  #2  
Old   
Noah Misch
 
Posts: n/a

Default Re: [BUGS] Re: [PATCH] Use CC atomic builtins if available [was: - 12-19-2011 , 03:53 PM






On Mon, Dec 19, 2011 at 11:25:06PM +0200, Heikki Linnakangas wrote:
Quote:
I compiled the attached test program on an HP Itanium box, using the
same flags you get from PostgreSQL's configure on that box. The relevant
assembler output is:

xchg4 r14 = [r15], r14 // M [slocktest.c: 66/3]
//file/line/col slocktest.c/67/3
ld1.acq r16 = [r11] // M [slocktest.c: 67/3]
nop.i 0 // I
//file/line/col slocktest.c/68/3
st1.rel [r11] = r10 ;; // M [slocktest.c: 68/3]
//file/line/col slocktest.c/69/3
st4.rel [r15] = r0 // M [slocktest.c: 69/3]
//file/line/col slocktest.c/70/1


The trick I missed is that the compiler attaches .rel to all the stores
and .acq to all the loads through a volatile pointer. gcc seems to do
the same. So we're safe.
The Intel compiler appears not to follow this convention:
http://software.intel.com/sites/prod...e-volatile.htm

If you have that compiler installed, could you see which opcode it generates?

Thanks,
nm

--
Sent via pgsql-bugs mailing list (pgsql-bugs (AT) postgresql (DOT) org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Reply With Quote
  #3  
Old   
Bruce Momjian
 
Posts: n/a

Default Re: [BUGS] Re: [PATCH] Use CC atomic builtins if available [was: - 02-02-2012 , 07:48 PM



Sorry for the late reply, but Heikki, can you get this Itanium
information into s_lock.h as a comment, particularly the information
about the +Ovolatile=__unordered flag?

---------------------------------------------------------------------------

On Mon, Dec 19, 2011 at 11:25:06PM +0200, Heikki Linnakangas wrote:
Quote:
On 19.12.2011 22:12, Tom Lane wrote:
Noah Misch<noah (AT) leadboat (DOT) com> writes:
On Mon, Dec 19, 2011 at 05:09:11PM +0200, Heikki Linnakangas wrote:
That is not sufficient on platforms with a weak memory model, like Itanium.

Other processors may observe the lock as held after its release, but there's no
correctness problem.

How weak is the memory model, exactly?

A correctness problem would ensue if out-of-order stores are possible,
ie other processors could observe the lock being freed (and then acquire
it) before seeing changes to shared variables that had been made while
holding the lock.

I'm a little dubious that this applies to Itanium, because I don't see
any memory fence instruction in the TAS macro. If we needed one in
S_UNLOCK I'd rather expect there to be one in TAS as well.

There's a pretty good manual called "Implementing Spinlocks on
Itanium and PA-RISC" from HP at: http://h21007.www2.hp.com/portal/dow.../spinlocks.pdf.
It says, in section "3.2.1 Try to get a spinlock":

Note also, that the ‘xchg’ instruction always
has the ‘acquire’ semantics, so it enforces the correct memory ordering.

But the current implementation seems to be safe, anyway. In section
"3.2.3 Freeing a spinlock", that manual says:

Note, that a simple assignment statement into a volatile variable
will not work, as we are
assuming that the +Ovolatile=__unordered compile option is being used.

So +Ovolatile=__unordered would allow the kind of optimization that
I thought was possible, and we would have a problem if we used it.
But the default is more conservative, and a simple assignment does
work.

I compiled the attached test program on an HP Itanium box, using the
same flags you get from PostgreSQL's configure on that box. The
relevant assembler output is:

xchg4 r14 = [r15], r14 // M [slocktest.c: 66/3]
//file/line/col slocktest.c/67/3
ld1.acq r16 = [r11] // M [slocktest.c: 67/3]
nop.i 0 // I
//file/line/col slocktest.c/68/3
st1.rel [r11] = r10 ;; // M [slocktest.c: 68/3]
//file/line/col slocktest.c/69/3
st4.rel [r15] = r0 // M [slocktest.c: 69/3]
//file/line/col slocktest.c/70/1


The trick I missed is that the compiler attaches .rel to all the
stores and .acq to all the loads through a volatile pointer. gcc
seems to do the same. So we're safe.


It would be interesting to test whether using +Ovolatile=__unordered
would make a difference to performance. Tacking those memory fence
modifiers to all the volatile loads and stores might be expensive.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#include <stdio.h

#if defined(__GNUC__) || defined(__INTEL_COMPILER)
#if defined(__ia64__) || defined(__ia64) /* Intel Itanium */
#define HAS_TEST_AND_SET

typedef unsigned int slock_t;

#define TAS(lock) tas(lock)

/* On IA64, it's a win to use a non-locking test before the xchg proper */
#define TAS_SPIN(lock) (*(lock) ? 1 : TAS(lock))

#ifndef __INTEL_COMPILER

static __inline__ int
tas(volatile slock_t *lock)
{
long int ret;

__asm__ __volatile__(
" xchg4 %0=%1,%2 \n"
: "=r"(ret), "+m"(*lock)
: "r"(1)
: "memory");
return (int) ret;
}

#else /* __INTEL_COMPILER */

static __inline__ int
tas(volatile slock_t *lock)
{
int ret;

ret = _InterlockedExchange(lock,1); /* this is a xchg asm macro */

return ret;
}

#endif /* __INTEL_COMPILER */
#endif /* __ia64__ || __ia64 */
#endif /* defined(__GNUC__) || defined(__INTEL_COMPILER) */

#if defined(__hpux) && defined(__ia64) && !defined(__GNUC__)

#define HAS_TEST_AND_SET

typedef unsigned int slock_t;

#include <ia64/sys/inline.h
#define TAS(lock) _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE)
/* On IA64, it's a win to use a non-locking test before the xchg proper */
#define TAS_SPIN(lock) (*(lock) ? 1 : TAS(lock))

#endif /* HPUX on IA64, non gcc */

slock_t lock;
char shared2;

int main(int argc, char **argv)
{
volatile char *p = &shared2;
char local;

TAS(&lock);
local = *p;
*p = 123;
*((volatile slock_t *) &lock) = 0;
}


--
Sent via pgsql-bugs mailing list (pgsql-bugs (AT) postgresql (DOT) org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

--
Bruce Momjian <bruce (AT) momjian (DOT) us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-bugs mailing list (pgsql-bugs (AT) postgresql (DOT) org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Reply With Quote
  #4  
Old   
Heikki Linnakangas
 
Posts: n/a

Default Re: [BUGS] Re: [PATCH] Use CC atomic builtins if available [was: - 02-06-2012 , 05:24 AM



On 03.02.2012 02:48, Bruce Momjian wrote:
Quote:
Sorry for the late reply, but Heikki, can you get this Itanium
information into s_lock.h as a comment, particularly the information
about the +Ovolatile=__unordered flag?
Good idea. I came up with the attached, hope that explains it.

Looking back at the discussions, we concluded that the current code is
safe on gcc, because it implicitly adds the .rel/.acq opcodes to
volatile accesses, and HP's compiler does the same as long as you don't
explicitly disable it with +Ovolatile=__unordered. But what about
Intel's icc compiler? Presumably it's also safe, but looking at Intel's
manuals that I found, I'm not completely sure about it. There's an
option, -m[no-]serialize-volatile, that controls it, but I couldn't
figure out which is the default. Looking at the docs on that from Intel
that I found [1], it seems to me that on Linux, the default is *not*
safe, but on Windows it is.

Sergey, you have dugong in the buildfarm that uses Intel's compiler on
Itanium. Could you verify whether the -mno-serialize-volatile is the
default? If you could for example extract the assembler code generated
by icc for xlog.c, and send it over. Whether it's generating the
..rel/.acq opcodes should be easy to see in the generated code of the
XLogGetLastRemoved() function, for example, which doesn't do much else
than grab a spinlock. On gcc, the -s flag generates the assembly files,
I presume it's the same on icc.

Perhaps we should set those compiler flags explicitly in configure,
regardless of the defaults.

[1]
http://software.intel.com/sites/prod...e-volatile.htm

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


--
Sent via pgsql-bugs mailing list (pgsql-bugs (AT) postgresql (DOT) org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Reply With Quote
  #5  
Old   
Heikki Linnakangas
 
Posts: n/a

Default Re: [BUGS] Re: [PATCH] Use CC atomic builtins if available [was: - 03-16-2012 , 03:45 AM



On 06.02.2012 13:24, Heikki Linnakangas wrote:
Quote:
On 03.02.2012 02:48, Bruce Momjian wrote:
Sorry for the late reply, but Heikki, can you get this Itanium
information into s_lock.h as a comment, particularly the information
about the +Ovolatile=__unordered flag?

Good idea. I came up with the attached, hope that explains it.
Committed these additional comments.

Quote:
Looking back at the discussions, we concluded that the current code is
safe on gcc, because it implicitly adds the .rel/.acq opcodes to
volatile accesses, and HP's compiler does the same as long as you don't
explicitly disable it with +Ovolatile=__unordered. But what about
Intel's icc compiler? Presumably it's also safe, but looking at Intel's
manuals that I found, I'm not completely sure about it. There's an
option, -m[no-]serialize-volatile, that controls it, but I couldn't
figure out which is the default. Looking at the docs on that from Intel
that I found [1], it seems to me that on Linux, the default is *not*
safe, but on Windows it is.

Sergey, you have dugong in the buildfarm that uses Intel's compiler on
Itanium. Could you verify whether the -mno-serialize-volatile is the
default? If you could for example extract the assembler code generated
by icc for xlog.c, and send it over. Whether it's generating the
.rel/.acq opcodes should be easy to see in the generated code of the
XLogGetLastRemoved() function, for example, which doesn't do much else
than grab a spinlock. On gcc, the -s flag generates the assembly files,
I presume it's the same on icc.
Sergey confirmed off-list that icc defaults to -mserialize-volatile, so
we're safe.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-bugs mailing list (pgsql-bugs (AT) postgresql (DOT) org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.