![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
|
Noah Misch<noah (AT) leadboat (DOT) com> writes: On Mon, Dec 19, 2011 at 05:09:11PM +0200, Heikki Linnakangas wrote: That is not sufficient on platforms with a weak memory model, like Itanium. Other processors may observe the lock as held after its release, but there's no correctness problem. How weak is the memory model, exactly? A correctness problem would ensue if out-of-order stores are possible, ie other processors could observe the lock being freed (and then acquire it) before seeing changes to shared variables that had been made while holding the lock. I'm a little dubious that this applies to Itanium, because I don't see any memory fence instruction in the TAS macro. If we needed one in S_UNLOCK I'd rather expect there to be one in TAS as well. |
|
Note also, that the ‘xchg’ instruction always has the ‘acquire’ semantics, so it enforces the correct memory ordering. |
|
Note, that a simple assignment statement into a volatile variable will not work, as we are assuming that the +Ovolatile=__unordered compile option is being used. |
#2
| |||
| |||
|
|
I compiled the attached test program on an HP Itanium box, using the same flags you get from PostgreSQL's configure on that box. The relevant assembler output is: xchg4 r14 = [r15], r14 // M [slocktest.c: 66/3] //file/line/col slocktest.c/67/3 ld1.acq r16 = [r11] // M [slocktest.c: 67/3] nop.i 0 // I //file/line/col slocktest.c/68/3 st1.rel [r11] = r10 ;; // M [slocktest.c: 68/3] //file/line/col slocktest.c/69/3 st4.rel [r15] = r0 // M [slocktest.c: 69/3] //file/line/col slocktest.c/70/1 The trick I missed is that the compiler attaches .rel to all the stores and .acq to all the loads through a volatile pointer. gcc seems to do the same. So we're safe. |
#3
| |||
| |||
|
|
On 19.12.2011 22:12, Tom Lane wrote: Noah Misch<noah (AT) leadboat (DOT) com> writes: On Mon, Dec 19, 2011 at 05:09:11PM +0200, Heikki Linnakangas wrote: That is not sufficient on platforms with a weak memory model, like Itanium. Other processors may observe the lock as held after its release, but there's no correctness problem. How weak is the memory model, exactly? A correctness problem would ensue if out-of-order stores are possible, ie other processors could observe the lock being freed (and then acquire it) before seeing changes to shared variables that had been made while holding the lock. I'm a little dubious that this applies to Itanium, because I don't see any memory fence instruction in the TAS macro. If we needed one in S_UNLOCK I'd rather expect there to be one in TAS as well. There's a pretty good manual called "Implementing Spinlocks on Itanium and PA-RISC" from HP at: http://h21007.www2.hp.com/portal/dow.../spinlocks.pdf. It says, in section "3.2.1 Try to get a spinlock": Note also, that the ‘xchg’ instruction always has the ‘acquire’ semantics, so it enforces the correct memory ordering. But the current implementation seems to be safe, anyway. In section "3.2.3 Freeing a spinlock", that manual says: Note, that a simple assignment statement into a volatile variable will not work, as we are assuming that the +Ovolatile=__unordered compile option is being used. So +Ovolatile=__unordered would allow the kind of optimization that I thought was possible, and we would have a problem if we used it. But the default is more conservative, and a simple assignment does work. I compiled the attached test program on an HP Itanium box, using the same flags you get from PostgreSQL's configure on that box. The relevant assembler output is: xchg4 r14 = [r15], r14 // M [slocktest.c: 66/3] //file/line/col slocktest.c/67/3 ld1.acq r16 = [r11] // M [slocktest.c: 67/3] nop.i 0 // I //file/line/col slocktest.c/68/3 st1.rel [r11] = r10 ;; // M [slocktest.c: 68/3] //file/line/col slocktest.c/69/3 st4.rel [r15] = r0 // M [slocktest.c: 69/3] //file/line/col slocktest.c/70/1 The trick I missed is that the compiler attaches .rel to all the stores and .acq to all the loads through a volatile pointer. gcc seems to do the same. So we're safe. It would be interesting to test whether using +Ovolatile=__unordered would make a difference to performance. Tacking those memory fence modifiers to all the volatile loads and stores might be expensive. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com #include <stdio.h #if defined(__GNUC__) || defined(__INTEL_COMPILER) #if defined(__ia64__) || defined(__ia64) /* Intel Itanium */ #define HAS_TEST_AND_SET typedef unsigned int slock_t; #define TAS(lock) tas(lock) /* On IA64, it's a win to use a non-locking test before the xchg proper */ #define TAS_SPIN(lock) (*(lock) ? 1 : TAS(lock)) #ifndef __INTEL_COMPILER static __inline__ int tas(volatile slock_t *lock) { long int ret; __asm__ __volatile__( " xchg4 %0=%1,%2 \n" : "=r"(ret), "+m"(*lock) : "r"(1) : "memory"); return (int) ret; } #else /* __INTEL_COMPILER */ static __inline__ int tas(volatile slock_t *lock) { int ret; ret = _InterlockedExchange(lock,1); /* this is a xchg asm macro */ return ret; } #endif /* __INTEL_COMPILER */ #endif /* __ia64__ || __ia64 */ #endif /* defined(__GNUC__) || defined(__INTEL_COMPILER) */ #if defined(__hpux) && defined(__ia64) && !defined(__GNUC__) #define HAS_TEST_AND_SET typedef unsigned int slock_t; #include <ia64/sys/inline.h #define TAS(lock) _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE) /* On IA64, it's a win to use a non-locking test before the xchg proper */ #define TAS_SPIN(lock) (*(lock) ? 1 : TAS(lock)) #endif /* HPUX on IA64, non gcc */ slock_t lock; char shared2; int main(int argc, char **argv) { volatile char *p = &shared2; char local; TAS(&lock); local = *p; *p = 123; *((volatile slock_t *) &lock) = 0; } -- Sent via pgsql-bugs mailing list (pgsql-bugs (AT) postgresql (DOT) org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs |
#4
| |||
| |||
|
|
Sorry for the late reply, but Heikki, can you get this Itanium information into s_lock.h as a comment, particularly the information about the +Ovolatile=__unordered flag? |
#5
| |||
| |||
|
|
On 03.02.2012 02:48, Bruce Momjian wrote: Sorry for the late reply, but Heikki, can you get this Itanium information into s_lock.h as a comment, particularly the information about the +Ovolatile=__unordered flag? Good idea. I came up with the attached, hope that explains it. |
|
Looking back at the discussions, we concluded that the current code is safe on gcc, because it implicitly adds the .rel/.acq opcodes to volatile accesses, and HP's compiler does the same as long as you don't explicitly disable it with +Ovolatile=__unordered. But what about Intel's icc compiler? Presumably it's also safe, but looking at Intel's manuals that I found, I'm not completely sure about it. There's an option, -m[no-]serialize-volatile, that controls it, but I couldn't figure out which is the default. Looking at the docs on that from Intel that I found [1], it seems to me that on Linux, the default is *not* safe, but on Windows it is. Sergey, you have dugong in the buildfarm that uses Intel's compiler on Itanium. Could you verify whether the -mno-serialize-volatile is the default? If you could for example extract the assembler code generated by icc for xlog.c, and send it over. Whether it's generating the .rel/.acq opcodes should be easy to see in the generated code of the XLogGetLastRemoved() function, for example, which doesn't do much else than grab a spinlock. On gcc, the -s flag generates the assembly files, I presume it's the same on icc. |
![]() |
| Thread Tools | |
| Display Modes | |
| |