Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
noinline to avoid compiler reading TOC before UCM_BISTRO_PROLOGUE
I don't have an actual failure case in UCX to point to here, but I can show how the current code is dangerous. In src/ucm/util/replace.h the _UCM_DEFINE_REPLACE_FUNC macro creates the code for functions like ucm_override_brk() which begin and end with UCM_BISTRO_PROLOGUE / EPILOGUE. I'll show how the code inside that function is sensitive to compiler behavior that can sometimes save a copy of the r2 register before the PROLOGUE macro (the purpose of those macros is to save/modify/restore r2 since the table of contents for the function we just branched into is different than wherever we just branched from, like brk()). If you modify the macro in src/ucm/util/replace.h to have the extra lines: (before the UCM_BISTRO_PROLOGUE): typedef void (*MyFunction_t) (); volatile MyFunction_t fn; \ (after the UCM_BISTRO_PROLOGUE): fn = foo; \ fn(); \ And define foo() over in src/ucm/util/replace.c: void foo () { // make foo() something that can't be optimized out static int counter = 0; ++counter; if(counter==1000000) { printf("hi\n"); counter = 0; } } Then rebuild with % touch src/ucm/util/replace.c src/ucm/event/event.c % make Then to see the resulting dangerous assembly: % gdb ./src/ucm/.libs/libucm.so (gdb) disassemble ucm_override_brk 0x000000000000d6dc <+28>: std r2,24(r1) <--[gcc saving r2 before PROLOGUE] 0x000000000000d6e0 <+32>: std r2,96(r1) <--[this and the NOPs below are the PROLOGUE. 0x000000000000d6e4 <+36>: nop When the memory interception 0x000000000000d6e8 <+40>: nop is installed, the NOPs get 0x000000000000d6ec <+44>: nop replaced by code that saves 0x000000000000d6f0 <+48>: nop then resets r2 to this 0x000000000000d6f4 <+52>: nop function's desired table of 0x000000000000d6f8 <+56>: nop contents] 0x000000000000d6fc <+60>: ld r9,-32648(r2) 0x000000000000d700 <+64>: std r9,104(r1) 0x000000000000d704 <+68>: ori r2,r2,0 0x000000000000d708 <+72>: ld r9,104(r1) 0x000000000000d70c <+76>: mtctr r9 0x000000000000d710 <+80>: mr r12,r9 0x000000000000d714 <+84>: bctrl 0x000000000000d718 <+88>: ld r2,24(r1) <--[after fn(), this loads the pre-interception r2] 0x000000000000d71c <+92>: nop 0x000000000000d720 <+96>: ld r9,-32640(r2) 0x000000000000d724 <+100>: lwz r9,0(r9) 0x000000000000d728 <+104>: cmpwi cr7,r9,5 0x000000000000d72c <+108>: bgt cr7,0xd770 <ucm_override_brk+176> 0x000000000000d730 <+112>: bl 0x4160 <00000041.plt_call.pthread_self@@GLIBC_2.17> 0x000000000000d734 <+116>: ld r2,24(r1) What happened above is the old r2 that was correct for brk() that we branched from is saved at $r1+24. Then those NOP become instructions that reset r2 to the table of contents for the interception function ucm_override_brk(). But when it branches to foo() via function pointer fn() and comes back, it loads r2 from $r1+24 which is the old r2. Then a little later when it branches to pthread_self() the disassembly there is: 0x0000000000004160 <+0>: std r2,24(r1) 0x0000000000004164 <+4>: ld r12,-31312(r2) 0x0000000000004168 <+8>: mtctr r12 0x000000000000416c <+12>: bctr 0x0000000000004170 <+16>: .long 0x0 0x0000000000004174 <+20>: .long 0x0 0x0000000000004178 <+24>: .long 0x0 0x000000000000417c <+28>: .long 0x0 so that load from $r2-31312 is offset from the wrong TOC. If r2 had been restored to the new value from the r2 reset in the NOP instructions, then the pthread_self branch above would be fine. So that early save of r2 in front of the PROLOGUE is a problem. I opened a bug with gcc here: https://www.mail-archive.com/[email protected]/msg654754.html saying that it seems logical for a function call to boil down to save current r2 somewhere branch to new function and use a new r2 while there come back and restore r2 and performing the "save current r2" way before the function call and before lines that modify r2 should be considered a gcc bug. But they don't agree. They didn't say much, but my impression is they consider r2 management to be the exclusive purview of the compiler, and patcher code that interferes with r2 isn't going to get their blessing. The workaround that's in place in OMPI and IBM PAMI to avoid the issue is to have an extra level of indirection, so that ucm_override_brk() would do as little as possible, like PROLOGUE res = ucm_brk() EPILOGUE and a recent hiccup in that workaround was that ucm_brk() might get inlined thus negating the workaround. The code constructed by _UCM_DEFINE_REPLACE_FUNC is a little bigger than my ideal tiny function above, but not by too much. And empirically I think the thing that made the called function dangerous in practice was if it called anything via function pointer. That's when gcc's r2 management took a direction that could be incompatible with the patcher. So my "fix" here is currently not to add an extra level of indirection, instead supposing that the fairly small function defined by _UCM_DEFINE_REPLACE_FUNC is okay. And just to make sure that the function it calls, eg ucm_brk() etc, won't get inlined. Signed-off-by: Mark Allen <[email protected]>
- Loading branch information