[WebAssembly] Generate a call to __wasm_apply_global_tls_relocs in __wasm_init_memory #149832

Open
wants to merge 2 commits into
base: main

Arshia001

Motivation

We recently implemented the WebAssembly exception handling proposal in Wasmer 6.0. As a result, we can now take advantage of clang's support for compiling SjLj and C++ exceptions to WASM EH. This PR fixes a wasm-ld issue that breaks the use of C++ exception handling in WASI(X) modules.

Note: I use WASI(X) to mean either WASI preview 1 or WASIX modules.

Error details

When compiling C++ code that uses exceptions, clang generates a GOT.data.internal.__wasm_lpad_context global, which points to the wasm landing pad context that's shared between compiler code and libunwind. This global is initialized in the __wasm_apply_global_tls_relocs function.

TLS initialization happens in two separate places; for the "main thread", __wasm_init_memory runs as the (start) function of the WASM module, initializing all memory segments (including TLS), while also initializing the main thread's __tls_base to the space reserved for it by the compiler, and signalling this fact to other threads via an atomic. Other threads need to run __wasm_init_tls after getting their respective __tls_base global initialized externally.

As it stands, __wasm_apply_global_tls_relocs is only called through __wasm_init_tls, meaning if code doesn't call __wasm_init_tls, any globals that are initialized in __wasm_apply_global_tls_relocs do not get initialized. This is the case for the main thread.

It is important to note that exception handling code generated by the compiler uses GOT.data.internal.__wasm_lpad_context, while the code in _Unwind_CallPersonality goes through __tls_base + offset directly. Because GOT.data.internal.__wasm_lpad_context is not initialized in the main thread, the compiler and _Unwind_CallPersonality do not agree on where the landing pad context is stored. This results in scan_eh_tab not getting the correct LSDA pointer. Exception handling is then completely broken; the catch-all block runs for every exception due to a lack of any type information at runtime.

This PR allows a call to __wasm_apply_global_tls_relocs to be generated in __wasm_init_memory if needed, which should fix the value of GOT.data.internal.__wasm_lpad_context in modules' main threads. Interestingly, through all of our recent work on dynamic linking and PIC modules, we never encountered __wasm_apply_global_tls_relocs, and I don't know if it's used for anything besides GOT.data.internal.__wasm_lpad_context.
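The mismatch described above, and the effect of the proposed call, can be illustrated with a small toy model. This is only a sketch: `ThreadState`, `kLpadContextOffset`, and the helper functions are hypothetical stand-ins for the linker-generated globals and functions, and the model assumes the GOT global starts out holding the bare TLS-relative offset until the relocation function rebases it.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of one thread's view of the module. All names here are
// hypothetical stand-ins, not actual wasm-ld output.
struct ThreadState {
  uint32_t tlsBase = 0;        // models the __tls_base global
  uint32_t gotLpadContext = 0; // models GOT.data.internal.__wasm_lpad_context
};

// Assumed offset of the landing pad context inside the TLS block.
constexpr uint32_t kLpadContextOffset = 16;

// Models __wasm_apply_global_tls_relocs: rebase the GOT entry onto __tls_base.
inline void applyGlobalTlsRelocs(ThreadState &t) {
  t.gotLpadContext = t.tlsBase + kLpadContextOffset;
}

// Models __wasm_init_memory on the main thread. `callRelocs` selects between
// the current behavior (false) and the behavior proposed in this PR (true).
inline void initMemory(ThreadState &t, uint32_t staticTlsAddr, bool callRelocs) {
  t.tlsBase = staticTlsAddr;
  t.gotLpadContext = kLpadContextOffset; // assumed unrelocated initial value
  if (callRelocs)
    applyGlobalTlsRelocs(t);
}

// Models __wasm_init_tls, which other threads run once they have a TLS block.
inline void initTls(ThreadState &t, uint32_t tlsBlockAddr) {
  t.tlsBase = tlsBlockAddr;
  applyGlobalTlsRelocs(t);
}

// Address that compiler-generated EH code would use (via the GOT global).
inline uint32_t compilerView(const ThreadState &t) { return t.gotLpadContext; }

// Address that _Unwind_CallPersonality would use (__tls_base + offset).
inline uint32_t unwinderView(const ThreadState &t) {
  return t.tlsBase + kLpadContextOffset;
}
```

In this model, `initMemory(t, 1024, false)` leaves the two views 1024 bytes apart, while `initMemory(t, 1024, true)` or a subsequent `initTls(t, 1024)` makes them agree. The latter corresponds to the redundant `__wasm_init_tls` call discussed in the next section.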

But how does emscripten work if this is broken?

Good question! Emscripten calls __wasm_init_tls redundantly for main threads, and thus initializes the TLS area twice. This has no observable effect besides being slower, and does indeed fix C++ exception handling.

This is a workaround that we can use in WASIX as well. However, as far as I understand, the current behavior of wasm-ld is broken, since __wasm_init_memory and __wasm_init_tls should behave similarly with respect to TLS initialization, but feel free to disagree with me here.


@llvmbot
Member

llvmbot commented Jul 21, 2025

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-wasm

Author: None (Arshia001)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/149832.diff

1 Files Affected:

  • (modified) lld/wasm/Writer.cpp (+9)
diff --git a/lld/wasm/Writer.cpp b/lld/wasm/Writer.cpp
index b704677d36c93..3cd6a73fb1a31 100644
--- a/lld/wasm/Writer.cpp
+++ b/lld/wasm/Writer.cpp
@@ -1366,6 +1366,15 @@ void Writer::createInitMemoryFunction() {
           writeUleb128(os, s->index, "segment index immediate");
           writeU8(os, 0, "memory index immediate");
         }
+
+        // After initializing the TLS segment, we also need to apply TLS
+        // relocations in the same way __wasm_init_tls does.
+        if (ctx.arg.sharedMemory && s->isTLS() &&
+            ctx.sym.applyGlobalTLSRelocs) {
+          writeU8(os, WASM_OPCODE_CALL, "CALL");
+          writeUleb128(os, ctx.sym.applyGlobalTLSRelocs->getFunctionIndex(),
+                      "function index");
+        }
       }
     }
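For readers unfamiliar with the binary encoding, the two bytes-level writes added in the diff correspond to a WebAssembly `call` instruction: the opcode `0x10` followed by the callee's function index in unsigned LEB128. The following standalone sketch shows that encoding; `appendUleb128` and `appendCall` are hypothetical helpers written for illustration and are not lld's own `writeU8`/`writeUleb128`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// WebAssembly `call` opcode, as defined in the core specification.
constexpr uint8_t kOpcodeCall = 0x10;

// Append `value` in unsigned LEB128 form: 7 bits per byte, low bits first,
// with the high bit set on every byte except the last.
inline void appendUleb128(std::vector<uint8_t> &out, uint32_t value) {
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0)
      byte |= 0x80;
    out.push_back(byte);
  } while (value != 0);
}

// Append a `call <functionIndex>` instruction to a function body.
inline void appendCall(std::vector<uint8_t> &out, uint32_t functionIndex) {
  out.push_back(kOpcodeCall);
  appendUleb128(out, functionIndex);
}
```

For example, `appendCall(body, 5)` appends the bytes `0x10 0x05`, and `appendCall(body, 300)` appends `0x10 0xAC 0x02`.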
 

Collaborator

@sbc100 sbc100 left a comment

According to the comment for createApplyGlobalTLSRelocationsFunction, it cannot be called during the start function:

// Similar to createApplyGlobalRelocationsFunction but for
// TLS symbols. This cannot be run during the start function
// but must be delayed until __wasm_init_tls is called.
void Writer::createApplyGlobalTLSRelocationsFunction() {

I don't remember exactly why this is...


// After initializing the TLS segment, we also need to apply TLS
// relocations in the same way __wasm_init_tls does.
if (ctx.arg.sharedMemory && s->isTLS() &&
Collaborator

ctx.arg.sharedMemory is probably redundant here since without it applyGlobalTLSRelocs would never be created.

Author

Fixed.

@Arshia001
Author

Arshia001 commented Jul 22, 2025

@sbc100 thanks for the review!

it cannot be called during the start function:

I can't think of anything, except the fact that it needs __tls_base to be initialized before it can work. Maybe, at some point, __tls_base wasn't initialized in __wasm_init_memory, and nobody updated that comment after this behavior changed? I'll dig into the commit history to see if I can figure this out.

In the meantime, do you have other suggestions on how to fix this? I suppose making __wasm_apply_global_tls_relocs public would at least allow it to be called from the WASIX module's _start function.

@Arshia001
Author

This is the change that introduced the comment: https://github.com/llvm/llvm-project/blob/ef8c9135efcb3847fc0e5bbdb55eae18751090df/lld/wasm/Writer.cpp

Looking over the code, it seems that back then, __tls_base wasn't being initialized during __wasm_init_memory:

writeU8(os, WASM_OPCODE_END, "end $init");
for (const OutputSegment *s : segments) {
  if (needsPassiveInitialization(s)) {
    // destination address
    writePtrConst(os, s->startVA, is64, "destination address");
    if (config->isPic) {
      writeU8(os, WASM_OPCODE_GLOBAL_GET, "GLOBAL_GET");
      writeUleb128(os, WasmSym::memoryBase->getGlobalIndex(),
                   "memory_base");
      writeU8(os, is64 ? WASM_OPCODE_I64_ADD : WASM_OPCODE_I32_ADD,
              "i32.add");
    }
    // source segment offset
    writeI32Const(os, 0, "segment offset");
    // memory region size
    writeI32Const(os, s->size, "memory region size");
    // memory.init instruction
    writeU8(os, WASM_OPCODE_MISC_PREFIX, "bulk-memory prefix");
    writeUleb128(os, WASM_OPCODE_MEMORY_INIT, "memory.init");
    writeUleb128(os, s->index, "segment index immediate");
    writeU8(os, 0, "memory index immediate");
  }
}

Around a year later, static allocation of the TLS section was added in:

// When we initialize the TLS segment we also set the `__tls_base`
// global. This allows the runtime to use this static copy of the
// TLS data for the first/main thread.
if (config->sharedMemory && s->isTLS()) {
  if (config->isPic) {
    // Cache the result of the addionion in local 0
    writeU8(os, WASM_OPCODE_LOCAL_TEE, "local.tee");
    writeUleb128(os, 1, "local 1");
  } else {
    writePtrConst(os, s->startVA, is64, "destination address");
  }
  writeU8(os, WASM_OPCODE_GLOBAL_SET, "GLOBAL_SET");
  writeUleb128(os, WasmSym::tlsBase->getGlobalIndex(),
               "__tls_base");
  if (config->isPic) {
    writeU8(os, WASM_OPCODE_LOCAL_GET, "local.tee");
    writeUleb128(os, 1, "local 1");
  }
}

But __wasm_apply_global_tls_relocs probably flew under the radar and the comment was never removed. I assume the correct thing to do here would be to remove that comment as well. What do you think, @sbc100?

…nsFunction`

* Remove redundant condition when generating call to `__wasm_apply_global_tls_relocs` in `lld::wasm::Writer::createInitMemoryFunction`
@Arshia001
Author

@sbc100 I believe we're at the one-week ping threshold :)

@sbc100
Collaborator

sbc100 commented Jul 28, 2025

I'm hoping to get some more time to look into this soon.

My main concern is around the timing of the application of relocations. In the dynamic linking scenario it's generally not safe to apply relocations until all libraries have been loaded (i.e. all symbols have been resolved). At least that is true for relocations in general. Perhaps it's true that TLS relocations always resolve to internal locations? In which case this might be safe.

But this is the reason why __wasm_apply_data_relocs is never called from the wasm start function: symbol resolution is not necessarily complete when the start function runs (i.e. when a given module is loaded).

@Arshia001
Author

Arshia001 commented Jul 28, 2025

The dynamic linking aspect is very important to us as well, since we just so happen to have released a dynamic linker less than a month ago. Now I'm wondering whether our handling of TLS symbols is correct, at least when a symbol with the same name is exported from multiple side modules... Gonna have to look at it tomorrow.

Another interesting case is the catching of C++ exceptions across module boundaries. Note, my entire description of the issue was based on a single module with local-exec TLS. I assume the behaviour will be different with global-dynamic.

In the meantime, I'll wait for your review with a healthy dose of excitement.

@sbc100
Collaborator

sbc100 commented Aug 1, 2025

Taking another look at the code, it seems like this should actually be safe, since __wasm_apply_global_tls_relocs only contains relocations for internalGotSymbols, which are symbols that resolve to DSO-local addresses.

@@ -1366,6 +1366,14 @@ void Writer::createInitMemoryFunction() {
writeUleb128(os, s->index, "segment index immediate");
writeU8(os, 0, "memory index immediate");
}

// After initializing the TLS segment, we also need to apply TLS
Collaborator

How about "After initializing the TLS segment and setting __tls_base we can call __wasm_apply_global_tls_relocs"

@sbc100
Collaborator

sbc100 commented Aug 1, 2025

Note that you will still need to call __wasm_init_tls on all threads, including the main thread, because this function also calls __wasm_apply_tls_relocs. This would need to be called on all threads, including the main thread, but it cannot be part of the start function and (like __wasm_apply_data_relocs) can only be called once all symbols are resolved.

I suppose we could have __wasm_apply_data_relocs call __wasm_apply_tls_relocs on the assumption that __tls_base has been set by then... but that would require a bunch of refactoring. I'm not sure it's worth it.

I assume you are calling __wasm_apply_data_relocs somewhere in your dynamic linker?

@Arshia001
Author

Arshia001 commented Aug 1, 2025

you will still need to call __wasm_init_tls on all threads

Yes, this already happens in both wasix-libc and wasi-libc as part of the thread creation routine.

including the main thread because this function also calls __wasm_apply_tls_relocs

Not quite sure why this would be the case, considering the statically-allocated TLS section exists and is initialized by __wasm_init_memory? Neither of the wasi(x)-libc's make this call today. Note that __wasm_init_memory initializes __tls_base for the main thread as well, so the call to __wasm_apply_tls_relocs is safe in that regard.

Edit: NVM, my mistake.

but it cannot be part of the start function and (like __wasm_apply_data_relocs) can only be called once all symbols are resolved.

If there are external symbols, then yes. However, I have only ever seen (and you pointed this out as well) that __wasm_apply_tls_relocs only initializes DSO-local symbols, which should be safe to call within the start function. Unless I'm missing something here?

Edit: NVM, my mistake

I assume you are calling __wasm_apply_data_relocs somewhere in your dynamic linker?

Yes, at the very last stage when every module is already loaded in and instantiated, since we need everything to be resolved.

I did a bit more digging, and the same problem also exists in DL modules; however, there the problem doesn't show itself (at least in anything we've compiled). Breakdown of the situation:

  • Non-DL module: the pre-allocated TLS area for the main thread lives at offset 1024. Hence, the global needs to be relocated to account for a __tls_base of 1024. We get an error.
  • DL module, running on the Wasmer linker: the dynamic linker puts __memory_base for the main module at offset 0, and our compile settings put the TSD area before the stack, so __memory_base = __tls_base = 0 for the main module, on the main thread. Hence, the initial values of the $GOT.data.internal TLS symbols are correct by chance, and we don't see any issues.
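The arithmetic behind the two cases can be checked with a short worked example. It is a sketch under two assumptions that are not stated in the thread: the GOT global initially holds the bare TLS-relative offset, and the symbol sits at a made-up offset of 16 within the TLS block.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical offset of a TLS symbol within the TLS block (illustrative only).
constexpr uint32_t kSymbolOffset = 16;

// Value the GOT global holds if __wasm_apply_global_tls_relocs never runs,
// assuming it starts out as the bare TLS-relative offset.
constexpr uint32_t unrelocatedValue() { return kSymbolOffset; }

// Value the GOT global should hold: the symbol's absolute address.
constexpr uint32_t expectedValue(uint32_t tlsBase) {
  return tlsBase + kSymbolOffset;
}

// Non-DL module: __tls_base is 1024, so the stale value is off by 1024.
static_assert(expectedValue(1024) - unrelocatedValue() == 1024,
              "stale GOT value misses the 1024-byte TLS base");

// DL main module on the Wasmer linker: __tls_base is 0, so the stale value
// happens to equal the correct address.
static_assert(expectedValue(0) == unrelocatedValue(),
              "with a zero TLS base the stale GOT value is right by chance");
```

The second case only holds while `__tls_base` stays 0; any layout that moves the main thread's TLS area would expose the same error as the non-DL case.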

While this problem also exists in DL modules, I will again stress the fact that __wasm_apply_global_tls_relocs is also generated for non-DL modules, where no dynamic linker comes into play at any point, so there's no "link finalization phase" from which to call __wasm_apply_global_tls_relocs.

As long as __wasm_apply_global_tls_relocs only relocates DSO-local symbols, I don't see why it should be unsafe to be called from the start function.

@sbc100
Collaborator

sbc100 commented Aug 5, 2025

Just to clarify did you mean to write __wasm_apply_global_tls_relocs rather than __wasm_apply_tls_relocs in that last comment? (if so maybe update it and I'll delete this comment).

@sbc100
Collaborator

sbc100 commented Aug 5, 2025

I agree that any relocation function that is guaranteed to only refer to DSO-local symbols is safe to call from the start function (once __tls_base is set).

Assuming that __wasm_apply_global_tls_relocs only refers to DSO-local symbols then this change would be safe. From my reading of the code I believe that you could be correct about that.

However, IIUC there is a difference between __wasm_apply_tls_relocs and __wasm_apply_global_tls_relocs in that regard. It seems likely that the latter will only ever refer to DSO-local symbols, but the former could contain references to any symbol at all.
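The safety rule stated here can be written down as a simple predicate. The sketch below is a toy model: `Reloc` and `safeInStartFunction` are hypothetical names invented for illustration and do not correspond to lld data structures, and the symbol names in the usage note are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified relocation record; not an lld data structure.
struct Reloc {
  std::string symbol;
  bool dsoLocal; // true if the symbol resolves inside this module
};

// A relocation function may run from the start function only if none of its
// relocations depend on symbols that other modules may still have to provide.
inline bool safeInStartFunction(const std::vector<Reloc> &relocs) {
  return std::all_of(relocs.begin(), relocs.end(),
                     [](const Reloc &r) { return r.dsoLocal; });
}
```

Under this model, a relocation set containing only DSO-local entries (the expected shape of `__wasm_apply_global_tls_relocs`) passes the check, while a set with even one externally resolved symbol (possible for `__wasm_apply_tls_relocs`) does not and has to wait until symbol resolution is complete.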

@Arshia001
Author

Just to clarify did you mean to write __wasm_apply_global_tls_relocs rather than __wasm_apply_tls_relocs

I did not, in the sense that I didn't know there are two variations. Let me go over the code one more time.

@Arshia001
Author

Yes, I did mean __wasm_apply_global_tls_relocs. I'll edit my previous comment. I didn't know of __wasm_apply_tls_relocs, so I made little sense there.

@Arshia001
Author

Side note: since I didn't know about __wasm_apply_tls_relocs, we weren't accounting for it in our linker implementation, so thanks for pointing that out!

@Arshia001
Author

Arshia001 commented Aug 5, 2025

But this creates a second problem: __wasm_apply_tls_relocs is a hidden symbol, and can't be exported via --export AFAIK. I assume the fix there would be to make the symbol WASM_SYMBOL_VISIBILITY_DEFAULT | WASM_SYMBOL_EXPORTED, so it can be exported and called by linkers? That's working for me locally, so I can push it if you think it's correct in principle.


Edit: __wasm_apply_tls_relocs can also be called during startup from wasix-libc. But that also requires the same visibility change to the symbol.

Edit 2: that doesn't work for side modules though. Better to do it in the linker.
