forked from openucx/ucx
adding missing openmp compile guards #1
Open
ct-clmsn wants to merge 12 commits into rbradford:20230628-riscv-enabling from ct-clmsn:rv64-perf-update
This patch is derived from the work done by Chris Taylor and John Leidel (from this PR: https://github.com/openucx/ucx/pull/8246/commits).

The ways in which this patch differs from that PR are:

* Use __clear_cache() for i-cache updates with correct syscall fallback (need to pass 0 for all CPUs) (myself)
* Use appropriate fence instructions for IO and CPU fences (myself)
* Coding standard fixes (myself)
* Removal of unused rv64/atomic.h (myself)
* Header guard fixes (myself)
* Removal of unused UCS_CPU_MODEL_RV64IMAFDC (myself)
* Corrected patch address immediate calculation (Evan Green)
* Handling of zero for clz (Evan Green)
* Use generic time functions (Evan Green)

Original-author: Chris Taylor <[email protected]>
Original-author: John Leidel <[email protected]>
Co-authored-by: Evan Green <[email protected]>
Signed-off-by: Rob Bradford <[email protected]>
Signed-off-by: Evan Green <[email protected]>
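The "Handling of zero for clz" item can be illustrated with a minimal sketch. `__builtin_clzll()` is undefined for a zero argument in GCC and Clang, so the zero case needs an explicit branch. The helper name `sketch_clz64` and the choice to return 64 for zero are assumptions made for this example, not the code from the patch.

```c
#include <stdint.h>

/* Hypothetical helper (not a UCX function): count the leading zero bits of a
 * 64-bit value. __builtin_clzll(0) is undefined, so zero is handled
 * explicitly. Returning 64 (the full bit width) for zero is an assumption
 * made for this sketch. */
static inline unsigned sketch_clz64(uint64_t value)
{
    if (value == 0) {
        return 64;
    }

    return (unsigned)__builtin_clzll(value);
}
```

Callers that cannot accept 64 as a result would need to check for zero themselves before calling such a helper.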
Using UCX_MEM_MMAP_HOOK_MODE=reloc was crashing when running on RISC-V because ucm_dl_populate_symbols() was reaching through a symbol table using exactly the value it found in the ELF dynamic section entry. This entry is a virtual address if and only if the image in question has been loaded at address 0 (which is architecture dependent and not the case on RISC-V). Treating it as a virtual address when the image is not loaded at 0 is therefore not appropriate (the pointer is dangling). Add the dlpi_addr to dynamic entries that are treated as virtual addresses to compute their correct value.

Signed-off-by: Evan Green <[email protected]>
Co-developed-by: Rob Bradford <[email protected]>
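As a rough illustration of the fix described above, the sketch below walks a dynamic section and returns the symbol table address with the load base added. It assumes a Linux/glibc `<link.h>` and assumes the `DT_SYMTAB` entry still holds an unrelocated offset, as the commit message describes for RISC-V. The function name `sketch_symtab` is hypothetical and this is not the UCX implementation.

```c
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the UCX code): locate DT_SYMTAB in a dynamic
 * section and add the image load base (dlpi_addr from dl_iterate_phdr) to
 * the stored value. Assumes the entry has not already been relocated. */
static const ElfW(Sym) *sketch_symtab(ElfW(Addr) load_base,
                                      const ElfW(Dyn) *dyn)
{
    for (; dyn->d_tag != DT_NULL; ++dyn) {
        if (dyn->d_tag == DT_SYMTAB) {
            /* Using d_ptr alone is only valid for an image loaded at 0 */
            return (const ElfW(Sym) *)(load_base + dyn->d_un.d_ptr);
        }
    }

    return NULL; /* no symbol table entry found */
}
```

In real use, `load_base` and `dyn` would come from the `struct dl_phdr_info` passed to a `dl_iterate_phdr()` callback (the `dlpi_addr` field and the `PT_DYNAMIC` segment respectively).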
Rather than relying on RTLD_NEXT to return the symbol and falling back to RTLD_DEFAULT, look up the symbol from the current shared library directly. This resolves an issue where, on RISC-V in some circumstances, the symbol returned by RTLD_NEXT is null and RTLD_DEFAULT then provides a pointer into the PLT stub. Since the PLT stub is too small to be patched, by directly resolving the symbol here the PLT is avoided and the C library function can be directly patched.

Signed-off-by: Rob Bradford <[email protected]>
The RV64 port does not have a custom memcpy() implementation. Signed-off-by: Rob Bradford <[email protected]>
Signed-off-by: Rob Bradford <[email protected]>
Use ucm_trace over ucm_debug and use lower case throughout. Signed-off-by: Rob Bradford <[email protected]>
And relocate to top of the function definition. Signed-off-by: Rob Bradford <[email protected]>
The arguments were wrongly labelled. Signed-off-by: Rob Bradford <[email protected]>
Signed-off-by: Rob Bradford <[email protected]>
On RISC-V, for an atomic update the destination must be naturally aligned to the size of the type. Since the destinations in this case are addresses of library functions to patch, we cannot control the addresses. Fortunately, instructions must be 16-bit aligned, which also matches the size of the patch on RISC-V (a compressed J instruction). For other architectures the size of the patch is either 4 bytes (aarch64) or 2 bytes (x86-64), both of which are handled by this patch.

Signed-off-by: Rob Bradford <[email protected]>
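The alignment constraint above can be sketched as follows: a patch is written as naturally aligned 16-bit atomic stores, and the write is rejected if the destination is not 2-byte aligned. The function name `sketch_patch_u16`, the relaxed memory order, and the choice to split longer patches into 16-bit units are assumptions for this example. A 4-byte patch written this way is two separate atomic stores, not one atomic update.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the UCX code): copy a patch to dst using
 * naturally aligned 16-bit atomic stores. Returns 0 on success, or -1 if
 * dst is not 2-byte aligned or len is not a multiple of 2. */
static int sketch_patch_u16(void *dst, const void *patch, size_t len)
{
    const uint8_t *src = patch;
    uint16_t *out      = dst;
    size_t i;

    if ((((uintptr_t)dst % sizeof(uint16_t)) != 0) ||
        ((len % sizeof(uint16_t)) != 0)) {
        return -1;
    }

    for (i = 0; i < len / sizeof(uint16_t); ++i) {
        uint16_t chunk;

        /* memcpy avoids assuming the patch source itself is aligned */
        memcpy(&chunk, src + (i * sizeof(chunk)), sizeof(chunk));
        __atomic_store_n(&out[i], chunk, __ATOMIC_RELAXED);
    }

    return 0;
}
```

Patching real code would additionally require making the page writable and flushing the instruction cache, which this sketch leaves out.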
Counting the trailing zeroes only works if the input value is strictly a power of two. Signed-off-by: Rob Bradford <[email protected]>
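A small sketch of why this matters, assuming the quantity wanted is floor(log2(value)): counting trailing zeros matches it only for exact powers of two, while a leading-zero count works for any non-zero input. Both helper names are hypothetical and use the GCC/Clang builtins.

```c
#include <stdint.h>

/* floor(log2(value)) for any non-zero value, via count-leading-zeros */
static inline unsigned sketch_ilog2(uint64_t value)
{
    return 63u - (unsigned)__builtin_clzll(value);
}

/* Trailing-zero count of a non-zero value; equals log2 only when value is
 * an exact power of two */
static inline unsigned sketch_ctz(uint64_t value)
{
    return (unsigned)__builtin_ctzll(value);
}
```

For 8 (binary 1000) both helpers return 3. For 12 (binary 1100) the trailing-zero count is 2 while floor(log2(12)) is 3.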
8628a4a
to
f60b36c
Compare
64f6f4b
to
c8e9502
Compare
What
These three files are missing compile guards around OpenMP pragmas:
./src/tools/perf/perftest.c
./src/tools/perf/lib/libperf.c
./test/apps/test_init_mt.c
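A minimal sketch of the kind of guard meant here, using the standard `_OPENMP` macro that OpenMP-enabled compilers define. The function `sketch_count_threads` is a hypothetical example and is not taken from the three files listed above.

```c
#ifdef _OPENMP
#include <omp.h> /* only available/needed when OpenMP is enabled */
#endif

/* Hypothetical example: count how many threads execute the block. Without
 * OpenMP the pragma is compiled out and the block runs exactly once. */
static int sketch_count_threads(void)
{
    int count = 0;

#ifdef _OPENMP
#pragma omp parallel reduction(+:count)
#endif
    {
        count += 1;
    }

    return count;
}
```

Built without `-fopenmp` this returns 1. Built with it, the result is the number of threads in the parallel region.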