
Symbol lookup error with precompiled binaries #24944

Open
mdorier opened this issue Jan 27, 2025 · 7 comments
Labels
kind/bug Something isn't working

Comments

@mdorier

mdorier commented Jan 27, 2025

Version & Environment

Redpanda version (from rpk version): 24.2.7
Operating system: SUSE Linux Enterprise Server 15 SP5

What went wrong?

Trying to run the pre-compiled binaries for amd64 on this machine gives the following error:

[redacted path]/redpanda: symbol lookup error: /lib64/libdl.so.2: undefined symbol: _dl_catch_error, version GLIBC_PRIVATE

My understanding is that the release binaries bundle the libraries they depend on, and that bin/redpanda sets LD_LIBRARY_PATH before running libexec/redpanda so that it picks up the right libraries. However, the archive doesn't ship libdl.so.2, so on my system it picks up the one in /lib64, which must be a different version.

JIRA Link: CORE-8920

mdorier added the kind/bug (Something isn't working) label on Jan 27, 2025
@dotnwat
Member

dotnwat commented Jan 28, 2025

@mdorier Thanks for the report. We primarily see Ubuntu and RHEL, so this may be due to a testing gap. I think it should be easy to fix; I'll report back.

@travisdowns
Member

@mdorier Can you clarify how you installed Redpanda? Was it using the .rpm we provide?

@travisdowns
Member

travisdowns commented Jan 28, 2025

FWIW, using a SLES 15 SP5 container, I was able to install the .rpm with zypper in redpanda=24.2.7-1 (after adding the Redpanda RPM source in the documented way), and it started fine.

The redpanda binary in /opt/redpanda/libexec does not depend on libdl.so.2, so the most likely cause of this error is that some other library (e.g., the system libc) was loaded from an unexpected location and in turn tries to load libdl.
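
One way to find such a library is to run ldd on the binary and flag everything that resolves outside the Redpanda tree. The helper below is a hypothetical sketch (unexpected_libs is not part of Redpanda) that assumes the usual glibc ldd line shapes: "name => /path (0xADDR)" for resolved dependencies and a bare "/path (0xADDR)" for entries such as LD_PRELOAD libraries. The sample input is made up for illustration.

```python
import re
from typing import List, Tuple

# "libdl.so.2 => /lib64/libdl.so.2 (0x...)": a dependency resolved to a file.
_RESOLVED = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)\s+\(0x[0-9a-fA-F]+\)")
# "/some/path/libfoo.so (0x...)": a bare path, e.g. an LD_PRELOAD library.
_BARE_PATH = re.compile(r"^\s*(/\S+)\s+\(0x[0-9a-fA-F]+\)")


def unexpected_libs(ldd_output: str, allowed_prefix: str) -> List[Tuple[str, str]]:
    """Return (name, path) for every library ldd resolved outside allowed_prefix."""
    found = []
    for line in ldd_output.splitlines():
        m = _RESOLVED.match(line)
        if m:
            name, path = m.group(1), m.group(2)
        else:
            m = _BARE_PATH.match(line)
            if not m:
                # Skips the vDSO line and "=> not found" entries.
                continue
            name = path = m.group(1)
        if not path.startswith(allowed_prefix):
            found.append((name, path))
    return found


if __name__ == "__main__":
    # Made-up sample in the shape of glibc ldd output (addresses are fake).
    sample = (
        "\tlinux-vdso.so.1 (0x00007ffc00000000)\n"
        "\t/soft/xalt/3.0.2-202408282050/lib64/libxalt_init.so (0x00007f0000000000)\n"
        "\tlibdl.so.2 => /lib64/libdl.so.2 (0x00007f0000100000)\n"
        "\tlibc.so.6 => /opt/redpanda/lib/libc.so.6 (0x00007f0000200000)\n"
    )
    for name, path in unexpected_libs(sample, "/opt/redpanda/"):
        print(f"{name} -> {path}")
```

In practice the input would come from subprocess.run(["ldd", binary], capture_output=True, text=True).stdout. Because ldd also reports libraries injected through LD_PRELOAD, a preloaded library shows up as a dependency of every binary checked this way.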

Note that redpanda hard-codes its ELF interpreter path to /opt/redpanda/lib/ld.so, so the files should be unpacked under /opt/redpanda so that the interpreter is at the expected location.
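
The interpreter path is stored in the ELF PT_INTERP program header, and patchelf --print-interpreter <file> prints it directly. To show where that value comes from, here is a standard-library-only sketch; elf_interpreter is a hypothetical helper and only handles 64-bit ELF files.

```python
import struct
from typing import Optional

PT_INTERP = 3  # program header type that holds the interpreter path


def elf_interpreter(data: bytes) -> Optional[str]:
    """Return the PT_INTERP path of a 64-bit ELF image, or None if absent."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2:  # EI_CLASS: 2 means ELFCLASS64
        raise ValueError("this sketch only handles 64-bit ELF")
    order = "<" if data[5] == 1 else ">"  # EI_DATA: 1 means little-endian
    (phoff,) = struct.unpack_from(order + "Q", data, 32)           # e_phoff
    phentsize, phnum = struct.unpack_from(order + "HH", data, 54)  # e_phentsize, e_phnum
    for i in range(phnum):
        # Elf64_Phdr begins with p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz.
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            order + "IIQQQQ", data, phoff + i * phentsize
        )
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.split(b"\0", 1)[0].decode()
    return None
```

Reading the whole file (for example open("/opt/redpanda/libexec/redpanda", "rb").read()) and passing it in should return the hard-coded /opt/redpanda/lib/ld.so for an unpatched binary.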

@mdorier
Author

mdorier commented Jan 28, 2025

OK, I have fixed the problem (I've hit another one, but this particular symbol lookup problem is fixed). For reference, here is what I'm doing:

I had indeed noticed that the binaries are supposed to go in /opt. However, because I can't install there (no admin privileges), I did the following:

  • Install the binaries in "prefix" (e.g. $HOME/redpanda);
  • Patch all the files in "prefix"/bin to replace /opt with "prefix" (so, for instance, bin/redpanda now has LD_LIBRARY_PATH pointing to "prefix"/lib, "prefix"/bin is prepended to PATH, and the exec command has the path to the redpanda binary in "prefix"/libexec);
  • Run patchelf --set-interpreter <prefix>/lib/ld.so <file> for every binary <file> in "prefix"/libexec.
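
The three steps above could be scripted roughly as follows. This is a sketch, not an official relocation tool: relocate is a hypothetical helper, and it assumes that the wrapper scripts in bin contain the literal path /opt/redpanda, that the bundled interpreter sits at <prefix>/lib/ld.so, and that patchelf is on PATH.

```python
import subprocess
from pathlib import Path
from typing import List

OLD_ROOT = "/opt/redpanda"  # assumption: the literal path the bin/ wrappers use


def _is_elf(path: Path) -> bool:
    with open(path, "rb") as f:
        return f.read(4) == b"\x7fELF"


def relocate(prefix: str, old_root: str = OLD_ROOT,
             run_patchelf: bool = False) -> List[List[str]]:
    """Point an unpacked Redpanda tree at `prefix` instead of `old_root`.

    1. Rewrites old_root -> prefix in the text (non-ELF) files in prefix/bin.
    2. Builds a patchelf --set-interpreter command for each ELF file in
       prefix/libexec and runs them only if run_patchelf is True.
    Returns the patchelf commands so they can be reviewed first.
    """
    root = Path(prefix)
    new_root = str(root)
    for script in sorted((root / "bin").iterdir()):
        if not script.is_file() or _is_elf(script):
            continue  # leave ELF binaries untouched
        try:
            text = script.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # not a text wrapper
        if old_root in text:
            script.write_text(text.replace(old_root, new_root), encoding="utf-8")

    interpreter = str(root / "lib" / "ld.so")
    commands = [
        ["patchelf", "--set-interpreter", interpreter, str(binary)]
        for binary in sorted((root / "libexec").iterdir())
        if binary.is_file() and _is_elf(binary)
    ]
    if run_patchelf:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Running it once with the default run_patchelf=False patches only the text wrappers and returns the patchelf commands for review; a second call with run_patchelf=True applies them.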

This allowed redpanda to work on my Ubuntu 24.04 ARM64 machine. Without it, I would get link errors caused by redpanda picking up the machine's system libraries instead of its own (I don't remember the exact error, but I believe it came from libc.so). However, doing exactly the same on the SLES machine didn't work and gave me the libdl.so.2 link error.

I just checked: on my Ubuntu machine, indeed, none of the libraries depend on libdl.so.2. On my SLES machine, however, where I apply the same tricks, all the libraries show a dependency on libdl.so.2. They also showed a dependency on an unfamiliar .so file in an unusual path (/soft/xalt/3.0.2-202408282050/lib64/libxalt_init.so). I figured out that this was the library requesting libdl.so, and that it was being loaded through LD_PRELOAD, which the machine's admins set. Unsetting LD_PRELOAD before running redpanda made the problem disappear.
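
Unsetting LD_PRELOAD only for the Redpanda process keeps the site-wide preload in place for everything else. With GNU coreutils that is env -u LD_PRELOAD <prefix>/bin/redpanda ...; a Python launcher doing the same might look like this (both function names are hypothetical).

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def env_without_preload(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) without LD_PRELOAD."""
    env = dict(os.environ if base is None else base)
    env.pop("LD_PRELOAD", None)
    return env


def run_without_preload(argv: List[str]) -> int:
    """Run argv (e.g. the relocated bin/redpanda wrapper) with LD_PRELOAD unset."""
    return subprocess.run(argv, env=env_without_preload()).returncode
```

Only the child's environment is modified; the calling shell and any other jobs keep the admins' LD_PRELOAD.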

Now I have this problem:

WARN  2025-01-28 09:33:47,542 seastar - Requested AIO slots too large, please increase request capacity in /proc/sys/fs/aio-max-nr. configured:65536 available:65536 requested:705664
Could not initialize seastar: std::runtime_error (Could not setup Async I/O: Not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application)

I've seen this problem discussed in other issues, where the solution is to increase /proc/sys/fs/aio-max-nr, which I cannot do (again, no admin privileges). I'll try using taskset to reduce the set of CPUs redpanda is allowed to use (the machine has 64 CPUs), but if you have any other suggestions, I'm all ears.
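
The figures in the warning give a rough idea of how far the CPU set would have to shrink. Assuming the request grows linearly with the number of shards (705664 / 64 = 11026 slots per CPU) and that Redpanda starts one shard per CPU it is allowed to run on, at most 5 CPUs fit in 65536 slots. The hypothetical helper below just does that arithmetic.

```python
def max_cpus_for_aio(available: int, requested: int, cpus: int) -> int:
    """Largest CPU count whose AIO request fits in `available` slots,
    assuming the request grows linearly with the number of CPUs/shards."""
    per_cpu = -(-requested // cpus)  # ceiling division
    return available // per_cpu


# Figures from the warning above: available 65536, requested 705664 on 64 CPUs.
n = max_cpus_for_aio(available=65536, requested=705664, cpus=64)
print(f"taskset -c 0-{n - 1} <prefix>/bin/redpanda ...")  # prints: taskset -c 0-4 <prefix>/bin/redpanda ...
```

The "available" figure is aio-max-nr minus the slots other processes already hold (aio-nr), so it can shrink between runs; leaving some headroom is prudent.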

Feel free to close the issue.

@travisdowns
Member

@mdorier Thanks for the details! You could also consider running Redpanda in a container (assuming you can do that), which avoids having to patch the binaries to reset the interpreter path.

@mdorier
Author

mdorier commented Jan 28, 2025

Yes, though I would need to convert the redpanda image into an Apptainer image, as I cannot run Docker (again, privilege issues). This would have been my next step had I not found the cause of the libdl.so issue.

@travisdowns
Member

@mdorier wrote:

Now I have this problem:

... the error message doesn't mention it, but you can also reduce the demand for AIO slots with the --max-networking-io-control-blocks command-line argument. IIRC, we request 2 AIO slots for each of those, per shard. So try setting it to around 500 or 1000, and you should be able to start Redpanda.

The downside is that you can only support about that many connections efficiently per shard (modulo a factor of 2 or so), so if you need more than a few hundred connections per shard, things would slow down a lot.
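
Taking the remembered figure of 2 AIO slots per control block per shard at face value, the largest control-block setting that fits is roughly the available slots divided by (shards × 2). Any other per-shard AIO usage is not known from this thread, so it is left as a parameter with a placeholder default of 0. A hypothetical helper:

```python
def max_control_blocks(available: int, shards: int,
                       slots_per_block: int = 2, other_per_shard: int = 0) -> int:
    """Largest per-shard control-block count b such that
    shards * (slots_per_block * b + other_per_shard) <= available.
    slots_per_block=2 follows the "IIRC" above; other_per_shard is unknown."""
    per_shard = available // shards - other_per_shard
    if per_shard <= 0:
        return 0
    return per_shard // slots_per_block
```

With 65536 available slots and 64 shards, and no other usage, this gives 512, in line with the "around 500" suggestion; restricting Redpanda to fewer CPUs (and therefore fewer shards) leaves room for a larger value, e.g. 4096 with 8 shards.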
