Use zmq-anyio #1291
Conversation
I can see with this PR that ipykernel runs on trio when hard-coding the backend here, but how can we choose the backend, e.g. from JupyterLab? |
Nice! So it looks like for compatibility with Windows, you've gone with spawning a selector thread.

This should presumably be the same as any other Kernel configuration option, so a configurable trait should do it. |
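For illustration only: kernel configuration options in ipykernel are traitlets, so a backend switch would presumably be set from a config file. The trait name `anyio_backend` below is hypothetical, not something this PR defines:

```python
# ipython_kernel_config.py -- hypothetical sketch; `anyio_backend` is an
# illustrative trait name, not an option defined by this PR
c.Kernel.anyio_backend = "trio"
```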
Yes, and I think Tornado does something similar here. This means a selector thread is used there too.

Thanks for the kernel configuration, I'll try that 👍 |
It does. The big difference is Tornado starts one selector thread per event loop, which is far more scalable, whereas zmq-anyio starts one per socket. That makes sense from a library-simplicity standpoint, since there isn't a thread running once you are done with a socket, but it definitely isn't scalable and probably isn't what we should do long term (and why I think anyio should have this built in, just like tornado).

As I understand it, this means anyio will spawn up to 40 threads by default. That might be okay for ipykernel, but I'd say it does mean we shouldn't use zmq-anyio in any client-side places like jupyter-client or jupyter-server. As soon as you've got 40 idle zmq sockets waiting for a message (which is what they spend most of their time doing), any subsequent calls to anyio.to_thread.run_sync will block, starving the thread pool.

You might be able to provoke this in ipykernel by spawning 41 subshells and not using them, since I think each one adds a socket that will be idle. You could limit the starvation by making the default thread limiter larger. |
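A minimal sketch of raising that limit through anyio's public `to_thread` API (the 40-thread default comes from anyio's default `CapacityLimiter`):

```python
import anyio
import anyio.to_thread

async def main() -> None:
    # the default limiter allows 40 worker threads; raising total_tokens
    # lets more to_thread.run_sync() calls proceed before blocking
    limiter = anyio.to_thread.current_default_thread_limiter()
    limiter.total_tokens = 100  # illustrative value

anyio.run(main)
```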
Interesting, I hadn't thought about that. |
Sure! I think that's sensible. I don't have enough experience with the task group hierarchy stuff to know what that should look like. I think it's probably appropriate to have some tests in zmq-anyio with a lot of idle sockets (at least more than the thread count, which I think can be set to 1 or 2) to probe this stuff. If I were the one writing it, I'd implement a wait_readable function.
You should be able to base it on anyio.wait_socket_readable, which assumes a socket object rather than a raw integer FD. A smaller, but maybe less clean and less efficient, version with a one-time monkeypatch:

```python
if windows and asyncio and proactor:  # pseudocode condition
    # only needed once per asyncio event loop; this is the only
    # situation where a patch is needed
    loop = asyncio.get_running_loop()
    loop.add_reader = selector_add_reader  # from tornado's AddThreadSelector
    loop.remove_reader = selector_remove_reader  # from tornado's AddThreadSelector

...

# assume wait_socket_readable works, which it should now
# (socket.fromfd also needs the family and type arguments)
await anyio.wait_socket_readable(
    socket.fromfd(zmq_sock.FD, socket.AF_INET, socket.SOCK_STREAM)
)
# hopefully anyio will fix integer FD support to match underlying asyncio and trio
```

If you did any of those, there would be the advantage that no actual thread is spawned except in the Windows + Proactor + asyncio case, which would get exactly one thread.

FWIW, I started to extract the tornado feature into its own package, but haven't tested it enough to publish a release, in case there's some reason not to depend on tornado for this feature (I don't think there is): https://github.com/minrk/async-selector-thread. Requiring tornado for this doesn't mean the tornado IOLoop object ever needs to be created, since the SelectorThread logic is pure asyncio, so there's really no reason not to require tornado for this as long as it's the only package with the required feature. |
Thanks @minrk, that was very helpful. |
force-pushed from 8c04773 to 90f12c2
I had another thought: you could shut down the thread if nothing is waiting (when remove_reader is called). This might play nicer with anyio's design of shutting things down when they aren't in use, and you wouldn't need anything hooked up to close unless it's called while waiting on a socket. But it comes at a performance cost, because you are probably going to recreate the thread a whole bunch of times (once per message if you only have one socket). I don't actually think we should do that, but it's an idea if there are objections to leaving an idle thread running.

But really, by far the most efficient approach is ZMQStream's event-driven on_recv, which registers the FD exactly once and calls handle_events whenever there might be a message, rather than calling add_reader and remove_reader for every message. |
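For comparison, a minimal sketch of that event-driven pattern with pyzmq's tornado-based ZMQStream (the endpoint and handler are illustrative):

```python
import zmq
from tornado.ioloop import IOLoop
from zmq.eventloop.zmqstream import ZMQStream

ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)
sock.bind("tcp://127.0.0.1:5555")  # illustrative endpoint

stream = ZMQStream(sock)

def on_message(frames):
    # called once per multipart message; ZMQStream registered the FD
    # with the loop exactly once, not once per message
    print(frames)

stream.on_recv(on_message)
IOLoop.current().start()
```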
force-pushed from 90f12c2 to ee38f9e
force-pushed from e7a0fde to 1fe492a
Still a few tests failing, and trio is not enabled in the tests (enabling it causes more failures), but this is taking shape. |
force-pushed from 8709b51 to 1834b58
I'm not sure about the tests that time out; that never happens locally on my machine. |
force-pushed from 83cde40 to 19d61d0
force-pushed from f0818b6 to 99f57ef
force-pushed from 99f57ef to b5d7542
I would like to understand how this PR fits into v7 of ipykernel, and the motivation for this change. Is there an issue describing the benefits for users/developers, and the plan more generally? |
Ipykernel is now based on AnyIO, but since pyzmq doesn't support AnyIO, that prevents ipykernel from using a Trio event loop. After talking with @minrk, I started zmq-anyio to fill that gap, and we upstreamed Windows support for reading sockets in the Proactor event loop. |
Apologies for rather fundamental questions, I am just not very familiar with the topic.
Does using the Trio event loop in 6.x require users to opt in? How widely is it used? What are its advantages? |
Yes, it was opt-in.

Hard to say whether it is used a lot in ipykernel, but Trio is getting more and more popular. Trio pioneered structured concurrency, which is what AnyIO brings to asyncio. We use it in pycrdt-websocket, jupyverse... In general, level-based cancellation is considered a superior solution to edge-based cancellation. |
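A generic illustration of structured concurrency with AnyIO (a sketch, not code from this PR):

```python
import anyio

async def worker(delay: float) -> None:
    await anyio.sleep(delay)
    print(f"worker finished after {delay}s")

async def main() -> None:
    # child tasks cannot outlive the task group; cancelling its scope
    # cancels every child (level-based cancellation)
    async with anyio.create_task_group() as tg:
        tg.start_soon(worker, 0.1)
        tg.start_soon(worker, 0.2)
    # both workers are guaranteed to have finished here

anyio.run(main)
```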
@minrk this is using an AnyIO-compatible pyzmq API (from https://github.com/davidbrochart/zmq-anyio), as discussed in zeromq/pyzmq#2045.