
umbrella bug: asyncio interoperability #171

Closed
@njsmith

Description


There are several discussions here that are somewhat logically independent, but linked:

  • Should we use the asyncio (or some other third-party) event loop internally to implement our lowest-level IO primitives? This wouldn't necessarily change anything user visible; it would just outsource the dirty business of calling epoll and friends to someone else. Let's call this feature io-via-asyncio.

  • Should it be possible to run trio on top of an asyncio event loop? For example, loop.run_until_complete(trio.run_in_asyncio(...)), to allow asyncio applications to call into trio code. Let's call this trio-libs-on-asyncio.

  • How can we best allow asyncio libraries to be used from trio? Let's call this asyncio-libs-on-trio.

Some initial thoughts:

io-via-asyncio

The major challenges here would be in coming up with a shim layer that implements trio's semantics in terms of the asyncio APIs, and in extending those APIs where necessary. @1st1 has offered to add whatever APIs we need, which is great, but it isn't immediately obvious where to start.

In practice, it's very unlikely we could actually use the stdlib asyncio default event loop, because we'll definitely need at least some enhancements and bug fixes and the stdlib doesn't really get those on any kind of useful schedule. (Curio's experience with the selectors module has also made me wary of depending on the stdlib for this kind of thing.) So the assumption should be that we'd be using a third-party implementation like uvloop, or a hot-off-the-presses unstable version of asyncio ripped out of cpython master. (And this makes it faster to get bug fixes, but I think enhancements would still need to go on the PEP / CPython release timescale?)

uvloop doesn't play well with PyPy (because it's written in Cython), and it inherits a number of limitations from libuv: e.g., no pluggable clock support, no cancellation support for most operations on Windows (the libuv source code does not contain the string CancelIoEx), and I don't see how to make wait_all_tasks_blocked work without some pretty extreme workarounds. If the pitch is "but this way you don't have to write your own I/O code!", then the part where I end up having to write a bunch of I/O code in C makes it somewhat less compelling :-(

And obviously the stdlib asyncio loops have similar limitations, or else uvloop couldn't be a drop-in replacement, in addition to the part where they aren't currently shipped in a usable form. And e.g. AFAICT from a quick look asyncio's IOCP cancellation is just broken (_winapi.Overlapped.cancel is synchronous, but CancelIoEx is asynchronous – see e.g. and search for "Wait for the I/O subsystem to acknowledge our cancellation"). Plus – and this is perhaps the most important issue – the current API is rather limited and using it would require overcoming some extreme abstraction skew. Implementing a fake socket object on top of a protocol/transport pair is going to involve a lot of complicated and relatively inefficient code, and then how do we implement sendmsg? Or raw or seqpacket or AF_BLUETOOTH sockets? trio supports all this stuff right now.
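To give a feel for the abstraction skew, here is a minimal sketch, assuming only the public asyncio.Protocol callbacks, of just the pull-style recv() half of a fake socket. The names SocketLikeProtocol and demo_recv are hypothetical. It deliberately ignores flow control (pause_reading/resume_reading), the send side, and anything like sendmsg that has no transport equivalent, which is where most of the real complexity would live.

```python
import asyncio
import socket


class SocketLikeProtocol(asyncio.Protocol):
    """Hypothetical adapter: turns asyncio's push-style callbacks into a
    pull-style, socket-like recv(). Single reader only; no flow control."""

    def __init__(self):
        self.transport = None
        self._buffer = bytearray()
        self._eof = False
        self._waiter = None  # future that a pending recv() is parked on

    # Callbacks invoked by the transport.
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self._buffer += data
        self._wake()

    def eof_received(self):
        self._eof = True
        self._wake()
        return False  # falsy: let the transport close itself

    def connection_lost(self, exc):
        self._eof = True
        self._wake()

    def _wake(self):
        if self._waiter is not None and not self._waiter.done():
            self._waiter.set_result(None)

    # Socket-like API for the caller.
    async def recv(self, max_bytes):
        # Park until there is buffered data or the peer has closed.
        while not self._buffer and not self._eof:
            self._waiter = asyncio.get_running_loop().create_future()
            try:
                await self._waiter
            finally:
                self._waiter = None
        data = bytes(self._buffer[:max_bytes])
        del self._buffer[:max_bytes]
        return data  # b"" signals EOF, as with socket.recv()


async def demo_recv():
    a, b = socket.socketpair()
    loop = asyncio.get_running_loop()
    transport, proto = await loop.create_connection(SocketLikeProtocol, sock=a)
    b.sendall(b"hello")
    b.close()
    chunks = []
    while True:
        chunk = await proto.recv(3)
        if not chunk:
            break
        chunks.append(chunk)
    transport.close()
    return chunks
```

Even this toy needs its own buffer, wakeup future, and EOF bookkeeping just to get back to the semantics a real socket already had.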

That's assuming we use the "protocol" APIs, which are the main ones and the only fully portable ones. We could also potentially restrict ourselves to just using add_reader and add_writer, and stick to the select reactor on Windows (since the IOCP reactor doesn't support these); that's actually enough to implement all the things we really support right now. But then we can't properly implement things like subprocess support (requires access to the kqueue object on kqueue platforms), or ever support IOCP (see #52).
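For comparison, the add_reader route maps much more directly onto a trio-style "wait until this fd is readable" primitive. A minimal sketch, assuming a selector-based loop (add_reader is not available on the Windows proactor loop); wait_readable and demo_wait_readable are hypothetical names, not existing trio or asyncio functions:

```python
import asyncio
import socket


async def wait_readable(sock):
    """Hypothetical trio-style primitive: park the current task until
    ``sock`` is readable, using only add_reader/remove_reader."""
    loop = asyncio.get_running_loop()
    fut = loop.create_future()

    def on_readable():
        # Invoked by the loop whenever the fd polls readable; wake once.
        if not fut.done():
            fut.set_result(None)

    loop.add_reader(sock.fileno(), on_readable)
    try:
        await fut
    finally:
        # Always unregister, including when the waiting task is cancelled.
        loop.remove_reader(sock.fileno())


async def demo_wait_readable():
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)
    try:
        loop = asyncio.get_running_loop()
        # Send a little later so that the wait genuinely blocks first.
        loop.call_later(0.01, b.send, b"ping")
        await wait_readable(a)
        return a.recv(4)
    finally:
        a.close()
        b.close()
```

The caller then does the actual non-blocking socket call itself, so anything the socket module supports (sendmsg, raw sockets, etc.) keeps working unchanged.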

With sufficient effort, many of these limitations can be overcome or lived with. But it does look like some substantial effort. And it'll likely make things slower and more brittle architecturally.

Against this, the primary advantage would be that we don't have to maintain our own IO code. Given that IO code tends to be extremely tricky and have many obscure corner cases, this would be good. If we were using asyncio, then we could automatically take advantage of their bug fixes, and any work we put into testing and fixing bugs would automatically benefit all of asyncio's users.

This advantage isn't urgent, though, in the sense that what we have right now works, and (assuming the issues above are somehow fixed/worked around) we could switch what we do internally at any time.

And fundamentally, this isn't on trio's critical path: trio is an experiment, and the question we're trying to discover the answer to is whether trio's developer experience is so overwhelmingly better than traditional callback-based libraries that it can overcome their head start on ecosystems / maturity / familiarity. A 50% increase or reduction in the number of rare and obscure I/O bugs is not going to change the answer to this question.

So in the short/medium term this suggests that we should stick with our code, see how much trouble it causes, and continue to weigh that against the costs of switching. If we find ourselves wasting weeks trying to figure out why our TCP stack is flaking out, then that'd be a pretty good sign that we're on the wrong path. I honestly find it hard to predict; trio's code is written extremely carefully and takes full advantage of every existing source I can get my hands on (by which I mean: blatantly stealing twisted's hard-won knowledge at every opportunity), but IO is hard.

The other advantage, which this doesn't consider, is that io-via-asyncio might help with the trio-libs-on-asyncio or asyncio-libs-on-trio features, so let's consider those.

trio-libs-on-asyncio

This pretty much has io-via-asyncio as a minimal prerequisite, so see above. (We wouldn't necessarily have to switch to using only asyncio, but we would at least need to implement an asyncio backend, which is basically all the work.) In addition it would require some rearrangement of the run interface, which is not a big deal. And... there currently aren't any trio libs that people want to run on asyncio, so it doesn't seem super urgent :-). Perhaps it would attract people to trio if they thought that it was a good way to make libraries that work everywhere? But if that's your main motivation, then even if we implemented this you would probably still be better off using asyncio (or twisted or gevent) instead of trio.

The thing is, even if we made trio.run work as an asyncio coroutine, there still wouldn't be any sensible way for the asyncio and trio worlds to talk to each other. I guess it's fine if the only thing your library needs to expose is one-and-done functions that can be executed via await trio.run_under_asyncio(...) calls, but that leaves out a lot of use cases.

I'm not sure what a sensible communications channel would look like. Some sort of cross-world Queue object?
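One crude shape for such a channel, sketched here only under a swapped-in assumption: the "other world" is a plain thread rather than trio, and items flow one way into an asyncio loop via loop.call_soon_threadsafe. CrossWorldQueue, produce, and consume are hypothetical names; a real trio/asyncio bridge sharing one thread would need a different wakeup mechanism.

```python
import asyncio
import threading


class CrossWorldQueue:
    """Hypothetical one-way channel: put() from any thread, get() in asyncio.

    Must be constructed inside a coroutine running on the consuming loop.
    """

    def __init__(self):
        self._loop = asyncio.get_running_loop()
        self._queue = asyncio.Queue()

    def put(self, item):
        # asyncio objects are not thread-safe, so schedule the actual
        # put_nowait() to run on the consuming loop's own thread.
        self._loop.call_soon_threadsafe(self._queue.put_nowait, item)

    async def get(self):
        return await self._queue.get()


def produce(queue, n):
    # Stand-in for "the other world": a plain thread pushing items.
    for i in range(n):
        queue.put(i)


async def consume(n):
    queue = CrossWorldQueue()
    producer = threading.Thread(target=produce, args=(queue, n))
    producer.start()
    items = [await queue.get() for _ in range(n)]
    # Join via the default executor so the event loop thread never blocks.
    await asyncio.get_running_loop().run_in_executor(None, producer.join)
    return items
```

Note this only covers one direction and one item type; backpressure, closing, and errors crossing the boundary are all left open.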

In general, I suspect that using asyncio libraries on trio is never going to feel very natural, and ditto using trio libraries on asyncio, because they have such different idiomatic ways of structuring code.

asyncio-libs-on-trio

At least in the short term, this seems like the most interesting option. (And it might be very helpful in getting us over the early adoption hump!) What concerns me is how to let asyncio into the trio world in a controlled fashion that doesn't end up breaking all of trio's carefully created invariants. Maybe this is silly and we'd be better off just YOLOing it, worse-is-better style, but it makes me nervous. For example: if there's a global event loop full of callback spaghetti coexisting with trio's tidy task tree, then how do we do things like figure out when to exit? (Maybe 3.7's asyncio will be better in this regard; I know Yury is planning to propose a curio/trio-inspired loop.run in his asyncio updates PEP.)
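A small stdlib-only illustration of the "when to exit?" problem: with plain asyncio, run_until_complete() returns as soon as the main coroutine does, even if it spawned work that is still scheduled. (asyncio.run(), which shipped in Python 3.7, resolves this by cancelling leftover tasks at shutdown; a trio nursery would instead wait for them.) The function names here are hypothetical.

```python
import asyncio


async def background(log):
    # Stand-in for callback spaghetti: a task nobody is responsible for.
    await asyncio.sleep(0.05)
    log.append("background finished")


async def main(log):
    task = asyncio.create_task(background(log))  # spawned, never awaited
    log.append("main finished")
    return task


def run_main():
    log = []
    loop = asyncio.new_event_loop()
    try:
        task = loop.run_until_complete(main(log))
        # run_until_complete() returned because main() returned, yet work
        # is still scheduled: the loop itself has no notion of "all done".
        still_pending = not task.done()
        # Tidy up the orphan so the loop can be closed cleanly.
        task.cancel()
        loop.run_until_complete(asyncio.gather(task, return_exceptions=True))
    finally:
        loop.close()
    return log, still_pending
```

Here the background task is cut off before it logs anything, which is exactly the kind of silent policy decision a trio task tree is meant to make explicit.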

One possible way to do this would be to not have a global loop object, but instead treat loops as something like nurseries: a specific place bound to a specific task. with open_asyncio_loop() as loop: await loop.run_until_complete(...).
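A rough, synchronous, stdlib-only approximation of that idea, assuming nothing about trio: a private asyncio loop whose lifetime is exactly a `with` block, so no global loop is involved. The real proposal would be an async context manager living inside a trio task, which this does not attempt; open_asyncio_loop here is a hypothetical helper, and demo_scoped_loop is just usage.

```python
import asyncio
from contextlib import contextmanager


@contextmanager
def open_asyncio_loop():
    """Hypothetical: a private asyncio loop whose lifetime is a with-block."""
    loop = asyncio.new_event_loop()
    try:
        yield loop
    finally:
        # Nothing scheduled on this loop may outlive the block. This sketch
        # cancels leftovers (as asyncio.run() does); a nursery would wait.
        leftovers = asyncio.all_tasks(loop)
        for task in leftovers:
            task.cancel()
        if leftovers:
            loop.run_until_complete(
                asyncio.gather(*leftovers, return_exceptions=True)
            )
        loop.close()


def demo_scoped_loop():
    with open_asyncio_loop() as loop:
        stray = loop.create_task(asyncio.sleep(10))  # never awaited
        result = loop.run_until_complete(asyncio.sleep(0, result=42))
    return result, stray.cancelled(), loop.is_closed()
```

Whether leftovers should be cancelled or awaited on exit is precisely the open design question above; this sketch just picks one.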

I have no idea how silly this would be. Technically what it would require is essentially implementing a custom asyncio event loop on top of trio's public API (so at least it has the advantage that it's something that doesn't require rearchitecting the entire trio core, and in fact could live in a separate library). I don't think we'd know how difficult this is until we try it – there's a bunch of stuff in the event loop interface, but most of it seems like it should map pretty straightforwardly onto relatively short trio code? And asyncio is designed to support the creation of new event loops by subclassing (sigh).

This seems like the most interesting place to experiment in the short term, though.

note

These are very much "initial thoughts" as mentioned above; let's use this thread for further discussion.

Metadata


Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
