WIP / RFC: Define IO reading interface #57982

jakobnissen · 2025-04-02T09:08:30Z

This is a work in progress proposal to define the interface of the abstract type IO, and provide a robust, mid-level API that makes it possible to implement efficient, generic IO operations.

Discussion is very welcome; see https://hackmd.io/UgeBwtIkTTipkgZSQnXc8w#Discussion for more details and discussion of design decisions.

Having used this API for some of my own work, I must say I really like it.

This PR will, in time, have several parts:

Define and document the core IO interface
Use Base.IOBuffer as a test case for the new interface, to see if a) the API is pleasant, b) the performance is as expected, and most importantly, c) this is non-breaking.
Add a set of generic IO objects in Test and test the generic definitions using those, to make sure the generic fallbacks work for buffered IOs, unbuffered IOs, and IOs that do not implement this new interface at all (i.e. previously existing IOs)
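
As an illustration of the third kind of test object (an IO that does not implement the new interface at all), here is a minimal sketch. The type name `ByteOnlyIO` is hypothetical and not taken from the PR; it implements only `eof` and `read(io, UInt8)`, which is the minimum the existing generic fallbacks in Base rely on.

```julia
# Hypothetical test type (not from the PR): a legacy-style IO that only
# implements the two primitives the existing generic fallbacks need.
mutable struct ByteOnlyIO <: IO
    data::Vector{UInt8}
    pos::Int
end
ByteOnlyIO(s::AbstractString) = ByteOnlyIO(collect(codeunits(s)), 1)

Base.eof(io::ByteOnlyIO) = io.pos > length(io.data)

function Base.read(io::ByteOnlyIO, ::Type{UInt8})
    eof(io) && throw(EOFError())
    b = io.data[io.pos]
    io.pos += 1
    return b
end

# Higher-level reads go through Base's byte-by-byte fallbacks.
io = ByteOnlyIO("abc")
s = read(io, String)
```

A type like this is useful as a regression check: any new generic method has to keep working on it, or else fail in a deliberate and documented way.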

TODO

Add generic IO types in Test and test all new IO methods
Figure out what to do about the current system's dependence on pointer APIs.
Add documentation, including a manual section
Add NEWS

Decisions

See https://hackmd.io/UgeBwtIkTTipkgZSQnXc8w for points of discussion

Timeline

I hope to finish this before the feature freeze of 1.13, but I'd rather have this be done well than done soon.

Things left out of this PR

Cancellation: The original IO discussion included a discussion of a cancellation API. This is somewhat orthogonal to the IO interface, and IOBuffer, being purely in-memory, does not need to work with cancellation. Therefore, this can be left to a future PR.
Writing interface: To start with, I'll only implement the reading part of the interface. If people are happy with this, I'll move on to the writing part in a later PR.
Deserialization / serialization: Methods like read(::IO, ::Type{Int}) are one of the main uses of IO. However, this is a different problem with a different design space.
Most generic methods: This PR was originally intended to also include new, efficient generic methods for IO. However, the interface turned out to be more successful than expected: many current IO functions can be expressed efficiently in terms of the new interface, so this change would involve rewriting most of Base's IO, which would be several thousand LOC. I will leave that to a series of future PRs if the core interface is accepted.

Closes #55835
Closes #47771

nhz2 · 2025-04-02T19:34:30Z

base/io.jl

+    GC.@preserve ref unsafe_read(io, Ptr{UInt8}(pointer(ref))::Ptr{UInt8}, nbytes)
+end
+
+function unsafe_read(io::IO, dst::Ptr{UInt8}, nbytes::UInt)


This is tricky because there is an existing fallback method that uses only read(io, UInt8).

If I understand correctly, fillbuffer can fall back to returning 0, and getbuffer can fall back to returning an empty vector for an IO without an underlying buffer. This would hit the isempty(buf) && iszero(nfilled) case.

Currently, you have this throw an EOFError, but you could instead do:

    unsafe_store!(dst, read(io, UInt8)::UInt8)
    dst += 1
    nbytes -= 1
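
For context, the three statements above would sit inside a loop. The following is a minimal sketch of such a byte-wise fallback, assuming the only primitive available is `read(io, UInt8)`. The function name `bytewise_unsafe_read` is hypothetical and is not the method in the diff.

```julia
# Hypothetical helper, not part of the PR: fill `dst` one byte at a time
# using only the legacy primitive `read(io, UInt8)`.
function bytewise_unsafe_read(io::IO, dst::Ptr{UInt8}, nbytes::UInt)
    while nbytes > 0
        # read(io, UInt8) throws EOFError if the stream ends early
        unsafe_store!(dst, read(io, UInt8)::UInt8)
        dst += 1
        nbytes -= 1
    end
    return nothing
end

io = IOBuffer("hello world")
out = Vector{UInt8}(undef, 5)
# Keep `out` rooted while its raw pointer is in use.
GC.@preserve out bytewise_unsafe_read(io, pointer(out), UInt(5))
```

This path is slow, but it works for any IO that supports single-byte reads, which is why it is attractive as a last-resort fallback.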

jakobnissen · 2025-04-03

fillbuffer can fall back to returning 0, but getbuffer should only be implemented if the IO is buffered. Being buffered implies that when the buffer is empty and fillbuffer returns 0, the IO is at EOF. I'll clarify the documentation.

W.r.t the fallback calling read(io, UInt8) you're right. I just drew out the call graph for the current generic IO functions, and it's not too complex, actually. All reading functions fall back to read(io, UInt8).
I'm thinking of circumventing this by adding a check similar to this:

    if readbuffering(typeof(io)) == NotBuffered() || hasmethod(getbuffer, Tuple{typeof(io)})
        # use methods relying on new interface
    else
        # use method relying only on read(io, UInt8)
    end

This will work because IOs are buffered by default, so any IO that has either opted out of buffering or implemented getbuffer must be aware of the new interface. It's an ugly solution, but it'll work.
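
To make the check concrete, here is a self-contained sketch. The names `readbuffering`, `NotBuffered` and `getbuffer` come from the comment above, but their definitions here are stand-ins written for illustration; `IsBuffered`, `knows_new_interface`, `read_path` and the three IO types are hypothetical.

```julia
# Stand-ins for names from this PR; these definitions are assumptions.
abstract type ReadBuffering end
struct IsBuffered <: ReadBuffering end
struct NotBuffered <: ReadBuffering end

readbuffering(::Type{<:IO}) = IsBuffered()  # IOs count as buffered by default
function getbuffer end                       # no methods yet: legacy IOs lack one

function knows_new_interface(io::IO)
    return readbuffering(typeof(io)) == NotBuffered() ||
           hasmethod(getbuffer, Tuple{typeof(io)})
end

# Hypothetical IO types covering the three cases.
struct LegacyIO <: IO end                    # predates the interface
struct UnbufferedIO <: IO end                # explicitly opted out of buffering
struct BufferedIO <: IO                      # implements getbuffer
    data::Vector{UInt8}
end
readbuffering(::Type{UnbufferedIO}) = NotBuffered()
getbuffer(io::BufferedIO) = io.data

# Pick a code path; the symbols only label which branch would run.
read_path(io::IO) = knows_new_interface(io) ? :new_interface : :bytewise_fallback
```

Only `LegacyIO` takes the byte-wise branch here, since it neither opts out of buffering nor defines `getbuffer`.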

Another consideration for this function specifically is whether we can write a fast, generic fallback for unsafe_read(::IO, src::Any, ::UInt). To do this, we need to be able to dispatch on whether we can write to src using a pointer, which we don't currently have any abstractions for. This is a hobby horse of mine, but for good reason: you really run into it again and again. I hope to be able to address this in this PR.

nhz2 · 2025-04-02T20:03:36Z

base/io.jl

+    isempty(v) && return 0
+    buffer = @something get_nonempty_reading_buffer(io) return 0
+    mn = min(length(v), length(buffer))
+    copyto!(v, firstindex(v), buffer, 1, mn)
+    mn


Ideally, this would somehow fall back to readbytes! to work well with existing IO types. For example, you could check isnothing(buffer) && !eof(io) (which should only happen for legacy IO types) and then fall back to readbytes! in that case.
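
A rough sketch of that suggestion follows. `get_nonempty_reading_buffer` is the helper named in the diff, but the one-line definition here is only a stand-in that always returns `nothing`, modelling a legacy IO; `read_some!` is a hypothetical name for the function under review.

```julia
# Stand-in for the PR's helper (assumed semantics): returns `nothing` when no
# buffered bytes are available. Here no IO implements it, modelling legacy IOs.
get_nonempty_reading_buffer(::IO) = nothing

# Hypothetical name; mirrors the diff above plus the suggested fallback.
function read_some!(io::IO, v::AbstractVector{UInt8})
    isempty(v) && return 0
    buffer = get_nonempty_reading_buffer(io)
    if buffer === nothing
        # No buffer but not at EOF: treat as a legacy IO and use readbytes!
        eof(io) && return 0
        return readbytes!(io, v, length(v))
    end
    mn = min(length(v), length(buffer))
    copyto!(v, firstindex(v), buffer, 1, mn)
    return mn
end

v = zeros(UInt8, 4)
n = read_some!(IOBuffer("abcdef"), v)
```

One caveat with this approach: readbytes! keeps reading until it has the requested number of bytes or hits EOF, so on a legacy IO the fallback may block longer than a "read whatever is buffered" call would.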

jakobnissen · 2025-04-03

Edit: This is a new function, so I think it would actually be a boon if it failed for old IOs. That will push people to implement this new API for old IO types, without breaking any existing code.
It will also simplify the implementation because we don't have to do hacky workarounds to support IOs which don't adhere to the (new) interface.

vtjnash · 2025-04-10T20:08:07Z

I think there is a bit too much going on here to handle over a single PR review, though it is nice to have an overall view of what you're thinking. The main complaints I would have are:

Specific error types are going to be annoying to users, making the interface more difficult to use, and they are a breaking change. None of that is good. When you have multiple different things to return from the same interface (throw), that is a clear sign that those should be values, not types.
The current getbuffer design isn't usable when started with multiple threads (or even multiple Tasks), which would need to be addressed before we could consider something like this.
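
One possible reading of the "values, not types" point is a single exception type that carries an enum value. The sketch below is an illustration only: every name in it (`IOErrorKind`, `IOReadError`, `describe`, and the enum members) is hypothetical, and it does not reproduce the error types actually proposed in the PR.

```julia
# Hypothetical names throughout; the PR's actual error types are not shown here.
@enum IOErrorKind begin
    EndOfInput
    BufferTooShort
    BadSeek
end

# One exception type; the specific failure is a field value.
struct IOReadError <: Exception
    kind::IOErrorKind
end

function Base.showerror(io::IO, e::IOReadError)
    print(io, "IOReadError: ", e.kind)
end

# Callers catch a single type and branch on the value.
function describe(f)
    try
        f()
        return :ok
    catch e
        e isa IOReadError || rethrow()
        return e.kind == EndOfInput ? :eof : :other
    end
end
```

With this shape, adding a new failure mode extends the enum rather than the type hierarchy, so existing `catch` blocks keep matching.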

nsajko added the io (Involving the I/O subsystem: libuv, read, write, etc.) and design (Design of APIs or of the language itself) labels Apr 2, 2025

nhz2 reviewed Apr 2, 2025

jakobnissen changed the title ~~WIP / RFC: Define IO interface~~ WIP / RFC: Define IO reading interface Apr 3, 2025

jakobnissen mentioned this pull request Apr 7, 2025

Deprecate mark, unmark, ismarked, reset #58034

Open

jakobnissen added 11 commits April 7, 2025 15:46

WIP: Define IO interface

0adefdd

Begin transitioning IOBuffer to use interface

bcd20e3

Implement read of byte

8e28beb

Exports

64639d8

Address comments

87442c4

Start writing tests

70a21dc

Add tests for generic IOs

0fca02d

Add functions and interface to docs

20b6585

Fixup: Docs

2a600c6

Test read and readeach

dbda007

Misc documentation changes

917d048

jakobnissen force-pushed the io_interface branch from 7914676 to 917d048 Compare April 7, 2025 13:46
