WIP / RFC: Define IO reading interface #57982

Draft
wants to merge 11 commits into
base: master

Conversation

jakobnissen
Member

@jakobnissen jakobnissen commented Apr 2, 2025

This is a work-in-progress proposal to define the interface of the abstract type IO, and to provide a robust, mid-level API that makes it possible to implement efficient, generic IO operations.

Discussion is very welcome, see https://hackmd.io/UgeBwtIkTTipkgZSQnXc8w#Discussion for more details and discussion of design decisions.

Having used this API for some of my own work, I must say I really like it.

This PR will, in time, have several parts:

  1. Define and document the core IO interface
  2. Use Base.IOBuffer as a test case for the new interface, to see if a) the API is pleasant, b) the performance is as expected, and most importantly, c) this is non-breaking.
  3. Add a set of generic IO objects in Test and test the generic definitions using those, to make sure the generic fallbacks work for buffered IOs, unbuffered IOs, and IOs that do not implement this new interface at all (i.e. previously existing IOs)

TODO

Decisions

See https://hackmd.io/UgeBwtIkTTipkgZSQnXc8w for points of discussion

Timeline

I hope to finish this before the feature freeze of 1.13, but I'd rather have this be done well than done soon.

Things left out of this PR

  • Cancellation: The original IO discussion included a discussion of a cancellation API. This is somewhat orthogonal to the IO interface, and IOBuffer, being purely in-memory, does not need to work with cancellation. Therefore, this can be left to a future PR.
  • Writing interface: To start with, I'll only implement the reading part of the interface. If people are happy with this, I'll move on to the writing part in a later PR.
  • Deserialization / serialization: Methods like read(::IO, ::Type{Int}) are one of the main uses of IO. However, this is a different problem with a different design space.
  • Most generic methods: This PR was originally intended to also include new, efficient generic methods for IO. However, the new interface turned out to be more successful than expected: a lot of the current IO functions can be expressed efficiently in terms of it, so this change would involve rewriting most of Base's IO code, which would be several thousand LOC. I will leave that to a series of future PRs, if the core interface is accepted.

Closes #55835
Closes #47771

@nsajko added the io (Involving the I/O subsystem: libuv, read, write, etc.) and design (Design of APIs or of the language itself) labels on Apr 2, 2025
base/io.jl Outdated
GC.@preserve ref unsafe_read(io, Ptr{UInt8}(pointer(ref))::Ptr{UInt8}, nbytes)
end

function unsafe_read(io::IO, dst::Ptr{UInt8}, nbytes::UInt)
Member

This is tricky because there is an existing fallback method that uses only read(io, UInt8).

If I understand correctly, fillbuffer can fall back to returning 0 and getbuffer can fall back to returning an empty vector for an IO without an underlying buffer. This would hit the isempty(buf) && iszero(nfilled) case.

Currently, you have this throw an EOFError, but you could instead do:

unsafe_store!(dst, read(io, UInt8)::UInt8)
dst += 1
nbytes -= 1
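
A minimal sketch of how that byte-wise fallback could look as a complete loop. The name bytewise_unsafe_read is hypothetical and not part of this PR; the only thing it assumes about the IO is a read(io, UInt8) method, which throws EOFError if the stream ends early.

```julia
# Hypothetical helper (not part of the PR): copy `nbytes` bytes from `io` to `dst`
# using only `read(io, UInt8)`, the method legacy IO types are expected to provide.
function bytewise_unsafe_read(io::IO, dst::Ptr{UInt8}, nbytes::UInt)
    while nbytes > 0
        # `read(io, UInt8)` throws EOFError if the stream runs out of data.
        unsafe_store!(dst, read(io, UInt8)::UInt8)
        dst += 1
        nbytes -= 1
    end
    return nothing
end

io = IOBuffer(UInt8[0x01, 0x02, 0x03, 0x04])
out = zeros(UInt8, 3)
# Keep `out` rooted while its raw pointer is in use.
GC.@preserve out bytewise_unsafe_read(io, pointer(out), UInt(3))
```

This is slow (one dynamic call per byte), but it would keep existing IO types working unchanged.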

Member Author

fillbuffer can fall back to 0, but getbuffer should only be implemented if the IO is buffered. And having it buffered implies that when the buffer is empty, and fillbuffer returns 0, the IO is EOF. I'll clarify the documentation.
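
To illustrate the rule "buffered, buffer empty, and fillbuffer returns 0 implies EOF", here is a toy sketch. ToyBufferedReader, toy_getbuffer, toy_fillbuffer and toy_eof are made-up names that only mimic the roles of getbuffer and fillbuffer; the real signatures in this PR may differ.

```julia
# Toy buffered reader. All names and signatures here are assumptions for illustration.
mutable struct ToyBufferedReader
    source::Vector{UInt8}   # bytes not yet pulled into the buffer
    buffer::Vector{UInt8}   # bytes currently available for reading
end

toy_getbuffer(io::ToyBufferedReader) = io.buffer

# Move up to `chunk` bytes from the source into the buffer.
# Returns the number of bytes added; 0 means nothing more could be added.
function toy_fillbuffer(io::ToyBufferedReader; chunk::Int = 2)
    n = min(chunk, length(io.source))
    append!(io.buffer, splice!(io.source, 1:n))
    return n
end

# For a buffered IO: an empty buffer plus a fill that adds nothing means EOF.
function toy_eof(io::ToyBufferedReader)
    isempty(toy_getbuffer(io)) || return false
    return iszero(toy_fillbuffer(io))
end

io = ToyBufferedReader(UInt8[0x0a, 0x0b, 0x0c], UInt8[])
first_check = toy_eof(io)      # fills the buffer with two bytes
empty!(toy_getbuffer(io))      # pretend the reader consumed them
second_check = toy_eof(io)     # fills the buffer with the last byte
empty!(toy_getbuffer(io))
third_check = toy_eof(io)      # nothing left to fill
```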

W.r.t. the fallback calling read(io, UInt8), you're right. I just drew out the call graph for the current generic IO functions, and it's not too complex, actually. All reading functions fall back to read(io, UInt8).
I'm thinking of circumventing this by adding a check similar to this:

if readbuffering(typeof(io)) == NotBuffered() || hasmethod(getbuffer, Tuple{typeof(io)})
    # use methods relying on new interface
else
    # use method relying only on read(io, UInt8)
end

This will work because IOs are buffered by default, so any IO that has either opted out of buffering or implemented getbuffer must be aware of the new interface. It's an ugly solution, but it'll work.
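
A self-contained sketch of that check, under stated assumptions: the trait types, the default readbuffering method, and the three toy IO types below are stand-ins written for this example, not the definitions from the PR. Only the shape of the condition is taken from the snippet above.

```julia
# Stand-ins for the PR's trait and buffer accessor; these definitions are assumptions.
abstract type ReadBuffering end
struct IsBuffered <: ReadBuffering end
struct NotBuffered <: ReadBuffering end

readbuffering(::Type{<:IO}) = IsBuffered()   # IOs count as buffered by default
function getbuffer end                       # no methods yet: legacy IOs lack one

# Three toy IO types, one per case discussed above.
struct LegacyIO <: IO end        # predates the new interface
struct UnbufferedIO <: IO end    # explicitly opted out of buffering
struct BufferedIO <: IO
    data::Vector{UInt8}
end

readbuffering(::Type{UnbufferedIO}) = NotBuffered()
getbuffer(io::BufferedIO) = io.data

# True if the IO type shows any sign of knowing about the new interface.
function uses_new_interface(io::IO)
    T = typeof(io)
    return readbuffering(T) == NotBuffered() || hasmethod(getbuffer, Tuple{T})
end
```

With these definitions, only LegacyIO would be routed to the path that relies solely on read(io, UInt8).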

Another consideration for this function specifically is whether we can write a fast, generic fallback for unsafe_read(::IO, src::Any, ::UInt). To do this, we need to be able to dispatch on whether we can write to src using a pointer, which we don't currently have any abstractions for. This is a hobby horse of mine, but for a reason; you really run into it again and again. I hope to be able to address this in this PR.

base/io.jl Outdated
Comment on lines 192 to 199
isempty(v) && return 0
buffer = @something get_nonempty_reading_buffer(io) return 0
mn = min(length(v), length(buffer))
copyto!(v, firstindex(v), buffer, 1, mn)
mn
Member

Ideally, this would somehow fall back to readbytes! to work well with existing IO types. For example, you could check isnothing(buffer) && !eof(io) (which should only happen for legacy IO types) and then fall back to readbytes! in that case.
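
A rough sketch of that suggestion, assuming a caller-supplied get_buffer function that returns either a non-empty buffer or nothing. Both read_some! and get_buffer are hypothetical names chosen for this example; readbytes! and eof are the existing Base functions.

```julia
# Hypothetical sketch: `read_some!` and `get_buffer` are illustrative names,
# not functions defined in this PR or in Base.
function read_some!(v::Vector{UInt8}, io::IO, get_buffer)
    isempty(v) && return 0
    buffer = get_buffer(io)
    if buffer === nothing
        # No buffer available: either the stream is at EOF or this is a legacy IO.
        eof(io) && return 0
        return readbytes!(io, v, length(v))   # legacy path via the existing Base API
    end
    mn = min(length(v), length(buffer))
    copyto!(v, firstindex(v), buffer, 1, mn)
    return mn   # marking the copied bytes as consumed is omitted in this sketch
end

legacy = IOBuffer("hello")
v = zeros(UInt8, 3)
# Pretend IOBuffer exposes no new-style buffer, so the legacy path is taken.
n = read_some!(v, legacy, _ -> nothing)
```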

Member Author

@jakobnissen jakobnissen Apr 3, 2025

Edit: This is a new function, so I think it would actually be a boon if it failed for old IOs. That will push people to implement this new API for old IO types, without breaking any existing code.
It will also simplify the implementation because we don't have to do hacky workarounds to support IOs which don't adhere to the (new) interface.

@jakobnissen changed the title from "WIP / RFC: Define IO interface" to "WIP / RFC: Define IO reading interface" on Apr 3, 2025
Member

vtjnash commented Apr 10, 2025

I think there is a bit too much going on here to handle over a single PR review, though it is nice to have an overall view of what you're thinking. The main complaints I would have are:

  • Specific error types are going to be annoying to users: they make the interface more difficult to use and are a breaking change. None of that is good. When you have multiple different things to return from the same interface (throw), that is a clear sign that those should be values, not types.
  • The current getbuffer design isn't usable when started with multiple threads (or even multiple Tasks), which would need to be addressed before we could consider something like this.
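
One possible reading of the "values, not types" point, as a hedged sketch: a single exception type that carries an enum value, so callers catch one type and branch on a field. Every name below (IOErrorKind, SketchIOError, read_exact, try_read_exact) is hypothetical and not taken from the PR or from Base.

```julia
# Hypothetical names throughout; nothing here is the PR's or Base's API.
@enum IOErrorKind::UInt8 begin
    UnexpectedEOF
    BadSeek
    ClosedStream
end

# One exception type; the specific failure is a value stored in `kind`.
struct SketchIOError <: Exception
    kind::IOErrorKind
end

# Read exactly `n` bytes or throw.
function read_exact(io::IO, n::Int)
    data = read(io, n)   # Base method: reads at most `n` bytes
    length(data) == n || throw(SketchIOError(UnexpectedEOF))
    return data
end

# Callers catch a single type and inspect the value instead of matching many types.
function try_read_exact(io::IO, n::Int)
    try
        return read_exact(io, n)
    catch err
        err isa SketchIOError || rethrow()
        return err.kind
    end
end
```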

Labels
  • design: Design of APIs or of the language itself
  • io: Involving the I/O subsystem: libuv, read, write, etc.
Successfully merging this pull request may close these issues.

  • print(iobuffer, number) without calling string(number)?
  • Unified I/O error type
4 participants