-
-
Notifications
You must be signed in to change notification settings - Fork 175
core: Better handle internally-used fds being shifted #2448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
013a8cc to
445ed77
Compare
|
Marked another internal file as persistent, fixing this issue: |
02102af to
f5789b9
Compare
|
Thanks for working on this! My first comment is that it's great the test passes, and nothing else fails! I'm trying to think of any other counterexamples - Let me look at the I think _SHELL_MIN_FD would no longer makes as much sense as a name, but let's see |
|
OK yeah there are some other Open() calls It might be that EVERY fd with Open() is persistent, in which case we don't need the param In that case we could get rid of the param, and the values for I want to look at some strace now too And try other things other than Sorry you had the hiccup with the tell() stuff ... It's a bit surprising that had to be added I think it may have been due to the pre-existing smell I mentioned -- the bad cast (which only accidentally works) |
|
|
||
| def ReplaceFd(self, fd): | ||
| # type: (mylib.LineReader) -> None | ||
| tell = self.f.tell() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I actually don't see why this works? Why do we need tell() and seek() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially - we want to restore the position we've read the previous file up to.
the position that DUP preserves doesn't really cover this use case - it's a bit confusing because what gets preserved automatically is how far Python has internally read the file into a buffer, and not how much we have actually read from the file. Here, without restoring our position explicitly, we lose most of the file because to the OS, we've already read it to the end:
>>> import posix
>>> import fcntl
>>> fd = posix.open('test.osh', posix.O_RDONLY, 0o666)
>>> f = posix.fdopen(fd)
>>> f.readline()
'# first line\n'
>>> f.readline()
'# second line\n'
>>> f.tell()
27
>>> new_fd = fcntl.fcntl(fd, fcntl.F_DUPFD, 25)
>>> new_fd
25
>>> g = posix.fdopen(new_fd)
>>> g.tell()
3229
>>> g.readline()
'' <<< (should be '# third line', instead it's EOF)(it's a bit different in the C++ implementation, but it's the same principle)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK, that is interesting, but based on experience, I'm a bit "scared" of doing this
Actually one thing that became apparent to me is that shells PREDATE libc, and they don't use libc I/O -- they use POSIX I/O
Oils is unusual in that it uses libc I/O (which has buffering), and I might want to change that
that is
- libc:
FILE*fwrite, fread, fseek, ftell - posix: read, write, lseek, fcntl
i.e. it might be a bit bad that we are mixiing fcntl and fwrite/fwrite -- most shells don't do that
There can be buffering bugs
That said, this is a great change to explore the problem
I'm glad you were able to fix the test so quickly, and also understand mycpp so quickly! (even with some unfortunate warts)
I think we should look into the busybox ash algorithm, as mentioned on Zulip
A funny thing is that it's basically the dash source concatenated into a big 13K line file
https://github.com/brgl/busybox/blob/master/shell/ash.c
(btw one thing I did once is just upload this whole file into Claude, and then ask it questions ... it does hallucinate and get confused of course, but I think it was net helpful at least a few times ... I have to be real skeptical)
|
Hm to compare this PR against other shells, how about we change _SHELL_MIN_FD to 10: Right now it still fails And then I would like to compare the I'm not sure how they compare right now, nor exactly what algorithm busybox uses ! |
f5789b9 to
278d550
Compare
I've dropped it from 25 to 10. Bash actually handles ulimit < 10 as well: |
278d550 to
e3335e2
Compare
I've gone through all the uses of |
|
Added spec tests for the issues covered in this PR ( |
|
I wrote a script to compare the left is dash, right is OSH So it looks like we have 2 openat() calls now, which I don't think we should have |
|
and then here is busybox ash versus OSH on this branch for descriptor 10 Now you can see the Let me look at the I think that we have a lot more info than we did , and we are much closer ! But we should do something more like busybox (a minor difference between dash/OSH and busybox ash is that busybox uses |
|
Aside: I noticed that the flag busybox uses is in POSIX, so OSH could use it too https://linux.die.net/man/2/fcntl
|
|
FWIW I submitted the script I used to generate those traces: f10f21e And of course we can figure out what busybox is doing together -- that is still a bit of a mystery to me I tried to get it to "break" by messing with its FD, but it doesn't |
|
BTW I realized this extra part of the OSH trace is not related to this PR This is related ot the fact that I was thrown off by that a bit, and changed the |
Since _SHELL_MIN_FD=100, osh fails to build some Alpine packages:
$ osh -c 'ulimit -n 64; echo hi >out'
[ -c flag ]:1: I/O error applying redirect: Invalid argument'
(This is because osh attempts to save stdout's fd before redirection to
a descriptor >=100, which 'ulimit -n 64' doesn't allow)
Currently we can't drop _SHELL_MIN_FD lower, because scripts like this
currently crash osh (users are allowed to use fds<=99):
$ cat test.osh
exec 10>out
echo hello>&10
cat out
$ bin/osh test.osh
hello
oils I/O error (main): Bad file descriptor
This happens due to osh mishandling "internal" file descriptors, in this
case the script file itself, as can be seen from the following strace
log:
openat(AT_FDCWD, "test.osh", O_RDONLY) = 3 <<<<
fcntl(3, F_DUPFD, 10) = 10 <<<<
close(3) = 0
openat(AT_FDCWD, "out", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fcntl(10, F_DUPFD, 10) = 11 <<<<<
close(10) = 0
dup2(3, 10) = 10
close(3) = 0
close(11) = 0 <<<< it's closed immediately after being saved
fcntl(1, F_DUPFD, 10) = 11
close(1) = 0
dup2(10, 1) = 1
dup2(11, 1) = 1
close(11) = 0
openat(AT_FDCWD, "out", O_RDONLY) = 3
close(3) = 0
hello
read(10, 0x55bfa557e240, 4096) = -1 EBADF (Bad file descriptor) <<< still 10
oils I/O error (main): Bad file descriptor
close(10) = 0
So, add some bookkeeping to FdState - it now tracks whether a file
descriptor is persistent and invokes a callback when such persistent
file descriptors are moved by osh.
Add a callback to FileLineReader - it seek()-s in the new file to
restore the position in the current file and changes file descriptors.
There is currently only one usage of a persistent fd with a callback -
FileLineReader for the script file, since that's what triggers the bug
above. Additional users might need to be changed as more bugs are
discovered - these are currently ignoring the second return value from
fd_state.Open()
Drop _SHELL_MIN_FD to 10, justify the reasoning in a comment.
With these changes, the above test.osh script works as intended, with
internal fds being shifted away from user-requested fds. This
potentially allows us to drop _SHELL_MIN_FD further in the future and
untangle it from the limit on fds users can use in redirects.
Signed-off-by: Andrii Sultanov <[email protected]>
This fixes the following issue:
$ cat source.osh
exec 25>out
echo hello>&25
cat out
$ bin/osh -c '. ./source.osh'
hello
[ -c flag ]:1: . builtin I/O error: Bad file descriptor
Signed-off-by: Andrii Sultanov <[email protected]>
Signed-off-by: Andrii Sultanov <[email protected]>
This fixes the following issue:
$ cat source.osh
exec 10>out
echo hello>&10
cat out
$ bin/osh --eval source.osh
hello
oils I/O error (main): Bad file descriptor
Signed-off-by: Andrii Sultanov <[email protected]>
This fixes the following issue:
$ cat ~/.config/oils/oshrc
exec 10>out
echo hello>&10
cat out
$ bin/osh
hello
oils I/O error (main): Bad file descriptor
Signed-off-by: Andrii Sultanov <[email protected]>
91bc0f0 to
56df453
Compare
Since this PR dropped _SHELL_MIN_FD below 64, the ulimit test started passing. Signed-off-by: Andrii Sultanov <[email protected]>
Signed-off-by: Andrii Sultanov <[email protected]>
56df453 to
81b1dac
Compare
|
I've rebased on top of |
Since
_SHELL_MIN_FD=100, osh fails to build some Alpine packages (#2335):(This is because osh attempts to save stdout's fd before redirection to a descriptor >=100, which
ulimit -n 64doesn't allow)Currently we can't drop
_SHELL_MIN_FDlower, because scripts like this crash osh (users are allowed to use fds<=99):This happens due to osh mishandling "internal" file descriptors, in this case the script file itself, as can be seen from the following strace log:
So, add some bookkeeping to
FdState- it now tracks whether a file descriptor is persistent and invokes a callback when such persistent file descriptors are moved by osh.There is currently only one usage of a persistent fd with a callback -
FileLineReaderfor the script file, since that's what triggers the bug above. Additional users might need to be added as more bugs are discovered.Drop
_SHELL_MIN_FDto 25 for now.With these changes, the above
test.oshscript works as intended, with internal fds being shifted away from user-requested fds. This potentially allows us to drop _SHELL_MIN_FD further in the future and untangle it from the limit on fds users can use in redirects.