Do not close FDs 0, 1, or 2 #186

Open: wants to merge 12 commits into base: main

DemiMarie
Contributor

If they are closed, another file descriptor could be created with these numbers, and so standard library functions that use them might write to an unwanted place. dup2() a file descriptor to /dev/null over them instead.
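
The hazard described above can be reproduced in isolation. The following is a minimal sketch, not code from this PR: the function name `fd_reused_after_close` is made up for illustration, and the demonstration runs in a forked child so the caller's own standard streams are untouched. It relies only on the POSIX rule that `open()` returns the lowest-numbered free descriptor.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative only (not part of qrexec): closes FDs 0-2 in a child and
 * reports which descriptor number the next open() hands out.
 * Returns that number, or -1 if fork(), open() or waitpid() fails. */
static int fd_reused_after_close(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(STDIN_FILENO);
        close(STDOUT_FILENO);
        close(STDERR_FILENO);
        /* open() must return the lowest-numbered descriptor that is free. */
        int fd = open("/dev/null", O_WRONLY);
        _exit(fd < 0 ? 255 : fd);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 255 ? -1 : code;
}
```

If the file opened here were a log or a socket rather than /dev/null, any later library call that writes to "stdin", "stdout" or "stderr" by number would silently write into it.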

codecov bot commented Jan 9, 2025

Codecov Report

Attention: Patch coverage is 54.26357% with 59 lines in your changes missing coverage. Please review.

Project coverage is 78.96%. Comparing base (63e0699) to head (61a0261).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| agent/qrexec-agent.c | 55.10% | 44 Missing ⚠️ |
| libqrexec/exec.c | 53.33% | 7 Missing ⚠️ |
| agent/qrexec-exec-program.c | 50.00% | 6 Missing ⚠️ |
| agent/qrexec-fork-server.c | 0.00% | 1 Missing ⚠️ |
| daemon/qrexec-daemon.c | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #186      +/-   ##
==========================================
+ Coverage   78.84%   78.96%   +0.11%     
==========================================
  Files          55       56       +1     
  Lines       10146    10183      +37     
==========================================
+ Hits         8000     8041      +41     
+ Misses       2146     2142       -4     

@DemiMarie
Contributor Author

Codecov appears to not be testing what happens in the child process after fork() and the error path of “cannot open /dev/null”.

@marmarek
Member

AFAIR unit tests do not cover the PAM handling part: they do not run as a system service, the test runners lack the necessary PAM configuration, etc.

@qubesos-bot

qubesos-bot commented Jan 11, 2025

OpenQA test summary

Complete test suite and dependencies: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.3&build=2025021202-4.3&flavor=pull-requests

Test run included the following:

New failures, excluding unstable

Compared to: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.3&build=2025020404-4.3&flavor=update

Failed tests

51 failures

Fixed failures

Compared to: https://openqa.qubes-os.org/tests/127852#dependencies

30 fixed
  • system_tests_suspend

    • mount_and_boot_options: unnamed test (unknown)
    • mount_and_boot_options: Failed (test died)
      # Test died: no candidate needle with tag(s) 'x11' matched...
  • system_tests_dispvm

    • TC_20_DispVM_fedora-41-xfce: test_100_open_in_dispvm (failure)
      AssertionError: './open-file test.txt' failed with ./open-file test...
  • system_tests_devices

    • TC_00_List_whonix-gateway-17: test_000_list_loop (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_00_List_whonix-gateway-17: test_001_list_loop_mounted (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_00_List_whonix-gateway-17: test_010_list_dm (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_00_List_whonix-gateway-17: test_011_list_dm_mounted (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_00_List_whonix-gateway-17: test_012_list_dm_delayed (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_00_List_whonix-gateway-17: test_013_list_dm_removed (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_00_List_whonix-gateway-17: test_020_list_loop_partition (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_00_List_whonix-gateway-17: test_021_list_loop_partition_mounted (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_00_List_whonix-workstation-17: test_000_list_loop (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_00_List_whonix-workstation-17: test_001_list_loop_mounted (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_00_List_whonix-workstation-17: test_010_list_dm (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_00_List_whonix-workstation-17: test_011_list_dm_mounted (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_00_List_whonix-workstation-17: test_012_list_dm_delayed (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_00_List_whonix-workstation-17: test_013_list_dm_removed (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_00_List_whonix-workstation-17: test_020_list_loop_partition (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_00_List_whonix-workstation-17: test_021_list_loop_partition_mounted (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_10_Attach_whonix-gateway-17: test_000_attach_reattach (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

    • TC_10_Attach_whonix-workstation-17: test_000_attach_reattach (error)
      subprocess.CalledProcessError: Command 'set -e;truncate -s 128M /tm...

  • system_tests_kde_gui_interactive

    • clipboard_and_web: unnamed test (unknown)
    • clipboard_and_web: Failed (test died)
      # Test died: no candidate needle with tag(s) 'menu-vm-firefox' matc...
  • system_tests_audio

  • system_tests_qrexec_perf@hw1

    • TC_00_QrexecPerf_debian-12-xfce: test_110_simple_data_duplex (failure)
      AssertionError: '/usr/lib/qubes/tests/qrexec_perf.py --vm1=test-ins...
  • system_tests_storage_perf@hw1

    • integ: storage_perf (error)
      ModuleNotFoundError: No module named 'qubes.tests.integ.storage_perf'
  • system_tests_basic_vm_qrexec_gui_ext4

    • switch_pool: Failed (test died)
      # Test died: command 'printf "label: gpt\n,,L" | sfdisk /dev/sdb' f...
  • system_tests_backup

    • TC_10_BackupVM_whonix-gateway-17: test_110_send_to_vm_no_space (error)
      subprocess.CalledProcessError: Command 'mknod /dev/loop0 b 7 0;trun...

    • TC_10_BackupVM_whonix-workstation-17: test_110_send_to_vm_no_space (error)
      subprocess.CalledProcessError: Command 'mknod /dev/loop0 b 7 0;trun...

Unstable tests

Performance Tests

Performance degradation: No issues

Remaining performance tests:

52 tests
  • debian-12-xfce_exec: 8.32
  • debian-12-xfce_exec-root: 29.00
  • debian-12-xfce_socket: 8.68
  • debian-12-xfce_socket-root: 8.45
  • debian-12-xfce_exec-data-simplex: 43.35
  • debian-12-xfce_exec-data-duplex: 48.60
  • debian-12-xfce_exec-data-duplex-root: 65.68
  • debian-12-xfce_socket-data-duplex: 77.61
  • fedora-41-xfce_exec: 9.04
  • fedora-41-xfce_exec-root: 69.49
  • fedora-41-xfce_socket: 8.58
  • fedora-41-xfce_socket-root: 8.64
  • fedora-41-xfce_exec-data-simplex: 50.97
  • fedora-41-xfce_exec-data-duplex: 50.58
  • fedora-41-xfce_exec-data-duplex-root: 82.83
  • fedora-41-xfce_socket-data-duplex: 74.02
  • whonix-gateway-17_exec: 6.17
  • whonix-gateway-17_exec-root: 38.73
  • whonix-gateway-17_socket: 7.88
  • whonix-gateway-17_socket-root: 7.57
  • whonix-gateway-17_exec-data-simplex: 48.12
  • whonix-gateway-17_exec-data-duplex: 49.30
  • whonix-gateway-17_exec-data-duplex-root: 71.98
  • whonix-gateway-17_socket-data-duplex: 84.49
  • whonix-workstation-17_exec: 8.25
  • whonix-workstation-17_exec-root: 53.38
  • whonix-workstation-17_socket: 8.00
  • whonix-workstation-17_socket-root: 8.17
  • whonix-workstation-17_exec-data-simplex: 44.65
  • whonix-workstation-17_exec-data-duplex: 46.51
  • whonix-workstation-17_exec-data-duplex-root: 79.11
  • whonix-workstation-17_socket-data-duplex: 83.74
  • dom0_root_rand-read 3:read_bandwidth_kb: 10011.00
  • dom0_root_rand-write 3:write_bandwidth_kb: 13604.00
  • dom0_root_seq-read 3:read_bandwidth_kb: 426924.00
  • dom0_root_seq-write 3:write_bandwidth_kb: 185752.00
  • dom0_varlibqubes_rand-read 3:read_bandwidth_kb: 13427.00
  • dom0_varlibqubes_rand-write 3:write_bandwidth_kb: 23004.00
  • dom0_varlibqubes_seq-read 3:read_bandwidth_kb: 527677.00
  • dom0_varlibqubes_seq-write 3:write_bandwidth_kb: 223784.00
  • fedora-41-xfce_root_rand-read 3:read_bandwidth_kb: 9363.00
  • fedora-41-xfce_root_rand-write 3:write_bandwidth_kb: 14741.00
  • fedora-41-xfce_root_seq-read 3:read_bandwidth_kb: 413697.00
  • fedora-41-xfce_root_seq-write 3:write_bandwidth_kb: 160415.00
  • fedora-41-xfce_private_rand-read 3:read_bandwidth_kb: 9115.00
  • fedora-41-xfce_private_rand-write 3:write_bandwidth_kb: 15506.00
  • fedora-41-xfce_private_seq-read 3:read_bandwidth_kb: 412866.00
  • fedora-41-xfce_private_seq-write 3:write_bandwidth_kb: 92761.00
  • fedora-41-xfce_volatile_rand-read 3:read_bandwidth_kb: 9011.00
  • fedora-41-xfce_volatile_rand-write 3:write_bandwidth_kb: 13249.00
  • fedora-41-xfce_volatile_seq-read 3:read_bandwidth_kb: 415243.00
  • fedora-41-xfce_volatile_seq-write 3:write_bandwidth_kb: 67795.00

@marmarek
Member

system_tests_qrexec

* TC_00_Qrexec_debian-12-xfce: [test_053_qrexec_vm_service_eof_reverse](https://openqa.qubes-os.org/tests/125357#step/TC_00_Qrexec_debian-12-xfce/4) (failure)
  `AssertionError: Timeout, probably EOF wasn't transferred`

* TC_00_Qrexec_debian-12-xfce: [test_092_qrexec_service_socket_dom0_eof_reverse](https://openqa.qubes-os.org/tests/125357#step/TC_00_Qrexec_debian-12-xfce/18) (failure)
  `AssertionError: service timeout, probably EOF wasn't transferred fr...`

* TC_00_Qrexec_debian-12-xfce: [test_098_qrexec_service_socket_vm_eof](https://openqa.qubes-os.org/tests/125357#step/TC_00_Qrexec_debian-12-xfce/23) (failure)
  `AssertionError: service timeout, probably EOF wasn't transferred to...`

* TC_00_Qrexec_fedora-41-xfce: [test_053_qrexec_vm_service_eof_reverse](https://openqa.qubes-os.org/tests/125357#step/TC_00_Qrexec_fedora-41-xfce/4) (failure)
  `AssertionError: Timeout, probably EOF wasn't transferred`

* TC_00_Qrexec_fedora-41-xfce: [test_092_qrexec_service_socket_dom0_eof_reverse](https://openqa.qubes-os.org/tests/125357#step/TC_00_Qrexec_fedora-41-xfce/18) (failure)
  `AssertionError: service timeout, probably EOF wasn't transferred fr...`

* TC_00_Qrexec_fedora-41-xfce: [test_098_qrexec_service_socket_vm_eof](https://openqa.qubes-os.org/tests/125357#step/TC_00_Qrexec_fedora-41-xfce/23) (failure)
  `AssertionError: service timeout, probably EOF wasn't transferred to...`

* TC_00_Qrexec_whonix-gateway-17: [test_053_qrexec_vm_service_eof_reverse](https://openqa.qubes-os.org/tests/125357#step/TC_00_Qrexec_whonix-gateway-17/4) (failure)
  `AssertionError: Timeout, probably EOF wasn't transferred`

* TC_00_Qrexec_whonix-gateway-17: [test_083_qrexec_service_argument_specific_implementation](https://openqa.qubes-os.org/tests/125357#step/TC_00_Qrexec_whonix-gateway-17/14) (error)
  `subprocess.CalledProcessError: Command '/usr/lib/qubes/qrexec-clien...`

* TC_00_Qrexec_whonix-gateway-17: [test_092_qrexec_service_socket_dom0_eof_reverse](https://openqa.qubes-os.org/tests/125357#step/TC_00_Qrexec_whonix-gateway-17/18) (failure)
  `AssertionError: service timeout, probably EOF wasn't transferred fr...`

* TC_00_Qrexec_whonix-gateway-17: [test_098_qrexec_service_socket_vm_eof](https://openqa.qubes-os.org/tests/125357#step/TC_00_Qrexec_whonix-gateway-17/23) (failure)
  `AssertionError: service timeout, probably EOF wasn't transferred to...`

* TC_00_Qrexec_whonix-workstation-17: [test_053_qrexec_vm_service_eof_reverse](https://openqa.qubes-os.org/tests/125357#step/TC_00_Qrexec_whonix-workstation-17/4) (failure)
  `AssertionError: Timeout, probably EOF wasn't transferred`

* TC_00_Qrexec_whonix-workstation-17: [test_092_qrexec_service_socket_dom0_eof_reverse](https://openqa.qubes-os.org/tests/125357#step/TC_00_Qrexec_whonix-workstation-17/18) (failure)
  `AssertionError: service timeout, probably EOF wasn't transferred fr...`

* TC_00_Qrexec_whonix-workstation-17: [test_098_qrexec_service_socket_vm_eof](https://openqa.qubes-os.org/tests/125357#step/TC_00_Qrexec_whonix-workstation-17/23) (failure)
  `AssertionError: service timeout, probably EOF wasn't transferred to...`

This is the only qrexec PR in this test run, so the above failures seem to be a regression caused by this change.

@@ -162,6 +162,11 @@ void buffer_append(struct buffer *b, const char *data, int len);
void buffer_remove(struct buffer *b, int len);
int buffer_len(struct buffer *b);
void *buffer_data(struct buffer *b);
/* Open /dev/null and keep it from being closed before the exec func is called.
Member

Isn't it simpler (and safer) to simply open /dev/null just before dup-ing it over 0,1,2 (in the child process already)?

Contributor Author

It’s definitely simpler, but I didn’t want the extra call to open(). That said, giving each child process the same open file description is less than great, as it introduces shared state that should not be there. I don’t know if /dev/null has any such state, but still it isn’t great, so I went ahead and switched to your approach.
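
A minimal sketch of the approach agreed on above, assuming nothing about the PR's actual code: open /dev/null in the child and `dup2()` it over FDs 0, 1 and 2. Both function names are hypothetical. The second function exists only to exercise the first in a forked child and check the result.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not the PR's code): make FDs 0, 1 and 2 refer to
 * /dev/null. Returns 0 on success, -1 on failure. */
static int redirect_std_fds_to_null(void)
{
    /* No O_CLOEXEC: if open() itself lands on 0-2, that FD must survive exec. */
    int null_fd = open("/dev/null", O_RDWR);
    if (null_fd < 0)
        return -1;
    int rc = 0;
    for (int fd = 0; fd <= 2; fd++) {
        if (fd != null_fd && dup2(null_fd, fd) < 0) {
            rc = -1;
            break;
        }
    }
    /* Drop the temporary descriptor unless it is one of the standard ones. */
    if (null_fd > 2)
        close(null_fd);
    return rc;
}

/* Demo: fork, redirect in the child only, and verify that each standard FD
 * there is the same character device as /dev/null.
 * Returns 0 if the check passes, -1 otherwise. */
static int child_std_fds_are_null(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct stat ref, st;
        if (redirect_std_fds_to_null() < 0 || stat("/dev/null", &ref) < 0)
            _exit(1);
        for (int fd = 0; fd <= 2; fd++) {
            if (fstat(fd, &st) < 0 || !S_ISCHR(st.st_mode) ||
                st.st_rdev != ref.st_rdev)
                _exit(1);
        }
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Because each child calls `open()` itself, no open file description is shared between children, at the cost of one extra system call per child.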

@marmarek
Member

But also, I question the usefulness of this PR as a whole - the closing of standard FDs happens in a process that has a single purpose - wait for the child process and then exit, in the very same function where the closing happens. There are few PAM cleanup calls, but it's very unlikely for them to be a problem (especially, it isn't a problem now, or for the last 10 or so years).

@DemiMarie
Contributor Author

Whether or not the last commit in the PR is merged, I definitely think the other commits should be merged. In particular, it turned out that the “close the FD” functionality had no unit tests because the unit tests took a codepath that was too different from the production code. This PR makes the production and test code follow the same path, with the result that the actual bug (/dev/null FD being closed by fix_fds()) is now caught. I think that this test improvement (and the other bug fixes) is itself useful.

> There are few PAM cleanup calls, but it's very unlikely for them to be a problem (especially, it isn't a problem now, or for the last 10 or so years).

PAM cleanup calls into PAM modules, so it can do anything. I suspect Qubes OS only gets away with it because we have a fairly simple PAM stack by default. PAM cleanup is used for e.g. unmounting filesystems and closing encrypted volumes.

The best approach would be for PAM to run with stdin pointed at /dev/null and stdout and stderr pointed at the system log. The FDs would be fixed directly before executing the child process. That’s a bigger refactor, though.

@marmarek
Member

marmarek commented Jan 12, 2025

> Whether or not the last commit in the PR is merged,

Indeed I was talking about the last commit (which until the last force-push was the only commit in this PR).

> PAM cleanup calls into PAM modules, so it can do anything. I suspect Qubes OS only gets away with it because we have a fairly simple PAM stack by default.

Aren't PAM modules expected to handle proper logging themselves? I don't think they are supposed to touch the calling process's stdin/out/err in any case. And if they did, that would likely also interfere with cases where they aren't closed (and then replaced with an unrelated thing) - for example, it could interfere with an application log file on stderr that is expected in a specific format (different from PAM messages).

@marmarek
Member

As for the other commits, won't that have some non-trivial conflicts with #141 (which I hope is quite close to merge-able state)?

@DemiMarie
Contributor Author

> As for the other commits, won't that have some non-trivial conflicts with #141 (which I hope is quite close to merge-able state)?

I can include them in #141 or rebase this PR on top of it. I can also close this PR if you prefer, but I’d prefer that at least the bug fixes and testability changes go in.

marmarek
marmarek previously approved these changes Feb 9, 2025
@marmarek marmarek dismissed their stale review February 9, 2025 17:31

too soon

@marmarek
Member

marmarek commented Feb 9, 2025

PipelineRetry

@marmarek
Member

marmarek commented Feb 9, 2025

One of the tests fails. I'll retry just in case, but it might be a real bug.

@DemiMarie
Contributor Author

Bug is real, but this code is just the messenger: #141 uses a shell script to run the command in the no-fork-server case but not otherwise. However, /bin/sh will fall back to interpreting the file as shell commands if execve() fails with -ENOEXEC and the file is not a binary file.

The proper solution is to write an extremely trivial C program (qrexec-exec-qubes-rpc?) that wraps exec_qubes_rpc2() and drop the last argument from that function. For this PR, though, I’ll just fixup the test to work around the issue.
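
The `execve()` failure mentioned above can be observed directly. This is an illustrative sketch, not qrexec code: the function name and the temporary file template are assumptions, and it expects a Linux system where `/tmp` is writable and not mounted `noexec`. It creates an executable text file with no `#!` line and reports whether `execve()` rejects it with `ENOEXEC`, which is the condition under which a shell would instead fall back to interpreting the file as shell commands.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative only: returns 1 if execve() of a shebang-less text file fails
 * with ENOEXEC, 0 if it fails some other way, -1 if the setup itself fails. */
static int execve_fails_with_enoexec(void)
{
    char path[] = "/tmp/no-shebang-XXXXXX"; /* hypothetical file name */
    static const char body[] = "echo hello\n";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (write(fd, body, sizeof(body) - 1) != (ssize_t)(sizeof(body) - 1) ||
        fchmod(fd, 0700) < 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    /* Close before exec: a file still open for writing may give ETXTBSY. */
    close(fd);

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {
        char *argv[] = {path, NULL};
        char *envp[] = {NULL};
        execve(path, argv, envp);
        /* Only reached when execve() failed. */
        _exit(errno == ENOEXEC ? 42 : 1);
    }
    if (pid > 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            result = (WEXITSTATUS(status) == 42) ? 1 : 0;
    }
    unlink(path);
    return result;
}
```

A caller that invokes `execve()` directly sees this as an error, while a caller that goes through `/bin/sh` sees the file run as a script, so the two paths produce different results for the same bad program.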

Previously 'qrexec-agent --fork-server-socket' (no argument) would
segfault.

This is the convention used by the rest of qrexec.  This commit should
be backported to stable branches.

These should never happen, but call exit() if they do.

Saves an (admittedly cheap) system call.
No functional change intended.
This will be used by tests later.

No functional change intended.
This will be used by tests later.  No functional change intended.
This also fixes a bug: basename can mutate its argument, so a copy must
be passed to it.

@marmarek
Member

> Bug is real, but this code is just the messenger: #141 uses a shell script to run the command in the no-fork-server case but not otherwise. However, /bin/sh will fall back to interpreting the file as shell commands if execve() fails with -ENOEXEC and the file is not a binary file.
>
> The proper solution is to write an extremely trivial C program (qrexec-exec-qubes-rpc?) that wraps exec_qubes_rpc2() and drop the last argument from that function. For this PR, though, I’ll just fixup the test to work around the issue.

The reason for using shell there was to load relevant environment variables (wrong PATH is what we noticed, but surely it applies to many others too, including various XDG_, DBUS_, locale etc). So, just dropping the shell will re-introduce the issue.

@marmarek
Member

> So, just dropping the shell will re-introduce the issue.

Ah, I see what you mean. Ok, I guess that should work.

char **env, const char *cmd, pid_t agent_pid)
{
/* 1 means init, which is completely wrong. */
if (agent_pid < 2) {
Member

Why is it wrong? I can imagine a legitimate case of using qrexec as init (maybe some version of a stubdomain, or some other very specialized domain).

Contributor Author

Qrexec forks a worker process for each connection, so the process ID to signal (to get stdout and stdin on the same stream) is never 1. If getppid() returns 1 it means the qrexec agent died and the best thing to do is to exit.

_exit(QREXEC_EXIT_PROBLEM);
}
const char *buf[] = {"sh", "-lc", "exec \"$@\"", "sh", exec_program, pid_str, prog, cmd, service_path, NULL};
/* TODO: use the user's shell not /bin/sh */
Member

While it may sound like a smart move, it's also problematic, as the script might need to be adjusted for such a shell. AFAIR the "user shell" doesn't even need to be POSIX compliant.

Contributor Author

Ack.
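
For readers unfamiliar with the `exec "$@"` idiom in the snippet under review, here is a minimal, self-contained sketch. It is not the PR's code: the function name `run_via_sh` and the `MAX_ARGS` limit are assumptions, and it uses `-c` rather than `-lc` so that no login startup files are read during the demonstration. The first word after the script becomes `$0`, and the remaining words become the positional parameters that `exec "$@"` runs without further word splitting.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ARGS 16 /* arbitrary limit for this sketch */

/* Illustrative only: runs `sh -c 'exec "$@"' sh prog args...` and returns the
 * program's exit status, or -1 on failure. `prog_argv` is NULL-terminated. */
static int run_via_sh(char *const prog_argv[])
{
    /* "sh" after the script text becomes $0; the rest become $1, $2, ... */
    char *argv[4 + MAX_ARGS + 1] = {"sh", "-c", "exec \"$@\"", "sh"};
    size_t n = 0;
    while (prog_argv[n] != NULL) {
        if (n == MAX_ARGS)
            return -1;
        argv[4 + n] = prog_argv[n];
        n++;
    }
    argv[4 + n] = NULL;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv("/bin/sh", argv);
        _exit(127); /* only reached if /bin/sh could not be executed */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Passing `{"/bin/sh", "-c", "exit 7", NULL}` yields status 7, which shows that an argument containing a space reaches the program as a single word.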

@marmarek
Member

black complains

This makes the unit test code more like the actual code used by
end-users, and therefore makes the tests more accurate.

This trips a bug in the code which will be fixed later, requiring a test
to be changed to compensate.

If they are closed, another file descriptor could be created with these
numbers, and so standard library functions that use them might write to
an unwanted place.  dup2() a file descriptor to /dev/null over them
instead.

Also statically initialize trigger_fd to -1, which is the conventional
value for an invalid file descriptor.

This requires care to avoid closing the file descriptor to /dev/null in
fix_fds(), which took over an hour to debug.

This makes executing a program via the shell consistent with doing so
_not_ via the shell: both handle the environment the same way, and both
produce the same error codes if the program is bad.  This also
significantly simplifies exec_qubes_rpc2(), which doesn't need to handle
creating a shell command anymore.  It is assumed that the user's startup
files do not modify the positional parameters outside of a function.
Doing so would be extremely buggy and break the R4.2 multiplexer.  The
new code does use spaces in shell script arguments, but if the user's
startup files break when that happens, the user has other problems.

This is an ABI break for libqrexec, but that's okay in an unstable
release.  In the future, libqrexec should really be a static library.

Fixes: 8728002 (Stop using qubes-rpc-multiplexer for service calls)

@marmarek
Member

A lot of tests have failed, see the openQA report from the bot.

@DemiMarie
Contributor Author

> A lot of tests have failed, see the openQA report from the bot.

I’ll work on fixing this.
