-
-
Notifications
You must be signed in to change notification settings - Fork 348
Rare writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
failure
#2006
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for reporting! My guess is that the lack of determinism must have to do with the parallelism which is indeed configured in the tests as well. Another option is that the |
In this case, I think the case sensitivity check gave the correct result. I wouldn't be surprised if it were capable of occasionally being wrong, but I don't know how that would lead to what was seen here. Is the idea that this is a test bug? Shouldn't tests that are running in parallel and accessing files, at least if writing to files, be writing to files in separate temporary directories? Software that uses Might part of what's going on here be that, for some operations where |
No, but I see how that could be the impression. I hoped that if repeated runs could reproduce the issue, say, 1 in 100, then one could try to disable parallelism and repeat the reproduction which should then be impossible. |
To investigate GitoxideLabs#2006, this explodes the occasionally failing test `writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed` into 1000 identical tests, using `test_case::test_matrix`. When run in series, they all pass, but when run in parallel with `cargo nextest`, some failures always occur, at least when this is done on macOS. This is as predicted in: GitoxideLabs#2006 (comment)
To investigate GitoxideLabs#2006, this explodes the occasionally failing test `writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed` into 1000 identical tests, using `test_case::test_matrix`. When run in series, they all pass, but when run in parallel with `cargo nextest`, some failures always occur, at least when this is done on macOS. This is as predicted in: GitoxideLabs#2006 (comment) A more specific result in local testing on macOS 15 is that, while the number of failures is 0 with `--test-threads=1`, with 2 or more threads, the number of failures tends to *decrease* with the number of threads. The most failures occur with `--test-threads=2`, about 30 on the system tested, while `--test-threads=16` has about 10 failures.
To investigate GitoxideLabs#2006, this explodes the occasionally failing test `writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed` into 1000 identical tests, using `test_case::test_matrix`. When run in series, they all pass, but when run in parallel with `cargo nextest`, some failures always occur, at least when this is done on macOS. This is as predicted in: GitoxideLabs#2006 (comment) A more specific result in local testing on macOS 15 is that, while the number of failures is 0 with `--test-threads=1`, with 2 or more threads, the number of failures tends to *decrease* with the number of threads. The most failures occur with `--test-threads=2`, about 30 on the system tested, while `--test-threads=16` has about 10 failures. The tests were run with this command, with any `--test-threads=<N>` argument added: cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast
Thanks. I take this to mean that you were not saying that you thought it was a test bug, rather than to mean that I am mistaken about whether separate tests perform writes in separate directories. If I am also mistaken about that, please let me know. Separate checkout directoriesMy understanding is that the
Where When running the test many times in the manner described below, in a few of the runs I simultaneously ran
But that raises the question, then, of how the rare test failure could be due to a race condition triggered by test parallelism. What am I missing here? I'm missing something, because... It only happens when tests are run in parallel
The effect you predicted here is basically what happens--repetition with parallelism is able to reproduce the failures, while repetition without parallelism does not produce it--though I did not find it quite as easy to produce as you suggested. I think the failure occurs organically far less often than once every hundred test runs. When it does, I think it must be interacting with a different test case, probably another test case also in However, what does produce it repeatably, on some systems, is to explode the test case into diff --git a/gix-worktree-state/tests/state/checkout.rs b/gix-worktree-state/tests/state/checkout.rs
index 764b52020..50ad0cd2d 100644
--- a/gix-worktree-state/tests/state/checkout.rs
+++ b/gix-worktree-state/tests/state/checkout.rs
@@ -12,6 +12,7 @@ use gix_object::{bstr::ByteSlice, Data};
use gix_testtools::tempfile::TempDir;
use gix_worktree_state::checkout::Collision;
use once_cell::sync::Lazy;
+use test_case::test_matrix;
use crate::fixture_path;
@@ -103,8 +104,8 @@ fn accidental_writes_through_symlinks_are_prevented_if_overwriting_is_forbidden(
}
}
-#[test]
-fn writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed() {
+#[test_matrix(0..=999)]
+fn writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed(_i: i32) {
let mut opts = opts_from_probe();
// with overwrite mode
opts.overwrite_existing = true; At least four identified factors affect the failures. Taking
GNU/Linux detailsOn GNU/Linux, I am completely unable to produce the failure on some systems, even with
And the GNU/Linux systems where I have managed to do it are:
But of those two, it is much easier to achieve on the Arch Linux system than the Ubuntu 24.04 LTS system.
dd if=/dev/zero of=ext4-casefold.img bs=100M count=100 status=progress
mkfs.ext4 -O casefold ext4-casefold.img
mkdir ext4-casefold
sudo mount ext4-casefold.img ext4-casefold
cd ext4-casefold
sudo chown -- "$USER" .
mkdir icase
cd icase
chattr +F .
touch a A
ls -l # Observe that there is only one file.
rm a
gh repo clone gitoxide
cd gitoxide
git switch clash
cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast
cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast --test-threads=2 If for some reason you decide to follow those steps, another remote might need to be added, or my fork cloned explicitly, in order for I will open a draft pull request on the Repeat the last command (in which cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast --test-threads=2 If that does not work with the default What failures look like when the test is duplicatedThe failures look similar on all systems, all with the same assertion message and mismatched values, whenever failures occur. Here's a failure from Arch Linux:
This gist contains a full run of that procedure on Arch Linux, the first time I did it on that system, along with some further details including potentially relevant extracts. (Conveniently, the first run of the last command happened to produce the error, that time.) But for the most part I think the information in this comment and the forthcoming investigational PR may be more useful. As noted above, producing this on macOS is significantly easier than on Windows and GNU/Linux, and I do not have full or sustained access to a macOS system. Therefore, some further testing might be challenging for me to carry out. Other findingsIn addition to the I tried inserting The These tests do couple with the outside repositoryI have not fully ruled out the possibility that interaction with the outer containing repository (i.e. the At first thought, it seems unlikely that this would be the trigger, and also like it could be tested easily by attempting to run the tests in the absence of a top-level But this is actually one of the tests that depends--or at least has depended, and I'm fairly sure still does depend--on the presence of the outer In #1687, I mentioned:
There is some further context there, as well as related discussion in comments. But I didn't look into that much more at the time, possibly since it turned out not to be a big a problem for running the test suite in a container on CI as I had expected it to be. For the test case at issue here, two kinds of failures that occurred when the outside
Edit: The investigational PR is #2008. Edit 2: Fixed position of |
To see if this can sometimes reproduce GitoxideLabs#2006 even outside macOS. The `fail-fast: false` added here can be removed after testing.
Thanks so much for this extremely thorough investigation. To speed up my own processing of it, I ran it through a local Gemma 2.5 for summarisation, and then looked at the ways to reproduce the issue in the original text. AI SummaryThis document details an investigation into a rare, parallel-test-specific failure in the The Problem:
Current Understanding:
Factors Affecting Failure Rate:
GNU/Linux Specifics:
Potential Causes & Investigations:
Key Takeaway: The failure appears to be a subtle race condition triggered by parallel test execution on case-insensitive filesystems, potentially exacerbated by interactions with the outer Something I couldn't determine is if the parallelism was always controlled by the test-suite, or also by the configuration of the invocation itself. That means the status itself can be forced to use only one thread, and there it would be interesting if it still reproduces.
It would probably be useful to allow this to be overridden by an environment variable pointing to the repository to use instead. All it wants is the |
I agree with the generated summary except on two points:
The number of parallel test runs, In contrast, the number of test cases that the original case gitoxide/gix-worktree-state/tests/state/checkout.rs Lines 107 to 108 in 9c42e00
Previously, and on the main branch, #[test] fn writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed_0() { helper(); }
#[test] fn writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed_1() { helper(); }
#[test] fn writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed_2() { helper(); }
... Using
I don't know what you mean by "the status itself." In all of my experiments where I produced failures, they do go away completely when When the failure occurs organically, the test is not duplicated, so it must be interacting with something else, almost certainly something done as a result of another test running at the same time. It seems likely that the other test or tests that can interact to produce this are tests of Since this test failure is extremely rare -- except when deliberately produced as I did in my local testing and in #2008, using a different mechanism from whatever produces it naturally -- I don't view the failure as something that we need to work around. Instead, I'm worried that it may be caused by a bug in the code being tested. Or by "the status" do you mean the separate experiment I did less often where I ran
I'll look into this. I think that we should be able to fix that coupling with the outside However, even if that is done and it makes this failure go away, I worry that whatever concurrency-related oddity is triggering the failure may correspond to something that could happen in real-world use, in situations where we really don't want to inadvertently dereference symlinks and write to their targets. With all that said, I still don't know that interaction with an outside repository has anything at all to do with these failures, nor have I thought of a way that it would cause them. |
I see, thanks for pointing that out. My apologies for the confusion regarding the thread-limit of "the status itself" - I took a shortcut and avoided putting in reference links which I want to correct, besides the fact that this is a checkout, not a status. The worktree checkout call has options that allow to change the amount of threads to use. gitoxide/gix-worktree-state/src/checkout/mod.rs Lines 46 to 49 in 1ca6a3c
The test-case may set this value to a number higher than one. gitoxide/gix-worktree-state/tests/state/checkout.rs Lines 688 to 697 in 6f009d7
This is relying on the
The environment variable idea was more of a quick-hack to make the test. Ideally there was a way to have a test-fixture that is created once per run of the test-suite, and not once per test. All that would be needed is a directory with a
With the |
To investigate GitoxideLabs#2006, this explodes the occasionally failing test `writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed` into 1000 identical tests, using `test_case::test_matrix`. When run in series, they all pass, but when run in parallel with `cargo nextest`, some failures always occur, at least when this is done on macOS. This is as predicted in: GitoxideLabs#2006 (comment) A more specific result in local testing on macOS 15 is that, while the number of failures is 0 with `--test-threads=1`, with 2 or more threads, the number of failures tends to *decrease* with the number of threads. The most failures occur with `--test-threads=2`, about 30 on the system tested, while `--test-threads=16` has about 10 failures. The tests were run with this command, with any `--test-threads=<N>` argument added: cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast
To see if this can sometimes reproduce GitoxideLabs#2006 even outside macOS. The `fail-fast: false` added here can be removed after testing.
No problem! The checkout vs. status distinction was what had confused me.
As a quick (non-)update: I'm definitely going to try this out, I just haven't gotten to it yet, in part because I've been looking for a reasonably efficient (i.e. doesn't take hours per run) way to produce the failure on GNU/Linux on CI as well. That way, anything that works around or fixes it can be validated more decisively. But if I don't manage to it then that's okay and I'll proceed without it.
I'm curious: would you expect setting a I think they should have the same effect and thus that it may be sufficient for me to test only one of them, because even if But I'm not entirely sure why any of this would have an effect, since even with multithreaded checkout, the test never fails if all runs of the test are in series (even as each run is still allowed to parallelize the checkout). In any case, I shall certainly test it. |
Actually that should be the same. However, without the In general I'd assume that if
I think the scenario that I thought was untested is if |
I don't think you missed anything--I have not yet tested that. (I was just curious if, when I test that, the effect of disabling |
Wonderful, then I am extra-curious how this will turn out! |
To investigate GitoxideLabs#2006, this explodes the occasionally failing test `writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed` into 1000 identical tests, using `test_case::test_matrix`. When run in series, they all pass, but when run in parallel with `cargo nextest`, some failures always occur, at least when this is done on macOS. This is as predicted in: GitoxideLabs#2006 (comment) A more specific result in local testing on macOS 15 is that, while the number of failures is 0 with `--test-threads=1`, with 2 or more threads, the number of failures tends to *decrease* with the number of threads. The most failures occur with `--test-threads=2`, about 30 on the system tested, while `--test-threads=16` has about 10 failures. The tests were run with this command, with any `--test-threads=<N>` argument added: cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast
To see if this can sometimes reproduce GitoxideLabs#2006 even outside macOS. The `fail-fast: false` added here can be removed after testing.
- Test `--test-threads` operands of 1 to 4, rather than 1 to 7. - Do 7 reps instead of 16. In addition to the 1000x repetition in `checkout.rs` of the writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed test case, which in practice always surfaces GitoxideLabs#2006 on CI on macOS and Windows, the `test-ext4-casefold` job as it currenly stands seems sufficient to test whether a possible future solution would fix the bug on GNU/Linux. This looks like about the minmum amount we should reasonably verify still fails (in some jobs) immediately before a GitoxideLabs#2006 fix, while no longer failing *any* jobs afterwards. But the `test-ext4-casefold` jobs are still much more than would be wanted on the main branch. Assuming the cause of the failures is the same on all systems, and the solution neither explicitly nor implicitly operates differently across systems (when comparing how it works when the filesystem is case-insensitive, that is), it may be enough on CI to regression test only macOS and Windows for GitoxideLabs#2006. Thus, this also adds a TODO comment atop the `test-ext4-casefold` definition reminding that it should not be kept, or not in full.
- Test `--test-threads` operands of 1 to 4, rather than 1 to 7. - Do 7 reps instead of 16. In addition to the 1000x repetition in `checkout.rs` of the writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed test case -- which almost always surfaces GitoxideLabs#2006 on CI on macOS and Windows without further modifications -- the `test-ext4-casefold` jobs as they currenly stand seem like they would ber sufficient to test whether a possible future solution would fix the bug on GNU/Linux. This looks like about the minmum amount we should reasonably verify still fails (in some jobs) immediately before a GitoxideLabs#2006 fix, while no longer failing *any* jobs afterwards. But the `test-ext4-casefold` jobs are still much more than would be wanted on the main branch. Assuming the cause of the failures is the same on all systems, and the solution neither explicitly nor implicitly operates differently across systems (when comparing how it works when the filesystem is case-insensitive, that is), it may be enough on CI to regression test only macOS and Windows for GitoxideLabs#2006. Thus, this also adds a TODO comment atop the `test-ext4-casefold` definition reminding that it should not be kept, or not in full.
This (from #36) adds a `test-ext4-casefold` matrix job definition to `ci.yml`, which is intended probably to be temporary, to demonstrate for GitoxideLabs#2008 that GitoxideLabs#2006 affects GNU/Linux (in addition to macOS and Windows), when the GNU/Linux system is using an ext4 volume with a case-insensitive directory tree, i.e., where the ext4 filesystem is created with `-O casefold` and operations are performed in a directory within that filesystem where `chattr +F` has been used to enable case-folding path equivalence. The bug is nondeterministic and challenging to produce on GNU/Linux, even when the case-folding precondition is satisfied (as all the `test-ext4-casefold` jobs achieve). It is much harder to produce on GNU/Linux than on macOS or Windows. Accordingly, the number of duplicate test cases is increased 25-fold from 1000 to 25000 in those tests, and multiple jobs are created, some of them equivalent. As expected, the failures do not occur when the runner does not run tests in parallel, and they do occur most of the time when the runner runs tests in parallel with a maximum number of parallel tests set to 2 or to 4. However, unexpectedly, the failures almost never occur when the test are run in parallel with a maximum number of parallel tests set to 3. It is unclear to what extent, if any, this generalizes, but I have not observed these patterns when testing locally, including when modifying the local procedure to be more similar to the behavior here (such as by suppressing reporting of non-failing tests and suppressing the progress bar). In local testing, I am unable to produce failures on some GNU/Linux systems, but on those on which I am able to produce them, they are easier to produce with all values of the `--test-threads` operand, including 3. Not all operating systems, versions, and kernel builds and options support case-folding on ext4. But support is fairly common on Linux-based systems, and the CI runners have no problem with that. In contrast, although case-folding tmpfs also exists on Linux, it is newer and not (yet) supported as widely, and the GitHub-hosted runners do not support case-folding tmpfs. It is possible to mount a tmpfs filesystem and create an ext4 image in it so that, in principle, fewer operations need to access the disk. That is one of the approaches that was tried, but it does not appear to make it any easier to surface these failures. This squashes multiple commits. (Mistakes, and less illuminating experiments, have already been omitted; the commits being squashes here are those that are expected to be of possible interest, but whose details are almost but not quite important enough to justify the cumbersome effect of having them individually in the history.) The squashed commits' individual messages are shown below. For full changes and CI results from each commit, see: #36 --- * Add a case-folding GNU/Linux CI test job This can likely be removed later. But it might be kept if it finds anything interesting, especially if it helps test a bugfix. In that case, it may make sense to have it run only tests that are of special interest in connection with case-sensitivity. * Case-fold `$TMPDIR` as well in CI casefold test * Run only 2 parallel test processes in test-ext4-casefold This seemed to make the bug *easier* to reproduce (compared to larger values) locally, so maybe it will surface it on CI even in GNU/Linux. * Increase symlink experiment reps 10x in test-ext4-casefold From 1000 (0..=999) to 10000 (0..=9999). * Try even harder to reproduce the failure on GNU/Linux By repeating `cargo nextext` on the affected test group, up to 20 times. * Add `ubuntu-24.04-arm` to `test-ext4-casefold` job In case the failure might somehow be easier to reproduce there. * Use tmpfs casefold instead of ext4 casefold * Use tmpfs-backed ext4 as a workaround For when tmpfs does not suppport `-o casefold` in mounting. This does require that ext4 support `-O casefold`, but that was already working before. Even though that worked before, this could fail when using an ext4 image on tmpfs, because the ext4 image is smaller than it was when tmpfs was not used, in order to allow it to be created in memory. While this allows it to be created, the tests may fail if build artifacts and other files need more space than is available. * Run only the test(s) of interest In `test-linux-casefold`, this runs only the tests that are duplicates of each other, where the failure has been observed, rather than first running the whole test suite. The reason is to try to build fewer crates and test executables, since this is currenntly failing because it runs out of space on the ext4 image whose size is itself constrained by the amount of available memory for the tmpfs filesystem on which the image is created. * Don't attempt to use case-folding tmpfs But check if it is reported as available and, if so, issue a GHA warning that it should be used instead of the current way. * Try to fail at least as often, with less overhead This combines a few changes: - Only report the status of tests that actually fail. - Run the `cargo nextest` run command once, not 20 times. - Multiply the number of test cases by another factor of 10. - Stop after the first failing case. * Move the ext4 image off tmpfs This tries going back to not using tmpfs for I/O speedup, with the idea that the increased complexity and decreased flexibility might be possible to avoid now that other adjustments are made to surface the failure more reliably. * Try to fail more often, with less delay The `test-ext4-casefold` jobs were still not surfacing the error quite reliably enough. The wait was also longer than ideal. This commit makes some changes that hope to improve on both. - Decrease the number of duplicate test cases by a factor of 4. This has the effect that they are increased by 25x compared to the multiplier in the commited code, instead of 100x. - Create 10 times as many CI jobs in the matrix, by adding a `num` variable whose values are those in the range 0..=9. This variable is not actually used, it just serves to produce more CI jobs. - Don't use `Swatinem/rust-cache` in these jobs anymore. It is not obvious that it will make things work better overall, in terms of speedup, cache usage, and cache access, now that we are increasing the number of matrix jobs by a factor of 10. Moving some of the repetition into separate jobs, which may run in parallel, is not only, nor even primarily, to leverage runner parallelism to do more test runs faster. Instead, it is because it looks like some chaotic initial state may contribute to how likely the failure is to occur. No definitive experiment on CI shows this to be the case -- but in local testing, on some systems, I often have runs that fail early over and over again, and then wait a while, come back, and have many runs that don't surface the bug, then come back later and it is easy to produce again, and so on. * Further diversify runners * Undiversify runners but vary test parallelism Since varying `runs-on` didn't surface more errors, nor help smooth out inconsistencies across runs. The `*-arm` runners seem to have fewer errors with the particular way the tests are running now, so this just uses `ubuntu-latest`. This also adjusts `rep`, having it take on more values, so as to more than make up for what would otherwise be a smaller number of jobs. * Test more values for test parallelism This creates additional CI `test-ext4-casefold` jobs, for more values of the `--test-threads` operand passed to `cargo nextest`. The motivation is that the value of 3 curiously seems never to surface the failure on CI. Unlike 1, 3 should is expected to surface the failure comparably to 2 and 4, both of which do fail more often than not. Yet the jobs with `--test-threads=3` keep passing. (This is unlikely to be due to case-sensitivity misdetection, since the assertions for when the filesystem is case-sensitive would also fail, if they fire in a job where the filesystem is case-folding.) If these jobs are kept, then `rep` should probably be adjusted to have fewer, perhaps 7, values. However, this adjusts `rep` to have 16 values (i.e. twice as many as before), since the higher operands to `--test-threads` are expected to produce the failure less often. * Test fewer test parallelism values and reps - Test `--test-threads` operands of 1 to 4, rather than 1 to 7. - Do 7 reps instead of 16. In addition to the 1000x repetition in `checkout.rs` of the writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed test case -- which almost always surfaces GitoxideLabs#2006 on CI on macOS and Windows without further modifications -- the `test-ext4-casefold` jobs as they currenly stand seem like they would ber sufficient to test whether a possible future solution would fix the bug on GNU/Linux. This looks like about the minmum amount we should reasonably verify still fails (in some jobs) immediately before a GitoxideLabs#2006 fix, while no longer failing *any* jobs afterwards. But the `test-ext4-casefold` jobs are still much more than would be wanted on the main branch. Assuming the cause of the failures is the same on all systems, and the solution neither explicitly nor implicitly operates differently across systems (when comparing how it works when the filesystem is case-insensitive, that is), it may be enough on CI to regression test only macOS and Windows for GitoxideLabs#2006. Thus, this also adds a TODO comment atop the `test-ext4-casefold` definition reminding that it should not be kept, or not in full.
This attempts to turn off parallelism within individual runs of writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed by passing `--no-default-features` to turn off the `gix-features-parallel` feature in `gix-worktree-state/Cargo.toml` (which is how `gix-features/parallel` is enabled in those tests). This is to check if the multithreading parallelism within the checkout operations participates in GitoxideLabs#2006. See: GitoxideLabs#2006 (comment)
This moves setting `thread_limit: Some(1)` out of being a step done with `sed` in the `test-ext4-casefold` jobs, and into being an actual change to the source code of the `checkout.rs` test module. This change is intended to be temporary. The goal is the same as before, but to observe the effect outside `text-ext4-casefold`. That is, the goal here is to temporarily see if there is a change in results in the other jobs where failure almost always occurs due to GitoxideLabs#2006, i.e., the `test-fast` jobs on `macos-latest` and `windows-latest`, and the `test-fixtures-windows` job. Note that, as of this change, the `gix-features/parallel` feature is only turned off in the `test-ext4-casefold` jobs. In others, that feature is still turned on, just a parallelism of 1, i.e., a single thread, is used for the checkout. But a parallelism of 1 seems usually to be special-cased to not use facilities related to multithreading. (This `thread_limit` argument change, as well as the immediately preceding change of disabling the `parallel` feature flag in the `test-ext4-casefold` test jobs, are entirely separate from how many writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed duplicated test cases there are, and also entirely separate from the operand to `--test-threads`, which actually controls the number of test *processes* `cargo nextest` creates to run the tests.)
Use single-threaded checkout in tests This change (from #37), which is intended to be temporary, sets `thread_limit: Some(1)` in the `opts_from_probe()` test helper function in the `checkout.rs` test module, so that tests including `writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed` will use single-threaded checkout. This applies no matter how the tests are run. In addition, for the `test-ext4-casefold` CI jobs (but not others), this adds the `--no-default-features` option to the `cargo nextest` command, so that the `gix-features-parallel` feature defined in `gix-worktree-state/Cargo.toml`, which when enabled causes `gix-features/parallel` to be enabled for the tests, is not enabled. The purpose of these changes it to examine if, when checkout operations are themselves performed without multithreading, there is any change to GitoxideLabs#2006. This does not make the failures go away. They still occur on all systems on which they had occurred before. It is unclear if it makes any difference to the failures. It seems like they may be slightly less likely to occur on GNU/Linux, but experiments so far are insufficient to confirm this. Furthermore, even if there is an effect, it may be that the effect is more subtle (such as possibly shifting the number of parallel tests where failures peak, from 2 to some higher number). The changes here should not be confused with the duplication of the test case, nor with the argument to `--test-threads`, which actually specifies the number of parallel test *processes*, since `cargo nextest` is being used. This squashes a few commits trying some minor variations. The original messages are shown below. For full changes and CI results from each commit, see: #37 --- * Try to avoid parallel checkout in `test-ext4-casefold` This attempts to turn off parallelism within individual runs of writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed by passing `--no-default-features` to turn off the `gix-features-parallel` feature in `gix-worktree-state/Cargo.toml` (which is how `gix-features/parallel` is enabled in those tests). This is to check if the multithreading parallelism within the checkout operations participates in GitoxideLabs#2006. See: GitoxideLabs#2006 (comment) * Patch the `thread_limit` argument to `Some(1)` too In the `text-ext4-casefold` jobs. * Modify the test code to use single-threaded checkout This moves setting `thread_limit: Some(1)` out of being a step done with `sed` in the `test-ext4-casefold` jobs, and into being an actual change to the source code of the `checkout.rs` test module. This change is intended to be temporary. The goal is the same as before, but to observe the effect outside `text-ext4-casefold`. That is, the goal here is to temporarily see if there is a change in results in the other jobs where failure almost always occurs due to GitoxideLabs#2006, i.e., the `test-fast` jobs on `macos-latest` and `windows-latest`, and the `test-fixtures-windows` job. Note that, as of this change, the `gix-features/parallel` feature is only turned off in the `test-ext4-casefold` jobs. In others, that feature is still turned on, just a parallelism of 1, i.e., a single thread, is used for the checkout. But a parallelism of 1 seems usually to be special-cased to not use facilities related to multithreading. (This `thread_limit` argument change, as well as the immediately preceding change of disabling the `parallel` feature flag in the `test-ext4-casefold` test jobs, are entirely separate from how many writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed duplicated test cases there are, and also entirely separate from the operand to `--test-threads`, which actually controls the number of test *processes* `cargo nextest` creates to run the tests.)
Per EliahKagan#27 and #2008 (comment), I believe I've made the checkout operation performed in the test single-threaded. The failure still occurs. Whether or not the checkout itself is allowed to parallelize with multiple threads, the failure:
This is what I would have expected--since the failure was observed to be triggered by running multiple tests at once and not when running only one test, I was never clear on why the failures would be expected to go away when the checkout operation performed within a test is forbidden from using its own parallelism. You may want to take a look at the change, in case I was mistaken to think that this change to gitoxide/gix-worktree-state/tests/state/checkout.rs Lines 693 to 698 in b333ef3
|
Thanks for the summary, setting the thread_limit to 1 should do the trick.
The source of the test is read-only, so whatever happens must be on the writer side. And the writer will write files in the correct order, but it will also only create new files, so case-folding filesystems should report an error if the file already exists. It's a bit unexpected that the test needs collisions to be empty, I would have thought that Maybe this another avenue - trying to understand what's really happening when 'it works', to be able to make additional assertions and maybe get closer to cracking this nut that way. |
To investigate GitoxideLabs#2006, this explodes the occasionally failing test `writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed` into 1000 identical tests, using `test_case::test_matrix`. When run in series, they all pass, but when run in parallel with `cargo nextest`, some failures always occur, at least when this is done on macOS. This is as predicted in: GitoxideLabs#2006 (comment) A more specific result in local testing on macOS 15 is that, while the number of failures is 0 with `--test-threads=1`, with 2 or more threads, the number of failures tends to *decrease* with the number of threads. The most failures occur with `--test-threads=2`, about 30 on the system tested, while `--test-threads=16` has about 10 failures. The tests were run with this command, with any `--test-threads=<N>` argument added: cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast
To see if this can sometimes reproduce GitoxideLabs#2006 even outside macOS. The `fail-fast: false` added here can be removed after testing.
This (from #36) adds a `test-ext4-casefold` matrix job definition to `ci.yml`, which is intended probably to be temporary, to demonstrate for GitoxideLabs#2008 that GitoxideLabs#2006 affects GNU/Linux (in addition to macOS and Windows), when the GNU/Linux system is using an ext4 volume with a case-insensitive directory tree, i.e., where the ext4 filesystem is created with `-O casefold` and operations are performed in a directory within that filesystem where `chattr +F` has been used to enable case-folding path equivalence. The bug is nondeterministic and challenging to produce on GNU/Linux, even when the case-folding precondition is satisfied (as all the `test-ext4-casefold` jobs achieve). It is much harder to produce on GNU/Linux than on macOS or Windows. Accordingly, the number of duplicate test cases is increased 25-fold from 1000 to 25000 in those tests, and multiple jobs are created, some of them equivalent. As expected, the failures do not occur when the runner does not run tests in parallel, and they do occur most of the time when the runner runs tests in parallel with a maximum number of parallel tests set to 2 or to 4. However, unexpectedly, the failures almost never occur when the test are run in parallel with a maximum number of parallel tests set to 3. It is unclear to what extent, if any, this generalizes, but I have not observed these patterns when testing locally, including when modifying the local procedure to be more similar to the behavior here (such as by suppressing reporting of non-failing tests and suppressing the progress bar). In local testing, I am unable to produce failures on some GNU/Linux systems, but on those on which I am able to produce them, they are easier to produce with all values of the `--test-threads` operand, including 3. Not all operating systems, versions, and kernel builds and options support case-folding on ext4. But support is fairly common on Linux-based systems, and the CI runners have no problem with that. In contrast, although case-folding tmpfs also exists on Linux, it is newer and not (yet) supported as widely, and the GitHub-hosted runners do not support case-folding tmpfs. It is possible to mount a tmpfs filesystem and create an ext4 image in it so that, in principle, fewer operations need to access the disk. That is one of the approaches that was tried, but it does not appear to make it any easier to surface these failures. This squashes multiple commits. (Mistakes, and less illuminating experiments, have already been omitted; the commits being squashes here are those that are expected to be of possible interest, but whose details are almost but not quite important enough to justify the cumbersome effect of having them individually in the history.) The squashed commits' individual messages are shown below. For full changes and CI results from each commit, see: #36 --- * Add a case-folding GNU/Linux CI test job This can likely be removed later. But it might be kept if it finds anything interesting, especially if it helps test a bugfix. In that case, it may make sense to have it run only tests that are of special interest in connection with case-sensitivity. * Case-fold `$TMPDIR` as well in CI casefold test * Run only 2 parallel test processes in test-ext4-casefold This seemed to make the bug *easier* to reproduce (compared to larger values) locally, so maybe it will surface it on CI even in GNU/Linux. * Increase symlink experiment reps 10x in test-ext4-casefold From 1000 (0..=999) to 10000 (0..=9999). * Try even harder to reproduce the failure on GNU/Linux By repeating `cargo nextext` on the affected test group, up to 20 times. * Add `ubuntu-24.04-arm` to `test-ext4-casefold` job In case the failure might somehow be easier to reproduce there. * Use tmpfs casefold instead of ext4 casefold * Use tmpfs-backed ext4 as a workaround For when tmpfs does not suppport `-o casefold` in mounting. This does require that ext4 support `-O casefold`, but that was already working before. Even though that worked before, this could fail when using an ext4 image on tmpfs, because the ext4 image is smaller than it was when tmpfs was not used, in order to allow it to be created in memory. While this allows it to be created, the tests may fail if build artifacts and other files need more space than is available. * Run only the test(s) of interest In `test-linux-casefold`, this runs only the tests that are duplicates of each other, where the failure has been observed, rather than first running the whole test suite. The reason is to try to build fewer crates and test executables, since this is currenntly failing because it runs out of space on the ext4 image whose size is itself constrained by the amount of available memory for the tmpfs filesystem on which the image is created. * Don't attempt to use case-folding tmpfs But check if it is reported as available and, if so, issue a GHA warning that it should be used instead of the current way. * Try to fail at least as often, with less overhead This combines a few changes: - Only report the status of tests that actually fail. - Run the `cargo nextest` run command once, not 20 times. - Multiply the number of test cases by another factor of 10. - Stop after the first failing case. * Move the ext4 image off tmpfs This tries going back to not using tmpfs for I/O speedup, with the idea that the increased complexity and decreased flexibility might be possible to avoid now that other adjustments are made to surface the failure more reliably. * Try to fail more often, with less delay The `test-ext4-casefold` jobs were still not surfacing the error quite reliably enough. The wait was also longer than ideal. This commit makes some changes that hope to improve on both. - Decrease the number of duplicate test cases by a factor of 4. This has the effect that they are increased by 25x compared to the multiplier in the commited code, instead of 100x. - Create 10 times as many CI jobs in the matrix, by adding a `num` variable whose values are those in the range 0..=9. This variable is not actually used, it just serves to produce more CI jobs. - Don't use `Swatinem/rust-cache` in these jobs anymore. It is not obvious that it will make things work better overall, in terms of speedup, cache usage, and cache access, now that we are increasing the number of matrix jobs by a factor of 10. Moving some of the repetition into separate jobs, which may run in parallel, is not only, nor even primarily, to leverage runner parallelism to do more test runs faster. Instead, it is because it looks like some chaotic initial state may contribute to how likely the failure is to occur. No definitive experiment on CI shows this to be the case -- but in local testing, on some systems, I often have runs that fail early over and over again, and then wait a while, come back, and have many runs that don't surface the bug, then come back later and it is easy to produce again, and so on. * Further diversify runners * Undiversify runners but vary test parallelism Since varying `runs-on` didn't surface more errors, nor help smooth out inconsistencies across runs. The `*-arm` runners seem to have fewer errors with the particular way the tests are running now, so this just uses `ubuntu-latest`. This also adjusts `rep`, having it take on more values, so as to more than make up for what would otherwise be a smaller number of jobs. * Test more values for test parallelism This creates additional CI `test-ext4-casefold` jobs, for more values of the `--test-threads` operand passed to `cargo nextest`. The motivation is that the value of 3 curiously seems never to surface the failure on CI. Unlike 1, 3 should is expected to surface the failure comparably to 2 and 4, both of which do fail more often than not. Yet the jobs with `--test-threads=3` keep passing. (This is unlikely to be due to case-sensitivity misdetection, since the assertions for when the filesystem is case-sensitive would also fail, if they fire in a job where the filesystem is case-folding.) If these jobs are kept, then `rep` should probably be adjusted to have fewer, perhaps 7, values. However, this adjusts `rep` to have 16 values (i.e. twice as many as before), since the higher operands to `--test-threads` are expected to produce the failure less often. * Test fewer test parallelism values and reps - Test `--test-threads` operands of 1 to 4, rather than 1 to 7. - Do 7 reps instead of 16. In addition to the 1000x repetition in `checkout.rs` of the writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed test case -- which almost always surfaces GitoxideLabs#2006 on CI on macOS and Windows without further modifications -- the `test-ext4-casefold` jobs as they currenly stand seem like they would ber sufficient to test whether a possible future solution would fix the bug on GNU/Linux. This looks like about the minmum amount we should reasonably verify still fails (in some jobs) immediately before a GitoxideLabs#2006 fix, while no longer failing *any* jobs afterwards. But the `test-ext4-casefold` jobs are still much more than would be wanted on the main branch. Assuming the cause of the failures is the same on all systems, and the solution neither explicitly nor implicitly operates differently across systems (when comparing how it works when the filesystem is case-insensitive, that is), it may be enough on CI to regression test only macOS and Windows for GitoxideLabs#2006. Thus, this also adds a TODO comment atop the `test-ext4-casefold` definition reminding that it should not be kept, or not in full.
This change (from #37), which is intended to be temporary, sets `thread_limit: Some(1)` in the `opts_from_probe()` test helper function in the `checkout.rs` test module, so that tests including `writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed` will use single-threaded checkout. This applies no matter how the tests are run. In addition, for the `test-ext4-casefold` CI jobs (but not others), this adds the `--no-default-features` option to the `cargo nextest` command, so that the `gix-features-parallel` feature defined in `gix-worktree-state/Cargo.toml`, which when enabled causes `gix-features/parallel` to be enabled for the tests, is not enabled. The purpose of these changes it to examine if, when checkout operations are themselves performed without multithreading, there is any change to GitoxideLabs#2006. This does not make the failures go away. They still occur on all systems on which they had occurred before. It is unclear if it makes any difference to the failures. It seems like they may be slightly less likely to occur on GNU/Linux, but experiments so far are insufficient to confirm this. Furthermore, even if there is an effect, it may be that the effect is more subtle (such as possibly shifting the number of parallel tests where failures peak, from 2 to some higher number). The changes here should not be confused with the duplication of the test case, nor with the argument to `--test-threads`, which actually specifies the number of parallel test *processes*, since `cargo nextest` is being used. This squashes a few commits trying some minor variations. The original messages are shown below. For full changes and CI results from each commit, see: #37 --- * Try to avoid parallel checkout in `test-ext4-casefold` This attempts to turn off parallelism within individual runs of writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed by passing `--no-default-features` to turn off the `gix-features-parallel` feature in `gix-worktree-state/Cargo.toml` (which is how `gix-features/parallel` is enabled in those tests). This is to check if the multithreading parallelism within the checkout operations participates in GitoxideLabs#2006. See: GitoxideLabs#2006 (comment) * Patch the `thread_limit` argument to `Some(1)` too In the `text-ext4-casefold` jobs. * Modify the test code to use single-threaded checkout This moves setting `thread_limit: Some(1)` out of being a step done with `sed` in the `test-ext4-casefold` jobs, and into being an actual change to the source code of the `checkout.rs` test module. This change is intended to be temporary. The goal is the same as before, but to observe the effect outside `text-ext4-casefold`. That is, the goal here is to temporarily see if there is a change in results in the other jobs where failure almost always occurs due to GitoxideLabs#2006, i.e., the `test-fast` jobs on `macos-latest` and `windows-latest`, and the `test-fixtures-windows` job. Note that, as of this change, the `gix-features/parallel` feature is only turned off in the `test-ext4-casefold` jobs. In others, that feature is still turned on, just a parallelism of 1, i.e., a single thread, is used for the checkout. But a parallelism of 1 seems usually to be special-cased to not use facilities related to multithreading. (This `thread_limit` argument change, as well as the immediately preceding change of disabling the `parallel` feature flag in the `test-ext4-casefold` test jobs, are entirely separate from how many writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed duplicated test cases there are, and also entirely separate from the operand to `--test-threads`, which actually controls the number of test *processes* `cargo nextest` creates to run the tests.)
To investigate GitoxideLabs#2006, this explodes the occasionally failing test `writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed` into 1000 identical tests, using `test_case::test_matrix`. When run in series, they all pass, but when run in parallel with `cargo nextest`, some failures always occur, at least when this is done on macOS. This is as predicted in: GitoxideLabs#2006 (comment) A more specific result in local testing on macOS 15 is that, while the number of failures is 0 with `--test-threads=1`, with 2 or more threads, the number of failures tends to *decrease* with the number of threads. The most failures occur with `--test-threads=2`, about 30 on the system tested, while `--test-threads=16` has about 10 failures. The tests were run with this command, with any `--test-threads=<N>` argument added: cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast
To see if this can sometimes reproduce GitoxideLabs#2006 even outside macOS. The `fail-fast: false` added here can be removed after testing.
This (from #36) adds a `test-ext4-casefold` matrix job definition to `ci.yml`, which is intended probably to be temporary, to demonstrate for GitoxideLabs#2008 that GitoxideLabs#2006 affects GNU/Linux (in addition to macOS and Windows), when the GNU/Linux system is using an ext4 volume with a case-insensitive directory tree, i.e., where the ext4 filesystem is created with `-O casefold` and operations are performed in a directory within that filesystem where `chattr +F` has been used to enable case-folding path equivalence. The bug is nondeterministic and challenging to produce on GNU/Linux, even when the case-folding precondition is satisfied (as all the `test-ext4-casefold` jobs achieve). It is much harder to produce on GNU/Linux than on macOS or Windows. Accordingly, the number of duplicate test cases is increased 25-fold from 1000 to 25000 in those tests, and multiple jobs are created, some of them equivalent. As expected, the failures do not occur when the runner does not run tests in parallel, and they do occur most of the time when the runner runs tests in parallel with a maximum number of parallel tests set to 2 or to 4. However, unexpectedly, the failures almost never occur when the test are run in parallel with a maximum number of parallel tests set to 3. It is unclear to what extent, if any, this generalizes, but I have not observed these patterns when testing locally, including when modifying the local procedure to be more similar to the behavior here (such as by suppressing reporting of non-failing tests and suppressing the progress bar). In local testing, I am unable to produce failures on some GNU/Linux systems, but on those on which I am able to produce them, they are easier to produce with all values of the `--test-threads` operand, including 3. Not all operating systems, versions, and kernel builds and options support case-folding on ext4. But support is fairly common on Linux-based systems, and the CI runners have no problem with that. In contrast, although case-folding tmpfs also exists on Linux, it is newer and not (yet) supported as widely, and the GitHub-hosted runners do not support case-folding tmpfs. It is possible to mount a tmpfs filesystem and create an ext4 image in it so that, in principle, fewer operations need to access the disk. That is one of the approaches that was tried, but it does not appear to make it any easier to surface these failures. This squashes multiple commits. (Mistakes, and less illuminating experiments, have already been omitted; the commits being squashes here are those that are expected to be of possible interest, but whose details are almost but not quite important enough to justify the cumbersome effect of having them individually in the history.) The squashed commits' individual messages are shown below. For full changes and CI results from each commit, see: #36 --- * Add a case-folding GNU/Linux CI test job This can likely be removed later. But it might be kept if it finds anything interesting, especially if it helps test a bugfix. In that case, it may make sense to have it run only tests that are of special interest in connection with case-sensitivity. * Case-fold `$TMPDIR` as well in CI casefold test * Run only 2 parallel test processes in test-ext4-casefold This seemed to make the bug *easier* to reproduce (compared to larger values) locally, so maybe it will surface it on CI even in GNU/Linux. * Increase symlink experiment reps 10x in test-ext4-casefold From 1000 (0..=999) to 10000 (0..=9999). * Try even harder to reproduce the failure on GNU/Linux By repeating `cargo nextext` on the affected test group, up to 20 times. * Add `ubuntu-24.04-arm` to `test-ext4-casefold` job In case the failure might somehow be easier to reproduce there. * Use tmpfs casefold instead of ext4 casefold * Use tmpfs-backed ext4 as a workaround For when tmpfs does not suppport `-o casefold` in mounting. This does require that ext4 support `-O casefold`, but that was already working before. Even though that worked before, this could fail when using an ext4 image on tmpfs, because the ext4 image is smaller than it was when tmpfs was not used, in order to allow it to be created in memory. While this allows it to be created, the tests may fail if build artifacts and other files need more space than is available. * Run only the test(s) of interest In `test-linux-casefold`, this runs only the tests that are duplicates of each other, where the failure has been observed, rather than first running the whole test suite. The reason is to try to build fewer crates and test executables, since this is currenntly failing because it runs out of space on the ext4 image whose size is itself constrained by the amount of available memory for the tmpfs filesystem on which the image is created. * Don't attempt to use case-folding tmpfs But check if it is reported as available and, if so, issue a GHA warning that it should be used instead of the current way. * Try to fail at least as often, with less overhead This combines a few changes: - Only report the status of tests that actually fail. - Run the `cargo nextest` run command once, not 20 times. - Multiply the number of test cases by another factor of 10. - Stop after the first failing case. * Move the ext4 image off tmpfs This tries going back to not using tmpfs for I/O speedup, with the idea that the increased complexity and decreased flexibility might be possible to avoid now that other adjustments are made to surface the failure more reliably. * Try to fail more often, with less delay The `test-ext4-casefold` jobs were still not surfacing the error quite reliably enough. The wait was also longer than ideal. This commit makes some changes that hope to improve on both. - Decrease the number of duplicate test cases by a factor of 4. This has the effect that they are increased by 25x compared to the multiplier in the commited code, instead of 100x. - Create 10 times as many CI jobs in the matrix, by adding a `num` variable whose values are those in the range 0..=9. This variable is not actually used, it just serves to produce more CI jobs. - Don't use `Swatinem/rust-cache` in these jobs anymore. It is not obvious that it will make things work better overall, in terms of speedup, cache usage, and cache access, now that we are increasing the number of matrix jobs by a factor of 10. Moving some of the repetition into separate jobs, which may run in parallel, is not only, nor even primarily, to leverage runner parallelism to do more test runs faster. Instead, it is because it looks like some chaotic initial state may contribute to how likely the failure is to occur. No definitive experiment on CI shows this to be the case -- but in local testing, on some systems, I often have runs that fail early over and over again, and then wait a while, come back, and have many runs that don't surface the bug, then come back later and it is easy to produce again, and so on. * Further diversify runners * Undiversify runners but vary test parallelism Since varying `runs-on` didn't surface more errors, nor help smooth out inconsistencies across runs. The `*-arm` runners seem to have fewer errors with the particular way the tests are running now, so this just uses `ubuntu-latest`. This also adjusts `rep`, having it take on more values, so as to more than make up for what would otherwise be a smaller number of jobs. * Test more values for test parallelism This creates additional CI `test-ext4-casefold` jobs, for more values of the `--test-threads` operand passed to `cargo nextest`. The motivation is that the value of 3 curiously seems never to surface the failure on CI. Unlike 1, 3 should is expected to surface the failure comparably to 2 and 4, both of which do fail more often than not. Yet the jobs with `--test-threads=3` keep passing. (This is unlikely to be due to case-sensitivity misdetection, since the assertions for when the filesystem is case-sensitive would also fail, if they fire in a job where the filesystem is case-folding.) If these jobs are kept, then `rep` should probably be adjusted to have fewer, perhaps 7, values. However, this adjusts `rep` to have 16 values (i.e. twice as many as before), since the higher operands to `--test-threads` are expected to produce the failure less often. * Test fewer test parallelism values and reps - Test `--test-threads` operands of 1 to 4, rather than 1 to 7. - Do 7 reps instead of 16. In addition to the 1000x repetition in `checkout.rs` of the writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed test case -- which almost always surfaces GitoxideLabs#2006 on CI on macOS and Windows without further modifications -- the `test-ext4-casefold` jobs as they currenly stand seem like they would ber sufficient to test whether a possible future solution would fix the bug on GNU/Linux. This looks like about the minmum amount we should reasonably verify still fails (in some jobs) immediately before a GitoxideLabs#2006 fix, while no longer failing *any* jobs afterwards. But the `test-ext4-casefold` jobs are still much more than would be wanted on the main branch. Assuming the cause of the failures is the same on all systems, and the solution neither explicitly nor implicitly operates differently across systems (when comparing how it works when the filesystem is case-insensitive, that is), it may be enough on CI to regression test only macOS and Windows for GitoxideLabs#2006. Thus, this also adds a TODO comment atop the `test-ext4-casefold` definition reminding that it should not be kept, or not in full.
This change (from #37), which is intended to be temporary, sets `thread_limit: Some(1)` in the `opts_from_probe()` test helper function in the `checkout.rs` test module, so that tests including `writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed` will use single-threaded checkout. This applies no matter how the tests are run. In addition, for the `test-ext4-casefold` CI jobs (but not others), this adds the `--no-default-features` option to the `cargo nextest` command, so that the `gix-features-parallel` feature defined in `gix-worktree-state/Cargo.toml`, which when enabled causes `gix-features/parallel` to be enabled for the tests, is not enabled. The purpose of these changes it to examine if, when checkout operations are themselves performed without multithreading, there is any change to GitoxideLabs#2006. This does not make the failures go away. They still occur on all systems on which they had occurred before. It is unclear if it makes any difference to the failures. It seems like they may be slightly less likely to occur on GNU/Linux, but experiments so far are insufficient to confirm this. Furthermore, even if there is an effect, it may be that the effect is more subtle (such as possibly shifting the number of parallel tests where failures peak, from 2 to some higher number). The changes here should not be confused with the duplication of the test case, nor with the argument to `--test-threads`, which actually specifies the number of parallel test *processes*, since `cargo nextest` is being used. This squashes a few commits trying some minor variations. The original messages are shown below. For full changes and CI results from each commit, see: #37 --- * Try to avoid parallel checkout in `test-ext4-casefold` This attempts to turn off parallelism within individual runs of writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed by passing `--no-default-features` to turn off the `gix-features-parallel` feature in `gix-worktree-state/Cargo.toml` (which is how `gix-features/parallel` is enabled in those tests). This is to check if the multithreading parallelism within the checkout operations participates in GitoxideLabs#2006. See: GitoxideLabs#2006 (comment) * Patch the `thread_limit` argument to `Some(1)` too In the `text-ext4-casefold` jobs. * Modify the test code to use single-threaded checkout This moves setting `thread_limit: Some(1)` out of being a step done with `sed` in the `test-ext4-casefold` jobs, and into being an actual change to the source code of the `checkout.rs` test module. This change is intended to be temporary. The goal is the same as before, but to observe the effect outside `text-ext4-casefold`. That is, the goal here is to temporarily see if there is a change in results in the other jobs where failure almost always occurs due to GitoxideLabs#2006, i.e., the `test-fast` jobs on `macos-latest` and `windows-latest`, and the `test-fixtures-windows` job. Note that, as of this change, the `gix-features/parallel` feature is only turned off in the `test-ext4-casefold` jobs. In others, that feature is still turned on, just a parallelism of 1, i.e., a single thread, is used for the checkout. But a parallelism of 1 seems usually to be special-cased to not use facilities related to multithreading. (This `thread_limit` argument change, as well as the immediately preceding change of disabling the `parallel` feature flag in the `test-ext4-casefold` test jobs, are entirely separate from how many writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed duplicated test cases there are, and also entirely separate from the operand to `--test-threads`, which actually controls the number of test *processes* `cargo nextest` creates to run the tests.)
To investigate GitoxideLabs#2006, this explodes the occasionally failing test `writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed` into 1000 identical tests, using `test_case::test_matrix`. When run in series, they all pass, but when run in parallel with `cargo nextest`, some failures always occur, at least when this is done on macOS. This is as predicted in: GitoxideLabs#2006 (comment) A more specific result in local testing on macOS 15 is that, while the number of failures is 0 with `--test-threads=1`, with 2 or more threads, the number of failures tends to *decrease* with the number of threads. The most failures occur with `--test-threads=2`, about 30 on the system tested, while `--test-threads=16` has about 10 failures. The tests were run with this command, with any `--test-threads=<N>` argument added: cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast
To see if this can sometimes reproduce GitoxideLabs#2006 even outside macOS. The `fail-fast: false` added here can be removed after testing.
This (from #36) adds a `test-ext4-casefold` matrix job definition to `ci.yml`, which is intended probably to be temporary, to demonstrate for GitoxideLabs#2008 that GitoxideLabs#2006 affects GNU/Linux (in addition to macOS and Windows), when the GNU/Linux system is using an ext4 volume with a case-insensitive directory tree, i.e., where the ext4 filesystem is created with `-O casefold` and operations are performed in a directory within that filesystem where `chattr +F` has been used to enable case-folding path equivalence. The bug is nondeterministic and challenging to produce on GNU/Linux, even when the case-folding precondition is satisfied (as all the `test-ext4-casefold` jobs achieve). It is much harder to produce on GNU/Linux than on macOS or Windows. Accordingly, the number of duplicate test cases is increased 25-fold from 1000 to 25000 in those tests, and multiple jobs are created, some of them equivalent. As expected, the failures do not occur when the runner does not run tests in parallel, and they do occur most of the time when the runner runs tests in parallel with a maximum number of parallel tests set to 2 or to 4. However, unexpectedly, the failures almost never occur when the test are run in parallel with a maximum number of parallel tests set to 3. It is unclear to what extent, if any, this generalizes, but I have not observed these patterns when testing locally, including when modifying the local procedure to be more similar to the behavior here (such as by suppressing reporting of non-failing tests and suppressing the progress bar). In local testing, I am unable to produce failures on some GNU/Linux systems, but on those on which I am able to produce them, they are easier to produce with all values of the `--test-threads` operand, including 3. Not all operating systems, versions, and kernel builds and options support case-folding on ext4. But support is fairly common on Linux-based systems, and the CI runners have no problem with that. In contrast, although case-folding tmpfs also exists on Linux, it is newer and not (yet) supported as widely, and the GitHub-hosted runners do not support case-folding tmpfs. It is possible to mount a tmpfs filesystem and create an ext4 image in it so that, in principle, fewer operations need to access the disk. That is one of the approaches that was tried, but it does not appear to make it any easier to surface these failures. This squashes multiple commits. (Mistakes, and less illuminating experiments, have already been omitted; the commits being squashes here are those that are expected to be of possible interest, but whose details are almost but not quite important enough to justify the cumbersome effect of having them individually in the history.) The squashed commits' individual messages are shown below. For full changes and CI results from each commit, see: #36 --- * Add a case-folding GNU/Linux CI test job This can likely be removed later. But it might be kept if it finds anything interesting, especially if it helps test a bugfix. In that case, it may make sense to have it run only tests that are of special interest in connection with case-sensitivity. * Case-fold `$TMPDIR` as well in CI casefold test * Run only 2 parallel test processes in test-ext4-casefold This seemed to make the bug *easier* to reproduce (compared to larger values) locally, so maybe it will surface it on CI even in GNU/Linux. * Increase symlink experiment reps 10x in test-ext4-casefold From 1000 (0..=999) to 10000 (0..=9999). * Try even harder to reproduce the failure on GNU/Linux By repeating `cargo nextext` on the affected test group, up to 20 times. * Add `ubuntu-24.04-arm` to `test-ext4-casefold` job In case the failure might somehow be easier to reproduce there. * Use tmpfs casefold instead of ext4 casefold * Use tmpfs-backed ext4 as a workaround For when tmpfs does not suppport `-o casefold` in mounting. This does require that ext4 support `-O casefold`, but that was already working before. Even though that worked before, this could fail when using an ext4 image on tmpfs, because the ext4 image is smaller than it was when tmpfs was not used, in order to allow it to be created in memory. While this allows it to be created, the tests may fail if build artifacts and other files need more space than is available. * Run only the test(s) of interest In `test-linux-casefold`, this runs only the tests that are duplicates of each other, where the failure has been observed, rather than first running the whole test suite. The reason is to try to build fewer crates and test executables, since this is currenntly failing because it runs out of space on the ext4 image whose size is itself constrained by the amount of available memory for the tmpfs filesystem on which the image is created. * Don't attempt to use case-folding tmpfs But check if it is reported as available and, if so, issue a GHA warning that it should be used instead of the current way. * Try to fail at least as often, with less overhead This combines a few changes: - Only report the status of tests that actually fail. - Run the `cargo nextest` run command once, not 20 times. - Multiply the number of test cases by another factor of 10. - Stop after the first failing case. * Move the ext4 image off tmpfs This tries going back to not using tmpfs for I/O speedup, with the idea that the increased complexity and decreased flexibility might be possible to avoid now that other adjustments are made to surface the failure more reliably. * Try to fail more often, with less delay The `test-ext4-casefold` jobs were still not surfacing the error quite reliably enough. The wait was also longer than ideal. This commit makes some changes that hope to improve on both. - Decrease the number of duplicate test cases by a factor of 4. This has the effect that they are increased by 25x compared to the multiplier in the commited code, instead of 100x. - Create 10 times as many CI jobs in the matrix, by adding a `num` variable whose values are those in the range 0..=9. This variable is not actually used, it just serves to produce more CI jobs. - Don't use `Swatinem/rust-cache` in these jobs anymore. It is not obvious that it will make things work better overall, in terms of speedup, cache usage, and cache access, now that we are increasing the number of matrix jobs by a factor of 10. Moving some of the repetition into separate jobs, which may run in parallel, is not only, nor even primarily, to leverage runner parallelism to do more test runs faster. Instead, it is because it looks like some chaotic initial state may contribute to how likely the failure is to occur. No definitive experiment on CI shows this to be the case -- but in local testing, on some systems, I often have runs that fail early over and over again, and then wait a while, come back, and have many runs that don't surface the bug, then come back later and it is easy to produce again, and so on. * Further diversify runners * Undiversify runners but vary test parallelism Since varying `runs-on` didn't surface more errors, nor help smooth out inconsistencies across runs. The `*-arm` runners seem to have fewer errors with the particular way the tests are running now, so this just uses `ubuntu-latest`. This also adjusts `rep`, having it take on more values, so as to more than make up for what would otherwise be a smaller number of jobs. * Test more values for test parallelism This creates additional CI `test-ext4-casefold` jobs, for more values of the `--test-threads` operand passed to `cargo nextest`. The motivation is that the value of 3 curiously seems never to surface the failure on CI. Unlike 1, 3 should is expected to surface the failure comparably to 2 and 4, both of which do fail more often than not. Yet the jobs with `--test-threads=3` keep passing. (This is unlikely to be due to case-sensitivity misdetection, since the assertions for when the filesystem is case-sensitive would also fail, if they fire in a job where the filesystem is case-folding.) If these jobs are kept, then `rep` should probably be adjusted to have fewer, perhaps 7, values. However, this adjusts `rep` to have 16 values (i.e. twice as many as before), since the higher operands to `--test-threads` are expected to produce the failure less often. * Test fewer test parallelism values and reps - Test `--test-threads` operands of 1 to 4, rather than 1 to 7. - Do 7 reps instead of 16. In addition to the 1000x repetition in `checkout.rs` of the writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed test case -- which almost always surfaces GitoxideLabs#2006 on CI on macOS and Windows without further modifications -- the `test-ext4-casefold` jobs as they currenly stand seem like they would ber sufficient to test whether a possible future solution would fix the bug on GNU/Linux. This looks like about the minmum amount we should reasonably verify still fails (in some jobs) immediately before a GitoxideLabs#2006 fix, while no longer failing *any* jobs afterwards. But the `test-ext4-casefold` jobs are still much more than would be wanted on the main branch. Assuming the cause of the failures is the same on all systems, and the solution neither explicitly nor implicitly operates differently across systems (when comparing how it works when the filesystem is case-insensitive, that is), it may be enough on CI to regression test only macOS and Windows for GitoxideLabs#2006. Thus, this also adds a TODO comment atop the `test-ext4-casefold` definition reminding that it should not be kept, or not in full.
This change (from #37), which is intended to be temporary, sets `thread_limit: Some(1)` in the `opts_from_probe()` test helper function in the `checkout.rs` test module, so that tests including `writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed` will use single-threaded checkout. This applies no matter how the tests are run. In addition, for the `test-ext4-casefold` CI jobs (but not others), this adds the `--no-default-features` option to the `cargo nextest` command, so that the `gix-features-parallel` feature defined in `gix-worktree-state/Cargo.toml`, which when enabled causes `gix-features/parallel` to be enabled for the tests, is not enabled. The purpose of these changes it to examine if, when checkout operations are themselves performed without multithreading, there is any change to GitoxideLabs#2006. This does not make the failures go away. They still occur on all systems on which they had occurred before. It is unclear if it makes any difference to the failures. It seems like they may be slightly less likely to occur on GNU/Linux, but experiments so far are insufficient to confirm this. Furthermore, even if there is an effect, it may be that the effect is more subtle (such as possibly shifting the number of parallel tests where failures peak, from 2 to some higher number). The changes here should not be confused with the duplication of the test case, nor with the argument to `--test-threads`, which actually specifies the number of parallel test *processes*, since `cargo nextest` is being used. This squashes a few commits trying some minor variations. The original messages are shown below. For full changes and CI results from each commit, see: #37 --- * Try to avoid parallel checkout in `test-ext4-casefold` This attempts to turn off parallelism within individual runs of writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed by passing `--no-default-features` to turn off the `gix-features-parallel` feature in `gix-worktree-state/Cargo.toml` (which is how `gix-features/parallel` is enabled in those tests). This is to check if the multithreading parallelism within the checkout operations participates in GitoxideLabs#2006. See: GitoxideLabs#2006 (comment) * Patch the `thread_limit` argument to `Some(1)` too In the `text-ext4-casefold` jobs. * Modify the test code to use single-threaded checkout This moves setting `thread_limit: Some(1)` out of being a step done with `sed` in the `test-ext4-casefold` jobs, and into being an actual change to the source code of the `checkout.rs` test module. This change is intended to be temporary. The goal is the same as before, but to observe the effect outside `text-ext4-casefold`. That is, the goal here is to temporarily see if there is a change in results in the other jobs where failure almost always occurs due to GitoxideLabs#2006, i.e., the `test-fast` jobs on `macos-latest` and `windows-latest`, and the `test-fixtures-windows` job. Note that, as of this change, the `gix-features/parallel` feature is only turned off in the `test-ext4-casefold` jobs. In others, that feature is still turned on, just a parallelism of 1, i.e., a single thread, is used for the checkout. But a parallelism of 1 seems usually to be special-cased to not use facilities related to multithreading. (This `thread_limit` argument change, as well as the immediately preceding change of disabling the `parallel` feature flag in the `test-ext4-casefold` test jobs, are entirely separate from how many writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed duplicated test cases there are, and also entirely separate from the operand to `--test-threads`, which actually controls the number of test *processes* `cargo nextest` creates to run the tests.)
Current behavior 😯
The test failure
Yesterday, I got a failure of
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
on CI in a macOS job run in my fork:Changes on the branch
This job was run on e0a8dfe in EliahKagan#31, which is a Dependabot PR. (I sometimes have open Dependabot PRs in my fork, to pre-test possible dependency versions.)
But it seems to me that the failure is probably not related to any upgraded dependencies. I ran the workflow four more times, and neither that test, nor any others in that job, nor any other job in the workflow, failed in any other runs. Also...
Vague recollections
The diff of this failure, where the capitalization of
FAKE-FILE
changes, looks very familiar to me. I think I have seen it on CI before. That seems strange: considering the importance of the test, one would think I would've brought it up, had I seen it before--in case it was not a false positive.It could be that this looks familiar because it is a form a true failure could take, which I had seen at some point in the past when experimenting with the test. I wish I remembered.
I am for some reason reminded of #1832. That relates to a different test case. But maybe something related to this was going on at the same time? Sorry about the vagueness of these guesses.
Searching for the failure finds only #960. But the failure there was different--it did not have the same value as is found in the assertion here.
Case-insensitivity
The assertion that failed here was:
gitoxide/gix-worktree-state/tests/state/checkout.rs
Lines 122 to 130 in 9d0c809
Which assertions run in that test case is determined by the value of
opts.fs.ignore_case
. But I expect this typically to betrue
on macOS, and this assertion that failed is in thetrue
branch of that check. Accordingly, I don't think the problem is due to a failure to detect whether the filesystem is case-sensitive.Expected behavior 🤔
The test should always pass.
Git behavior
I'm not sure if there's a specific relevant Git behavior to look at here. But both Git and gitoxide generally avoid dereferencing and writing through symlinks. This test attempts to check that gitoxide does that.
Either there is some circumstance under which gitoxide fails to avoid that (and this circumstance can occur on macOS), or there is a bug in the test or something the test uses, or there was a problem on the runner that somehow caused the filename's case to change without causing any other tests in the run to fail. (All these possibilities are, of course, unexpected.)
Steps to reproduce 🕹
Unfortunately, I don't know to intentionally reproduce this.
The text was updated successfully, but these errors were encountered: