Skip to content

Rare writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed failure #2006

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
EliahKagan opened this issue May 9, 2025 · 13 comments
Labels
acknowledged an issue is accepted as shortcoming to be fixed

Comments

@EliahKagan
Copy link
Member

Current behavior 😯

The test failure

Yesterday, I got a failure of writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed on CI in a macOS job run in my fork:

        FAIL [   0.010s] gix-worktree-state-tests::worktree state::checkout::writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
  stdout ───

    running 1 test
    test state::checkout::writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed ... FAILED

    failures:

    failures:
        state::checkout::writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed

    test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 14 filtered out; finished in 0.00s
    
  stderr ───

    thread 'state::checkout::writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed' panicked at gix-worktree-state/tests/state/checkout.rs:122:9:
    assertion `left == right` failed
      left: ["A-dir/a", "A-file", "FAKE-DIR", "fake-file"]
     right: ["A-dir/a", "A-file", "FAKE-DIR", "FAKE-FILE"]
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Changes on the branch

This job was run on e0a8dfe in EliahKagan#31, which is a Dependabot PR. (I sometimes have open Dependabot PRs in my fork, to pre-test possible dependency versions.)

But it seems to me that the failure is probably not related to any upgraded dependencies. I ran the workflow four more times, and neither that test, nor any others in that job, nor any other job in the workflow, failed in any other runs. Also...

Vague recollections

The diff of this failure, where the capitalization of FAKE-FILE changes, looks very familiar to me. I think I have seen it on CI before. That seems strange: considering the importance of the test, one would think I would've brought it up, had I seen it before--in case it was not a false positive.

It could be that this looks familiar because it is a form a true failure could take, which I had seen at some point in the past when experimenting with the test. I wish I remembered.

I am for some reason reminded of #1832. That relates to a different test case. But maybe something related to this was going on at the same time? Sorry about the vagueness of these guesses.

Searching for the failure finds only #960. But the failure there was different--it did not have the same value as is found in the assertion here.

Case-insensitivity

The assertion that failed here was:

assert_eq!(
stripped_prefix(&destination, &worktree_files),
paths([
if cfg!(windows) { r"A-dir\a" } else { "A-dir/a" },
"A-file",
"FAKE-DIR",
"FAKE-FILE"
]),
);

Which assertions run in that test case is determined by the value of opts.fs.ignore_case. But I expect this typically to be true on macOS, and this assertion that failed is in the true branch of that check. Accordingly, I don't think the problem is due to a failure to detect whether the filesystem is case-sensitive.

Expected behavior 🤔

The test should always pass.

Git behavior

I'm not sure if there's a specific relevant Git behavior to look at here. But both Git and gitoxide generally avoid dereferencing and writing through symlinks. This test attempts to check that gitoxide does that.

Either there is some circumstance under which gitoxide fails to avoid that (and this circumstance can occur on macOS), or there is a bug in the test or something the test uses, or there was a problem on the runner that somehow caused the filename's case to change without causing any other tests in the run to fail. (All these possibilities are, of course, unexpected.)

Steps to reproduce 🕹

Unfortunately, I don't know to intentionally reproduce this.

@Byron Byron added the acknowledged an issue is accepted as shortcoming to be fixed label May 9, 2025
@Byron
Copy link
Member

Byron commented May 9, 2025

Thanks for reporting!

My guess is that the lack of determinism must have to do with the parallelism which is indeed configured in the tests as well.
My hypothesis is that the issue reproduces if the test is run repeatedly. Maybe the issue wouldn't resurface if parallelism was disallowed.

Another option is that the opts_from_probe returns different results due to race conditions, even though I thought these were fixed and also think that the case-sensitivity check is not racy by nature.

@EliahKagan
Copy link
Member Author

In this case, I think the case sensitivity check gave the correct result. I wouldn't be surprised if it were capable of occasionally being wrong, but I don't know how that would lead to what was seen here.

Is the idea that this is a test bug? Shouldn't tests that are running in parallel and accessing files, at least if writing to files, be writing to files in separate temporary directories?

Software that uses gix-worktree-state could be running in parallel in production. So if writing through symlinks in separate directories at the same time could result in wrongly dereferencing and writing through a symlink, I would expect that to be able to happen in production too. But my guess is that this is not what you're describing.

Might part of what's going on here be that, for some operations where git always replaces a file, we open and overwrite the file, which could, potentially, in some race conditions, result in overwriting a file through a symlink? (But if so, that would require concurrent unrelated operations on the same repository, wouldn't it?)

@Byron
Copy link
Member

Byron commented May 9, 2025

Is the idea that this is a test bug? Shouldn't tests that are running in parallel and accessing files, at least if writing to files, be writing to files in separate temporary directories?

No, but I see how that could be the impression. I hoped that if repeated runs could reproduce the issue, say, 1 in 100, then one could try to disable parallelism and repeat the reproduction which should then be impossible.
This would tell us it's related to parallelism, and we could see if and how this can be addressed.
It is my expectation that the code can run operations in any order, just as long as the final result is always the same. It's a constraint which might not even be possible to uphold, but if it isn't that would explain why Git only checks out files with a single thread. Initially I thought it's hard to do with their codebase, but maybe it's for correctness in these cases.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 10, 2025
To investigate GitoxideLabs#2006, this explodes the occasionally failing test
`writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed`
into 1000 identical tests, using `test_case::test_matrix`.

When run in series, they all pass, but when run in parallel with
`cargo nextest`, some failures always occur, at least when this is
done on macOS.

This is as predicted in:
GitoxideLabs#2006 (comment)
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 10, 2025
To investigate GitoxideLabs#2006, this explodes the occasionally failing test
`writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed`
into 1000 identical tests, using `test_case::test_matrix`.

When run in series, they all pass, but when run in parallel with
`cargo nextest`, some failures always occur, at least when this is
done on macOS. This is as predicted in:
GitoxideLabs#2006 (comment)

A more specific result in local testing on macOS 15 is that, while
the number of failures is 0 with `--test-threads=1`, with 2 or more
threads, the number of failures tends to *decrease* with the number
of threads. The most failures occur with `--test-threads=2`, about
30 on the system tested, while `--test-threads=16` has about 10
failures.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 10, 2025
To investigate GitoxideLabs#2006, this explodes the occasionally failing test
`writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed`
into 1000 identical tests, using `test_case::test_matrix`.

When run in series, they all pass, but when run in parallel with
`cargo nextest`, some failures always occur, at least when this is
done on macOS. This is as predicted in:
GitoxideLabs#2006 (comment)

A more specific result in local testing on macOS 15 is that, while
the number of failures is 0 with `--test-threads=1`, with 2 or more
threads, the number of failures tends to *decrease* with the number
of threads. The most failures occur with `--test-threads=2`, about
30 on the system tested, while `--test-threads=16` has about 10
failures.

The tests were run with this command, with any `--test-threads=<N>`
argument added:

    cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast
@EliahKagan
Copy link
Member Author

EliahKagan commented May 10, 2025

No, but I see how that could be the impression.

Thanks. I take this to mean that you were not saying that you thought it was a test bug, rather than to mean that I am mistaken about whether separate tests perform writes in separate directories. If I am also mistaken about that, please let me know.

Separate checkout directories

My understanding is that the gix-worktree-state checkout tests use temporary directories inside the current directory, but that they are separate directories per test case, regardless of whether, or how, test cases are run in parallel. Specifically, I think it comes down to:

let destination = gix_testtools::tempfile::tempdir_in(std::env::current_dir()?)?;

Where tempdir_in creates a new temporary subdirectory, and does so in a way that is effectively atomic, in the sense that two concurrently running tests will not proceed to use the same subdirectory even in the event that their PRNGs were to generate the same directory name. (That is, my understanding is that they will attempt to create the directory and the operating system will allow at most one process to create it, then any others will retry with another name.)

When running the test many times in the manner described below, in a few of the runs I simultaneously ran git status --short repeatedly to get a sampling of the temporary subdirectories being created--where here git status is operating on the gitoxide repository itself--to verify at least that multiple directories were used with distinct names. That seemed to be so. From one such run:

$ bash -c 'for i in {1..1000000}; do sleep 0.01; git status --short; done'
?? gix-worktree-state/tests/.tmpXcu8zW/
?? gix-worktree-state/tests/make_dangerous_symlink.lock
warning: could not open directory 'gix-worktree-state/tests/.tmppHCjNo/A-dir/': No such file or directory
?? gix-worktree-state/tests/.tmp0G17W0/
?? gix-worktree-state/tests/.tmppHCjNo/
?? gix-worktree-state/tests/.tmptm4xhe/
?? gix-worktree-state/tests/.tmpGXfX46/
warning: could not open directory 'gix-worktree-state/tests/.tmpObZ1Fr/': No such file or directory
?? gix-worktree-state/tests/.tmppvvuHV/
?? gix-worktree-state/tests/.tmp01u1kk/
?? gix-worktree-state/tests/.tmpDaIEB5/
?? gix-worktree-state/tests/.tmpF65J6d/
warning: could not open directory 'gix-worktree-state/tests/.tmpmUOmpE/': No such file or directory
?? gix-worktree-state/tests/make_dangerous_symlink.lock
?? gix-worktree-state/tests/.tmp85C1QP/
?? gix-worktree-state/tests/.tmpNb3oaC/
?? gix-worktree-state/tests/.tmpcyNLIM/
warning: could not open directory 'gix-worktree-state/tests/.tmpbDxnXf/': No such file or directory
?? gix-worktree-state/tests/.tmpNm8Jvr/
?? gix-worktree-state/tests/.tmp8mGi8g/
?? gix-worktree-state/tests/.tmpPyOmIx/
?? gix-worktree-state/tests/.tmpXxek6z/
?? gix-worktree-state/tests/.tmpfydGvh/
?? gix-worktree-state/tests/.tmp1f0ID4/
warning: could not open directory 'gix-worktree-state/tests/.tmpwEQjSY/': No such file or directory

But that raises the question, then, of how the rare test failure could be due to a race condition triggered by test parallelism. What am I missing here? I'm missing something, because...

It only happens when tests are run in parallel

I hoped that if repeated runs could reproduce the issue, say, 1 in 100, then one could try to disable parallelism and repeat the reproduction which should then be impossible.
This would tell us it's related to parallelism, and we could see if and how this can be addressed.

The effect you predicted here is basically what happens--repetition with parallelism is able to reproduce the failures, while repetition without parallelism does not produce it--though I did not find it quite as easy to produce as you suggested.

I think the failure occurs organically far less often than once every hundred test runs. When it does, I think it must be interacting with a different test case, probably another test case also in checkout.rs though I do not know which. I have not managed to trigger a failure by the exact mechanism it occurs naturally.

However, what does produce it repeatably, on some systems, is to explode the test case into $n$ duplicate test cases and have nextest run them all. This is easy to do with test_case::test_matrix:

diff --git a/gix-worktree-state/tests/state/checkout.rs b/gix-worktree-state/tests/state/checkout.rs
index 764b52020..50ad0cd2d 100644
--- a/gix-worktree-state/tests/state/checkout.rs
+++ b/gix-worktree-state/tests/state/checkout.rs
@@ -12,6 +12,7 @@ use gix_object::{bstr::ByteSlice, Data};
 use gix_testtools::tempfile::TempDir;
 use gix_worktree_state::checkout::Collision;
 use once_cell::sync::Lazy;
+use test_case::test_matrix;

 use crate::fixture_path;

@@ -103,8 +104,8 @@ fn accidental_writes_through_symlinks_are_prevented_if_overwriting_is_forbidden(
     }
 }

-#[test]
-fn writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed() {
+#[test_matrix(0..=999)]
+fn writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed(_i: i32) {
     let mut opts = opts_from_probe();
     // with overwrite mode
     opts.overwrite_existing = true;

At least four identified factors affect the failures. Taking $n$ to be the number of duplicated test cases and $t$ to be the number of parallel test processes:

  • The failures occur most on macOS, less on Windows, and even less on GNU/Linux. On macOS only, at least one failure will usually occur even when $n$ is decreased from $1000$ to $100$, and even when a default $t=16$ is used rather than $t=2$ (see below). On Windows, a failure often does not occur with $n=1000$ and occasionally does not occur with $n=10000$.

    I tested with macOS 15 arm64, Windows 10 amd64, and a few GNU/Linux systems described below.

  • The failures don't seem ever to occur when the filesystem is case-sensitive. The failing assertion tests that case is preserved in the expected way on a case-insensitive filesystem, and this assertion is not run on case-sensitive filesystems. I have not tried case-sensitive filesystems on macOS and Windows (though that is possible). But on GNU/Linux, I have tested with and without ext4 case-folding. As expected, I can only ever get the problem to happen when case folding is enabled.

    (ext4 casefold, if enabled, allows chattr +F on a directory, to create a directory tree with case-aware but case-insensitive filenames. This is rarely done, and one might wonder, if the test suite is run normally except with gitoxide cloned under such a directory, if everything passes. It turns out everything does pass, including when it is run with GIX_TEST_IGNORE_ARCHIVES=1! Note, however, that I did not test in an environment in which TMPDIR is set to a directory with chattr +F or in which /tmp has chattr +F set, since that did not seem relevant to this issue. After all, the checkout directories are under the current directory rather than somewhere like /tmp -- as verified in the examination of checkout_in() usage, and associated git status experiment, described above.)

  • The failures never happen when the tests are run in series with --test-threads=1 ($t = 1$).

  • The failures happen most with $t = 2$. For example, --test-threads=2 is far more likely to fail, and produces more failures, than --test-threads=16. On GNU/Linux, I have so far only been able to produce any failures with $t = 2$, even when setting $n = 10000$.

GNU/Linux details

On GNU/Linux, I am completely unable to produce the failure on some systems, even with $t = 2$ and trying both $n=1000$ and $n=10000$. I suspect this is because those systems have slower hard drives. But they are also both cloud machines, and they have different kernels, whereas the GNU/Linux systems on which I have produced the failure are Hyper-V virtual machines hosted locally. In case it is relevant, the GNU/Linux systems where I have never managed to do it are:

  • A 16 vCPU IBM cloud VM running Ubuntu 24.04 amd64 system where uname -a gives:

    Linux crow 6.8.0-1025-ibm #25-Ubuntu SMP Thu Apr 24 21:56:50 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
    
  • A 4 vCPU IBM cloud VM running Ubuntu 22.04 s390x system where uname -a gives:

    Linux gull 5.15.0-139-generic #149-Ubuntu SMP Fri Apr 11 22:07:50 UTC 2025 s390x s390x s390x GNU/Linux
    

And the GNU/Linux systems where I have managed to do it are:

  • An 8-core Hyper-V VM running Arch Linux amd64, where uname -a gives:

    Linux catenary 6.14.5-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 03 May 2025 13:34:12 +0000 x86_64 GNU/Linux
    
  • An 8-core Hyper-V VM running Ubuntu 24.04 LTS amd64, where uname -a gives:

    Linux sup 6.8.0-59-generic #61-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 11 23:16:11 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
    

But of those two, it is much easier to achieve on the Arch Linux system than the Ubuntu 24.04 LTS system.

ext4 filesystems are not usually created with casefold support. So on GNU/Linux I created and mounted an ext4 disk image with casefold enabled, to be able to use chattr +F. The commands I ran were as follows:

dd if=/dev/zero of=ext4-casefold.img bs=100M count=100 status=progress
mkfs.ext4 -O casefold ext4-casefold.img
mkdir ext4-casefold
sudo mount ext4-casefold.img ext4-casefold
cd ext4-casefold
sudo chown -- "$USER" .
mkdir icase
cd icase
chattr +F .
touch a A
ls -l  # Observe that there is only one file.
rm a
gh repo clone gitoxide
cd gitoxide
git switch clash
cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast
cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast --test-threads=2

If for some reason you decide to follow those steps, another remote might need to be added, or my fork cloned explicitly, in order for git switch clash to work.

I will open a draft pull request on the clash branch, though I have only tested it locally, and it does not (at least right now) make any effort to fix the bug, whose cause I still do not know.

Repeat the last command (in which $t=2$) up to twenty times, though if it is going to produce a failure then it usually produces one in at least one of the first five runs:

cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast --test-threads=2

If that does not work with the default $n=1000$, try it again repeatedly with $n=10000$ (by changing the test_matrix argument from 1..=999 to 1..=9999).

What failures look like when the test is duplicated

The failures look similar on all systems, all with the same assertion message and mismatched values, whenever failures occur. Here's a failure from Arch Linux:

        FAIL [   0.010s] gix-worktree-state-tests::worktree state::checkout::writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed::_492_expects
  stdout ───

    running 1 test
    test state::checkout::writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed::_492_expects ... FAILED

    failures:

    failures:
        state::checkout::writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed::_492_expects

    test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 1013 filtered out; finished in 0.00s

  stderr ───

    thread 'state::checkout::writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed::_492_expects' panicked at gix-worktree-state/tests/state/checkout.rs:123:9:
    assertion `left == right` failed
      left: ["A-dir/a", "A-file", "FAKE-DIR", "fake-file"]
     right: ["A-dir/a", "A-file", "FAKE-DIR", "FAKE-FILE"]
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

This gist contains a full run of that procedure on Arch Linux, the first time I did it on that system, along with some further details including potentially relevant extracts. (Conveniently, the first run of the last command happened to produce the error, that time.) But for the most part I think the information in this comment and the forthcoming investigational PR may be more useful.

As noted above, producing this on macOS is significantly easier than on Windows and GNU/Linux, and I do not have full or sustained access to a macOS system. Therefore, some further testing might be challenging for me to carry out.

Other findings

In addition to the writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed test where the failure occurs, I separately also tried #[test_matrix(0..=999)] on the related test accidental_writes_through_symlinks_are_prevented_if_overwriting_is_forbidden. I didn't expect that to produce any failures, and it didn't.

I tried inserting panic!("got here") calls into the unexpected branch in the conditional the checks whether the destination filesystem is case-sensitive. It was not reached. So I don't think misdetection is happening. The assertion message is also not what I would expect to get with case-sensitivity misdetection.

The git status --short test described above is not something I usually did, nor did I notice it to have any effect on the number of failures, nor even whether there are failures. So that was not the cause. However...

These tests do couple with the outside repository

I have not fully ruled out the possibility that interaction with the outer containing repository (i.e. the gitoxide repository itself) could be responsible.

At first thought, it seems unlikely that this would be the trigger, and also like it could be tested easily by attempting to run the tests in the absence of a top-level .git (i.e., where the gitoxide directory is not a repository).

But this is actually one of the tests that depends--or at least has depended, and I'm fairly sure still does depend--on the presence of the outer gitoxide repository.

In #1687, I mentioned:

The gitoxide test suite currently depends, I believe mostly unintentionally, on the gitoxide repository itself existing and having various properties. Various fixture scripts and tests fail if there is no outer repository; or if the outer repository is blocked by configured ceiling directories; or if the tests are run from a linked worktree, rather than its main worktree; or if git operations cannot be performed on the outer repository due to safe.directory errors.

As far as I know, the only coupling to the gitoxide repository that is firmly intended in the design of the test suite is the use of .gitignore information to determine which fixtures are allowed to generate archives. A few uses of the outer repository seem intentional, but not essential to the design, and they can be fixed using a fixture with nested repositories, decoupling completely from gitoxide's own repository. More often, I think this coupling is entirely unintended.

I haven't documented this in full detail yet, and maybe it should have an issue, but see this gist and this repo. (The latter is a full repo rather than a gist to facilitate visual diff via the web editor, and because it benefits from a non-flat directory structure.)

There is some further context there, as well as related discussion in comments.

But I didn't look into that much more at the time, possibly since it turned out not to be a big a problem for running the test suite in a container on CI as I had expected it to be.

For the test case at issue here, two kinds of failures that occurred when the outside gitoxide directory was not the main worktree of a Git repository seem potentially relevant here:

  • When the gitoxide directory was not a repository at all, the test failed on all platforms. This happened in the opts_from_probe call. That's relevant here because it makes it harder to rule out coupling with an external repository as a cause of this issue, since I cannot easily run the test and have it get to those assertions, in the absence of an outside repository.
  • When the gitoxide directory was a linked worktree rather than a main worktree, the test failed on macOS and Windows only. That is, it only happened on systems where the filesystems are, by default, case-insensitive. But I think it is dissimilar to this issue. It seems instead possibly to have been a misdetection of whether the directory was case-sensitive: the expected value in the comparison was the value for a case-sensitive filesystem. That is thus more similar to #960 than to this issue.

Edit: The investigational PR is #2008.

Edit 2: Fixed position of -- argument in one of the commands shown above, to express the correct idea and also to avoid doing the wrong thing in some unusual scripting scenarios.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 10, 2025
To see if this can sometimes reproduce GitoxideLabs#2006 even outside macOS.

The `fail-fast: false` added here can be removed after testing.
@Byron
Copy link
Member

Byron commented May 10, 2025

Thanks so much for this extremely thorough investigation. To speed up my own processing of it, I ran it through a local Gemma 2.5 for summarisation, and then looked at the ways to reproduce the issue in the original text.

AI Summary

This document details an investigation into a rare, parallel-test-specific failure in the gix-worktree-state crate's checkout tests, specifically within writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed. Here’s a summary:

The Problem:

  • A test occasionally fails when run in parallel, asserting that two lists of strings are equal but they differ by case (e.g., "FAKE-FILE" vs. "fake-file").
  • Repetition with parallelism reproduces the failure roughly 1/100 runs; repetition without parallelism does not.

Current Understanding:

  • The tests create temporary directories inside the current directory, one per test case, and these are created atomically to avoid collisions even during parallel execution. git status --short confirms multiple distinct directories are being used.
  • The failure isn't a simple bug in the test setup; it seems triggered by race conditions when tests run concurrently.

Factors Affecting Failure Rate:

  • Operating System: Most frequent on macOS, less so on Windows, and least on GNU/Linux.
  • Case Sensitivity: Failures only occur on case-insensitive filesystems (the assertion checks for preserved case). They don't happen on case-sensitive systems.
  • Parallelism Level: The failure rate is highest with 2 parallel threads (--test-threads=2) and decreases with more threads.
  • Test Duplication: Duplicating the failing test case (running it many times) reliably reproduces failures, especially on macOS.

GNU/Linux Specifics:

  • Failures are harder to reproduce on GNU/Linux, potentially due to slower hard drives.
  • The issue is observed more easily on Hyper-V VMs than IBM cloud VMs with different kernels.
  • Creating an ext4 filesystem with case folding enabled (using chattr +F) allows reproduction of the failure.

Potential Causes & Investigations:

  • External Repository Interaction: The tests do depend on the outer gitoxide repository existing and having certain properties, which could be a factor. Running the test without an external repo makes it harder to isolate the issue.
  • The author suspects that interaction with other tests in checkout.rs might be involved.
  • A draft pull request (Test overwrite_existing apparent symlink dereferencing #2008) has been opened for further investigation but doesn't yet include a fix.

Key Takeaway: The failure appears to be a subtle race condition triggered by parallel test execution on case-insensitive filesystems, potentially exacerbated by interactions with the outer gitoxide repository and other tests within the suite. The exact cause remains unknown.

Something I couldn't determine is if the parallelism was always controlled by the test-suite, or also by the configuration of the invocation itself. That means the status itself can be forced to use only one thread, and there it would be interesting if it still reproduces.
If not, at least there would be a workaround.

  • When the gitoxide directory was not a repository at all, the test failed on all platforms. This happened in the opts_from_probe call. That's relevant here because it makes it harder to rule out coupling with an external repository as a cause of this issue, since I cannot easily run the test and have it get to those assertions, in the absence of an outside repository.

It would probably be useful to allow this to be overridden by an environment variable pointing to the repository to use instead. All it wants is the .git/config file to exist so it can be checked for existence.

@EliahKagan
Copy link
Member Author

EliahKagan commented May 11, 2025

I agree with the generated summary except on two points:

  1. Repetition with parallelism reproduces the failure roughly 1/100 runs[...]

    How often the failure happens varies across operating systems. On macOS, in local testing, failures occur even more often than once every 100 runs, in the modified tests where the failing test case is duplicated (as in #2008). On other systems, in local testing, failures don't occur nearly that often even with the modified procedure.

    In the natural form of the failure observed on CI without the modification, the test is not duplicated, so there is at most one run of the test per CI job run. Then I think it averages much less often than once every 100 runs on CI. Only one such failure (the one reported here) is confirmed. I vaguely recall having seen such a thing before. But even assuming it occurred without modifying the test or the code under test, that's still only two observed failures over more than a year.

    (My worry is that it may occur more readily under circumstances that are not tested, in which case it might represent an as-yet unidentified nontrivial symlink traversal bug in gix-worktree-state.)

  2. Failures are harder to reproduce on GNU/Linux, potentially due to slower hard drives.

    I think the two GNU/Linux systems on which I was never able to produce the failure have slower hard drives than the two other GNU/Linux systems where I was able produce it. However, even on those with faster hard drives where I was able to produce it, it was harder to produce and occurred less often than on macOS or Windows.

    (With that said, one thing I should've mentioned in the comment was that I can't be certain that it is usually harder to produce them on GNU/Linux than macOS or Windows, because I only tested on one macOS system and one Windows system, in contrast to four GNU/Linux systems. CI runs in #2008 have since suggested that I may have overestimated the difference in how hard it is to produce on macOS vs. Windows.)

Something I couldn't determine is if the parallelism was always controlled by the test-suite, or also by the configuration of the invocation itself.

The number of parallel test runs, $t$, is set only by the test runner, and it is adjusted per run. When --test-threads is passed, this is its operand. Otherwise I believe nextest decides it based on how many logical processors are detected. In either case, I used cargo nextest in all of these experiments. Slightly confusingly, I believe nextest parallelizes only through the use of multiple processes, not multiple threads. (So the meaning of "threads" in "--test-threads" is broader or, more likely, a misnomer selected as a tradeoff, to match the name of the most closely corresponding option recognized by cargo test executables.)

In contrast, the number of test cases that the original case writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed is duplicated to produce, $n$, is specified in the code of the checkout.rs module, by the argument to the test_matrix macro. For example, $n = 1000$, which I tested the most with and which #2008 has, is set by using 0..=999 as the argument:

#[test_matrix(0..=999)]
fn writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed(_i: i32) {

Previously, and on the main branch, $n = 1$ implicitly. The way I am using the test_matrix macro is to achieve an effect similar to removing #[test] from the original test function and renaming it helper, and adding:

#[test] fn writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed_0() { helper(); }
#[test] fn writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed_1() { helper(); }
#[test] fn writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed_2() { helper(); }
...

Using test_matrix avoids extra actual code duplication and is easier to read, and makes it easier to adjust $n$. But the adjustments are still made by editing the range in the test module checkout.rs.

That means the status itself can be forced to use only one thread, and there it would be interesting if it still reproduces.
If not, at least there would be a workaround.

I don't know what you mean by "the status itself."

In all of my experiments where I produced failures, they do go away completely when --test-threads=1 is passed.

When the failure occurs organically, the test is not duplicated, so it must be interacting with something else, almost certainly something done as a result of another test running at the same time. It seems likely that the other test or tests that can interact to produce this are tests of gix-worktree-state as well, though I do not know that to be the case. If they are, we may be able to restrict them from running at the same time, even when the test suite is run in a way that otherwise parallelizes. Is that what you mean?

Since this test failure is extremely rare -- except when deliberately produced as I did in my local testing and in #2008, using a different mechanism from whatever produces it naturally -- I don't view the failure as something that we need to work around. Instead, I'm worried that it may be caused by a bug in the code being tested.

Or by "the status" do you mean the separate experiment I did less often where I ran git status --short in a loop to observe a subset of the temporary directories being created? But if so, then I don't follow. (As far as I have observed, that doesn't seem to affect the experiment. Also, I didn't usually do it, and I never ran more than one instance of that shell loop at a time.)

It would probably be useful to allow this to be overridden by an environment variable pointing to the repository to use instead.

I'll look into this.

I think that we should be able to fix that coupling with the outside gitoxide repository, as well as most other such couplings, by having each test that relies on an outside gitoxide repository instead create and use a temporary repository (possibly in addition to any it is already using), or moving creation of the additional repository into a fixture. This might be more extensive than accepting an environment variable, but I think the additional isolation should help avoid unintended interactions between parallel tests.

However, even if that is done and it makes this failure go away, I worry that whatever concurrency-related oddity is triggering the failure may correspond to something that could happen in real-world use, in situations where we really don't want to inadvertently dereference symlinks and write to their targets.

With all that said, I still don't know that interaction with an outside repository has anything at all to do with these failures, nor have I thought of a way that it would cause them.

@Byron
Copy link
Member

Byron commented May 11, 2025

  1. (My worry is that it may occur more readily under circumstances that are not tested, in which case it might represent an as-yet unidentified nontrivial symlink traversal bug in gix-worktree-state.)

I see, thanks for pointing that out.

My apologies for the confusion regarding the thread-limit of "the status itself" - I took a shortcut and avoided putting in reference links which I want to correct, besides the fact that this is a checkout, not a status.

The worktree checkout call has options that allow to change the amount of threads to use.

/// If set, don't use more than this amount of threads.
/// Otherwise, usually use as many threads as there are logical cores.
/// A value of 0 is interpreted as no-limit
pub thread_limit: Option<usize>,

The test-case may set this value to a number higher than one.

fn opts_from_probe() -> gix_worktree_state::checkout::Options {
static CAPABILITIES: Lazy<gix_fs::Capabilities> = Lazy::new(|| probe_gitoxide_dir().unwrap());
gix_worktree_state::checkout::Options {
fs: *CAPABILITIES,
destination_is_initially_empty: true,
thread_limit: gix_features::parallel::num_threads(None).into(),
..Default::default()
}
}

This is relying on the gix-features/parallel feature set, which also happens to be set by the test-suite. If that feature was disabled, only a single thread is used to do the checkout which should make it fully deterministic. My guess is that parallel tests also won't reproduce the problem anymore. But it would be interesting to know.

I think that we should be able to fix that coupling with the outside gitoxide repository, as well as most other such couplings, by having each test that relies on an outside gitoxide repository instead create and use a temporary repository (possibly in addition to any it is already using), or moving creation of the additional repository into a fixture. This might be more extensive than accepting an environment variable, but I think the additional isolation should help avoid unintended interactions between parallel tests.

The environment variable idea was more of a quick-hack to make the test. Ideally there was a way to have a test-fixture that is created once per run of the test-suite, and not once per test. All that would be needed is a directory with a config file.

Since this test failure is extremely rare -- except when deliberately produced as I did in my local testing and in #2008, using a different mechanism from whatever produces it naturally -- I don't view the failure as something that we need to work around. Instead, I'm worried that it may be caused by a bug in the code being tested.

With the test-case (crate) tests it seems much more likely to reproduce the failure, which would be good to prevent regressions once a fix is attempted.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 11, 2025
To investigate GitoxideLabs#2006, this explodes the occasionally failing test
`writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed`
into 1000 identical tests, using `test_case::test_matrix`.

When run in series, they all pass, but when run in parallel with
`cargo nextest`, some failures always occur, at least when this is
done on macOS. This is as predicted in:
GitoxideLabs#2006 (comment)

A more specific result in local testing on macOS 15 is that, while
the number of failures is 0 with `--test-threads=1`, with 2 or more
threads, the number of failures tends to *decrease* with the number
of threads. The most failures occur with `--test-threads=2`, about
30 on the system tested, while `--test-threads=16` has about 10
failures.

The tests were run with this command, with any `--test-threads=<N>`
argument added:

    cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 11, 2025
To see if this can sometimes reproduce GitoxideLabs#2006 even outside macOS.

The `fail-fast: false` added here can be removed after testing.
@EliahKagan
Copy link
Member Author

My apologies for the confusion regarding the thread-limit of "the status itself" - I took a shortcut and avoided putting in reference links which I want to correct, besides the fact that this is a checkout, not a status.

No problem! The checkout vs. status distinction was what had confused me.

This is relying on the gix-features/parallel feature set, which also happens to be set by the test-suite. If that feature was disabled, only a single thread is used to do the checkout which should make it fully deterministic.

As a quick (non-)update: I'm definitely going to try this out, I just haven't gotten to it yet, in part because I've been looking for a reasonably efficient (i.e. doesn't take hours per run) way to produce the failure on GNU/Linux on CI as well. That way, anything that works around or fixes it can be validated more decisively. But if I don't manage to it then that's okay and I'll proceed without it.

My guess is that parallel tests also won't reproduce the problem anymore. But it would be interesting to know.

I'm curious: would you expect setting a thread_limit of 1 explicitly (instead of implicitly) to have the same effect on this test as disabling gix-features/parallel? Or is there something in gix-features/parallel beyond the use of multiple threads that you suspect may be involved?

I think they should have the same effect and thus that it may be sufficient for me to test only one of them, because even if in_parallel_with_finalize could somehow trigger the bug even when num_threads is 1, it is only used in the checkout when the number of threads to use is not 1.

But I'm not entirely sure why any of this would have an effect, since even with multithreaded checkout, the test never fails if all runs of the test are in series (even as each run is still allowed to parallelize the checkout). In any case, I shall certainly test it.

@Byron
Copy link
Member

Byron commented May 14, 2025

I'm curious: would you expect setting a thread_limit of 1 explicitly (instead of implicitly) to have the same effect on this test as disabling gix-features/parallel? Or is there something in gix-features/parallel beyond the use of multiple threads that you suspect may be involved?

Actually that should be the same. However, without the parallel feature code may possibly still run in the parallel implementation (there are usually two, one for single-threading, one for multiple threads), and that may possibly affect the outcome (even though it shouldn't).

In general I'd assume that if parallel isn't enabled, it will be the same as if num_threads = 1, particularly because the num_threads() function should always return 1 then as well.

But I'm not entirely sure why any of this would have an effect, since even with multithreaded checkout, the test never fails if all runs of the test are in series (even as each run is still allowed to parallelize the checkout). In any case, I shall certainly test it.

I think the scenario that I thought was untested is if num_threads = 1 (or without the parallel feature) along with the test-matrix that could reproduce the issue otherwise. Also, I may absolutely have missed that case in the description and didn't want to suggest double-checking or anything like that.

@EliahKagan
Copy link
Member Author

I think the scenario that I thought was untested is if num_threads = 1 (or without the parallel feature) along with the test-matrix that could reproduce the issue otherwise. Also, I may absolutely have missed that case in the description and didn't want to suggest double-checking or anything like that.

I don't think you missed anything--I have not yet tested that. (I was just curious if, when I test that, the effect of disabling gix-features/parallel would also need to be tested separately.)

@Byron
Copy link
Member

Byron commented May 14, 2025

Wonderful, then I am extra-curious how this will turn out!

EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 17, 2025
To investigate GitoxideLabs#2006, this explodes the occasionally failing test
`writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed`
into 1000 identical tests, using `test_case::test_matrix`.

When run in series, they all pass, but when run in parallel with
`cargo nextest`, some failures always occur, at least when this is
done on macOS. This is as predicted in:
GitoxideLabs#2006 (comment)

A more specific result in local testing on macOS 15 is that, while
the number of failures is 0 with `--test-threads=1`, with 2 or more
threads, the number of failures tends to *decrease* with the number
of threads. The most failures occur with `--test-threads=2`, about
30 on the system tested, while `--test-threads=16` has about 10
failures.

The tests were run with this command, with any `--test-threads=<N>`
argument added:

    cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 17, 2025
To see if this can sometimes reproduce GitoxideLabs#2006 even outside macOS.

The `fail-fast: false` added here can be removed after testing.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 18, 2025
- Test `--test-threads` operands of 1 to 4, rather than 1 to 7.
- Do 7 reps instead of 16.

In addition to the 1000x repetition in `checkout.rs` of the
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
test case, which in practice always surfaces GitoxideLabs#2006 on CI on macOS
and Windows, the `test-ext4-casefold` job as it currenly stands
seems sufficient to test whether a possible future solution would
fix the bug on GNU/Linux.

This looks like about the minmum amount we should reasonably verify
still fails (in some jobs) immediately before a GitoxideLabs#2006 fix, while no
longer failing *any* jobs afterwards. But the `test-ext4-casefold`
jobs are still much more than would be wanted on the main branch.

Assuming the cause of the failures is the same on all systems, and
the solution neither explicitly nor implicitly operates differently
across systems (when comparing how it works when the filesystem is
case-insensitive, that is), it may be enough on CI to regression
test only macOS and Windows for GitoxideLabs#2006.

Thus, this also adds a TODO comment atop the `test-ext4-casefold`
definition reminding that it should not be kept, or not in full.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 18, 2025
- Test `--test-threads` operands of 1 to 4, rather than 1 to 7.
- Do 7 reps instead of 16.

In addition to the 1000x repetition in `checkout.rs` of the
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
test case -- which almost always surfaces GitoxideLabs#2006 on CI on macOS and
Windows without further modifications -- the `test-ext4-casefold`
jobs as they currenly stand seem like they would ber sufficient to
test whether a possible future solution would fix the bug on
GNU/Linux.

This looks like about the minmum amount we should reasonably verify
still fails (in some jobs) immediately before a GitoxideLabs#2006 fix, while no
longer failing *any* jobs afterwards. But the `test-ext4-casefold`
jobs are still much more than would be wanted on the main branch.

Assuming the cause of the failures is the same on all systems, and
the solution neither explicitly nor implicitly operates differently
across systems (when comparing how it works when the filesystem is
case-insensitive, that is), it may be enough on CI to regression
test only macOS and Windows for GitoxideLabs#2006.

Thus, this also adds a TODO comment atop the `test-ext4-casefold`
definition reminding that it should not be kept, or not in full.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 18, 2025
This (from #36) adds a `test-ext4-casefold` matrix job
definition to `ci.yml`, which is intended probably to be temporary,
to demonstrate for GitoxideLabs#2008 that GitoxideLabs#2006 affects GNU/Linux (in addition
to macOS and Windows), when the GNU/Linux system is using an ext4
volume with a case-insensitive directory tree, i.e., where the ext4
filesystem is created with `-O casefold` and operations are
performed in a directory within that filesystem where `chattr +F`
has been used to enable case-folding path equivalence.

The bug is nondeterministic and challenging to produce on
GNU/Linux, even when the case-folding precondition is satisfied (as
all the `test-ext4-casefold` jobs achieve). It is much harder to
produce on GNU/Linux than on macOS or Windows. Accordingly, the
number of duplicate test cases is increased 25-fold from 1000 to
25000 in those tests, and multiple jobs are created, some of them
equivalent.

As expected, the failures do not occur when the runner does not run
tests in parallel, and they do occur most of the time when the
runner runs tests in parallel with a maximum number of parallel
tests set to 2 or to 4. However, unexpectedly, the failures almost
never occur when the test are run in parallel with a maximum number
of parallel tests set to 3. It is unclear to what extent, if any,
this generalizes, but I have not observed these patterns when
testing locally, including when modifying the local procedure to be
more similar to the behavior here (such as by suppressing reporting
of non-failing tests and suppressing the progress bar). In local
testing, I am unable to produce failures on some GNU/Linux systems,
but on those on which I am able to produce them, they are easier to
produce with all values of the `--test-threads` operand, including
3.

Not all operating systems, versions, and kernel builds and options
support case-folding on ext4. But support is fairly common on
Linux-based systems, and the CI runners have no problem with that.
In contrast, although case-folding tmpfs also exists on Linux, it
is newer and not (yet) supported as widely, and the GitHub-hosted
runners do not support case-folding tmpfs. It is possible to mount
a tmpfs filesystem and create an ext4 image in it so that, in
principle, fewer operations need to access the disk. That is one of
the approaches that was tried, but it does not appear to make it
any easier to surface these failures.

This squashes multiple commits. (Mistakes, and less illuminating
experiments, have already been omitted; the commits being squashes
here are those that are expected to be of possible interest, but
whose details are almost but not quite important enough to justify
the cumbersome effect of having them individually in the history.)
The squashed commits' individual messages are shown below.

For full changes and CI results from each commit, see:
#36

---

* Add a case-folding GNU/Linux CI test job

This can likely be removed later. But it might be kept if it finds
anything interesting, especially if it helps test a bugfix. In that
case, it may make sense to have it run only tests that are of
special interest in connection with case-sensitivity.

* Case-fold `$TMPDIR` as well in CI casefold test

* Run only 2 parallel test processes in test-ext4-casefold

This seemed to make the bug *easier* to reproduce (compared to
larger values) locally, so maybe it will surface it on CI even in
GNU/Linux.

* Increase symlink experiment reps 10x in test-ext4-casefold

From 1000 (0..=999) to 10000 (0..=9999).

* Try even harder to reproduce the failure on GNU/Linux

By repeating `cargo nextext` on the affected test group, up to 20
times.

* Add `ubuntu-24.04-arm` to `test-ext4-casefold` job

In case the failure might somehow be easier to reproduce there.

* Use tmpfs casefold instead of ext4 casefold

* Use tmpfs-backed ext4 as a workaround

For when tmpfs does not suppport `-o casefold` in mounting.

This does require that ext4 support `-O casefold`, but that was
already working before.

Even though that worked before, this could fail when using an ext4
image on tmpfs, because the ext4 image is smaller than it was when
tmpfs was not used, in order to allow it to be created in memory.
While this allows it to be created, the tests may fail if build
artifacts and other files need more space than is available.

* Run only the test(s) of interest

In `test-linux-casefold`, this runs only the tests that are
duplicates of each other, where the failure has been observed,
rather than first running the whole test suite.

The reason is to try to build fewer crates and test executables,
since this is currenntly failing because it runs out of space on
the ext4 image whose size is itself constrained by the amount of
available memory for the tmpfs filesystem on which the image is
created.

* Don't attempt to use case-folding tmpfs

But check if it is reported as available and, if so, issue a GHA
warning that it should be used instead of the current way.

* Try to fail at least as often, with less overhead

This combines a few changes:

- Only report the status of tests that actually fail.
- Run the `cargo nextest` run command once, not 20 times.
- Multiply the number of test cases by another factor of 10.
- Stop after the first failing case.

* Move the ext4 image off tmpfs

This tries going back to not using tmpfs for I/O speedup, with the
idea that the increased complexity and decreased flexibility might
be possible to avoid now that other adjustments are made to surface
the failure more reliably.

* Try to fail more often, with less delay

The `test-ext4-casefold` jobs were still not surfacing the error
quite reliably enough. The wait was also longer than ideal. This
commit makes some changes that hope to improve on both.

- Decrease the number of duplicate test cases by a factor of 4.
  This has the effect that they are increased by 25x compared to
  the multiplier in the commited code, instead of 100x.

- Create 10 times as many CI jobs in the matrix, by adding a `num`
  variable whose values are those in the range 0..=9. This variable
  is not actually used, it just serves to produce more CI jobs.

- Don't use `Swatinem/rust-cache` in these jobs anymore. It is not
  obvious that it will make things work better overall, in terms of
  speedup, cache usage, and cache access, now that we are
  increasing the number of matrix jobs by a factor of 10.

Moving some of the repetition into separate jobs, which may run in
parallel, is not only, nor even primarily, to leverage runner
parallelism to do more test runs faster. Instead, it is because it
looks like some chaotic initial state may contribute to how likely
the failure is to occur. No definitive experiment on CI shows this
to be the case -- but in local testing, on some systems, I often
have runs that fail early over and over again, and then wait a
while, come back, and have many runs that don't surface the bug,
then come back later and it is easy to produce again, and so on.

* Further diversify runners

* Undiversify runners but vary test parallelism

Since varying `runs-on` didn't surface more errors, nor help smooth
out inconsistencies across runs.

The `*-arm` runners seem to have fewer errors with the particular
way the tests are running now, so this just uses `ubuntu-latest`.

This also adjusts `rep`, having it take on more values, so as to
more than make up for what would otherwise be a smaller number of
jobs.

* Test more values for test parallelism

This creates additional CI `test-ext4-casefold` jobs, for more
values of the `--test-threads` operand passed to `cargo nextest`.

The motivation is that the value of 3 curiously seems never to
surface the failure on CI. Unlike 1, 3 should is expected to
surface the failure comparably to 2 and 4, both of which do fail
more often than not. Yet the jobs with `--test-threads=3` keep
passing.

(This is unlikely to be due to case-sensitivity misdetection, since
the assertions for when the filesystem is case-sensitive would also
fail, if they fire in a job where the filesystem is case-folding.)

If these jobs are kept, then `rep` should probably be adjusted to
have fewer, perhaps 7, values. However, this adjusts `rep` to have
16 values (i.e. twice as many as before), since the higher operands
to `--test-threads` are expected to produce the failure less often.

* Test fewer test parallelism values and reps

- Test `--test-threads` operands of 1 to 4, rather than 1 to 7.
- Do 7 reps instead of 16.

In addition to the 1000x repetition in `checkout.rs` of the
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
test case -- which almost always surfaces GitoxideLabs#2006 on CI on macOS and
Windows without further modifications -- the `test-ext4-casefold`
jobs as they currenly stand seem like they would ber sufficient to
test whether a possible future solution would fix the bug on
GNU/Linux.

This looks like about the minmum amount we should reasonably verify
still fails (in some jobs) immediately before a GitoxideLabs#2006 fix, while no
longer failing *any* jobs afterwards. But the `test-ext4-casefold`
jobs are still much more than would be wanted on the main branch.

Assuming the cause of the failures is the same on all systems, and
the solution neither explicitly nor implicitly operates differently
across systems (when comparing how it works when the filesystem is
case-insensitive, that is), it may be enough on CI to regression
test only macOS and Windows for GitoxideLabs#2006.

Thus, this also adds a TODO comment atop the `test-ext4-casefold`
definition reminding that it should not be kept, or not in full.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 18, 2025
This attempts to turn off parallelism within individual runs of
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
by passing `--no-default-features` to turn off the
`gix-features-parallel` feature in `gix-worktree-state/Cargo.toml`
(which is how `gix-features/parallel` is enabled in those tests).

This is to check if the multithreading parallelism within the
checkout operations participates in GitoxideLabs#2006. See:
GitoxideLabs#2006 (comment)
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 18, 2025
This moves setting `thread_limit: Some(1)` out of being a step done
with `sed` in the `test-ext4-casefold` jobs, and into being an
actual change to the source code of the `checkout.rs` test module.

This change is intended to be temporary. The goal is the same as
before, but to observe the effect outside `text-ext4-casefold`.

That is, the goal here is to temporarily see if there is a change
in results in the other jobs where failure almost always occurs
due to GitoxideLabs#2006, i.e., the `test-fast` jobs on `macos-latest` and
`windows-latest`, and the `test-fixtures-windows` job.

Note that, as of this change, the `gix-features/parallel` feature
is only turned off in the `test-ext4-casefold` jobs. In others,
that feature is still turned on, just a parallelism of 1, i.e., a
single thread, is used for the checkout. But a parallelism of 1
seems usually to be special-cased to not use facilities related to
multithreading.

(This `thread_limit` argument change, as well as the immediately
preceding change of disabling the `parallel` feature flag in the
`test-ext4-casefold` test jobs, are entirely separate from how many
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
duplicated test cases there are, and also entirely separate from
the operand to `--test-threads`, which actually controls the number
of test *processes* `cargo nextest` creates to run the tests.)
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 18, 2025
Use single-threaded checkout in tests

This change (from #37), which is intended to be
temporary, sets `thread_limit: Some(1)` in the `opts_from_probe()`
test helper function in the `checkout.rs` test module, so that
tests including
`writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed`
will use single-threaded checkout. This applies no matter how the
tests are run.

In addition, for the `test-ext4-casefold` CI jobs (but not others),
this adds the `--no-default-features` option to the `cargo nextest`
command, so that the `gix-features-parallel` feature defined in
`gix-worktree-state/Cargo.toml`, which when enabled causes
`gix-features/parallel` to be enabled for the tests, is not
enabled.

The purpose of these changes it to examine if, when checkout
operations are themselves performed without multithreading, there
is any change to GitoxideLabs#2006.

This does not make the failures go away. They still occur on all
systems on which they had occurred before. It is unclear if it
makes any difference to the failures. It seems like they may be
slightly less likely to occur on GNU/Linux, but experiments so far
are insufficient to confirm this. Furthermore, even if there is an
effect, it may be that the effect is more subtle (such as possibly
shifting the number of parallel tests where failures peak, from 2
to some higher number).

The changes here should not be confused with the duplication of the
test case, nor with the argument to `--test-threads`, which
actually specifies the number of parallel test *processes*, since
`cargo nextest` is being used.

This squashes a few commits trying some minor variations. The
original messages are shown below. For full changes and CI results
from each commit, see:
#37

---

* Try to avoid parallel checkout in `test-ext4-casefold`

This attempts to turn off parallelism within individual runs of
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
by passing `--no-default-features` to turn off the
`gix-features-parallel` feature in `gix-worktree-state/Cargo.toml`
(which is how `gix-features/parallel` is enabled in those tests).

This is to check if the multithreading parallelism within the
checkout operations participates in GitoxideLabs#2006. See:
GitoxideLabs#2006 (comment)

* Patch the `thread_limit` argument to `Some(1)` too

In the `text-ext4-casefold` jobs.

* Modify the test code to use single-threaded checkout

This moves setting `thread_limit: Some(1)` out of being a step done
with `sed` in the `test-ext4-casefold` jobs, and into being an
actual change to the source code of the `checkout.rs` test module.

This change is intended to be temporary. The goal is the same as
before, but to observe the effect outside `text-ext4-casefold`.

That is, the goal here is to temporarily see if there is a change
in results in the other jobs where failure almost always occurs
due to GitoxideLabs#2006, i.e., the `test-fast` jobs on `macos-latest` and
`windows-latest`, and the `test-fixtures-windows` job.

Note that, as of this change, the `gix-features/parallel` feature
is only turned off in the `test-ext4-casefold` jobs. In others,
that feature is still turned on, just a parallelism of 1, i.e., a
single thread, is used for the checkout. But a parallelism of 1
seems usually to be special-cased to not use facilities related to
multithreading.

(This `thread_limit` argument change, as well as the immediately
preceding change of disabling the `parallel` feature flag in the
`test-ext4-casefold` test jobs, are entirely separate from how many
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
duplicated test cases there are, and also entirely separate from
the operand to `--test-threads`, which actually controls the number
of test *processes* `cargo nextest` creates to run the tests.)
@EliahKagan
Copy link
Member Author

EliahKagan commented May 18, 2025

Per EliahKagan#27 and #2008 (comment), I believe I've made the checkout operation performed in the test single-threaded. The failure still occurs.

Whether or not the checkout itself is allowed to parallelize with multiple threads, the failure:

  • Occurs nondeterministically if multiple tests that do the checkout, at least when they are copies of writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed, are run in parallel in separate processes via cargo nextest (whether or not those tests parallelize with threads internally).
  • Never occurs if only a single test is run at a time (whether or that test parallelizes with threads internally).

This is what I would have expected--since the failure was observed to be triggered by running multiple tests at once and not when running only one test, I was never clear on why the failures would be expected to go away when the checkout operation performed within a test is forbidden from using its own parallelism.

You may want to take a look at the change, in case I was mistaken to think that this change to thread_limit in opts_from_probe() was sufficient to avoid parallelism in the checkout:

gix_worktree_state::checkout::Options {
fs: *CAPABILITIES,
destination_is_initially_empty: true,
thread_limit: Some(1), // gix_features::parallel::num_threads(None).into(),
..Default::default()
}

@Byron
Copy link
Member

Byron commented May 18, 2025

Thanks for the summary, setting the thread_limit to 1 should do the trick.

I think what I'd do next is to keep the tempdirectory of the test, and compare the Git-state of a successful run with the Git state of a failure.

The source of the test is read-only, so whatever happens must be on the writer side. And the writer will write files in the correct order, but it will also only create new files, so case-folding filesystems should report an error if the file already exists.

It's a bit unexpected that the test needs collisions to be empty, I would have thought that fake-file will collide with FAKE-FILE, after all we don't want to write through symlinks. So it really feels that some notes are missing, and/or additional assertions are missing as well - right now it feels like it writes through a symlink, but as everything is empty we wouldn't be able to know.

Maybe this another avenue - trying to understand what's really happening when 'it works', to be able to make additional assertions and maybe get closer to cracking this nut that way.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 19, 2025
To investigate GitoxideLabs#2006, this explodes the occasionally failing test
`writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed`
into 1000 identical tests, using `test_case::test_matrix`.

When run in series, they all pass, but when run in parallel with
`cargo nextest`, some failures always occur, at least when this is
done on macOS. This is as predicted in:
GitoxideLabs#2006 (comment)

A more specific result in local testing on macOS 15 is that, while
the number of failures is 0 with `--test-threads=1`, with 2 or more
threads, the number of failures tends to *decrease* with the number
of threads. The most failures occur with `--test-threads=2`, about
30 on the system tested, while `--test-threads=16` has about 10
failures.

The tests were run with this command, with any `--test-threads=<N>`
argument added:

    cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 19, 2025
To see if this can sometimes reproduce GitoxideLabs#2006 even outside macOS.

The `fail-fast: false` added here can be removed after testing.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 19, 2025
This (from #36) adds a `test-ext4-casefold` matrix job
definition to `ci.yml`, which is intended probably to be temporary,
to demonstrate for GitoxideLabs#2008 that GitoxideLabs#2006 affects GNU/Linux (in addition
to macOS and Windows), when the GNU/Linux system is using an ext4
volume with a case-insensitive directory tree, i.e., where the ext4
filesystem is created with `-O casefold` and operations are
performed in a directory within that filesystem where `chattr +F`
has been used to enable case-folding path equivalence.

The bug is nondeterministic and challenging to produce on
GNU/Linux, even when the case-folding precondition is satisfied (as
all the `test-ext4-casefold` jobs achieve). It is much harder to
produce on GNU/Linux than on macOS or Windows. Accordingly, the
number of duplicate test cases is increased 25-fold from 1000 to
25000 in those tests, and multiple jobs are created, some of them
equivalent.

As expected, the failures do not occur when the runner does not run
tests in parallel, and they do occur most of the time when the
runner runs tests in parallel with a maximum number of parallel
tests set to 2 or to 4. However, unexpectedly, the failures almost
never occur when the test are run in parallel with a maximum number
of parallel tests set to 3. It is unclear to what extent, if any,
this generalizes, but I have not observed these patterns when
testing locally, including when modifying the local procedure to be
more similar to the behavior here (such as by suppressing reporting
of non-failing tests and suppressing the progress bar). In local
testing, I am unable to produce failures on some GNU/Linux systems,
but on those on which I am able to produce them, they are easier to
produce with all values of the `--test-threads` operand, including
3.

Not all operating systems, versions, and kernel builds and options
support case-folding on ext4. But support is fairly common on
Linux-based systems, and the CI runners have no problem with that.
In contrast, although case-folding tmpfs also exists on Linux, it
is newer and not (yet) supported as widely, and the GitHub-hosted
runners do not support case-folding tmpfs. It is possible to mount
a tmpfs filesystem and create an ext4 image in it so that, in
principle, fewer operations need to access the disk. That is one of
the approaches that was tried, but it does not appear to make it
any easier to surface these failures.

This squashes multiple commits. (Mistakes, and less illuminating
experiments, have already been omitted; the commits being squashes
here are those that are expected to be of possible interest, but
whose details are almost but not quite important enough to justify
the cumbersome effect of having them individually in the history.)
The squashed commits' individual messages are shown below.

For full changes and CI results from each commit, see:
#36

---

* Add a case-folding GNU/Linux CI test job

This can likely be removed later. But it might be kept if it finds
anything interesting, especially if it helps test a bugfix. In that
case, it may make sense to have it run only tests that are of
special interest in connection with case-sensitivity.

* Case-fold `$TMPDIR` as well in CI casefold test

* Run only 2 parallel test processes in test-ext4-casefold

This seemed to make the bug *easier* to reproduce (compared to
larger values) locally, so maybe it will surface it on CI even in
GNU/Linux.

* Increase symlink experiment reps 10x in test-ext4-casefold

From 1000 (0..=999) to 10000 (0..=9999).

* Try even harder to reproduce the failure on GNU/Linux

By repeating `cargo nextext` on the affected test group, up to 20
times.

* Add `ubuntu-24.04-arm` to `test-ext4-casefold` job

In case the failure might somehow be easier to reproduce there.

* Use tmpfs casefold instead of ext4 casefold

* Use tmpfs-backed ext4 as a workaround

For when tmpfs does not suppport `-o casefold` in mounting.

This does require that ext4 support `-O casefold`, but that was
already working before.

Even though that worked before, this could fail when using an ext4
image on tmpfs, because the ext4 image is smaller than it was when
tmpfs was not used, in order to allow it to be created in memory.
While this allows it to be created, the tests may fail if build
artifacts and other files need more space than is available.

* Run only the test(s) of interest

In `test-linux-casefold`, this runs only the tests that are
duplicates of each other, where the failure has been observed,
rather than first running the whole test suite.

The reason is to try to build fewer crates and test executables,
since this is currenntly failing because it runs out of space on
the ext4 image whose size is itself constrained by the amount of
available memory for the tmpfs filesystem on which the image is
created.

* Don't attempt to use case-folding tmpfs

But check if it is reported as available and, if so, issue a GHA
warning that it should be used instead of the current way.

* Try to fail at least as often, with less overhead

This combines a few changes:

- Only report the status of tests that actually fail.
- Run the `cargo nextest` run command once, not 20 times.
- Multiply the number of test cases by another factor of 10.
- Stop after the first failing case.

* Move the ext4 image off tmpfs

This tries going back to not using tmpfs for I/O speedup, with the
idea that the increased complexity and decreased flexibility might
be possible to avoid now that other adjustments are made to surface
the failure more reliably.

* Try to fail more often, with less delay

The `test-ext4-casefold` jobs were still not surfacing the error
quite reliably enough. The wait was also longer than ideal. This
commit makes some changes that hope to improve on both.

- Decrease the number of duplicate test cases by a factor of 4.
  This has the effect that they are increased by 25x compared to
  the multiplier in the commited code, instead of 100x.

- Create 10 times as many CI jobs in the matrix, by adding a `num`
  variable whose values are those in the range 0..=9. This variable
  is not actually used, it just serves to produce more CI jobs.

- Don't use `Swatinem/rust-cache` in these jobs anymore. It is not
  obvious that it will make things work better overall, in terms of
  speedup, cache usage, and cache access, now that we are
  increasing the number of matrix jobs by a factor of 10.

Moving some of the repetition into separate jobs, which may run in
parallel, is not only, nor even primarily, to leverage runner
parallelism to do more test runs faster. Instead, it is because it
looks like some chaotic initial state may contribute to how likely
the failure is to occur. No definitive experiment on CI shows this
to be the case -- but in local testing, on some systems, I often
have runs that fail early over and over again, and then wait a
while, come back, and have many runs that don't surface the bug,
then come back later and it is easy to produce again, and so on.

* Further diversify runners

* Undiversify runners but vary test parallelism

Since varying `runs-on` didn't surface more errors, nor help smooth
out inconsistencies across runs.

The `*-arm` runners seem to have fewer errors with the particular
way the tests are running now, so this just uses `ubuntu-latest`.

This also adjusts `rep`, having it take on more values, so as to
more than make up for what would otherwise be a smaller number of
jobs.

* Test more values for test parallelism

This creates additional CI `test-ext4-casefold` jobs, for more
values of the `--test-threads` operand passed to `cargo nextest`.

The motivation is that the value of 3 curiously seems never to
surface the failure on CI. Unlike 1, 3 should is expected to
surface the failure comparably to 2 and 4, both of which do fail
more often than not. Yet the jobs with `--test-threads=3` keep
passing.

(This is unlikely to be due to case-sensitivity misdetection, since
the assertions for when the filesystem is case-sensitive would also
fail, if they fire in a job where the filesystem is case-folding.)

If these jobs are kept, then `rep` should probably be adjusted to
have fewer, perhaps 7, values. However, this adjusts `rep` to have
16 values (i.e. twice as many as before), since the higher operands
to `--test-threads` are expected to produce the failure less often.

* Test fewer test parallelism values and reps

- Test `--test-threads` operands of 1 to 4, rather than 1 to 7.
- Do 7 reps instead of 16.

In addition to the 1000x repetition in `checkout.rs` of the
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
test case -- which almost always surfaces GitoxideLabs#2006 on CI on macOS and
Windows without further modifications -- the `test-ext4-casefold`
jobs as they currenly stand seem like they would ber sufficient to
test whether a possible future solution would fix the bug on
GNU/Linux.

This looks like about the minmum amount we should reasonably verify
still fails (in some jobs) immediately before a GitoxideLabs#2006 fix, while no
longer failing *any* jobs afterwards. But the `test-ext4-casefold`
jobs are still much more than would be wanted on the main branch.

Assuming the cause of the failures is the same on all systems, and
the solution neither explicitly nor implicitly operates differently
across systems (when comparing how it works when the filesystem is
case-insensitive, that is), it may be enough on CI to regression
test only macOS and Windows for GitoxideLabs#2006.

Thus, this also adds a TODO comment atop the `test-ext4-casefold`
definition reminding that it should not be kept, or not in full.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 19, 2025
This change (from #37), which is intended to be
temporary, sets `thread_limit: Some(1)` in the `opts_from_probe()`
test helper function in the `checkout.rs` test module, so that
tests including
`writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed`
will use single-threaded checkout. This applies no matter how the
tests are run.

In addition, for the `test-ext4-casefold` CI jobs (but not others),
this adds the `--no-default-features` option to the `cargo nextest`
command, so that the `gix-features-parallel` feature defined in
`gix-worktree-state/Cargo.toml`, which when enabled causes
`gix-features/parallel` to be enabled for the tests, is not
enabled.

The purpose of these changes it to examine if, when checkout
operations are themselves performed without multithreading, there
is any change to GitoxideLabs#2006.

This does not make the failures go away. They still occur on all
systems on which they had occurred before. It is unclear if it
makes any difference to the failures. It seems like they may be
slightly less likely to occur on GNU/Linux, but experiments so far
are insufficient to confirm this. Furthermore, even if there is an
effect, it may be that the effect is more subtle (such as possibly
shifting the number of parallel tests where failures peak, from 2
to some higher number).

The changes here should not be confused with the duplication of the
test case, nor with the argument to `--test-threads`, which
actually specifies the number of parallel test *processes*, since
`cargo nextest` is being used.

This squashes a few commits trying some minor variations. The
original messages are shown below. For full changes and CI results
from each commit, see:
#37

---

* Try to avoid parallel checkout in `test-ext4-casefold`

This attempts to turn off parallelism within individual runs of
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
by passing `--no-default-features` to turn off the
`gix-features-parallel` feature in `gix-worktree-state/Cargo.toml`
(which is how `gix-features/parallel` is enabled in those tests).

This is to check if the multithreading parallelism within the
checkout operations participates in GitoxideLabs#2006. See:
GitoxideLabs#2006 (comment)

* Patch the `thread_limit` argument to `Some(1)` too

In the `text-ext4-casefold` jobs.

* Modify the test code to use single-threaded checkout

This moves setting `thread_limit: Some(1)` out of being a step done
with `sed` in the `test-ext4-casefold` jobs, and into being an
actual change to the source code of the `checkout.rs` test module.

This change is intended to be temporary. The goal is the same as
before, but to observe the effect outside `text-ext4-casefold`.

That is, the goal here is to temporarily see if there is a change
in results in the other jobs where failure almost always occurs
due to GitoxideLabs#2006, i.e., the `test-fast` jobs on `macos-latest` and
`windows-latest`, and the `test-fixtures-windows` job.

Note that, as of this change, the `gix-features/parallel` feature
is only turned off in the `test-ext4-casefold` jobs. In others,
that feature is still turned on, just a parallelism of 1, i.e., a
single thread, is used for the checkout. But a parallelism of 1
seems usually to be special-cased to not use facilities related to
multithreading.

(This `thread_limit` argument change, as well as the immediately
preceding change of disabling the `parallel` feature flag in the
`test-ext4-casefold` test jobs, are entirely separate from how many
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
duplicated test cases there are, and also entirely separate from
the operand to `--test-threads`, which actually controls the number
of test *processes* `cargo nextest` creates to run the tests.)
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 19, 2025
To investigate GitoxideLabs#2006, this explodes the occasionally failing test
`writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed`
into 1000 identical tests, using `test_case::test_matrix`.

When run in series, they all pass, but when run in parallel with
`cargo nextest`, some failures always occur, at least when this is
done on macOS. This is as predicted in:
GitoxideLabs#2006 (comment)

A more specific result in local testing on macOS 15 is that, while
the number of failures is 0 with `--test-threads=1`, with 2 or more
threads, the number of failures tends to *decrease* with the number
of threads. The most failures occur with `--test-threads=2`, about
30 on the system tested, while `--test-threads=16` has about 10
failures.

The tests were run with this command, with any `--test-threads=<N>`
argument added:

    cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 19, 2025
To see if this can sometimes reproduce GitoxideLabs#2006 even outside macOS.

The `fail-fast: false` added here can be removed after testing.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 19, 2025
This (from #36) adds a `test-ext4-casefold` matrix job
definition to `ci.yml`, which is intended probably to be temporary,
to demonstrate for GitoxideLabs#2008 that GitoxideLabs#2006 affects GNU/Linux (in addition
to macOS and Windows), when the GNU/Linux system is using an ext4
volume with a case-insensitive directory tree, i.e., where the ext4
filesystem is created with `-O casefold` and operations are
performed in a directory within that filesystem where `chattr +F`
has been used to enable case-folding path equivalence.

The bug is nondeterministic and challenging to produce on
GNU/Linux, even when the case-folding precondition is satisfied (as
all the `test-ext4-casefold` jobs achieve). It is much harder to
produce on GNU/Linux than on macOS or Windows. Accordingly, the
number of duplicate test cases is increased 25-fold from 1000 to
25000 in those tests, and multiple jobs are created, some of them
equivalent.

As expected, the failures do not occur when the runner does not run
tests in parallel, and they do occur most of the time when the
runner runs tests in parallel with a maximum number of parallel
tests set to 2 or to 4. However, unexpectedly, the failures almost
never occur when the test are run in parallel with a maximum number
of parallel tests set to 3. It is unclear to what extent, if any,
this generalizes, but I have not observed these patterns when
testing locally, including when modifying the local procedure to be
more similar to the behavior here (such as by suppressing reporting
of non-failing tests and suppressing the progress bar). In local
testing, I am unable to produce failures on some GNU/Linux systems,
but on those on which I am able to produce them, they are easier to
produce with all values of the `--test-threads` operand, including
3.

Not all operating systems, versions, and kernel builds and options
support case-folding on ext4. But support is fairly common on
Linux-based systems, and the CI runners have no problem with that.
In contrast, although case-folding tmpfs also exists on Linux, it
is newer and not (yet) supported as widely, and the GitHub-hosted
runners do not support case-folding tmpfs. It is possible to mount
a tmpfs filesystem and create an ext4 image in it so that, in
principle, fewer operations need to access the disk. That is one of
the approaches that was tried, but it does not appear to make it
any easier to surface these failures.

This squashes multiple commits. (Mistakes, and less illuminating
experiments, have already been omitted; the commits being squashes
here are those that are expected to be of possible interest, but
whose details are almost but not quite important enough to justify
the cumbersome effect of having them individually in the history.)
The squashed commits' individual messages are shown below.

For full changes and CI results from each commit, see:
#36

---

* Add a case-folding GNU/Linux CI test job

This can likely be removed later. But it might be kept if it finds
anything interesting, especially if it helps test a bugfix. In that
case, it may make sense to have it run only tests that are of
special interest in connection with case-sensitivity.

* Case-fold `$TMPDIR` as well in CI casefold test

* Run only 2 parallel test processes in test-ext4-casefold

This seemed to make the bug *easier* to reproduce (compared to
larger values) locally, so maybe it will surface it on CI even in
GNU/Linux.

* Increase symlink experiment reps 10x in test-ext4-casefold

From 1000 (0..=999) to 10000 (0..=9999).

* Try even harder to reproduce the failure on GNU/Linux

By repeating `cargo nextext` on the affected test group, up to 20
times.

* Add `ubuntu-24.04-arm` to `test-ext4-casefold` job

In case the failure might somehow be easier to reproduce there.

* Use tmpfs casefold instead of ext4 casefold

* Use tmpfs-backed ext4 as a workaround

For when tmpfs does not suppport `-o casefold` in mounting.

This does require that ext4 support `-O casefold`, but that was
already working before.

Even though that worked before, this could fail when using an ext4
image on tmpfs, because the ext4 image is smaller than it was when
tmpfs was not used, in order to allow it to be created in memory.
While this allows it to be created, the tests may fail if build
artifacts and other files need more space than is available.

* Run only the test(s) of interest

In `test-linux-casefold`, this runs only the tests that are
duplicates of each other, where the failure has been observed,
rather than first running the whole test suite.

The reason is to try to build fewer crates and test executables,
since this is currenntly failing because it runs out of space on
the ext4 image whose size is itself constrained by the amount of
available memory for the tmpfs filesystem on which the image is
created.

* Don't attempt to use case-folding tmpfs

But check if it is reported as available and, if so, issue a GHA
warning that it should be used instead of the current way.

* Try to fail at least as often, with less overhead

This combines a few changes:

- Only report the status of tests that actually fail.
- Run the `cargo nextest` run command once, not 20 times.
- Multiply the number of test cases by another factor of 10.
- Stop after the first failing case.

* Move the ext4 image off tmpfs

This tries going back to not using tmpfs for I/O speedup, with the
idea that the increased complexity and decreased flexibility might
be possible to avoid now that other adjustments are made to surface
the failure more reliably.

* Try to fail more often, with less delay

The `test-ext4-casefold` jobs were still not surfacing the error
quite reliably enough. The wait was also longer than ideal. This
commit makes some changes that hope to improve on both.

- Decrease the number of duplicate test cases by a factor of 4.
  This has the effect that they are increased by 25x compared to
  the multiplier in the commited code, instead of 100x.

- Create 10 times as many CI jobs in the matrix, by adding a `num`
  variable whose values are those in the range 0..=9. This variable
  is not actually used, it just serves to produce more CI jobs.

- Don't use `Swatinem/rust-cache` in these jobs anymore. It is not
  obvious that it will make things work better overall, in terms of
  speedup, cache usage, and cache access, now that we are
  increasing the number of matrix jobs by a factor of 10.

Moving some of the repetition into separate jobs, which may run in
parallel, is not only, nor even primarily, to leverage runner
parallelism to do more test runs faster. Instead, it is because it
looks like some chaotic initial state may contribute to how likely
the failure is to occur. No definitive experiment on CI shows this
to be the case -- but in local testing, on some systems, I often
have runs that fail early over and over again, and then wait a
while, come back, and have many runs that don't surface the bug,
then come back later and it is easy to produce again, and so on.

* Further diversify runners

* Undiversify runners but vary test parallelism

Since varying `runs-on` didn't surface more errors, nor help smooth
out inconsistencies across runs.

The `*-arm` runners seem to have fewer errors with the particular
way the tests are running now, so this just uses `ubuntu-latest`.

This also adjusts `rep`, having it take on more values, so as to
more than make up for what would otherwise be a smaller number of
jobs.

* Test more values for test parallelism

This creates additional CI `test-ext4-casefold` jobs, for more
values of the `--test-threads` operand passed to `cargo nextest`.

The motivation is that the value of 3 curiously seems never to
surface the failure on CI. Unlike 1, 3 should is expected to
surface the failure comparably to 2 and 4, both of which do fail
more often than not. Yet the jobs with `--test-threads=3` keep
passing.

(This is unlikely to be due to case-sensitivity misdetection, since
the assertions for when the filesystem is case-sensitive would also
fail, if they fire in a job where the filesystem is case-folding.)

If these jobs are kept, then `rep` should probably be adjusted to
have fewer, perhaps 7, values. However, this adjusts `rep` to have
16 values (i.e. twice as many as before), since the higher operands
to `--test-threads` are expected to produce the failure less often.

* Test fewer test parallelism values and reps

- Test `--test-threads` operands of 1 to 4, rather than 1 to 7.
- Do 7 reps instead of 16.

In addition to the 1000x repetition in `checkout.rs` of the
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
test case -- which almost always surfaces GitoxideLabs#2006 on CI on macOS and
Windows without further modifications -- the `test-ext4-casefold`
jobs as they currenly stand seem like they would ber sufficient to
test whether a possible future solution would fix the bug on
GNU/Linux.

This looks like about the minmum amount we should reasonably verify
still fails (in some jobs) immediately before a GitoxideLabs#2006 fix, while no
longer failing *any* jobs afterwards. But the `test-ext4-casefold`
jobs are still much more than would be wanted on the main branch.

Assuming the cause of the failures is the same on all systems, and
the solution neither explicitly nor implicitly operates differently
across systems (when comparing how it works when the filesystem is
case-insensitive, that is), it may be enough on CI to regression
test only macOS and Windows for GitoxideLabs#2006.

Thus, this also adds a TODO comment atop the `test-ext4-casefold`
definition reminding that it should not be kept, or not in full.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 19, 2025
This change (from #37), which is intended to be
temporary, sets `thread_limit: Some(1)` in the `opts_from_probe()`
test helper function in the `checkout.rs` test module, so that
tests including
`writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed`
will use single-threaded checkout. This applies no matter how the
tests are run.

In addition, for the `test-ext4-casefold` CI jobs (but not others),
this adds the `--no-default-features` option to the `cargo nextest`
command, so that the `gix-features-parallel` feature defined in
`gix-worktree-state/Cargo.toml`, which when enabled causes
`gix-features/parallel` to be enabled for the tests, is not
enabled.

The purpose of these changes it to examine if, when checkout
operations are themselves performed without multithreading, there
is any change to GitoxideLabs#2006.

This does not make the failures go away. They still occur on all
systems on which they had occurred before. It is unclear if it
makes any difference to the failures. It seems like they may be
slightly less likely to occur on GNU/Linux, but experiments so far
are insufficient to confirm this. Furthermore, even if there is an
effect, it may be that the effect is more subtle (such as possibly
shifting the number of parallel tests where failures peak, from 2
to some higher number).

The changes here should not be confused with the duplication of the
test case, nor with the argument to `--test-threads`, which
actually specifies the number of parallel test *processes*, since
`cargo nextest` is being used.

This squashes a few commits trying some minor variations. The
original messages are shown below. For full changes and CI results
from each commit, see:
#37

---

* Try to avoid parallel checkout in `test-ext4-casefold`

This attempts to turn off parallelism within individual runs of
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
by passing `--no-default-features` to turn off the
`gix-features-parallel` feature in `gix-worktree-state/Cargo.toml`
(which is how `gix-features/parallel` is enabled in those tests).

This is to check if the multithreading parallelism within the
checkout operations participates in GitoxideLabs#2006. See:
GitoxideLabs#2006 (comment)

* Patch the `thread_limit` argument to `Some(1)` too

In the `text-ext4-casefold` jobs.

* Modify the test code to use single-threaded checkout

This moves setting `thread_limit: Some(1)` out of being a step done
with `sed` in the `test-ext4-casefold` jobs, and into being an
actual change to the source code of the `checkout.rs` test module.

This change is intended to be temporary. The goal is the same as
before, but to observe the effect outside `text-ext4-casefold`.

That is, the goal here is to temporarily see if there is a change
in results in the other jobs where failure almost always occurs
due to GitoxideLabs#2006, i.e., the `test-fast` jobs on `macos-latest` and
`windows-latest`, and the `test-fixtures-windows` job.

Note that, as of this change, the `gix-features/parallel` feature
is only turned off in the `test-ext4-casefold` jobs. In others,
that feature is still turned on, just a parallelism of 1, i.e., a
single thread, is used for the checkout. But a parallelism of 1
seems usually to be special-cased to not use facilities related to
multithreading.

(This `thread_limit` argument change, as well as the immediately
preceding change of disabling the `parallel` feature flag in the
`test-ext4-casefold` test jobs, are entirely separate from how many
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
duplicated test cases there are, and also entirely separate from
the operand to `--test-threads`, which actually controls the number
of test *processes* `cargo nextest` creates to run the tests.)
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 22, 2025
To investigate GitoxideLabs#2006, this explodes the occasionally failing test
`writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed`
into 1000 identical tests, using `test_case::test_matrix`.

When run in series, they all pass, but when run in parallel with
`cargo nextest`, some failures always occur, at least when this is
done on macOS. This is as predicted in:
GitoxideLabs#2006 (comment)

A more specific result in local testing on macOS 15 is that, while
the number of failures is 0 with `--test-threads=1`, with 2 or more
threads, the number of failures tends to *decrease* with the number
of threads. The most failures occur with `--test-threads=2`, about
30 on the system tested, while `--test-threads=16` has about 10
failures.

The tests were run with this command, with any `--test-threads=<N>`
argument added:

    cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 22, 2025
To see if this can sometimes reproduce GitoxideLabs#2006 even outside macOS.

The `fail-fast: false` added here can be removed after testing.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 22, 2025
This (from #36) adds a `test-ext4-casefold` matrix job
definition to `ci.yml`, which is intended probably to be temporary,
to demonstrate for GitoxideLabs#2008 that GitoxideLabs#2006 affects GNU/Linux (in addition
to macOS and Windows), when the GNU/Linux system is using an ext4
volume with a case-insensitive directory tree, i.e., where the ext4
filesystem is created with `-O casefold` and operations are
performed in a directory within that filesystem where `chattr +F`
has been used to enable case-folding path equivalence.

The bug is nondeterministic and challenging to produce on
GNU/Linux, even when the case-folding precondition is satisfied (as
all the `test-ext4-casefold` jobs achieve). It is much harder to
produce on GNU/Linux than on macOS or Windows. Accordingly, the
number of duplicate test cases is increased 25-fold from 1000 to
25000 in those tests, and multiple jobs are created, some of them
equivalent.

As expected, the failures do not occur when the runner does not run
tests in parallel, and they do occur most of the time when the
runner runs tests in parallel with a maximum number of parallel
tests set to 2 or to 4. However, unexpectedly, the failures almost
never occur when the test are run in parallel with a maximum number
of parallel tests set to 3. It is unclear to what extent, if any,
this generalizes, but I have not observed these patterns when
testing locally, including when modifying the local procedure to be
more similar to the behavior here (such as by suppressing reporting
of non-failing tests and suppressing the progress bar). In local
testing, I am unable to produce failures on some GNU/Linux systems,
but on those on which I am able to produce them, they are easier to
produce with all values of the `--test-threads` operand, including
3.

Not all operating systems, versions, and kernel builds and options
support case-folding on ext4. But support is fairly common on
Linux-based systems, and the CI runners have no problem with that.
In contrast, although case-folding tmpfs also exists on Linux, it
is newer and not (yet) supported as widely, and the GitHub-hosted
runners do not support case-folding tmpfs. It is possible to mount
a tmpfs filesystem and create an ext4 image in it so that, in
principle, fewer operations need to access the disk. That is one of
the approaches that was tried, but it does not appear to make it
any easier to surface these failures.

This squashes multiple commits. (Mistakes, and less illuminating
experiments, have already been omitted; the commits being squashes
here are those that are expected to be of possible interest, but
whose details are almost but not quite important enough to justify
the cumbersome effect of having them individually in the history.)
The squashed commits' individual messages are shown below.

For full changes and CI results from each commit, see:
#36

---

* Add a case-folding GNU/Linux CI test job

This can likely be removed later. But it might be kept if it finds
anything interesting, especially if it helps test a bugfix. In that
case, it may make sense to have it run only tests that are of
special interest in connection with case-sensitivity.

* Case-fold `$TMPDIR` as well in CI casefold test

* Run only 2 parallel test processes in test-ext4-casefold

This seemed to make the bug *easier* to reproduce (compared to
larger values) locally, so maybe it will surface it on CI even in
GNU/Linux.

* Increase symlink experiment reps 10x in test-ext4-casefold

From 1000 (0..=999) to 10000 (0..=9999).

* Try even harder to reproduce the failure on GNU/Linux

By repeating `cargo nextext` on the affected test group, up to 20
times.

* Add `ubuntu-24.04-arm` to `test-ext4-casefold` job

In case the failure might somehow be easier to reproduce there.

* Use tmpfs casefold instead of ext4 casefold

* Use tmpfs-backed ext4 as a workaround

For when tmpfs does not suppport `-o casefold` in mounting.

This does require that ext4 support `-O casefold`, but that was
already working before.

Even though that worked before, this could fail when using an ext4
image on tmpfs, because the ext4 image is smaller than it was when
tmpfs was not used, in order to allow it to be created in memory.
While this allows it to be created, the tests may fail if build
artifacts and other files need more space than is available.

* Run only the test(s) of interest

In `test-linux-casefold`, this runs only the tests that are
duplicates of each other, where the failure has been observed,
rather than first running the whole test suite.

The reason is to try to build fewer crates and test executables,
since this is currenntly failing because it runs out of space on
the ext4 image whose size is itself constrained by the amount of
available memory for the tmpfs filesystem on which the image is
created.

* Don't attempt to use case-folding tmpfs

But check if it is reported as available and, if so, issue a GHA
warning that it should be used instead of the current way.

* Try to fail at least as often, with less overhead

This combines a few changes:

- Only report the status of tests that actually fail.
- Run the `cargo nextest` run command once, not 20 times.
- Multiply the number of test cases by another factor of 10.
- Stop after the first failing case.

* Move the ext4 image off tmpfs

This tries going back to not using tmpfs for I/O speedup, with the
idea that the increased complexity and decreased flexibility might
be possible to avoid now that other adjustments are made to surface
the failure more reliably.

* Try to fail more often, with less delay

The `test-ext4-casefold` jobs were still not surfacing the error
quite reliably enough. The wait was also longer than ideal. This
commit makes some changes that hope to improve on both.

- Decrease the number of duplicate test cases by a factor of 4.
  This has the effect that they are increased by 25x compared to
  the multiplier in the commited code, instead of 100x.

- Create 10 times as many CI jobs in the matrix, by adding a `num`
  variable whose values are those in the range 0..=9. This variable
  is not actually used, it just serves to produce more CI jobs.

- Don't use `Swatinem/rust-cache` in these jobs anymore. It is not
  obvious that it will make things work better overall, in terms of
  speedup, cache usage, and cache access, now that we are
  increasing the number of matrix jobs by a factor of 10.

Moving some of the repetition into separate jobs, which may run in
parallel, is not only, nor even primarily, to leverage runner
parallelism to do more test runs faster. Instead, it is because it
looks like some chaotic initial state may contribute to how likely
the failure is to occur. No definitive experiment on CI shows this
to be the case -- but in local testing, on some systems, I often
have runs that fail early over and over again, and then wait a
while, come back, and have many runs that don't surface the bug,
then come back later and it is easy to produce again, and so on.

* Further diversify runners

* Undiversify runners but vary test parallelism

Since varying `runs-on` didn't surface more errors, nor help smooth
out inconsistencies across runs.

The `*-arm` runners seem to have fewer errors with the particular
way the tests are running now, so this just uses `ubuntu-latest`.

This also adjusts `rep`, having it take on more values, so as to
more than make up for what would otherwise be a smaller number of
jobs.

* Test more values for test parallelism

This creates additional CI `test-ext4-casefold` jobs, for more
values of the `--test-threads` operand passed to `cargo nextest`.

The motivation is that the value of 3 curiously seems never to
surface the failure on CI. Unlike 1, 3 should is expected to
surface the failure comparably to 2 and 4, both of which do fail
more often than not. Yet the jobs with `--test-threads=3` keep
passing.

(This is unlikely to be due to case-sensitivity misdetection, since
the assertions for when the filesystem is case-sensitive would also
fail, if they fire in a job where the filesystem is case-folding.)

If these jobs are kept, then `rep` should probably be adjusted to
have fewer, perhaps 7, values. However, this adjusts `rep` to have
16 values (i.e. twice as many as before), since the higher operands
to `--test-threads` are expected to produce the failure less often.

* Test fewer test parallelism values and reps

- Test `--test-threads` operands of 1 to 4, rather than 1 to 7.
- Do 7 reps instead of 16.

In addition to the 1000x repetition in `checkout.rs` of the
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
test case -- which almost always surfaces GitoxideLabs#2006 on CI on macOS and
Windows without further modifications -- the `test-ext4-casefold`
jobs as they currenly stand seem like they would ber sufficient to
test whether a possible future solution would fix the bug on
GNU/Linux.

This looks like about the minmum amount we should reasonably verify
still fails (in some jobs) immediately before a GitoxideLabs#2006 fix, while no
longer failing *any* jobs afterwards. But the `test-ext4-casefold`
jobs are still much more than would be wanted on the main branch.

Assuming the cause of the failures is the same on all systems, and
the solution neither explicitly nor implicitly operates differently
across systems (when comparing how it works when the filesystem is
case-insensitive, that is), it may be enough on CI to regression
test only macOS and Windows for GitoxideLabs#2006.

Thus, this also adds a TODO comment atop the `test-ext4-casefold`
definition reminding that it should not be kept, or not in full.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue May 22, 2025
This change (from #37), which is intended to be
temporary, sets `thread_limit: Some(1)` in the `opts_from_probe()`
test helper function in the `checkout.rs` test module, so that
tests including
`writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed`
will use single-threaded checkout. This applies no matter how the
tests are run.

In addition, for the `test-ext4-casefold` CI jobs (but not others),
this adds the `--no-default-features` option to the `cargo nextest`
command, so that the `gix-features-parallel` feature defined in
`gix-worktree-state/Cargo.toml`, which when enabled causes
`gix-features/parallel` to be enabled for the tests, is not
enabled.

The purpose of these changes it to examine if, when checkout
operations are themselves performed without multithreading, there
is any change to GitoxideLabs#2006.

This does not make the failures go away. They still occur on all
systems on which they had occurred before. It is unclear if it
makes any difference to the failures. It seems like they may be
slightly less likely to occur on GNU/Linux, but experiments so far
are insufficient to confirm this. Furthermore, even if there is an
effect, it may be that the effect is more subtle (such as possibly
shifting the number of parallel tests where failures peak, from 2
to some higher number).

The changes here should not be confused with the duplication of the
test case, nor with the argument to `--test-threads`, which
actually specifies the number of parallel test *processes*, since
`cargo nextest` is being used.

This squashes a few commits trying some minor variations. The
original messages are shown below. For full changes and CI results
from each commit, see:
#37

---

* Try to avoid parallel checkout in `test-ext4-casefold`

This attempts to turn off parallelism within individual runs of
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
by passing `--no-default-features` to turn off the
`gix-features-parallel` feature in `gix-worktree-state/Cargo.toml`
(which is how `gix-features/parallel` is enabled in those tests).

This is to check if the multithreading parallelism within the
checkout operations participates in GitoxideLabs#2006. See:
GitoxideLabs#2006 (comment)

* Patch the `thread_limit` argument to `Some(1)` too

In the `text-ext4-casefold` jobs.

* Modify the test code to use single-threaded checkout

This moves setting `thread_limit: Some(1)` out of being a step done
with `sed` in the `test-ext4-casefold` jobs, and into being an
actual change to the source code of the `checkout.rs` test module.

This change is intended to be temporary. The goal is the same as
before, but to observe the effect outside `text-ext4-casefold`.

That is, the goal here is to temporarily see if there is a change
in results in the other jobs where failure almost always occurs
due to GitoxideLabs#2006, i.e., the `test-fast` jobs on `macos-latest` and
`windows-latest`, and the `test-fixtures-windows` job.

Note that, as of this change, the `gix-features/parallel` feature
is only turned off in the `test-ext4-casefold` jobs. In others,
that feature is still turned on, just a parallelism of 1, i.e., a
single thread, is used for the checkout. But a parallelism of 1
seems usually to be special-cased to not use facilities related to
multithreading.

(This `thread_limit` argument change, as well as the immediately
preceding change of disabling the `parallel` feature flag in the
`test-ext4-casefold` test jobs, are entirely separate from how many
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
duplicated test cases there are, and also entirely separate from
the operand to `--test-threads`, which actually controls the number
of test *processes* `cargo nextest` creates to run the tests.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledged an issue is accepted as shortcoming to be fixed
Projects
None yet
Development

No branches or pull requests

2 participants