Skip to content

Conversation

@UdjinM6
Copy link

@UdjinM6 UdjinM6 commented Nov 15, 2025

Issue being fixed or feature implemented

CI started to fail with System.IO.IOException: No space left on device for otherwise successful jobs recently. Example: https://github.com/dashpay/dash/actions/runs/19371409140/job/55429155189. Unfortunately, #6978 wasn't enough to fix the issue https://github.com/dashpay/dash/actions/runs/19380753498/job/55475384865?pr=6978#step:6:4873.

What was done?

Remove blocks folder from test datadir to free up CI disk space.

How Has This Been Tested?

https://github.com/UdjinM6/dash/actions/runs/19390408530 + see CI for this PR.

Breaking Changes

n/a

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

@UdjinM6 UdjinM6 added this to the 23.1 milestone Nov 15, 2025
@github-actions
Copy link

github-actions bot commented Nov 15, 2025

✅ No Merge Conflicts Detected

This PR currently has no conflicts with other open PRs.

@UdjinM6 UdjinM6 marked this pull request as ready for review November 15, 2025 20:54
@coderabbitai
Copy link

coderabbitai bot commented Nov 15, 2025

Walkthrough

A new boolean ci parameter (default False) was added to the public run_tests signature and is threaded from main() (passed as ci=args.ci). When ci is True, after a test passes the runner iterates per-test block directories under the test data directory (node0, node1, ...) and removes them while skipping symlinks; permission errors are logged and tolerated. The CI cleanup runs only when ci is enabled.

Sequence Diagram

sequenceDiagram
    %% Styling: subtle colored blocks for clarity
    rect rgba(200,230,255,0.25)
    participant Main as main()
    participant TestRunner as run_tests(ci)
    end

    participant TestExec as Test Execution
    participant Cleanup as CI Cleanup

    Main->>TestRunner: run_tests(..., ci=args.ci)
    TestRunner->>TestExec: execute test
    TestExec-->>TestRunner: test passes

    alt ci == True
        rect rgba(220,255,220,0.25)
        TestRunner->>Cleanup: iterate node0, node1, ...
        Cleanup->>Cleanup: skip symlinks
        Cleanup->>Cleanup: remove block directories
        Cleanup-->>Cleanup: log & ignore PermissionError
        Cleanup-->>TestRunner: cleanup done
        end
    end

    TestRunner-->>Main: return result
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify ci added to run_tests signature and passed from main().
  • Inspect directory iteration and symlink detection logic used for node0/node1 removal.
  • Confirm permission-error handling and logging behavior are correct and gated by ci.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title directly describes the main change: removing blocks folder from test datadir in CI to free up disk space, which matches the PR's core objective.
Description check ✅ Passed The description clearly explains the problem (CI disk space failures), the solution (remove blocks folder), and references testing, all of which align with the changeset modifications.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

📝 Customizable high-level summaries are now available!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide custom instructions to shape the summary (bullet lists, tables, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example:

"Create a concise high-level summary as a bullet-point list. Then include a Markdown table showing lines added and removed by each contributing author."


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
test/functional/test_runner.py (1)

626-638: Consider improving the inline comment for clarity.

The cleanup logic is correct and handles edge cases appropriately. However, the comment at line 633 could be more descriptive.

Apply this diff to clarify why symlinks are skipped:

                         if os.path.islink(blocksdir):
-                            # do nothing
+                            # Skip symlinks to avoid breaking custom test setups
                             continue
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between caf5010 and 5e40fe8.

📒 Files selected for processing (1)
  • test/functional/test_runner.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: linux64_multiprocess-test / Test source
🔇 Additional comments (3)
test/functional/test_runner.py (3)

19-19: LGTM!

The itertools import is correctly placed and necessary for the cleanup logic at line 628.


553-553: LGTM!

The ci parameter is properly added as a keyword-only argument with a safe default value of False, ensuring backward compatibility.


550-550: LGTM!

The ci parameter is correctly propagated from the command-line argument to the run_tests function.

@UdjinM6 UdjinM6 marked this pull request as draft November 16, 2025 09:33
@UdjinM6 UdjinM6 marked this pull request as ready for review November 16, 2025 10:50
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
test/functional/test_runner.py (1)

626-638: Reorder the symlink and directory checks to handle broken symlinks correctly.

The current check order can cause the loop to exit prematurely if node0/regtest/blocks is a broken symlink. Since os.path.isdir() returns False for broken symlinks, Line 630 breaks before Line 632 can verify whether it's a symlink, potentially leaving blocks from node1, node2, etc. uncleaned.

Apply this diff to check symlinks before directories:

             # Remove blocks folder from test datadir to free up CI disk space
             if ci and os.path.isdir(testdir):
                 for i in itertools.count():
                     blocksdir = f"{testdir}/node{i}/regtest/blocks"
-                    if not os.path.isdir(blocksdir):
-                        break
                     if os.path.islink(blocksdir):
                         # Skip symlinks to avoid breaking custom test setups
                         continue
+                    if not os.path.isdir(blocksdir):
+                        break
                     try:
                         shutil.rmtree(blocksdir)
                     except (OSError, PermissionError) as e:
                         logging.debug(f"Failed to remove {blocksdir}: {e}")

Overall approach looks good!

The cleanup logic effectively addresses the CI disk space issue by removing block data only after tests pass, preserving debugging data for failures. Error handling is appropriate.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e40fe8 and 8836593.

📒 Files selected for processing (1)
  • test/functional/test_runner.py (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: UdjinM6
Repo: dashpay/dash PR: 6786
File: ci/test/04_install.sh:99-101
Timestamp: 2025-08-01T07:46:37.840Z
Learning: In backport PRs like #6786, UdjinM6 prefers to defer non-critical fixes (such as shell command expansion issues) to separate commits/PRs to maintain focus on the primary backport objectives, consistent with the project's pattern of avoiding scope creep.
Learnt from: knst
Repo: dashpay/dash PR: 6692
File: src/llmq/blockprocessor.cpp:217-224
Timestamp: 2025-08-19T14:57:31.801Z
Learning: In PR #6692, knst acknowledged a null pointer dereference issue in ProcessBlock() method where LookupBlockIndex may return nullptr but is passed to gsl::not_null, and created follow-up PR #6789 to address it, consistent with avoiding scope creep in performance-focused PRs.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: linux64_multiprocess-build / Build source
  • GitHub Check: win64-build / Build source
  • GitHub Check: arm-linux-build / Build source
  • GitHub Check: mac-build / Build source
  • GitHub Check: linux64_tsan-build / Build source
  • GitHub Check: linux64_sqlite-build / Build source
  • GitHub Check: linux64_ubsan-build / Build source
  • GitHub Check: linux64-build / Build source
  • GitHub Check: linux64_fuzz-build / Build source
  • GitHub Check: linux64_nowallet-build / Build source
🔇 Additional comments (4)
test/functional/test_runner.py (4)

19-19: LGTM!

The itertools import is used appropriately for the infinite counter in the CI cleanup logic.


426-426: LGTM!

The --ci argument addition is clear and follows the existing argument patterns.


550-550: LGTM!

Parameter propagation is correct.


553-553: LGTM!

The signature update maintains backward compatibility with the ci=False default.

@knst
Copy link
Collaborator

knst commented Nov 16, 2025

System.IO.IOException: No space left on device

where exactly do you see this failure? I can't find it anywhere on CI

done_str = f"{len(test_results)}/{test_count} - {BOLD[1]}{test_result.name}{BOLD[0]}"
if test_result.status == "Passed":
logging.debug("%s passed, Duration: %s s" % (done_str, test_result.time))
# Remove blocks folder from test datadir to free up CI disk space
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why that doesn't happen with bitcoin? they have no cleanup mechanism on CI

Btw, I just noticed that all these blocks are not sparsed files, but regular files fullfilled zeroes.
So, they are compressed well when CI is done; but during a run they are eating disk space.

/** The pre-allocation chunk size for blk?????.dat files (since 0.8) */                              
static const unsigned int BLOCKFILE_CHUNK_SIZE = 0x1000000; // 16 MiB                                

3 nodes for each test - 48MiB used; 300 tests -> 10+ Gb of zeroes (probably even more)

I guess; this fix is going to work anyway; but is it a bug or expected behaviour that blk files are not sparsed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ test/functional/rpc_help.py
<succeed>
$ find /tmp/dash_func_test_5coohz_8 -ls | grep blk 
   436673  16384 -rw-------   1 knst     knst     16777216 Nov 17 02:08 /tmp/dash_func_test_5coohz_8/node0/regtest/blocks/blk00000.dat
$ du -h /tmp/dash_func_test_5coohz_8/node0/regtest/blocks/blk00000.dat
16M     /tmp/dash_func_test_5coohz_8/node0/regtest/blocks/blk00000.dat

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff --git a/src/node/blockstorage.h b/src/node/blockstorage.h
index b627c26162..46e253b8aa 100644
--- a/src/node/blockstorage.h
+++ b/src/node/blockstorage.h
@@ -39,7 +39,7 @@ static constexpr bool DEFAULT_STOPAFTERBLOCKIMPORT{false};
 static constexpr bool DEFAULT_TIMESTAMPINDEX{false};
 
 /** The pre-allocation chunk size for blk?????.dat files (since 0.8) */
-static const unsigned int BLOCKFILE_CHUNK_SIZE = 0x1000000; // 16 MiB
+static const unsigned int BLOCKFILE_CHUNK_SIZE = 0x400000; // 4 MiB

What are possible downsides of this change?
I tried to run functional tests locally and it seems as IO become a smaller limiting factor; because funcitonal tests running as 30 parallel jobs (-j30) speeded up from 195s to just 180s

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting... there is also an option -fastprune Use smaller block files and lower minimum prune height for testing purposes, so block file chunks are just 16kb each.

Copy link
Collaborator

@knst knst Nov 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pruning probably will break many functional tests, including governance's related; blockfilter, etc

But we could one more param to use here to use small blocks, something like :

    return FlatFileSeq(gArgs.GetBlocksDirPath(), "blk", 
        gArgs.GetBoolArg("-tinyblk", false) ? 0x10000 (64kB)
        : (gArgs.GetBoolArg("-fastprune", false) ? 0x4000 /* 16kb */ : BLOCKFILE_CHUNK_SIZE));

Copy link
Collaborator

@knst knst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 8836593

(whatever I commented, CI is better alive than dead)

@UdjinM6
Copy link
Author

UdjinM6 commented Nov 16, 2025

System.IO.IOException: No space left on device

where exactly do you see this failure? I can't find it anywhere on CI

job results
https://github.com/dashpay/dash/actions/runs/19371409140/job/55429155189
Screenshot 2025-11-16 at 22 29 04

summary
https://github.com/dashpay/dash/actions/runs/19371409140
Screenshot 2025-11-16 at 22 30 34

why that doesn't happen with bitcoin? they have no cleanup mechanism on CI

I think there are few reasons why this happens to us - we have additional tests with may nodes (llmq), we also generate more logs.

@UdjinM6 UdjinM6 marked this pull request as draft November 16, 2025 22:09
@UdjinM6 UdjinM6 removed this from the 23.1 milestone Nov 16, 2025
@UdjinM6
Copy link
Author

UdjinM6 commented Nov 16, 2025

closing in fav of #6986

@UdjinM6 UdjinM6 closed this Nov 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants