[test] add end-of-run leak detection to the forked JVM runner#1692
Merged
Conversation
## Problem kyo-test had no automatic check for resources left open at the end of a test run. A leaked fiber, an unclosed file descriptor, or a non-daemon thread started by a test left the forked JVM dirty without any signal to the author. Two real leaks had reached the codebase undetected: - `StreamSubscription.loopPoll` did not check for cancellation between chunks while it held a large outstanding demand (`request(Long.MaxValue)`). After `cancel()` the loop kept pulling and calling `onNext` for the rest of the stream, leaving the consumer fiber running long after the subscriber was gone, and delivered a terminal `onComplete` the Reactive Streams spec forbids after `cancel()`. - Several tests in kyo-ffi, kyo-tasty, and kyo-website called `Files.list(...)` / `Files.createTempDirectory(...)` without closing the returned `DirectoryStream`, leaking a file descriptor per call. ## Solution `LeakCheck` (new, JVM-only, kyo-test/runner/jvm) runs three end-of-run probes once at `SbtRunner.done()`, after every suite in the forked JVM has finished, the one point where the fork is quiescent and holds only this run's resources: - Fiber probe: `Scheduler.get.loadAvg()` catches spinning or repeatedly-rescheduling fibers. - Thread probe: `Thread.getAllStackTraces` diffed against a baseline captured at runner construction catches raw `Thread` starts and un-shutdown executors. sbt's own `ForkMain` pool threads are registered via `LeakCheck.registerCarrierThread()` in `SbtTask.execute()` and excluded. - Descriptor probe: `/proc/self/fd` symlinks diffed against the construction-time baseline catch unclosed files, pipes, and sockets. Classpath jars, native libraries, and JVM-internal anonymous inodes are excluded. Linux-only; a no-op on macOS and Windows. `RunConfig` gains `leakCheck` (master), `leakCheckFileDescriptors`, `leakCheckThreads`, `leakCheckFibers`, and `leakCheckSockets` (all default `true`), plus an additive `leakCheckAllowlist`. `SuiteReport` carries the same fields so each suite's effective config survives serialization to the aggregation step; `SbtRunner` OR-aggregates them across the fork's suites, so a category runs if any suite enabled it. `TestRunner.constructorFailureReport` threads the resolved config into the report, so a suite whose constructor throws reports its real leak config rather than the field defaults. `StreamSubscription.loopPoll` now checks `requestChannel.closed()` at the top of each iteration and raises `Abort.fail(StreamCanceled)` on cancel, stopping emission immediately and suppressing the post-cancel `onComplete`. Tests in kyo-ffi, kyo-tasty, and kyo-website are migrated from `java.nio.file` to `kyo.Path`, which closes the underlying `DirectoryStream` in a `try/finally`. ## Notes - `leakCheck` is a no-op outside a forked test JVM; `LeakCheck.isForked` detects the fork via `sun.java.command` and skips the check in the main sbt JVM, whose baseline would include sbt's own resources. - kyo-http's process-global `NioIoDriver` scheduler fiber is in `LeakCheck.defaultAllowlist`, excused fork-wide until the network stack rewrite gives the driver a proper lifecycle. - The `sbt.ForkMain` pool threads are identified structurally by registering the carrier thread at the top of each `SbtTask.execute()`, not by a name pattern, so no test thread pool name can collide with the exclusion.
## Problem `HttpServer` closed only its listening socket on shutdown. Accepted keep-alive connections (the kind a long-lived `HttpClient` keeps pooled) were left open until a 60-second idle timer fired. A suite that holds a static `HttpClient` accumulated open sockets across iterations because the client kept the peer side pooled. `HttpClientBackend` tracked the connections it created in a hand-rolled map with the same close-on-shutdown intent but a separate implementation. ## Solution A shared `ConnectionRegistry` (new, kyo-http internal) gives client and server one track-all/close-all mechanism. `HttpServer` registers each accepted connection and closes the lot in `closeFiber`; `HttpClientBackend` registers each created connection, replacing its `allConnections` map with the same registry. `register` and `closeAll` both claim a connection by removing it from the backing set, and only the caller that wins the remove closes it, so a connection is closed exactly once even when a registration races a shutdown. `register` closes the connection itself when it observes a shutdown already in progress, so a handler interrupted right after registering never leaks its connection and `closeAll` never drops a still-open one. Double-close on the rare race is benign because `Connection.close()` is idempotent. ## Notes - This closes the connections kyo-http owns. The NIO transport separately defers a closed channel's real fd close to its idle selector's next `select()`, which nothing wakes; releasing that fd promptly is the transport's job (frozen for the kyo-net rewrite), which is why suites that hold a connection on it still opt out of socket leak detection.
…eak checks ## Problem A few suites hold a resource that the NIO transport or an external process keeps open past the end of the run. The transport defers a closed channel's real fd close to its idle selector's next `select()`, which nothing wakes, so a server listener or a pooled client connection outlives the run; a shared headless Chrome process holds its CDP socket and stdio pipes for the whole run. Each is an opaque `socket:[inode]` / `pipe:[inode]` target no allowlist substring can match. ## Solution Each affected suite disables the narrowest leak category, never the master toggle: - `BaseHttpTest`, `BaseCalibanTest`, `BasePodTest`, and `JsonRpcHttpTransportTest` disable `leakCheckSockets`: a server listener or client connection on the NIO transport whose fd close it defers. - `BaseBrowserTest` and kyo-ui's `UITest` disable `leakCheckSockets` and `leakCheckFileDescriptors`: the shared Chrome process's CDP socket and stdio pipes. Thread and fiber detection stay on for all of them, and every other module and category, sockets included, stays checked. ## Notes - These opt-outs track external or transport-owned resources, not test bugs. The kyo-net transport rewrite, which releases a channel's fd promptly on close, is the fix path that lets the socket opt-outs be removed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
kyo-test had no automatic check for resources left open at the end of a test run. A leaked fiber, an unclosed file descriptor, or a non-daemon thread started by a test left the forked JVM dirty without any signal to the author. Two real leaks had reached the codebase undetected:
StreamSubscription.loopPolldid not check for cancellation between chunks while it held a large outstanding demand (request(Long.MaxValue)). Aftercancel()the loop kept pulling and callingonNextfor the rest of the stream, leaving the consumer fiber running long after the subscriber was gone, and delivered a terminalonCompletethe Reactive Streams spec forbids aftercancel().Files.list(...)/Files.createTempDirectory(...)without closing the returnedDirectoryStream, leaking a file descriptor per call.Solution
LeakCheck(new, JVM-only, kyo-test/runner/jvm) runs three end-of-run probes once atSbtRunner.done(), after every suite in the forked JVM has finished, the one point where the fork is quiescent and holds only this run's resources:Scheduler.get.loadAvg()catches spinning or repeatedly-rescheduling fibers.Thread.getAllStackTracesdiffed against a baseline captured at runner construction catches rawThreadstarts and un-shutdown executors. sbt's ownForkMainpool threads are registered viaLeakCheck.registerCarrierThread()inSbtTask.execute()and excluded./proc/self/fdsymlinks diffed against the construction-time baseline catch unclosed files, pipes, and sockets. Classpath jars, native libraries, and JVM-internal anonymous inodes are excluded. Linux-only; a no-op on macOS and Windows.RunConfiggainsleakCheck(master),leakCheckFileDescriptors,leakCheckThreads,leakCheckFibers, andleakCheckSockets(all defaulttrue), plus an additiveleakCheckAllowlist.SuiteReportcarries the same fields so each suite's effective config survives serialization to the aggregation step;SbtRunnerOR-aggregates them across the fork's suites, so a category runs if any suite enabled it.TestRunner.constructorFailureReportthreads the resolved config into the report, so a suite whose constructor throws reports its real leak config rather than the field defaults.StreamSubscription.loopPollnow checksrequestChannel.closed()at the top of each iteration and raisesAbort.fail(StreamCanceled)on cancel, stopping emission immediately and suppressing the post-cancelonComplete.Tests in kyo-ffi, kyo-tasty, and kyo-website are migrated from
java.nio.filetokyo.Path, which closes the underlyingDirectoryStreamin atry/finally.Notes
leakCheckis a no-op outside a forked test JVM;LeakCheck.isForkeddetects the fork viasun.java.commandand skips the check in the main sbt JVM, whose baseline would include sbt's own resources.NioIoDriverscheduler fiber is inLeakCheck.defaultAllowlist, excused fork-wide until the network stack rewrite gives the driver a proper lifecycle.sbt.ForkMainpool threads are identified structurally by registering the carrier thread at the top of eachSbtTask.execute(), not by a name pattern, so no test thread pool name can collide with the exclusion.