gh-47798: Add a subprocess.run_pipeline() API
#142080
Draft
gpshead wants to merge 12 commits into python:main from gpshead:claude/subprocess-pipe-chaining-01R27VPueru4RfRXYDsV5TmW
+1,187 −51
Conversation
Add a new run_pipeline() function to the subprocess module that enables
running multiple commands connected via pipes, similar to shell pipelines.
New API:
- run_pipeline(*commands, ...) - Run a pipeline of commands
- PipelineResult - Return type with commands, returncodes, stdout, stderr
- PipelineError - Raised when check=True and any command fails
Features:
- Supports arbitrary number of commands (minimum 2)
- capture_output, input, timeout, and check parameters like run()
- stdin= connects to first process, stdout= connects to last process
- Text mode support via text=True, encoding, errors
- All processes share a single stderr pipe for simplicity
- "pipefail" semantics: check=True fails if any command fails
Unlike run(), this function does not accept universal_newlines.
Use text=True instead.
Example:
result = subprocess.run_pipeline(
    ['cat', 'file.txt'],
    ['grep', 'pattern'],
    ['wc', '-l'],
    capture_output=True, text=True
)
Co-authored-by: Claude Opus 4.5 <[email protected]>
Document the new run_pipeline() function, PipelineResult class, and PipelineError exception in the subprocess module documentation. Includes:
- Function signature with stdin, stdout, stderr, capture_output, etc.
- Note about the shared stderr pipe and the text-mode caveat for interleaved multi-byte character sequences
- Note that universal_newlines is not supported (use text=True)
- Explanation that stdin connects to the first process, stdout to the last
- Usage examples showing basic pipelines, multi-command pipelines, input handling, and error handling with check=True
- PipelineResult attributes: commands, returncodes, returncode, stdout, stderr, and the check_returncodes() method
- PipelineError attributes: commands, returncodes, stdout, stderr, and the failed list
Co-authored-by: Claude Opus 4.5 <[email protected]>
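Based only on the attributes listed in this commit message (the final names in the PR may differ), handling a pipeline failure under check=True would look roughly like this:

import subprocess

try:
    result = subprocess.run_pipeline(
        ['cat', 'missing.txt'],
        ['wc', '-l'],
        capture_output=True, text=True, check=True,
    )
    print(result.returncodes, result.stdout)
except subprocess.PipelineError as exc:
    # Attribute names taken from the description above; illustrative only.
    print('failed commands:', exc.failed)
    print('exit statuses:', exc.returncodes)
    print('captured stderr:', exc.stderr)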
Add _communicate_streams() helper function that properly multiplexes read/write operations to prevent pipe buffer deadlocks. The helper uses selectors on POSIX and threads on Windows, similar to Popen.communicate(). This fixes potential deadlocks when large amounts of data flow through the pipeline and significantly improves performance.
Co-authored-by: Claude Opus 4.5 <[email protected]>
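For readers unfamiliar with the technique, here is a minimal, self-contained sketch of the selector-based approach on POSIX; it is not the PR's _communicate_streams() code, just an illustration of how multiplexing writes and reads in one loop avoids pipe buffer deadlocks:

import os
import selectors

def mux_sketch(stdin, input_data, outputs):
    """Write input_data to stdin while draining the file objects in outputs.

    `outputs` maps a name (e.g. 'stdout', 'stderr') to a readable binary file
    object. Returns a dict of collected bytes. Illustrative only.
    """
    sel = selectors.DefaultSelector()
    if stdin is not None:
        sel.register(stdin, selectors.EVENT_WRITE)
    for name, stream in outputs.items():
        sel.register(stream, selectors.EVENT_READ, data=name)

    collected = {name: bytearray() for name in outputs}
    view, offset = memoryview(input_data), 0

    while sel.get_map():
        for key, _events in sel.select():
            if key.fileobj is stdin:
                try:
                    offset += os.write(key.fd, view[offset:offset + 32768])
                except BrokenPipeError:
                    offset = len(view)          # reader is gone; stop writing
                if offset >= len(view):
                    sel.unregister(stdin)
                    stdin.close()
            else:
                chunk = os.read(key.fd, 32768)
                if chunk:
                    collected[key.data].extend(chunk)
                else:                           # EOF on this stream
                    sel.unregister(key.fileobj)
                    key.fileobj.close()
    return {name: bytes(buf) for name, buf in collected.items()}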
Add three tests that verify the multiplexed I/O implementation properly handles large data volumes that would otherwise cause pipe buffer deadlocks:
- test_pipeline_large_data_no_deadlock: 256KB through a 2-stage pipeline
- test_pipeline_large_data_three_stages: 128KB through a 3-stage pipeline
- test_pipeline_large_data_with_stderr: 64KB with concurrent stderr
These tests would time out or deadlock without proper multiplexing.
Co-authored-by: Claude Opus 4.5 <[email protected]>
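As an illustration of the shape of such a test (not the actual test body in the PR), the 2-stage large-data case might look roughly like this:

import subprocess
import sys
import unittest

class PipelineLargeDataSketch(unittest.TestCase):
    def test_pipeline_large_data_no_deadlock(self):
        # Streams far more data than a single pipe buffer holds, so the test
        # hangs unless reads and writes are properly multiplexed.
        data = b'x' * (256 * 1024)   # 256KB, per the commit message
        passthrough = [sys.executable, '-c',
                       'import sys, shutil; '
                       'shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)']
        result = subprocess.run_pipeline(
            passthrough, passthrough,
            input=data, capture_output=True, timeout=60,
        )
        self.assertEqual(result.stdout, data)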
Remove support for raw file descriptors in _communicate_streams(), requiring all streams to be file objects. This simplifies both the Windows and POSIX implementations by removing isinstance() checks and fd-wrapping logic. The run_pipeline() function now wraps the stderr pipe's read end with os.fdopen() immediately after creation. This change makes _communicate_streams() more compatible with Popen.communicate(), which already uses file objects, enabling potential future refactoring to share the multiplexed I/O logic.
Co-authored-by: Claude Opus 4.5 <[email protected]>
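The wrapping itself is small; roughly (illustrative, not the exact PR code):

import os

# Create the shared stderr pipe and immediately wrap its read end in a binary
# file object, so the I/O helper only ever deals with file objects, not raw fds.
stderr_read_fd, stderr_write_fd = os.pipe()
stderr_read = os.fdopen(stderr_read_fd, 'rb')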
Update the test to write 64KB to stderr from each process (128KB total) instead of just small status messages. This better tests that the multiplexed I/O handles concurrent large data on both stdout and stderr without deadlocking.
Co-authored-by: Claude Opus 4.5 <[email protected]>
The comment suggested rewriting Popen._communicate() to use non-blocking I/O on file objects now that Python 3's io module is used instead of C stdio. This is unnecessary: the current approach of using select() to detect ready fds followed by os.read()/os.write() is correct and efficient. The selector already answers "when is data ready?", so non-blocking mode would add complexity with no benefit.
Co-authored-by: Claude Opus 4.5 <[email protected]>
Extract the core selector-based I/O loop into a new _communicate_io_posix() function that is shared by both _communicate_streams_posix() (used by run_pipeline) and Popen._communicate() (used by Popen.communicate). The new function:
- Takes a pre-configured selector and output buffers
- Supports resuming via an input_offset parameter (for the Popen timeout retry)
- Returns (new_offset, completed) instead of raising TimeoutExpired
- Does not close streams (the caller decides, based on the use case)
This reduces code duplication and ensures both APIs use the same well-tested I/O multiplexing logic.
Co-authored-by: Claude Opus 4.5 <[email protected]>
Move stdin writing to a background thread in _communicate_streams_windows to avoid blocking indefinitely when writing large input to a pipeline where the subprocess doesn't consume stdin quickly. This mirrors the fix made to Popen._communicate() for Windows in commit 5b1862b (pythongh-87512).
Add test_pipeline_timeout_large_input to verify that TimeoutExpired is raised promptly when run_pipeline() is called with large input and a timeout, even when the first process is slow to consume stdin.
Co-authored-by: Claude Opus 4.5 <[email protected]>
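A minimal sketch of the approach (the function name is illustrative, not the PR's): feed stdin from a daemon thread so the main thread stays free to enforce the timeout.

import threading

def write_stdin_in_background(stdin, data, timeout):
    """Feed `data` to `stdin` from a daemon thread, waiting at most `timeout`.

    Returns True if the writer is still blocked when the timeout expires, in
    which case the caller would raise TimeoutExpired and kill the pipeline.
    """
    def feed():
        try:
            stdin.write(data)
        except BrokenPipeError:
            pass                  # the first process exited without reading
        finally:
            try:
                stdin.close()
            except BrokenPipeError:
                pass
    writer = threading.Thread(target=feed, daemon=True)
    writer.start()
    writer.join(timeout)
    return writer.is_alive()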
Apply the same fixes from Popen._communicate() to _communicate_streams_posix() for run_pipeline():
1. Handle non-byte memoryview input by casting to a byte view (pythongh-134453): non-byte memoryviews (e.g., int32 arrays) had incorrect length tracking because len() returns the element count, not the byte count. Now cast to a "b" view for correct progress tracking.
2. Handle ValueError on stdin.flush() when stdin is closed (pythongh-74389): ignore ValueError from flush() if stdin is already closed, matching the BrokenPipeError handling.
Add tests for memoryview input to run_pipeline:
- test_pipeline_memoryview_input: basic byte memoryview
- test_pipeline_memoryview_input_nonbyte: int32 array memoryview
Co-authored-by: Claude Opus 4.5 <[email protected]>
Extract common stdin preparation logic into shared helper functions used by both _communicate_streams_posix() and Popen._communicate():
- _flush_stdin(stdin): flush stdin, ignoring BrokenPipeError and ValueError (for closed files)
- _make_input_view(input_data): convert input data to a byte memoryview, handling non-byte memoryviews by casting to a "b" view
This ensures consistent behavior and makes the fixes for pythongh-134453 (memoryview) and pythongh-74389 (closed stdin) shared in one place.
Co-authored-by: Claude Opus 4.5 <[email protected]>
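A sketch of what those two helpers could look like, based on the descriptions above (the PR's actual implementations may differ):

def _flush_stdin(stdin):
    # Flush buffered input; ignore a reader that went away (BrokenPipeError)
    # or a file that is already closed (ValueError, pythongh-74389).
    try:
        stdin.flush()
    except (BrokenPipeError, ValueError):
        pass

def _make_input_view(input_data):
    # Return a byte-oriented ("b") memoryview so write offsets count bytes
    # rather than elements; fixes length tracking for e.g. int32 arrays
    # (pythongh-134453).
    return memoryview(input_data).cast("b")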
- Factor out _translate_newlines() as a module-level function, and have Popen's method delegate to it for code sharing
- Remove the rejection of the universal_newlines kwarg in run_pipeline(); treat it the same as text=True (consistent with Popen behavior)
- Use _translate_newlines() for text-mode decoding in run_pipeline() to properly handle \r\n and \r newline sequences
- Update the documentation to remove mention of the universal_newlines rejection
- Update the test to verify universal_newlines=True works like text=True
Co-authored-by: Claude <[email protected]>
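For reference, the module-level form described in the first bullet would be along these lines (consistent with the existing Popen method; shown here as a sketch):

def _translate_newlines(data, encoding, errors):
    # Decode bytes and normalize \r\n and bare \r to \n, i.e. universal
    # newlines handling for text-mode output.
    data = data.decode(encoding, errors)
    return data.replace("\r\n", "\n").replace("\r", "\n")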
merwok (Member) reviewed on Nov 29, 2025:
This raises the issue of using chatbots to contribute PRs, and the unclear copyright/licensing status of their training data. [edit: plus the need for extra careful reviews given the hallucinations]
Labels
stdlib: Standard Library Python modules in the Lib/ directory
topic-subprocess: Subprocess issues.
type-feature: A feature request or enhancement
This was a feature request from 2008. I'm undecided whether we should add this feature, but it proved a good exercise and had me review a lot of existing subprocess.py code and notice a variety of other lingering subprocess issues while doing so (thus my recent spate of smaller subprocess PRs merged today).
Summary
Read the docs in the PR for details, but it basically mirrors the run() API, just with multiple commands.

Open questions

- This matches the signaling behavior of run() upon timeout, namely SIGKILL, but applied to each process. Q: Is that rude? Should we start in a particular order so that their own SIGPIPEs propagate, and wait a bit before we signal each? "What does a shell do upon Ctrl-C?" is probably a good way to answer that. We've never allowed people to specify the signal used upon timeout in run(); we could, but it would always need to work as "use this first, wait a bit, then resort to SIGKILL" to be reliable, so it would probably turn into two parameters and I doubt it would see much use. I consider changing the signal to be a separate follow-on feature request.
- The PipelineResult and PipelineError types are somewhat similar to CompletedProcess and CalledProcessError, but also different. Should they be the same, or be subclasses? Claude preferred not, as the obvious difference is the confusion around the singular attributes and what those even mean, particularly for the error when a pipeline failed its check=True saving throw. I can still imagine subclassing CalledProcessError being handy to avoid needing to modify code, though, as I doubt people catch the generic SubprocessError base often versus just "except CalledProcessError:". I do not think it makes sense for the result.
- Naming things: would CompletedPipeline be more consistent than PipelineResult? (I'm leaning in favor of CompletedPipeline.)

Alternative ideas

I pondered using the | pipe operator itself between objects, but this is unnatural and undesirable for Popen instances themselves, as those start upon creation. Even though I could imagine allowing that to rewire their file descriptors, it would get gross and raise questions around unclear management of the mess. You want the processes started sequentially with the actual stdout->stdin chain of connections made from the start, so a run()-like API makes sense to me.

This lets people avoid using a shell. It does not offer the same flexibility a raw Popen instance does for people who need to do their own I/O multiplexing. Though given that you can provide whatever file object you want for input and output, that could still be done with this API by having your own threads feed or consume those instead of relying on capture_output, as sketched below.
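For example, something along these lines should work under the stdin=/stdout= semantics described above (illustrative; run_with_own_feeder and produce_chunks are hypothetical names, not part of the PR):

import os
import subprocess
import threading

def run_with_own_feeder(commands, produce_chunks, out_path):
    # Feed the pipeline from our own thread through a pipe and send its
    # output to a file, instead of using input= / capture_output=True.
    read_fd, write_fd = os.pipe()

    def feed():
        with os.fdopen(write_fd, 'wb') as w:
            for chunk in produce_chunks():
                w.write(chunk)

    threading.Thread(target=feed, daemon=True).start()
    with os.fdopen(read_fd, 'rb') as r, open(out_path, 'wb') as out:
        return subprocess.run_pipeline(*commands, stdin=r, stdout=out)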
What PyPI subprocess pipe options exist?
I found two PyPI packages offering something resembling assembling pipelines of subprocesses:
Written entirely by alternating between looking at the subprocess sources myself and driving a remote Claude Code for the web session, telling it what to do next, with the aid of gpshead/cpython-skills.
Co-authored-by: Claude Opus 4.5
📚 Documentation preview 📚: https://cpython-previews--142080.org.readthedocs.build/