# Parallel Window Rendering (HDF5)
David Young edited this page Apr 30, 2025
This feature provides finer-grained parallelism within the HDF5 output generation process for a single receiver. When rendering a specific time window of IQ data for the HDF5 file (renderWindow function), if certain heuristic criteria are met (e.g., a sufficient number of signal contributions or 'responses' overlap within that window), the processing of these individual response contributions (processResponse) is parallelized. Tasks, each handling one or more responses, are enqueued onto the same shared thread pool used for parallel receiver rendering. The results from these parallel tasks are then collected and combined to form the final IQ data for that window segment.
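As a rough illustration of this dispatch logic, the sketch below mirrors the description above: below a heuristic threshold the window is rendered serially, otherwise one task per response is enqueued and the results are combined under a mutex. All names and the threshold value are illustrative, and `std::async` stands in for the shared `pool::ThreadPool`; this is not the FERS implementation.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <future>
#include <mutex>
#include <vector>

using Iq = std::complex<double>;

// Hypothetical stand-in for one signal contribution ("response") to a window.
struct Response {
    double amplitude;
    // Stand-in for the Response::renderBinary call chain: produce this
    // response's IQ samples for the window.
    std::vector<Iq> render(std::size_t nSamples) const {
        return std::vector<Iq>(nSamples, Iq{amplitude, 0.0});
    }
};

// Sketch of the renderWindow dispatch: serial below a heuristic threshold,
// otherwise one task per response, combined into the shared buffer under
// a mutex (playing the role of window_mutex / local_window).
std::vector<Iq> renderWindow(const std::vector<Response>& responses,
                             std::size_t nSamples,
                             std::size_t parallelThreshold = 4) {
    std::vector<Iq> localWindow(nSamples, Iq{0.0, 0.0});
    if (responses.size() < parallelThreshold) {  // serial fallback
        for (const auto& r : responses) {
            const std::vector<Iq> c = r.render(nSamples);
            for (std::size_t i = 0; i < nSamples; ++i) localWindow[i] += c[i];
        }
        return localWindow;
    }
    std::mutex windowMutex;  // plays the role of window_mutex
    std::vector<std::future<void>> futures;
    for (const auto& r : responses) {
        futures.push_back(std::async(std::launch::async, [&, nSamples] {
            const std::vector<Iq> c = r.render(nSamples);   // lock-free render
            std::lock_guard<std::mutex> lock(windowMutex);  // serialized combine
            for (std::size_t i = 0; i < nSamples; ++i) localWindow[i] += c[i];
        }));
    }
    for (auto& f : futures) f.get();  // wait for all tasks to complete
    return localWindow;
}
```

The serial path doubles as a reference implementation for equivalence testing, since both paths accumulate the same per-response contributions.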
- **Thread-Safe Reads:** Assumes that the functions called within the parallelized task (`Response::renderBinary` and its subsequent calls such as `RadarSignal::render` and `Signal::render`) are inherently thread-safe with respect to read operations. This relies on the critical assumption that the underlying data structures they access (e.g., `_wave`, `_signal`, `_power`, `_size`, `_rate`, `Signal::_data`, and especially shared tables like `InterpFilter::_filter_table`) are effectively immutable (read-only) during this parallel processing phase.
- **Filter Table Initialization:** Specifically assumes that `InterpFilter::_filter_table` is fully initialized before any parallel task that might use it begins execution, and is not modified thereafter during the parallel phase.
- **Heuristic Accuracy:** Assumes the heuristic check (based on the number of responses and available threads) correctly identifies situations where parallel processing is genuinely beneficial and safe to activate, avoiding excessive overhead or issues such as pool exhaustion.
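The filter-table initialization assumption can be made to hold by construction rather than by convention. A minimal sketch using `std::call_once` follows; the class name, table size, and coefficients are illustrative stand-ins, not the actual FERS `InterpFilter` API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <mutex>
#include <vector>

// Guard the shared table with std::call_once: the first caller builds it,
// and every later caller (including parallel tasks) sees the completed,
// read-only table. This makes "fully initialized before any parallel task"
// true regardless of which thread arrives first.
class InterpFilterTable {
public:
    // Thread-safe accessor; builds the table exactly once.
    static const std::vector<double>& get() {
        std::call_once(initFlag_, [] {
            table_.resize(64);
            for (std::size_t i = 0; i < table_.size(); ++i)
                table_[i] = std::sin(static_cast<double>(i));  // placeholder coefficients
        });
        return table_;  // immutable after initialization
    }

private:
    static std::once_flag initFlag_;
    static std::vector<double> table_;
};

std::once_flag InterpFilterTable::initFlag_;
std::vector<double> InterpFilterTable::table_;
```

Alternatively, eagerly building the table during (single-threaded) setup before the pool starts achieves the same guarantee without per-access synchronization.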
- **Nested Parallelism Risk:** This feature uses the same global thread pool that may already be busy rendering other receivers in parallel. This creates nested parallelism (tasks submitting sub-tasks to the same pool). While mitigated by the heuristic check, this can lead to thread pool exhaustion (if all threads become busy waiting for sub-tasks), complex scheduling interactions, or even deadlocks in intricate scenarios if not carefully managed.
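One defensive pattern against this risk is to consult the pool's occupancy before nesting, and fall back to serial rendering in the caller's thread when too few workers are idle. The sketch below is hedged: `PoolStats`, its fields, and the thresholds are hypothetical, and FERS's actual heuristic and `pool::ThreadPool` interface may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of pool occupancy (not the FERS pool API).
struct PoolStats {
    std::size_t totalThreads;
    std::size_t busyThreads;
};

// Guard against nested-parallelism exhaustion: parallelize a window only
// when there are both enough responses to amortize task overhead and
// enough idle workers that nested sub-tasks cannot starve the pool.
bool shouldParallelizeWindow(std::size_t responseCount, const PoolStats& stats,
                             std::size_t minResponses = 8) {
    const std::size_t idle = stats.totalThreads - stats.busyThreads;
    return responseCount >= minResponses && idle >= 2;
}
```

A guard like this cannot eliminate deadlock by itself (occupancy can change after the check); pools that support work-stealing or running sub-tasks inline while waiting are the more robust fix.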
- **Mutex Contention:** When combining the results calculated by the parallel tasks for a window, a mutex (`window_mutex`) is used to protect the shared output buffer (`local_window`). For time windows containing a very large number of overlapping signal responses processed in parallel, contention on this single mutex can become a bottleneck, limiting the scalability and performance benefits of the parallel approach.
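A common way to ease this kind of contention is to trade per-response locking for per-worker partial buffers that are merged once at the end, so the number of synchronization points scales with the worker count rather than the response count. The sketch below uses illustrative names, takes pre-rendered responses as input for brevity, and uses `std::async` in place of the shared pool.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <future>
#include <vector>

using Iq = std::complex<double>;

// Chunked combine: each worker accumulates a private partial window over
// its slice of responses (lock-free), and the few partials are merged
// sequentially at the end, instead of taking window_mutex once per response.
std::vector<Iq> renderWindowChunked(const std::vector<std::vector<Iq>>& rendered,
                                    std::size_t nSamples, std::size_t nWorkers) {
    std::vector<std::future<std::vector<Iq>>> partials;
    const std::size_t chunk = (rendered.size() + nWorkers - 1) / nWorkers;
    for (std::size_t w = 0; w < nWorkers; ++w) {
        partials.push_back(std::async(std::launch::async, [&, w] {
            std::vector<Iq> acc(nSamples, Iq{0.0, 0.0});
            const std::size_t begin = w * chunk;
            const std::size_t end = std::min(begin + chunk, rendered.size());
            for (std::size_t r = begin; r < end; ++r)
                for (std::size_t i = 0; i < nSamples; ++i)
                    acc[i] += rendered[r][i];  // private buffer: no locking
            return acc;
        }));
    }
    std::vector<Iq> window(nSamples, Iq{0.0, 0.0});
    for (auto& p : partials) {  // one merge per worker, not per response
        const std::vector<Iq> acc = p.get();
        for (std::size_t i = 0; i < nSamples; ++i) window[i] += acc[i];
    }
    return window;
}
```

The trade-off is extra memory (one partial window per worker), which matters for large windows.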
- `receiver_export.cpp::renderWindow` (function orchestrating window rendering; decides whether to use parallel processing)
- Worker lambda within `renderWindow` (defines the actual task submitted to the pool)
- `processResponse` function (called by the worker lambda to render a specific response's contribution)
- `Response::renderBinary`, `RadarSignal::render`, `Signal::render` (core functions performing the signal computation for a response)
- `pool::ThreadPool` (the shared thread pool executing the tasks)
- Synchronization primitives: `work_list_mutex` (likely managing the list of responses), `window_mutex` (protecting the shared output buffer), and `std::future` (used to wait for task completion)
- **Needs Verification:** The core assumption of thread-safe reads and data immutability within the `renderBinary` call chain needs thorough confirmation through code review or testing. The effectiveness and safety margins of the activation heuristic need assessment, and the practical impact of nested parallelism and mutex contention under various loads needs investigation.
- **Key Areas for Validation:**
- Verify the numerical correctness and bit-for-bit equivalence of the final HDF5 window data generated using the parallel method compared to a serial execution under various scenarios (few responses, many responses).
- Test for race conditions, particularly around shared resources or potentially mutable state accessed during `renderBinary` calls.
- Measure performance scaling: how does rendering time for a complex window change as the number of threads increases? Identify the point where mutex contention or other overheads dominate.
- Stress test scenarios involving both parallel receiver rendering and parallel window rendering simultaneously to check for deadlocks or excessive performance degradation due to nested parallelism and pool usage.
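For the bit-for-bit equivalence check in particular, note that floating-point addition is not associative, so serial and parallel results only match bitwise when the parallel merge order is deterministic (partials combined in a fixed order) or the inputs happen to sum exactly. A self-contained toy comparison, with illustrative names only:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Compare a serial reduction against a parallel reduction whose partials
// are merged in fixed worker order, making the parallel result
// deterministic and directly comparable bit-for-bit.
bool serialEqualsParallel(const std::vector<double>& samples,
                          std::size_t nWorkers) {
    const double serial = std::accumulate(samples.begin(), samples.end(), 0.0);
    std::vector<std::future<double>> parts;
    const std::size_t chunk = (samples.size() + nWorkers - 1) / nWorkers;
    for (std::size_t w = 0; w < nWorkers; ++w) {
        parts.push_back(std::async(std::launch::async, [&, w] {
            const std::size_t b = std::min(w * chunk, samples.size());
            const std::size_t e = std::min(b + chunk, samples.size());
            return std::accumulate(samples.begin() + b, samples.begin() + e, 0.0);
        }));
    }
    double parallel = 0.0;
    for (auto& p : parts) parallel += p.get();  // fixed merge order
    return serial == parallel;                  // bit-for-bit comparison
}
```

If the actual combine order in `renderWindow` is nondeterministic, the validation criterion may need to be a small numerical tolerance rather than exact equality.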
- Priority: Medium to High (due to the complexity of nested parallelism and potential for subtle concurrency bugs).