Merged
18 commits
9a3fb68
feat: add pathological-porcupines network test suite
acooks Dec 16, 2025
172d7f5
feat(tests): add automated screenshot capture to pathological-porcupines
acooks Dec 17, 2025
33d67e9
refactor(tests): clean up screenshot-controller debugging code
acooks Dec 17, 2025
8208dda
fix(tests): disable IPv6 on test interfaces to reduce noise
acooks Dec 17, 2025
1ecafb2
refactor(tests): use standard RTP port 5004 for RTP tests
acooks Dec 17, 2025
553e0dd
feat(tests): improve receiver-starvation test with self-checks
acooks Dec 17, 2025
08f3bc2
docs(tests): update READMEs with RFC 9293 references
acooks Dec 17, 2025
85103c2
docs(tests): document JitterTrap limitations discovered during testing
acooks Dec 17, 2025
9dcb55f
feat(infra): add pcap capture and robust process cleanup to run-test.sh
acooks Dec 19, 2025
e36889e
fix(pathological-porcupines): improve test process cleanup
acooks Dec 21, 2025
58925ac
fix(pathological-porcupines): disable TSO/GSO for accurate pcap capture
acooks Dec 21, 2025
9680c93
fix(screenshot): preserve X11/Wayland env for browser access
acooks Dec 27, 2025
8d937c1
feat(screenshot): add IPG chart and flow details capture
acooks Dec 27, 2025
dd7f45c
tune(receiver-starvation): slower receiver for clearer zero-window
acooks Dec 27, 2025
60b23e7
feat(tests): add sender-stall demonstration
acooks Dec 27, 2025
4634213
refactor: move verification tests to tests/ directory
acooks Dec 27, 2025
c9cd6c4
docs: update README paths for tests/ restructure
acooks Dec 27, 2025
bd07d04
feat(research): add tcp-flow-control investigation framework
acooks Dec 27, 2025
5 changes: 5 additions & 0 deletions .gitignore
@@ -18,3 +18,8 @@ server/test-mq
server/test-mq-mt
server/*.o
deps/toptalk/*.o

# Pathological porcupines test suite
pathological-porcupines/infra/node_modules/
pathological-porcupines/infra/package-lock.json
pathological-porcupines/screenshots/
2 changes: 2 additions & 0 deletions pathological-porcupines/.gitignore
@@ -0,0 +1,2 @@
__pycache__/
*.pyc
112 changes: 112 additions & 0 deletions pathological-porcupines/JITTERTRAP-ISSUES.md
@@ -0,0 +1,112 @@
# JitterTrap Issues Found During Testing

Issues discovered while running pathological-porcupines tests.

## TCP Zero-Window Detection

**Test:** `tcp-timing/persist-timer`
**Date:** 2025-12-16

### Issue 1: Zero Window indicator not appearing

**Observed:** The "Zero Window" triangle marker (shown in the TCP Advertised Window chart legend) does not appear during the zero-window condition, even though the window clearly drops to zero.

**Expected:** Zero Window indicator should mark the point(s) where zero-window was advertised.

**Status:** Needs investigation

---

### Issue 2: TCP Window size doesn't visually recover

**Observed:** After the zero-window stall ends and traffic resumes (visible in throughput chart), the TCP Advertised Window chart remains at/near zero instead of showing the window reopening.

**Expected:** Window should visually recover to a non-zero value when the receiver resumes reading and advertises available buffer space.

**Possible causes:**
- Display artifact (log scale making small values look like zero)
- Test ends too quickly after recovery
- Receiver buffer remains nearly full
- Chart not updating correctly after extended zero-window period

**Status:** Needs investigation

---

### Issue 3: Histogram measurement window unclear

**Observed:** The IPG and PPS distribution histograms show a more concentrated distribution than expected for a flow with a multi-second interruption. It's unclear what time window the histograms cover.

**Expected:** Clear documentation or UI indication of:
- What time window the histograms cover (full flow lifetime vs sliding window)
- How gaps with no packets are handled in IPG calculation
- Whether the histogram resets or accumulates over time

**Impact:** Makes it difficult to validate that the tool is correctly measuring flow characteristics during pathological conditions.

**Status:** Needs investigation
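
One property of inter-packet-gap histograms may be relevant while investigating: a stall of any length contributes exactly one IPG sample, so it is easily swamped by the thousands of samples from normal traffic. The sketch below illustrates this with synthetic timestamps. The packet rate, stall length, bucket edges, and function names are all assumptions chosen for illustration; they say nothing about how JitterTrap actually bins its data.

```python
from collections import Counter


def inter_packet_gaps(timestamps):
    """Return the gaps (seconds) between consecutive packet timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def bucket(gap_s):
    """Assign a gap to a coarse bucket (hypothetical bucket edges)."""
    if gap_s < 0.01:
        return "<10ms"
    if gap_s < 1.0:
        return "10ms-1s"
    return ">=1s"


def synthetic_flow(rate_pps=1000, active_s=5, stall_s=5):
    """Timestamps for: active_s of steady traffic, one stall, active_s more."""
    n = rate_pps * active_s
    first = [i / rate_pps for i in range(n)]
    resume = first[-1] + stall_s
    second = [resume + i / rate_pps for i in range(n)]
    return first + second


if __name__ == "__main__":
    gaps = inter_packet_gaps(synthetic_flow())
    histogram = Counter(bucket(g) for g in gaps)
    for name, count in histogram.items():
        print(f"{name:>8}: {count}")
```

With these assumed parameters the five-second stall is a single sample among roughly ten thousand, which is one way a histogram can look more concentrated than the flow's timeline suggests.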

---

## RST-Terminated Flows Missing from TCP Charts

**Test:** `tcp-lifecycle/rst-storm`
**Date:** 2025-12-16

### Issue: TCP RTT and Window charts empty for RST flows

**Observed:** RST-terminated connections appear in the Top Flows list but the TCP Round-Trip Time and TCP Advertised Window charts show no data points.

**Expected:** Some indication of RST flows in TCP charts, or at minimum RST markers.

**Cause:** RST connections are terminated immediately after accept (`SO_LINGER` enabled with a zero timeout), before:
- Any data exchange occurs (no RTT samples)
- Window advertisements are captured
- Meaningful TCP state is established

**Impact:** RST storms are only observable via:
- Top Flows table (shows many short-lived flows)
- Throughput chart (brief bursts)
- Flow count

**Workaround:** Use tcpdump to observe RST packets directly:
```bash
sudo ip netns exec pp-observer tcpdump -i br0 'tcp[tcpflags] & tcp-rst != 0'
```

**Status:** Known limitation - RST flows too short-lived for TCP metrics
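
For reference, the abortive close described under **Cause** can be sketched with the standard library as follows. This is an illustrative sketch, not the suite's `rst-storm` server: the helper names `linger_option`, `abortive_close`, and `demo` are hypothetical, and the loopback demo assumes a Linux TCP stack.

```python
import socket
import struct


def linger_option(enabled: bool, timeout_s: int) -> bytes:
    """Pack a `struct linger` value (l_onoff, l_linger) for setsockopt()."""
    return struct.pack("ii", int(enabled), timeout_s)


def abortive_close(conn: socket.socket) -> None:
    """Close a TCP socket so the kernel sends RST instead of FIN."""
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger_option(True, 0))
    conn.close()


def demo() -> str:
    """Accept one loopback connection, abort it, report what the client sees."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        listener.listen(1)
        client = socket.create_connection(listener.getsockname())
        conn, _ = listener.accept()
        abortive_close(conn)  # no application data is exchanged before the reset
        try:
            client.recv(1)
            return "closed cleanly"
        except ConnectionResetError:
            return "connection reset"
        finally:
            client.close()


if __name__ == "__main__":
    print(demo())
```

Because the connection is reset before any payload is sent, a passive observer sees only the handshake and the RST, which is consistent with the empty RTT and window charts.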

---

## RTP Detection Requires Standard Ports

**Test:** `rtp/rtp-sequence-gap`, `rtp/rtp-jitter-spike`
**Date:** 2025-12-16

### Issue: RTP flows shown as generic UDP

**Observed:** RTP test traffic on port 9999 was classified as "UDP" instead of "RTP" in the JitterTrap flow table. No RTP-specific metrics (sequence tracking, jitter calculation) were applied.

**Expected:** JitterTrap should detect RTP traffic and show RTP-specific metrics.

**Cause:** JitterTrap likely uses port-based heuristics to detect RTP traffic. Port 9999 is not a standard RTP port.

**Fix:** Changed RTP tests to use port 5004 (standard RTP data port per RFC 3551).

**Note:** If JitterTrap still doesn't detect RTP, it may require:
- Deep packet inspection of RTP headers
- Configuration to specify which ports/flows are RTP
- A conventional RTP port range (for example, even ports 16384-32767)

**Status:** Port changed to 5004 - retest needed
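
As a point of comparison for the header-inspection option in the note above, a minimal RTP plausibility check over a UDP payload might look like the sketch below. It is not JitterTrap's detector and not the suite's `common/protocol.py`; `RtpHeader` and `try_parse_rtp` are hypothetical names, and only the fixed 12-byte header from RFC 3550 is examined.

```python
import struct
from typing import NamedTuple, Optional

RTP_HEADER = struct.Struct("!BBHII")  # fixed 12-byte RTP header (RFC 3550)


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def try_parse_rtp(payload: bytes) -> Optional[RtpHeader]:
    """Return header fields if the payload plausibly starts with RTP, else None."""
    if len(payload) < RTP_HEADER.size:
        return None
    b0, b1, seq, ts, ssrc = RTP_HEADER.unpack_from(payload)
    version = b0 >> 6            # top two bits of the first byte
    payload_type = b1 & 0x7F     # low seven bits; the top bit is the marker
    if version != 2:
        return None
    # RTCP packet types 200-204 look like payload types 72-76 with the
    # marker bit set, so treat those values as "not RTP data".
    if 72 <= payload_type <= 76:
        return None
    return RtpHeader(version, payload_type, seq, ts, ssrc)
```

A check this shallow produces false positives on arbitrary UDP traffic, so a real detector would probably also track sequence-number continuity per SSRC over several packets.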

---

## Notes

The TCP zero-window issues above were discovered using the `tcp-timing/persist-timer` test, which creates a TCP flow with:
- 5 seconds normal traffic
- 5 seconds zero-window stall (receiver stops reading)
- Recovery when receiver resumes reading

The test correctly demonstrates the pathology, but JitterTrap's visualization of these conditions needs review.
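
The three phases listed above can be approximated by a receiver along the following lines. This is a hedged sketch rather than the test's actual `server.py`: the function name `stalling_receiver`, its parameters, and the chunk size are assumptions made for illustration.

```python
import socket
import time


def stalling_receiver(conn, read_s=5.0, stall_s=5.0, chunk=65536):
    """Read for read_s, stop reading for stall_s, then drain until EOF.

    Returns the total number of bytes received.
    """
    total = 0
    deadline = time.monotonic() + read_s
    conn.settimeout(0.1)
    while time.monotonic() < deadline:      # phase 1: normal reading
        try:
            data = conn.recv(chunk)
        except socket.timeout:
            continue
        if not data:                        # peer closed early
            return total
        total += len(data)
    # Phase 2: stop reading. If the sender keeps writing, the receive
    # buffer fills and the advertised window shrinks toward zero.
    time.sleep(stall_s)
    conn.settimeout(None)
    while data := conn.recv(chunk):         # phase 3: drain, window reopens
        total += len(data)
    return total
```

Called on an accepted TCP connection with the defaults, this gives roughly five seconds of reading followed by a five-second stall; shorter values are convenient for quick local checks.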
244 changes: 244 additions & 0 deletions pathological-porcupines/README.md
@@ -0,0 +1,244 @@
# Pathological Porcupines

Network application failure simulations for educational demonstrations and JitterTrap tool qualification testing. Uses only the Python 3.10+ standard library.

## Quick Start

```bash
# Create test topology (one-time setup)
sudo ./infra/setup-topology.sh

# Run a test - automatically opens browser to JitterTrap UI
sudo ./infra/run-test.sh tcp-timing/persist-timer

# Clean up when done
sudo ./infra/teardown-topology.sh
```

## Test Network Topology

```
┌─────────────┐      veth-src      ┌───────────────────┐      veth-dst      ┌─────────────┐
│   SOURCE    │◄──────────────────►│     OBSERVER      │◄──────────────────►│ DESTINATION │
│  pp-source  │                    │    pp-observer    │                    │   pp-dest   │
│  10.0.1.1   │                    │   br0 (bridge)    │                    │  10.0.1.2   │
│             │                    │     veth-mgmt     │                    │             │
│  sender.py  │                    │    (10.0.0.2)     │                    │ receiver.py │
│  client.py  │                    │    JitterTrap     │                    │  server.py  │
└─────────────┘                    └───────────────────┘                    └─────────────┘
                                             │ veth-host (10.0.0.1)
                                    ┌─────────────────┐
                                    │      HOST       │
                                    │    Browser →    │
                                    │  10.0.0.2:8080  │
                                    └─────────────────┘
```

- **Source (pp-source)**: Runs sender/client scripts (10.0.1.1)
- **Observer (pp-observer)**: Runs JitterTrap, bridges traffic (10.0.0.2 for UI)
- **Destination (pp-dest)**: Runs receiver/server scripts (10.0.1.2)

## Implemented Pathologies

| Category | Pathology | Description | JitterTrap Observable |
|----------|-----------|-------------|----------------------|
| TCP Flow Control | [receiver-starvation](tests/tcp-flow-control/receiver-starvation/) | Slow receiver causes zero-window | Zero-window events |
| TCP Flow Control | [silly-window-syndrome](tests/tcp-flow-control/silly-window-syndrome/) | Tiny segments from small windows | Small packet sizes |
| TCP Timing | [nagle-delayed-ack](tests/tcp-timing/nagle-delayed-ack/) | 40-200ms latency from Nagle/delayed ACK | RTT histogram spikes |
| TCP Timing | [persist-timer](tests/tcp-timing/persist-timer/) | Zero-window probes with exponential backoff | IPG gaps at 5s, 10s intervals |
| TCP Timing | [sender-stall](tests/tcp-timing/sender-stall/) | Application pauses sending, varying gaps | IPG gaps with healthy window |
| TCP Lifecycle | [rst-storm](tests/tcp-lifecycle/rst-storm/) | Abrupt connection termination with RST | RST flags in flow details |
| UDP | [bursty-sender](tests/udp/bursty-sender/) | Bimodal inter-packet gap distribution | IPG histogram with two peaks |
| RTP/Media | [rtp-jitter-spike](tests/rtp/rtp-jitter-spike/) | Periodic large jitter in media stream | Jitter outliers >100ms |
| RTP/Media | [rtp-sequence-gap](tests/rtp/rtp-sequence-gap/) | Packet loss via sequence discontinuities | seq_loss counter |

## Project Structure

```
pathological-porcupines/
├── infra/                       # Test infrastructure
│   ├── setup-topology.sh        # Create 3-namespace topology
│   ├── teardown-topology.sh     # Remove topology
│   ├── run-test.sh              # Orchestrate test with JitterTrap
│   ├── add-impairment.sh        # Apply tc/netem impairments
│   ├── set-mtu.sh               # Configure MTU
│   └── screenshot-controller.js # Automated screenshot capture
├── common/                      # Shared Python utilities
│   ├── network.py               # Socket creation helpers
│   ├── timing.py                # Rate limiting, burst timers
│   ├── protocol.py              # RTP packet building/parsing
│   └── logging_utils.py         # Logging setup
├── tests/                       # Verification test scenarios
│   ├── tcp-flow-control/        # Window/buffer pathologies
│   ├── tcp-timing/              # Timer-related issues
│   ├── tcp-lifecycle/           # Connection state issues
│   ├── udp/                     # UDP pathologies
│   └── rtp/                     # RTP/media stream issues
└── research/                    # Parameter sweep experiments (untracked)
    └── topics/
        └── tcp-flow-control/    # TCP diagnostic research (2,587 experiments)
```

## Requirements

- Python 3.10+ (standard library only, no pip install needed)
- Linux with network namespace support
- Root/sudo for namespace and network configuration
- JitterTrap for visualization

## Infrastructure Scripts

| Script | Purpose |
|--------|---------|
| `infra/setup-topology.sh` | Create namespaces, veth pairs, and L2 bridge |
| `infra/teardown-topology.sh` | Remove all namespaces and interfaces |
| `infra/run-test.sh <path>` | Run a test with JitterTrap orchestration |
| `infra/add-impairment.sh <profile>` | Apply network impairment (wan, lossy, etc.) |
| `infra/set-mtu.sh <mtu>` | Set MTU on test interfaces |
| `infra/cleanup-processes.sh` | Kill orphaned test processes (tcpdump, jt-server, python) |

### Impairment Profiles

```bash
# Apply WAN-like delay
sudo ./infra/add-impairment.sh wan

# Apply packet loss
sudo ./infra/add-impairment.sh lossy

# Custom impairment
sudo ./infra/add-impairment.sh custom delay 100ms loss 2%

# Clear impairments
sudo ./infra/add-impairment.sh clean
```

## Running Tests

### With Test Runner (Recommended)

The test runner handles JitterTrap startup, waits for you to connect, then runs the test:

```bash
# Basic execution - opens browser, prompts before starting test
sudo ./infra/run-test.sh tcp-timing/persist-timer

# Auto-start mode (no prompt, starts after 5s)
sudo ./infra/run-test.sh tcp-timing/persist-timer --auto

# With network impairment
sudo ./infra/run-test.sh udp/bursty-sender --impairment wan

# Skip JitterTrap (for debugging)
sudo ./infra/run-test.sh rtp/rtp-jitter-spike --no-jittertrap

# Don't auto-open browser (just print URL)
sudo ./infra/run-test.sh tcp-timing/persist-timer --no-browser

# Reset network config after test
sudo ./infra/run-test.sh tcp-lifecycle/rst-storm --reset
```

### Manual Execution

For more control, run components separately in different terminals:

```bash
# Terminal 1: Start JitterTrap in observer namespace
sudo ip netns exec pp-observer jt-server --allowed veth-src:veth-dst -p 8080
# Open http://10.0.0.2:8080 in browser

# Terminal 2: Start server/receiver in destination namespace
sudo ip netns exec pp-dest python3 tests/tcp-timing/persist-timer/server.py --port 9999

# Terminal 3: Start client/sender in source namespace
sudo ip netns exec pp-source python3 tests/tcp-timing/persist-timer/client.py --host 10.0.1.2 --port 9999
```

Each pathology directory contains:
- `README.md` - Detailed explanation, usage, and expected output
- `server.py` or `receiver.py` - Destination component
- `client.py` or `sender.py` - Source component

## Observing with JitterTrap

1. Create topology: `sudo ./infra/setup-topology.sh`
2. Start JitterTrap in observer namespace:
   ```bash
   sudo ip netns exec pp-observer jt-server --allowed veth-src:veth-dst -p 8080
   ```
3. Open http://10.0.0.2:8080 in browser
4. Select observation interface (veth-src or veth-dst)
5. Run a pathology test
6. Watch metrics:
   - **IPG histogram** - Inter-packet gap distribution
   - **Jitter** - RFC 3550 jitter calculation
   - **Window size** - TCP flow control
   - **Packet size** - Fragmentation and SWS
   - **Flags** - RST, FIN, zero-window events
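
For readers unfamiliar with the jitter metric, the RFC 3550 interarrival jitter estimator can be sketched as below. This is a simplified illustration and not JitterTrap's implementation: the function name and the 8000 Hz default clock rate are assumptions, and RTP timestamp wraparound is ignored.

```python
def rfc3550_jitter(arrivals, rtp_timestamps, clock_rate=8000):
    """Running interarrival jitter estimate, in RTP timestamp units.

    arrivals: receiver arrival times in seconds.
    rtp_timestamps: RTP timestamps taken from the packet headers.
    """
    jitter = 0.0
    prev_transit = None
    for arrival_s, rtp_ts in zip(arrivals, rtp_timestamps):
        # Relative transit time: arrival expressed in timestamp units,
        # minus the sender's RTP timestamp.
        transit = arrival_s * clock_rate - rtp_ts
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0   # 1/16 smoothing gain from RFC 3550
        prev_transit = transit
    return jitter
```

Dividing the result by `clock_rate` converts it to seconds. Packets that arrive with exactly the spacing implied by their timestamps leave the estimate at zero, so spikes stand out against a flat baseline.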

## Self-Checking Tests

All tests include self-check assertions that verify the expected pathology occurred:

```
Self-check results:
[PASS] Zero-window detected: 2 block event(s)
[PASS] Persist timer triggered: 2 block(s) >= 4s
[PASS] Data sent: 98.5 KB
```

Exit codes: 0 = pass, 1 = fail
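
A self-check reporter producing output in this shape could be sketched as follows. The `Check` dataclass, the `report` function, and the sample measurements are hypothetical and are not taken from the test scripts; the sketch only mirrors the output format and exit-code convention shown above.

```python
import sys
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    passed: bool
    detail: str = ""


def report(checks) -> int:
    """Print PASS/FAIL lines and return an exit code (0 = pass, 1 = fail)."""
    print("Self-check results:")
    for check in checks:
        status = "PASS" if check.passed else "FAIL"
        suffix = f": {check.detail}" if check.detail else ""
        print(f"[{status}] {check.name}{suffix}")
    return 0 if all(c.passed for c in checks) else 1


if __name__ == "__main__":
    blocked_events = 2      # hypothetical measurements from a test run
    bytes_sent = 100_864
    sys.exit(report([
        Check("Zero-window detected", blocked_events > 0,
              f"{blocked_events} block event(s)"),
        Check("Data sent", bytes_sent > 0, f"{bytes_sent / 1024:.1f} KB"),
    ]))
```

Returning the exit code from `report` rather than calling `sys.exit` inside it keeps the function easy to exercise from other scripts.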

## Troubleshooting

### Cleaning Up Orphaned Processes

If tests are interrupted or crash, processes may be left running. Use the cleanup script:

```bash
# List orphaned processes (dry run)
sudo ./infra/cleanup-processes.sh --list

# Kill orphaned processes
sudo ./infra/cleanup-processes.sh
```

The cleanup script finds and kills:
- `tcpdump` processes in the observer namespace
- `jt-server` processes in the observer namespace
- Python test processes in source/dest namespaces
- Any `jt-server` running outside namespaces

### Port Conflicts

If you see "Address already in use" errors:
```bash
# Check what's using the port
sudo ss -tlnp | grep :9999

# Clean up stale processes
sudo ./infra/cleanup-processes.sh
```

### Namespace Issues

If namespace operations fail:
```bash
# Remove all test infrastructure
sudo ./infra/teardown-topology.sh

# Recreate fresh
sudo ./infra/setup-topology.sh
```

## Documentation

See [pathological-porcupines.md](../pathological-porcupines.md) for the complete
catalog of all 61 planned pathologies across 12 categories.

## License

Part of JitterTrap - see main project for license.
24 changes: 24 additions & 0 deletions pathological-porcupines/common/__init__.py
@@ -0,0 +1,24 @@
"""
Pathological Porcupines - Common Utilities

Shared utilities for network pathology simulations.
All modules use only Python 3.10+ standard library.
"""

from .network import create_tcp_socket, create_udp_socket, parse_address
from .timing import rate_limiter, sleep_until, monotonic_ns
from .protocol import RTPPacket, parse_rtp_header
from .logging_utils import setup_logging, get_logger

__all__ = [
    'create_tcp_socket',
    'create_udp_socket',
    'parse_address',
    'rate_limiter',
    'sleep_until',
    'monotonic_ns',
    'RTPPacket',
    'parse_rtp_header',
    'setup_logging',
    'get_logger',
]