
[bug] Sequential Processing due to Connection Pool Limits #2594

@justinthelaw

Description

Connection Pool Limits Cause Sequential Processing Instead of Concurrent Execution

Summary

BAML appears to have connection pool limits that cause high-concurrency requests to be processed sequentially rather than concurrently, despite correct usage of asyncio.gather(). This manifests as a distinctive timing pattern where requests complete in sequential batches rather than truly in parallel.

Environment

  • BAML Version: 0.208.5 (latest as of issue creation: 0.211.0)
  • Python Version: 3.12.5
  • OS: macOS
  • Usage Pattern: 20+ concurrent requests via asyncio.gather()

Issue Details

Expected Behavior

When making multiple concurrent BAML calls with asyncio.gather(), requests should execute in parallel with completion times distributed based on actual API response times.

Actual Behavior

Requests are processed in sequential batches (~6 at a time), creating this pattern:

  1. First ~6 requests: Complete sequentially with 1.5-2s gaps between each
  2. Sudden burst: 6+ requests complete within milliseconds of each other
  3. Pattern repeats: Indicating connection pool cycling rather than true concurrency

Evidence from Production Logs

Sequential Processing Phase:

2025-10-08 17:22:26,888 INFO httpx HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
2025-10-08 17:22:28,519 INFO httpx HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"  [Gap: 1.631s]
2025-10-08 17:22:30,228 INFO httpx HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"  [Gap: 1.709s] 
2025-10-08 17:22:31,689 INFO httpx HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"  [Gap: 1.461s]
2025-10-08 17:22:33,466 INFO httpx HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"  [Gap: 1.777s]
2025-10-08 17:22:35,298 INFO httpx HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"  [Gap: 1.832s]

Then Sudden Concurrent Burst:

2025-10-08 17:22:47,930 INFO httpx HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
2025-10-08 17:22:47,930 INFO httpx HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"  [Gap: 0ms]
2025-10-08 17:22:47,931 INFO httpx HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"  [Gap: 1ms]
2025-10-08 17:22:47,931 INFO httpx HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"  [Gap: 0ms]
2025-10-08 17:22:47,932 INFO httpx HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"  [Gap: 1ms]
2025-10-08 17:22:47,933 INFO httpx HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"  [Gap: 1ms]
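
For reference, the [Gap: …] annotations can be recomputed from the httpx log timestamps with a short parsing sketch like the one below (the regex assumes the default Python logging timestamp format shown in the excerpts):

import re
from datetime import datetime

# Matches httpx completion lines like the excerpts above, e.g.
# "2025-10-08 17:22:28,519 INFO httpx HTTP Request: POST ..."
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO httpx")

def completion_gaps(log_lines):
    """Yield seconds elapsed between consecutive httpx request completions."""
    prev = None
    for line in log_lines:
        m = TS.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if prev is not None:
            yield (ts - prev).total_seconds()
        prev = ts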

User Code (Correctly Implemented)

import asyncio

async def concurrent_simplified_generation(queries, context_chunks_list, baml_options):
    """From backend/backend/core/agents/helpers.py - correctly uses asyncio.gather"""
    tasks = []
    for query, context_chunks in zip(queries, context_chunks_list, strict=True):
        # Calling the async function returns a coroutine; gather schedules
        # them all concurrently on the event loop.
        task = simplified_baml_qa_response(query, ..., baml_options=baml_options)
        tasks.append(task)

    # This should enable true concurrency, but BAML appears to serialize internally
    return await asyncio.gather(*tasks)
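
A generic timing harness along these lines reproduces the batching pattern; make_call is a placeholder for the real BAML invocation (e.g. a lambda wrapping simplified_baml_qa_response) and is not part of BAML's API:

import asyncio
import time
from typing import Awaitable, Callable

async def _timed(coro: Awaitable, done: list[float]):
    out = await coro
    done.append(time.monotonic())  # record completion order on the shared list
    return out

async def measure_concurrency(make_call: Callable[[], Awaitable], n: int = 20):
    """Fire n calls via asyncio.gather and report inter-completion gaps.

    Truly concurrent execution should show completions clustered near the
    API latency; pool-limited execution shows the staircase pattern above.
    """
    done: list[float] = []
    start = time.monotonic()
    await asyncio.gather(*(_timed(make_call(), done) for _ in range(n)))
    gaps = [b - a for a, b in zip(done, done[1:])]
    print(f"total {done[-1] - start:.2f}s; gaps {[round(g, 3) for g in gaps]}")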

Relationship to Previous Work

Acknowledgment: the BAML team has already addressed several connection pool issues in earlier releases. This issue is different:

  • Previous fixes addressed idle connections and resource leaks
  • This issue is about active connection limits preventing true concurrency
  • The distinctive timing pattern suggests connection pool exhaustion rather than idle timeouts

Root Cause Analysis

BAML appears to make its HTTP calls through httpx internally (the httpx log lines above come from inside the same process), with connection pool limits that aren't suited to high-concurrency workloads. The observed behavior is consistent with a cap of roughly 6 concurrent connections, causing additional requests to queue rather than execute in parallel.
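
For comparison, httpx itself exposes this knob at client construction time (values below are illustrative; the defaults documented by httpx are max_connections=100 and max_keepalive_connections=20). If BAML pins its internal limits without surfacing them, callers have no way to raise the cap:

import httpx

# Pool limits are set when the client is constructed; they cannot be
# changed on a client owned by a library unless the library exposes them.
limits = httpx.Limits(
    max_connections=200,           # hard cap on simultaneous connections
    max_keepalive_connections=50,  # idle connections kept warm for reuse
    keepalive_expiry=5.0,          # seconds before an idle connection closes
)
client = httpx.AsyncClient(limits=limits)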

Impact

  • Performance degradation: 20 concurrent requests that should complete in ~3-5s instead take 30-50s (roughly 20 × the ~1.5-2s serialized gap seen in the logs)
  • Poor resource utilization: CPU and network remain idle while requests queue
  • Unpredictable latency: Request completion depends on queue position, not actual processing

Proposed Solutions

  1. Expose connection pool configuration in BAML client options (a hypothetical sketch follows this list)
  2. Increase default connection limits for modern high-concurrency use cases
  3. Add configuration similar to the existing timeout proposal in Feature Proposal: Configurable LLM Client Timeouts #1630
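
Purely as a hypothetical shape for solution 1 (the pool-related keys below do not exist in baml_py today), pool limits could be accepted alongside the existing client options in the ClientRegistry:

from baml_py import ClientRegistry

cr = ClientRegistry()
cr.add_llm_client(
    name="HighConcurrency",
    provider="openai-generic",
    options={
        "base_url": "http://localhost:8080/v1",
        "model": "my-model",
        # HYPOTHETICAL keys, mirroring httpx.Limits -- not currently supported:
        "max_connections": 200,
        "max_keepalive_connections": 50,
    },
)
cr.set_primary("HighConcurrency")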

Additional Context

  • Issue becomes pronounced with 10+ concurrent requests
  • Observed on BAML 0.208.5; a review of releases through 0.211.0 shows no related fixes
  • This significantly impacts batch processing and parallel generation workflows
  • Related to Feature Proposal: Configurable LLM Client Timeouts #1630 (configurable timeouts) but specifically about connection limits

Reproducible: Yes, consistently observed across multiple test runs and production usage
