The API doesn't replenish pools immediately if all pool pods are terminated #30

@wipash

Description

When all pool pods are terminated, the API does not replenish the pool immediately.

Steps to Reproduce

  1. Confirm that KubeCodeRun (KCR) is running and executing code correctly
  2. Terminate all pool-py pods
  3. Execute some Python code in LibreChat

Expected Behavior

Python pool pods are recreated immediately after termination, up to the pool limit.
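
In other words, the expectation is a refill step that creates the whole deficit in one pass instead of one pod per health-check cycle. A minimal sketch of that calculation follows. The function and parameter names (`pods_to_create`, `pool_size`, `healthy`, `starting`) are hypothetical and only illustrate the request; this is not KubeCodeRun's actual code.

```python
def pods_to_create(pool_size: int, healthy: int, starting: int = 0) -> int:
    """Return how many pool pods to start right now to refill the pool.

    Hypothetical helper, not part of KubeCodeRun:
      pool_size - configured pool limit for one language
      healthy   - pool pods that are currently ready
      starting  - pool pods already being created
    The full deficit is returned at once, so a pool that was wiped out
    is refilled in a single pass.
    """
    if pool_size < 0 or healthy < 0 or starting < 0:
        raise ValueError("counts must be non-negative")
    # Never return a negative number if the pool is over its limit.
    return max(0, pool_size - healthy - starting)
```

With a pool limit of 5 and every pod terminated, `pods_to_create(5, 0)` asks for all 5 pods immediately.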

Actual Behavior

No new pool pods are started, and code execution in LibreChat times out.
The pool is replenished only slowly, over the next few minutes.

The kubecoderun-api service logs the following when it replenishes a pod:

{"pod_name": "pool-py-a5de0bf2", "event": "Removing unhealthy pod", "logger": "src.services.kubernetes.pool", "level": "warning", "timestamp": "2026-01-23T01:36:13.925511Z", "service": "kubecoderun-api", "version": "2.1.1"}

Logs/Screenshots

API logs for an execution request that failed to acquire a pool pod and fell back to an execution job:

{"request_id": "8fU2s9VQ", "language": "py", "code_length": 2081, "entity_id": null, "user_id": "6714ccb5c1bf470f48e646b7", "api_key_hash": "e6014c76", "event": "Code execution request","logger": "src.api.exec", "level": "info", "timestamp": "2026-01-23T01:37:48.864032Z", "service": "kubecoderun-api", "version": "2.1.1"}
{"session_id": "tRyswCKrVthInokbFLqCt", "expires_at": "2026-01-24T01:37:48.864269+00:00", "event": "Session created", "logger":"src.services.session", "level": "info", "timestamp": "2026-01-23T01:37:48.864929Z", "service": "kubecoderun-api", "version": "2.1.1"}
{"session_id": "tRyswCKrVthInokbFLqCt", "event": "Created new session", "logger": "src.services.orchestrator", "level": "info", "timestamp": "2026-01-23T01:37:48.865033Z", "service": "kubecoderun-api", "version": "2.1.1"}
{"execution_id": "Ej8tjE6N", "session_id": "tRyswCKrVthInokbFLqCt", "language": "py", "code_length": 2081, "event": "Starting code execution", "logger": "src.services.execution.runner", "level": "info", "timestamp": "2026-01-23T01:37:48.869330Z", "service": "kubecoderun-api", "version": "2.1.1"}
{"language": "py", "session_id": "tRyswCKrVthI", "event": "Failed to acquire pod from pool", "logger": "src.services.kubernetes.manager", "level": "warning", "timestamp": "2026-01-23T01:37:48.869527Z", "service": "kubecoderun-api", "version": "2.1.1"}
{"job_name": "exec-py-tryswckrvthi-d2307311", "namespace": "default", "language": "py", "session_id": "tRyswCKrVthI", "event": "Created execution job", "logger": "src.services.kubernetes.job_executor", "level": "info", "timestamp": "2026-01-23T01:37:48.897607Z", "service": "kubecoderun-api", "version": "2.1.1"}
INFO:     10.0.16.136:44294 - "GET /ready HTTP/1.1" 200 OK
{"method": "GET", "path": "/ready", "status": 200, "duration_ms": 0.96, "event": "Request processed", "logger": "src.middleware.security", "level": "info", "timestamp": "2026-01-23T01:37:50.597051Z", "service": "kubecoderun-api", "version": "2.1.1"}
INFO:     10.0.16.136:55574 - "GET /health HTTP/1.1" 200 OK
{"job_name": "exec-py-tryswckrvthi-d2307311", "pod_name": "exec-py-tryswckrvthi-d2307311-9pwff", "pod_ip": "10.244.2.7", "elapsed_seconds": 9.25, "event": "Job pod ready", "logger": "src.services.kubernetes.job_executor", "level": "info", "timestamp": "2026-01-23T01:37:58.150980Z", "service": "kubecoderun-api", "version": "2.1.1"}
{"job_name": "exec-py-tryswckrvthi-d2307311", "pod_name": "exec-py-tryswckrvthi-d2307311-9pwff", "pod_ip": "10.244.2.7", "sidecar_url": "http://10.244.2.7:8080", "event": "Job ready, starting execution", "logger": "src.services.kubernetes.job_executor", "level": "info", "timestamp": "2026-01-23T01:37:58.151140Z", "service": "kubecoderun-api", "version": "2.1.1"}
HTTP Request: POST http://10.244.2.7:8080/execute "HTTP/1.1 200 OK"
{"job_name": "exec-py-tryswckrvthi-d2307311", "exit_code": 0, "stdout_len": 506, "stderr_len": 0, "stderr_preview": "", "event": "Job execution completed", "logger": "src.services.kubernetes.job_executor", "level": "info", "timestamp": "2026-01-23T01:37:58.225580Z", "service": "kubecoderun-api", "version": "2.1.1"}
{"event": "Code execution Ej8tjE6NOEQO2q666i43N completed: status=ExecutionStatus.COMPLETED, exit_code=0, time=9356ms, source=job", "logger": "src.services.execution.runner", "level": "info", "timestamp": "2026-01-23T01:37:58.225798Z", "service": "kubecoderun-api", "version": "2.1.1"}
{"session_id": "tRyswCKrVthInokbFLqCt", "status": "completed", "pod_name": null, "has_state": false, "event": "Code execution completed", "logger": "src.services.orchestrator", "level": "info", "timestamp": "2026-01-23T01:37:58.225993Z", "service": "kubecoderun-api", "version": "2.1.1"}
{"request_id": "8fU2s9VQ", "session_id": "tRyswCKrVthInokbFLqCt", "event": "Code execution completed", "logger": "src.api.exec", "level": "info", "timestamp": "2026-01-23T01:37:58.229346Z", "service": "kubecoderun-api", "version": "2.1.1"}
INFO:     10.244.0.96:57126 - "POST /exec HTTP/1.1" 200 OK
{"method": "POST", "path": "/exec", "status": 200, "duration_ms": 9367.4, "event": "Request processed", "logger": "src.middleware.security", "level": "info", "timestamp": "2026-01-23T01:37:58.230378Z", "service": "kubecoderun-api", "version": "2.1.1"}
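
To quantify the delay from logs like these, the gap between the pool miss and the fallback job becoming ready can be computed from the timestamps. The following is a rough sketch that assumes only the field names and event strings visible in the lines above (`event`, `timestamp`, "Failed to acquire pod from pool", "Job pod ready"). The helper names `parse_events` and `pool_miss_latency` are made up for illustration.

```python
import json
from datetime import datetime


def parse_events(lines):
    """Yield (timestamp, record) for each structured JSON log line.

    Plain-text lines (for example the "INFO: ... GET /ready" access
    logs) and records without a timestamp are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue
        raw = rec.get("timestamp")
        if not raw:
            continue
        # Timestamps end in "Z"; convert to an explicit UTC offset.
        yield datetime.fromisoformat(raw.replace("Z", "+00:00")), rec


def pool_miss_latency(lines):
    """Seconds from a pool miss to the next "Job pod ready" event.

    Returns None if the log contains no such pair.
    """
    miss = None
    for ts, rec in parse_events(lines):
        event = rec.get("event")
        if event == "Failed to acquire pod from pool":
            miss = ts
        elif event == "Job pod ready" and miss is not None:
            return (ts - miss).total_seconds()
    return None
```

Run over the log excerpt above, this gives roughly 9.28 seconds between the pool miss and the job pod becoming ready, which accounts for nearly all of the 9367.4 ms request duration.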

Labels: bug