diff --git a/notes/pr5_update_summary.md b/notes/pr5_update_summary.md
new file mode 100644
index 0000000000..cea64f0ee6
--- /dev/null
+++ b/notes/pr5_update_summary.md
@@ -0,0 +1,27 @@
+# PR #5 Update Summary
+
+## What this PR adds
+
+- a provider-metadata / endpoint-discovery plan for the selected Runpod path
+- a concrete endpoint contract template
+- a provider checklist and metadata decision surface
+- a sharper blocked-state record showing which local discovery sources were exhausted
+
+## What still remains
+
+- actual pod identifier or display name from the provider
+- actual host / endpoint from the provider
+- actual attach or SSH command from the provider
+- explicit username / port if required
+- runtime confirmation that the attach route lands in `/workspace/parameter-golf`
+
+## First required provider handoff
+
+The next turn should begin from the exact provider-supplied attach command or SSH tuple that identifies the selected pod explicitly.
+
+## Evidence added in this turn
+
+- PR #5 has no review comments or embedded provider metadata
+- no Runpod endpoint or attach command was found in shell history or Windows PowerShell history
+- no Runpod URL history was found in Chrome, Edge, Brave, or Firefox
+- local SSH material exists, but provider-specific endpoint data does not
diff --git a/notes/tpi_006_endpoint_contract.md b/notes/tpi_006_endpoint_contract.md
new file mode 100644
index 0000000000..6d2663430e
--- /dev/null
+++ b/notes/tpi_006_endpoint_contract.md
@@ -0,0 +1,82 @@
+# TPI-006 Endpoint Contract
+
+## Objective
+
+Record the concrete provider-supplied endpoint data needed to attach to the selected Runpod pod.
+
+## Required fields
+
+- pod identifier or pod display name: not yet obtained
+- host or endpoint: not yet obtained
+- exact attach command or SSH command: not yet obtained
+- username: not yet obtained
+- port (if required): not yet obtained
+- expected landing path: `/workspace/parameter-golf`
+- first verification commands after attach: fixed below
+
+## Concrete facts available now
+
+- local SSH client exists at `/usr/bin/ssh`
+- reusable SSH keys exist at:
+  - `/mnt/c/Users/eb245/.ssh/id_ed25519`
+  - `/mnt/c/Users/eb245/.ssh/id_rsa`
+- no Windows-side SSH config file was present
+- Windows-side `known_hosts` contained GitHub hosts only
+- no Runpod-specific endpoint, hostname, or attach command was found in:
+  - PR #5 body or reviews
+  - shell history
+  - Windows PowerShell history
+  - browser history for Chrome, Edge, Brave, or Firefox
+
+## Provider-adjacent public reference
+
+The public repo still points to the official launch template:
+
+```text
+https://console.runpod.io/deploy?template=y5cejece4j&ref=nl2r56th
+```
+
+This is not sufficient to resume TPI-004, because it does not identify the concrete pod, endpoint, username, or port for the already selected instance.
+
+## Target landing path
+
+Preferred:
+
+```bash
+/workspace/parameter-golf
+```
+
+If the attach route lands elsewhere, record the exact correction steps needed to reach the repo.
+
+## Exact attach command
+
+Still unavailable from provider metadata in this workspace.
+
+Current reusable template once provider metadata is handed off:
+
+```bash
+ssh -i /mnt/c/Users/eb245/.ssh/id_ed25519 <username>@<host> -p <port>
+```
+
+## First verification commands
+
+```bash
+pwd
+ls /workspace
+cd /workspace/parameter-golf
+git rev-parse --abbrev-ref HEAD
+python3 -c "import torch, datasets, sentencepiece; print('deps-ok')"
+nvidia-smi
+```
+
+## Resume condition
+
+Once all required fields are filled with real values, the branch is ready to resume the unchanged TPI-004 evidence pass.
+
+## Attach failure fallback
+
+- require one concrete provider handoff item set:
+  - exact SSH command, or
+  - exact host + username + port tuple, or
+  - provider console attach route tied to the selected pod id
+- until then, endpoint discovery remains blocked rather than partial
diff --git a/notes/tpi_006_metadata_decision.md b/notes/tpi_006_metadata_decision.md
new file mode 100644
index 0000000000..1ac1a5ce61
--- /dev/null
+++ b/notes/tpi_006_metadata_decision.md
@@ -0,0 +1,89 @@
+# TPI-006 Metadata Decision
+
+## Status
+
+blocked
+
+## Objective
+
+Record whether concrete provider-supplied endpoint metadata has been obtained for the selected Runpod path.
+
+## Required decision fields
+
+- pod identifier known or not: not known
+- host / endpoint known or not: not known
+- exact attach / SSH command known or not: not known
+- username known or not: not known
+- port known or not: not known
+- landing path confirmed or not: expected path known, but not runtime-confirmed
+
+## Classification
+
+- `confirmed`
+- `partial`
+- `blocked`
+
+## Concrete metadata obtained
+
+- no pod-specific provider metadata was obtained in this turn
+- expected landing path remains `/workspace/parameter-golf`
+- local auth material remains reusable:
+  - `/mnt/c/Users/eb245/.ssh/id_ed25519`
+  - `/mnt/c/Users/eb245/.ssh/id_rsa`
+- public launch template reference remains available in the repo README
+
+## Discovery sources checked
+
+- PR #5 body and review surface
+- shell history
+- Windows PowerShell history
+- Windows-side SSH directory, config, and known hosts
+- Chrome history (`Default`, `Profile 2`)
+- Edge history (`Default`)
+- Brave history (`Default`, `Profile 1`)
+- Firefox history (`default-release`)
+
+## Still missing
+
+- pod identifier or display name
+- host / endpoint
+- exact attach command or SSH command
+- username
+- port if non-default
+
+## Classification result
+
+- `blocked`
+
+## Interpretation
+
+- The blocker has narrowed to provider-supplied pod metadata that is absent from the current workspace.
+- Further local discovery would likely repeat the same negative checks rather than create a runnable attach route.
+- TPI-004 still cannot resume from this workspace alone.
+
+## Can `/workspace/parameter-golf` be reached now?
+
+- No
+
+## Can TPI-004 resume now?
+
+- No
+
+## First command once provider metadata is supplied
+
+```bash
+ssh -i /mnt/c/Users/eb245/.ssh/id_ed25519 <username>@<host> -p <port>
+```
+
+Then:
+
+```bash
+pwd
+ls /workspace
+cd /workspace/parameter-golf
+git rev-parse --abbrev-ref HEAD
+```
+
+## Resume condition
+
+TPI-004 can resume unchanged only once the classification is `confirmed` for the concrete attach route.
diff --git a/notes/tpi_006_provider_checklist.md b/notes/tpi_006_provider_checklist.md
new file mode 100644
index 0000000000..78ac87f2d1
--- /dev/null
+++ b/notes/tpi_006_provider_checklist.md
@@ -0,0 +1,33 @@
+# TPI-006 Provider Checklist
+
+## Goal
+
+Obtain the minimum provider-supplied metadata needed to attach to the selected Runpod pod and resume TPI-004 unchanged.
+
+## Required items
+
+- [ ] pod identifier or display name
+- [ ] host / endpoint
+- [ ] exact attach command or SSH command
+- [ ] username
+- [ ] port (if needed)
+- [x] expected landing path
+- [x] first verification commands confirmed
+
+## Unfilled items and why
+
+- pod identifier or display name: not present in PR #5, local history, or browser history
+- host / endpoint: not present in PR #5, local history, SSH state, or browser history
+- exact attach command or SSH command: no saved provider command was found
+- username: depends on the missing provider command or endpoint tuple
+- port: depends on the missing provider command or endpoint tuple
+
+## Acceptance rule
+
+TPI-006 is successful only if a future turn can begin from concrete provider metadata rather than guessing endpoint details.
+
+## Resume target
+
+After endpoint discovery is concrete enough, the next turn should resume TPI-004 with:
+- baseline `EVAL_STRIDE=1024`
+- candidate `EVAL_STRIDE=128`
diff --git a/notes/tpi_006_provider_metadata_plan.md b/notes/tpi_006_provider_metadata_plan.md
new file mode 100644
index 0000000000..78f4b14e90
--- /dev/null
+++ b/notes/tpi_006_provider_metadata_plan.md
@@ -0,0 +1,46 @@
+# TPI-006 Provider Metadata Plan
+
+## Objective
+
+Obtain the concrete provider-supplied metadata needed to attach to the selected Runpod pod and land in `/workspace/parameter-golf`.
+
+## Public-facing name
+
+`MonkeyModel_EvalFirst_EndpointDiscovery`
+
+## Required provider metadata
+
+- pod identifier or display name
+- host / endpoint
+- exact provider-supplied attach command or SSH command
+- username
+- port (if needed)
+- expected landing path
+
+## Why this loop exists
+
+TPI-005 established that local auth material exists, but pod-specific metadata is still missing. This loop narrows the blocker from generic attachability to concrete provider data.
+
+## Success condition
+
+The provider metadata is concrete enough that a future turn can begin by executing the exact attach route instead of inferring it.
+
+## Discovery sources for this turn
+
+- PR #5 body and review surface
+- local shell history
+- Windows PowerShell history
+- Windows-side SSH directory and known-hosts state
+- browser history for Chrome, Edge, Brave, and Firefox
+- public Parameter Golf README guidance for the Runpod template
+
+## Stop rule
+
+If pod-specific values are still absent after checking the sources above, classify the branch as blocked on external provider handoff rather than spinning on more local discovery.
+
+## Non-goals
+
+- no model changes
+- no tokenizer changes
+- no environment reselection
+- no score claims
diff --git a/runs/TPI-004/run_notes.md b/runs/TPI-004/run_notes.md
index 532a416792..140882f7f5 100644
--- a/runs/TPI-004/run_notes.md
+++ b/runs/TPI-004/run_notes.md
@@ -92,3 +92,21 @@ torchrun --standalone --nproc_per_node=1 train_gpt.py
 ## Status
 
 blocked before pod attachment
+
+## TPI-006 provider metadata discovery update
+
+- branch checked for discovery handoff: `exp/eval-first-006`
+- current local commit during discovery: `0b981990cc6d2d21e2e49e8bb71ed1a70691342f`
+- provider-specific metadata checked in:
+  - PR #5 body and review surface
+  - shell history
+  - Windows PowerShell history
+  - Windows-side SSH directory and known-hosts state
+  - browser history for Chrome, Edge, Brave, and Firefox
+- result:
+  - no pod identifier found
+  - no Runpod host or endpoint found
+  - no exact attach command found
+  - no username or port found
+- implication:
+  - TPI-004 remains blocked on external provider handoff, not on the eval-first monkey-model branch itself
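
Review note (not part of the diff above): the notes require that the attach route is never guessed, and that TPI-004 resumes only from a `confirmed` classification. That gate could be checked mechanically before anyone runs the attach template. The following is a minimal sketch under stated assumptions, not the project's tooling. The function names and the `POD_ID`, `POD_HOST`, `POD_USER`, and `POD_PORT` variables are hypothetical, the port is treated as always required, and `partial` is taken to mean "some but not all of the four provider fields are filled", which the notes themselves do not define.

```bash
#!/usr/bin/env bash
# Sketch only. POD_ID, POD_HOST, POD_USER and POD_PORT are hypothetical
# variable names; no provider-supplied values exist in this workspace yet.

# Classify the attach route from the four provider fields.
#   blocked   - no provider field is filled
#   partial   - some fields are filled, but not all (assumed definition)
#   confirmed - pod id, host, username and port are all filled
classify_attach_route() {
  local pod_id="$1" host="$2" user="$3" port="$4"
  local filled=0 field
  for field in "$pod_id" "$host" "$user" "$port"; do
    if [ -n "$field" ]; then
      filled=$((filled + 1))
    fi
  done
  if [ "$filled" -eq 0 ]; then
    echo "blocked"
  elif [ "$filled" -lt 4 ]; then
    echo "partial"
  else
    echo "confirmed"
  fi
}

# Print the attach command only for a confirmed route; never guess values.
build_attach_command() {
  local key="$1" pod_id="$2" host="$3" user="$4" port="$5"
  if [ "$(classify_attach_route "$pod_id" "$host" "$user" "$port")" != "confirmed" ]; then
    echo "attach route not confirmed; waiting on provider handoff" >&2
    return 1
  fi
  echo "ssh -i $key $user@$host -p $port"
}

# Report the current classification from the (possibly unset) variables.
classify_attach_route "${POD_ID:-}" "${POD_HOST:-}" "${POD_USER:-}" "${POD_PORT:-}"
```

Because `build_attach_command` prints the command instead of running it, the exact string can be recorded in the endpoint contract before the first attach attempt. It returns a non-zero status while any field is missing, which matches the stop rule of waiting on the provider handoff.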