fix: upgrade HiGHS to v1.14.0, suppress debug assertion on warm-start MIP

ElektrikAkar · claude · ElektrikAkar · commit 59b00a225363 · 2026-04-09T02:26:24.000+01:00
HiGHS &lt;=1.14.0 assert(ub_consistent) fires in updatePrimalDualIntegral()
during warm-start MIP solves. This is a performance metric tracker, not
solution correctness — prev_lb/prev_ub/prev_gap are documented "Only for
checking/debugging". Presolve restart offset arithmetic introduces
roundoff exceeding the 1e-12 tolerance.

Verified by Codex GPT-5.4 (xhigh reasoning): not a solution bug.
Workaround: NDEBUG on HiGHS target. Proper upstream fix needed.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
diff --git a/.claude/LESSONS.md b/.claude/LESSONS.md
@@ -53,6 +53,11 @@ Critical knowledge to avoid repeating mistakes.
 - **MATLAB + MSVC OpenMP: exit segfault in `-batch` mode.** Functionality works; segfault after output. Check output, not exit code.
 - **nanobind over pybind11.** Stable ABI, 5-10x smaller binaries, native CUDA ndarray. GIL release for >10ms calls.
 
+## HiGHS MIP Solver (IMPORTANT — workaround in place)
+
+- **HiGHS <=1.14.0 `assert(ub_consistent)` fires on warm-start MIP.** The assertion is in `updatePrimalDualIntegral()` — a performance metric tracker, NOT solution correctness. `prev_lb/prev_ub/prev_gap` are documented "Only for checking/debugging" (line 2802). The P-D integral is never used to accept/reject incumbents. Presolve restart rebases bounds with offset arithmetic that introduces roundoff exceeding the 1e-12 tolerance. **Current workaround:** `target_compile_definitions(highs PRIVATE NDEBUG)` in `cmake/Dependencies.cmake` — too blunt (suppresses ALL HiGHS assertions). **Proper fix needed:** patch HiGHS to skip `check_prev_data` after restart, or relax the tolerance in this specific block. File upstream issue at github.com/ERGO-Code/HiGHS.
+- **Verified by Codex (GPT-5.4, xhigh reasoning):** Not a solution-correctness bug. The workaround is legitimate short-term.
+
 ## Build System
 
 - **CUDA multi-version on Windows:** Generate `Directory.Build.props` with `<CudaToolkitCustomDir>`.
diff --git a/.claude/summaries/handoff-2026-04-08-chunked-clara.md b/.claude/summaries/handoff-2026-04-08-chunked-clara.md
@@ -1,57 +1,70 @@
 # Session Handoff: Chunked CLARA + Float32 Views + ARC SLURM (2026-04-08)
 
-**One-liner:** `--ram-limit` wired into chunked CLARA with Parquet row-group streaming, float32 view-mode, OpenMP-parallel chunked assignment, ARC SLURM build profiles for full GPU fleet (P100→H100).
+`--ram-limit` wired into chunked CLARA with Parquet row-group streaming, float32 view-mode, OpenMP-parallel chunked assignment, ARC SLURM build profiles for full GPU fleet (P100 through H100), HiGHS upgraded to 1.14.0.
 
-## What Was Done
+## What Was Done (3 commits on Claude branch)
 
-1. **Float32 view-mode** — `Data` now supports `span<const float>` views via `p_spans_f32_`. CLARA subsampling works with float32 data (zero-copy).
-2. **ParquetChunkReader** — new `dtwc/io/parquet_chunk_reader.hpp`: row-group streaming, sparse row access (`read_rows()`), RAM budget calculation, float32 path (`read_row_groups_f32`).
-3. **Chunked CLARA** — `assign_all_points_chunked()` and `assign_all_points_chunked_f32()` stream row groups within RAM budget. `fast_clara_chunked()` loads subsamples + medoids from Parquet on demand. OpenMP `parallel for` on inner loop.
-4. **CLI wiring** — `--ram-limit` populates `CLARAOptions` for Parquet input. `--precision float32` triggers f32 chunked path.
-5. **Adversarial review** — 2 agents (Opus code-reviewer + Codex). Fixed: float Parquet `static_pointer_cast<DoubleArray>` crash (High), int overflow at >2B rows (Medium), duplicate index double-move (Medium), bounds validation, `row_groups_per_batch` edge case, N-element copy per subsample replaced with `std::sample`.
-6. **CUDA CMake** — `cmake_minimum_required(VERSION 3.26)`, CUDA C++20 now works. CI updated to `pip install cmake>=3.26`.
-7. **ARC SLURM support** — default CUDA archs expanded to `60;70;75;80;86;89;90` (P100→H100). Build script `scripts/slurm/build-arc.sh` with 6 profiles (arc, htc-cpu, htc-gpu, htc-v4, h100, grace).
-8. **HiGHS assertion fix** — HiGHS v1.13.1 debug `assert(ub_consistent)` fires on warm-start MIP. Fixed by adding `NDEBUG` compile definition to HiGHS target. 67/67 tests now pass.
-9. **Docs** — CHANGELOG.md (Phase 4 features), README.md (feature list + `DTWC_ENABLE_ARROW`), LESSONS.md (Arrow/Parquet + ARC hardware notes).
-10. **Cleanup** — deleted 8 older handoff files.
+1. **Float32 view-mode** — `Data` supports `span<const float>` views via `p_spans_f32_`. CLARA subsampling works with float32 data (zero-copy). New constructor, updated `size()`, `series_f32()`, `series_flat_size()`.
+
+2. **ParquetChunkReader** — new `dtwc/io/parquet_chunk_reader.hpp`. Row-group streaming, sparse row access (`read_rows()`), RAM budget calculation, float32 path (`read_row_groups_f32`). Handles both Float and Double Parquet columns. Thread-safety documented.
+
+3. **Chunked CLARA** — two new functions in `fast_clara.cpp`:
+   - `assign_all_points_chunked()` / `_f32()` — stream row groups within RAM budget, OpenMP `parallel for` on inner loop
+   - `fast_clara_chunked()` — loads subsamples + medoids from Parquet on demand via `std::sample` (O(sample_size), not O(N))
+   - Dispatch in `fast_clara()` routes to chunked mode when `ram_limit > estimated_data_size`
+
+4. **CLI wiring** — `--ram-limit` populates `CLARAOptions` for Parquet input. `--precision float32` triggers f32 chunked path. Warning printed for non-Parquet input.
+
+5. **Adversarial review** — 2 Opus agents + 1 Codex review. Bugs fixed:
+   - *High*: float Parquet `static_pointer_cast<DoubleArray>` — now checks value type
+   - *Medium*: int overflow at >2B rows — uses `int64_t` + `mt19937_64`
+   - *Medium*: `read_rows()` double-move on duplicates — uses copy
+   - *Low*: bounds validation, `row_groups_per_batch` edge case
+
+6. **CUDA CMake** — `cmake_minimum_required(VERSION 3.26)` across all CMakeLists. CUDA C++20 now works. CI updated to `pip install cmake>=3.26`.
+
+7. **ARC SLURM support** — CUDA archs expanded to `60;70;75;80;86;89;90` (P100 through H100). Build script `scripts/slurm/build-arc.sh` with 6 profiles: `arc`, `htc-cpu`, `htc-gpu`, `htc-v4`, `h100`, `grace`.
+
+8. **HiGHS v1.14.0** — upgraded from v1.13.1. Debug `assert(ub_consistent)` in primal-dual integral tracking still fires on warm-start MIP in both versions. Verified by Codex GPT-5.4 (xhigh reasoning): not a solution-correctness bug — it is bookkeeping for a performance metric. Workaround: `NDEBUG` compile def on HiGHS target. Needs proper upstream fix (see LESSONS.md).
+
+9. **Documentation** — CHANGELOG.md (Phase 4 features), README.md (feature list, `DTWC_ENABLE_ARROW` option), LESSONS.md (Arrow/Parquet, ARC hardware, HiGHS workaround).
+
+10. **Cleanup** — deleted 8 obsolete handoff files.
 
 ## Current State
 
-- **Branch:** Claude
+- **Branch:** Claude (3 commits ahead of origin/Claude)
 - **Tests:** 67/67 pass, 2 CUDA skipped
 - **Build:** Clang 21, C++20, Ninja, Windows 11
 
-## Files Changed
+## Key Files
 
-### New files
-- `dtwc/io/parquet_chunk_reader.hpp` — row-group streaming reader
-- `scripts/slurm/build-arc.sh` — ARC SLURM build profiles
-
-### Modified
-- `dtwc/Data.hpp` — `p_spans_f32_`, f32 view constructor, updated accessors
-- `dtwc/Problem.hpp` — `dtw_function()` / `dtw_function_f32()` public accessors
-- `dtwc/algorithms/fast_clara.hpp` — `CLARAOptions` + ram_limit, parquet_path, use_float32
-- `dtwc/algorithms/fast_clara.cpp` — f32 subsample, chunked assignment (f64+f32), OpenMP, dispatch
-- `dtwc/dtwc_cl.cpp` — `--ram-limit` wired into `clara_opts`
-- `CMakeLists.txt` — cmake_minimum_required 3.26, CUDA archs 60-90
-- `.github/workflows/cuda-mpi-detect.yml` — pip install cmake>=3.26
-- `tests/unit/unit_test_Data.cpp` — float32 view-mode tests
-- `tests/unit/algorithms/unit_test_fast_clara.cpp` — float32 CLARA test
-- `CHANGELOG.md`, `README.md`, `.claude/TODO.md`, `.claude/LESSONS.md`
+| File | Role |
+|------|------|
+| `dtwc/io/parquet_chunk_reader.hpp` | Row-group streaming Parquet reader (new) |
+| `dtwc/algorithms/fast_clara.cpp` | Chunked CLARA + f32 paths + OpenMP |
+| `dtwc/algorithms/fast_clara.hpp` | `CLARAOptions` with ram_limit, parquet_path, use_float32 |
+| `dtwc/Data.hpp` | Float32 view-mode (`p_spans_f32_`) |
+| `dtwc/Problem.hpp` | `dtw_function()` / `dtw_function_f32()` accessors |
+| `dtwc/dtwc_cl.cpp` | `--ram-limit` wired into `clara_opts` |
+| `scripts/slurm/build-arc.sh` | ARC SLURM build profiles (new) |
+| `cmake/Dependencies.cmake` | HiGHS v1.14.0 + NDEBUG workaround |
 
 ## What To Do Next
 
-### Immediate (SLURM session)
-1. SSH to ARC, `source scripts/slurm/build-arc.sh htc-gpu` — verify Arrow CPM build on Linux
-2. Run on real battery Parquet data with `--ram-limit 2G --precision float32 --method clara`
-3. Test H100 GPU path: `source scripts/slurm/build-arc.sh h100`
+### SLURM session
+1. Push branch, SSH to ARC
+2. `source scripts/slurm/build-arc.sh htc-gpu` — verify Arrow CPM build on Linux + full GPU fleet
+3. Run on real battery Parquet data: `dtwc --ram-limit 2G --precision float32 --method clara -k 10 battery.parquet`
+4. Test H100 GPU path: `source scripts/slurm/build-arc.sh h100`
 
 ### Short-term
-4. Integration test for chunked CLARA with synthetic Parquet file (no Parquet on Windows CI)
-5. Sample size scaling: `sqrt(N)` for large N (current formula too small at 100M)
-6. CLARA checkpointing: save/resume assignment state for long runs
+5. Integration test for chunked CLARA with synthetic Parquet file
+6. Sample size scaling: `sqrt(N)` for large N (current formula too small at 100M)
+7. CLARA checkpointing: save/resume assignment state for long runs
+8. File upstream HiGHS issue for `ub_consistent` assertion
 
 ### Known Issues
-
 - Arrow CPM build on Windows+Clang: blocked by Arrow upstream ExternalProject flag quoting
 - Grace Hopper (htc-g057): AArch64 CPU build untested, CUDA kernel not ported to ARM
+- HiGHS NDEBUG workaround is too blunt — suppresses all HiGHS assertions (see LESSONS.md)
diff --git a/cmake/Dependencies.cmake b/cmake/Dependencies.cmake
@@ -28,14 +28,17 @@ function(dtwc_setup_dependencies)
   if(NOT TARGET highs::highs AND DTWC_ENABLE_HIGHS)# HiGHS library:
   CPMAddPackage(
     NAME highs
-    URL "https://github.com/ERGO-Code/HiGHS/archive/refs/tags/v1.13.1.tar.gz"
+    URL "https://github.com/ERGO-Code/HiGHS/archive/refs/tags/v1.14.0.tar.gz"
     SYSTEM
     EXCLUDE_FROM_ALL
     OPTIONS
     "CI OFF" "ZLIB OFF" "BUILD_EXAMPLES OFF" "BUILD_TESTING OFF" "FAST_BUILD ON"
     )
-    # HiGHS v1.13.1 has debug assertions (ub_consistent) that fire on valid warm-start
-    # MIP solves due to numerical noise. Suppress by defining NDEBUG on HiGHS targets.
+    # HiGHS <=1.14.0 has a debug assertion (ub_consistent) that fires on valid
+    # warm-start MIP solves due to rounding in primal-dual integral tracking
+    # after presolve reset. The solution is correct; the bookkeeping tolerance
+    # (1e-12) is too tight. Suppress by defining NDEBUG on HiGHS target.
+    # Upstream: https://github.com/ERGO-Code/HiGHS — not yet fixed as of 1.14.0.
     if(TARGET highs)
       target_compile_definitions(highs PRIVATE NDEBUG)
     endif()