Releases: cactus-compute/cactus
v1.14
What's Changed
- Enable SDL2 live mic recording in chat build by @rshemet in #579
- fix gemma4 conversion and cli by @jakmro in #581
- Made non-thinking default for gemma4 & Introduce simultaneous multimodality by @ParkiratS in #582
- gemma4 vision memory optimizations by @jakmro in #590
- added a prompt fix for better vision+audio prompt, removed code that … by @kar-m in #589
- added default confidence and wired that for vlm by @kar-m in #591
- Gemma4 fixes by @ParkiratS in #588
- Gemma4 tool calling and RoPE fixes by @ncylich in #594
- increased rolling window size for gemma4 by @kar-m in #593
Full Changelog: v1.13...v1.14
v1.13
What's Changed
- Improve model publishing error handling in workflow by @jakmro in #546
- Fix VLM prefill cache reuse image path slicing by @KayaanT in #545
- Add cache cleanup for Hugging Face models in export_and_publish_model by @jakmro in #547
- Add branch override option to publish workflow by @ncylich in #548
- Fix Gemma 3n conversion memory usage by @ncylich in #549
- Update blog URL in README.md by @amerkld in #544
- Parakeet streaming fix by @ParkiratS in #551
- Tool call prompt formatting by @jakmro in #558
- Feature Cleanup by @ParkiratS in #559
- Add Whisper v3 (large-v3) support by @ncylich in #557
- Add LFM2.5 VL 450M by @yujonglee in #567
- LFM2-VL-450M: fix garbled output (vision tower, template, kernel) by @ncylich in #565
- Change default transcribe model to parakeet-tdt-0.6b-v3 by @rshemet in #566
- Expose min_p and repetition_penalty in completion options by @DuFanYin in #560
- Graph save load by @cattermelon1234 in #556
- Pyannote features and optimizations by @jakmro in #571
- Karen/needle by @kar-m in #574
- fix apple i8mm detection to use runtime sysctl check by @DuFanYin in #562
- Fix streaming transcribe by @jakmro in #576
Full Changelog: v1.12...v1.13
v1.12
What's Changed
- Gemma 4
- Versioned docs with quickstart and SDK chooser by @rshemet in #522
- add CACTUS_CLOUD_API_BASE by @yujonglee in #521
- Model weights discoverability by @jakmro in #520
- fix: set telemetry framework to "rust" for Rust bindings by @rshemet in #519
- fix: prefer local libcactus over system-installed in test build by @rshemet in #516
- Engine updates: tool calling, compute_entropy, converter improvements by @ncylich in #517
- cactus_prefill by @mhayes853 in #512
- Missing torch ops by @cattermelon1234 in #518
- Docs fixes by @jakmro in #524
- docs: document custom vocabulary support for transcription by @ayushmk7 in #525
- Youtu by @jakmro in #530
- Clean up Hugging Face cache after model export by @jakmro in #533
- Added parakeet optimizations for apple by @ParkiratS in #534
- Cactus torch api clean by @cattermelon1234 in #529
- add CACTUS_CLOUD_HEADERS support by @yujonglee in #531
- Parakeet encoder optimization by @ParkiratS in #535
- Update docs site_url to docs.cactuscompute.com by @rshemet in #527
- Custom vocabulary support for Parakeet TDT by @rshemet in #532
- Fix VLM crash when adding multiple images in multi-turn conversation by @FarooqMulla in #539
- New LayerNorm Kernel by @nshejwalkar in #540
- Add pyannote/segmentation-3.0 speaker diarization (10ms, 976× realtime) by @rshemet in #538
- Tinyllama by @ParkiratS in #536
New Contributors
- @cattermelon1234 made their first contribution in #518
- @ayushmk7 made their first contribution in #525
- @FarooqMulla made their first contribution in #539
Full Changelog: v1.11...v1.12
v1.11
What's Changed
- Fix/issue#490 by @lennartvoelz in #491
- simplify and align sdks by @jakmro in #489
- remove models by @jakmro in #492
- Update model configurations and enhance workflow settings in publish_… by @jakmro in #495
- Update workflow to use macos-latest instead of macos-latest-xlarge by @jakmro in #496
- Add dynamic max_tokens estimation based on audio length in cactus_tra… by @jakmro in #499
- macOS: link clang_rt.osx to fix SME2 (_arm_tpidr2*) link failures under rustc by @yujonglee in #498
- Add FFI log control: cactus_log_set_level and cactus_log_set_callback by @yujonglee in #497
- Karen/qwen3p5 by @kar-m in #481
- CLI upgrades by @rshemet in #504
- feat(stt): custom vocabulary biasing for all speech models by @vyomshah05 in #451
- Add Gemma 3N (text-only) model support by @ncylich in #493
- fix: make FunctionGemma prompt formatting strict by @lennartvoelz in #502
- fix: apply logit bias before greedy sampling by @ncylich in #507
- remove redundant file linking for tie_word_embeddings by @jakmro in #506
- Port general engine improvements for TinyLlama by @ncylich in #513
- Speech-to-Text Timestamps by @jakmro in #515
New Contributors
- @lennartvoelz made their first contribution in #491
Full Changelog: v1.10...v1.11
v1.10
What's Changed
- Enhance model publishing workflow with detailed metadata and licenses by @jakmro in #459
- Added parakeet to publish to hf yaml by @ParkiratS in #464
- Update telemetry for supported platforms by @justinl66 in #465
- added back moe weight conversion by @kar-m in #468
- adjust manual workflow for model publish by @jakmro in #470
- Parakeet blog by @ammesatyajit in #467
- perf: add FP16 fast path for LayerNorm by @yujonglee in #433
- Issue #406: Bilinear + Depthwise Optimizations by @PiyawanChaiprasit2006 in #466
- ARM SME2: Accelerate MatMul FP16 by @aarav18 in #457
- build: add Objective-C ARC support for NPU sources by @jakmro in #475
- long transcription by @jakmro in #482
- Language detection by @ParkiratS in #471
- Parakeet tdt by @ParkiratS in #476
- kotlin: expose forceTools in CompletionOptions by @rshemet in #484
- Update model list in README and publish_to_hf.yml with new LiquidAI m… by @jakmro in #487
- test: updated rag test conditions by @nshejwalkar in #488
- optimize scale correction in cactus_attention_f16_h64 by @jakmro in #485
- fix greedy sampler ignoring logit suppression by @jakmro in #486
New Contributors
- @PiyawanChaiprasit2006 made their first contribution in #466
- @aarav18 made their first contribution in #457
Full Changelog: v1.9...v1.10
v1.9
What's New
- 50% faster int4
- Parakeet models
- LFM2-MOE models
- Bug fixes
- Hybrid Inference
PRs
- fix stt test and add cpp ci by @yujonglee in #413
- add IRFFT by @yujonglee in #425
- fixed lfm2 vlm lmhead issue that came in with hf 5.0.0 by @kar-m in #426
- raspberry pi numbers and linux fixes by @kar-m in #437
- Added parakeet model by @ParkiratS in #443
- Adding parakeet graph by @ParkiratS in #446
- Parakeet kernel by @ParkiratS in #445
- added cloud fallback and documentation+tests by @kar-m in #369
- Parakeet FFI by @ParkiratS in #447
- Parakeet convert and tests by @ParkiratS in #444
- Hybrid transcription blog post by @rshemet in #449
- Fixed missing engine changes by @ParkiratS in #453
- feat(python): add context manager support for safe resource cleanup by @yogyam in #412
- Completed ubuntu CICD pipeline by @ncylich in #455
- Tie-embed-conversion-fix by @ncylich in #454
- tiny graph fix and added benchmark by @kar-m in #456
Full Changelog: v1.8...v1.9
Breaking changes
Weights unfortunately need to be refreshed for this :(
v1.8
What's Changed
- Kernel optimisations by @HenryNdubuaku in #397
- Improve INT4 by @ncylich and @jrajala6 in #343
- add einops dependency to requirements by @jakmro in #371
- Add language parameter support for Whisper transcription by @rshemet in #384
- added moe support for lfm by @kar-m in #374
- Add raw FFI binding for Rust by @yujonglee in #382
- fix: handle spaces in paths when running shell commands by @adithya-n05 in #377
- fixing sentencepiece detection for transformers 5.0+ (still backwards compatible) by @ncylich in #373
- Improve Telemetry by @mhayes853 in #372
- proprietary commit by @HenryNdubuaku
- Update performance metrics for iPhone 13 Mini and Galaxy A56 by @jakmro in #386
- fix: improve version sorting and enhance model export tagging by @jakmro in #387
- Add Rust SDK and language parameter documentation by @rshemet in #389
- Basic addition of int4 functionality by @jrajala6 in #343
- add scalar log by @yujonglee in #390
- fix assertion and linux build in rust test by @yujonglee in #392
- Justin/api fixes by @justinl66 in #380
- Update telemetry by @justinl66 in #394
- docs: add compatibility guidelines for runtime and weights by @jakmro in #398
- add STFT_COMPLEX, derive stft_magnitude via graph composition by @yujonglee in #395
New Contributors
- @yujonglee made their first contribution in #382
- @adithya-n05 made their first contribution in #377
Full Changelog: v1.7...v1.8
Note:
This breaks the weights.
v1.7
What's Changed
- Brew setup by @HenryNdubuaku
- Cactus auth by @HenryNdubuaku
- Hybrid inference by the cactus team
- Karen/vlm fix by @kar-m in #311
- fixed moonshine state resetting and gemma3 4b layernorm loading by @kar-m in #317
- fix: LFM2 multiple tool calls by @mhayes853 in #316
- fix hf publish by @jakmro in #323
- update models list by @jakmro in #324
- Fixing pip command errors by @rshemet in #322
- Add instructions for installing Ruby version for xcodeproj gem by @jakmro in #327
- tests: remove duplicate vlm_multiturn test in runner by @AI-I224 in #332
- fix: replace NSLog with CACTUS_LOG for iOS NPU debuggability by @KayaanT in #328
- Kernel_attention optimization by @Ayan9074 in #319
- M4airbenchmarks by @Ayan9074 in #336
- docs: update cactus test command description for transcribe models (#297) by @AI-I224 in #339
- Accelerate FP16 matmul via cblas_sgemm for Apple AMX by @KayaanT in #340
- Fix hybrid attention sliding window for Gemma (#320) by @jrajala6 in #338
- bench: update README benchmark with M2 MacBook Air results by @vyomshah05 in #335
- docs: add iPad Pro (12.9") (6th Gen) benchmarks (#296) by @AI-I224 in #333
- removed unused graph i/o methods by @ncylich in #345
- feat: cpp-native telemetry by @justinl66 in #326
- Update CPP Telemetry to point to main DB by @justinl66 in #350
- update python bindings for stream transcribe by @jakmro in #351
- Update CPP Telemetry by @justinl66 in #352
- added only flag by @nshejwalkar in #347
- Added warmups and increased iterations for performance testing by @nshejwalkar in #355
- CMF Phone 2 Pro benchmarks by @jakmro in #356
- Vad by @jakmro in #353
- Cli reconvert by @jakmro in #357
- Asr cloud merging by @kar-m in #348
- Add optional cloud key prompt for transcribe by @rshemet in #359
- HF support multiple precision options by @jakmro in #361
- Add precision parameter to download_from_hf by @jakmro in #362
- revert silero download logic by @jakmro in #365
- Cactus clean now clears cache, Session metrics initialized properly for telemetry by @justinl66 in #363
- Curl prepack by @kar-m in #358
- Fix/f16 reduction accum by @vyomshah05 in #344
- Update telemetry by @justinl66 in #366
- Accelerate FP16 attention via cblas_sgemm for Apple AMX by @KayaanT in #346
New Contributors
- @AI-I224 made their first contribution in #332
- @jrajala6 made their first contribution in #338
- @vyomshah05 made their first contribution in #335
- @nshejwalkar made their first contribution in #347
Full Changelog: v1.6.0...v1.7
@mhayes853 API has breaking changes
v1.6
What's Changed
- Kernel Optimisations & advanced quantisation by @HenryNdubuaku
- Moonshine by @kar-m
- HF publish by @jakmro
- Streaming API by @jakmro
- Linux ARM support by @ncylich
- Stop generation on model end token by @Ayan9074
- i8MM runtime detection by @mhayes853
FFI Note: This breaks the API
v1.5
What's Changed
- Groupwise quantisation by @HenryNdubuaku
- Speech-To-Text streaming by @jakmro
- KV Quantisation by @HenryNdubuaku
- Evals by @justinl66 @ParkiratS
- INT4 support by @HenryNdubuaku
- Rust bindings by @mrsarac
Bindings: Please check Cactus FFIs again @jakmro @mrsarac @mhayes853