Commit 06d9d88

danieldk and drbh authored
Disable Cachix pushes (#3312)

* Disable Cachix pushes

  This is not safe until we have sandboxed builds. For TGI alone this might not be a huge issue, but with Cachix caching disabled in hf-nix, TGI CI would build all the packages and push them to our cache.

* fix: bump docs

Co-authored-by: drbh <[email protected]>

1 parent 8801ba1 commit 06d9d88

File tree

4 files changed: +10 −8 lines

.github/workflows/nix_build.yaml (1 addition, 1 deletion)

```diff
@@ -23,7 +23,7 @@ jobs:
       with:
         name: huggingface
         # If you chose signing key for write access
-        authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}'
+        # authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}'
       env:
         USER: github_runner
     - name: Build
```
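Commenting out `authToken` removes the cache write credential entirely, so pushes fail silently rather than being disabled explicitly. An alternative, assuming the pinned version of `cachix/cachix-action` supports its `skipPush` input (the `@v14` tag below is a hypothetical choice, not from this commit), would keep authenticated pulls while disabling uploads:

```yaml
# Hypothetical sketch: keep the auth token for authenticated pulls,
# but tell cachix-action not to push build results to the cache.
- uses: cachix/cachix-action@v14
  with:
    name: huggingface
    authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}'
    skipPush: true  # do not upload store paths to the huggingface cache
```

This makes the intent visible in the workflow itself, at the cost of still exposing the token to the (unsandboxed) build.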

.github/workflows/nix_cache.yaml (1 addition, 1 deletion)

```diff
@@ -22,7 +22,7 @@ jobs:
       with:
         name: huggingface
         # If you chose signing key for write access
-        authToken: "${{ secrets.CACHIX_AUTH_TOKEN }}"
+        #authToken: "${{ secrets.CACHIX_AUTH_TOKEN }}"
       env:
         USER: github_runner
     - name: Build impure devshell
```

.github/workflows/nix_tests.yaml (3 additions, 1 deletion)

```diff
@@ -27,9 +27,11 @@ jobs:
       with:
         name: huggingface
         # If you chose signing key for write access
-        authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}'
+        #authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}'
       env:
         USER: github_runner
+    - name: Nix info
+      run: nix-shell -p nix-info --run "nix-info -m"
     - name: Build
       run: nix develop .#test --command echo "Ok"
     - name: Pre-commit tests.
```

docs/source/reference/launcher.md (5 additions, 5 deletions)

````diff
@@ -58,8 +58,6 @@ Options:
       Quantization method to use for the model. It is not necessary to specify this option for pre-quantized models, since the quantization method is read from the model configuration.

       Marlin kernels will be used automatically for GPTQ/AWQ models.
-
-      [env: QUANTIZE=]

       Possible values:
       - awq: 4 bit quantization. Requires a specific AWQ quantized model: <https://hf.co/models?search=awq>. Should replace GPTQ models wherever possible because of the better latency
@@ -72,6 +70,8 @@ Options:
       - bitsandbytes-nf4: Bitsandbytes 4bit. Can be applied on any model, will cut the memory requirement by 4x, but it is known that the model will be much slower to run than the native f16
       - bitsandbytes-fp4: Bitsandbytes 4bit. nf4 should be preferred in most cases but maybe this one has better perplexity performance for you model
       - fp8: [FP8](https://developer.nvidia.com/blog/nvidia-arm-and-intel-publish-fp8-specification-for-standardization-as-an-interchange-format-for-ai/) (e4m3) works on H100 and above This dtype has native ops should be the fastest if available. This is currently not the fastest because of local unpacking + padding to satisfy matrix multiplication limitations
+
+      [env: QUANTIZE=]

 ```
 ## SPECULATE
@@ -456,14 +456,14 @@ Options:
 ```shell
   --usage-stats <USAGE_STATS>
           Control if anonymous usage stats are collected. Options are "on", "off" and "no-stack" Defaul is on
-
-          [env: USAGE_STATS=]
-          [default: on]

           Possible values:
           - on: Default option, usage statistics are collected anonymously
           - off: Disables all collection of usage statistics
           - no-stack: Doesn't send the error stack trace or error type, but allows sending a crash event
+
+          [env: USAGE_STATS=]
+          [default: on]

 ```
 ## PAYLOAD_LIMIT
````
