Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA A100-PCIE-40GB, compute capability 8.0, VMM: yes
register_backend: registered backend CUDA (1 devices)
register_device: registered device CUDA0 (NVIDIA A100-PCIE-40GB)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz)
version: 4667 (d2fe216)
built with gcc (GCC) 12.2.0 for x86_64-pc-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
Test code
Command line
`./bin/test-backend-ops`
Problem description & steps to reproduce
A test failure was encountered while running MUL_MAT through test-backend-ops.
The failing mul_mat configuration was identified as MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]); the test case is created here.
The failures seemed random; consecutive runs of test-backend-ops did not reproduce the error. Modifying test-backend-ops.cpp to add the mul_mat test case 1000 times reproduced the failure consistently (at least a few of the 1000 cases would fail):
```cpp
// Example of adding the failing mul_mat case
for (int i = 0; i < 1000; i++) {
    test_cases.emplace_back(new test_mul_mat(GGML_TYPE_Q5_1, GGML_TYPE_F32, 16, 1, 256, {1, 1}, {1, 1}));
}
```
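When iterating on this, the run can also be restricted to the affected op with `./bin/test-backend-ops test -o MUL_MAT` (assuming the `-o` op filter is available in this build), which keeps the repeated cases reasonably fast.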
The test fails due to NMSE being over the maximum error threshold.
The CUDA backend seems to convert src1 to Q8_1 and then run mul_mat with Q5_1 and Q8_1 inputs. Could this be causing the precision issue?
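To get a sense of how much error an 8-bit activation quantization alone can introduce, here is a minimal, self-contained sketch. It is not ggml's implementation; the 32-wide block and single-scale rounding are assumptions meant only to illustrate the round-trip error on src1-like data:

```cpp
// Sketch only: single-scale int8 round trip of one 32-value block,
// loosely mimicking what a Q8_1-style conversion of src1 does to the data.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>

int main() {
    const int QK = 32;                  // assumed block size (ggml quantizes in 32-wide blocks)
    float x[QK], xq[QK];

    std::mt19937 rng(42);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    for (int i = 0; i < QK; i++) x[i] = dist(rng);

    // One scale per block: map the largest magnitude onto the int8 range.
    float amax = 0.0f;
    for (int i = 0; i < QK; i++) amax = std::max(amax, std::fabs(x[i]));
    const float d = amax / 127.0f;

    float max_err = 0.0f;
    for (int i = 0; i < QK; i++) {
        const int q = (int) std::lround(x[i] / d);  // quantize
        xq[i] = q * d;                              // dequantize
        max_err = std::max(max_err, std::fabs(x[i] - xq[i]));
    }
    printf("max absolute round-trip error in this block: %g\n", max_err);
    return 0;
}
```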
The largest NMSE encountered over 20000 runs was 0.001409.
Is this degree of precision loss expected? The maximum error for the mul_mat tests is set to 5e-4. Should it be modified?
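For context, the failing check is a normalized mean squared error of one result against the other. Below is a minimal sketch of a metric of that shape together with the 5e-4 threshold comparison; treating the first log column as the CPU reference is an assumption on my part, and this is not the exact test-backend-ops code:

```cpp
// Sketch of an NMSE check of the form reported in the log:
// sum((out - ref)^2) / sum(ref^2), compared against a fixed threshold.
#include <cstdio>
#include <vector>

static double nmse(const std::vector<float> & ref, const std::vector<float> & out) {
    double num = 0.0;  // sum of squared differences
    double den = 0.0;  // sum of squared reference values
    for (size_t i = 0; i < ref.size(); i++) {
        const double diff = (double) out[i] - (double) ref[i];
        num += diff * diff;
        den += (double) ref[i] * (double) ref[i];
    }
    return num / den;
}

int main() {
    // First four value pairs from the relevant log output below; which column
    // is the reference is an assumption here. The full 16-element comparison
    // in the log reported NMSE = 0.000508874.
    std::vector<float> ref = { 0.948417f, -2.924956f, -1.777758f, 0.450649f };
    std::vector<float> out = { 1.035245f, -2.844111f, -1.695090f, 0.537106f };

    const double max_nmse = 5e-4;  // threshold used for the mul_mat tests
    const double e = nmse(ref, out);
    printf("NMSE = %g (%s threshold %g)\n", e, e > max_nmse ? "above" : "below", max_nmse);
    return 0;
}
```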
First Bad Commit
Due to the sporadic nature of the failure, the originating commit has not been identified yet; d2fe216 is simply the first commit on which the failure was encountered. The latest commit tested where the error was reproduced is 4806498.
Relevant log output
MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.000508874 > 0.000500000
 0  0.948417  1.035245, diff = -0.086828
 1 -2.924956 -2.844111, diff = -0.080845
 2 -1.777758 -1.695090, diff = -0.082667
 3  0.450649  0.537106, diff = -0.086457
 4 -4.114096 -4.030904, diff = -0.083191
 5 -0.682358 -0.596930, diff = -0.085428
 6 -8.252451 -8.167437, diff = -0.085014
 7 -0.692235 -0.606851, diff = -0.085384
 8 -5.382234 -5.304606, diff = -0.077628
 9  3.467584  3.552903, diff = -0.085320
10 -7.941753 -7.861615, diff = -0.080138
11  3.101702  3.186424, diff = -0.084722
12  0.954475  1.037351, diff = -0.082876
13  2.353770  2.437956, diff = -0.084186
14 -1.223359 -1.139174, diff = -0.084185
15  0.853322  0.939753, diff = -0.086431