In the Quantize function (binarized_modules.py, line 57), I don't understand why tensor.clamp_() uses the range -128 to 128 when quantizing with numBits=8. Since the outputs of all previous layers pass through a Hardtanh function, shouldn't they already be in the range [-1, 1]? Also, how are values in the range [-128, 128] stored in 8 bits? For example, if the input value is 127.125 and numBits=8, tensor.mul(2**(numBits-1)).round().div(2**(numBits-1)) gives me 127.1250. How can that be stored in 8 bits?
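To make the arithmetic concrete, here is a minimal plain-Python sketch of the mul/round/div expression quoted in the question, applied to a single float. It is not the repository's Quantize function: the helper names quantize_value and count_levels are hypothetical, and Python's built-in round() stands in for tensor.round() (both round halves to the nearest even integer). It counts how many distinct values the 1/128 grid has over each range.

```python
def quantize_value(x, num_bits=8):
    """Apply x.mul(2**(num_bits-1)).round().div(2**(num_bits-1)) to one float.

    Returns the intermediate integer code and the de-scaled value.
    """
    scale = 2 ** (num_bits - 1)   # 128 for num_bits=8, so the step size is 1/128
    code = round(x * scale)       # integer code on the fixed-point grid
    return code, code / scale


def count_levels(lo, hi, num_bits=8):
    """Count the grid points of step 1/2**(num_bits-1) in [lo, hi] (integer bounds)."""
    scale = 2 ** (num_bits - 1)
    return int(hi * scale) - int(lo * scale) + 1


code, value = quantize_value(127.125)
print(code, value)              # 16272 127.125
print(count_levels(-1, 1))      # 257
print(count_levels(-128, 128))  # 32769
```

Under these assumptions, 127.125 already lies on the 1/128 grid, which is why the expression returns it unchanged. Its integer code is 16272, which does not fit in a signed 8-bit integer. Over [-1, 1] the same grid has 257 levels, one more than the 256 that 8 bits can represent, while over [-128, 128] it has 32769 levels, which is more than 16 bits can hold.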