Bugfixes surrounding torch dtypes and QOL updates for torch #53

nreHieW · 2024-10-09T02:05:27Z

This PR does the following:

Fixes conflicting torch dtypes in the attention computation by casting the keys and values returned from the KV cache to the input dtype. This matches the JAX implementation, where keys and values are stored in fp32 and fp16, respectively, after RoPE. (Thanks @tensorqt for bringing this up.) This should also close #50 ("Torch doesn't work on mac.").
QOL fixes to torch_main.py: cleans up imports and sets up prompt testing similar to main.py.
Restores the changes from #17 ("Bugfix to varentropy calculation"), which had been overwritten.
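A minimal sketch of the dtype fix, assuming a PyTorch attention step in which the cache holds keys and values in float32 while the input arrives in bfloat16. The helper name `attend` and the tensor shapes are hypothetical and are not taken from torch_main.py. Because `torch.matmul` does not promote mixed dtypes (it raises a `RuntimeError` instead), the cached tensors are cast to the query's dtype before the score and value products.

```python
import math

import torch
import torch.nn.functional as F


def attend(xq: torch.Tensor, cache_k: torch.Tensor, cache_v: torch.Tensor) -> torch.Tensor:
    """Hypothetical attention step; shapes are (batch, heads, seq, head_dim)."""
    # Cast cached keys/values to the input dtype so both matmuls see matching dtypes.
    keys = cache_k.to(xq.dtype)
    values = cache_v.to(xq.dtype)
    # Scaled dot-product scores: (batch, heads, seq_q, seq_k).
    scores = torch.matmul(xq, keys.transpose(-2, -1)) / math.sqrt(xq.shape[-1])
    # Assumption of this sketch: softmax in float32 for stability, then back to the input dtype.
    probs = F.softmax(scores.float(), dim=-1).to(xq.dtype)
    # Weighted sum of values: (batch, heads, seq_q, head_dim).
    return torch.matmul(probs, values)


torch.manual_seed(0)
xq = torch.randn(1, 2, 3, 4, dtype=torch.bfloat16)
cache_k = torch.randn(1, 2, 5, 4, dtype=torch.float32)
cache_v = torch.randn(1, 2, 5, 4, dtype=torch.float32)
out = attend(xq, cache_k, cache_v)
print(out.dtype, tuple(out.shape))
```

Computing the softmax in float32 and casting back is a common choice for numerical stability; it is an assumption of this sketch, not necessarily what the PR does.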

nreHieW · 2024-10-12T02:25:57Z

closing in favor of #72

nreHieW added 4 commits October 8, 2024 21:31

bugfix dtype (a6f45bf)
make dtype dynamic (a74bee9)
cleanup imports (9d4487c)
revert to axis = 1 (6e42170)

nreHieW closed this Oct 12, 2024
