Torch Implementation of the Sampler on the frog branch + torch eval script + torch implementation bugfix + tests #72

nreHieW · 2024-10-12T02:24:37Z

This PR does the following:

- Implement the sampler on the frog branch.
- Implement a working evaluation script for use with the EleutherAI LM Evaluation Harness. Verified to work with eq_bench.
- Update the Torch implementation to match the JAX implementation. This should also fix #50 ("Torch doesn't work on mac").
  - (For documentation purposes) Specifically, the existing implementation mishandles the q/k/v dtypes after RoPE. In JAX, the first time the KV cache is populated (cur_pos = 0), the keys and values are float32. For cur_pos != 0, the cache is bf16, and JAX automatically promotes it to fp32 so that xq @ k is computed in fp32. In Torch, even though the post-RoPE keys are fp32, the cache buffers are bf16 and the update method returns bf16, so the result must be explicitly cast to fp32 to match the JAX implementation.
- Add tests checking that the Torch implementation matches JAX.
  - Because of numerical differences introduced by bf16, JAX jit compilation, and Torch, the tests run in fp32 with jit, except for attention, which compares the Torch version against the non-jit JAX version.
  - Note: test_each_layer may fail roughly 3% of the time due to unluckily initialised random inputs. Even then, fewer than 0.5% of elements mismatch (fewer than 5 elements in total).
- Other quality-of-life (QOL) changes to match main.py.
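The dtype handling described above can be illustrated with a minimal PyTorch sketch. It is not the PR's code: the class name KVCache, the update signature, the (batch, seq_len, n_heads, head_dim) layout, and the attention_scores helper are all assumptions for illustration, and grouped-query head repetition is omitted. It mirrors the described JAX behaviour: use the fp32 keys/values directly when cur_pos = 0, and otherwise read the bf16 cache and cast it to fp32 explicitly.

```python
import torch


class KVCache:
    """Hypothetical per-layer KV cache with bf16 buffers (illustrative only).

    Assumed layout: (batch, seq_len, n_heads, head_dim).
    Grouped-query head repetition is omitted for brevity.
    """

    def __init__(self, bsz: int, max_seq_len: int, n_heads: int, head_dim: int):
        shape = (bsz, max_seq_len, n_heads, head_dim)
        # Cache buffers live in bf16, as described for the Torch implementation.
        self.k = torch.zeros(shape, dtype=torch.bfloat16)
        self.v = torch.zeros(shape, dtype=torch.bfloat16)

    def update(self, xk: torch.Tensor, xv: torch.Tensor, cur_pos: int):
        seqlen = xk.shape[1]
        end = cur_pos + seqlen
        # Writing into the bf16 buffers rounds the fp32 post-RoPE values to bf16.
        self.k[:, cur_pos:end] = xk.to(torch.bfloat16)
        self.v[:, cur_pos:end] = xv.to(torch.bfloat16)
        if cur_pos == 0:
            # First population: attend over the fp32 keys/values directly,
            # mirroring the JAX path described in the PR.
            return xk.float(), xv.float()
        # Later steps: cache slices are bf16, so cast to fp32 explicitly
        # (JAX promotes bf16 to fp32 implicitly when computing xq @ k).
        return self.k[:, :end].float(), self.v[:, :end].float()


def attention_scores(xq: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
    """Scaled q @ k^T for inputs shaped (batch, seq, heads, head_dim)."""
    q = xq.transpose(1, 2)    # (batch, heads, q_len, head_dim)
    k = keys.transpose(1, 2)  # (batch, heads, k_len, head_dim)
    return (q @ k.transpose(-1, -2)) / q.shape[-1] ** 0.5
```

With cur_pos = 0 the returned keys are the original fp32 tensors; on later decode steps they are the bf16-rounded cache contents upcast to fp32, so attention_scores always runs in fp32.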
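The test code itself is not shown in the PR. As a hedged sketch of how the mismatch figures quoted for test_each_layer could be measured (count and fraction of elements outside tolerance), Torch and JAX outputs could be converted to NumPy arrays and compared with a helper like the one below. The helper names, the rtol/atol defaults, and the 0.5% threshold are illustrative assumptions, not the PR's actual tests.

```python
import numpy as np


def mismatch_stats(actual, expected, rtol=1e-5, atol=1e-5):
    """Count elements of `actual` that fall outside rtol/atol of `expected`."""
    actual = np.asarray(actual, dtype=np.float64)
    expected = np.asarray(expected, dtype=np.float64)
    close = np.isclose(actual, expected, rtol=rtol, atol=atol)
    n_bad = int(np.count_nonzero(~close))
    return n_bad, n_bad / close.size


def assert_mostly_close(actual, expected, max_fraction=0.005, rtol=1e-5, atol=1e-5):
    """Fail only if more than `max_fraction` of elements mismatch (0.005 = 0.5%)."""
    n_bad, frac = mismatch_stats(actual, expected, rtol=rtol, atol=atol)
    assert frac <= max_fraction, f"{n_bad} mismatched elements ({frac:.2%})"
```

A strict element-wise check would still flag the rare unlucky cases described above; tolerating a small mismatched fraction trades some strictness for fewer flaky failures.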

Let me know if a PR to main is preferred and I'll update!

nreHieW added 11 commits October 8, 2024 21:31

- bugfix dtype (a6f45bf)
- make dtype dynamic (a74bee9)
- cleanup imports (9d4487c)
- revert to axis = 1 (6e42170)
- torch frog (4afc991)
- add eval (974da2c)
- add_eval (045fd99)
- Merge remote-tracking branch 'upstream/frog' into torch_frog (7e42ecf)
- torch eval script (0c96b7e)
- torch impl fixes (c353a73)
- add comment (4ceda29)

nreHieW mentioned this pull request on Oct 12, 2024 in #53, "Bugfixes surrounding torch dtypes and QOL updates for torch" (closed).

xjdr-alt deleted the branch xjdr-alt:frog November 4, 2024 18:28

xjdr-alt closed this Nov 4, 2024
