Adds unit tests which run on GPU for open-instruct. #905
Conversation
Force-pushed from 3071602 to efd6279.
Is this going to be a GPU job for every single update to main? I can't tell, but that feels like it would be a substantial number of jobs on the cluster :/
tokenizer_name = "EleutherAI/pythia-14m"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
test_prompt = "What is the capital of France?"
prompt_token_ids = tokenizer.encode(test_prompt, return_tensors="pt").tolist()[0]
param_prompt_Q = ray_queue.Queue(maxsize=1)
inference_results_Q = ray_queue.Queue(maxsize=1)
actor_manager = vllm_utils3.ActorManager.remote()
vllm_engines = create_vllm_engines(
    num_engines=1,
    tensor_parallel_size=1,
    enforce_eager=True,
    tokenizer_name_or_path=tokenizer_name,
    pretrain=tokenizer_name,
    revision="main",
    seed=42,
    enable_prefix_caching=False,
    max_model_len=512,
    vllm_gpu_memory_utilization=0.5,
    prompt_queue=param_prompt_Q,
    results_queue=inference_results_Q,
    actor_manager=actor_manager,
)
Not sure, but it feels like the tokenizer and vLLM engine could be created in the setup of the test.
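A rough sketch of what that could look like as a module-scoped pytest fixture, reusing the arguments from the diff above; the import paths for vllm_utils3 and create_vllm_engines are assumptions about where those helpers live in open-instruct:

# Sketch only: moves the tokenizer / queue / engine construction out of the test
# body and into a module-scoped pytest fixture, assuming create_vllm_engines and
# ActorManager keep the signatures used in the diff above.
import pytest
import ray
from ray.util import queue as ray_queue
from transformers import AutoTokenizer

from open_instruct import vllm_utils3                      # assumed module path
from open_instruct.vllm_utils3 import create_vllm_engines  # assumed module path


@pytest.fixture(scope="module")
def vllm_test_setup():
    """Build the tokenizer, Ray queues, and a single vLLM engine once per module."""
    tokenizer_name = "EleutherAI/pythia-14m"
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    param_prompt_Q = ray_queue.Queue(maxsize=1)
    inference_results_Q = ray_queue.Queue(maxsize=1)
    actor_manager = vllm_utils3.ActorManager.remote()
    vllm_engines = create_vllm_engines(
        num_engines=1,
        tensor_parallel_size=1,
        enforce_eager=True,
        tokenizer_name_or_path=tokenizer_name,
        pretrain=tokenizer_name,
        revision="main",
        seed=42,
        enable_prefix_caching=False,
        max_model_len=512,
        vllm_gpu_memory_utilization=0.5,
        prompt_queue=param_prompt_Q,
        results_queue=inference_results_Q,
        actor_manager=actor_manager,
    )
    yield tokenizer, param_prompt_Q, inference_results_Q, vllm_engines
    ray.shutdown()  # tear down the Ray actors once the module's tests are done

Individual tests would then accept vllm_test_setup as an argument instead of building the engine themselves.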
generation_config = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=5, n=1)
Do you think it's possible to get a deterministic output for Pythia-14m and specifically test for the expected completion for this prompt? Not sure how important that is.
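If greedy decoding really is stable for Pythia-14m, the check could look like this standalone sketch, which calls vLLM directly rather than going through the prompt/results queues; the pinned completion is left as a placeholder rather than the model's actual output:

# Sketch only: checks that greedy decoding (temperature=0.0) gives the same
# completion on repeated runs, and shows where a pinned expected string would go.
from vllm import LLM, SamplingParams


def test_pythia_14m_greedy_completion_is_stable():
    llm = LLM(
        model="EleutherAI/pythia-14m",
        enforce_eager=True,
        max_model_len=512,
        gpu_memory_utilization=0.5,
        seed=42,
    )
    params = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=5, n=1)
    prompt = "What is the capital of France?"

    first = llm.generate([prompt], params)[0].outputs[0].text
    second = llm.generate([prompt], params)[0].outputs[0].text
    assert first == second
    # Once the stable completion is known, pin it explicitly:
    # assert first == "..."  # placeholder; fill in the observed output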
My attitude is to enable it and see if we get complaints. I could also make the trigger a lot more restrictive and have it only trigger on changes to grpo_fast or vllm_utils3.py.
Triggering on changes to grpo_fast or vllm_utils3 makes sense to me. It still feels like it could be overkill, but I'm inclined to say let's try it and roll back if necessary.
I accidentally clicked
Pulls out all the tests which would be skipped if CUDA is not available into separate files.
Changes pyproject.toml so that when we run pytest it only looks at test files that don't end in _gpu.py.
Fixes some of the tests so they pass. Currently it takes ~7m to run the tests, and they run on the cheap GPUs (L40/A6000).
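The exact pyproject.toml option isn't quoted in this thread; as a rough sketch of an equivalent exclusion, pytest can also be told to skip those files from a conftest.py:

# conftest.py -- sketch of an equivalent exclusion; the PR itself configures
# this in pyproject.toml.  Any file matching *_gpu.py is skipped when pytest
# collects the normal (CPU-only) test suite.
collect_ignore_glob = ["*_gpu.py"]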