Moved sharktank runner to ossci cluster #990

Draft: wants to merge 47 commits into base: main
Changes from 35 commits

Commits
3e53034 added print debugging (Feb 13, 2025)
428686f shortened tests for faster iterations (Feb 14, 2025)
e4c9501 attempted fix (Feb 14, 2025)
1cde838 fixed device issue (Feb 14, 2025)
fd02a97 removed docker cleanup step (Feb 15, 2025)
ab6b527 moved big test back to old runner (Feb 15, 2025)
0fa650d added ci-sharktank (Feb 15, 2025)
c3bd7d2 add hf token (saienduri, Feb 15, 2025)
a5a143e removed sharktank workflow because I dont have a HF token (Feb 17, 2025)
2d93939 added back large test (Feb 17, 2025)
d98ca17 reverted llama bench for merge (Feb 17, 2025)
9127d59 updated hf token (Feb 17, 2025)
c9c761e reverted shark-tank (Feb 18, 2025)
01bb4c1 addressed comments (Feb 18, 2025)
922fc87 tried to fix path in sharktank (Feb 19, 2025)
fa0ba0e tried to fix path in sharktank (Feb 19, 2025)
806dd2f moved quark artifacts to writable mount (Feb 20, 2025)
a1b1282 added permissions (Feb 20, 2025)
0db2d98 tried to fix path in sharktank (Feb 20, 2025)
bae7745 seeing if tests pass while removing delete line (Feb 21, 2025)
e031685 attempted to fix prefills issue (Feb 21, 2025)
7aaa179 attempted to fix prefills issue (Feb 21, 2025)
a32102b attempted to fix prefills issue (Feb 21, 2025)
c45a397 tried to fix path in sharktank (Feb 21, 2025)
89d3aef tried to fix path in sharktank (Feb 21, 2025)
18af41d tried to fix path in sharktank (Feb 21, 2025)
e1102ca tried to fix path in sharktank (Feb 21, 2025)
475309f tried to fix path in sharktank (Feb 21, 2025)
fd7b7bd tried to fix path in sharktank (Feb 21, 2025)
976b2ba tried to fix path in sharktank (Feb 21, 2025)
531d86f tried to fix path in sharktank (Feb 21, 2025)
2e70e71 tried to fix path in sharktank (Feb 21, 2025)
b6a53d7 tried to fix path in sharktank (Feb 21, 2025)
9beaea8 cleaned up pr (Feb 21, 2025)
d403e53 cleaned up pr (Feb 21, 2025)
aa517bf addressed issues (Feb 24, 2025)
fd56941 fixed bug (Feb 24, 2025)
1443f4b added print debug for mismatch (Feb 24, 2025)
48a9d79 added in already passing tests (Feb 24, 2025)
1dd8681 removed passing tests for faster iteration (Feb 24, 2025)
7cb61cc relaxed tolerance to see if it results in a pass (Feb 24, 2025)
e127409 tried to fix issue (Feb 24, 2025)
c8d0ef9 added back tests (Feb 24, 2025)
76c7ddf tried updating tokenizer.json (Feb 25, 2025)
98723da faster iterations (Feb 25, 2025)
26be665 fixxed path (Feb 25, 2025)
46c5b50 added back tests (Feb 25, 2025)
9 changes: 5 additions & 4 deletions .github/workflows/ci-sharktank.yml
@@ -93,15 +93,16 @@ jobs:
     strategy:
       matrix:
         python-version: [3.11]
-        runs-on: [llama-mi300x-3]
+        runs-on: [linux-mi300-1gpu-ossci]
       fail-fast: false
     runs-on: ${{matrix.runs-on}}
     defaults:
       run:
         shell: bash
     env:
       VENV_DIR: ${{ github.workspace }}/.venv
-      HF_HOME: "/data/huggingface"
+      HF_HOME: "/shark-cache/data/huggingface"
+      HF_TOKEN: ${{ secrets.HF_FLUX_TOKEN }}
     steps:
       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
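Since this hunk moves the job to a new runner with a different HF_HOME and adds HF_TOKEN, a small preflight check could catch a missing mount or secret before the long test run starts. The following is only a sketch under stated assumptions: the function name check_hf_environment is hypothetical and not part of the repository, and it only inspects the two variables named in the hunk.

```python
import os
from pathlib import Path


def check_hf_environment(env=None):
    """Return a list of human-readable problems; an empty list means the setup looks usable.

    Hypothetical helper for illustration; it is not part of the workflow in this PR.
    """
    env = os.environ if env is None else env
    problems = []

    hf_home = env.get("HF_HOME")
    if not hf_home:
        problems.append("HF_HOME is not set")
    else:
        cache_dir = Path(hf_home)
        if not cache_dir.is_dir():
            problems.append(f"HF_HOME does not exist: {cache_dir}")
        elif not os.access(cache_dir, os.W_OK):
            problems.append(f"HF_HOME is not writable: {cache_dir}")

    # Report only whether the token is present; never print its value.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set")

    return problems


if __name__ == "__main__":
    for problem in check_hf_environment():
        print(f"preflight: {problem}")
```

Run as an early workflow step, a non-empty result would point at the runner configuration rather than at the tests themselves.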

@@ -149,7 +150,7 @@ jobs:
             sharktank/tests/models/vae/vae_test.py \
             sharktank/tests/models/llama/quark_parity_test.py \
             --durations=0 \
-            --timeout=800
+            --timeout=10000
           # TODO: add back
           # --with-t5-data \
           # when #888 is resolved
@@ -193,7 +194,7 @@ jobs:
         run: |
           pytest -v sharktank/ -m punet_quick \
             --durations=0 \
-            --timeout=600
+            --timeout=900

   # Depends on other jobs to provide an aggregate job status.
   # TODO(#584): move test_with_data and test_integration to a pkgci integration test workflow?
8 changes: 4 additions & 4 deletions sharktank/conftest.py
@@ -191,25 +191,25 @@ def pytest_addoption(parser):
     parser.addoption(
         "--google-t5-v1-1-small-f32-model-path",
         type=Path,
-        default="/data/t5/small/google__t5-v1_1-small_f32.gguf",
+        default="/shark-dev/data/t5/small/google__t5-v1_1-small_f32.gguf",
         help="Google T5 v1.1 small float32 model path",
     )
     parser.addoption(
         "--google-t5-v1-1-small-bf16-model-path",
         type=Path,
-        default="/data/t5/small/google__t5-v1_1-small_bf16.gguf",
+        default="/shark-dev/data/t5/small/google__t5-v1_1-small_bf16.gguf",
         help="Google T5 v1.1 small bfloat16 model path",
     )
     parser.addoption(
         "--google-t5-v1-1-xxl-f32-model-path",
         type=Path,
-        default="/data/t5/xxl/google__t5-v1_1-xxl_f32.gguf",
+        default="/shark-dev/data/t5/xxl/google__t5-v1_1-xxl_f32.gguf",
         help="Google T5 v1.1 XXL float32 model path",
     )
     parser.addoption(
         "--google-t5-v1-1-xxl-bf16-model-path",
         type=Path,
-        default="/data/t5/xxl/google__t5-v1_1-xxl_bf16.gguf",
+        default="/shark-dev/data/t5/xxl/google__t5-v1_1-xxl_bf16.gguf",
         help="Google T5 v1.1 XXL bfloat16 model path",
     )
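All four defaults in this hunk change by the same prefix, from /data to /shark-dev/data. One possible way to avoid editing each default on the next runner move is to build them from a single root. This is a sketch, not what the PR does: t5_model_path and the SHARKTANK_DATA_ROOT environment variable are hypothetical names introduced only for illustration.

```python
import os
from pathlib import Path

# Root taken from the new defaults in the diff above.
DEFAULT_DATA_ROOT = Path("/shark-dev/data")


def t5_model_path(size, dtype, env=None):
    """Build a T5 GGUF path such as <root>/t5/small/google__t5-v1_1-small_f32.gguf.

    SHARKTANK_DATA_ROOT is a hypothetical override, not defined by the repository.
    """
    env = os.environ if env is None else env
    root = Path(env.get("SHARKTANK_DATA_ROOT", DEFAULT_DATA_ROOT))
    return root / "t5" / size / f"google__t5-v1_1-{size}_{dtype}.gguf"
```

Each parser.addoption call could then pass default=t5_model_path("small", "f32") and so on, leaving the command-line flags themselves unchanged.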

8 changes: 6 additions & 2 deletions sharktank/tests/models/llama/quark_parity_test.py
@@ -18,7 +18,7 @@
 class QuarkParityTest(unittest.TestCase):
     def setUp(self):
         super().setUp()
-        self.path_prefix = Path("/shark-dev/quark_test")
+        self.path_prefix = Path("/shark-cache/quark_test")

     @with_quark_data
     def test_compare_against_quark(self):
@@ -54,7 +54,7 @@ def test_compare_against_quark(self):
             "sharktank.examples.paged_llm_v1",
             "The capitol of Texas is",
             f"--irpa-file={self.path_prefix}/fp8_bf16_weight.irpa",
-            f"--tokenizer-config-json=/data/llama3.1/8b/tokenizer.json",
+            f"--tokenizer-config-json=/shark-dev/data/llama3.1/weights/8b/tokenizer.json",
             "--fake-quant",
             "--attention-kernel=torch",
             "--activation-dtype=bfloat16",
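The commit history shows many attempts to fix paths, and this hunk hardcodes two input files on different mounts. A check that names any missing input before the subprocess starts might shorten that kind of iteration. The helper below is a hypothetical sketch and is not part of the test file.

```python
from pathlib import Path


def missing_files(paths):
    """Return the subset of paths that do not exist as regular files, in input order."""
    return [Path(p) for p in paths if not Path(p).is_file()]
```

In the test, the result could be checked at the start of test_compare_against_quark, for example by calling self.fail with the list of missing paths when it is not empty.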
@@ -69,6 +69,10 @@ def test_compare_against_quark(self):
             command, shell=True, capture_output=True, cwd=sharktank_dir
         )

+        f_ = open("/shark-cache/quark_test/test0.txt", "w+")
+        f_.write(str(proc))
+        f_.close()
+
         ours = dict()
         with safe_open(our_path, "pytorch") as st:
             for key in st.keys():
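The added debug lines write str(proc) to a hardcoded file with a manual open and close. If the debug output is kept, a variant along the following lines may be easier to read in CI: it uses a with-block so the file is closed even if a write raises, and it records the exit code, stdout, and stderr separately. This is a sketch only; run_and_log is a hypothetical helper, and it assumes the command is passed as a single shell string as in the test.

```python
import subprocess
from pathlib import Path


def run_and_log(command, log_path, cwd=None, timeout=None):
    """Run a shell command and save its exit code, stdout, and stderr to log_path."""
    proc = subprocess.run(
        command, shell=True, capture_output=True, text=True, cwd=cwd, timeout=timeout
    )
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # The with-block closes the file even if a write raises.
    with open(log_path, "w") as log_file:
        log_file.write(f"returncode: {proc.returncode}\n")
        log_file.write(f"--- stdout ---\n{proc.stdout}\n")
        log_file.write(f"--- stderr ---\n{proc.stderr}\n")
    return proc
```

In the test this could be called as run_and_log(command, self.path_prefix / "test0.txt", cwd=sharktank_dir), which would also keep the log location tied to path_prefix instead of repeating the mount path.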