Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runners2 #617

Merged
merged 262 commits into from
Jan 23, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
262 commits
Select commit Hold shift + click to select a range
6b54188
cond
AlexCheema Dec 6, 2024
32cd1f1
give this a goh
AlexCheema Dec 6, 2024
9dc76ef
tooonygrad
AlexCheema Dec 6, 2024
976e5f2
disable mlx test for now..plan to run this on a self-hosted runner
AlexCheema Dec 6, 2024
deb80d2
clang for tinygrad
AlexCheema Dec 6, 2024
8302fd0
test runner
Dec 6, 2024
cb3d89e
test runner
Dec 6, 2024
7d223a0
matrix
Dec 6, 2024
90fd5c1
matrix
Dec 6, 2024
d154d37
add exo run
Dec 6, 2024
bdf417f
tweak
Dec 6, 2024
6b61fc6
tweak python install
Dec 6, 2024
1af28cb
fix
Dec 6, 2024
ce2ccdd
fix2
Dec 6, 2024
f9c2361
fix3
Dec 6, 2024
d16280d
debug
Dec 6, 2024
0739dc9
fix
Dec 6, 2024
3662ec4
fix
Dec 6, 2024
1dcc731
fix
Dec 6, 2024
ccc5415
try
Dec 6, 2024
64954aa
fixed
Dec 6, 2024
c3dfac6
debug
Dec 6, 2024
f7e0348
activate
Dec 6, 2024
19a7d5a
fix
Dec 6, 2024
cb3c147
fix
Dec 6, 2024
4cac1bb
quotes
Dec 6, 2024
faf0aae
jq
Dec 6, 2024
16b126d
fix
Dec 6, 2024
f087c0a
fix
Dec 6, 2024
9fc3358
path
Dec 6, 2024
acdee16
debug
Dec 6, 2024
4dd617a
shorter
Dec 6, 2024
6c08b32
nodebug
Dec 6, 2024
7b77ef0
flush
Dec 6, 2024
6dae3a4
conf
Dec 7, 2024
320892d
maxtok
Dec 7, 2024
7857103
aws
Dec 7, 2024
732ba91
new_conf
Dec 8, 2024
38bd003
fix
Dec 8, 2024
c138de0
job_name
Dec 8, 2024
c3c80c6
name
Dec 8, 2024
fe80749
fix
Dec 8, 2024
be8cbc0
trigger test
AlexCheema Dec 8, 2024
fb44eb0
simplify bench
AlexCheema Dec 8, 2024
755dd47
jobname
Dec 8, 2024
87865f0
list exo processes before test, warmup req in bench
AlexCheema Dec 8, 2024
fb8d870
t
AlexCheema Dec 8, 2024
c8f9372
model matrix
Dec 8, 2024
6bb7c11
enable debug
AlexCheema Dec 8, 2024
3687ba1
bench logs
AlexCheema Dec 8, 2024
3ccbdf1
add DEBUG_DISCOVERY
AlexCheema Dec 8, 2024
8e57f33
trigger test
AlexCheema Dec 8, 2024
b4f8649
bootstrap
Dec 8, 2024
903a5aa
fix
Dec 8, 2024
1716f63
test
Dec 8, 2024
b0977f9
t
AlexCheema Dec 8, 2024
cbac4d6
git version
AlexCheema Dec 8, 2024
fd05bca
lfs
AlexCheema Dec 8, 2024
f584e86
get rid of lfs stuff
AlexCheema Dec 8, 2024
b216819
remove
Dec 8, 2024
571b26c
allowed interface types
AlexCheema Dec 8, 2024
bd9d118
sleep before bench
AlexCheema Dec 8, 2024
b4e885b
test range
AlexCheema Dec 8, 2024
314a5d9
test 1
AlexCheema Dec 8, 2024
f6c2c37
test 2
AlexCheema Dec 8, 2024
e78a52d
test 3
AlexCheema Dec 8, 2024
cc74b1f
test 4
AlexCheema Dec 8, 2024
b69cb49
test 5
AlexCheema Dec 8, 2024
d93b8e8
test 6
AlexCheema Dec 8, 2024
af6048e
test 7
AlexCheema Dec 8, 2024
9ba8bbd
test 8
AlexCheema Dec 8, 2024
3cf28f8
test 9
AlexCheema Dec 8, 2024
38eaecf
test 10
AlexCheema Dec 8, 2024
e78ef75
test 11
AlexCheema Dec 8, 2024
d714e40
test 12
AlexCheema Dec 8, 2024
286db87
test 13
AlexCheema Dec 8, 2024
a4b221d
test 14
AlexCheema Dec 8, 2024
3108434
test 15
AlexCheema Dec 8, 2024
8c7c156
test 16
AlexCheema Dec 8, 2024
4d6af6e
test 17
AlexCheema Dec 8, 2024
29d9df0
test 18
AlexCheema Dec 8, 2024
53edb85
test 19
AlexCheema Dec 8, 2024
8a5d212
test 20
AlexCheema Dec 8, 2024
5a4d128
trigger test
AlexCheema Dec 9, 2024
1e869a0
trigger test
AlexCheema Dec 10, 2024
8269b4b
t
AlexCheema Dec 11, 2024
16d9839
test {i}
AlexCheema Dec 11, 2024
4f4ac0f
test 21
AlexCheema Dec 11, 2024
6030b39
test 22
AlexCheema Dec 11, 2024
23dd5de
test 23
AlexCheema Dec 11, 2024
5d3be3c
test 24
AlexCheema Dec 11, 2024
fc26ad4
test 25
AlexCheema Dec 11, 2024
070b163
test 26
AlexCheema Dec 11, 2024
949055d
test 27
AlexCheema Dec 11, 2024
04bc163
test 28
AlexCheema Dec 11, 2024
0e32a62
test 29
AlexCheema Dec 11, 2024
18e7919
test 30
AlexCheema Dec 11, 2024
23158a4
add branch name to results
AlexCheema Dec 11, 2024
a84cba4
Merge remote-tracking branch 'origin/main' into runners
AlexCheema Dec 11, 2024
afe71c0
check gpu usage
AlexCheema Dec 11, 2024
cb40eb2
more robust configure_mlx.sh
AlexCheema Dec 11, 2024
ba96413
bootstrap script tweaks
AlexCheema Dec 11, 2024
e2d3a90
runner-token typo
AlexCheema Dec 11, 2024
c938efb
t
AlexCheema Dec 11, 2024
f7122d4
add system_status check to bench
AlexCheema Dec 11, 2024
cff03fc
perf diag
AlexCheema Dec 11, 2024
bbb5846
Test on m4
AlexCheema Dec 11, 2024
6169996
test
AlexCheema Dec 11, 2024
b7bab80
test2
AlexCheema Dec 11, 2024
41902f7
tweaks
AlexCheema Dec 11, 2024
e501eea
tweak install
AlexCheema Dec 11, 2024
668766f
t
AlexCheema Dec 11, 2024
3b1ea19
use .venv exo
AlexCheema Dec 11, 2024
7b2282d
run without debug flag
AlexCheema Dec 11, 2024
e680e8a
fix name
AlexCheema Dec 11, 2024
3789758
t
AlexCheema Dec 11, 2024
9848a45
TT
AlexCheema Dec 11, 2024
7b99cb4
t
AlexCheema Dec 11, 2024
a4bb4bb
update bootstrap
AlexCheema Dec 11, 2024
9dd33d3
t
AlexCheema Dec 11, 2024
8d9e3b8
t
AlexCheema Dec 11, 2024
1dbe11c
t
AlexCheema Dec 11, 2024
6bb3893
tt
AlexCheema Dec 11, 2024
0904cda
ttt
AlexCheema Dec 11, 2024
cacf50c
tttt
AlexCheema Dec 11, 2024
739b7d1
tttttt
AlexCheema Dec 11, 2024
7c0c5ef
ttttttt
AlexCheema Dec 11, 2024
63da9fc
a
AlexCheema Dec 11, 2024
d6c2146
t
AlexCheema Dec 11, 2024
9a11e27
ttt
AlexCheema Dec 11, 2024
97ffb83
t
AlexCheema Dec 11, 2024
d95f40b
a
AlexCheema Dec 11, 2024
cdae702
t
AlexCheema Dec 11, 2024
a932afc
oi
AlexCheema Dec 11, 2024
b1142d4
t
AlexCheema Dec 11, 2024
6acfb81
t
AlexCheema Dec 11, 2024
5dee5e5
t
AlexCheema Dec 11, 2024
26351e7
t
AlexCheema Dec 11, 2024
e698ef6
t
AlexCheema Dec 11, 2024
61c0963
t
AlexCheema Dec 11, 2024
dd3fd27
t
AlexCheema Dec 11, 2024
5a1a0f5
t
AlexCheema Dec 11, 2024
6cf2af3
t
AlexCheema Dec 11, 2024
9067741
t
AlexCheema Dec 11, 2024
d0b7f1b
t
AlexCheema Dec 11, 2024
741c318
test
AlexCheema Dec 11, 2024
6249bee
tes
AlexCheema Dec 11, 2024
225dcba
t
AlexCheema Dec 11, 2024
92edfa5
t
AlexCheema Dec 11, 2024
83470a9
t
AlexCheema Dec 11, 2024
83892d5
t
AlexCheema Dec 11, 2024
20e3065
les goh
AlexCheema Dec 11, 2024
e63c224
testtt
AlexCheema Dec 11, 2024
3f6ef1c
single node test 1
AlexCheema Dec 11, 2024
fe506a5
single node test 2
AlexCheema Dec 11, 2024
fb7a0de
single node test 3
AlexCheema Dec 11, 2024
6f097c9
single node test 4
AlexCheema Dec 11, 2024
f89b85b
single node test 5
AlexCheema Dec 11, 2024
8b47a9d
single node test 6
AlexCheema Dec 11, 2024
b23c3fd
single node test 7
AlexCheema Dec 11, 2024
32ff3ef
single node test 8
AlexCheema Dec 11, 2024
9f1393d
single node test 9
AlexCheema Dec 11, 2024
c5c27a3
single node test 10
AlexCheema Dec 11, 2024
6c322ac
single node test 11
AlexCheema Dec 11, 2024
3fda05a
single node test 12
AlexCheema Dec 11, 2024
f22bc99
single node test 13
AlexCheema Dec 11, 2024
0bd44c0
single node test 14
AlexCheema Dec 11, 2024
c65d1d9
single node test 15
AlexCheema Dec 11, 2024
8408c84
single node test 16
AlexCheema Dec 11, 2024
76196b8
single node test 17
AlexCheema Dec 11, 2024
92e2b74
single node test 18
AlexCheema Dec 11, 2024
279354a
single node test 19
AlexCheema Dec 11, 2024
bba0aa0
single node test 20
AlexCheema Dec 11, 2024
8cb7327
re-enable m4 cluster run
AlexCheema Dec 12, 2024
1194db6
m3
AlexCheema Dec 12, 2024
8c6d37d
m4 cluster test
AlexCheema Dec 12, 2024
f9f7612
better bench system info
AlexCheema Dec 12, 2024
eeecdcb
try a different taskpolicy
AlexCheema Dec 12, 2024
2abe57b
grasping at straws
AlexCheema Dec 12, 2024
dbb7ad3
run with three m4 pro
AlexCheema Dec 12, 2024
9472ab0
t
AlexCheema Dec 12, 2024
b6f2385
run llama-3.1-8b on 3 m4 pro cluster
AlexCheema Dec 12, 2024
2ff4638
Merge remote-tracking branch 'origin/main' into runners
AlexCheema Dec 12, 2024
e5d54c7
add llama-3.3-70b to 3 M4 Pro cluster
AlexCheema Dec 12, 2024
0c6ab35
increase timeout of http request in bench.py up to 10 mins
AlexCheema Dec 14, 2024
a930921
set max-generate-tokens to 250
AlexCheema Dec 14, 2024
25b4af7
Merge branch 'main' into runners
Dec 14, 2024
f55a53a
one token at a time
AlexCheema Dec 14, 2024
cb4615c
fix SendNewToken
AlexCheema Dec 14, 2024
06c2e23
rip out stats bloat
AlexCheema Dec 14, 2024
08912d1
Only collect topology if peers changed
blindcrone Dec 15, 2024
9397464
add commit to results
AlexCheema Dec 15, 2024
64365d6
one two and three m4 pro clusters
AlexCheema Dec 15, 2024
c9ded9b
optimise networking, remove bloat
AlexCheema Dec 16, 2024
804ad47
upgrade mlx
AlexCheema Dec 16, 2024
063964a
remove redundant sample_logits, put back opaque status for process_pr…
AlexCheema Dec 16, 2024
c0534b6
Merge commit: trigger test
AlexCheema Dec 16, 2024
bfa06ee
Merge commit: trigger test
AlexCheema Dec 16, 2024
bf1aafd
Merge commit: trigger test
AlexCheema Dec 16, 2024
41eaaec
Merge commit: trigger test
AlexCheema Dec 16, 2024
b49c4ca
Merge commit: trigger test
AlexCheema Dec 16, 2024
427d071
Merge commit: trigger test
AlexCheema Dec 16, 2024
34ecbbe
Merge commit: trigger test
AlexCheema Dec 16, 2024
bd0febe
Merge commit: trigger test
AlexCheema Dec 16, 2024
99a70f1
Merge commit: trigger test
AlexCheema Dec 16, 2024
8d94b8a
trigger test
AlexCheema Dec 16, 2024
35d90d9
Merge remote-tracking branch 'origin/main' into runners
AlexCheema Dec 16, 2024
b17faa8
dont broadcast every single process_tensor
AlexCheema Dec 16, 2024
036224f
add topology to tinychat ui
AlexCheema Dec 16, 2024
1b14be6
make device_capabilities async running on a thread pool
AlexCheema Dec 16, 2024
e2474c3
fail if we never get the desired node count
AlexCheema Dec 16, 2024
58f0a0f
optimise grpc parameters
AlexCheema Dec 17, 2024
0a07223
switch to uvloop (faster asyncio event loop) and optimise grpc settings
AlexCheema Dec 17, 2024
3a58576
make sure this is actually doing something
AlexCheema Dec 17, 2024
1f108a0
remove test sleep
AlexCheema Dec 17, 2024
198308b
more robust udp broadcast
AlexCheema Dec 17, 2024
7ac4004
change it back to collecting topology periodically even if peers dont…
AlexCheema Dec 17, 2024
2f0b543
add peer connection info to tinychat
AlexCheema Dec 17, 2024
023ddc2
support different network interface tests
AlexCheema Dec 17, 2024
218c1e7
Merge branch 'main' into runners2
AlexCheema Jan 20, 2025
6b8cd05
fix some issues with results
AlexCheema Jan 20, 2025
461e4f3
Merge remote-tracking branch 'origin/main' into runners2
AlexCheema Jan 22, 2025
97f3bad
fix peer_handle
AlexCheema Jan 22, 2025
d80324f
disable test-m3-single-node
AlexCheema Jan 22, 2025
98d6e98
add back .circleci
AlexCheema Jan 22, 2025
09e12d8
temporarily disable github runner benchmarks
AlexCheema Jan 22, 2025
9954ce8
fix treating token as a list
AlexCheema Jan 22, 2025
55d1846
clean up DEBUG=2 logs, a few fixes for token
AlexCheema Jan 22, 2025
87d1271
fix stream: false completion
AlexCheema Jan 22, 2025
3a4bae0
fix issue with eos_token_id
AlexCheema Jan 22, 2025
8ab9977
fix stable diffusion case for tui, make mlx run on its own thread aga…
AlexCheema Jan 22, 2025
9ba8bbb
fix filter to include 169.254.* since thats what mac uses for ethernet
AlexCheema Jan 22, 2025
bbb6856
fix check for sd2.1
AlexCheema Jan 22, 2025
f8db4e1
fix check for sd2.1
AlexCheema Jan 22, 2025
dc5cdc4
add back opaque
AlexCheema Jan 22, 2025
112dea1
add back the benchmarks baby
AlexCheema Jan 23, 2025
2391051
remove kern.timer.scan_interval from bootstrap.sh
AlexCheema Jan 23, 2025
cc78738
remove kern scan intervals
AlexCheema Jan 23, 2025
d54e19c
runners back
AlexCheema Jan 23, 2025
5c9bcb8
set GRPC_VERBOSITY=error; TRANSFORMERS_VERBOSITY=error
AlexCheema Jan 23, 2025
a8a9e3f
explicitly enable TOKENIZERS_PARALLELISM=true
AlexCheema Jan 23, 2025
790c08a
add linux tinygrad test
AlexCheema Jan 23, 2025
8484eb4
fix config
AlexCheema Jan 23, 2025
495987b
beef up the instance
AlexCheema Jan 23, 2025
209163c
add linux tinygrad test
AlexCheema Jan 23, 2025
e57fa1d
xlarge
AlexCheema Jan 23, 2025
b2764f1
linux install
AlexCheema Jan 23, 2025
200ff4d
linux install
AlexCheema Jan 23, 2025
dfd9d3e
linux install
AlexCheema Jan 23, 2025
88ac12d
install clang test
AlexCheema Jan 23, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 28 additions & 0 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -254,6 +254,33 @@ jobs:
prompt: "Keep responses concise. Who was the king of pop?"
expected_output: "Michael Jackson"

chatgpt_api_integration_test_tinygrad_linux:
machine:
image: ubuntu-2204:current
resource_class: xlarge
steps:
- checkout
- run:
name: Set up Python
command: |
sudo apt-get update
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install -y python3.12 python3.12-venv clang
python3.12 -m venv env
source env/bin/activate
- run:
name: Install dependencies
command: |
source env/bin/activate
pip install --upgrade pip
pip install .
- run_chatgpt_api_test:
inference_engine: tinygrad
model_id: llama-3.2-1b
prompt: "Keep responses concise. Who was the king of pop?"
expected_output: "Michael Jackson"

measure_pip_sizes:
macos:
xcode: "16.0.0"
Expand Down Expand Up @@ -342,5 +369,6 @@ workflows:
- discovery_integration_test
- chatgpt_api_integration_test_mlx
- chatgpt_api_integration_test_tinygrad
- chatgpt_api_integration_test_tinygrad_linux
- chatgpt_api_integration_test_dummy
- measure_pip_sizes
Loading
Loading