Support model deployment #113
base: main
Conversation
I think those token names have been deprecated and replaced with
Why the big `uv.lock` diff? Doesn't look like `pyproject.toml` changed?
I assume someone ran `uv sync` on a Mac.
@@ -379,3 +384,43 @@ async def _experimental_push_to_s3(
            delete=delete,
            art_path=self._path,
        )

    async def _experimental_deploy(
Let's add a required parameter `deploy_to`, which must be set to `"together"`. Will help people understand what's going on.
Added this class:

    from enum import Enum

    class LoRADeploymentProvider(str, Enum):
        TOGETHER = "together"
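A small sketch of how this plays with the suggested `deploy_to` parameter. Because the enum subclasses `str`, callers can pass either the member or the plain string; the commented-out call at the end is hypothetical and only illustrates the intent from this thread, not the actual signature:

```python
from enum import Enum


class LoRADeploymentProvider(str, Enum):
    TOGETHER = "together"


# The str base class makes members compare equal to their raw string values.
assert LoRADeploymentProvider.TOGETHER == "together"
assert LoRADeploymentProvider("together") is LoRADeploymentProvider.TOGETHER

# Hypothetical call site (the signature is an assumption, not the actual API):
# await backend._experimental_deploy(deploy_to=LoRADeploymentProvider.TOGETHER, ...)
```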
        return None

async def deploy_together(
Can we add an explicit check in this function that the base model is one that Together supports? I guess the user will figure it out eventually if they try one that isn't supported, but it would be nice to have the check explicitly up front so you can take it into account when deciding which base model to train.
Now returning an `UnsupportedBaseModelDeploymentError`.
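A minimal sketch of what an up-front check could look like. The allowlist contents, the helper name, and the error's definition below are assumptions for illustration, and this sketch raises the error, whereas the PR apparently returns it instead:

```python
# Placeholder allowlist; the real set of base models that Together can serve
# as serverless LoRAs would come from Together's docs or API, not this file.
SUPPORTED_LORA_BASE_MODELS = {
    "example-org/supported-base-model-a",
    "example-org/supported-base-model-b",
}


class UnsupportedBaseModelDeploymentError(Exception):
    """The base model cannot be deployed as a Together serverless LoRA."""


def check_base_model_supported(base_model: str) -> None:
    # Fail fast, before any checkpoint is uploaded, so users can factor this
    # into their choice of base model at training time.
    if base_model not in SUPPORTED_LORA_BASE_MODELS:
        raise UnsupportedBaseModelDeploymentError(
            f"{base_model} is not supported for Together LoRA deployment"
        )
```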
I'm not an expert, but according to this, `uv.lock` is supposed to be cross-platform: https://docs.astral.sh/uv/guides/projects/#pyprojecttoml
Might not be a platform issue, but I have noticed that running on my local Mac messes up the uv lock and interferes with installation on a Linux machine. Worth looking into, but probably not in scope for this PR.
nice work!
For models that Together supports as serverless endpoints, we can now deploy LoRAs to Together and query them.
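For example, a rollout might look roughly like the sketch below. The `backend` and `model` objects, the `_experimental_deploy` signature and return value, the `TOGETHER_API_KEY` variable name, and the prompt are all assumptions for illustration; the deployed adapter is then queried through Together's OpenAI-compatible endpoint:

```python
import os

from openai import AsyncOpenAI


async def rollout_with_deployed_lora(backend, model) -> str:
    # Hypothetical: push the trained LoRA to Together's serverless endpoints.
    # The exact signature and return value of _experimental_deploy are assumptions.
    deployed_model_name = await backend._experimental_deploy(
        deploy_to="together",
        model=model,
    )

    # Query the deployed adapter through Together's OpenAI-compatible API.
    client = AsyncOpenAI(
        base_url="https://api.together.xyz/v1",
        api_key=os.environ["TOGETHER_API_KEY"],
    )
    response = await client.chat.completions.create(
        model=deployed_model_name,
        messages=[{"role": "user", "content": "It's your move."}],
    )
    return response.choices[0].message.content or ""
```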
Changes
- `HUGGINGFACE_TOKEN` and `HUGGING_FACE_HUB_TOKEN`, used for training gated models
- `YOUR_TOGETHER_API_KEY`, used for uploading models to Together
- `get_step_checkpoint_dir` and `archive_and_presign_step_url` for preparing checkpoint dir to upload to Together
- `_experimental_deploy` function to `LocalBackend` and `Backend`
- `tic-tac-toe-local.py` to deploy a model to Together and use it in a rollout
- `UnsupportedBaseModelDeploymentError` and `LoRADeploymentTimedOutError` errors
- `LoRADeploymentJobStatusBody` type

Some relevant types:
TODO:
- Test `_experimental_deploy` through SkyPilot