docker build -t %something%/%sub%:v0 .
docker run -v %path to a working directory with sharded webdataset for model%:/datasets/%model name% %something%/%sub%:v0
cd /datasets
python train.py %model name%
NOTE: this produces the whole torch model as "trained_model" and the model state only as "model.safetensors" (the latter is untested).
Then copy an image into %path to a working directory with sharded webdataset for model%.
TODO: create a script that handles this. It should take the mounted working directory as a parameter (probably the first arg) and an image as another, copy the image into the working directory, maybe copy the shards in as well, run the docker image with that working directory mounted as a volume, and exit once inference is done. Ideally it also writes the inference output out and can handle multiple images, which is not currently true, so that we can iterate on a batch of images.
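A rough sketch of what that TODO script could look like. The argument order, the `--rm` and `-w /datasets` flags, and the assumption that infer.py finds the image by file name inside the mounted directory are all guesses for illustration; nothing here is confirmed against the actual image or infer.py.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def stage_images(work_dir, images):
    """Copy each image into work_dir and return the copied paths."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(img, work_dir)) for img in images]


def docker_infer_command(work_dir, model_name, image_tag, image_file):
    """Build a `docker run` command for one image (container layout is assumed)."""
    mount = f"{Path(work_dir).resolve()}:/datasets/{model_name}"
    return [
        "docker", "run", "--rm",
        "-v", mount,        # mount the working dir as a volume
        "-w", "/datasets",  # same effect as `cd /datasets` in the manual steps
        image_tag,
        "python", "infer.py", model_name, image_file,
    ]


def main(argv):
    work_dir, model_name, image_tag, *images = argv
    # One container run per image; check=True stops on the first failure.
    for staged in stage_images(work_dir, images):
        cmd = docker_infer_command(work_dir, model_name, image_tag, staged.name)
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    if len(sys.argv) < 5:
        print("usage: run_infer.py WORK_DIR MODEL_NAME IMAGE_TAG IMAGE [IMAGE ...]")
    else:
        main(sys.argv[1:])
```

This starts a fresh container per image, which is simple but slow; a real batch mode would need infer.py itself to accept more than one image.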
Longer term, though, it'd be nice if it could just monitor a directory
and infer whenever images appear there. Then you could capture data
at some interval, write it to that location or upload it to a service
hosted by the docker image, and get your inference response.
That could be used to embed the inference into an app, as long as the app can
capture a screenshot of a work area in the standard size and shape
that the trained model needs.
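The directory-monitoring idea could start as a plain polling loop. This is only a sketch: the suffix list, the polling interval, and the `infer` callback are placeholders, and it makes no attempt to detect files that are still being written.

```python
import time
from pathlib import Path

# Hypothetical set of accepted image types.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def new_images(watch_dir, seen):
    """Return image files in watch_dir not yet in `seen`, and record them."""
    fresh = []
    for path in sorted(Path(watch_dir).iterdir()):
        is_image = path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
        if is_image and path.name not in seen:
            seen.add(path.name)
            fresh.append(path)
    return fresh


def watch(watch_dir, infer, interval_seconds=2.0, max_polls=None):
    """Poll watch_dir and call infer(path) once per new image.

    max_polls=None polls forever; a number makes the loop finite.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in new_images(watch_dir, seen):
            infer(path)
        polls += 1
        time.sleep(interval_seconds)
```

Something like `watch("/datasets/incoming", print)` would just print each new image path; the real callback would hand the path to the model.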
We would do the same for train: just pass
the path to your shards container into a script, and it gets
mounted into the docker context and trained, with the output written
to that same working dir. The output can then be copied to some other
location on disk if you like, or left in the working dir.
It would also be nice to create an online UI that lets a service in the docker
env manage creating, editing, or allocating a simple web dataset.
Then you can just run the image, give it a work directory,
open up the web UI, and you have a train tab and an infer tab.
On train, select your shard and update the model for it.
On infer, select the model, upload a single image or a batch of images,
and hit infer. All of this sits on a REST API that can be called with some simple creds
or an auth token or something, so that some kind of non-AI-related app
can embed AI inference and/or real-time training. Datasets
could be patched/updated in real time as new data comes in, and inferred against
in real time as well.
And ideally you can use it just as a web dataset constructor, so
you can just manage your dataset and download the latest version.
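As a starting point for the token-protected REST API, here is a minimal sketch using only the Python standard library. The `/infer` path, the bearer-token scheme, the `API_TOKEN` value, and the `run_inference` placeholder are all made up for illustration; a real service would more likely use a proper web framework and load the token from configuration.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

API_TOKEN = "change-me"  # hypothetical; would come from config, not source code


def is_authorized(auth_header, token=API_TOKEN):
    """Check an `Authorization: Bearer <token>` header in constant time."""
    expected = f"Bearer {token}"
    return hmac.compare_digest((auth_header or "").encode(), expected.encode())


def run_inference(image_bytes):
    """Placeholder: the real call into the trained model is not shown here."""
    return {"bytes_received": len(image_bytes)}


class InferHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/infer":
            self._reply(404, {"error": "not found"})
            return
        if not is_authorized(self.headers.get("Authorization")):
            self._reply(401, {"error": "unauthorized"})
            return
        length = int(self.headers.get("Content-Length", 0))
        self._reply(200, run_inference(self.rfile.read(length)))

    def _reply(self, status, payload):
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever serving POST /infer on the given port."""
    HTTPServer(("0.0.0.0", port), InferHandler).serve_forever()
```

Calling `serve()` inside the container and publishing the port would let a non-AI app POST raw image bytes with the token and get JSON back.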
python infer.py %model name% test-input-image.png
NOTE: This consumes the trained_model and so that must be generated first
I don't love Python, which is a problem because it was selected early on as the de facto standard for AI. Anyway, I may create a CLI wrapper script for all the critical Python libraries, so that I can manage the logic in Java or Node.js + TypeScript.
It would just expose a set of methods you can invoke with args for each, and let you run them from an external process.
Though I will probably clean up and leave the initial train and infer Python scripts in place in case anyone wants them as reference.
But then I can do all the actual logic in Java.
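One possible shape for that CLI wrapper: argparse subcommands that each call one Python function and print the result as a single line of JSON, so a Java or Node.js process can spawn it and parse stdout. The program name `mlcli`, the subcommand names, and the placeholder bodies of `train` and `infer` are assumptions; the real functions would call into the existing scripts.

```python
import argparse
import json
import sys


def train(model_name):
    """Placeholder for the real training call."""
    return {"command": "train", "model": model_name, "status": "not implemented"}


def infer(model_name, image):
    """Placeholder for the real inference call."""
    return {"command": "infer", "model": model_name, "image": image,
            "status": "not implemented"}


def build_parser():
    parser = argparse.ArgumentParser(prog="mlcli")
    sub = parser.add_subparsers(dest="command", required=True)

    p_train = sub.add_parser("train")
    p_train.add_argument("model_name")
    p_train.set_defaults(func=lambda a: train(a.model_name))

    p_infer = sub.add_parser("infer")
    p_infer.add_argument("model_name")
    p_infer.add_argument("image")
    p_infer.set_defaults(func=lambda a: infer(a.model_name, a.image))
    return parser


def main(argv):
    """Dispatch one subcommand and write its result to stdout as JSON."""
    args = build_parser().parse_args(argv)
    json.dump(args.func(args), sys.stdout)
    sys.stdout.write("\n")
    return 0
```

A script entry point would call `main(sys.argv[1:])`; the external process then only needs to build an argument list and read one JSON line back.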
Ideally we'd actually port the libraries, and ports may already exist, as may other wrappers, frankly. So that's something else to consider. But a port would naturally be far more involved and wouldn't have as much support, since everyone else is doing Python.
Though that may be less of an issue now that the tech has matured somewhat, so the tooling may change less rapidly. The models and model architectures will continue to evolve, but the actual "interface" code (organizing data, getting it into the model, and inferring with it) should be pretty stable at this point, and that's all this really is.
Also, ideally we can handle models of all types and all types of data transformation:
image to text
text to image
image to more different image
text to more different text (LLM)
image to classification (in this initial case a single class, with numeric weight)
text to video (currently just text to clip, so may need another layer)
image to video
clips to longer cut video
text to multiple clips, each clip then composed to video
audio
... whatever you'd need to convert, really, with all the tools for fine tuning, adapting, patching, iterating on the output of, etc.
And, like ComfyUI, it can have a pipeline for chaining all those together.
This could really just be an extension for that UI.
But I think I want to take a slightly different approach, though there will be some overlap.
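At its simplest, chaining transformations is just feeding each stage's output into the next. A toy sketch, where the two stage functions are string stand-ins for real models and every name is made up:

```python
from typing import Any, Callable, List, Tuple

# A stage is a (label, function) pair; the label is only for readability.
Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(stages: List[Stage], value: Any) -> Any:
    """Feed value through each stage in order and return the final output."""
    for _label, stage in stages:
        value = stage(value)
    return value


def image_to_text(image_path):
    """Stand-in for a captioning model."""
    return f"caption for {image_path}"


def text_to_text(text):
    """Stand-in for an LLM rewrite step."""
    return text.upper()


stages = [("image to text", image_to_text), ("text to text", text_to_text)]
result = run_pipeline(stages, "frame.png")  # "CAPTION FOR FRAME.PNG"
```

A graph-style pipeline like ComfyUI's would need branching and named inputs; this only covers the straight-line case.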
And I like the idea of real-time training.
So host your model, and you can have a live service that feeds in new data and outcomes as they are received, live updating your model, maybe snapshotting at every n intervals of time.
It would retain the last n model versions, as well as maybe any tagged as exceptionally good.
So a web UI where you can send in data (image, text, whatever), send in the real outcome once it is known, and have inference come back at you in real time could be useful.
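The retention rule (keep the last n snapshots plus anything tagged) fits in one small function. A sketch, assuming snapshots are identified by name and listed oldest to newest; the function and parameter names are made up:

```python
def snapshots_to_keep(snapshots, keep_last, tagged):
    """Return the snapshot names to retain, in their original order.

    snapshots: names ordered oldest to newest
    keep_last: how many of the most recent snapshots to keep
    tagged:    names marked as exceptionally good, kept regardless of age
    """
    # Guard keep_last == 0: snapshots[-0:] would select the whole list.
    recent = set(snapshots[-keep_last:]) if keep_last > 0 else set()
    return [name for name in snapshots if name in recent or name in tagged]


kept = snapshots_to_keep(["v1", "v2", "v3", "v4", "v5"], keep_last=2, tagged={"v2"})
# kept == ["v2", "v4", "v5"]
```

Everything not in the returned list is a candidate for deletion.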
It could even be linked to inference requests.
So you are just constantly sending inference requests, and you get an ID for each inference request.
And once you know the outcome of that inference, the actual truth,
you send that back with the inference ID, and it's added to your dataset.
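That request/outcome loop could be sketched with an in-memory store. The class and method names are made up, and a real service would persist the pending requests and write finished examples into the webdataset shards rather than a list.

```python
import uuid


class FeedbackStore:
    """Pairs each inference with its later-known outcome (in memory only)."""

    def __init__(self):
        self.pending = {}  # inference id -> (input, prediction)
        self.dataset = []  # completed examples with the known truth

    def record_inference(self, model_input, prediction):
        """Remember an inference and return the id handed back to the caller."""
        inference_id = uuid.uuid4().hex
        self.pending[inference_id] = (model_input, prediction)
        return inference_id

    def record_outcome(self, inference_id, truth):
        """Attach the real outcome; raises KeyError for an unknown id."""
        model_input, prediction = self.pending.pop(inference_id)
        example = {"input": model_input, "prediction": prediction, "truth": truth}
        self.dataset.append(example)
        return example


store = FeedbackStore()
request_id = store.record_inference("frame-001.png", 0.82)
store.record_outcome(request_id, 1.0)  # frame-001.png is now a training example
```

Keeping the prediction next to the truth also gives a running record of how far off the model was at the time, which could feed the "exceptionally good" snapshot tagging.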