USE 137 - Use sentence-transformers library #21
Conversation
Force-pushed from fedbdd2 to 8f5ca49
This command worked but did give me a truncation warning, not sure if it's significant:
Thanks for surfacing @ehanson8, yes, this is known. I've been doing some reading and it sounds harmless. Furthermore, in our ECS deployed context, the container would shut down immediately following the embeddings work anyway, so it's mostly moot in that context. But good catch!
ehanson8 left a comment
Works as expected and this is a lot cleaner overall so a very positive switch in my mind!
Why these changes are being introduced: With a pivot to `sentence-transformers` coming, rework the dependencies.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-137
Why these changes are being introduced: The first approach for the embedding class `OSNeuralSparseDocV3GTE` used the `transformers` library for creating embeddings. Some early exploratory code seemed to indicate this more low-level library would provide more flexibility in response formats and better performance. However, when exploring multiprocessing for the `transformers` library, the alternate approach of using the `sentence-transformers` library was explored given its more out-of-the-box multiprocessing support. During that exploration, based on learnings since the original spike code, it was determined that the `sentence-transformers` library might be a better fit overall for our purposes. Pivoting to this library will simplify our actual embedding logic, while providing some out-of-the-box tuning capabilities that should be sufficient for our purposes.

How this addresses that need: The embedding class `OSNeuralSparseDocV3GTE` is reworked to use `sentence-transformers` instead of `transformers` for creating embeddings. This considerably reduces the complexity of the actual embedding creation, while also exposing an API for multiprocessing. It's worth noting that testing is indicating that multiprocessing will *not* speed up embeddings, at least in the contexts where we aim to create them, but the `sentence-transformers` library also better handles parallelism without explicit multiprocessing. In summary, switching to `sentence-transformers` results in a simpler API for creating embeddings and better out-of-the-box performance, with an API that still allows for more tuning later.

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-137
Force-pushed from 8f5ca49 to b859edd
NOTE: the +1,692 −598 line count diff is mostly the addition of a couple of fixtures and dependency churn.
Purpose and background context
This PR is a fairly substantial pivot for our first embedding class `OSNeuralSparseDocV3GTE` to use the `sentence-transformers` library vs the lower-level `transformers` library for creating embeddings.

It was decided to revisit `sentence-transformers` while exploring multiprocessing in USE-137, as `sentence-transformers` provides more out-of-the-box functionality tailored for our model and our purposes, including parallel processing. The learnings from this were complicated, but ultimately led to a good solution!

On one hand, it became clear that manual multiprocessing with `transformers` -- specifically, multiple processes vs threads -- was not demonstrating the performance gain that initial spike code had suggested. We are still exploring why this might be the case, but it is fairly consistent with articles and blog posts on the topic. That said, because `sentence-transformers` does expose multiprocessing options more easily, it keeps the door open for tuning later on.

On the other hand, what tipped the scales was the simplicity of `sentence-transformers` versus `transformers`. Even the Hugging Face model card leads with a `sentence-transformers` example.

Assuming then that `sentence-transformers` is a good fit for performance + ergonomics, what actually changes?

Overall, while `transformers` was demonstrated to work, revisiting `sentence-transformers` suggests it's a better fit for us and this application at this time. In addition to a simpler API to work with, testing also suggests better out-of-the-box performance without any complicated multiprocessing or batching.

How can a reviewer manually see the effects of these changes?
For both local and docker tests, set env vars:
Note the lack of any performance tuning env vars; defaults are generally a good bet at this time.
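As a minimal sketch of how a device env var like `TE_TORCH_DEVICE` (used in the local run below) might be resolved -- the helper name and the fallback-to-CPU behavior are assumptions for illustration, not code from this PR:

```python
import os

def resolve_torch_device(default: str = "cpu") -> str:
    """Return the torch device string, honoring TE_TORCH_DEVICE if set.

    Hypothetical helper: only the env var name appears in this PR; the
    default of "cpu" is an assumption.
    """
    return os.environ.get("TE_TORCH_DEVICE", default)

os.environ.pop("TE_TORCH_DEVICE", None)
print(resolve_torch_device())  # cpu
os.environ["TE_TORCH_DEVICE"] = "mps"
print(resolve_torch_device())  # mps
```

Leaving the variable unset falls back to the default, which matches the "defaults are generally a good bet" note above.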
If you haven't already, ensure you have downloaded the model:
Local Run
Invoke CLI via `uv`:
If interested, you can also test using MPS on Macs with Apple silicon chips (e.g. M1 - M4).
First, run a job with more records to get a baseline time:
Then, re-run the job with the env var `TE_TORCH_DEVICE=mps` for a fairly significant speedup:
Docker Run
First, build the image:
Invoke docker container:
Includes new or updated dependencies?
YES
Changes expectations for external applications?
NO
What are the relevant tickets?
Code review