-
Notifications
You must be signed in to change notification settings - Fork 0
Add ChromaDB service #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: rust/dev
Are you sure you want to change the base?
Conversation
ca041a5 to
991f268
Compare
991f268 to
4297d88
Compare
| StringMeta(String), | ||
| NumberMeta(i64), | ||
| NullMeta, | ||
| // TODO (chase): Support floating point numbers once Rholang does? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this ever in the roadmap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean in the roadmap for Rholang?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep!
| // TODO (chase): Do we need custom options? i.e custom database name, authentication method, and url? | ||
| // If the chroma db is hosted alongside the node locally, custom options don't make much sense. | ||
| let client = ChromaClient::new(ChromaClientOptions::default()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would custom options for database name, authentication method and URL ever be required? It wouldn't make sense for these to exist if the ChromaDB is being hosted alongside the node in a sort of tightly coupled manner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make these configuration options that can be put in shared-rnode-runtime.conf and shared-rnode.conf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT the rust impl of Rholang doesn't have any hooks set up with the configurations - the only configurations it seems to take are from the CLI options.
It might be better to wait till the config infra is set up (assuming it's in the works) before adding configs. Otherwise, setting up the config infra will add some time on our end.
| // The embedding are currently auto-filled by a pre-chosen embedding function. | ||
| embeddings: None, | ||
| }; | ||
|
|
||
| // We'll use OpenAI to generate embeddings. | ||
| let embeddingsf = OpenAIEmbeddings::new(Default::default()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See here: The embeddings are currently auto generated instead of user-provided from Rholang code. This is to avoid requiring too much input data in Rholang code to drive up the script costs.
It currently chooses OpenAI to generate said embeddings. The choice is due to two reasons:
- It's already supported by the chromaDB library
- OpenAI API is already used elsewhere in this codebase (the openai service)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need to support Rholang code provided embeddings - we should figure out how we want to represent the embeddings. Usually, they would be floating point numbers - but that's not supported in Rholang. It's possible to use integers as limited precision fractional numbers - not particularly clean though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it would be good to support embeddings in Rholang, but I do think we should support BERT embeddings as well (see the original task description for details). We can allow the users to set which source of embeddings they'd like to use in the configuration files
React
| // TODO (chase): How to define overloads? | ||
| // This function can support 4 or 3 arguments (including ack) (second to last one is optional). | ||
| arity: 4, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rholang seems to support overloads for user defined "contracts" but I don't see a(n easy) way to support overloads for service methods. It'd be nice to support overloads for create_collection because its last parameter is an optional.
Not a high priority though, since the implementation treats supports both RNil and {} (empty map literal) as "no metadata argument provided"
| } | ||
| } | ||
|
|
||
| pub struct RhoList; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why these didn't exist already - but they're helpful to be provided.
| } | ||
|
|
||
| pub trait Extractor<RhoType> { | ||
| pub trait Extractor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type parameter was entirely redundant. It might have been relevant for Scala, but it's not for Rust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm suspicious of this... Mind asking about this in the mlabs-sms channel in that Discord group I added you to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the tests still run?
| // these bytes may need to change during finalization. | ||
| pub fn chroma_create_collection() -> Par { | ||
| byte_name(25) | ||
| } | ||
|
|
||
| pub fn chroma_get_collection_meta() -> Par { | ||
| byte_name(26) | ||
| } | ||
|
|
||
| pub fn chroma_upsert_entries() -> Par { | ||
| byte_name(27) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These will have to change after nunet implementation has been merged to prevent overlapping bytes.
With that said, the Nunet implementation uses weird byte names. They are not continuous. See: https://github.com/F1R3FLY-io/f1r3node/pull/240/files#diff-0ff9b3c4f760d8d6f04ee13fd2d33f196b7ac795019a91b5bc8202dce9626ecbR162
|
I have added a bunch of runnable examples under Ensure there is a chroma db running locally in the background, as well as |
Overview
Adds ChromaDB integration to the node as well as primitives to Rholang.
Notes
This will require adding parsers for lists and maps into rho types. Those are currently missing. Some of the argument types for the service will need to be list or maps. Currently, there aren't any system processes/services that require complex types - it seems.
Please make sure that this PR:
Bors cheat-sheet:
bors r+runs integration tests and merges the PR (if it's approved),bors tryruns integration tests for the PR,bors delegate+enables non-maintainer PR authors to run the above.