feat(oracle): add new Oracle connector for Semantic Kernel #13229
base: main
@microsoft-github-policy-service agree [company="Oracle"]
@microsoft-github-policy-service agree company="Oracle"
@@ -0,0 +1,1349 @@
# Copyright (c) 2025, Oracle Corporation. All rights reserved.

from __future__ import annotations
This shouldn't be necessary.
Removed it.
    VectorSearchExecutionException,
    VectorStoreOperationException
)
from semantic_kernel.exceptions.memory_connector_exceptions import MemoryConnectorConnectionException
Suggested change:
- from semantic_kernel.exceptions.memory_connector_exceptions import MemoryConnectorConnectionException
+ from semantic_kernel.exceptions import MemoryConnectorConnectionException
Updated code.
async connection pools for Oracle.
"""

user: str | None = Field(default=None, validation_alias=ORACLE_USER_ENV_VAR)
This is not the intended use of a KernelBaseSettings object. In this class, set:
env_prefix: ClassVar[str] = "ORACLE_"
Each parameter is then read from that prefix plus the parameter name capitalized, so you can remove all the validation_alias arguments. The only renames needed are min and max, which should become pool_min and pool_max.
It is also important to document in the docstring which parameters exist and what their respective environment variable names are.
Updated code!
connection_pool: oracledb.AsyncConnectionPool | None = None

model_config = SettingsConfigDict(
This is also likely not needed.
Removed it!
wallet_location: str | None = Field(default=None, validation_alias=ORACLE_WALLET_LOCATION_ENV_VAR)
wallet_password: SecretStr | None = Field(default=None, validation_alias=ORACLE_WALLET_PASSWORD_ENV_VAR)

connection_pool: oracledb.AsyncConnectionPool | None = None
This should be a PrivateAttr, or renamed to _connection_pool.
Updated code!
def _unwrap_secret(self, value):
    if value is None:
        return None
    return value.get_secret_value() if hasattr(value, "get_secret_value") else str(value)
Is this really needed, since you only use it for parameters that you know are secrets?
Updated code!
# Create pool with extra user-supplied kwargs
self.connection_pool = oracledb.create_pool_async(
    user=self.user,
    password=self._unwrap_secret(self.password),
Suggested change:
- password=self._unwrap_secret(self.password),
+ password=self.password.get_secret_value() if self.password else None,
Updated code!
connection_pool: oracledb.AsyncConnectionPool | None = None
db_schema: str | None = None
pool_args: dict[str, Any] | None = None
supported_key_types: ClassVar[set[str] | None] = {"str", "int", "UUID"}
Is UUID a separate type (in Python)?
Yes, UUID is a separate type in Python (uuid.UUID), not just a string.
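A quick check confirms that `uuid.UUID` is its own class rather than a `str` subclass, so a name-based lookup against `supported_key_types` treats it as a distinct key type. The helpers `key_type_name` and `is_supported_key` are hypothetical; this diff does not show how the connector actually validates keys against the set.

```python
import uuid
from typing import Any

# Key type names copied from the supported_key_types ClassVar in the diff.
SUPPORTED_KEY_TYPES = {"str", "int", "UUID"}


def key_type_name(key: Any) -> str:
    """Return the class name used to compare a key against SUPPORTED_KEY_TYPES."""
    return type(key).__name__


def is_supported_key(key: Any) -> bool:
    return key_type_name(key) in SUPPORTED_KEY_TYPES


k = uuid.uuid4()
print(type(k).__name__, isinstance(k, str))  # UUID False
print(is_supported_key(k), is_supported_key(str(k)), is_supported_key(1.5))  # True True False
```

Note that `str(k)` turns the key into a plain string, so a store that accepts both must decide how UUID keys are serialized before they reach the database.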
query, bind, columns = await self._inner_search_vector(options, values, vector, **kwargs)

# If total count is requested, fetch all rows to count.
if options.include_total_count:
If this is set but the database doesn't support the parameter, we shouldn't pull everything in; just ignore the setting.
Having considered how to handle options.include_total_count for Oracle, here are some possible approaches:
- Option 1: Emit a warning or log message (e.g., RuntimeWarning or logger.warning) indicating that include_total_count is not supported for Oracle and will be ignored.
- Option 2: Raise a NotSupportedError, if we want stricter enforcement that prevents misuse in performance-sensitive scenarios.
- Option 3: Fetch all rows and log a warning, a balanced approach:
  - Users still get the total count.
  - A clear log message or warning highlights that fetching all rows may be inefficient for large datasets.
  - The behavior will be documented so developers are aware of the performance implications.
I recommend option 3, as it maintains correctness while providing transparency and guidance, but I'd like to hear your thoughts, or whether you prefer a stricter approach.
@eavanvalkenburg Need your input here.
…ync support
Motivation and Context
This change is required to enable Semantic Kernel users to store and retrieve embeddings using Oracle databases. Semantic Kernel currently supports vector storage for several backends, but not for Oracle. This connector fills that gap by providing full async support, native VECTOR type handling, and vector index management.
Description
This PR introduces a new Oracle connector for Semantic Kernel with the following features:
- Asynchronous upsert, get, delete, and search operations for memory records.
- Native Oracle VECTOR type support for storing embeddings efficiently.
- Support for HNSW and IVFFLAT vector indexes for similarity search.
- Integration with Semantic Kernel collections, enabling semantic search and memory operations.
- Comprehensive unit tests to ensure correctness and stability.
The connector is designed to work seamlessly with existing Semantic Kernel memory abstractions and follows the same async patterns as other vector stores.
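To make the VECTOR and index features concrete, here is a sketch of the DDL such a connector might issue, following Oracle Database 23ai syntax as we understand it: HNSW maps to `ORGANIZATION INMEMORY NEIGHBOR GRAPH` and IVF to `ORGANIZATION NEIGHBOR PARTITIONS`. The table layout, column names, index naming scheme, and helper functions are all hypothetical; the connector's actual DDL is not shown on this page. The last helper reflects python-oracledb's support for binding `array.array("f", ...)` to a FLOAT32 VECTOR column.

```python
import array
from collections.abc import Sequence

# Index organizations for the two index kinds named in this PR (assumed mapping).
_ORGANIZATION = {
    "hnsw": "INMEMORY NEIGHBOR GRAPH",
    "ivf_flat": "NEIGHBOR PARTITIONS",
}
_DISTANCES = {"cosine", "euclidean", "dot"}


def create_table_ddl(table: str, dimensions: int) -> str:
    """DDL for a hypothetical table with a native VECTOR column of FLOAT32 values."""
    # Identifiers cannot be bound, so table names must come from trusted config.
    return (
        f"CREATE TABLE {table} ("
        "id VARCHAR2(36) PRIMARY KEY, "
        "content CLOB, "
        f"embedding VECTOR({dimensions}, FLOAT32))"
    )


def create_index_ddl(table: str, kind: str, distance: str = "cosine", accuracy: int = 95) -> str:
    """DDL for an HNSW or IVF vector index on the embedding column."""
    kind, distance = kind.lower(), distance.lower()
    if kind not in _ORGANIZATION:
        raise ValueError(f"unsupported index kind: {kind}")
    if distance not in _DISTANCES:
        raise ValueError(f"unsupported distance: {distance}")
    return (
        f"CREATE VECTOR INDEX {table}_{kind}_idx ON {table} (embedding) "
        f"ORGANIZATION {_ORGANIZATION[kind]} "
        f"DISTANCE {distance.upper()} "
        f"WITH TARGET ACCURACY {accuracy}"
    )


def to_vector_bind(values: Sequence[float]) -> array.array:
    """Convert an embedding to the array type python-oracledb binds to VECTOR(FLOAT32)."""
    return array.array("f", values)


print(create_index_ddl("docs", "hnsw"))
# CREATE VECTOR INDEX docs_hnsw_idx ON docs (embedding) ORGANIZATION INMEMORY NEIGHBOR GRAPH DISTANCE COSINE WITH TARGET ACCURACY 95
```

Building the statements from a small lookup table keeps the index kind and distance metric validated in one place before any SQL reaches the database.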
Integration tests have also been implemented and verified locally; however, they are not included in this PR because it is not yet known whether the CI environment provides an Oracle Database instance.
Once guidance is provided on Oracle DB availability in the CI pipeline, integration tests can be enabled and added in a follow-up PR.
Contribution Checklist