@monita1208 commented Oct 7, 2025

…ync support

Motivation and Context

This change enables Semantic Kernel users to store and retrieve embeddings in Oracle databases. Semantic Kernel already supports vector storage on several backends, but Oracle was missing. This connector fills that gap by providing full async support, native VECTOR type handling, and vector index management.

Description

This PR introduces a new Oracle connector for Semantic Kernel with the following features:

  • Asynchronous upsert, get, delete and search operations for memory records.

  • Native Oracle VECTOR type support for storing embeddings efficiently.

  • Support for HNSW and IVFFLAT vector indexes for similarity search.

  • Integration with Semantic Kernel collections, enabling semantic search and memory operations.

  • Comprehensive unit tests to ensure correctness and stability.
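For readers unfamiliar with Oracle's vector features, the sketch below shows the kind of Oracle Database 23ai SQL these bullets refer to: a native VECTOR column, an HNSW (in-memory neighbor graph) or IVF (Inverted File Flat, "neighbor partitions") vector index, and a VECTOR_DISTANCE top-k query. It is a minimal illustration under stated assumptions, not the connector's code: the helper names, the `docs`/`embedding` names, the VARCHAR2(36) key column, FLOAT32 storage, and COSINE distance are all hypothetical, and the helpers only build SQL strings without connecting to a database.

```python
import array
import re

# Hypothetical helpers illustrating Oracle Database 23ai vector SQL.
# Table, column, and index names are examples, not the connector's schema.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_$#]*")
_ORGANIZATION = {
    "HNSW": "INMEMORY NEIGHBOR GRAPH",  # in-memory neighbor graph (HNSW)
    "IVF": "NEIGHBOR PARTITIONS",  # inverted file flat (IVF) partitions
}
_DISTANCES = {"COSINE", "EUCLIDEAN", "DOT", "MANHATTAN"}


def _ident(name: str) -> str:
    # Identifiers cannot be passed as bind variables, so only allow plain names.
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid SQL identifier: {name!r}")
    return name


def _distance(metric: str) -> str:
    if metric not in _DISTANCES:
        raise ValueError(f"unsupported distance metric: {metric!r}")
    return metric


def create_table_sql(table: str, column: str, dimensions: int) -> str:
    """DDL for a table whose embedding column uses the native VECTOR type."""
    return (
        f"CREATE TABLE {_ident(table)} ("
        "id VARCHAR2(36) PRIMARY KEY, content CLOB, "
        f"{_ident(column)} VECTOR({int(dimensions)}, FLOAT32))"
    )


def create_vector_index_sql(
    table: str,
    column: str,
    index_name: str,
    kind: str = "HNSW",
    distance: str = "COSINE",
    target_accuracy: int = 95,
) -> str:
    """DDL for an HNSW or IVF vector index on the embedding column."""
    if kind not in _ORGANIZATION:
        raise ValueError(f"unsupported index kind: {kind!r}")
    return (
        f"CREATE VECTOR INDEX {_ident(index_name)} "
        f"ON {_ident(table)} ({_ident(column)}) "
        f"ORGANIZATION {_ORGANIZATION[kind]} "
        f"DISTANCE {_distance(distance)} "
        f"WITH TARGET ACCURACY {int(target_accuracy)}"
    )


def vector_search_sql(
    table: str, column: str, select_columns: list[str], distance: str = "COSINE"
) -> str:
    """Approximate top-k query; :query_vector and :top_k are bind variables."""
    cols = ", ".join(_ident(c) for c in select_columns)
    return (
        f"SELECT {cols}, VECTOR_DISTANCE({_ident(column)}, :query_vector, "
        f"{_distance(distance)}) AS dist "
        f"FROM {_ident(table)} "
        "ORDER BY dist "
        "FETCH APPROX FIRST :top_k ROWS ONLY"
    )


def search_binds(query_vector: list[float], top_k: int) -> dict[str, object]:
    # Assumption: recent python-oracledb releases accept array.array("f", ...)
    # as the bind value for a FLOAT32 VECTOR.
    return {"query_vector": array.array("f", query_vector), "top_k": int(top_k)}
```

Per Oracle's AI Vector Search documentation, the APPROX keyword is what permits the query to be answered from a vector index; without it the query is an exact search.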

The connector is designed to work seamlessly with existing Semantic Kernel memory abstractions and follows the same async patterns as other vector stores.

Integration tests have also been implemented and verified locally; however, they are not included in this PR because it is unclear whether the current CI environment provides an Oracle Database instance.
Once guidance is provided on Oracle Database availability in the CI pipeline, the integration tests can be enabled and added in a follow-up PR.

Contribution Checklist

@monita1208 requested a review from a team as a code owner October 7, 2025 19:17
@monita1208
Author

@monita1208 please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="oracle"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
    @microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
    @microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement


@monita1208 closed this Oct 7, 2025
@monita1208 reopened this Oct 7, 2025
@alexkeh commented Oct 7, 2025

@microsoft-github-policy-service agree [company="Oracle"]

@monita1208 (Author) commented Oct 7, 2025

@microsoft-github-policy-service agree company="Oracle"

@markwallace-microsoft added the msft.ext.vectordata (Related to Microsoft.Extensions.VectorData) label Oct 7, 2025
@@ -0,0 +1,1349 @@
# Copyright (c) 2025, Oracle Corporation. All rights reserved.

from __future__ import annotations
Member

This shouldn't be necessary.

Author

Removed it.

VectorSearchExecutionException,
VectorStoreOperationException
)
from semantic_kernel.exceptions.memory_connector_exceptions import MemoryConnectorConnectionException
Member

Suggested change:

- from semantic_kernel.exceptions.memory_connector_exceptions import MemoryConnectorConnectionException
+ from semantic_kernel.exceptions import MemoryConnectorConnectionException

Author

Updated code.

async connection pools for Oracle.
"""

user: str | None = Field(default=None, validation_alias=ORACLE_USER_ENV_VAR)
Member

This is not the intended use of a KernelBaseSettings object. In this class, set:

env_prefix: ClassVar[str] = "ORACLE_"

Each parameter is then read from the environment variable formed by that prefix plus the parameter name in upper case, so you can remove all the validation_alias arguments. The only parameters that need renaming are min and max, which should become pool_min and pool_max.

It is also important to document in the docstring which parameters exist and what their respective environment variable names are.

Author

Updated code!


connection_pool: oracledb.AsyncConnectionPool | None = None

model_config = SettingsConfigDict(
Member

This is also likely not needed.

Author

Removed it!

wallet_location: str | None = Field(default=None, validation_alias=ORACLE_WALLET_LOCATION_ENV_VAR)
wallet_password: SecretStr | None = Field(default=None, validation_alias=ORACLE_WALLET_PASSWORD_ENV_VAR)

connection_pool: oracledb.AsyncConnectionPool | None = None
Member

This should be a PrivateAttr, or be renamed to _connection_pool.

Author

Updated code!

def _unwrap_secret(self, value):
    if value is None:
        return None
    return value.get_secret_value() if hasattr(value, "get_secret_value") else str(value)
Member

Is this really needed? You only use it for parameters that you already know are secrets.

Author

Updated code!

# Create pool with extra user-supplied kwargs
self.connection_pool = oracledb.create_pool_async(
    user=self.user,
    password=self._unwrap_secret(self.password),
Member

Suggested change:

- password=self._unwrap_secret(self.password),
+ password=self.password.get_secret_value() if self.password else None,

Author

Updated code!

connection_pool: oracledb.AsyncConnectionPool | None = None
db_schema: str | None = None
pool_args: dict[str, Any] | None = None
supported_key_types: ClassVar[set[str] | None] = {"str", "int", "UUID"}
Member

Is UUID a separate type (in Python)?

Author

Yes, UUID is a separate type in Python (uuid.UUID), not just a string.
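This is easy to check directly: `uuid.UUID` is its own class in the standard library, so a UUID key is distinct from its string form, and comparing runtime type names against the diff's `supported_key_types` set distinguishes all three key kinds. The helper names below are hypothetical, and how the connector actually validates or stores UUID keys (for example as text or as 16 raw bytes) is not shown in the thread.

```python
import uuid

# Key type names from the diff: supported_key_types = {"str", "int", "UUID"}
SUPPORTED_KEY_TYPES = {"str", "int", "UUID"}


def key_type_name(key: object) -> str:
    """Name of the key's runtime type, e.g. 'str', 'int' or 'UUID'."""
    return type(key).__name__


def is_supported_key(key: object) -> bool:
    # bool subclasses int but reports its own name ('bool'), so it is rejected here.
    return key_type_name(key) in SUPPORTED_KEY_TYPES


def uuid_round_trips(key: uuid.UUID) -> bool:
    # A UUID can be serialized as text or as 16 bytes and rebuilt from either.
    return uuid.UUID(str(key)) == key and uuid.UUID(bytes=key.bytes) == key


key = uuid.UUID("12345678-1234-5678-1234-567812345678")
```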

query, bind, columns = await self._inner_search_vector(options, values, vector, **kwargs)

# If total count is requested, fetch all rows to count.
if options.include_total_count:
Member

If this is set but the database doesn't support the parameter, we shouldn't pull everything in; just ignore the setting.

Author

@monita1208 Oct 8, 2025

Having considered options.include_total_count for Oracle, I see three possible approaches:

  1. Raise a warning or log a message (e.g., RuntimeWarning or logger.warning) indicating that include_total_count is not supported for Oracle and will be ignored.
  2. Raise a NotSupportedError, if we want stricter enforcement that prevents misuse in performance-sensitive scenarios.
  3. Fetch all rows and log a warning. This is a balanced approach:
     • Users still get the total count.
     • A clear log message or warning highlights that fetching all rows may be inefficient for large datasets.
     • The behavior will be properly documented so that developers are aware of the performance implications.

I recommend option 3, as it maintains correctness while providing transparency and guidance, but I'd like to hear your thoughts, or whether you prefer a stricter approach.

Author

@eavanvalkenburg, I need your input here.


Labels

msft.ext.vectordata Related to Microsoft.Extensions.VectorData


6 participants