feat: add moss-connector-dynamodb (DynamoDB source connector) #180
Open
Abhijitam01
wants to merge
5
commits into
usemoss:main
from
Abhijitam01:feat/add-moss-connector-dynamodb
base: main
18b0f78
feat: add Semantic Kernel plugin for Moss semantic search (#82)
Abhijitam01 71896e5
Merge pull request #1 from Abhijitam01/Abhijitam01/sk-moss-plugin
Abhijitam01 09a33f1
Merge branch 'usemoss:main' into main
Abhijitam01 872de47
fix: use moss package instead of inferedge-moss
Abhijitam01 64daf7b
feat: add moss-connector-dynamodb; DynamoDB source connector
Abhijitam01
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| # TODOs | ||
|
|
||
| ## CI/CD pipeline for integration packages | ||
|
|
||
| **What:** Add GitHub Actions workflows to build, test, and publish `strands-agents-moss` and `semantic-kernel-moss` to PyPI. | ||
|
|
||
| **Why:** Neither integration package has a CI/CD pipeline. Code without distribution is code nobody can use. Users currently have no way to `pip install` these packages from PyPI. | ||
|
|
||
| **Context:** Both packages use setuptools with `pyproject.toml`. Tests use pytest + pytest-asyncio. Linting uses ruff. A single reusable workflow could cover both packages since they share the same build system and test tooling. Consider matrix strategy for Python 3.10-3.13. | ||
|
|
||
| **Depends on:** Nothing. Can be done independently. | ||
|
|
||
| ## .NET Semantic Kernel plugin design doc | ||
|
|
||
| **What:** Write a design document for a .NET version of the Moss Semantic Kernel plugin. | ||
|
|
||
| **Why:** The GitHub issue (#82) lists .NET as a stretch goal. Semantic Kernel has strong .NET adoption in enterprise shops. A design doc captures the approach (NuGet package, IKernelPlugin interface, C# async patterns) without committing to implementation. | ||
|
|
||
| **Context:** The Python plugin (`semantic-kernel-moss`) is the reference implementation. The .NET version would follow the same pattern: single `Search` kernel function, constructor-configured `MossClient`, pre-loaded index. Key decisions: whether to use the .NET Moss SDK (if it exists) or wrap the Python SDK, and how to handle the async index loading lifecycle in C#. | ||
|
|
||
| **Depends on:** Python plugin shipped and validated by users. | ||
|
|
||
| ## Streaming/batched ingest for large tables | ||
|
|
||
| **What:** Change shared `ingest.py` to pass iterables directly or batch in chunks instead of `list(source)`. | ||
|
|
||
| **Why:** Current `list()` call loads entire dataset into memory. For 1M+ row tables, that's 1GB+ RAM. | ||
|
|
||
| **Context:** The connector `__iter__` already streams correctly. The bottleneck is the shared `ingest.py` doing eager collection. Fix would be ~5 lines but affects all connectors (SQLite, MongoDB, DynamoDB). | ||
|
|
||
| **Depends on:** Checking if `MossClient.create_index()` accepts iterables. |
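The chunking half of this TODO could look roughly like the sketch below. It is an illustration only: `batched` is a hypothetical helper name, not part of this PR or the Moss SDK, and whether `MossClient` can take documents across several calls is exactly the open question listed under "Depends on".

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def batched(source: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of at most `size` items without materializing `source`.

    Hypothetical helper for illustration; not part of this PR.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(source)
    # islice pulls only `size` items per step, so memory is bounded by one batch.
    while batch := list(islice(iterator, size)):
        yield batch
```

With a streaming connector as input, only one batch is held in memory at a time. `itertools.batched` offers similar behavior on Python 3.12+, but the packages here declare support for 3.10, so a small local helper is the more portable choice.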
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| # moss-data-connector | ||
|
|
||
| Folder holding the database-connector packages. Each connector subfolder is its own pip-installable package. | ||
|
|
||
| ## Layout | ||
|
|
||
| ``` | ||
| moss-data-connector/ | ||
| ├── _template/ # copy-me starting point for a new connector | ||
| ├── moss-connector-sqlite/ # SQLite source (stdlib, no driver) | ||
| ├── moss-connector-mongodb/ # MongoDB source (requires pymongo) | ||
| └── moss-connector-dynamodb/ # DynamoDB source (requires boto3) | ||
| ``` | ||
|
|
||
|
|
||
| ## Caller shape | ||
|
|
||
| ```python | ||
| from moss import DocumentInfo | ||
| from moss_connector_sqlite import SQLiteConnector, ingest | ||
|
|
||
| source = SQLiteConnector( | ||
| database="my.db", | ||
| query="SELECT id, title, body FROM articles", | ||
| mapper=lambda r: DocumentInfo(id=str(r["id"]), text=r["body"], metadata={"title": r["title"]}), | ||
| ) | ||
|
|
||
| await ingest(source, project_id="...", project_key="...", index_name="articles") | ||
| ``` | ||
|
|
||
|
|
||
| ## Available connectors | ||
|
|
||
| | Package | Source | Extra driver | | ||
| | ---------------------------------------------------- | -------- | ------------ | | ||
| | [`moss-connector-sqlite`](moss-connector-sqlite) | SQLite | — | | ||
| | [`moss-connector-mongodb`](moss-connector-mongodb) | MongoDB | `pymongo` | | ||
| | [`moss-connector-dynamodb`](moss-connector-dynamodb) | DynamoDB | `boto3` | | ||
|
|
||
| ## Adding a new connector | ||
|
|
||
| See [`_template/README.md`](_template/README.md). | ||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| build/ | ||
| dist/ | ||
| *.egg-info/ | ||
| __pycache__/ | ||
| *.py[cod] | ||
| .venv/ | ||
| .pytest_cache/ | ||
| .ruff_cache/ | ||
| .mypy_cache/ | ||
| .env |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| # moss-connector-template | ||
|
|
||
| Starting point for a new connector. Not a real package, don't install it. | ||
|
|
||
| ## To create a new connector | ||
|
|
||
| ```bash | ||
| cd packages/moss-data-connector | ||
| cp -r _template moss-connector-<source> | ||
| cd moss-connector-<source> | ||
| ``` | ||
|
|
||
| Then: | ||
|
|
||
| 1. Open `pyproject.toml` and replace every `TODO` (name, description, keywords, Source URL, driver deps). The package name is `moss-connector-<source>`, the Python module is `moss_connector_<source>`. | ||
| 2. Open `src/connector.py` and: | ||
| - Rename `TemplateConnector` → `<Source>Connector`. | ||
| - Add your source-specific config to `__init__`. | ||
| - Implement `__iter__` (connect, pull rows, `yield self.mapper(row)`). | ||
| 3. Update `src/__init__.py` to re-export your renamed class. | ||
| 4. Rename `tests/test_template.py` → `tests/test_<source>.py` and fill in. | ||
| 5. Add a live integration test in `tests/test_integration_<source>_moss.py` if you can (see sqlite/mongodb for patterns). | ||
| 6. Update this package's README with install + usage snippets (see `moss-connector-sqlite/README.md` for shape). | ||
| 7. Add a row to `packages/moss-data-connector/README.md`. | ||
| 8. Open a PR. | ||
|
|
||
| ## Rules | ||
|
|
||
| - **One source per package.** Don't combine. | ||
| - **Declare your driver as a main dependency** in `pyproject.toml` and import it at the top of the module. | ||
| - **No retries or rate-limit logic in `ingest.py`.** If a connector needs it, put it in the connector's own code. | ||
|
|
||
| ## Caller shape (what users write against your connector) | ||
|
|
||
| ```python | ||
| from moss import DocumentInfo | ||
| from moss_connector_<source> import <Source>Connector, ingest | ||
|
|
||
| source = <Source>Connector( | ||
| # your config here | ||
| mapper=lambda r: DocumentInfo( | ||
| id=str(r["id"]), | ||
| text=r["body"], | ||
| metadata={"title": r["title"]}, | ||
| ), | ||
| ) | ||
|
|
||
| await ingest(source, project_id="...", project_key="...", index_name="articles") | ||
| ``` |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,60 @@ | ||
| [project] | ||
| # TODO: rename to "moss-connector-<source>" | ||
| name = "moss-connector-template" | ||
| version = "0.0.1" | ||
| description = "TODO: short description of the source this connector reads from." | ||
| readme = "README.md" | ||
| requires-python = ">=3.10,<3.15" | ||
| license = { text = "BSD-2-Clause" } | ||
| authors = [{ name = "InferEdge Inc.", email = "contact@moss.dev" }] | ||
| # TODO: update keywords | ||
| keywords = ["moss", "connectors", "ingest"] | ||
| classifiers = [ | ||
| "Development Status :: 3 - Alpha", | ||
| "Intended Audience :: Developers", | ||
| "License :: OSI Approved :: BSD License", | ||
| "Programming Language :: Python :: 3", | ||
| "Programming Language :: Python :: 3.10", | ||
| "Programming Language :: Python :: 3.11", | ||
| "Programming Language :: Python :: 3.12", | ||
| "Programming Language :: Python :: 3.13", | ||
| "Topic :: Database", | ||
| ] | ||
| dependencies = [ | ||
| "moss>=1.0.0", | ||
| # TODO: add your source's driver, e.g. "psycopg[binary]>=3.1" | ||
| ] | ||
|
|
||
| [project.optional-dependencies] | ||
| dev = [ | ||
| "pytest>=8.0.0", | ||
| "pytest-asyncio>=0.23.0", | ||
| "python-dotenv>=1.0.0", | ||
| "ruff>=0.5.0", | ||
| ] | ||
|
|
||
| [project.urls] | ||
| Homepage = "https://github.com/usemoss/moss" | ||
| Repository = "https://github.com/usemoss/moss" | ||
| # TODO: update the Source path | ||
| Source = "https://github.com/usemoss/moss/tree/main/packages/moss-data-connector/moss-connector-template" | ||
|
|
||
| [build-system] | ||
| requires = ["setuptools>=61.0"] | ||
| build-backend = "setuptools.build_meta" | ||
|
|
||
| # Flat layout: src/ itself IS the package. | ||
| # TODO: rename to "moss_connector_<source>" to match your package name. | ||
| [tool.setuptools] | ||
| packages = ["moss_connector_template"] | ||
| package-dir = { "moss_connector_template" = "src" } | ||
|
|
||
| [tool.ruff] | ||
| line-length = 100 | ||
| target-version = "py310" | ||
|
|
||
| [tool.ruff.lint] | ||
| select = ["E", "W", "F", "I", "B", "UP"] | ||
|
|
||
| [tool.pytest.ini_options] | ||
| asyncio_mode = "auto" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| """Template connector package. | ||
|
|
||
| Copy this directory to `packages/moss-data-connector/moss-connector-<source>/`, | ||
| then rename `TemplateConnector` in `connector.py` to `<Source>Connector`. | ||
| """ | ||
|
|
||
| from .connector import TemplateConnector | ||
| from .ingest import ingest | ||
|
|
||
| __all__ = ["TemplateConnector", "ingest"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,33 @@ | ||
| """Connector class goes here. Rename both the file's class and the module's | ||
| host package (`moss_connector_template` → `moss_connector_<source>`). | ||
| """ | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from typing import Any, Callable, Iterator | ||
|
|
||
| from moss import DocumentInfo | ||
|
|
||
|
|
||
| class TemplateConnector: | ||
| """Yield one `DocumentInfo` per row from your source. | ||
|
|
||
| `mapper` turns one row dict into a `DocumentInfo`, the caller decides | ||
| which keys become id / text / metadata / embedding. | ||
| """ | ||
|
|
||
| def __init__( | ||
| self, | ||
| # TODO: add your source-specific config here (connection string, query, etc.) | ||
| mapper: Callable[[dict[str, Any]], DocumentInfo], | ||
| ) -> None: | ||
| self.mapper = mapper | ||
|
|
||
| def __iter__(self) -> Iterator[DocumentInfo]: | ||
| # TODO: connect to your source, pull rows, and for each one: | ||
| # yield self.mapper(row_as_dict) | ||
| # Don't pre-filter columns - the caller's mapper decides what to use. | ||
| # Import your driver *inside* this method, not at module top, so | ||
| # importing the package never fails just because the driver isn't | ||
| # installed. | ||
| raise NotImplementedError | ||
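To make the `__iter__` contract in this template concrete, here is a sketch of a source that pages through results and yields one mapped item at a time. It is not the DynamoDB connector from this PR. `PagedScanSource` and `FakeTable` are hypothetical names, and the only thing assumed about `table` is the response shape of boto3's DynamoDB `Table.scan` (`Items`, plus `LastEvaluatedKey` while more pages remain). The mapper returns a plain string here so the sketch runs without `moss` installed; a real connector's mapper would return a `DocumentInfo`.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator
from typing import Any


class PagedScanSource:
    """Illustrative only: streams mapped items from a table-like object."""

    def __init__(self, table: Any, mapper: Callable[[dict[str, Any]], Any]) -> None:
        self.table = table
        self.mapper = mapper

    def __iter__(self) -> Iterator[Any]:
        kwargs: dict[str, Any] = {}
        while True:
            page = self.table.scan(**kwargs)
            # Yield as each page arrives instead of collecting everything first.
            for item in page.get("Items", []):
                yield self.mapper(item)
            last_key = page.get("LastEvaluatedKey")
            if last_key is None:
                break  # no further pages
            kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Two-page stand-in so the sketch runs without AWS."""

    def scan(self, **kwargs: Any) -> dict[str, Any]:
        if "ExclusiveStartKey" not in kwargs:
            return {"Items": [{"id": 1}, {"id": 2}], "LastEvaluatedKey": {"id": 2}}
        return {"Items": [{"id": 3}]}


ids = list(PagedScanSource(FakeTable(), mapper=lambda item: str(item["id"])))
```

Because `__iter__` is a generator, nothing is fetched until the caller starts iterating, and each call to `iter()` restarts the scan from the first page.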
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| """Copy rows into a Moss index.""" | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from collections.abc import Iterable | ||
|
|
||
| from moss import DocumentInfo, MossClient, MutationResult | ||
|
|
||
|
|
||
| async def ingest( | ||
| source: Iterable[DocumentInfo], | ||
| project_id: str, | ||
| project_key: str, | ||
| index_name: str, | ||
| model_id: str | None = None, | ||
| ) -> MutationResult | None: | ||
| """Copy every `DocumentInfo` from `source` into a fresh Moss index.""" | ||
| docs = list(source) | ||
| if not docs: | ||
| return None | ||
| client = MossClient(project_id, project_key) | ||
| return await client.create_index(index_name, docs, model_id=model_id) |
35 changes: 35 additions & 0 deletions
packages/moss-data-connector/_template/tests/test_template.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,35 @@ | ||
| """Template unit test. Rename to test_<source>.py and adapt.""" | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from dataclasses import dataclass, field | ||
| from typing import Any | ||
| from unittest.mock import patch | ||
|
|
||
| import pytest # noqa: F401 | ||
|
|
||
| from moss import DocumentInfo # noqa: F401 | ||
|
|
||
| # TODO: update these imports to match your renamed package. | ||
| # from moss_connector_<source> import <Source>Connector, ingest | ||
|
|
||
|
|
||
| @dataclass | ||
| class FakeMossClient: | ||
| """Records create_index calls without hitting the network.""" | ||
|
|
||
| calls: list[dict[str, Any]] = field(default_factory=list) | ||
|
|
||
| async def create_index(self, name, docs, model_id=None): | ||
| self.calls.append({"name": name, "docs": list(docs), "model_id": model_id}) | ||
|
|
||
|
|
||
| # Example test, adapt to your source. See moss-connector-sqlite/tests/test_sqlite.py | ||
| # for a worked example that uses a real stdlib driver + fake MossClient. | ||
| # | ||
| # async def test_<source>_ingest(): | ||
| # fake_moss = FakeMossClient() | ||
| # with patch("moss_connector_<source>.ingest.MossClient", return_value=fake_moss): | ||
| # source = <Source>Connector(..., mapper=lambda r: DocumentInfo(...)) | ||
| # count = await ingest(source, "fake_id", "fake_key", "idx") | ||
| # assert count == ... |
10 changes: 10 additions & 0 deletions
packages/moss-data-connector/moss-connector-dynamodb/.gitignore
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| build/ | ||
| dist/ | ||
| *.egg-info/ | ||
| __pycache__/ | ||
| *.py[cod] | ||
| .venv/ | ||
| .pytest_cache/ | ||
| .ruff_cache/ | ||
| .mypy_cache/ | ||
| .env |
80 changes: 80 additions & 0 deletions
packages/moss-data-connector/moss-connector-dynamodb/README.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,80 @@ | ||
| # moss-connector-dynamodb | ||
|
|
||
| DynamoDB source connector for Moss. Scans an entire table (with optional filters) and ingests items into a Moss search index. | ||
|
|
||
| ## Install | ||
|
|
||
| ```bash | ||
| pip install moss-connector-dynamodb | ||
| ``` | ||
|
|
||
| Pulls `boto3` as a dependency. Uses the standard boto3 credential chain (env vars, shared credentials file, IAM role, etc.). | ||
|
|
||
| ## Usage | ||
|
|
||
| ```python | ||
| import asyncio | ||
| from moss import DocumentInfo | ||
| from moss_connector_dynamodb import DynamoDBConnector, ingest | ||
|
|
||
| async def main(): | ||
| source = DynamoDBConnector( | ||
| table_name="articles", | ||
| mapper=lambda item: DocumentInfo( | ||
| id=str(item["id"]), | ||
| text=item["body"], | ||
| metadata={"title": item["title"]}, | ||
| ), | ||
| region_name="us-east-1", | ||
| scan_kwargs={ # optional | ||
| "FilterExpression": "#s = :val", | ||
| "ExpressionAttributeNames": {"#s": "status"}, | ||
| "ExpressionAttributeValues": {":val": "published"}, | ||
| }, | ||
| ) | ||
|
|
||
| result = await ingest( | ||
| source, | ||
| project_id="your_project_id", | ||
| project_key="your_project_key", | ||
| index_name="articles", | ||
| ) | ||
| print(f"copied {result.doc_count} rows") | ||
|
|
||
| asyncio.run(main()) | ||
| ``` | ||
|
|
||
| DynamoDB items come back as dicts with Python types (Decimal for numbers, etc.). Handle type coercion in your mapper. | ||
|
|
||
| For large tables, `ingest()` loads all items into memory before indexing. Consider batching for tables with 100K+ rows. | ||
|
|
||
| ### Local development | ||
|
|
||
| Pass `endpoint_url` to target DynamoDB Local or localstack: | ||
|
|
||
| ```python | ||
| source = DynamoDBConnector( | ||
| table_name="articles", | ||
| mapper=my_mapper, | ||
| endpoint_url="http://localhost:8000", | ||
| ) | ||
| ``` | ||
|
|
||
| ## Layout | ||
|
|
||
| ``` | ||
| src/ | ||
| ├── __init__.py # re-exports DynamoDBConnector and ingest | ||
| ├── connector.py # DynamoDBConnector class | ||
| └── ingest.py # ingest() - keep in sync with the other connector packages | ||
| ``` | ||
|
|
||
| ## Tests | ||
|
|
||
| ```bash | ||
| pip install -e ".[dev]" | ||
| pytest tests/test_dynamodb.py -v # mocked Moss + mocked boto3 | ||
| pytest tests/test_integration_dynamodb_moss.py -v -s # live Moss + real DynamoDB | ||
| ``` | ||
|
|
||
| The live integration test requires `DYNAMODB_TABLE`, `AWS_REGION`, `MOSS_PROJECT_ID`, and `MOSS_PROJECT_KEY` env vars. Optionally set `DYNAMODB_ENDPOINT_URL` for DynamoDB Local. |
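The README's note about handling type coercion in the mapper could be covered by a small helper along these lines. This is a sketch under assumptions: `plain` is a hypothetical name, not something this package exports, and it only targets the `Decimal` and set values the boto3 resource layer returns for DynamoDB numbers and set types.

```python
from decimal import Decimal
from typing import Any


def plain(value: Any) -> Any:
    """Recursively convert Decimal values to int or float. Hypothetical helper."""
    if isinstance(value, Decimal):
        # Whole numbers become int; anything with a fractional part becomes float.
        return int(value) if value == value.to_integral_value() else float(value)
    if isinstance(value, dict):
        return {key: plain(inner) for key, inner in value.items()}
    if isinstance(value, (list, tuple, set)):
        # Sets are unordered, so the resulting list order is not guaranteed.
        return [plain(inner) for inner in value]
    return value


item = {"id": Decimal("42"), "score": Decimal("0.5"), "tags": {"a"}, "body": "hello"}
clean = plain(item)
```

A mapper could call `plain(item)` first and then build its `DocumentInfo` from the cleaned dict. Note that converting to `float` can lose precision for values that need exact decimal representation; in that case `str(value)` may be the safer choice.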
🟡 Template connector comment contradicts template README import rule
The template `connector.py` comment on lines 30-32 says to import the driver inside `__iter__`, but the template `README.md` at `packages/moss-data-connector/_template/README.md:30` explicitly states the opposite rule: "Declare your driver as a main dependency in `pyproject.toml` and import it at the top of the module." All three actual connectors (SQLite, MongoDB, DynamoDB) follow the README rule and import at the top. A new contributor copying this template would see contradictory advice, and if they follow the code comment, their connector would diverge from the established pattern and the authoritative rules.