31 changes: 31 additions & 0 deletions TODOS.md
@@ -0,0 +1,31 @@
# TODOs

## CI/CD pipeline for integration packages

**What:** Add GitHub Actions workflows to build, test, and publish `strands-agents-moss` and `semantic-kernel-moss` to PyPI.

**Why:** Neither integration package has a CI/CD pipeline. Code without distribution is code nobody can use. Users currently have no way to `pip install` these packages from PyPI.

**Context:** Both packages use setuptools with `pyproject.toml`. Tests use pytest + pytest-asyncio. Linting uses ruff. A single reusable workflow could cover both packages since they share the same build system and test tooling. Consider matrix strategy for Python 3.10-3.13.

**Depends on:** Nothing. Can be done independently.
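A rough sketch of what the reusable workflow could look like (the file name, input name, and steps are assumptions, and the PyPI publish job is omitted; publishing could hang off a release tag, e.g. via `pypa/gh-action-pypi-publish`):

```yaml
# .github/workflows/package-ci.yml (hypothetical name)
name: package-ci
on:
  workflow_call:
    inputs:
      package-dir:
        required: true
        type: string
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -e "${{ inputs.package-dir }}[dev]"
      - run: ruff check "${{ inputs.package-dir }}"
      - run: pytest "${{ inputs.package-dir }}/tests"
```

Each package's workflow would then call this with its own directory.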

## .NET Semantic Kernel plugin design doc

**What:** Write a design document for a .NET version of the Moss Semantic Kernel plugin.

**Why:** The GitHub issue (#82) lists .NET as a stretch goal. Semantic Kernel has strong .NET adoption in enterprise shops. A design doc captures the approach (NuGet package, IKernelPlugin interface, C# async patterns) without committing to implementation.

**Context:** The Python plugin (`semantic-kernel-moss`) is the reference implementation. The .NET version would follow the same pattern: single `Search` kernel function, constructor-configured `MossClient`, pre-loaded index. Key decisions: whether to use the .NET Moss SDK (if it exists) or wrap the Python SDK, and how to handle the async index loading lifecycle in C#.

**Depends on:** Python plugin shipped and validated by users.

## Streaming/batched ingest for large tables

**What:** Change shared `ingest.py` to pass iterables directly or batch in chunks instead of `list(source)`.

**Why:** Current `list()` call loads entire dataset into memory. For 1M+ row tables, that's 1GB+ RAM.

**Context:** The connector `__iter__` already streams correctly. The bottleneck is the shared `ingest.py` doing eager collection. Fix would be ~5 lines but affects all connectors (SQLite, MongoDB, DynamoDB).

**Depends on:** Checking if `MossClient.create_index()` accepts iterables.
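The chunking half of that fix can be sketched with the stdlib alone; how each chunk then reaches Moss depends on the `create_index()` question above, so only the generator is shown:

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def batched(source: Iterable[T], size: int = 1000) -> Iterator[list[T]]:
    """Yield fixed-size chunks without materializing the whole source."""
    it = iter(source)
    # islice pulls at most `size` items per pass; the walrus stops on empty
    while chunk := list(islice(it, size)):
        yield chunk
```

Peak memory then scales with `size`, not with the table, as long as the downstream call accepts per-chunk writes.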
43 changes: 43 additions & 0 deletions packages/moss-data-connector/README.md
@@ -0,0 +1,43 @@
# moss-data-connector

Folder holding the database-connector packages. Each subfolder is its own pip-installable package.

## Layout

```
moss-data-connector/
├── _template/ # copy-me starting point for a new connector
├── moss-connector-sqlite/ # SQLite source (stdlib, no driver)
├── moss-connector-mongodb/ # MongoDB source (requires pymongo)
└── moss-connector-dynamodb/ # DynamoDB source (requires boto3)
```


## Caller shape

```python
from moss import DocumentInfo
from moss_connector_sqlite import SQLiteConnector, ingest

source = SQLiteConnector(
    database="my.db",
    query="SELECT id, title, body FROM articles",
    mapper=lambda r: DocumentInfo(id=str(r["id"]), text=r["body"], metadata={"title": r["title"]}),
)

await ingest(source, project_id="...", project_key="...", index_name="articles")
```


## Available connectors

| Package | Source | Extra driver |
| ---------------------------------------------------- | -------- | ------------ |
| [`moss-connector-sqlite`](moss-connector-sqlite) | SQLite | — |
| [`moss-connector-mongodb`](moss-connector-mongodb) | MongoDB | `pymongo` |
| [`moss-connector-dynamodb`](moss-connector-dynamodb) | DynamoDB | `boto3` |

## Adding a new connector

See [`_template/README.md`](_template/README.md).

10 changes: 10 additions & 0 deletions packages/moss-data-connector/_template/.gitignore
@@ -0,0 +1,10 @@
build/
dist/
*.egg-info/
__pycache__/
*.py[cod]
.venv/
.pytest_cache/
.ruff_cache/
.mypy_cache/
.env
49 changes: 49 additions & 0 deletions packages/moss-data-connector/_template/README.md
@@ -0,0 +1,49 @@
# moss-connector-template

Starting point for a new connector. Not a real package; don't install it.

## To create a new connector

```bash
cd packages/moss-data-connector
cp -r _template moss-connector-<source>
cd moss-connector-<source>
```

Then:

1. Open `pyproject.toml` and replace every `TODO` (name, description, keywords, Source URL, driver deps). The package name is `moss-connector-<source>`, the Python module is `moss_connector_<source>`.
2. Open `src/connector.py` and:
- Rename `TemplateConnector` → `<Source>Connector`.
- Add your source-specific config to `__init__`.
- Implement `__iter__` (connect, pull rows, `yield self.mapper(row)`).
3. Update `src/__init__.py` to re-export your renamed class.
4. Rename `tests/test_template.py` → `tests/test_<source>.py` and fill in.
5. Add a live integration test in `tests/test_integration_<source>_moss.py` if you can (see sqlite/mongodb for patterns).
6. Update this package's README with install + usage snippets (see `moss-connector-sqlite/README.md` for shape).
7. Add a row to `packages/moss-data-connector/README.md`.
8. Open a PR.

## Rules

- **One source per package.** Don't combine.
- **Declare your driver as a main dependency** in `pyproject.toml` and import it at the top of the module.
- **No retries or rate-limit logic in `ingest.py`.** If a connector needs it, put it in the connector's own code.

## Caller shape (what users write against your connector)

```python
from moss import DocumentInfo
from moss_connector_<source> import <Source>Connector, ingest

source = <Source>Connector(
    # your config here
    mapper=lambda r: DocumentInfo(
        id=str(r["id"]),
        text=r["body"],
        metadata={"title": r["title"]},
    ),
)

await ingest(source, project_id="...", project_key="...", index_name="articles")
```
60 changes: 60 additions & 0 deletions packages/moss-data-connector/_template/pyproject.toml
@@ -0,0 +1,60 @@
[project]
# TODO: rename to "moss-connector-<source>"
name = "moss-connector-template"
version = "0.0.1"
description = "TODO: short description of the source this connector reads from."
readme = "README.md"
requires-python = ">=3.10,<3.15"
license = { text = "BSD-2-Clause" }
authors = [{ name = "InferEdge Inc.", email = "contact@moss.dev" }]
# TODO: update keywords
keywords = ["moss", "connectors", "ingest"]
classifiers = [
    "Development Status :: 3 - Alpha",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: BSD License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
    "Topic :: Database",
]
dependencies = [
    "moss>=1.0.0",
    # TODO: add your source's driver, e.g. "psycopg[binary]>=3.1"
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0.0",
    "pytest-asyncio>=0.23.0",
    "python-dotenv>=1.0.0",
    "ruff>=0.5.0",
]

[project.urls]
Homepage = "https://github.com/usemoss/moss"
Repository = "https://github.com/usemoss/moss"
# TODO: update the Source path
Source = "https://github.com/usemoss/moss/tree/main/packages/moss-data-connector/moss-connector-template"

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

# Flat layout: src/ itself IS the package.
# TODO: rename to "moss_connector_<source>" to match your package name.
[tool.setuptools]
packages = ["moss_connector_template"]
package-dir = { "moss_connector_template" = "src" }

[tool.ruff]
line-length = 100
target-version = "py310"

[tool.ruff.lint]
select = ["E", "W", "F", "I", "B", "UP"]

[tool.pytest.ini_options]
asyncio_mode = "auto"
10 changes: 10 additions & 0 deletions packages/moss-data-connector/_template/src/__init__.py
@@ -0,0 +1,10 @@
"""Template connector package.

Copy this directory to `packages/moss-data-connector/moss-connector-<source>/`,
then rename `TemplateConnector` in `connector.py` to `<Source>Connector`.
"""

from .connector import TemplateConnector
from .ingest import ingest

__all__ = ["TemplateConnector", "ingest"]
33 changes: 33 additions & 0 deletions packages/moss-data-connector/_template/src/connector.py
@@ -0,0 +1,33 @@
"""Connector class goes here. Rename both the file's class and the module's
host package (`moss_connector_template` → `moss_connector_<source>`).
"""

from __future__ import annotations

from typing import Any, Callable, Iterator

from moss import DocumentInfo


class TemplateConnector:
    """Yield one `DocumentInfo` per row from your source.

    `mapper` turns one row dict into a `DocumentInfo`; the caller decides
    which keys become id / text / metadata / embedding.
    """

    def __init__(
        self,
        # TODO: add your source-specific config here (connection string, query, etc.)
        mapper: Callable[[dict[str, Any]], DocumentInfo],
    ) -> None:
        self.mapper = mapper

    def __iter__(self) -> Iterator[DocumentInfo]:
        # TODO: connect to your source, pull rows, and for each one:
        #   yield self.mapper(row_as_dict)
        # Don't pre-filter columns; the caller's mapper decides what to use.
        # Import your driver *inside* this method, not at module top, so
        # importing the package never fails just because the driver isn't
        # installed.
Comment on lines +30 to +32
🟡 Template connector comment contradicts template README import rule

The template connector.py comment on lines 30-32 says to import the driver inside __iter__, but the template README.md at packages/moss-data-connector/_template/README.md:30 explicitly states the opposite rule: "Declare your driver as a main dependency in pyproject.toml and import it at the top of the module." All three actual connectors (SQLite, MongoDB, DynamoDB) follow the README rule and import at the top. A new contributor copying this template would see contradictory advice, and if they follow the code comment, their connector would diverge from the established pattern and the authoritative rules.

Suggested change

    - # Import your driver *inside* this method, not at module top, so
    - # importing the package never fails just because the driver isn't
    - # installed.
    + # Import your driver at the top of the module (see README rules).
    + # Declare it as a main dependency in pyproject.toml so pip installs it.
    + #

        raise NotImplementedError
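Filled in for a concrete source, the template's shape might look like the following sqlite3-based sketch. The `DocumentInfo` stub here is only a stand-in for the real `moss.DocumentInfo` (its exact fields are an assumption), so the example stays self-contained:

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Any, Callable, Iterator


@dataclass
class DocumentInfo:
    """Stand-in for moss.DocumentInfo; the real class's fields may differ."""
    id: str
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


class SQLiteConnector:
    """The template filled in for a stdlib sqlite3 source."""

    def __init__(
        self,
        database: str,
        query: str,
        mapper: Callable[[dict[str, Any]], DocumentInfo],
    ) -> None:
        self.database = database
        self.query = query
        self.mapper = mapper

    def __iter__(self) -> Iterator[DocumentInfo]:
        conn = sqlite3.connect(self.database)
        conn.row_factory = sqlite3.Row  # rows expose keys like a dict
        try:
            for row in conn.execute(self.query):
                # hand the full row to the caller's mapper, no pre-filtering
                yield self.mapper(dict(row))
        finally:
            conn.close()
```

sqlite3 is stdlib, so importing it at module top follows the README's driver rule for free.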
22 changes: 22 additions & 0 deletions packages/moss-data-connector/_template/src/ingest.py
@@ -0,0 +1,22 @@
"""Copy rows into a Moss index."""

from __future__ import annotations

from collections.abc import Iterable

from moss import DocumentInfo, MossClient, MutationResult


async def ingest(
    source: Iterable[DocumentInfo],
    project_id: str,
    project_key: str,
    index_name: str,
    model_id: str | None = None,
) -> MutationResult | None:
    """Copy every `DocumentInfo` from `source` into a fresh Moss index."""
    docs = list(source)
    if not docs:
        return None
    client = MossClient(project_id, project_key)
    return await client.create_index(index_name, docs, model_id=model_id)
35 changes: 35 additions & 0 deletions packages/moss-data-connector/_template/tests/test_template.py
@@ -0,0 +1,35 @@
"""Template unit test. Rename to test_<source>.py and adapt."""

from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any
from unittest.mock import patch

import pytest # noqa: F401

from moss import DocumentInfo # noqa: F401

# TODO: update these imports to match your renamed package.
# from moss_connector_<source> import <Source>Connector, ingest


@dataclass
class FakeMossClient:
    """Records create_index calls without hitting the network."""

    calls: list[dict[str, Any]] = field(default_factory=list)

    async def create_index(self, name, docs, model_id=None):
        self.calls.append({"name": name, "docs": list(docs), "model_id": model_id})


# Example test, adapt to your source. See moss-connector-sqlite/tests/test_sqlite.py
# for a worked example that uses a real stdlib driver + fake MossClient.
#
# async def test_<source>_ingest():
#     fake_moss = FakeMossClient()
#     with patch("moss_connector_<source>.ingest.MossClient", return_value=fake_moss):
#         source = <Source>Connector(..., mapper=lambda r: DocumentInfo(...))
#         await ingest(source, "fake_id", "fake_key", "idx")
#     assert fake_moss.calls[0]["name"] == "idx"
10 changes: 10 additions & 0 deletions packages/moss-data-connector/moss-connector-dynamodb/.gitignore
@@ -0,0 +1,10 @@
build/
dist/
*.egg-info/
__pycache__/
*.py[cod]
.venv/
.pytest_cache/
.ruff_cache/
.mypy_cache/
.env
80 changes: 80 additions & 0 deletions packages/moss-data-connector/moss-connector-dynamodb/README.md
@@ -0,0 +1,80 @@
# moss-connector-dynamodb

DynamoDB source connector for Moss. Scans an entire table (with optional filters) and ingests items into a Moss search index.

## Install

```bash
pip install moss-connector-dynamodb
```

Pulls `boto3` as a dependency. Uses the standard boto3 credential chain (env vars, shared credentials file, IAM role, etc.).

## Usage

```python
import asyncio
from moss import DocumentInfo
from moss_connector_dynamodb import DynamoDBConnector, ingest

async def main():
    source = DynamoDBConnector(
        table_name="articles",
        mapper=lambda item: DocumentInfo(
            id=str(item["id"]),
            text=item["body"],
            metadata={"title": item["title"]},
        ),
        region_name="us-east-1",
        scan_kwargs={  # optional
            "FilterExpression": "#s = :val",
            "ExpressionAttributeNames": {"#s": "status"},
            "ExpressionAttributeValues": {":val": "published"},
        },
    )

    result = await ingest(
        source,
        project_id="your_project_id",
        project_key="your_project_key",
        index_name="articles",
    )
    print(f"copied {result.doc_count} items")

asyncio.run(main())
```

DynamoDB items come back as dicts with Python types (Decimal for numbers, etc.). Handle type coercion in your mapper.
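One way to do that coercion, as a sketch: the rules below (whole-number Decimals become `int`, everything else `float`) are an assumption about what your index expects, so adjust them to your schema:

```python
from decimal import Decimal
from typing import Any


def coerce(value: Any) -> Any:
    """Flatten DynamoDB Decimal values into plain int/float, recursively."""
    if isinstance(value, Decimal):
        # to_integral_value() detects whole numbers without string round-trips
        return int(value) if value == value.to_integral_value() else float(value)
    if isinstance(value, dict):
        return {k: coerce(v) for k, v in value.items()}
    if isinstance(value, list):
        return [coerce(v) for v in value]
    return value
```

A mapper can then call `coerce(item)` once, before building the `DocumentInfo`.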

For large tables, `ingest()` loads all items into memory before indexing. Consider batching for tables with 100K+ rows.

### Local development

Pass `endpoint_url` to target DynamoDB Local or localstack:

```python
source = DynamoDBConnector(
    table_name="articles",
    mapper=my_mapper,
    endpoint_url="http://localhost:8000",
)
```

## Layout

```
src/
├── __init__.py # re-exports DynamoDBConnector and ingest
├── connector.py # DynamoDBConnector class
└── ingest.py # ingest() - keep in sync with the other connector packages
```

## Tests

```bash
pip install -e ".[dev]"
pytest tests/test_dynamodb.py -v # mocked Moss + mocked boto3
pytest tests/test_integration_dynamodb_moss.py -v -s # live Moss + real DynamoDB
```

The live integration test requires `DYNAMODB_TABLE`, `AWS_REGION`, `MOSS_PROJECT_ID`, and `MOSS_PROJECT_KEY` env vars. Optionally set `DYNAMODB_ENDPOINT_URL` for DynamoDB Local.