[Feat] : Add moss-connector-dynamodb; DynamoDB source connector #171

@yatharthk2

Summary

This connector reads every item from an AWS DynamoDB table and turns each item into a document in a Moss search index.

It uses boto3 for the connection and AWS's standard credential chain (environment variables, shared credentials file, or IAM role) — no keys are passed into the connector directly.

Authentication

The connector doesn't accept AWS credentials as constructor arguments — boto3 handles them via the standard chain:

  1. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, optionally AWS_SESSION_TOKEN).
  2. Shared credentials file at ~/.aws/credentials (with AWS_PROFILE to pick a profile).
  3. IAM role when running on EC2, ECS, or Lambda.

Don't add access_key_id or secret_access_key parameters to the connector. The only AWS-specific parameter is region_name, since DynamoDB is region-scoped.
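One way the constructor could look under these rules is sketched below. The parameter names table_name, region_name, and mapper come from the usage example in this issue; the resource_factory argument and the _table helper are hypothetical additions, included only so the class can be exercised in tests without real AWS access.

```python
from typing import Any, Callable


def _default_resource_factory(region_name: str) -> Any:
    # Imported lazily so this sketch can be loaded without boto3 installed.
    import boto3

    # No access keys are passed here: boto3 resolves credentials through
    # its standard chain (env vars, shared credentials file, IAM role).
    return boto3.resource("dynamodb", region_name=region_name)


class DynamoDBConnector:
    """Sketch of the constructor shape; region_name is the only AWS-specific argument."""

    def __init__(
        self,
        table_name: str,
        region_name: str,
        mapper: Callable[[dict], Any],
        # Hypothetical hook for tests; not part of the issue's specification.
        resource_factory: Callable[[str], Any] = _default_resource_factory,
    ) -> None:
        self.table_name = table_name
        self.region_name = region_name
        self.mapper = mapper
        self._resource_factory = resource_factory

    def _table(self) -> Any:
        # Build the Table handle on demand from the region-scoped resource.
        return self._resource_factory(self.region_name).Table(self.table_name)
```

Injecting the factory is a design choice of this sketch, not a requirement; patching boto3.resource in the tests would work just as well.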

To get started

  1. Read _template/README.md.
  2. Copy the template:
    cd packages/moss-data-connector
    cp -r _template moss-connector-dynamodb
    cd moss-connector-dynamodb
  3. Rename TemplateConnector to DynamoDBConnector in src/connector.py, update src/__init__.py to re-export it, and fill in pyproject.toml.
  4. Implement __iter__ using boto3.resource("dynamodb").Table(...). Call .scan() in a loop, passing ExclusiveStartKey=response["LastEvaluatedKey"] on each subsequent call until LastEvaluatedKey is absent from the response, so large tables are read in full instead of stopping after the first page (a single scan call returns at most 1 MB of data).
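The pagination loop in step 4 can be sketched as follows. The helper name scan_all_items is hypothetical, and the function accepts any object with a boto3-style .scan(**kwargs) method, so it can be tried against a fake table as well as a real boto3 Table.

```python
from typing import Any, Dict, Iterator


def scan_all_items(table: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item from a DynamoDB table, following scan pagination."""
    kwargs: Dict[str, Any] = {}
    while True:
        response = table.scan(**kwargs)
        # Stream the current page before requesting the next one.
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            # No LastEvaluatedKey means this was the final page.
            break
        # Resume the scan where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

In the connector, __iter__ would presumably apply the mapper to each item yielded by a loop like this one.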

Deliverables

The caller should be able to write:

from moss_connector_dynamodb import DynamoDBConnector, ingest
from moss import DocumentInfo

source = DynamoDBConnector(
    table_name="articles",
    region_name="us-east-1",
    mapper=lambda row: DocumentInfo(
        id=str(row["id"]),
        text=row["body"],
        metadata={"title": row["title"]},
    ),
)

await ingest(source, project_id="...", project_key="...", index_name="articles")

Files:

  • src/connector.py — DynamoDBConnector class, with pagination handled
  • src/__init__.py — re-exports DynamoDBConnector and ingest
  • src/ingest.py — copy from _template/src/ingest.py
  • pyproject.toml — lists boto3>=1.28
  • tests/test_dynamodb.py — unit test with mocked boto3.resource and MossClient, including a multi-page scan case
  • tests/test_integration_dynamodb_moss.py — live test, skips without DYNAMODB_TABLE, AWS_REGION, and Moss creds
  • README.md
  • Row added to packages/moss-data-connector/README.md
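A rough sketch of the multi-page scan case for tests/test_dynamodb.py is shown below. The iter_items function is a stand-in for the connector's scan loop, not its actual code. In the real test the MagicMock table would presumably be reached through a patched boto3.resource, with MossClient mocked as well.

```python
from unittest.mock import MagicMock, call


def iter_items(table):
    """Stand-in for the connector's scan loop (assumed shape)."""
    kwargs = {}
    while True:
        page = table.scan(**kwargs)
        yield from page.get("Items", [])
        if "LastEvaluatedKey" not in page:
            return
        kwargs = {"ExclusiveStartKey": page["LastEvaluatedKey"]}


def test_multi_page_scan():
    table = MagicMock()
    # The first page carries a LastEvaluatedKey; the second page ends the scan.
    table.scan.side_effect = [
        {"Items": [{"id": "1"}], "LastEvaluatedKey": {"id": "1"}},
        {"Items": [{"id": "2"}]},
    ]

    items = list(iter_items(table))

    assert items == [{"id": "1"}, {"id": "2"}]
    # The second call must resume from the key returned by the first page.
    assert table.scan.call_args_list == [
        call(),
        call(ExclusiveStartKey={"id": "1"}),
    ]
```

Asserting on call_args_list checks that the loop actually forwards the key, which a test on the returned items alone would not catch.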

Required for review / PR acceptance

  • Loom video (≈1 min): show the DynamoDB table, run the live test, and show the resulting Moss index being queried.
  • CLA signed.
