The benchmark requires the following environment variables to be set:
- `SUPERMEMORY_API_KEY`: Your Supermemory API key.
- `SUPERMEMORY_API_URL`: (Optional) API base URL; defaults to `https://api.supermemory.ai`.
- `GOOGLE_VERTEX_PROJECT_ID`: Project ID for Google Vertex AI (required for evaluation).
- `GOOGLE_CLIENT_EMAIL`: Google Service Account email (required for evaluation).
- `GOOGLE_PRIVATE_KEY`: Google Service Account private key (required for evaluation).
You can set these in your shell or environment before running the scripts.
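For example, in a POSIX shell (all values below are placeholders, not real credentials):

```shell
# Placeholder values; replace with your own credentials.
export SUPERMEMORY_API_KEY="your-api-key"
export SUPERMEMORY_API_URL="https://api.supermemory.ai"   # optional; this is the default
export GOOGLE_VERTEX_PROJECT_ID="your-gcp-project"
export GOOGLE_CLIENT_EMAIL="benchmark@your-gcp-project.iam.gserviceaccount.com"
export GOOGLE_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----
...your key material...
-----END PRIVATE KEY-----"
```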
1. **Download the Dataset:**
   - Download `longmemeval_s_cleaned.json` from HuggingFace.
   - Place it in `memorybench/benchmarks/LongMemEval/datasets/`.
2. **Generate Questions:**
   - Run the split script to generate individual question files:
     ```bash
     bun run scripts/setup/split_questions.ts
     ```
   - This will populate `datasets/questions/`.
3. **Install Dependencies:**
   - Ensure all project dependencies are installed via `bun install`.
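To sanity-check what the split step produces, the core transformation can be sketched with `jq`. This is illustrative only: it assumes each dataset entry carries a `question_id` field (not confirmed against the actual `split_questions.ts` implementation) and uses a tiny inline stand-in for the real dataset:

```shell
# Illustrative only: split a JSON array into one file per entry, named by
# question_id (assumed field name; the real script may differ).
workdir=$(mktemp -d)
mkdir -p "$workdir/questions"

# Tiny stand-in for longmemeval_s_cleaned.json, for demonstration.
cat > "$workdir/dataset.json" <<'EOF'
[{"question_id":"q1","question":"..."},{"question_id":"q2","question":"..."}]
EOF

# Emit one compact JSON object per line, then write each to its own file.
jq -c '.[]' "$workdir/dataset.json" | while IFS= read -r entry; do
  id=$(printf '%s' "$entry" | jq -r '.question_id')
  printf '%s\n' "$entry" > "$workdir/questions/${id}.json"
done
```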
To ingest questions, use the scripts in `scripts/ingest/`.

Single question (from `memorybench/benchmarks/LongMemEval`):

```bash
bun run scripts/ingest/ingest.ts <questionId> <runId>
```

Batch (from `memorybench/benchmarks/LongMemEval`):

```bash
./scripts/ingest/ingest-batch.sh --runId=<runId> --questionType=<questionType> --startPosition=<startPos> --endPosition=<endPos>
```

To search questions, use the scripts in `scripts/search/`.

Single question (from `memorybench/benchmarks/LongMemEval`):

```bash
bun run scripts/search/search.ts <questionId> <runId>
```

Batch (from `memorybench/benchmarks/LongMemEval`):

```bash
./scripts/search/search-batch.sh --runId=<runId> [--questionType=<questionType>] [--startPosition=<startPos>] [--endPosition=<endPos>]
```

To evaluate results, use the scripts in `scripts/evaluate/`. From `memorybench/benchmarks/LongMemEval`:

```bash
./scripts/evaluate/evaluate-batch.sh --runId=<runId> [--questionType=<questionType>] [--startPosition=<startPos>] [--endPosition=<endPos>]
```

Valid values for `<questionType>`:

- single-session-user
- single-session-assistant
- single-session-preference
- knowledge-update
- temporal-reasoning
- multi-session
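Putting the three phases together, a small wrapper can drive ingest, search, and evaluation for one slice of questions. The function below only prints the commands (a dry run), and the run ID, question type, and positions are illustrative:

```shell
# Dry-run sketch: print the ingest -> search -> evaluate commands for one slice.
# Remove the `echo` to execute the scripts for real.
run_slice() {
  runId=$1; qtype=$2; start=$3; end=$4
  for script in ingest/ingest-batch.sh search/search-batch.sh evaluate/evaluate-batch.sh; do
    echo "./scripts/$script --runId=$runId --questionType=$qtype --startPosition=$start --endPosition=$end"
  done
}

run_slice demo-run multi-session 1 10
```

Running the phases in this order matters: search reads what ingestion stored, and evaluation reads the search results.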
- `datasets/longmemeval_s_cleaned.json`: The raw dataset (download from HuggingFace).
- `datasets/questions/`: Individual question JSON files (generated).
- `scripts/`: Scripts for ingestion, search, and evaluation.
- `scripts/utils/`: Shared utilities (config, checkpointing).
- `checkpoints/ingest/session/`: Session-level ingestion checkpoints.
- `checkpoints/ingest/batch/`: Batch ingestion checkpoints.
- `checkpoints/search/batch/`: Batch search checkpoints.
- `results/`: Search results.
- `evaluations/`: Evaluation results.
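The writable directories in this layout can be created up front (the scripts may also create them on demand; this is just a convenience):

```shell
# Create the generated/output directories from the layout above.
mkdir -p datasets/questions \
         checkpoints/ingest/session \
         checkpoints/ingest/batch \
         checkpoints/search/batch \
         results \
         evaluations
```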