Skip to content

supermemoryai/memorybench

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LongMemEval Benchmark

Configuration

The benchmark requires the following environment variables to be set:

  • SUPERMEMORY_API_KEY: Your Supermemory API key.
  • SUPERMEMORY_API_URL: (Optional) API base URL, defaults to https://api.supermemory.ai.
  • GOOGLE_VERTEX_PROJECT_ID: Project ID for Google Vertex AI (required for evaluation).
  • GOOGLE_CLIENT_EMAIL: Google Service Account email (required for evaluation).
  • GOOGLE_PRIVATE_KEY: Google Service Account private key (required for evaluation).

You can set these in your shell or environment before running the scripts.

Setup

  1. Download the Dataset:

    • Download longmemeval_s_cleaned.json from HuggingFace.
    • Place it in memorybench/benchmarks/LongMemEval/datasets/.
  2. Generate Questions:

    • Run the split script to generate individual question files:
    bun run scripts/setup/split_questions.ts

    This will populate datasets/questions/.

  3. Install Dependencies:

    • Ensure all project dependencies are installed via bun install.

Ingestion

To ingest questions, use the scripts in scripts/ingest/.

Single Question

From memorybench/benchmarks/LongMemEval:

bun run scripts/ingest/ingest.ts <questionId> <runId>

Batch Ingestion

From memorybench/benchmarks/LongMemEval:

./scripts/ingest/ingest-batch.sh --runId=<runId> --questionType=<questionType> --startPosition=<startPos> --endPosition=<endPos>

Search

To search questions, use the scripts in scripts/search/.

Single Question

From memorybench/benchmarks/LongMemEval:

bun run scripts/search/search.ts <questionId> <runId>

Batch Search

From memorybench/benchmarks/LongMemEval:

./scripts/search/search-batch.sh --runId=<runId> [--questionType=<questionType>] [--startPosition=<startPos>] [--endPosition=<endPos>]

Evaluation

To evaluate results, use the scripts in scripts/evaluate/.

Batch Evaluation

From memorybench/benchmarks/LongMemEval:

./scripts/evaluate/evaluate-batch.sh --runId=<runId> [--questionType=<questionType>] [--startPosition=<startPos>] [--endPosition=<endPos>]

Available Question Types

  • single-session-user
  • single-session-assistant
  • single-session-preference
  • knowledge-update
  • temporal-reasoning
  • multi-session

Directory Structure

  • datasets/longmemeval_s_cleaned.json: The raw dataset (download from HF).
  • datasets/questions/: Individual question JSON files (generated).
  • scripts/: Scripts for ingestion, search, and evaluation.
  • scripts/utils/: Shared utilities (config, checkpointing).
  • checkpoints/ingest/session/: Session-level ingestion checkpoints.
  • checkpoints/ingest/batch/: Batch ingestion checkpoints.
  • checkpoints/search/batch/: Batch search checkpoints.
  • results/: Search results.
  • evaluations/: Evaluation results.

About

Unified benchmark for evaluating conversational memory and RAG across multiple datasets

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published