
Bulk User Import into SuperTokens #912

@anku255

Description

https://docs.google.com/document/d/1TUrcIPbdsHfqheIkB6CTkTNiA6YN-73Xz_pX2kjXhkc/edit

Open PRs:

TODO:

  • Test with 1M users in CI/CD. Make sure users are generated with a mix of login methods, tenancy, metadata, roles, etc. Make sure the test does not take too long and that everything works consistently.

  • Create an API for starting/stopping the cron job, and another one for getting the status of the cron job (active/inactive). The processing batch size should be a parameter of the starting API (see the sketch after this list).

  • Allow developers to configure parallelism in ProcessBulkImportUsers cron job

    Currently, it takes on average 66 seconds to process 1000 users, which is too slow when processing a large number of users. This happens because we loop through the users one by one in a for loop and use just one DB connection for BulkImportProxyStorage.

    The solution is to process users in parallel using threads and to create a BulkImportProxyStorage instance for each user being processed. The number of users processed in parallel will depend on the BULK_MIGRATION_PARALLELISM config value set by the user. This will be a SaaS-protected property and can be added to PROTECTED_CONFIGS in CoreConfig.java. It should have the @NotConflictingInApp annotation.

  • PR changes in supertokens-core PR
    - All the PR changes are done but there may be more changes after review.

  • PR changes in supertokens-postgresql-plugin PR

  • Changes in Node Script to add users

    The script needs to be rewritten to optimise for the following user stories:

    • The user is not expected to monitor the script. The script should try to continue processing and retry failures wherever possible.

    • The user should be able to re-run the script using the same input file multiple times and the script should process the remaining users. This can be implemented by maintaining a state file per input file name (see the sketch after this list).

    • The Core API calls should retry with exponential backoff (unlimited) in case we get an error from the Core. This ensures that we don't halt processing if the Core is down for a few seconds (see the sketch after this list).

    • The script should continue showing the Bulk Import Cron Job status after it has added all the users. Any users with status=FAILED will be added to the same usersHavingInvalidSchema.json file. This file could also be renamed to something like usersHavingErrors.json. Since we will be writing to the usersHavingErrors.json file, users are expected to wait until all the users have been processed, then fix the error file and add those users again.

    • The script should display progress logs while adding the users. This could include the total number of users, the number of users added, the number of users having errors, etc.

    • We also need to research the size limit of the JSON file. A JSON file with a million users would be about 880 MB. JSON files cannot be streamed; the whole file needs to be read into memory. If this is an issue, we may need to switch to the ndjson file format, which allows streaming the file (see the sketch after this list).

  • Documentation for Bulk Import

  • Update the CDI spec to include the /bulk-import/import and /bulk-import/users/count APIs. Also update the BulkImportUser schema to include the plainTextPassword field.

  • Bulk Import for Auth0 users (ignore for now)
    After the Bulk Import task is complete, we plan to have a special guide for Auth0 users.

    Auth0 users need to request the exported file from their support team if they want the password hashes of their users. For any other type of login, they can export the users themselves using the API. However, this API doesn't include user roles.

    • We could write a script that takes the exported JSON and Auth0 credentials. The script would fetch all the roles and map them to the users (see the sketch after this list). (We could also call the roles API for each user, but that would take more API calls.)

    • We can also add a separate page in our documentation for Auth0 users that guides them through requesting the export JSON file and running the above script.
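
Sketches for the items above:

The start/stop/status APIs for the ProcessBulkImportUsers cron job do not exist yet; the sketch below only illustrates the intended shape of the calls from a client's point of view. The paths, the batchSize field, and the status values are placeholders, not the final CDI spec.

```ts
// Hypothetical cron control endpoints; paths and fields are placeholders.
const CORE_URL = "http://localhost:3567";

async function startBulkImportCron(batchSize: number): Promise<void> {
  await fetch(`${CORE_URL}/bulk-import/cron/start`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ batchSize }),
  });
}

async function stopBulkImportCron(): Promise<void> {
  await fetch(`${CORE_URL}/bulk-import/cron/stop`, { method: "POST" });
}

async function getBulkImportCronStatus(): Promise<"ACTIVE" | "INACTIVE"> {
  const res = await fetch(`${CORE_URL}/bulk-import/cron/status`);
  return (await res.json()).status;
}
```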
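
For re-running the script with the same input file, one option is a small state file named after the input file that records how far the previous run got. This is a minimal sketch; the state file name, location, and the fields of ImportState are assumptions.

```ts
import * as fs from "fs";
import * as path from "path";

// Hypothetical per-input-file state: the index of the last user that was
// successfully sent to the Core.
interface ImportState {
  lastProcessedIndex: number;
}

function stateFilePathFor(inputFile: string): string {
  return path.join(path.dirname(inputFile), `.${path.basename(inputFile)}.state.json`);
}

function loadState(inputFile: string): ImportState {
  const stateFile = stateFilePathFor(inputFile);
  if (!fs.existsSync(stateFile)) {
    return { lastProcessedIndex: -1 };
  }
  return JSON.parse(fs.readFileSync(stateFile, "utf8"));
}

function saveState(inputFile: string, state: ImportState): void {
  fs.writeFileSync(stateFilePathFor(inputFile), JSON.stringify(state));
}
```

On re-run the script loads the state, skips users up to lastProcessedIndex, and keeps updating the state after each successfully submitted batch.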
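
A minimal sketch of the unlimited exponential backoff for Core API calls; the base delay and the cap are placeholder values.

```ts
// Retry a Core API call with exponential backoff until it succeeds, so a
// brief Core outage does not halt processing. Delay values are placeholders.
async function callCoreWithRetries<T>(doCall: () => Promise<T>): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await doCall();
    } catch (err) {
      const delayMs = Math.min(1000 * 2 ** attempt, 60_000);
      console.warn(`Core call failed (attempt ${attempt + 1}), retrying in ${delayMs} ms`, err);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      attempt++;
    }
  }
}
```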
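
If we do switch to ndjson (one JSON user object per line), the file can be streamed instead of being loaded fully into memory. A sketch using Node's readline over a file stream; addUsersBatch stands in for whatever function posts a batch of users to the Core.

```ts
import * as fs from "fs";
import * as readline from "readline";

// Stream an ndjson file line by line so memory usage stays flat even for
// very large exports. addUsersBatch is a placeholder for the Core call.
async function processNdjson(
  filePath: string,
  addUsersBatch: (users: unknown[]) => Promise<void>,
  batchSize = 1000
): Promise<void> {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath),
    crlfDelay: Infinity,
  });

  let batch: unknown[] = [];
  for await (const line of rl) {
    if (line.trim() === "") continue;
    batch.push(JSON.parse(line));
    if (batch.length >= batchSize) {
      await addUsersBatch(batch);
      batch = [];
    }
  }
  if (batch.length > 0) {
    await addUsersBatch(batch);
  }
}
```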
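
For the Auth0 roles mapping, a sketch assuming the Auth0 Management API endpoints for listing roles (GET /api/v2/roles) and listing a role's members (GET /api/v2/roles/{id}/users); pagination and error handling are omitted for brevity.

```ts
// Build a user_id -> role names map using the Auth0 Management API.
// Pagination is omitted; a real script would page through the results.
async function buildUserRoleMap(
  auth0Domain: string,
  mgmtApiToken: string
): Promise<Map<string, string[]>> {
  const headers = { Authorization: `Bearer ${mgmtApiToken}` };

  const rolesRes = await fetch(`https://${auth0Domain}/api/v2/roles`, { headers });
  const roles: { id: string; name: string }[] = await rolesRes.json();

  const userRoles = new Map<string, string[]>();
  for (const role of roles) {
    const usersRes = await fetch(`https://${auth0Domain}/api/v2/roles/${role.id}/users`, { headers });
    const users: { user_id: string }[] = await usersRes.json();
    for (const user of users) {
      const existing = userRoles.get(user.user_id) ?? [];
      existing.push(role.name);
      userRoles.set(user.user_id, existing);
    }
  }
  return userRoles;
}
```

The resulting map can then be merged into the exported users JSON before converting it to the bulk import format.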

Order of changes:

  • Core release:
    • Bulk migration cron
    • User migration API
    • Deprecate existing migration APIs
  • Node script to help migration
  • Docs changes for migration
  • Lazy migration changes in backend SDK
  • Auth0 migration helper
