
Bulk User Import into SuperTokens #912

@anku255

Description

https://docs.google.com/document/d/1TUrcIPbdsHfqheIkB6CTkTNiA6YN-73Xz_pX2kjXhkc/edit

Open PRs:

TODO:

  • Test with 1M users in CI/CD. Make sure users are generated with a mix of login methods, tenancy, metadata, roles, etc. Make sure the test does not take too long and that everything works consistently.

  • Create an API for starting/stopping the cron job, and another one for getting the status of the cron job (active/inactive). The processing batch size should be a parameter of the starting API (see the sketch after this list).

  • Allow developers to configure parallelism in ProcessBulkImportUsers cron job

    Currently, it takes on average 66 seconds to process 1000 users, which is too slow when processing a large number of users. This happens because we loop through the users one by one in a for loop and use just one DB connection for BulkImportProxyStorage.

    The solution is to process users in parallel using threads and to create a BulkImportProxyStorage instance for each user being processed. The number of users processed in parallel will depend on the BULK_MIGRATION_PARALLELISM config value set by the user. This will be a SaaS-protected property and can be added to PROTECTED_CONFIGS in CoreConfig.java. It should have the @NotConflictingInApp annotation.

  • PR changes in supertokens-core PR
    - All the PR changes are done but there may be more changes after review.

  • PR changes in supertokens-postgresql-plugin PR

  • Changes in Node Script to add users

    The script needs to be rewritten to optimise for the following user stories:

    • The user is not expected to monitor the script. The script should try to continue processing and retry failures wherever possible.

    • The user should be able to re-run the script using the same input file multiple times and the script should process the remaining users. This can be implemented by maintaining a state file per input file name (see the sketch after this list).

    • The Core API calls should retry with exponential backoff (unlimited) in case we get an error from the Core. This ensures that we don't halt processing if the Core is down for a few seconds (see the sketch after this list).

    • The script should continue showing the Bulk Import Cron Job status after it has added all the users. Any users with status=FAILED will be added to the same usersHavingInvalidSchema.json file. This file could also be renamed to something like usersHavingErrors.json. Since we will be writing to the usersHavingErrors.json file, users are expected to wait until all the users have been processed, then fix the error file and add those users again.

    • The script should display progress logs while adding the users. This could include the total number of users, the number of users added, the number of users having errors, etc.

    • We also need to research the size limit of the JSON file. A JSON file with a million users would be about 880 MB. JSON files cannot be streamed; the whole file needs to be read into memory. If this is an issue, we may need to switch to the ndjson file format, which allows streaming the file (see the sketch after this list).

  • Documentation for Bulk Import

  • Update the CDI spec to include the /bulk-import/import and /bulk-import/users/count APIs. Also update the BulkImportUser schema to include the plainTextPassword field.

  • Bulk Import for Auth0 users (ignore for now)
    After the Bulk Import task is complete, we plan to have a special guide for Auth0 users.

    Auth0 users need to request the exported file from their support team if they want the password hashes of their users. For any other type of login, they can export the users themselves using the API. However, this API doesn't include user roles.

    • We could write a script that takes the exported JSON and Auth0 credentials. The script would fetch all the roles and map them to the users (see the sketch after this list). (We could also call the roles API for each user, but that would take more API calls.)

    • We can also add a separate page in our documentation for Auth0 users that guides them through requesting the export JSON file and running the above script.
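
Sketches for the items above:

The start/stop/status APIs for the ProcessBulkImportUsers cron job do not exist yet; the sketch below only illustrates the intended shape of the calls from a client's point of view. The paths, the batchSize field, and the status values are placeholders, not the final CDI spec.

```ts
// Hypothetical cron control endpoints; paths and fields are placeholders.
const CORE_URL = "http://localhost:3567";

async function startBulkImportCron(batchSize: number): Promise<void> {
  await fetch(`${CORE_URL}/bulk-import/cron/start`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ batchSize }),
  });
}

async function stopBulkImportCron(): Promise<void> {
  await fetch(`${CORE_URL}/bulk-import/cron/stop`, { method: "POST" });
}

async function getBulkImportCronStatus(): Promise<"ACTIVE" | "INACTIVE"> {
  const res = await fetch(`${CORE_URL}/bulk-import/cron/status`);
  return (await res.json()).status;
}
```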
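
For re-running the script with the same input file, one option is a small state file named after the input file that records how far the previous run got. This is a minimal sketch; the state file name, location, and the fields of ImportState are assumptions.

```ts
import * as fs from "fs";
import * as path from "path";

// Hypothetical per-input-file state: the index of the last user that was
// successfully sent to the Core.
interface ImportState {
  lastProcessedIndex: number;
}

function stateFilePathFor(inputFile: string): string {
  return path.join(path.dirname(inputFile), `.${path.basename(inputFile)}.state.json`);
}

function loadState(inputFile: string): ImportState {
  const stateFile = stateFilePathFor(inputFile);
  if (!fs.existsSync(stateFile)) {
    return { lastProcessedIndex: -1 };
  }
  return JSON.parse(fs.readFileSync(stateFile, "utf8"));
}

function saveState(inputFile: string, state: ImportState): void {
  fs.writeFileSync(stateFilePathFor(inputFile), JSON.stringify(state));
}
```

On re-run the script loads the state, skips users up to lastProcessedIndex, and keeps updating the state after each successfully submitted batch.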
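
A minimal sketch of the unlimited exponential backoff for Core API calls; the base delay and the cap are placeholder values.

```ts
// Retry a Core API call with exponential backoff until it succeeds, so a
// brief Core outage does not halt processing. Delay values are placeholders.
async function callCoreWithRetries<T>(doCall: () => Promise<T>): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await doCall();
    } catch (err) {
      const delayMs = Math.min(1000 * 2 ** attempt, 60_000);
      console.warn(`Core call failed (attempt ${attempt + 1}), retrying in ${delayMs} ms`, err);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      attempt++;
    }
  }
}
```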
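
If we do switch to ndjson (one JSON user object per line), the file can be streamed instead of being loaded fully into memory. A sketch using Node's readline over a file stream; addUsersBatch stands in for whatever function posts a batch of users to the Core.

```ts
import * as fs from "fs";
import * as readline from "readline";

// Stream an ndjson file line by line so memory usage stays flat even for
// very large exports. addUsersBatch is a placeholder for the Core call.
async function processNdjson(
  filePath: string,
  addUsersBatch: (users: unknown[]) => Promise<void>,
  batchSize = 1000
): Promise<void> {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath),
    crlfDelay: Infinity,
  });

  let batch: unknown[] = [];
  for await (const line of rl) {
    if (line.trim() === "") continue;
    batch.push(JSON.parse(line));
    if (batch.length >= batchSize) {
      await addUsersBatch(batch);
      batch = [];
    }
  }
  if (batch.length > 0) {
    await addUsersBatch(batch);
  }
}
```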
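
For the Auth0 roles mapping, a sketch assuming the Auth0 Management API endpoints for listing roles (GET /api/v2/roles) and listing a role's members (GET /api/v2/roles/{id}/users); pagination and error handling are omitted for brevity.

```ts
// Build a user_id -> role names map using the Auth0 Management API.
// Pagination is omitted; a real script would page through the results.
async function buildUserRoleMap(
  auth0Domain: string,
  mgmtApiToken: string
): Promise<Map<string, string[]>> {
  const headers = { Authorization: `Bearer ${mgmtApiToken}` };

  const rolesRes = await fetch(`https://${auth0Domain}/api/v2/roles`, { headers });
  const roles: { id: string; name: string }[] = await rolesRes.json();

  const userRoles = new Map<string, string[]>();
  for (const role of roles) {
    const usersRes = await fetch(`https://${auth0Domain}/api/v2/roles/${role.id}/users`, { headers });
    const users: { user_id: string }[] = await usersRes.json();
    for (const user of users) {
      const existing = userRoles.get(user.user_id) ?? [];
      existing.push(role.name);
      userRoles.set(user.user_id, existing);
    }
  }
  return userRoles;
}
```

The resulting map can then be merged into the exported users JSON before converting it to the bulk import format.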

Order of changes:

  • Core release:
    • Bulk migration cron
    • User migration API
    • Deprecate existing migration APIs
  • Node script to help migration
  • Docs changes for migration
  • Lazy migration changes in backend SDK
  • Auth0 migration helper
