Proposal
The ACCESS AMIE protocol delivers allocation information from the ACCESS allocation systems to ACCESS-supported Resource Providers (RPs). Each RP is responsible for implementing account provisioning and allocation management on its scheduler based on the AMIE messages it receives. In this post I am brainstorming how we should map AMIE transaction messages onto SLURM. According to the ACCESS AMIE documentation [1], there are 8 main transaction messages defined:
- request_project_create: Create a project in RP site
- data_project_create: Not within the scope for this ticket
- request_account_create: Create a new user account on RP site or extend with new grants
- request_project_inactivate: Inactivate a project (do not delete)
- request_project_reactivate: Reactivate a Project
- request_account_inactivate: Inactivate an account
- request_account_reactivate: Reactivate the account back
- request_user_modify: Modify user attributes (Not within the scope for this ticket)
Mapping transaction messages to SLURM world
request_project_create
Create a SLURM account corresponding to the project's GrantNumber or local ProjectID (e.g., sacctmgr add account TG-CDA200013). Set the account's GrpTRESMins or GrpTRES based on ServiceUnitsAllocated. Because GrpTRESMins is expressed in TRES-minutes rather than SUs, the allocation manager might need to implement an SU-to-GrpTRESMins conversion rate.
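A minimal sketch of the mapping above, assuming a conversion rate of 1 SU = 1 core-hour (60 CPU-minutes). That rate, and the helper names su_to_cpu_minutes and project_create_commands, are illustrative assumptions and not part of AMIE or SLURM. The functions only build the sacctmgr command strings; they do not run them.

```python
import shlex

# Assumption: 1 SU = 1 core-hour = 60 CPU-minutes. Each site defines its own rate.
CPU_MINUTES_PER_SU = 60


def su_to_cpu_minutes(service_units: float, rate: int = CPU_MINUTES_PER_SU) -> int:
    """Convert ServiceUnitsAllocated into a whole number of CPU-minutes."""
    if service_units < 0:
        raise ValueError("ServiceUnitsAllocated must not be negative")
    return int(round(service_units * rate))


def project_create_commands(account: str, service_units: float) -> list[str]:
    """Build (but do not execute) the sacctmgr commands for request_project_create."""
    minutes = su_to_cpu_minutes(service_units)
    name = shlex.quote(account)
    return [
        # -i answers the interactive commit prompt automatically.
        f"sacctmgr -i add account {name}",
        f"sacctmgr -i modify account where name={name} set GrpTRESMins=cpu={minutes}",
    ]
```

For example, project_create_commands("TG-CDA200013", 1000) yields an add command followed by a modify command that sets GrpTRESMins=cpu=60000 under the assumed rate.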
request_account_create
Do not confuse AMIE accounts with SLURM accounts: AMIE account = SLURM user and AMIE project = SLURM account :). Create a SLURM user if one doesn't already exist, and create an association linking that user to the project's SLURM account on the relevant cluster (e.g., sacctmgr add user testuser account=TG-CDA200013). The UserRemoteSiteLogin becomes the actual SLURM username. Creating the user's account on the cluster is out of the scope of this ticket, as it has to be handled through the provisioner. This ticket takes over once the account is created on the cluster. If the account has not been created yet, we need to save the message and retry once the account is provisioned on the cluster. To make this happen, there should be a notification mechanism from the provisioner to the allocation manager.
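The save-and-retry behaviour described above could look roughly like the following sketch. The class PendingAccountRequests and its method names are hypothetical, and the store is in memory only; a real implementation would presumably persist saved messages so they survive a restart.

```python
from collections import defaultdict


class PendingAccountRequests:
    """Holds request_account_create messages until the login exists on the cluster."""

    def __init__(self, apply_message):
        # apply_message: callable that creates the SLURM user and association.
        self._apply = apply_message
        self._provisioned = set()
        self._waiting = defaultdict(list)

    def handle(self, message: dict) -> bool:
        """Apply the message now if possible, otherwise save it. Returns True if applied."""
        login = message["UserRemoteSiteLogin"]
        if login in self._provisioned:
            self._apply(message)
            return True
        self._waiting[login].append(message)
        return False

    def on_provisioned(self, login: str) -> int:
        """Called on a provisioner notification; retries every saved message for login."""
        self._provisioned.add(login)
        saved = self._waiting.pop(login, [])
        for message in saved:
            self._apply(message)
        return len(saved)
```

The allocation manager would call handle for each incoming message and on_provisioned whenever the provisioner reports a new cluster account.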
Based on the AllocationType, the logic forks as follows:
- new: Create the account and set the initial GrpTRESMins.
- renewal: This is tricky. It "replaces" the previous year's allocation. Suggestion: Set QOS per allocation period rather than modifying the account limits.
- supplement: Add SUs to the existing GrpTRESMins without changing dates.
- transfer (positive): Same as supplement: add SUs.
- transfer (negative): Subtract SUs from GrpTRESMins. If the remaining balance goes to zero or negative, the site may need to set a restrictive QOS or hold pending jobs.
- extension: Update the end date only; no change to SU balance.
- adjustment: Modify the balance (typically a deduction), handled like a transfer.
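The forks above can be sketched as a single function that computes the next GrpTRESMins value. This is a simplification under stated assumptions: the function name apply_allocation is hypothetical, the amount is assumed to be already converted from SUs to minutes (negative for deductions), and renewal is modeled as a plain replacement of the limit rather than the per-period QOS suggested above.

```python
def apply_allocation(current_minutes: int, allocation_type: str, amount_minutes: int):
    """Return (new_limit, restrict) for one allocation message.

    restrict is True when the balance reaches zero or goes negative, which is
    the point where a site may want a restrictive QOS or to hold pending jobs.
    """
    if allocation_type in ("new", "renewal"):
        # Start from the amount in the message, not from the old balance.
        balance = amount_minutes
    elif allocation_type in ("supplement", "transfer", "adjustment"):
        # amount_minutes is signed, so deductions subtract from the balance.
        balance = current_minutes + amount_minutes
    elif allocation_type == "extension":
        # Only the end date changes; the balance is untouched.
        balance = current_minutes
    else:
        raise ValueError(f"unknown AllocationType: {allocation_type!r}")
    return max(balance, 0), balance <= 0
```

Clamping the limit at zero keeps the value valid for sacctmgr while the restrict flag still reports that the project has overdrawn its balance.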
request_project_inactivate
Set all associations under that SLURM account (AMIE project) to a blocked state. Proposal: set MaxJobs=0 and MaxSubmitJobs=0, or introduce a blocking QOS.
request_project_reactivate
Remove the restrictions applied at request_project_inactivate.
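As a rough sketch of the MaxJobs/MaxSubmitJobs option, the helper below builds the sacctmgr command for blocking or unblocking a project. The function name project_limit_command is hypothetical, and whether the change reaches every user association under the account is an assumption that should be verified against the site's association tree.

```python
import shlex


def project_limit_command(account: str, blocked: bool) -> str:
    """Build (but do not execute) the command that blocks or unblocks a project."""
    # 0 blocks running and submitting jobs; sacctmgr treats -1 as "clear this limit".
    value = 0 if blocked else -1
    name = shlex.quote(account)
    return (
        f"sacctmgr -i modify account where name={name} "
        f"set MaxJobs={value} MaxSubmitJobs={value}"
    )
```

Note that clearing with -1 also removes any MaxJobs or MaxSubmitJobs limit that existed before the project was inactivated, so a site with pre-existing limits would need to record and restore them instead.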
request_account_inactivate
Apply a blocked QOS per user, or run sacctmgr modify user ... set MaxSubmitJobs=0 for every project the user belongs to.
request_account_reactivate
Remove the restrictions applied at request_account_inactivate.
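For the per-user case, one command is needed per project the user belongs to. The sketch below assumes the caller already knows the user's project list; the function name user_limit_commands is hypothetical, and the commands are only built, not executed.

```python
import shlex


def user_limit_commands(user: str, accounts: list[str], blocked: bool) -> list[str]:
    """Build one sacctmgr command per (user, project) association."""
    value = 0 if blocked else -1  # -1 clears the limit on reactivation
    login = shlex.quote(user)
    return [
        f"sacctmgr -i modify user where name={login} account={shlex.quote(account)} "
        f"set MaxSubmitJobs={value}"
        for account in accounts
    ]
```

Calling it with blocked=True covers request_account_inactivate, and with blocked=False covers request_account_reactivate.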
Implementation
Implementation will go under the association mapping connector [2]. It will be integrated with the AMIE processor [3] using a pub/sub method to keep the integration loosely coupled, so that we can test the functionality independently and extend it to non-ACCESS clusters. The work will be split into multiple PRs (one PR per message) to keep the context focused and the review process simple.
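To illustrate the loose coupling, here is a minimal in-process pub/sub sketch. The MessageBus class is hypothetical and stands in for whatever broker or event mechanism is eventually chosen; the point is only that the AMIE processor publishes by message type and the association mapper subscribes without either side importing the other.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class MessageBus:
    """Routes payloads to handlers registered for a given AMIE message type."""

    def __init__(self):
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, message_type: str, handler: Handler) -> None:
        self._handlers[message_type].append(handler)

    def publish(self, message_type: str, payload: dict) -> int:
        """Deliver payload to every subscriber; returns how many handlers ran."""
        handlers = self._handlers.get(message_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)
```

With one handler per message type, each PR can add and test its own subscriber in isolation.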
Integration with SLURM: Instead of using the command line interface, we are considering using the SLURM REST API to perform the operations described above. The security context and its implications are yet to be finalized.
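A rough sketch of what a REST-based account creation request might look like, built with the Python standard library and not sent anywhere. Everything site-specific here is an assumption: the base URL, the API version string (which depends on the installed slurmrestd), the JWT header-based authentication, the payload shape, and the placeholder organization value. The function name build_add_account_request is hypothetical, and the exact endpoint and schema should be checked against the deployed slurmrestd's OpenAPI specification.

```python
import json
import urllib.request


def build_add_account_request(base_url: str, api_version: str, user: str,
                              token: str, account: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that would add a SLURM account."""
    url = f"{base_url.rstrip('/')}/slurmdb/{api_version}/accounts"
    body = {
        "accounts": [
            # description and organization values are placeholders.
            {"name": account, "description": account, "organization": "access"}
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed JWT authentication headers for slurmrestd.
            "X-SLURM-USER-NAME": user,
            "X-SLURM-USER-TOKEN": token,
        },
    )
```

Keeping request construction separate from sending makes it possible to unit test the mapping logic before the security context is decided.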
[1] https://access-ci.atlassian.net/wiki/spaces/ACP/pages/589496333/AMIE+Documentation
[2] https://github.com/apache/airavata-custos/tree/master/connectors/SLURM/Association-Mapper
[3] https://github.com/apache/airavata-custos/tree/master/connectors/ACCESS/AMIE-Processor