🐛 Autoscaling: ensure unstartable warm buffers are replaced by cold instances if available #8277
Codecov Report
❌ Patch coverage is …
@@            Coverage Diff             @@
##           master    #8277      +/-   ##
==========================================
+ Coverage   89.31%   89.79%   +0.48%     
==========================================
  Files        1678     1272     -406     
  Lines       65431    54789   -10642     
  Branches      828      225     -603     
==========================================
- Hits        58438    49199    -9239     
+ Misses       6774     5520    -1254     
+ Partials      219       70     -149     
 
Pull Request Overview
This PR fixes a bug in the autoscaling service where warm buffer instances that couldn't be started due to insufficient capacity would prevent new instances from being launched. The solution ensures that when warm buffer instances fail to start, their assigned tasks are de-assigned and can be fulfilled by launching new cold instances.
Key Changes
- Modified the warm buffer starting logic to handle `EC2InsufficientCapacityError` exceptions gracefully
- Added a mechanism to de-assign tasks from warm buffer instances that cannot be started
- Updated the autoscaling flow to retry task assignment with cold instances when warm buffers fail
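The de-assign-and-retry flow described above can be sketched roughly as follows. This is a simplified illustration, not the code in `_auto_scaling_core.py`: `InsufficientCapacityError`, `Instance`, `start_warm_buffers`, `launch_cold_instances` and `fake_start` are all hypothetical names, and the sketch starts warm buffers one at a time, whereas the real service may start them in batches.

```python
from dataclasses import dataclass, field
from typing import Callable


class InsufficientCapacityError(Exception):
    """Hypothetical stand-in for the EC2InsufficientCapacityError mentioned in the PR."""


@dataclass
class Instance:
    instance_id: str
    assigned_tasks: list[str] = field(default_factory=list)


def start_warm_buffers(
    warm_buffers: list[Instance],
    start_instance: Callable[[Instance], None],
) -> tuple[list[Instance], list[str]]:
    """Try to start each given warm buffer.

    Returns the instances that started and the tasks that were de-assigned
    because their warm buffer could not be started.
    """
    started: list[Instance] = []
    unassigned_tasks: list[str] = []
    for instance in warm_buffers:
        try:
            start_instance(instance)
        except InsufficientCapacityError:
            # no capacity for this warm buffer: release its tasks instead of
            # leaving them stuck on an instance that will never run
            unassigned_tasks.extend(instance.assigned_tasks)
            instance.assigned_tasks.clear()
        else:
            started.append(instance)
    return started, unassigned_tasks


def launch_cold_instances(tasks: list[str]) -> list[Instance]:
    """Hypothetical cold-start step: one new instance per de-assigned task."""
    return [Instance(f"cold-{n}", [task]) for n, task in enumerate(tasks)]


# usage: warm-2 cannot be started, so its tasks move to cold instances
def fake_start(instance: Instance) -> None:
    if instance.instance_id == "warm-2":
        raise InsufficientCapacityError(instance.instance_id)


warm = [Instance("warm-1", ["t1"]), Instance("warm-2", ["t2", "t3"])]
started, unassigned = start_warm_buffers(warm, fake_start)
cold = launch_cold_instances(unassigned)
print([i.instance_id for i in started], unassigned, [i.instance_id for i in cold])
```

The key design point is that a capacity failure on a warm buffer does not abort the iteration: its tasks are returned to the pool so that the same scaling pass can satisfy them with new cold instances.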
 
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `services/autoscaling/src/simcore_service_autoscaling/modules/cluster_scaling/_auto_scaling_core.py` | Core autoscaling logic updated to handle warm buffer start failures and de-assign tasks for retry with cold instances |
| `services/autoscaling/tests/unit/test_modules_cluster_scaling_dynamic.py` | Removed xfail marker and increased test warm buffer count to properly test the fix |
| `packages/aws-library/tests/test_ec2_client.py` | Added test coverage for insufficient capacity scenarios and improved code formatting |
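An insufficient-capacity scenario like the one covered in `test_ec2_client.py` can be exercised without AWS by using a fake client with a fixed capacity. The sketch below is only an illustration of that idea and does not reproduce the repository's test: `FakeEC2Client`, `InsufficientCapacityError` and the test function are hypothetical, and the fake's all-or-nothing behaviour is an assumption made for simplicity.

```python
import itertools


class InsufficientCapacityError(Exception):
    """Hypothetical stand-in for EC2InsufficientCapacityError."""


class FakeEC2Client:
    """Hypothetical in-memory client with a fixed capacity per instance type."""

    def __init__(self, capacity: dict[str, int]) -> None:
        self._capacity = dict(capacity)
        self._ids = itertools.count()

    def start_instances(self, instance_type: str, count: int) -> list[str]:
        available = self._capacity.get(instance_type, 0)
        if count > available:
            # this fake is all-or-nothing: nothing starts when capacity is short
            raise InsufficientCapacityError(
                f"{instance_type}: requested {count}, available {available}"
            )
        self._capacity[instance_type] = available - count
        return [f"{instance_type}-{next(self._ids)}" for _ in range(count)]


def test_start_instances_with_insufficient_capacity() -> None:
    client = FakeEC2Client({"t3.large": 2})
    assert client.start_instances("t3.large", 2) == ["t3.large-0", "t3.large-1"]
    try:
        client.start_instances("t3.large", 1)
    except InsufficientCapacityError as exc:
        assert "available 0" in str(exc)
    else:
        raise AssertionError("expected InsufficientCapacityError")


test_start_instances_with_insufficient_capacity()
```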
        
          
👍
thx
        
          
    
          
@mergify queue

🛑 Configuration not compatible with a branch protection setting

The branch protection setting …



What do these changes do?
Related issue/s
How to test
Dev-ops