Stack upgrades & ProcessSync: What happens if apps use a stack the platform no longer supports?
braa braa braa edited this page May 26, 2020 · 9 revisions
If I upgrade CC to a version that drops support for a stack, but some of my running apps still use that stack...
- Diego will be very careful not to cause unexpected app downtime.
- Those freshly invalid, old-stack apps will continue to run and be routable, but CC will no longer be able to send updates of them to Diego.
- The system will recognize this and refuse to delete any compute resources until it can confirm that they aren't the old-stack apps that it can no longer sync.
- They continue to exist in CCDB
- They continue to exist in BBS as Diego DesiredLRPs
- They continue to run on Diego Cells as Diego ActualLRPs (?)
- They continue to be routable (?)
- They can no longer be updated or created in Diego
- Updates and creates will result in the error `no compiler defined for requested stack`
- Any change to the process' `updated_at` will make Diego's DesiredLRP out-of-date
- The ProcessSync loop will attempt to update all out-of-date DesiredLRPs
- Updates and creates will result in the error
- Because the domain is unfresh:
- They can be deleted in the CF API, but Diego will not stop running their ActualLRPs (?)
- It continues to run
- In parallel, it continues to sync as many CC processes as possible to Diego as DesiredLRPs
- Any app with an unsupported stack will error on update if Diego's DesiredLRP is out-of-date.
- Update errors will prevent freshness from being bumped
- All errors encountered should be logged by the clock
- see the BBS documentation for domain freshness
- tldr
- No destructive action will be taken against LRPs in that domain
- Processes with unsupported stacks will continue to run (unless Diego has dropped them during evacuation?)
- Processes that have been deleted in CC but exist in Diego will continue to run
- Creates and updates of processes will continue to work fine
- tldr
- They can be created, updated, scaled, etc
- Because the domain is unfresh:
- They can be deleted in the CF API
- BUT Diego will not stop running their ActualLRPs
- They cannot be deleted in the CF API
- Because the domain is unfresh:
- Diego will not stop running their ActualLRPs
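The behavior described above can be illustrated with a toy model. This is only a sketch, not Cloud Controller or Diego code: `ToyDiego`, `Process`, `sync`, and the stack names `old-stack` and `new-stack` are hypothetical, and domain freshness (which the real system tracks in BBS) is collapsed into a single boolean for simplicity. It shows a sync pass that keeps going after an unsupported-stack error, refuses to mark the domain fresh, and therefore skips destructive cleanup.

```python
from dataclasses import dataclass, field


class UnsupportedStackError(Exception):
    """Stands in for the 'no compiler defined for requested stack' error."""


@dataclass
class Process:
    guid: str
    stack: str
    updated_at: int


@dataclass
class ToyDiego:
    # Hypothetical stand-in for Diego; not a real client.
    supported_stacks: set
    desired_lrps: dict = field(default_factory=dict)  # guid -> last synced updated_at
    domain_fresh: bool = False

    def upsert(self, process):
        if process.stack not in self.supported_stacks:
            raise UnsupportedStackError("no compiler defined for requested stack")
        self.desired_lrps[process.guid] = process.updated_at


def sync(processes, diego):
    """Run one simplified sync pass and return the errors it collected."""
    errors = []
    for p in processes:
        # Only missing or out-of-date DesiredLRPs need an update.
        if diego.desired_lrps.get(p.guid) == p.updated_at:
            continue
        try:
            diego.upsert(p)
        except UnsupportedStackError as e:
            errors.append((p.guid, str(e)))  # record the error, keep syncing the rest
    # Assumption of this sketch: the domain counts as fresh only after an error-free pass.
    diego.domain_fresh = not errors
    if diego.domain_fresh:
        # Destructive cleanup happens only while the domain is fresh.
        known = {p.guid for p in processes}
        for guid in list(diego.desired_lrps):
            if guid not in known:
                del diego.desired_lrps[guid]
    return errors


diego = ToyDiego(
    supported_stacks={"new-stack"},
    desired_lrps={"old-app": 1, "ok-app": 1, "deleted-app": 1},
)
processes = [Process("old-app", "old-stack", 2), Process("ok-app", "new-stack", 2)]
print(sync(processes, diego))
print(diego.domain_fresh, sorted(diego.desired_lrps))
```

In this model, the supported-stack app is still updated, the old-stack app keeps its stale entry, and the entry for the process that no longer exists in CC is left alone because the pass was not error-free. If the old-stack process had not changed since the last sync, no update would be attempted and the pass would succeed.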
- October 2018: #156029607 We made uncaught errors on the clock log and `exit 1`.
- November 2018: #162064721 We made most errors log, but continue to sync and refuse to bump freshness.
- November 2018: #161800100 We verified this behavior applies to apps with absent stacks.
- December 2018: A KB Article was written about recovering from this issue
- May 2020: Pivotal Slack We started seeing a rash of this in escalations, with log lines where `cc.diego.sync.processes` logged `sync-failed` and `error-updating-lrp-state`
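When checking whether an environment is hitting this issue, it may help to filter the clock's logs for the strings mentioned above. The following is a minimal sketch that assumes the logs are available as text lines containing those strings verbatim; the function name `find_sync_failures` and the sample lines are hypothetical and do not reflect the real log format.

```python
SOURCE = "cc.diego.sync.processes"
MARKERS = ("sync-failed", "error-updating-lrp-state")


def find_sync_failures(lines):
    """Return lines that mention SOURCE together with any failure marker."""
    return [
        line.rstrip("\n")
        for line in lines
        if SOURCE in line and any(marker in line for marker in MARKERS)
    ]


# Made-up sample lines, for illustration only.
sample = [
    "cc.diego.sync.processes error-updating-lrp-state guid-1\n",
    "cc.diego.sync.processes sync-failed\n",
    "cc.api some unrelated line\n",
]
for hit in find_sync_failures(sample):
    print(hit)
```

To scan a real file, pass an open file object, for example `find_sync_failures(open(path))`, where `path` points at wherever the clock's logs are stored in your deployment.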