
Stack upgrades & ProcessSync: What happens if apps use a stack the platform no longer supports?

braa braa braa edited this page May 26, 2020 · 9 revisions

If I upgrade CC to a version that drops support for a stack, but some of my running apps still use that stack...

Summary

  • Diego will be very careful not to cause unexpected app downtime.
  • Those newly invalid, old-stack apps will continue to run and be routable, but CC will no longer be able to send updates for them to Diego.
  • The system will recognize this and refuse to delete any compute resources until it can confirm that they aren't the old-stack apps that it can no longer sync.

What happens to the apps that are still using that unsupported stack?

  • They continue to exist in CCDB
  • They continue to exist in the BBS as Diego DesiredLRPs
  • They continue to run on Diego Cells as Diego ActualLRPs (?)
  • They continue to be routable (?)
  • They can no longer be updated or created in Diego
    • Updates and creates will result in the error "no compiler defined for requested stack"
    • Any change to the process's updated_at will make Diego's DesiredLRP out-of-date
    • The ProcessSync loop will attempt to update all out-of-date DesiredLRPs
  • Because the domain is unfresh:
    • They can be deleted in the CF API, but Diego will not stop running their ActualLRPs (?)

What happens to the sync loop?

  • It continues to run
  • In parallel, it continues to sync as many CC processes as possible to Diego as DesiredLRPs
  • Any app with an unsupported stack will error on update if Diego's DesiredLRP is out-of-date.
  • Update errors will prevent freshness from being bumped
  • All errors encountered should be logged by the clock
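
The sync loop behavior described above can be sketched roughly as follows. This is a minimal illustration, not CC's actual implementation: the names Process, FakeDiego, SUPPORTED_STACKS, and sync_processes are hypothetical, the stack names are only examples, and the sketch runs sequentially for simplicity even though the real loop works in parallel. Only the error text is taken from this page.

```python
from dataclasses import dataclass, field

# Hypothetical: the stacks the platform still supports after the upgrade.
SUPPORTED_STACKS = {"cflinuxfs3"}


class UnsupportedStackError(Exception):
    pass


@dataclass
class Process:
    guid: str
    stack: str
    updated_at: int


@dataclass
class FakeDiego:
    # Maps process guid -> the updated_at value last synced as a DesiredLRP.
    desired: dict = field(default_factory=dict)
    domain_fresh: bool = False

    def update_lrp(self, process):
        if process.stack not in SUPPORTED_STACKS:
            raise UnsupportedStackError("no compiler defined for requested stack")
        self.desired[process.guid] = process.updated_at

    def bump_freshness(self):
        self.domain_fresh = True


def sync_processes(processes, diego):
    """Sync every out-of-date process; bump freshness only if nothing failed."""
    errors = []
    for process in processes:
        if diego.desired.get(process.guid) == process.updated_at:
            continue  # DesiredLRP is already up to date, nothing to send
        try:
            diego.update_lrp(process)
        except UnsupportedStackError as err:
            # Record the failure (the clock would log it) and keep syncing.
            errors.append((process.guid, str(err)))
    if not errors:
        diego.bump_freshness()
    return errors
```

Note that in this sketch an old-stack process only produces an error when its updated_at has changed since the last successful sync; an untouched old-stack process is skipped and does not block the freshness bump.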

What happens when freshness isn't bumped?

  • See the BBS documentation on domain freshness
    • TL;DR:
      • No destructive action will be taken against LRPs in that domain
      • Processes with unsupported stacks will continue to run (unless Diego has dropped them during evacuation?)
      • Processes that have been deleted in CC but exist in Diego will continue to run
      • Creates and updates of processes on supported stacks will continue to work fine
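
The effect of an unfresh domain can be pictured with a small model. This is a simplified sketch under stated assumptions, not Diego's convergence code: the function lrps_to_stop and its arguments are hypothetical, and the domain name "cf-apps" is used only as an illustrative placeholder.

```python
def lrps_to_stop(actual_lrps, desired_guids, fresh_domains):
    """Return guids of ActualLRPs that would be safe to stop.

    actual_lrps:   list of (process_guid, domain) pairs currently running
    desired_guids: set of process guids that still have a DesiredLRP
    fresh_domains: set of domains whose freshness is currently valid
    """
    return [
        guid
        for guid, domain in actual_lrps
        # Destructive action is only allowed when the domain is fresh.
        if guid not in desired_guids and domain in fresh_domains
    ]


actual = [
    ("old-stack-app", "cf-apps"),
    ("deleted-app", "cf-apps"),
    ("healthy-app", "cf-apps"),
]
desired = {"old-stack-app", "healthy-app"}

# Domain unfresh: nothing is stopped, even the app with no DesiredLRP.
print(lrps_to_stop(actual, desired, fresh_domains=set()))
# Domain fresh: only the app with no DesiredLRP is stopped.
print(lrps_to_stop(actual, desired, fresh_domains={"cf-apps"}))
```

The point of the model is that freshness acts as a gate on deletion only: while the gate is closed, extra ActualLRPs keep running regardless of what CC believes should exist.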

What happens to apps that synced successfully?

  • They can be created, updated, scaled, etc
  • Because the domain is unfresh:
    • They can be deleted in the CF API
    • BUT Diego will not stop running their ActualLRPs

What happens to "mysterious" ActualLRPs that CCDB has no record of?

  • They cannot be deleted in the CF API
  • Because the domain is unfresh:
    • Diego will not stop running their ActualLRPs

Collated context of how we came to have this behavior

  • October 2018: #156029607 We made uncaught errors on the clock get logged and cause the clock to exit with status 1.
  • November 2018: #162064721 We made most errors get logged without stopping the sync, while still refusing to bump freshness.
  • November 2018: #161800100 We verified this behavior applies to apps with absent stacks.
  • December 2018: A KB Article was written about recovering from this issue
  • May 2020: (Pivotal Slack) We started seeing a rash of these in escalations, with log lines where cc.diego.sync.processes logged sync-failed and error-updating-lrp-state.