Disable the LSP on crash #679

Merged: 10 commits from feature/soft-lsp-crash into main, Jan 31, 2025
Conversation

lionel- (Contributor) commented Jan 28, 2025:

Part of posit-dev/positron#5507

The goal of this PR is to prevent the LSP from ever crashing the R session and losing user state. I've moved all LSP request handlers behind a catch_unwind(), essentially a try for panics. When an LSP handler panics, we detect it, report it, and flip our state to crashed. Once crashed, all LSP handlers respond with an error. This causes some chatter in the LSP logs, but I made sure these errors do not carry a backtrace, to avoid flooding the logs.
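The pattern described above can be sketched with only the standard library. This is an illustrative sketch, not the actual ark code: the names run_handler and the error strings are hypothetical, while catch_unwind, AssertUnwindSafe, and the LSP_HAS_CRASHED atomic flag mirror what the PR describes.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical global mirroring the PR's LSP_HAS_CRASHED flag.
static LSP_HAS_CRASHED: AtomicBool = AtomicBool::new(false);

// Run a handler, converting a panic into an error response and
// flipping the crash flag so later requests are refused cheaply.
fn run_handler<T>(handler: impl FnOnce() -> T) -> Result<T, String> {
    if LSP_HAS_CRASHED.load(Ordering::Acquire) {
        // Once crashed, respond with an error that carries no
        // backtrace, to avoid flooding the logs.
        return Err(String::from("LSP has crashed and is disabled"));
    }
    match panic::catch_unwind(AssertUnwindSafe(handler)) {
        Ok(value) => Ok(value),
        Err(_) => {
            LSP_HAS_CRASHED.store(true, Ordering::Release);
            Err(String::from("LSP handler panicked"))
        }
    }
}
```

After the first panic the flag is latched, so every subsequent request fails fast without touching handler state that may be inconsistent.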

Ideally we'd shut down the LSP entirely and forcefully disconnect from the client. Unfortunately that scenario of a server-initiated shutdown is not supported by the LSP protocol and tower-lsp does not give us the tools to do this.

Alternatively, we could send a notification to the client that the LSP has crashed, and the client could then initiate a shutdown. I chose not to go that route, to avoid dealing with synchronisation issues and having to make changes to both the client and the server.

For context, this is a temporary workaround. Once the LSP lives in Air, a crash will no longer be a big deal for the user. Most of the time they will not even be aware of it, since VS Code / Positron silently restarts crashed servers (unless a server crashes too many times in a short period, in which case the user is notified and it is no longer restarted).

Here is a screencast of what happens when the LSP crashes:

(Video attachment: Screen.Recording.2025-01-28.at.10.52.34.mov)

The user is notified of the crash and requested to send a report with the logs.

Note that the relevant backtrace is sent by our panic hook to the kernel logs rather than the LSP logs. The backtrace in the LSP logs is unlikely to be helpful.

lionel- requested a review from DavisVaughan on January 28, 2025, 10:29
DavisVaughan (Contributor) left a comment:

See slack comments, hopefully we can talk about it in the morning before merging

Comment on lines +170 to +171:

    async fn request(&self, request: LspRequest) -> RequestResponse {
        if LSP_HAS_CRASHED.load(Ordering::Acquire) {
DavisVaughan:

I think fn notify( should probably know about LSP_HAS_CRASHED too?

lionel- (author) replied:

I chose to touch as few things as possible, and it seemed fine not to change notify.

DavisVaughan replied:

We've decided to do

if LSP_HAS_CRASHED {
  return;
}

in notify() as well, to be safe and to avoid relaying notifications to the main loop when it might be in a bad state after a crash.
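Since notifications have no response, the guard in notify() simply drops the message instead of producing an error. A minimal sketch of that decision, with an illustrative event sink standing in for the real main-loop channel:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static LSP_HAS_CRASHED: AtomicBool = AtomicBool::new(false);

// Notifications have no response, so after a crash we simply drop
// them rather than relay them to a main loop that may be in a bad
// state. The Vec here is a stand-in for the real channel.
fn notify(events: &mut Vec<String>, event: &str) {
    if LSP_HAS_CRASHED.load(Ordering::Acquire) {
        return;
    }
    events.push(event.to_string());
}
```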

(Resolved comment on crates/ark/src/lsp/backend.rs; outdated)
lionel- (author) commented Jan 29, 2025:

Davis pointed out that the serve().await is not really blocking (not sure how I missed that 😬), so we are able to fully shut down the LSP by waking up a select. That's nice because it removes a source of potential problems and prevents any further log messages in the LSP output channel.

To be on the safe side, I decided to keep the crash flag that disables request handlers: our internal notification races with incoming messages from the client, so the main loop may tick again after we detect a crash.

Also, I realised we already have the infrastructure to show notifications via Jupyter, so I now do that. The downside is that this requires going through an r_task() to send the notification (it would be possible to avoid that, but it would require a non-trivial amount of plumbing). The upside is that we leave the LSP out of this, which seems safer since we are shutting down.
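The idea of waking a select to stop an otherwise long-running serve() future can be approximated with std threads and a channel. This is a std-only analogue under assumed names (run_until_shutdown is hypothetical); the PR itself uses an async select over serve().await:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// A std-only analogue of waking a select to stop serve().await:
// the "server" loop runs until a shutdown signal arrives on a
// channel, then returns cleanly instead of looping forever.
fn run_until_shutdown(shutdown_rx: mpsc::Receiver<()>) -> u32 {
    let mut ticks = 0;
    loop {
        // Check for a shutdown signal between units of "server" work.
        match shutdown_rx.try_recv() {
            Ok(()) | Err(mpsc::TryRecvError::Disconnected) => return ticks,
            Err(mpsc::TryRecvError::Empty) => {}
        }
        ticks += 1;
        thread::sleep(Duration::from_millis(1));
    }
}
```

In the async version the same shape falls out of racing the server future against a shutdown receiver: whichever completes first wins, and a shutdown win drops the server future, ending the connection.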

(Resolved comments on crates/ark/src/lsp/backend.rs and crates/ark/src/lsp/main_loop.rs)
        into_lsp_response: impl FnOnce(T) -> LspResponse,
    ) -> anyhow::Result<()> {
        let mut crashed = false;

        let response = std::panic::catch_unwind(std::panic::AssertUnwindSafe(response))
DavisVaughan:

I'm not entirely certain why we can AssertUnwindSafe here, maybe a comment?

lionel- (author) replied:

I've not done a thorough analysis, but the main source of unwind-unsafety is interior mutability, which we a priori don't use in request handlers. We're not in a position to guarantee safety here, though.
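For context on why the wrapper is needed at all: closures capturing mutable references are not UnwindSafe, because a panic can leave the referenced data half-updated, and catch_unwind would let later code observe it. AssertUnwindSafe is the programmer's promise that this is acceptable. A small self-contained illustration (increment_guarded is a hypothetical name, not ark code):

```rust
use std::panic::{self, AssertUnwindSafe};

// A closure capturing `&mut i32` is not UnwindSafe, so catch_unwind
// rejects it without AssertUnwindSafe. Here the state stays coherent
// across a panic (the increment completes before the panic fires),
// which is the kind of reasoning the assertion stands in for.
fn increment_guarded(counter: &mut i32) -> bool {
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        *counter += 1;
        if *counter > 3 {
            panic!("too many increments");
        }
    }));
    result.is_ok()
}
```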

lionel- force-pushed the feature/soft-lsp-crash branch from 0237e64 to 37deee3 on January 31, 2025, 12:56
lionel- merged commit 0b31c63 into main on Jan 31, 2025 (5 of 6 checks passed)
lionel- deleted the feature/soft-lsp-crash branch on January 31, 2025, 13:00
github-actions bot locked and limited conversation to collaborators on Jan 31, 2025