Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] safeConn Write takes a long time (when server is misbehaving) to return on errors which is blocking #5

Open
kbhalaki opened this issue Sep 2, 2024 · 0 comments

Comments

@kbhalaki
Copy link

kbhalaki commented Sep 2, 2024

Describe the bug
go version: 1.22

Write (safeConn.go) checks if connection is closed by performing a small Read of 1 byte before continuing with Write which is blocking in some cases. such as during k8s upgrades, statsd client keeps on trying to establish the connection in Write but fails after around 10-15 mins. This means all other Writes from the same client are blocked because of the lock here

Why is it blocking even though we Read 1 byte?
Actually, the function connIsClosed does not check for other errors such as timeout. So, even if Read timed out, it will still try for Write (which can be blocking in case of network congestion or as mentioned in this earlier PR: #2

To Reproduce
I have tried to reproduce both server and client side code. This is a general code and does not use safeConn but I have tried to replicate a similar logic where I have tried to show that Write can be blocked:
server.go - https://go.dev/play/p/V5SfDJVu1Pe
client.go - https://go.dev/play/p/YG-B2TaxESy

Note that below is a snippet after I added a Write Deadline otherwise it can go for a long time.

image

Expected behavior

  1. Writes should not be blocking for such a long time.

Additional context
My suggestions on how we can fix:

  1. Writes should have a deadline even if we check the state of the connection by Reading because there can be scenarios where Read is working but Writes can be blocked.
  2. If we are dependent on Read to understand the state of the connection, we should return on any err that Read returns, not just EOF. We should not proceed with Write if Read is timing out or any other error.

I am happy to create a Pull request and contribute :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant