Fix issue where deadlock can occur over logging._lock #4636
Conversation
I think this is good since it fixes the deadlock issue, which is separate from auto instrumentation. But I think this issue only comes up when the `LoggingHandler` is added before someone calls `dictConfig`.
Ack. I've added function overwriting for `dictConfig` and `fileConfig` like we have for `basicConfig`. Let me know what you think.
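Purely as a hedged illustration of the general idea (the names below are hypothetical, not the PR's actual code), overwriting `dictConfig` so the OTel handler is attached after the user's config has cleared existing handlers could look like:

```python
# Hedged illustration only -- the PR's actual wrapping differs. The idea of
# "function overwriting": replace logging.config.dictConfig with a wrapper that
# re-attaches the OTel handler after the user's config has cleared existing handlers.
import logging.config
from functools import wraps


def _overwrite_dict_config(attach_otel_handler):
    """attach_otel_handler is a hypothetical callback that adds the LoggingHandler."""
    original_dict_config = logging.config.dictConfig

    @wraps(original_dict_config)
    def wrapper(config):
        original_dict_config(config)
        attach_otel_handler()

    logging.config.dictConfig = wrapper
```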
@DylanRussell can you rebase and I can merge?
Alright, I've rebased. Should be all set now.
Description
We (Google) saw a deadlock occur when `logging.config.dictConfig` is called after the OTel `LoggingHandler` is attached to the root logger. This happened because `dictConfig` acquires `logging._lock` and then clears the existing handlers, which calls `shutdown` on the OTel `LoggingHandler`, which in turn calls `flush`. `flush` triggered an `export` call to our exporter. Deep inside our exporter we spin up a new thread to handle auth, and that thread also tried to acquire `logging._lock`, resulting in a deadlock.
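As a rough illustration of the lock ordering (not the PR's unit test; the names here are made up and a join timeout keeps it from hanging): `dictConfig` holds the module-level `logging._lock` while it flushes handlers, so any thread spawned from `flush` that touches the logging module blocks on that same lock.

```python
import logging
import logging.config
import threading


class FlushSpawnsThreadHandler(logging.Handler):
    """Stands in for a handler whose flush() ends up exporting via another thread."""

    def emit(self, record):
        pass

    def flush(self):
        done = threading.Event()

        def auth_worker():
            # Anything that needs the module-level logging lock blocks here,
            # because the thread running dictConfig still holds logging._lock.
            logging.getLogger("auth")
            done.set()

        threading.Thread(target=auth_worker, daemon=True).start()
        # The real handler waited indefinitely; a timeout keeps this sketch from hanging.
        if not done.wait(timeout=2):
            print("auth thread blocked on logging._lock -> this is the deadlock")


logging.getLogger().addHandler(FlushSpawnsThreadHandler())
# dictConfig takes logging._lock, clears existing handlers, and flushes ours
# while still holding the lock.
logging.config.dictConfig({"version": 1})
```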
To fix this, I updated `LoggingHandler.flush` to execute `force_flush` in a separate thread and not block/wait before returning. This should be reasonable because we don't return the result of the force flush anyway, so there is no point blocking there. This seems to reliably fix the deadlock, but I think it's technically possible for this new thread to spin up and reach the lock before `logging.config.dictConfig` releases its lock.
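As an illustration only (not the exact PR diff), a minimal sketch of that idea, assuming the handler holds a reference to a provider object that exposes `force_flush()`:

```python
import threading


class NonBlockingFlushSketch:
    """Sketch: flush() hands force_flush off to a worker thread instead of blocking."""

    def __init__(self, logger_provider):
        # logger_provider is assumed to expose force_flush(), like the SDK LoggerProvider.
        self._logger_provider = logger_provider

    def flush(self):
        # Return immediately; callers never consumed the flush result anyway, and
        # not blocking here avoids waiting while dictConfig still holds logging._lock.
        threading.Thread(target=self._logger_provider.force_flush, daemon=True).start()
```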
Another simple fix is to set `flushOnClose` to `False` on the handler, so that `LoggingHandler.flush` is not called during `shutdown`. This seems fine to me as well because we call `shutdown` on exit anyway. Either of these solutions seems fine.
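Taking that route could look roughly like the following (hypothetical usage sketch; `logging.shutdown` skips `flush()` for handlers whose `flushOnClose` attribute is falsy):

```python
import logging

from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler

provider = LoggerProvider()
handler = LoggingHandler(logger_provider=provider)
handler.flushOnClose = False  # logging.shutdown() checks this attribute before flushing
logging.getLogger().addHandler(handler)
```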
I also considered making our exporter `async`, but we don't have support for `async` exporters in this repo yet.

Type of change
How Has This Been Tested?
I've added a unit test to show how the deadlock happens. I don't think that test should actually be submitted, because of the chance that a deadlock occurs and locks up the test suite.
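A hypothetical way to keep such a test without risking a hung suite is to run `dictConfig` on a worker thread and fail on a timeout instead of blocking forever (sketch; it assumes a `LoggingHandler` has already been attached to the root logger):

```python
import logging.config
import threading


def test_dict_config_does_not_deadlock():
    # Assumes an OTel LoggingHandler is already attached to the root logger.
    worker = threading.Thread(
        target=logging.config.dictConfig, args=({"version": 1},), daemon=True
    )
    worker.start()
    worker.join(timeout=10)
    assert not worker.is_alive(), "dictConfig appears deadlocked on logging._lock"
```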
Does This PR Require a Contrib Repo Change?
Checklist: