-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propegates exceptions from child to parent process. #61
Propegates exceptions from child to parent process. #61
Conversation
This commit addresses cloudkick#57 . PROBLEM Too many times when working on Whiskey, I might make a seemingly innocuous change to the program which causes timeouts to occur. The cause of the timeouts remain a total mystery. The problem is the result of a cascade of failures: 1) Something I changed causes an exception. 2) stdout/stderr is written to, and then the child Node process dies. 3) Meanwhile, the parent is unaware of this happening, and so sits on its socket until it times out, 4) After the timeout, the parent process has no idea what killed the child process, and so the only indication of a problem is that it's listed in [TIMEOUT] status. SOLUTION Capture all exceptions that you can at the top-most level of the child process, and marshal that data through the IPC channel back to the parent process. Then, alter the parent process to die immediately if an exception is discovered. Fail fast, fail often. OTHER BENEFITS Knowing what's causing some of the timeouts will, I *think*, be beneficial from a Buildbot perspective as well. No gaurantees, but it *could* be useful in uncovering some of the causes for intermittent buildbot failures we've seen in the past. HOW IT WORKS It works by capturing all exceptions that leak to the main run_test_file.js script, and forwards them to the _reportException method on the current TestFile. This stringifies the exception into a record that is sent, via the usual Whiskey socket, to the parent Whiskey runner process. Once there, the lib/util.js parseResultLine method will decode this new format and, if discovered to be an exception, will encapsulate it in a triple, which can be uniquely identified as follows: [true, filename, null] = File end without coverage data [false, filenameIThink, coverageData] = File end with coverage data. [null, testName, testResults] = Test invokation results [string, string, null] = Exception report. (New as of this patch) This is interpreted by the lib/run.js method TestRunner._handleConnection(), which encapsulates the provided results into a new exception to be handled by the parent process immediately. The exception that is thrown is not an exact duplicate, but rather just a simple object with a message field, formatted with text suitable for human-readable display, as that appears to be Whiskey's default action at this stage. I'm not sure where the code resides that handles the exception thrown in run.js.
result = testUtil.parseResultLine(line); | ||
end = result[0]; | ||
if (typeof end === 'string') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be better to use underscore here: if (_.isString(end)) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed and staged for the next commit. It'll read underscore.isString instead, though, as that's the convention for using underscore in Whiskey.
self._connection = connection; | ||
callback(); | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need a semicolon here for lint
// away. | ||
jsonPayload = JSON.stringify(JSON.parse(result[1]), null, 2); | ||
msg = sprintf("** EXCEPTION CAUGHT IN CHILD PROCESS **\n\n%s\n\n%s", end, jsonPayload); | ||
throw { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you throw something it should be an instance of Error
class - throw new Error(msg);
.
Throwing regular object or string is nasty and breaks a lot of things which rely on instanceof Error
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed as of f62212c
This should definitely help with Whiskey development, but we need test for it. Also a quick comment about "other benefits" section - I don't think this should in any way change or make discovering intermediate failures and timeouts easier. Assuming that there are no bugs in Whiskey usual cause for a timeout is that some resource is not being properly closed and |
@@ -491,6 +489,17 @@ TestFile.prototype._getConnection = function(callback) { | |||
}); | |||
}; | |||
|
|||
TestFile.prototype._reportException = function(exc) { | |||
var payload; | |||
exc.message = exc.message || 'Message field not set in exception object'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually we should re-factor code at https://github.com/sam-falvo/whiskey/blob/a3f8af9b11e8057340db2f5b5c5efca2dac9a330/lib/common.js#L171 into a separate function (e.g. getExceptionObject
and also use it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some fo this code is now obsolete, thanks to f62212c ; I'll be cleaning this up next. OOPS My comment applies to the wrong chunk of code. Disregard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the the linked code -- I'm not seeing how it's relevant. Can you explain? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It knows how to create an Error
object with a string message attribute even if you throw something else which is not an actual instance of Error
class (e.g. string, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OH! Duhh! Sorry; I spaced it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 06a227a
I'm not sure how to write a test for this feature without deliberately injecting a failure inside of Whiskey itself. Alternate ideas? |
With two sets of mutually similar if-statements in caller-side and callee-side logic, I decided to collapse the whole shebang and use object polymorphism to eliminate much of the control flow confusion. The end result is: 1) Easier to expand in the future if need be, and, 2) Easier for someone fluent in OO to understand what's happening.
The PR comment suggested that I alter the callback so that I pass the connection to it. However, doing this breaks Whiskey. With the code as presented, at least on my laptop, all of Whiskey's test cases pass.
@robert-chiniquy @Kami Now that I have this obnoxious mess sitting in github, is there any way to clean this up? Should I just delete this branch, and start a new branch and PR? |
@sam-falvo IMO keep the same PR and re-do the commitlog, you can force-push here and we have good history. |
Not sure about the best way to test this. |
@sam-falvo @robert-chiniquy Yep, you can rebase and squash all the commits into one, although I don't really care that much. It depends on the feature scope, but personally I usually prefer writing good commit messages and just preserving the commit history (without doing a rebase) so people can see how the code evolved. |
Just a tickle to see if this is good or not? Thanks! |
Okay so:
|
@Kami ^ ping |
Just checking for updates. Thanks. |
Nothing heard in five years. Closing PR. Someone else is free to assume control over this PR if they wish. |
LOL!! How are things going? Long time no chat!
…On Fri, Oct 19, 2018 at 10:31 AM Robert Chiniquy ***@***.***> wrote:
https://www.youtube.com/watch?v=ZRV32Li6h28
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
<#61 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABbP-Pfk6MfQWC5NDMO6CkpWkAHC05uXks5umgxqgaJpZM4AukFd>
.
--
Samuel A. Falvo II
|
This commit addresses #57 .
PROBLEM
Too many times when working on Whiskey, I might make a seemingly
innocuous change to the program which causes timeouts to occur. The
cause of the timeouts remain a total mystery. The problem is the result
of a cascade of failures:
its socket until it times out,
child process, and so the only indication of a problem is that it's
listed in [TIMEOUT] status.
SOLUTION
Capture all exceptions that you can at the top-most level of the child
process, and marshal that data through the IPC channel back to the
parent process.
Then, alter the parent process to die immediately if an exception is
discovered. Fail fast, fail often.
OTHER BENEFITS
Knowing what's causing some of the timeouts will, I think, be
beneficial from a Buildbot perspective as well. No gaurantees, but it
could be useful in uncovering some of the causes for intermittent
buildbot failures we've seen in the past.
HOW IT WORKS
It works by capturing all exceptions that leak to the main
run_test_file.js script, and forwards them to the _reportException
method on the current TestFile. This stringifies the exception into a
record that is sent, via the usual Whiskey socket, to the parent Whiskey
runner process.
Once there, the lib/util.js parseResultLine method will decode this new
format and, if discovered to be an exception, will encapsulate it in a
triple, which can be uniquely identified as follows:
[true, filename, null] = File end without coverage data
[false, filenameIThink, coverageData] = File end with coverage data.
[null, testName, testResults] = Test invokation results
[string, string, null] = Exception report. (New as of this patch)
This is interpreted by the lib/run.js method
TestRunner._handleConnection(), which encapsulates the provided results
into a new exception to be handled by the parent process immediately.
The exception that is thrown is not an exact duplicate, but rather just
a simple object with a message field, formatted with text suitable for
human-readable display, as that appears to be Whiskey's default action
at this stage. I'm not sure where the code resides that handles the
exception thrown in run.js.