Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propegates exceptions from child to parent process. #61

Conversation

sam-falvo
Copy link
Contributor

This commit addresses #57 .

PROBLEM

Too many times when working on Whiskey, I might make a seemingly
innocuous change to the program which causes timeouts to occur. The
cause of the timeouts remain a total mystery. The problem is the result
of a cascade of failures:

  1. Something I changed causes an exception.
  2. stdout/stderr is written to, and then the child Node process dies.
  3. Meanwhile, the parent is unaware of this happening, and so sits on
    its socket until it times out,
  4. After the timeout, the parent process has no idea what killed the
    child process, and so the only indication of a problem is that it's
    listed in [TIMEOUT] status.

SOLUTION

Capture all exceptions that you can at the top-most level of the child
process, and marshal that data through the IPC channel back to the
parent process.

Then, alter the parent process to die immediately if an exception is
discovered. Fail fast, fail often.

OTHER BENEFITS

Knowing what's causing some of the timeouts will, I think, be
beneficial from a Buildbot perspective as well. No gaurantees, but it
could be useful in uncovering some of the causes for intermittent
buildbot failures we've seen in the past.

HOW IT WORKS

It works by capturing all exceptions that leak to the main
run_test_file.js script, and forwards them to the _reportException
method on the current TestFile. This stringifies the exception into a
record that is sent, via the usual Whiskey socket, to the parent Whiskey
runner process.

Once there, the lib/util.js parseResultLine method will decode this new
format and, if discovered to be an exception, will encapsulate it in a
triple, which can be uniquely identified as follows:

[true, filename, null] = File end without coverage data
[false, filenameIThink, coverageData] = File end with coverage data.
[null, testName, testResults] = Test invokation results
[string, string, null] = Exception report. (New as of this patch)

This is interpreted by the lib/run.js method
TestRunner._handleConnection(), which encapsulates the provided results
into a new exception to be handled by the parent process immediately.

The exception that is thrown is not an exact duplicate, but rather just
a simple object with a message field, formatted with text suitable for
human-readable display, as that appears to be Whiskey's default action
at this stage. I'm not sure where the code resides that handles the
exception thrown in run.js.

This commit addresses cloudkick#57 .

PROBLEM

Too many times when working on Whiskey, I might make a seemingly
innocuous change to the program which causes timeouts to occur.  The
cause of the timeouts remain a total mystery.  The problem is the result
of a cascade of failures:

1) Something I changed causes an exception.
2) stdout/stderr is written to, and then the child Node process dies.
3) Meanwhile, the parent is unaware of this happening, and so sits on
its socket until it times out,
4) After the timeout, the parent process has no idea what killed the
child process, and so the only indication of a problem is that it's
listed in [TIMEOUT] status.

SOLUTION

Capture all exceptions that you can at the top-most level of the child
process, and marshal that data through the IPC channel back to the
parent process.

Then, alter the parent process to die immediately if an exception is
discovered.  Fail fast, fail often.

OTHER BENEFITS

Knowing what's causing some of the timeouts will, I *think*, be
beneficial from a Buildbot perspective as well.  No gaurantees, but it
*could* be useful in uncovering some of the causes for intermittent
buildbot failures we've seen in the past.

HOW IT WORKS

It works by capturing all exceptions that leak to the main
run_test_file.js script, and forwards them to the _reportException
method on the current TestFile.  This stringifies the exception into a
record that is sent, via the usual Whiskey socket, to the parent Whiskey
runner process.

Once there, the lib/util.js parseResultLine method will decode this new
format and, if discovered to be an exception, will encapsulate it in a
triple, which can be uniquely identified as follows:

[true, filename, null] = File end without coverage data
[false, filenameIThink, coverageData] = File end with coverage data.
[null, testName, testResults] = Test invokation results
[string, string, null] = Exception report. (New as of this patch)

This is interpreted by the lib/run.js method
TestRunner._handleConnection(), which encapsulates the provided results
into a new exception to be handled by the parent process immediately.

The exception that is thrown is not an exact duplicate, but rather just
a simple object with a message field, formatted with text suitable for
human-readable display, as that appears to be Whiskey's default action
at this stage.  I'm not sure where the code resides that handles the
exception thrown in run.js.
result = testUtil.parseResultLine(line);
end = result[0];
if (typeof end === 'string') {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be better to use underscore here: if (_.isString(end)) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed and staged for the next commit. It'll read underscore.isString instead, though, as that's the convention for using underscore in Whiskey.

self._connection = connection;
callback();
});
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need a semicolon here for lint

// away.
jsonPayload = JSON.stringify(JSON.parse(result[1]), null, 2);
msg = sprintf("** EXCEPTION CAUGHT IN CHILD PROCESS **\n\n%s\n\n%s", end, jsonPayload);
throw {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you throw something it should be an instance of Error class - throw new Error(msg);.

Throwing regular object or string is nasty and breaks a lot of things which rely on instanceof Error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed as of f62212c

@Kami
Copy link
Member

Kami commented Jun 13, 2013

This should definitely help with Whiskey development, but we need test for it.

Also a quick comment about "other benefits" section - I don't think this should in any way change or make discovering intermediate failures and timeouts easier.

Assuming that there are no bugs in Whiskey usual cause for a timeout is that some resource is not being properly closed and test.finish() not being called. This pull request doesn't change that.

@@ -491,6 +489,17 @@ TestFile.prototype._getConnection = function(callback) {
});
};

TestFile.prototype._reportException = function(exc) {
var payload;
exc.message = exc.message || 'Message field not set in exception object';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually we should re-factor code at https://github.com/sam-falvo/whiskey/blob/a3f8af9b11e8057340db2f5b5c5efca2dac9a330/lib/common.js#L171 into a separate function (e.g. getExceptionObject and also use it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some fo this code is now obsolete, thanks to f62212c ; I'll be cleaning this up next. OOPS My comment applies to the wrong chunk of code. Disregard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the the linked code -- I'm not seeing how it's relevant. Can you explain? Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It knows how to create an Error object with a string message attribute even if you throw something else which is not an actual instance of Error class (e.g. string, etc.).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OH! Duhh! Sorry; I spaced it there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 06a227a

@sam-falvo
Copy link
Contributor Author

I'm not sure how to write a test for this feature without deliberately injecting a failure inside of Whiskey itself. Alternate ideas?

Samuel A. Falvo II added 6 commits June 13, 2013 16:03
With two sets of mutually similar if-statements in caller-side and
callee-side logic, I decided to collapse the whole shebang and use
object polymorphism to eliminate much of the control flow confusion.

The end result is:

1) Easier to expand in the future if need be, and,
2) Easier for someone fluent in OO to understand what's happening.
The PR comment suggested that I alter the callback so that I pass the
connection to it.  However, doing this breaks Whiskey.  With the code as
presented, at least on my laptop, all of Whiskey's test cases pass.
@sam-falvo
Copy link
Contributor Author

@robert-chiniquy @Kami Now that I have this obnoxious mess sitting in github, is there any way to clean this up? Should I just delete this branch, and start a new branch and PR?

@robert-chiniquy
Copy link
Member

@sam-falvo IMO keep the same PR and re-do the commitlog, you can force-push here and we have good history.

@robert-chiniquy
Copy link
Member

Not sure about the best way to test this.

@Kami
Copy link
Member

Kami commented Jun 16, 2013

@sam-falvo @robert-chiniquy Yep, you can rebase and squash all the commits into one, although I don't really care that much. It depends on the feature scope, but personally I usually prefer writing good commit messages and just preserving the commit history (without doing a rebase) so people can see how the code evolved.

@sam-falvo
Copy link
Contributor Author

Just a tickle to see if this is good or not? Thanks!

@robert-chiniquy
Copy link
Member

Okay so:

  • @Kami I think @sam-falvo needs some guidance on how to test this
  • @sam-falvo let me know, we can get together and clean up the commit history sometime soon

@robert-chiniquy
Copy link
Member

@Kami ^ ping

@sam-falvo
Copy link
Contributor Author

Just checking for updates. Thanks.

@sam-falvo
Copy link
Contributor Author

Nothing heard in five years. Closing PR. Someone else is free to assume control over this PR if they wish.

@sam-falvo sam-falvo closed this Oct 19, 2018
@robertchiniquy-okta
Copy link

@sam-falvo
Copy link
Contributor Author

sam-falvo commented Oct 19, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants