Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Reduce short living allocations #717

Open
A-Herzog opened this issue Jun 20, 2020 · 11 comments
Open

Proposal: Reduce short living allocations #717

A-Herzog opened this issue Jun 20, 2020 · 11 comments
Labels
Performance Issues related to the performance of the Rhino engine

Comments

@A-Herzog
Copy link

The implicit memory management of Java allows to implement algorithms very straight forward but relying too much on this concept can result in very heavy garbage collector loads.

In my JS workloads I can see (using Flight Recorder) there are two lines of code in Rhino which result in 90 GB of short living allocations per minute. For both cases I would like to suggest improvements which will reduce these allocations to zero (and therefore reduce the time needed to collect them in gc to zero). The disadvantage is, that in both cases some additional code is necessary, so the clearness of the methods is reduced.

org.mozilla.javascript.ScriptRuntime

When executing an "i++" in JS the previously maybe as Integer stored "i" becomes a Double. (The increment method tests if i is a Number and in this case Number.doubleValue() is used.) Boxing small integer values does not need any allocation because Integer.valueOf(int) is using a cache. Double.valueOf(double) always needs to allocate a Double object, so a simple for (var i=0;i<10;i++) results in 10 Double allocations. Even more: When creating a JS variable by "var i=3;" it will be stored as an Integer. But the "int i=0" in the for loop generates a Double - for some unknown reasons. So "i++" will operate on Double all the time.

By using the following code in ScriptRuntime the unnecessary use of Double variables can be avoided. (Please note the "Added code starts/ends here" markers. Everything else is not changed.)

private static Object doScriptableIncrDecr(Scriptable target,
		String id,
		Scriptable protoChainStart,
		Object value,
		int incrDecrMask)
{
	boolean post = ((incrDecrMask & Node.POST_FLAG) != 0);

	// Added code starts here
	if (value instanceof Integer) {
		int intNumber = ((Integer)value).intValue();
		if ((incrDecrMask & Node.DECR_FLAG) == 0) {
			++intNumber;
		} else {
			--intNumber;
		}
		final Integer intResult = wrapInt(intNumber);
		target.put(id, protoChainStart, intResult);
		return post?value:intResult;
	}

	if (value instanceof Double) {
		final double d=((Double)value).doubleValue();
		if (Math.floor(d)==d && Math.abs(d)<NativeNumber.MAX_SAFE_INTEGER) {
			int intNumber = (int)d;
			if ((incrDecrMask & Node.DECR_FLAG) == 0) {
				++intNumber;
			} else {
				--intNumber;
			}
			final Integer intResult = wrapInt(intNumber);
			target.put(id, protoChainStart, intResult);
			return post?value:intResult;
		}
	}
	// Added code ends here

	double number;
	if (value instanceof Number) {
		number = ((Number)value).doubleValue();
	} else {
		number = toNumber(value);
		if (post) {
			// convert result to number
			value = wrapNumber(number);
		}
	}
	if ((incrDecrMask & Node.DECR_FLAG) == 0) {
		++number;
	} else {
		--number;
	}
	Number result = wrapNumber(number);
	target.put(id, protoChainStart, result);
	if (post) {
		return value;
	}
	return result;
}

org.mozilla.javascript.optimizer.OptRuntime

Here we have a static method for boxing a function argument to an one-element array:

public static Object call1(..., Object arg0,...) {  
	return fun.call(..., new Object[] { arg0 } );  
}

Avoiding these Object[1] allocations is a bit more complex because "call1" is static and we need to ensure the code keeps thread safe. So for this case I would propose this:

private final static ThreadLocal<Object[]> call1obj=new ThreadLocal<>();

/**
 * Implement ....(arg) call shrinking optimizer code.
 */
public static Object call1(Callable fun, Scriptable thisObj, Object arg0,
		Context cx, Scriptable scope)
{
	Object[] args=call1obj.get();
	if (args==null) call1obj.set(args=new Object[1]);

	args[0]=arg0;
	try {
		return fun.call(cx, scope, thisObj, args );
	} finally {
		args[0]=null; /* Help the gc */
	}
}

Measurement results

The following two Java Mission Control screenshots show the difference in memory allocations using the current Rhino code and the current Rhino code with the two additions above:

Current Rhino code
FlightRecorder-Memory-Rhino-current-version
Rhino using the two additions
FlightRecorder-Memory-Rhino-on-memoy-diet

If "keeping the code simple and maintainable" is most important, it's ok to discard my proposals. But if a reduction of gc load is of interest, this may help. (BTW: The improved Rhino version would be less gc intensive than Nashorn.)

@rbri
Copy link
Collaborator

rbri commented Jun 21, 2020

Hi Alexander,
i think this is a really interesting suggestion. Just made a pull request based on your first suggestion to have some code base to discuss. I guess Greg will also have some suggestions here.

@rbri
Copy link
Collaborator

rbri commented Jun 21, 2020

Regarding your second suggestion i'm not really sure your fix is correct. Without a deeper look at the code i have some fear if one function calls another one and then again addresses the args array. And i personally think thread locals are more often the root of problems and not the solution ;-).
Will have a deeper look at this later.

Many, many thanks for your contribution.

@p-bakker
Copy link
Collaborator

@rbri assuming the PR you made for the first part of this proposal got merged, can this issue be closed?

I think you're right that the 2nd part of the proposal has issues when static call1(...) method gets called multiple times within the same call stack, as a new call to call1 would modify the argument array for a previous call

@tuchida
Copy link
Contributor

tuchida commented Jun 28, 2021

		if ((incrDecrMask & Node.DECR_FLAG) == 0) {
			++intNumber;
		} else {
			--intNumber;
		}

What happens when intNumber is Integer.MAX_VALUE or Integer.MIN_VALUE?

@rbri
Copy link
Collaborator

rbri commented Jun 28, 2021

The changes i have done in this pr had no positive effect - i spend some time on this and i found no way to improve the overall performance so far. So the PR was done to preserve my initial work and as starting point for other (or if i have some time to come back to this)

@p-bakker
Copy link
Collaborator

p-bakker commented Jun 28, 2021

Link to the PR that was closed without merging: #720

@rbri
Copy link
Collaborator

rbri commented Jun 28, 2021

Strange maybe i have never pushed the PR because there was no effect.

@p-bakker
Copy link
Collaborator

I had just found it in the Closed PR section: #720

@gbrail
Copy link
Collaborator

gbrail commented Aug 18, 2024

#1563 does indeed do the first thing, which is to make the increment operation (and many other math operations) use Integers, and use Integer.valueOf, much more aggressively. Merging that will partially resolve this issue.

On the second one, I too am concerned about the complexity and possible issues with replacing those small arrays with thread local arrays. I also know that Java is highly optimized for small, short-lived object allocations and I'd love to see an actual performance improvement (like a benchmark running faster) before assuming that this would improve performance.

@rbri
Copy link
Collaborator

rbri commented Aug 19, 2024

The changes i have done in this pr had no positive effect - i spend some time on this and i found no way to improve the overall performance so far. So the PR was done to preserve my initial work and as starting point for other (or if i have some time to come back to this)

For me there was no effect at all - i guess this is the wrong way if you like to make the thing faster and using less resources.

@p-bakker
Copy link
Collaborator

For me there was no effect at all

Performance impact of these changes could imho really depends on the machine you're running on and the Java version & config being used, as I think any slowdown would be because of the garbage collector kicking in and consuming CPU cycles.

Given a big enough machine and the right garbage collector, you might not notice a slowdown with the current code in a testrun at all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Issues related to the performance of the Rhino engine
Projects
None yet
Development

No branches or pull requests

5 participants