FELIX-6778 Reduce number of re-creation(s) of SCR Component Registry thread #419

Open: wants to merge 5 commits into base: master

Conversation

glimmerveen
Contributor

Replaced the pattern of creating and cancelling a java.util.Timer, with a ScheduledThreadPoolExecutor. This is configured to keep its core Thread(s) around for up to 10 seconds without any tasks. This enables a low 'ds.service.changecount.timeout' to be used, without having to pay the cost of a lot of short-lived threads being created.
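
As a rough illustration of the approach described above (a sketch, not the code from this PR), a single-threaded ScheduledThreadPoolExecutor can be configured so that its core thread is released after an idle period. The class name RegistryExecutorSketch and the helper newRegistryExecutor are hypothetical; the thread name and the 10 second keep-alive are taken from this PR, and the call to allowCoreThreadTimeOut is an assumption about how the timeout is enabled.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

public class RegistryExecutorSketch {

    // Hypothetical helper: builds a one-thread scheduler whose core thread
    // is released after keepAliveMillis without any work.
    static ScheduledThreadPoolExecutor newRegistryExecutor(long keepAliveMillis) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1, new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                return new Thread(r, "SCR Component Registry");
            }
        });
        // The keep-alive must be positive before core threads may time out.
        executor.setKeepAliveTime(keepAliveMillis, TimeUnit.MILLISECONDS);
        executor.allowCoreThreadTimeOut(true);
        return executor;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledThreadPoolExecutor executor = newRegistryExecutor(10_000);
        // Several short delays in a row reuse the same thread instead of
        // creating and cancelling a java.util.Timer each time.
        for (int i = 0; i < 3; i++) {
            final int n = i;
            executor.schedule(new Runnable() {
                @Override
                public void run() {
                    System.out.println("update " + n + " on " + Thread.currentThread().getName());
                }
            }, 10, TimeUnit.MILLISECONDS);
        }
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

The sketch stays within Java 8 APIs, in line with the Java 8 requirement raised later in this conversation.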

@stbischof
Contributor

Sounds interesting.
Let me integrate the OSGi TCK for components first,
to be sure we pass all internal and TCK tests.

@stbischof
Contributor

stbischof commented May 10, 2025

I am struggling somewhat with JUnit Jupiter test execution (Vintage works):
#410

and with too much log output;
maybe you can help with this:
#420

@glimmerveen
Contributor Author

In what area do you see the test execution struggling? I see one test failure (Felix6274Test), but that appears on main branch as well, and looks unrelated to this change.

@paulrutter
Contributor

Without knowing much about this specific part of Felix and SCR in general, would Java 21 virtual threads be of interest here? It would make the code more complex, as we want to support pre-Java 21 versions as well, but we have similar code in JettyService for Jetty 12.

@stbischof
Contributor

@paulrutter
I also like the idea of maybe moving up from JDK 8 to ...,
but 21 is too much for most projects, and it really has to be discussed in the project.
@karlpauls @cziegeler

@paulrutter
Contributor

With reflection we could support both pre- and post-Java 21 versions.
See

Method newVirtualThreadPerTaskExecutorMethod = null;

@glimmerveen
Contributor Author

Without knowing much about this specific part of Felix and SCR in general, would Java 21 virtual threads be of interest here? It would make the code more complex, as we want to support pre-Java 21 versions as well, but we have similar code in JettyService for Jetty 12.

In the case of SCR, I don't think its use of threads (SCR Component Registry, or SCR Component Actor) benefits particularly from the specific properties of virtual threads, as I interpret Oracle's description of virtual threads:

Use virtual threads in high-throughput concurrent applications, especially those that consist of a great number of concurrent tasks that spend much of their time waiting. Server applications are examples of high-throughput applications because they typically handle many client requests that perform blocking I/O operations such as fetching resources.

These properties do really fit with something like Jetty, but for SCR I don't really see what it would bring.

@paulrutter
Contributor

paulrutter commented May 11, 2025

I was triggered by this part of your description: 'without having to pay the cost of a lot of short-lived threads being created'.
Virtual threads would solve this, although a simple thread pool executor is also fine for this use case.

@stbischof
Contributor

With reflection we could support both pre- and post-Java 21 versions. See

Method newVirtualThreadPerTaskExecutorMethod = null;

I wonder how this should work.
Reflection for the method is okay, but on Java 17 it will throw an exception. So how will it start?

@laeubi

laeubi commented May 12, 2025

Technically you could even use a multi-release JAR (MR-JAR) here, but I really don't think that virtual threads give a benefit here; a simple thread pool would be sufficient.

@paulrutter
Contributor

With reflection we could support both pre- and post-Java 21 versions. See

Method newVirtualThreadPerTaskExecutorMethod = null;

I wonder how this should work. Reflection for the method is okay, but on Java 17 it will throw an exception. So how will it start?

In the catch clause, one would just use a regular ScheduledThreadPoolExecutor.
But let's not make it more complicated and stick with the current approach.
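
For readers wondering what the reflective fallback discussed above might look like, here is a minimal sketch; it is an illustration and not the JettyService code referred to earlier. The class name VirtualThreadFallbackSketch and the method newExecutor are hypothetical. Executors.newVirtualThreadPerTaskExecutor() is the real Java 21 factory method; it returns a plain ExecutorService without scheduling support, so it would not be a drop-in replacement for a scheduled executor.

```java
import java.lang.reflect.Method;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledThreadPoolExecutor;

public class VirtualThreadFallbackSketch {

    // Hypothetical helper: prefers virtual threads when the running JVM
    // offers them, otherwise falls back to a regular scheduled pool.
    static ExecutorService newExecutor() {
        try {
            // Present on Java 21+; on older JVMs getMethod throws
            // NoSuchMethodException, which is handled below.
            Method factory = Executors.class.getMethod("newVirtualThreadPerTaskExecutor");
            return (ExecutorService) factory.invoke(null);
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Pre-Java 21 (or the call failed): use a platform thread pool.
            return new ScheduledThreadPoolExecutor(1);
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = newExecutor();
        System.out.println("Using executor: " + executor.getClass().getName());
        executor.shutdown();
    }
}
```

Because the lookup is by name, the class compiles against Java 8 and only takes the virtual-thread path at run time.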

@cziegeler
Contributor

@paulrutter I also like the idea of maybe moving up from JDK 8 to ..., but 21 is too much for most projects, and it really has to be discussed in the project. @karlpauls @cziegeler

From my perspective, Java 11 is definitely no problem, Java 17 should be OK but might already cause problems for some, and Java 21 will cause problems for more. Given the bandwidth I would go for a complicated solution that leverages Java 21, but also works with lower versions.

@laeubi

laeubi commented May 12, 2025

I also like the idea of maybe moving up from JDK 8 to ...

I just wanted to mention that there is some hard work going on to keep the core OSGi framework at Java 8; moving SCR to anything higher would probably render this useless for those consumers.

@tjwatson
Member

I need SCR on Java 8 for now.

Do we have performance metrics that show an improvement here?

Can someone explain what ds.service.changecount.timeout controls? If I read the code right, it provides a delay for setting the service.changecount. Is that to avoid blasting service modified events for the ServiceComponentRuntime registration? So we always wait some amount of time before setting the service property?

@glimmerveen
Contributor Author

I need SCR on Java 8 for now.

Do we have performance metrics that show an improvement here?

I don't have a specific performance metric I can refer to, but I did notice that with a tiny ds.service.changecount.timeout (10, or even 0) and a lot of updates to the changecount, the timer (and by extension the underlying thread) gets re-created a lot.

Another benefit is that the code becomes a bit simpler, as there is no need to cancel midway (nor the synchronisation that is currently present to ensure this happens consistently).

Can someone explain what ds.service.changecount.timeout controls? If I read the code right, it provides a delay for setting the service.changecount. Is that to avoid blasting service modified events for the ServiceComponentRuntime registration? So we always wait some amount of time before setting the service property?

I presume this is the reason, but I am not familiar with the exact reason(s) for this design.

@tjwatson
Member

Another benefit is that the code becomes a bit simpler, as there is no need to cancel midway (nor the synchronisation that is currently present to ensure this happens consistently).

I fully agree. I wanted to understand the reason behind the setting, to help me work out whether there is a simpler way. I really don't like the current usage of Timer here at all.

@laeubi laeubi left a comment

It looks fine and I don't see anything that requires Java > 8 here

return new Thread(r, "SCR Component Registry");
}
});
threadPoolExecutor.setKeepAliveTime(10, TimeUnit.SECONDS);

Suggested change
- threadPoolExecutor.setKeepAliveTime(10, TimeUnit.SECONDS);
+ threadPoolExecutor.setKeepAliveTime(m_configuration.serviceChangecountTimeout()*2, TimeUnit.MILLISECONDS);

Contributor Author

Would you be open to maintaining a minimum keep alive of 500ms? The suggested change would, with a very short timeout of 10ms, cause the thread to be cleaned up almost as fast/often as with the current Timer approach.

Something along the lines of:

threadPoolExecutor.setKeepAliveTime(Math.max(500, m_configuration.serviceChangecountTimeout()*2), TimeUnit.MILLISECONDS);

You may want to make it configurable via a system property. Maybe we do not even need to ever time out the thread anyway?

Contributor Author

Not having any timeout would further simplify things (i.e. using one of the standard factory methods, rather than manually initializing one of the ScheduledExecutorService implementations). I opted for this approach because I wanted to stay as close as possible to the current behaviour, and thus wanted to retain the cleanup of the thread, with the one change that the thread is not cleaned up immediately but after some period of time.

I can make that thread timeout configurable, similar to how the changecount timeout is configurable, and even include a way to never time out, but I see the latter adding some complexity. For instance, should there be some kind of check on the thread timeout vs the changecount timeout?

Coming back to your initial suggestion, would something like this work:

threadPoolExecutor.setKeepAliveTime(Math.max(m_configuration.componentRegistryThreadTimeout(), m_configuration.serviceChangecountTimeout()*2), TimeUnit.MILLISECONDS);

So having the thread timeout be 2x the changecount timeout, with a configurable minimum which is by default 500ms?
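
To make the proposed rule concrete, the arithmetic can be sketched as a small helper. The class name KeepAliveSketch and the method keepAliveMillis are hypothetical, and the parameters stand in for the configuration values named above; this is an illustration of the formula, not code from the PR.

```java
public class KeepAliveSketch {

    // Keep-alive is twice the changecount timeout, but never below the
    // configured minimum (500 ms by default in the proposal above).
    static long keepAliveMillis(long minimumMillis, long changecountTimeoutMillis) {
        return Math.max(minimumMillis, changecountTimeoutMillis * 2);
    }

    public static void main(String[] args) {
        // A very short changecount timeout is clamped to the minimum.
        System.out.println(keepAliveMillis(500, 10));   // prints 500
        // A long changecount timeout dominates: 2 * 5000 = 10000.
        System.out.println(keepAliveMillis(500, 5000)); // prints 10000
    }
}
```

With a 10 ms changecount timeout the thread would therefore stay around for 500 ms of idle time instead of 20 ms.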

Member

If we really want to stop creating new threads I suggest we could combine this scheduled executor with the ComponentActorThread into a single executor. I'm not sure why we have our own implementation of what looks like a simple executor with ComponentActorThread. But this thread lives for the life of the active SCR bundle. We could just make this a single scheduled executor that we post work to for both the component actor thread and for scheduling this change count service property update. Then we don't have to worry about managing any keep alive or otherwise since this thread is always there.

Although we could also let this single thread terminate in the pool of one.

Contributor Author

@tjwatson I like the idea of moving the execution of the service.changecount property to the existing SCR Actor thread. I can update this PR if there are no objections to this.

Sounds useful here, especially as we can assume that the scheduled actions do not really "block" anything.

Contributor Author

I have pushed commit 181336b which moves the service.changecount update towards the SCR Component Actor. The realisation of the SCR Component Actor is now implemented using a ThreadPool (which offers the scheduling of runnables, which is needed to realise the timeout of the service.changecount update).
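
The idea of one scheduled executor serving both kinds of work could look roughly like the sketch below. It is not the code from commit 181336b: the class SharedActorSketch, its fields, and the "publish only if the count is still current" check are assumptions made for illustration, with an AtomicLong standing in for the service.changecount service property.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SharedActorSketch {

    private final ScheduledExecutorService actor;
    private final long timeoutMillis;
    private final AtomicLong changeCount = new AtomicLong();
    // Stand-in for the service.changecount property on the registration.
    private final AtomicLong published = new AtomicLong(-1);

    SharedActorSketch(ScheduledExecutorService actor, long timeoutMillis) {
        this.actor = actor;
        this.timeoutMillis = timeoutMillis;
    }

    // Each update schedules a delayed publish on the shared actor; only the
    // task belonging to the latest count actually publishes.
    void updateChangeCount() {
        final long count = changeCount.incrementAndGet();
        actor.schedule(() -> {
            if (changeCount.get() == count) {
                published.set(count);
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    long publishedChangeCount() {
        return published.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService actor = Executors.newSingleThreadScheduledExecutor(
                r -> new Thread(r, "SCR Component Actor"));
        SharedActorSketch registry = new SharedActorSketch(actor, 50);
        // Ordinary actor work and changecount updates share one thread.
        actor.execute(() -> System.out.println("component work on " + Thread.currentThread().getName()));
        for (int i = 0; i < 5; i++) {
            registry.updateChangeCount();
        }
        actor.shutdown();
        actor.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("published changecount: " + registry.publishedChangeCount());
    }
}
```

Since the actor thread lives as long as the SCR bundle is active, no keep-alive tuning is needed in this arrangement.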

Member

Initial peek at the changes looks good. I like the simplification. I still want to take a closer look to do a proper review.

One small concern is if the two task "types" may have some condition that they are waiting for which would be provided by the other task type. For example, the component actor task cannot complete until the change count task completes and the component actor task gets put into the queue of tasks before the change count task can get scheduled.

I'm not sure how realistic that scenario is, but it is something we are going to have to look out for.
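
The concern above can be reproduced in isolation with a single-threaded executor. The sketch below is a generic illustration, not SCR code; the class SingleThreadOrderingSketch and the method waiterTimesOut are hypothetical. A first task waits for a condition that only a second, later-queued task can satisfy, so on one thread the wait can only end by timing out.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SingleThreadOrderingSketch {

    // Returns true when the first task gave up waiting for the second one.
    static boolean waiterTimesOut(final long waitMillis) {
        ExecutorService single = Executors.newSingleThreadExecutor();
        try {
            final CountDownLatch condition = new CountDownLatch(1);
            // First task: occupies the only thread while waiting.
            Callable<Boolean> waitingTask = () -> condition.await(waitMillis, TimeUnit.MILLISECONDS);
            Future<Boolean> waiter = single.submit(waitingTask);
            // Second task: would satisfy the condition, but is queued behind the first.
            single.execute(condition::countDown);
            return !waiter.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            single.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("waiter timed out: " + waiterTimesOut(200)); // prints "waiter timed out: true"
    }
}
```

Without the timeout on the wait, the first task would block forever, which is why any such dependency between the two task types would need to be ruled out.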

Replaced the SCR Component Registry executor with the SCR Component Actor
Added integration test that validates service.changecount property updates
@@ -776,9 +758,6 @@ public void run()
}

public void shutdown() {
Contributor Author

As this function is now empty, I assume it can be removed (or should other cleanup happen here still)?

Member

That is my assumption also, given that the executor will shut down and clean up instead.

@@ -560,7 +562,7 @@ public <T> void leaveCreate(final ServiceReference<T> serviceReference)
* @param serviceReference
* @param actor
*/
- public synchronized <T> void missingServicePresent( final ServiceReference<T> serviceReference, ComponentActorThread actor )
+ public synchronized <T> void missingServicePresent( final ServiceReference<T> serviceReference, ScheduledExecutorService actor )
Member

We should stop passing the actor here since it is the same object passed to the constructor and stored in m_componentActor.

static
{
descriptorFile = "/integration_test_simple_components.xml";
DS_SERVICE_CHANGECOUNT_TIMEOUT = 1000;
Member

Does this change the ComponentTestBase.DS_SERVICE_CHANGECOUNT_TIMEOUT for all other tests run after this?

Contributor Author

Unfortunately it does. I replicated the pattern from DS_LOGLEVEL, but that suffers from the same issue; once a test sets it, all subsequent tests (which is not necessarily always consistent across builds) use the same settings, unless it gets explicitly (un)set in one of these subsequent tests.

Without a more significant refactoring effort, I don't really see a way from this specific test to set a specific system property in Pax Exam.

Member

Can you reset it back in an @AfterClass method?

Contributor Author

@glimmerveen glimmerveen May 14, 2025

Can you reset it back in an @AfterClass method?

It looks like the @AfterClass method is called within the forked JVM, and unfortunately does not affect the variable set in the static initialiser block (whose value resides in the main unit process, and is input for the @Configuration annotated method).

# Conflicts:
#	scr/src/test/java/org/apache/felix/scr/integration/ComponentTestBase.java
Removed m_componentActor parameter from ComponentRegistry.missingServicePresent()
@stbischof
Contributor

You can rebase; then we have the logging fix in.


6 participants