Release 3.7.2 #1035

Merged
fcmonteiro merged 41 commits into COVESA:master from andrefesilva:release_3.7.2
Apr 20, 2026
Conversation

@andrefesilva
Contributor

No description provided.

Jorge Saraiva and others added 30 commits April 16, 2026 10:46
Handles the internal header w/ cmake instead of using macros in the header files
It is annoying to repeat the following block every time the internal.hpp header
needs to be included:

 #ifdef ANDROID
 #include "../../configuration/include/internal_android.hpp"
 #else
 #include "../../configuration/include/internal.hpp"
 #endif // ANDROID

Therefore, the CMakeLists file is now updated so that the platform-specific #ifdef block is no longer needed.
The Android internal header file is now located in the following directory:
implementation/configuration/include/android/
Some messages should not be logged as errors, so downgrade their level
Enables and updates memory_test
After the introduction of b0549a4, the memory_test failures became more obvious.
A separate experiment was done, proving the test would fail regardless of having b0549a4.
However, as the failure started to be more deterministic, it was purposely disabled w/ b0549a4.
Changes in this PR

Enables the test again;
Increases the message sender interval from 1 ms to 5 ms. This is reasonable because, after b0549a4, the overhead
of exchanged messages is higher. Nonetheless, the defined threshold (1.15) is kept.
fprintf will of course not add any..
Summary:
Fix data races in the service discovery by replacing function-local static variables with per-instance members,
ensuring state is isolated within each service_discovery_impl instance.
Details:
Two independent races were caused by function-local static variables that were implicitly shared
across all service_discovery_impl instances and threads in the process, while only being protected
by instance-level mutexes.
Issue 1 - Session sequence tracking race
service_discovery_impl::check_session_id_sequence() used a function-local
static std::map<std::pair<boost::asio::ip::address, bool>, session_t> to track the last seen
session per sender and multicast/unicast path.
Although on_message() holds sessions_received_mutex_ when calling this function, that mutex only
protects state within a single instance. Multiple instances could still concurrently access the shared static
map under different mutexes, leading to data races.
Issue 2 - Multicast timer initialization race
service_discovery_impl::on_message() used a function-local static bool must_start_last_msg_received_timer
to control initial arming of last_msg_received_timer_.
This flag was shared across all instances but accessed under last_msg_received_timer_mutex_,
which is instance-local. Concurrent instances could therefore read/write the shared flag under different
mutexes, causing a data race.
Fix
Regarding Issue 1, sessions_received_by_peer_ is now an instance member of service_discovery_impl,
replacing the previously shared static map. This variable is also cleared in service_discovery_impl::start()
with the rest of the session tracking state.
Regarding Issue 2, must_start_last_msg_received_timer_ was also moved to inside the class,
ensuring it's not shared by different instances.
Note: These races can only happen if VSOMEIP_ENABLE_MULTIPLE_ROUTING_MANAGERS is set.
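The pattern behind both fixes can be sketched as follows. This is a simplified illustration and not the vsomeip code: discovery_sketch, check_sequence() and the string peer key are hypothetical stand-ins for service_discovery_impl, check_session_id_sequence() and the address/multicast pair, and the real session wrap-around rules are not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

using session_t = std::uint16_t;

// Hypothetical, simplified stand-in for service_discovery_impl.
class discovery_sketch {
public:
    // Returns true if `session` directly follows the last session seen from
    // the same peer on the same path, or if the peer has not been seen yet.
    bool check_sequence(const std::string& peer, bool is_multicast, session_t session) {
        std::lock_guard<std::mutex> lock(mutex_);
        // Before the fix, the equivalent map was a function-local `static`:
        // one map shared by every instance, but locked by per-instance mutexes.
        const auto key = std::make_pair(peer, is_multicast);
        const auto it = sessions_received_by_peer_.find(key);
        bool in_sequence = true;
        if (it != sessions_received_by_peer_.end()) {
            in_sequence = (session == static_cast<session_t>(it->second + 1));
        }
        sessions_received_by_peer_[key] = session;
        return in_sequence;
    }

    // Mirrors clearing the session tracking state in start().
    void start() {
        std::lock_guard<std::mutex> lock(mutex_);
        sessions_received_by_peer_.clear();
    }

private:
    std::mutex mutex_; // now protects state that really belongs to this instance
    std::map<std::pair<std::string, bool>, session_t> sessions_received_by_peer_;
};
```

Because the map is a member, two discovery_sketch objects never touch the same data, so each instance-level mutex is sufficient on its own.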
Remove trailing whitespaces from codebase
After adaf7ad fixed the bookkeeping of the environment data
of a routing proxy, a couple of problems remained:

the TCP connection data might go out of sync with local_services,
local_services might become out of sync with the availability status
communicated to the client.

To resolve these inconsistencies the same strategy as for adaf7ad was
applied:

Move the functions accessing the relevant data from rmb -> rmc/rmi.
Move the relevant data from rmb -> rmc/rmi
Prune the data and the moved functions

This strategy was applied to the formerly called local_services_,
local_service_history_, guests_ and the pending_subscriptions_.
During the process it became obvious that:
local_service_history_ and pending_subscriptions_ are only used
within the routing_manager_client and that the lifetime
of the data contained in guests_ should be bound to the endpoint
lifetime for the router (due to lazy connection this is not possible
in the client).
But before any of these changes were made, stress tests were
written that would fail reliably with the former implementation
(and earlier versions). To have these tests fail reliably,
there was the need to control exactly when a command would
be received and forwarded. For this purpose the
data_pipeline + the command_gate have been introduced.
The data_pipeline is supposed to be the input_queue for any socket.
For now the data_pipeline can either directly forward the data,
or apply a gate for local_messages. But this is envisioned to be
extended by the possibility to also gate someip_messages.
It is therefore the generic "back-end" of the inflow control.
The command_gate forms the first "front-end" of the inflow control.
It is supposed to be used by a test case directly to tightly control which
data is allowed through and when. It is planned that a corresponding someip_gate
will be written in the future.
With the initial gate one can now write stress tests more easily
in which data of multiple sources can be awaited and handed into
one application in rapid succession.
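The gating idea can be illustrated with a minimal sketch. gate_sketch and its member functions are hypothetical names chosen for this example; they are not the data_pipeline or command_gate interfaces, and the sketch is single-threaded for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical gate: buffers incoming data until the test releases it.
class gate_sketch {
public:
    using sink_t = std::function<void(const std::string&)>;

    explicit gate_sketch(sink_t sink) : sink_(std::move(sink)) {}

    // Called for every received message. While the gate is closed the
    // message is held back instead of being forwarded.
    void push(std::string msg) {
        if (open_) {
            sink_(msg);
        } else {
            held_.push_back(std::move(msg));
        }
    }

    // Forwards up to `count` held messages in arrival order and returns
    // how many were actually forwarded.
    std::size_t release(std::size_t count) {
        std::size_t forwarded = 0;
        while (forwarded < count && !held_.empty()) {
            sink_(held_.front());
            held_.pop_front();
            ++forwarded;
        }
        return forwarded;
    }

    // Forwards everything held so far and stops gating.
    void open() {
        release(held_.size());
        open_ = true;
    }

    std::size_t held() const { return held_.size(); }

private:
    sink_t sink_;
    std::deque<std::string> held_;
    bool open_ = false;
};
```

A test could collect commands from several sources while the gate is closed and then call release() or open() to hand them to the application back to back.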
Allow remote subscriptions as soon as the service is considered offered by the daemon.
Description
Despite being a good solution, allowing remote subscriptions only once the router had either received its own multicast
offer or completed the asynchronous unicast send operation triggered some race conditions.

On those occurrences, a partner ECU received the offer and sent the subscription before the offering ECU handled
its own multicast offer. This triggered a Subscription Nack, as the control mechanism introduced in 38fd11e had not
yet enabled the service to accept remote subscriptions.
This PR removes the control mechanism as a first-step solution. Later, some sort of mitigation shall be
implemented to prevent the race condition handled by 38fd11e.
And add an initial test for the broken-but-important-to-keep
availability behavior
Since the routing_manager_impl requires no client or server endpoints,
there is no need to inherit from the endpoint_manager_impl.
Trigger routing_state_handler_ in all routing state cases
The routing_state_handler_ was only being triggered on RESUME, but for the control_plugin_test to know the
routing state in SUSPEND on the vsomeip-daemon side, it also needs to be triggered on SUSPEND.
In addition, the trigger was added in all other routing state cases.
This change fixes an issue where UDP errors were being ignored on
receive because the payload length was less than the full SOME/IP
message.
This length check was introduced to prevent out-of-bounds errors
but it also unintentionally prevented errors from being processed
since there's never any payload associated with them.
The fix involves moving the check into the branch where we didn't
receive an error.
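A minimal sketch of the reordered check, under stated assumptions: on_receive() and receive_result are hypothetical names, std::error_code stands in for the error type used by the real endpoint, and the 16-byte minimum is the size of a SOME/IP header assumed here as the length limit.

```cpp
#include <cassert>
#include <cstddef>
#include <system_error>

enum class receive_result { error_handled, too_short, message_processed };

// Assumed minimum size for this sketch: a full SOME/IP header (16 bytes).
constexpr std::size_t min_message_size = 16;

// Hypothetical receive callback showing only the order of the checks.
receive_result on_receive(const std::error_code& ec, std::size_t bytes) {
    if (ec) {
        // Errors never carry a payload, so they have to be handled before
        // any length check gets the chance to discard them.
        return receive_result::error_handled;
    }
    // The length check now lives in the branch without an error.
    if (bytes < min_message_size) {
        return receive_result::too_short;
    }
    return receive_result::message_processed;
}
```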
This refactoring is done as part of a bigger refactoring separating
the client logic from the router logic. Because the client and the
router have very different bookkeeping needs, the
routing_manager_base is expected to be replaced.
Therefore its usage is replaced by a new interface dedicated
to the hosting of local endpoints.
The problematic scenario:
The router is the sender and guarantees that the message
is sent out over one channel before stopping, but shortly
afterwards breaks this and the routing connection.
On the receiving side, the routing connection break might
lead to a breakage of all connections.
Introduces hybrid mode via configuration parameter
Adds comprehensive hybrid mode networking support allowing applications to dynamically
choose between UDS (Unix Domain Sockets) and TCP based on vsomeip configuration file.
Protocol changes

VSOMEIP_ASSIGN_CLIENT - Can now include address+port

Configuration

New uds_preferred parameter allowing per-application UDS/TCP preference settings

Endpoint Management - Enhanced routing with hybrid mode awareness:

Same-machine connections prioritize UDS when configured
Cross-machine connections fall back to TCP automatically
Configurable per-router and per-application
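The selection rule described in this list can be sketched as a small function. choose_transport() and the transport enum are hypothetical names for illustration; only the uds_preferred parameter comes from the description above, and how the real code determines "same machine" is not shown here.

```cpp
#include <cassert>

enum class transport { uds, tcp };

// Hypothetical decision function: UDS is only an option when both peers
// are on the same machine and the configuration prefers it. Everything
// else, including all cross-machine connections, uses TCP.
transport choose_transport(bool same_machine, bool uds_preferred) {
    if (same_machine && uds_preferred) {
        return transport::uds;
    }
    return transport::tcp;
}
```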

Testing

Adds new test suite for this feature
Adds helper to verify connection type
Covers a few different scenarios depending on the configuration set.
From time to time, due to the missing fix
from 7673afb, the routing_info would really be lost.
But sometimes the test would also cycle twice
through UNAVAIL, AVAIL, leading to a false positive failure.
This PR is concerned with the latter problem, ensuring that
this test is stable after 7673afb and this PR.
Ensure that a pinged client is removed from the "to be checked list",
when the client deregisters.
If two (client) ECUs request the same service and use the same
client-id, responses will be lost. We detect it, and that log must be
an error!
Fix the offer test external
Previously, the offer test external was passing without even
running the second container, and effectively was doing nothing.
As such, fix the test so that it properly verifies that the same service
cannot be offered by 2 different applications, and separate the test
cases: in one test the local offer comes first and is the one
that is accepted, and in the other test the remote offer comes
first and is accepted.
Serviceinfo had 3 mutexes.
Merge them all into one.
This one function used the socket without the lock.
I cross-checked; this should be the only one missing.
Most recently, logs were added in the receive_cbk that
do not acquire the lock. The other usages already required the lock
dedicated to this purpose, so it should be
fine to move the lock into the function itself.
This change fixes an issue in the UDP endpoint state machine which
would trigger an available/unavailable loop when trying to send
messages to a closed port.
The issue originates when the daemon does not observe the STOP
OFFER from the service provider, so it keeps trying to send
messages to the closed UDP port where the service was previously
offered.
This results in a "Connection refused" error in the client endpoint,
which triggers an unavailability trigger, a reconnect, and an
immediate availability trigger.
If a client tries to send a message again, the process repeats
until the offer expires.
The fix for this issue involves stopping the endpoint when we get
the "Connection refused" error and waiting for the next OFFER to
restart it.
This way, we get an unavailable when the error occurs and an
available only when the service is offered again, thus preventing
the error loop.
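The intended behaviour can be sketched as a tiny state machine. udp_endpoint_sketch and its member functions are hypothetical and only model the state transitions described above, with std::error_code standing in for the real error type.

```cpp
#include <cassert>
#include <system_error>

// Hypothetical model of the client endpoint states relevant to the fix.
class udp_endpoint_sketch {
public:
    // On "Connection refused" the endpoint stops instead of reconnecting,
    // so exactly one unavailable notification is produced.
    void on_send_error(const std::error_code& ec) {
        if (ec == std::errc::connection_refused && running_) {
            running_ = false;
            ++unavailable_count_;
        }
    }

    // Only the next OFFER restarts the endpoint and reports availability.
    void on_offer() {
        if (!running_) {
            running_ = true;
            ++available_count_;
        }
    }

    bool can_send() const { return running_; }
    int unavailable_count() const { return unavailable_count_; }
    int available_count() const { return available_count_; }

private:
    bool running_ = true;
    int unavailable_count_ = 0;
    int available_count_ = 0;
};
```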
Clean-up of RS_DIAGNOSIS
Does not seem to be used anymore.
Nuke it to simplify our routing states
Includes logger_ext header to android bp
Follow up to 9392789
The current book-keeping for offered services by a client
is overly complicated in routing_manager_base, complicating
the synchronization attempts of the provider side of an application.
Additionally the services_ struct had different semantics for the
routing_manager_client (the services the application itself is offering)
and the router (all services known).
Therefore all related functions and data structs have been moved
to rmi without further change, and a pruned version into rmc.
E.g. rmc does not require any remote_services_ map.
Extends ecu config with custom interface
while at it:
Adds a fake boardnet test where a consumer subscribes to a service that contains
multiple fields.
Verifies if all the initial events are received.
New fix to the offer test external
What led to the previous failure of the offer test external
was the fact that it was using future promises to check
service availability, without checking the value.
When the client registered the
service, the service would be unavailable and the client would set the
promise even before the service became available for the
first time.
As such, use a bool and a cv to ensure that the client gets
the correct availability of the service, and as such the client
will only stop after getting the unavailability after it has
subscribed to the service.
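The difference to a promise that fires on the first callback can be sketched like this. availability_waiter is a hypothetical helper, not the test's actual code; it shows only the bool plus condition variable pattern, where the waiter checks the value rather than the mere arrival of a callback.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: stores the latest availability and lets a test
// wait for a specific value.
class availability_waiter {
public:
    // Would be called from the availability handler.
    void on_availability(bool is_available) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            available_ = is_available;
        }
        cv_.notify_all();
    }

    // Blocks until availability equals `expected` or the timeout expires.
    // Returns true only if the expected value was reached.
    bool wait_for(bool expected, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this, expected] { return available_ == expected; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool available_ = false;
};
```

An initial "unavailable" callback no longer satisfies a wait for "available", which is the case the promise-based version got wrong.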
Disables SD configuration for IPC testing
After b0549a4, the Service Discovery
configuration no longer needs to be enabled, even for IPC testing.
(Previously, when the router was the one offering, this configuration was required to make it testable.)
Furthermore, having SD enabled causes a lot of verbosity and disturbs log analysis.
When the service provider's routing manager is processing the offers,
it adds each service to the services_ maps, but the
serviceinfo for each service is still incomplete as init_service_info
has yet to run. This function checks the service's reliability,
creates the endpoints and sets them accordingly.
However, consumers are sending FINDs in the meantime. Service discovery processes them,
sees the service in the map, assumes all info is there, and sends the
unicast OFFER, but with missing options.
To fix this, introduce an initial state for the serviceinfo, preparation,
which is only left once the endpoints have been set.
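A minimal sketch of such a state, assuming hypothetical names throughout: serviceinfo_sketch, endpoint_sketch, set_endpoints() and can_answer_find() are illustrative and do not reflect the real serviceinfo interface.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct endpoint_sketch {}; // placeholder for a real endpoint type

// Hypothetical serviceinfo that starts in a preparation state.
class serviceinfo_sketch {
public:
    enum class state { preparation, ready };

    // Setting at least one endpoint completes the preparation.
    void set_endpoints(std::shared_ptr<endpoint_sketch> reliable,
                       std::shared_ptr<endpoint_sketch> unreliable) {
        reliable_ = std::move(reliable);
        unreliable_ = std::move(unreliable);
        if (reliable_ || unreliable_) {
            state_ = state::ready;
        }
    }

    // Service discovery would consult this before answering a FIND, so an
    // OFFER is never built from incomplete endpoint information.
    bool can_answer_find() const { return state_ == state::ready; }

private:
    state state_ = state::preparation;
    std::shared_ptr<endpoint_sketch> reliable_;
    std::shared_ptr<endpoint_sketch> unreliable_;
};
```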
Test routing state handler states with fake_socket
test_connection_control.cpp updated to add a new test that verifies the routing_state_handler is correctly triggered
when set_routing_state is called.
kai-moritzkumkar and others added 11 commits April 20, 2026 13:17
Move the event and eventsgroup maps out of rmb into
rmc/rmi. This enables dedicated client vs. router
refactorings.
Pruning is skipped almost completely in this PR,
only shadow events are avoided in the rmc and the
virtual "is_routing_manager" is no longer required.
This change fixes an issue in environments with multiple vsomeip
apps trying to log to the same file, where they could overwrite
each others logs.
The fix involves adding the std::ios_base::app flag to append
logs to the end-of-file instead of after the previous write.
Note that due to how the logs are buffered before being written
to the file, the total order of the logs between different apps
might not be maintained (i.e. interleaving of timestamps).
This change also adds the app name to these logs to help to tell
them apart.
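The effect of std::ios_base::app can be shown with two writers on the same file. file_logger_sketch and read_lines() are hypothetical helpers written for this example; the sketch flushes after every line, whereas the real logger buffers, which is why ordering between apps is not guaranteed there.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical file logger: opens the file in append mode and prefixes
// every line with the application name.
class file_logger_sketch {
public:
    file_logger_sketch(const std::string& path, std::string app_name)
        : file_(path, std::ios_base::out | std::ios_base::app),
          app_name_(std::move(app_name)) {}

    void log(const std::string& message) {
        // In append mode every write goes to the current end of the file,
        // even if another writer has extended the file in the meantime.
        file_ << "[" << app_name_ << "] " << message << '\n';
        file_.flush();
    }

private:
    std::ofstream file_;
    std::string app_name_;
};

// Reads all lines of a file, used here only to inspect the result.
std::vector<std::string> read_lines(const std::string& path) {
    std::vector<std::string> lines;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

Without the append flag, the second writer would start at offset zero and overwrite what the first one had written.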
routing_manager_base shall no longer implement the
event_dispatcher or the routing_host interface.
Cleanup the offer test big sd message
Follow up from hybrid mode PR

Add retry mechanism for local uds acceptor creation, needed for situations where guest applications
were started before the router
Logging changes
New test cases focused on connection breaks between guest apps and router
New unit test for assign_client_command
Payload was not being printed due to missing negation
on header-only check
Also bring back the TC: starting string for android logs
Fix a data race in the initial event test client availability handling.
initial_event_test_client::on_availability() accessed other_services_available_ from multiple availability_handler
threads without synchronization. One callback could update an entry in the map while another callback was
iterating the same map with std::all_of, resulting in a data race. This could also trigger the
"subscribe on availability" path multiple times once the last services became available.
Fix
The fix was to guard all accesses to other_services_available_ with availability_mutex_,
decide all_services_are_available_ state while holding that lock, and gate the subscribe on availability
flow so it runs exactly once using subscribed_on_availability_.
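The locking scheme can be sketched as follows. availability_tracker_sketch is a hypothetical class; the member names mirror those mentioned above, but the signature and the uint16_t service key are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical tracker for the availability of a fixed set of services.
class availability_tracker_sketch {
public:
    explicit availability_tracker_sketch(const std::vector<std::uint16_t>& services) {
        for (const auto service : services) {
            other_services_available_[service] = false;
        }
    }

    // May be called from several handler threads. Returns true exactly
    // once: on the call that makes every tracked service available.
    bool on_availability(std::uint16_t service, bool is_available) {
        std::lock_guard<std::mutex> lock(availability_mutex_);
        const auto it = other_services_available_.find(service);
        if (it == other_services_available_.end()) {
            return false; // not a service this client waits for
        }
        it->second = is_available;
        // The map is updated and iterated under the same lock.
        const bool all_available = std::all_of(
            other_services_available_.begin(), other_services_available_.end(),
            [](const std::pair<const std::uint16_t, bool>& entry) { return entry.second; });
        if (all_available && !subscribed_on_availability_) {
            subscribed_on_availability_ = true;
            return true; // caller runs the subscribe flow once
        }
        return false;
    }

private:
    std::mutex availability_mutex_;
    std::map<std::uint16_t, bool> other_services_available_;
    bool subscribed_on_availability_ = false;
};
```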
Mitigate TCP port assignment log spam
Whenever an application was started, this log would get spammed until a free port was found:
lati::init: Could not bind, Address already in use,
Change the logging so that the message is only printed after a few failed attempts.
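One way to implement such a limit is a small failure counter. bind_failure_throttle is a hypothetical helper; the description does not say what the threshold is or whether the message repeats afterwards, so this sketch assumes a caller-supplied threshold and a single message per failure streak.

```cpp
#include <cassert>

// Hypothetical throttle for the "Could not bind" message.
class bind_failure_throttle {
public:
    explicit bind_failure_throttle(unsigned threshold) : threshold_(threshold) {}

    // Call on every failed bind. Returns true only on the attempt that
    // reaches the threshold, so the message appears once per streak.
    bool on_bind_failed() {
        ++failures_;
        return failures_ == threshold_;
    }

    // Call once a free port was found to start a new streak.
    void on_bind_succeeded() { failures_ = 0; }

private:
    unsigned threshold_;
    unsigned failures_ = 0;
};
```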
This log line was removed with 8822a53 (accidentally, I assume),
which should explain the absence of the SOMEIP ready log
vSomeIP-Lib 3.7.2

Handles the internal header w/ cmake
add posttests_log_level_tests
change log levels
tests: Enables memory_test
logger: fix missing endpoint in TERMINATE
Solve test_boardnet_with_fake_sockets failures
Remove trailing whitespaces
Fix availability, host state, tcp address race in rmc
Allow subscriptions as soon as the service is offered
tests: add "broken behavior" tests
Separate the two endpoint_manager
Trigger routing_state_handler_ in all states
fix drop of udp errors
decouple endpoint_manager_base from routing_manager_base
Remove flaky shutdown_tests
Introduces hybrid mode
Add robustness to flaky stress test
Fix missing clean-up of pinged clients
sei: upgrade log to error
Fix offer test external
only one mutex for serviceinfo
Fix missing lock in udp_client_endpoint
Fix UDP connection availability loop
Nuke RS_DIAGNOSIS
logger_ext header to include in android bp
Move offered services into rmc/rmi
tests: Extends ecu config with custom interface
Fix offer test external v2
tests: Disables SD configuration for IPC testing
add preparation stage to serviceinfo
Test routing state handler states with fake_socket
Move events out of routing_manager_base
fix log to file
Move implementation of event_dispatcher
Cleanup offer test big sd msg
Hybrid mode follow up changes
fix payload not being printed on android
Fix initial event client data race
mitigate TCP port assignment log spam
@fcmonteiro fcmonteiro merged commit c70ced4 into COVESA:master Apr 20, 2026
2 checks passed

3 participants