Release 3.7.2 #1035
Merged
Handles the internal header w/ cmake instead of using macros in the header files

It is just annoying that every time the internal.hpp header needs to be included, the following block has to be written:

```cpp
#ifdef ANDROID
#include "../../configuration/include/internal_android.hpp"
#else
#include "../../configuration/include/internal.hpp"
#endif // ANDROID
```

Therefore, the CMakeLists file is now updated so that a single include, without the preprocessor conditional, is enough. The Android internal header file is now located in the following directory: implementation/configuration/include/android/
Some messages should not be logged as errors, so downgrade their level.
Enables and updates memory_test

After the introduction of b0549a4, the memory_test failures became more obvious. A separate experiment proved that the test would fail regardless of b0549a4. However, as the failure became more deterministic, the test was purposely disabled with b0549a4.

Changes in this PR:
- Enables the test again;
- Increases the message sender interval from 1 ms to 5 ms. (This is reasonable, as after b0549a4 the overhead of exchanged messages is higher. Nonetheless, the defined threshold (1.15) is kept.)
fprintf will of course not add any..
Summary: Fix data races in the service discovery by replacing function-local static variables with per-instance members, ensuring state is isolated within each service_discovery_impl instance.

Details: Two independent races were caused by function-local static variables that were implicitly shared across all service_discovery_impl instances and threads in the process, while only being protected by instance-level mutexes.

Issue 1 - Session sequence tracking race

service_discovery_impl::check_session_id_sequence() used a function-local static std::map<std::pair<boost::asio::ip::address, bool>, session_t> to track the last seen session per sender and multicast/unicast path. Although on_message() holds sessions_received_mutex_ when calling this function, that mutex only protects state within a single instance. Multiple instances could still concurrently access the shared static map under different mutexes, leading to data races.

Issue 2 - Multicast timer initialization race

service_discovery_impl::on_message() used a function-local static bool must_start_last_msg_received_timer to control the initial arming of last_msg_received_timer_. This flag was shared across all instances but accessed under last_msg_received_timer_mutex_, which is instance-local. Concurrent instances could therefore read/write the shared flag under different mutexes, causing a data race.

Fix

Regarding Issue 1, sessions_received_by_peer_ is now an instance member of service_discovery_impl, replacing the previously shared static map. This variable is also cleared in service_discovery_impl::start() along with the rest of the session tracking state.

Regarding Issue 2, must_start_last_msg_received_timer_ was also moved into the class, ensuring it is not shared by different instances.

Note: These races can only happen if VSOMEIP_ENABLE_MULTIPLE_ROUTING_MANAGERS is set.
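A minimal C++ sketch of the fix pattern, assuming simplified types: the class name sd_instance_sketch, the string address key, the take_initial_timer_start() helper and the "greater than the last session" rule are illustrative assumptions, not the vsomeip code. It shows why moving the state into the instance matters: each instance mutex now guards data that only that instance can reach.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Simplified stand-ins (assumptions): the real code keys on
// boost::asio::ip::address and uses vsomeip's session_t.
using address_t = std::string;
using session_t = std::uint16_t;

// Hypothetical reduction of service_discovery_impl to the two pieces of state
// that used to be function-local statics.
class sd_instance_sketch {
public:
    // Simplified rule (assumption): a session is "in sequence" if it is greater
    // than the last one seen from this sender on this path; wrap-around and
    // reboot handling of the real implementation are omitted.
    bool check_session_id_sequence(const address_t& sender, bool is_multicast,
                                   session_t session) {
        std::lock_guard<std::mutex> its_lock(sessions_received_mutex_);
        const auto key = std::make_pair(sender, is_multicast);
        auto it = sessions_received_by_peer_.find(key);
        const bool in_sequence =
            (it == sessions_received_by_peer_.end()) || (session > it->second);
        sessions_received_by_peer_[key] = session;
        return in_sequence;
    }

    // Returns true exactly once per instance: whether the caller must arm the
    // last_msg_received timer for the first time.
    bool take_initial_timer_start() {
        std::lock_guard<std::mutex> its_lock(last_msg_received_timer_mutex_);
        const bool must_start = must_start_last_msg_received_timer_;
        must_start_last_msg_received_timer_ = false;
        return must_start;
    }

    void start() {
        // Cleared together with the rest of the session tracking state.
        std::lock_guard<std::mutex> its_lock(sessions_received_mutex_);
        sessions_received_by_peer_.clear();
    }

private:
    // Per-instance members: each is only ever touched under the instance
    // mutex that guards it, so different instances cannot race on them.
    std::mutex sessions_received_mutex_;
    std::map<std::pair<address_t, bool>, session_t> sessions_received_by_peer_;

    std::mutex last_msg_received_timer_mutex_;
    bool must_start_last_msg_received_timer_{true};
};
```

With function-local statics, a second instance would have seen the first instance's history and its "already armed" flag; here each instance starts from a clean state.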
Remove trailing whitespaces from codebase
After adaf7ad fixed the bookkeeping of the environment data of a routing proxy, a couple of problems remained:
- the TCP connection data might go out of sync with local_services,
- local_services might go out of sync with the availability status communicated to the client.

To resolve these inconsistencies, the same strategy as for adaf7ad was applied:
1. Move the functions accessing the relevant data from rmb -> rmc/rmi.
2. Move the relevant data from rmb -> rmc/rmi.
3. Prune the data and the moved functions.

This strategy was applied to the formerly named local_services_, local_service_history_, guests_ and pending_subscriptions_. During the process it became obvious that:
- local_service_history_ and pending_subscriptions_ are only used within the routing_manager_client, and
- the lifetime of the data contained in guests_ should be bound to the endpoint lifetime for the router (due to lazy connection this is not possible in the client).

But before any of these changes were made, stress tests were written that fail reliably with the former implementation (and earlier versions). To make these tests fail reliably, it was necessary to control exactly when a command is received and forwarded. For this purpose, the data_pipeline and the command_gate have been introduced.

The data_pipeline is intended to be the input queue for any socket. For now, the data_pipeline can either forward the data directly or apply a gate for local_messages, but it is envisioned to be extended to also gate someip_messages. It is therefore the generic "back-end" of the inflow control.

The command_gate forms the first "front-end" of the inflow control. It is meant to be used directly by a test case to tightly control which data is allowed through and when. A corresponding someip_gate is planned for the future.
With the initial gate, one can now more easily write stress tests in which data from multiple sources can be awaited and handed to one application in rapid succession.
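To illustrate the idea of a test-controlled gate, here is a minimal C++ sketch, assuming a string stands in for a received command. The push/await/release interface of this command_gate is hypothetical and does not reflect the actual data_pipeline or command_gate API. The test awaits a number of pending commands and then releases them one by one, which makes the delivery order deterministic.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical test-controlled gate: received commands are queued instead of
// being forwarded, and the test decides when each one passes.
class command_gate {
public:
    using forward_t = std::function<void(const std::string&)>;

    explicit command_gate(forward_t forward) : forward_(std::move(forward)) {}

    // Called by the input side (the "pipeline") for every received command.
    void push(std::string command) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(std::move(command));
        }
        cv_.notify_all();
    }

    // Called by the test: blocks until at least `n` commands are waiting.
    void await(std::size_t n) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return pending_.size() >= n; });
    }

    // Called by the test: forwards up to `n` queued commands in arrival order.
    void release(std::size_t n) {
        std::deque<std::string> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            while (n > 0 && !pending_.empty()) {
                batch.push_back(std::move(pending_.front()));
                pending_.pop_front();
                --n;
            }
        }
        for (const auto& command : batch) {
            forward_(command);  // forwarded outside the lock
        }
    }

private:
    forward_t forward_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> pending_;
};
```

Forwarding outside the lock keeps a slow consumer from blocking the input side while the test holds commands back.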
Allow remote subscriptions as soon as the service is considered offered by the daemon.

Description

Despite being a good solution, allowing remote subscriptions only once the router had either received its own multicast offer or completed the asynchronous unicast send operation triggered some race conditions. In those occurrences, a partner ECU received the offer and sent the subscription before the offering ECU handled its own multicast offer. This triggered a Subscription Nack, as the control mechanism introduced in 38fd11e had not yet enabled the service to accept remote subscriptions.

This PR removes the control mechanism as a first-step solution; later on, some form of mitigation shall be implemented to prevent the race condition handled by 38fd11e.
And add an initial test for the broken-but-important-to-keep availability behavior
Since the routing_manager_impl requires no client or server endpoints, there is no need for it to inherit from endpoint_manager_impl.
Trigger routing_state_handler_ in all routing state cases

The routing_state_handler_ was only being triggered on RESUME, but for the control_plugin_test to know the routing state in SUSPEND on the vsomeip-daemon side, it also needs to be triggered on SUSPEND. In addition, the trigger was added to all other routing state cases.
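A small sketch of the behavior change, assuming a hypothetical routing_state enum and stub class rather than vsomeip's actual types: the registered handler is invoked after every set_routing_state() call instead of only on resume.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical subset of routing states (not vsomeip's actual enum).
enum class routing_state { running, suspended, resumed, shutdown, unknown };

class routing_state_sketch {
public:
    using handler_t = std::function<void(routing_state)>;

    void register_routing_state_handler(handler_t handler) {
        handler_ = std::move(handler);
    }

    void set_routing_state(routing_state state) {
        current_ = state;  // state-specific suspend/resume work omitted
        // Previously only reached for "resumed"; now every state notifies.
        if (handler_) {
            handler_(state);
        }
    }

    routing_state current() const { return current_; }

private:
    handler_t handler_;
    routing_state current_{routing_state::unknown};
};
```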
This change fixes an issue where UDP errors were being ignored on receive because the payload length was less than a full SOME/IP message. This length check was introduced to prevent out-of-bounds reads, but it also unintentionally prevented errors from being processed, since errors never carry a payload. The fix moves the check into the branch where no error was received.
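A sketch of the corrected ordering in a receive completion handler, assuming a 16-byte minimum (the SOME/IP header size) as the length threshold; classify_receive and rx_action are hypothetical names. The error branch is evaluated first, so an error completion with zero bytes is no longer discarded by the length check.

```cpp
#include <cassert>
#include <cstddef>
#include <system_error>

// Assumed threshold: the fixed SOME/IP header is 16 bytes.
constexpr std::size_t kSomeIpHeaderSize = 16;

enum class rx_action { report_error, drop_short, process };

// Errors are handled first; the length check only applies to real data.
rx_action classify_receive(const std::error_code& error, std::size_t bytes) {
    if (error) {
        return rx_action::report_error;  // errors carry no payload, never length-check them
    }
    if (bytes < kSomeIpHeaderSize) {
        return rx_action::drop_short;    // guards against out-of-bounds reads
    }
    return rx_action::process;
}
```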
This refactoring is done as part of a bigger refactoring that separates the client logic from the router logic. Because the client and the router have very different bookkeeping needs, routing_manager_base is expected to be replaced. Therefore, its usage is replaced by a new interface dedicated to hosting local endpoints.
The problematic scenario: The router is the sender and guarantees that the message is sent out over one channel before stopping, but shortly afterwards it breaks this channel and the routing connection. On the receiving side, the breakage of the routing connection might lead to all connections being broken.
Introduces hybrid mode via configuration parameter

Adds comprehensive hybrid mode networking support, allowing applications to dynamically choose between UDS (Unix Domain Sockets) and TCP based on the vsomeip configuration file.

Protocol changes
- VSOMEIP_ASSIGN_CLIENT can now include address+port

Configuration
- New uds_preferred parameter allowing per-application UDS/TCP preference settings

Endpoint Management
- Enhanced routing with hybrid mode awareness:
  - Same-machine connections prioritize UDS when configured
  - Cross-machine connections fall back to TCP automatically
  - Configurable per-router and per-application

Testing
- Adds a new test suite for this feature
- Adds a helper to verify the connection type
- Covers a few different scenarios depending on the configurations set
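The selection rule can be sketched as a small function, assuming the caller already knows whether both peers run on the same machine and has read the per-application uds_preferred setting. choose_local_transport and the transport enum are hypothetical names, and falling back to TCP for a same-machine peer without uds_preferred is an assumption.

```cpp
#include <cassert>

enum class transport { uds, tcp };

// Hypothetical decision helper for the hybrid mode rule.
transport choose_local_transport(bool same_machine, bool uds_preferred) {
    if (same_machine && uds_preferred) {
        return transport::uds;  // same-machine peers prefer UDS when configured
    }
    return transport::tcp;      // cross-machine (or not preferred) uses TCP
}
```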
From time to time, due to the missing fix from 7673afb, the routing_info would really be lost. But sometimes the test also cycled twice through UNAVAIL, AVAIL, leading to a false-positive failure. This PR is concerned with the latter problem, ensuring that this test is stable after 7673afb and this PR.
Ensure that a pinged client is removed from the "to be checked list", when the client deregisters.
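A minimal sketch of the bookkeeping, assuming a set of pinged client IDs guarded by one mutex; ping_tracker and its methods are hypothetical. The relevant part is that on_deregister() erases the client just like a received pong does.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>

using client_t = std::uint16_t;

// Hypothetical "to be checked" list of clients that were pinged.
class ping_tracker {
public:
    void on_ping_sent(client_t client) {
        std::lock_guard<std::mutex> lock(mutex_);
        pinged_clients_.insert(client);
    }
    void on_pong(client_t client) {
        std::lock_guard<std::mutex> lock(mutex_);
        pinged_clients_.erase(client);
    }
    // The fix: a deregistering client is removed as well, so it is not
    // checked again after it is gone.
    void on_deregister(client_t client) {
        std::lock_guard<std::mutex> lock(mutex_);
        pinged_clients_.erase(client);
    }
    bool is_pending(client_t client) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return pinged_clients_.count(client) != 0;
    }

private:
    mutable std::mutex mutex_;
    std::set<client_t> pinged_clients_;
};
```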
If two (client) ECUs send requests for the same service and use the same client ID, responses will be lost. We detect this, and that log must be an error!
Fix the offer test external

Previously, the offer test external was passing without even running the second container, so it effectively did nothing. Fix the test so that it properly verifies that the same service cannot be offered by two different applications, and separate the test cases: in one, the local offer comes first and is the one accepted; in the other, the remote offer comes first and is accepted.
Serviceinfo had 3 mutexes. Merge them all into one.
This one function used the socket without the lock. I cross-checked: this should be the only one missing. Most recently, logs were added in receive_cbk that do not acquire the lock. The other usages already required the lock dedicated to this purpose, which is why it should be fine to move the lock into the function itself.
This change fixes an issue in the UDP endpoint state machine which would trigger an available/unavailable loop when trying to send messages to a closed port. The issue originates when the daemon does not observe the STOP OFFER from the service provider, so it keeps trying to send messages to the closed UDP port where the service was previously offered. This results in a "Connection refused" error in the client endpoint, which triggers an unavailable notification, a reconnect, and an immediate available notification. If a client tries to send a message again, the process repeats until the offer expires. The fix stops the endpoint when the "Connection refused" error occurs and waits for the next OFFER to restart it. This way, we get an unavailable notification when the error occurs and an available notification only when the service is offered again, thus preventing the error loop.
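A reduced sketch of the new endpoint behavior, assuming a two-state model; udp_client_endpoint_sketch and its counters are hypothetical and omit the real socket handling. "Connection refused" moves the endpoint to stopped exactly once, and only a new OFFER brings it back.

```cpp
#include <cassert>
#include <system_error>

enum class ep_state { established, stopped };

class udp_client_endpoint_sketch {
public:
    // Completion of a send/receive with an error.
    void on_error(const std::error_code& ec) {
        if (state_ == ep_state::established && ec == std::errc::connection_refused) {
            state_ = ep_state::stopped;  // no reconnect: report unavailable once, stay down
            ++unavailable_reports_;
        }
    }

    // The next OFFER for the service restarts the endpoint.
    void on_offer() {
        if (state_ == ep_state::stopped) {
            state_ = ep_state::established;
            ++available_reports_;
        }
    }

    // Sends are rejected while the endpoint is stopped.
    bool send() const { return state_ == ep_state::established; }

    ep_state state() const { return state_; }
    int unavailable_reports() const { return unavailable_reports_; }
    int available_reports() const { return available_reports_; }

private:
    ep_state state_{ep_state::established};
    int unavailable_reports_{0};
    int available_reports_{0};
};
```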
Clean-up of RS_DIAGNOSIS

It does not seem to be used anymore. Nuke it to simplify our routing states.
Includes logger_ext header in android bp

Follow-up to 9392789
The current book-keeping for offered services by a client is overly complicated in routing_manager_base, complicating the synchronization attempts of the provider side of an application. Additionally the services_ struct had different semantics for the routing_manager_client (the services the application itself is offering) and the router (all services known). Therefore all related functions and data structs have been moved to rmi without further change, and a pruned version into rmc. E.g. rmc does not require any remote_services_ map.
Extends ecu config with custom interface

Extends the ECU config with a custom interface. While at it, adds a fake boardnet test where a consumer subscribes to a service with multiple fields, and verifies that all the initial events are received.
New fix to the offer test external

What led to the previous failure of the offer test external was that it used future/promise pairs to check service availability without checking the reported value. When the client registered the service, the service was still unavailable, and the promise was set even before the service became available for the first time. Therefore, use a bool and a cv to ensure that the client observes the correct availability of the service; the client will then only stop after getting the unavailability that follows its subscription to the service.
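A sketch of the bool + cv approach, assuming a hypothetical availability_waiter class. Unlike a one-shot promise, which is satisfied by the first availability callback whatever its value, the waiter keeps tracking the latest state and lets the test wait for a specific one.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

class availability_waiter {
public:
    // Availability handler: may fire with false before the service was ever available.
    void on_availability(bool is_available) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            available_ = is_available;
        }
        cv_.notify_all();
    }

    // Blocks until the latest reported state equals `expected` or the timeout expires.
    bool wait_for(bool expected, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [&] { return available_ == expected; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool available_{false};
};
```

The predicate overload of wait_for also guards against spurious wake-ups, which a plain bool without a predicate check would not.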
Disables SD configuration for IPC testing

After b0549a4, Service Discovery no longer needs to be enabled, even for IPC testing. (Previously, when the router was the one offering, this configuration was a requirement to make it testable.) Furthermore, having SD enabled causes a lot of verbosity and disturbs log analysis.
When the service provider's routing manager is processing the offers, it adds each service to the services_ map, but the serviceinfo for each service is still incomplete, as init_service_info has yet to run. That function checks the service's reliability, creates the endpoints and sets them accordingly. Meanwhile, consumers are sending FINDs, which service discovery processes: it sees the service in the map, assumes all info is there, and sends the unicast OFFER, but with missing options. To fix this, introduce an initial state for the serviceinfo, preparation, which is only left once the endpoints have been set.
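A minimal sketch of the preparation stage, assuming a hypothetical serviceinfo_sketch with two endpoint flags and a should_answer_find() helper on the service discovery side; the real serviceinfo holds endpoint objects rather than flags.

```cpp
#include <cassert>
#include <mutex>

enum class service_state { preparation, ready };

class serviceinfo_sketch {
public:
    // Called at the end of init_service_info, once the endpoints exist.
    void set_endpoints(bool has_reliable, bool has_unreliable) {
        std::lock_guard<std::mutex> its_lock(mutex_);
        has_reliable_ = has_reliable;
        has_unreliable_ = has_unreliable;
        state_ = service_state::ready;  // leave the preparation stage
    }

    bool is_ready() const {
        std::lock_guard<std::mutex> its_lock(mutex_);
        return state_ == service_state::ready;
    }

    bool has_any_endpoint() const {
        std::lock_guard<std::mutex> its_lock(mutex_);
        return has_reliable_ || has_unreliable_;
    }

private:
    mutable std::mutex mutex_;
    service_state state_{service_state::preparation};
    bool has_reliable_{false};
    bool has_unreliable_{false};
};

// Service discovery side: answer a FIND with a unicast OFFER only for services
// that left the preparation stage, so the OFFER never lacks endpoint options.
bool should_answer_find(const serviceinfo_sketch& info) {
    return info.is_ready();
}
```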
Test routing state handler states with fake_socket

test_connection_control.cpp is updated to add a new test that verifies the routing_state_handler is correctly triggered when set_routing_state is called.
Move the event and eventgroup maps out of rmb into rmc/rmi. This enables dedicated client vs. router refactorings. Pruning is skipped almost completely in this PR: only shadow events are avoided in the rmc, and the virtual "is_routing_manager" is no longer required.
This change fixes an issue in environments with multiple vsomeip apps logging to the same file, where they could overwrite each other's logs. The fix adds the std::ios_base::app flag so that logs are appended to the end of the file instead of after the stream's own previous write. Note that because the logs are buffered before being written to the file, the total order of the logs between different apps might not be maintained (i.e. interleaving of timestamps). This change also adds the app name to these logs to help tell them apart.
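The effect of std::ios_base::app can be shown with two streams open on the same file at once, standing in for two applications. open_shared_log and log_line are hypothetical helpers, and the per-line flush is only there to make the demonstration deterministic.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Opens the shared log file in append mode: every write lands at the current
// end of the file instead of at this stream's own position, where it could
// overwrite lines written by another process.
std::ofstream open_shared_log(const std::string& path) {
    return std::ofstream(path, std::ios_base::out | std::ios_base::app);
}

// Prefixes each line with the application name so the logs can be told apart.
void log_line(std::ofstream& log, const std::string& app_name, const std::string& text) {
    log << app_name << ": " << text << '\n';
    // Flushed here for the demo; with buffering, lines from different apps
    // may still end up out of timestamp order.
    log.flush();
}
```

Without the app flag, both streams would start writing at offset 0 and the second line would overwrite the first.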
routing_manager_base shall no longer implement the event_dispatcher or the routing_host interface.
Cleanup the offer test big sd message
Follow-up from hybrid mode PR
- Add retry mechanism for local UDS acceptor creation, needed for situations where guest applications were started before the router
- Logging changes
- New test cases focused on connection breaks between guest apps and router
- New unit test for assign_client_command
Payload was not being printed due to a missing negation on the header-only check. Also bring back the "TC:" starting string for Android logs.
Fix a data race in the initial event test client availability handling.

initial_event_test_client::on_availability() accessed other_services_available_ from multiple availability_handler threads without synchronization. One callback could update an entry in the map while another callback was iterating over the same map with std::all_of, resulting in a data race. This could also trigger the "subscribe on availability" path multiple times once the last services became available.

Fix

The fix was to guard all accesses to other_services_available_ with availability_mutex_, decide the all_services_are_available_ state while holding that lock, and gate the subscribe-on-availability flow so that it runs exactly once, using subscribed_on_availability_.
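A sketch of the fixed handler, assuming a hypothetical class that keeps only the availability map, the mutex and the once-flag from the test client; on_availability() returns true for the single callback that should perform the subscription.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <map>
#include <mutex>
#include <utility>

using service_t = std::uint16_t;

class initial_event_client_sketch {
public:
    explicit initial_event_client_sketch(std::initializer_list<service_t> services) {
        for (service_t s : services) {
            other_services_available_[s] = false;
        }
    }

    // Called from availability_handler threads.
    bool on_availability(service_t service, bool is_available) {
        std::lock_guard<std::mutex> lock(availability_mutex_);
        auto it = other_services_available_.find(service);
        if (it == other_services_available_.end()) {
            return false;
        }
        it->second = is_available;
        // Decided under the same lock that guards the updates above.
        const bool all_services_are_available = std::all_of(
            other_services_available_.begin(), other_services_available_.end(),
            [](const std::pair<const service_t, bool>& entry) { return entry.second; });
        if (all_services_are_available && !subscribed_on_availability_) {
            subscribed_on_availability_ = true;
            return true;  // caller subscribes exactly once
        }
        return false;
    }

private:
    std::mutex availability_mutex_;
    std::map<service_t, bool> other_services_available_;
    bool subscribed_on_availability_{false};
};
```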
Mitigate TCP port assignment log spam

Noticed that whenever an application is started, this log would get spammed until a free port was found:

lati::init: Could not bind, Address already in use,

Change logging so that it only prints the message after a few failed attempts.
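A sketch of the throttled logging, assuming a hypothetical threshold of five silent attempts (the PR only says "a few") and caller-supplied try_bind and log_warning callbacks in place of the real socket and logger.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical threshold: failures up to this count are not logged.
constexpr int kSilentBindAttempts = 5;

// Tries consecutive ports and returns the first one that binds, or 0 if none did.
std::uint16_t bind_first_free_port(std::uint16_t first, std::uint16_t last,
                                   const std::function<bool(std::uint16_t)>& try_bind,
                                   const std::function<void(const std::string&)>& log_warning) {
    int failures = 0;
    // 32-bit loop counter so that last == 65535 cannot overflow the loop.
    for (std::uint32_t port = first; port <= last; ++port) {
        if (try_bind(static_cast<std::uint16_t>(port))) {
            return static_cast<std::uint16_t>(port);
        }
        if (++failures > kSilentBindAttempts) {
            log_warning("Could not bind port " + std::to_string(port) +
                        ", Address already in use");
        }
    }
    return 0;
}
```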
This log line was removed with 8822a53 (accidentally, I assume), which should explain the absence of the SOMEIP ready log.
vSomeIP-Lib 3.7.2
- Handles the internal header w/ cmake
- add posttests_log_level_tests
- change log levels
- tests: Enables memory_test
- logger: fix missing endpoint in TERMINATE
- Solve test_boardnet_with_fake_sockets failures
- Remove trailing whitespaces
- Fix availability, host state, tcp address race in rmc
- Allow subscriptions as soon as the service is offered
- tests: add "broken behavior" tests
- Separate the two endpoint_manager
- Trigger routing_state_handler_ in all states
- fix drop of udp errors
- decouple endpoint_manager_base from routing_manager_base
- Remove flaky shutdown_tests
- Introduces hybrid mode
- Add robustness to flaky stress test
- Fix missing clean-up of pinged clients
- sei: upgrade log to error
- Fix offer test external
- only one mutex for serviceinfo
- Fix missing lock in udp_client_endpoint
- Fix UDP connection availability loop
- Nuke RS_DIAGNOSIS
- logger_ext header to include in android bp
- Move offered services into rmc/rmi
- tests: Extends ecu config with custom interface
- Fix offer test external v2
- tests: Disables SD configuration for IPC testing
- add preparation stage to serviceinfo
- Test routing state handler states with fake_socket
- Move events out of routing_manager_base
- fix log to file
- Move implementation of event_dispatcher
- Cleanup offer test big sd msg
- Hybrid mode follow up changes
- fix payload not being printed on android
- Fix initial event client data race
- mitigate TCP port assignment log spam
anaritarodrigues
approved these changes
Apr 20, 2026