Description
Version information
Observed behavior
- The federation manager never achieves the synchronization point readyToResign
- A federate will not receive the simulation end interaction until one timestep after the end time.
- A federate will never achieve the synchronization point readyToResign
Expected behavior
The federation manager never achieves the synchronization point readyToResign
The terminateSimulation method is called when the federation manager exits either due to stop time or the terminate command. This method invokes killEntireFederation, which calls System.exit(). As a result, everything after the terminateSimulation call is dead code.
cpswt-core/cpswt-core/federation-manager/src/main/java/org/cpswt/hla/FederationManager.java
Lines 685 to 726 in f7d580b
```java
public void terminateSimulation() {
    _killingFederation = true;
    recordMainExecutionLoopEndTime();
    this.setFederateState(FederateState.TERMINATING);
    synchronized (super.lrc) {
        try {
            SimEnd e = new SimEnd();
            e.set_originFed(getFederateId());
            e.set_sourceFed(getFederateId());
            double tmin = time.getTime() + super.getLookAhead();
            e.sendInteraction(getLRC(), tmin);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    running = false;
    paused = false;
    // Wait for 2 seconds for SimEnd to reach others
    CpswtUtils.sleep(CpswtDefaults.SimEndWaitingTimeMillis);
    logger.info("Simulation terminated");
    this.setFederateState(FederateState.TERMINATED);
    // Wait for 10 seconds for Simulation to gracefully exit
    CpswtUtils.sleep(2000);
    // If simulation has still not exited gracefully, run kill command
    killEntireFederation();
}

public void killEntireFederation() {
    _killingFederation = true;
    recordMainExecutionLoopEndTime();
    System.exit(0);
}
```
Because of this System.exit() call, none of the resignation code below is ever executed. The System.exit() call should be removed, and the federation manager life cycle adjusted so that it exits without error in both cases: stop time and the terminate command.
cpswt-core/cpswt-core/federation-manager/src/main/java/org/cpswt/hla/FederationManager.java
Lines 525 to 533 in f7d580b
```java
prepareForFederatesToResign();
if(useSyncPoints) {
    logger.info("Waiting for \"ReadyToResign\" ... ");
    readyToResign();
    logger.info("Done with resign");
}
waitForFederatesToResign();
```
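As a rough illustration of the proposed life cycle, the sketch below replaces the System.exit() call with a flag that ends the main loop, so the resignation steps after the loop become reachable. The class `ManagerLifecycleSketch`, its `run` method, and the `trace` list are hypothetical stand-ins, not part of cpswt-core; the strings only mirror the method names quoted above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the federation manager life cycle; not the CPSWT class. */
class ManagerLifecycleSketch {
    private volatile boolean running = true;
    private final List<String> trace = new ArrayList<>();

    /** Requests shutdown by clearing the loop flag instead of calling System.exit(). */
    void terminateSimulation() {
        trace.add("sendSimEnd");
        running = false;
    }

    /** Runs the main loop, then the resignation steps that follow it. */
    List<String> run(int stopStep) {
        int step = 0;
        while (running) {
            trace.add("step " + step);
            step++;
            if (step >= stopStep) {
                terminateSimulation(); // stop time reached
            }
        }
        // Reachable now, because nothing above terminated the JVM.
        trace.add("prepareForFederatesToResign");
        trace.add("readyToResign");
        trace.add("waitForFederatesToResign");
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(new ManagerLifecycleSketch().run(2));
    }
}
```

The same flag could presumably be cleared by the terminate command, which would give both exit paths the same resignation sequence.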
A federate will not receive the simulation end interaction until one timestep after the end time.
The federation manager sends simulation end as a TSO message:
cpswt-core/cpswt-core/federation-manager/src/main/java/org/cpswt/hla/FederationManager.java
Lines 693 to 697 in f7d580b
```java
SimEnd e = new SimEnd();
e.set_originFed(getFederateId());
e.set_sourceFed(getFederateId());
double tmin = time.getTime() + super.getLookAhead();
e.sendInteraction(getLRC(), tmin);
```
When SynchronizedFederate receives the simulation end message (in the next timestep), it stores it in the variable _receivedSimEnd. This variable is processed in a method called enteredTimeGrantedState.
cpswt-core/cpswt-core/federate-base/src/main/java/org/cpswt/hla/SynchronizedFederate.java
Lines 1036 to 1067 in f7d580b
```java
protected void enteredTimeGrantedState() {
    if(_receivedSimEnd != null) {
        handleIfSimEnd(SimEnd.get_handle(), _receivedSimEnd, null);
    }
}

protected void handleIfSimEnd(int interactionClass, ReceivedInteraction theInteraction, LogicalTime theTime) {
    if (SimEnd.match(interactionClass)) {
        logger.info("{}: SimEnd interaction received, exiting...", getFederateId());
        try {
            // getLRC().tick();
            getLRC().resignFederationExecution(ResignAction.DELETE_OBJECTS);
        } catch (Exception e) {
            logger.error("Error during resigning federate: {}", getFederateId());
            logger.error(e.getMessage());
        }
        // Wait for 10 seconds for Federation Manager to recognize that the federate has resigned.
        try {
            Thread.sleep(CpswtDefaults.SimEndWaitingTimeMillis);
        } catch (Exception e) {
            logger.error(e.getMessage());
        }
        // TODO: CONSIDER SETTING UP A SHUTDOWN HOOK
        // this one will terminate the JVM not only the current process
        Runtime.getRuntime().exit(0);
        // Exit
        System.exit(0);
    }
}
```
The enteredTimeGrantedState method is invoked by each Java federate after a time grant is received from the RTI.
The federation manager (as described in the prior issue) calls System.exit() at t = EndTime. However, the federates will not receive simulation end until t = (EndTime + 1). As a result, the federates' execution is blocked until the federation manager exits. Because the federation manager exits ungracefully (it never resigns; the resignation is in the dead code block), the federates have to wait until the JGroups heartbeat messages detect the federation manager as crashed. The end result is that a federate does not advance to t = (EndTime + 1) and detect simulation end until roughly 30 seconds after the simulation has terminated.
It is likely impossible to have all federates exit in the same logical timestep (due to different step sizes). It seems to make the most sense that simulation end should be sent as an RO message without a timestamp. Then before the federation manager (and any federate) waits for synchronization on readyToResign, it should turn off time regulation to prevent it from blocking the logical time execution of the federates with abnormal step sizes.
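To make the timing argument concrete, here is a toy model (not RTI code) of when a federate would handle a TSO SimEnd. It assumes a federate is only granted times that are multiples of its step size, and that a TSO message is handled at the first grant at or after its timestamp. The class `SimEndTimingSketch`, its methods, and the lookahead value of 0.1 are all assumptions made for illustration.

```java
/** Toy model of TSO delivery timing; hypothetical, not part of cpswt-core or the RTI. */
class SimEndTimingSketch {
    /** First grant time (a multiple of stepSize) at or after the given timestamp. */
    static double firstGrantAtOrAfter(double timestamp, double stepSize) {
        return Math.ceil(timestamp / stepSize) * stepSize;
    }

    /** Logical time at which a federate would handle a TSO SimEnd sent at endTime. */
    static double tsoHandlingTime(double endTime, double lookahead, double stepSize) {
        // The manager stamps SimEnd with its current time plus its lookahead.
        return firstGrantAtOrAfter(endTime + lookahead, stepSize);
    }

    public static void main(String[] args) {
        System.out.println(tsoHandlingTime(10.0, 0.1, 1.0)); // step size 1.0
        System.out.println(tsoHandlingTime(10.0, 0.1, 4.0)); // step size 4.0
    }
}
```

Under these assumptions, with EndTime = 10 a federate stepping by 1.0 handles SimEnd at 11.0 and one stepping by 4.0 handles it at 12.0, so the federates cannot all observe SimEnd in the same logical timestep. An RO message carries no timestamp and would not be tied to these grant times.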
A federate will never achieve the synchronization point readyToResign
The enteredTimeGrantedState method that is invoked by SynchronizedFederate when it receives simulation end leads to another call to System.exit().
cpswt-core/cpswt-core/federate-base/src/main/java/org/cpswt/hla/SynchronizedFederate.java
Lines 1036 to 1067 in f7d580b
```java
protected void enteredTimeGrantedState() {
    if(_receivedSimEnd != null) {
        handleIfSimEnd(SimEnd.get_handle(), _receivedSimEnd, null);
    }
}

protected void handleIfSimEnd(int interactionClass, ReceivedInteraction theInteraction, LogicalTime theTime) {
    if (SimEnd.match(interactionClass)) {
        logger.info("{}: SimEnd interaction received, exiting...", getFederateId());
        try {
            // getLRC().tick();
            getLRC().resignFederationExecution(ResignAction.DELETE_OBJECTS);
        } catch (Exception e) {
            logger.error("Error during resigning federate: {}", getFederateId());
            logger.error(e.getMessage());
        }
        // Wait for 10 seconds for Federation Manager to recognize that the federate has resigned.
        try {
            Thread.sleep(CpswtDefaults.SimEndWaitingTimeMillis);
        } catch (Exception e) {
            logger.error(e.getMessage());
        }
        // TODO: CONSIDER SETTING UP A SHUTDOWN HOOK
        // this one will terminate the JVM not only the current process
        Runtime.getRuntime().exit(0);
        // Exit
        System.exit(0);
    }
}
```
As a result, all code after the main processing loop of the code-generated Java federates is dead code. In this particular case, I do not believe the dead code even includes readyToResign; I don't think Java federates ever call the method readyToResign. However, the System.exit() call does prevent the execution of any user code appended after the main execution loop.
Again, the System.exit() call should be removed, and the life cycle of the Java federates should be adjusted to exit through the return statement in the main method of the Java implementation. This will require the while condition of the Java federates' main loop to be set to something that can eventually evaluate to false.
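One possible shape for such a loop is sketched below, assuming the SimEnd handler only records that the interaction arrived and leaves the exit to the main loop. `FederateLoopSketch`, `onSimEnd`, and `execute` are hypothetical names for illustration and do not correspond to the generated federate code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical federate loop that ends via its while condition, not System.exit(). */
class FederateLoopSketch {
    private final AtomicBoolean simEndReceived = new AtomicBoolean(false);
    private int stepsExecuted = 0;
    private boolean cleanupRan = false;

    /** Stand-in for the SimEnd handler: it records the event and returns. */
    void onSimEnd() {
        simEndReceived.set(true);
    }

    /** Main loop; simEndAtStep simulates the step at which SimEnd arrives. */
    void execute(int simEndAtStep) {
        while (!simEndReceived.get()) {
            stepsExecuted++;
            if (stepsExecuted == simEndAtStep) {
                onSimEnd();
            }
        }
        // Code placed after the loop (resignation, user cleanup) now runs.
        cleanupRan = true;
    }

    int stepsExecuted() { return stepsExecuted; }

    boolean cleanupRan() { return cleanupRan; }

    public static void main(String[] args) {
        FederateLoopSketch federate = new FederateLoopSketch();
        federate.execute(3);
        System.out.println(federate.stepsExecuted() + " " + federate.cleanupRan());
        // main returns normally here instead of terminating the JVM.
    }
}
```

An AtomicBoolean is used because, in a real federate, the handler may run on a different thread than the main loop.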
Steps to reproduce issue
Terminate a federation and observe the shutdown behavior of each federate.