This repository contains a solution to "The Way of Water" programming exercise from the 2024 Dynamic Programming and Optimal Control course. The problem involves navigating a drone safely across a discretized lake grid to its designated position, avoiding collisions with static drones and a pursuing swan, while accounting for environmental disturbances such as lake currents.
The goal is to compute an optimal policy for navigating a drone through a grid-like state space to reach a goal position with minimal cost. The cost is determined by:
- Time taken to reach the goal.
- Energy spent controlling the drone.
- The number of drones deployed due to failures (e.g., collisions).
- State Space: Defined by the position of the drone and the swan.
- Action Space: Represents eight possible movement directions or staying still.
- Disturbances: Include random currents, swan movement, and spawning of a new drone after failure.
- Dynamics: Governed by deterministic inputs and probabilistic disturbances.
The solution implements the following features to achieve efficient and correct computations:
-
Parallelized Computation of Transition Probabilities (
P_ij)- Leveraging Numpy indexing, this feature accelerates the computation of state transition probabilities, essential for dynamic programming algorithms.
-
Value Iteration with Gauss-Seidel Updates
- This implementation of the value iteration algorithm uses Gauss-Seidel updates for faster convergence. By updating states sequentially within each iteration, it reduces computational overhead compared to standard synchronous methods.
-
Asynchronous Policy Iteration
- Used in carefully selected scenarios, asynchronous policy iteration allows state updates without strict adherence to sequential order. This approach improves performance in environments with sparse dependencies.
-
Parallelized Stage Cost Computation
- The calculation of expected stage costs (
Q) is parallelized to handle the large state space efficiently. This ensures scalability and reduces runtime significantly.
- The calculation of expected stage costs (
ComputeTransitionProbabilities.py: Computes the transition probabilities for all state-action pairs.ComputeExpectedStageCosts.py: Computes the expected stage costs for all state-action pairs.Solver.py: Implements dynamic programming algorithms to compute the optimal cost and policy.utils.py: Contains shared utility functions such as state and action indexing.
To run the solution:
- Set up the conda environment:
conda env create -f environment.yml conda activate dpoc_pe
- Run the main script:
python3 main.py
- Use the visualization tool to debug and analyze results:
python3 visualization.py
- Correctness: The solution is validated against predefined test cases.
- Performance: Parallelization and algorithmic optimizations ensure efficient handling of large state spaces.
- Scoring: Submissions are ranked based on average runtime over multiple problem instances.
This repository is a submission for the programming exercise and adheres to the specified structure and requirements. If any further detail is needed you can refere to [email protected] or [email protected]