8 changes: 4 additions & 4 deletions examples/rl/actor_critic_cartpole.py
@@ -45,7 +45,7 @@
 import os

 os.environ["KERAS_BACKEND"] = "tensorflow"
-import gym
+import gymnasium as gym
 import numpy as np
 import keras
 from keras import ops
@@ -98,13 +98,13 @@
 episode_count = 0

 while True:  # Run until solved
-    state = env.reset()[0]
+    obs, _ = env.reset()
Contributor (review comment, critical):
The env.reset() call now returns the observation into the obs variable, but the rest of the loop expects this value to be in the state variable. This will cause a NameError on line 106 when state is used before it's assigned a value. To fix this, the observation should be assigned to state.

Suggested change:
-    obs, _ = env.reset()
+    state, _ = env.reset()
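
For reference, a minimal sketch of the loop head with this fix applied, assuming gymnasium's reset()/step() API (the environment id and the random action below are illustrative, not part of the PR):

    import gymnasium as gym

    env = gym.make("CartPole-v1")  # assumed env id, for illustration only
    state, _ = env.reset()  # gymnasium's reset() returns (observation, info)
    # gymnasium's step() returns a 5-tuple: the old `done` flag is split
    # into `terminated` and `truncated`
    state, reward, terminated, truncated, _ = env.step(env.action_space.sample())
    done = terminated or truncated
    env.close()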

     episode_reward = 0
     with tf.GradientTape() as tape:
         for timestep in range(1, max_steps_per_episode):

-            state = ops.convert_to_tensor(state)
-            state = ops.expand_dims(state, 0)
+            state = tf.convert_to_tensor(state)
+            state = tf.expand_dims(state, 0)
Contributor (review comment on lines +106 to +107, medium):

This change replaces keras.ops with direct tf calls. This makes the code backend-specific and inconsistent with other parts of the script that use keras.ops (e.g., lines 116 and 160). For Keras examples, it's best practice to use the backend-agnostic keras.ops API where possible.

The ValueError mentioned in the PR description might have been a symptom of the uninitialized state variable, which is addressed in another comment. After fixing that issue, keras.ops should be used here for consistency.

Suggested change:
-            state = tf.convert_to_tensor(state)
-            state = tf.expand_dims(state, 0)
+            state = ops.convert_to_tensor(state)
+            state = ops.expand_dims(state, 0)
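
To illustrate the backend-agnostic pattern, a minimal sketch (the zero observation and its length of 4, matching CartPole's state, are assumed for demonstration):

    import numpy as np
    from keras import ops

    state = np.zeros(4, dtype="float32")  # stand-in for an env observation
    state = ops.convert_to_tensor(state)  # dispatches to the active backend
    state = ops.expand_dims(state, 0)     # add a batch axis -> shape (1, 4)
    print(ops.shape(state))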


             # Predict action probabilities and estimated future rewards
             # from environment state
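
These comment lines introduce the model call that consumes the batched state. A self-contained sketch of that forward pass, with layer sizes assumed for illustration rather than taken from this diff:

    import numpy as np
    import keras
    from keras import layers, ops

    num_inputs, num_actions, num_hidden = 4, 2, 128  # assumed sizes
    inputs = layers.Input(shape=(num_inputs,))
    common = layers.Dense(num_hidden, activation="relu")(inputs)
    actor = layers.Dense(num_actions, activation="softmax")(common)  # policy head
    critic = layers.Dense(1)(common)  # value head
    model = keras.Model(inputs=inputs, outputs=[actor, critic])

    state = ops.convert_to_tensor(np.zeros(num_inputs, dtype="float32"))
    state = ops.expand_dims(state, 0)
    action_probs, critic_value = model(state)
    # NumPy conversion here relies on the TensorFlow backend set at the top
    action = np.random.choice(num_actions, p=np.squeeze(action_probs))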