Action History Wrapper

class gymcts.gymcts_action_history_wrapper.ActionHistoryMCTSGymEnvWrapper(env, action_mask_fn: str | Callable[[Env], ndarray] | None = None, buffer_length: int = 100)

Bases: GymctsABC, Wrapper

A wrapper for gym environments that implements the GymctsABC interface, using the action history as the state representation. The state is the list of all actions taken in the environment since the last reset, and the load_state method uses it to restore the environment.

Please note that this is not the most efficient way to implement the state representation. The wrapper is meant for checking whether your use case works well with the MCTS algorithm. If it does, consider implementing all GymctsABC methods in a more efficient way.

action_masks() → ndarray | None

Returns the action masks for the environment. If the action_mask_fn is not set, it returns None.

Returns:: the action mask as an ndarray, or None if action_mask_fn is not set

get_state() → list[int]

Returns the current state of the environment. The state is the list of all actions that have been taken in the environment since the last reset.

Returns:: a list of actions taken in the environment

get_valid_actions() → list[int]

Returns a list of valid actions for the current state of the environment.

Returns:: a list of valid actions
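As an illustration of how an action mask relates to a list of valid actions, here is a minimal sketch. It assumes a discrete action space with actions numbered 0 to n-1 and a mask in which a truthy entry marks an allowed action. The helper name valid_actions_from_mask is hypothetical, a plain list stands in for the ndarray, and the fallback to all actions when no mask exists is an assumption rather than confirmed behavior of the wrapper.

```python
from typing import Optional, Sequence


def valid_actions_from_mask(mask: Optional[Sequence[int]], n_actions: int) -> list[int]:
    """Hypothetical helper: turn a 0/1 action mask into a list of action indices."""
    if mask is None:
        # Assumption: without a mask, every discrete action is treated as valid.
        return list(range(n_actions))
    # Keep the index of every entry that is marked as allowed.
    return [action for action, allowed in enumerate(mask) if allowed]


# Actions 0 and 3 are allowed, actions 1 and 2 are masked out.
example = valid_actions_from_mask([1, 0, 0, 1], n_actions=4)
```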

is_terminal() → bool

Returns True if the environment is in a terminal state, False otherwise.

Returns:: True if the environment is in a terminal state, False otherwise

load_state(state: list[int]) → None

Loads the state of the environment. The state is a list of actions taken in the environment.

The environment is reset and all actions in the state are performed in order to restore the environment to the same state.

This works only for deterministic environments: replaying the same actions must reproduce the same state.

Parameters:: state – the state to load
Returns:: None
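The reset-and-replay idea can be sketched with a toy example. This is not the gymcts implementation: CounterEnv and restore_by_replay are hypothetical names, and the toy environment only mimics the five-element step tuple of a gym environment.

```python
class CounterEnv:
    """Hypothetical deterministic toy environment (not part of gymcts)."""

    def __init__(self, target: int = 5):
        self.target = target
        self.total = 0

    def reset(self) -> int:
        self.total = 0
        return self.total

    def step(self, action: int):
        # Deterministic transition: the action is added to the running total.
        self.total += action
        terminated = self.total >= self.target
        reward = 1.0 if terminated else 0.0
        return self.total, reward, terminated, False, {}


def restore_by_replay(env: CounterEnv, actions: list[int]) -> None:
    """Reset the environment, then replay the recorded actions in order."""
    env.reset()
    for action in actions:
        env.step(action)


env = CounterEnv()
env.reset()
history = []
for action in [1, 2]:
    env.step(action)
    history.append(action)  # the action history doubles as the state

snapshot = list(history)    # copy, so later steps do not alter the saved state
env.step(1)                 # move past the snapshot
restore_by_replay(env, snapshot)
restored_total = env.total  # back to the total reached after actions 1 and 2
```

In this sketch, restoring costs one environment step per recorded action, so the cost grows with the length of the history. With a stochastic step function the replay could end in a different state than the one that was saved.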

rollout() → float

Performs a random rollout from the current state of the environment and returns the return (sum of rewards) of the rollout.

Returns:: the return of the rollout
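A random rollout can be sketched as follows. The toy environment FixedHorizonEnv and the function random_rollout are hypothetical and only illustrate the idea of summing rewards over uniformly random actions until the episode ends. How the wrapper itself samples actions is not specified here.

```python
import random


class FixedHorizonEnv:
    """Hypothetical toy environment: the episode ends after a fixed number of steps."""

    def __init__(self, horizon: int = 4):
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int):
        self.t += 1
        reward = float(action)  # action 1 pays 1.0, action 0 pays 0.0
        terminated = self.t >= self.horizon
        return self.t, reward, terminated, False, {}


def random_rollout(env, actions, rng: random.Random) -> float:
    """Play uniformly random actions until the episode ends; return the summed reward."""
    episode_return = 0.0
    done = False
    while not done:
        action = rng.choice(actions)
        _, reward, terminated, truncated, _ = env.step(action)
        episode_return += reward
        done = terminated or truncated
    return episode_return


env = FixedHorizonEnv(horizon=4)
env.reset()
rollout_return = random_rollout(env, actions=[0, 1], rng=random.Random(0))
```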

step(action: WrapperActType) → tuple[WrapperObsType, SupportsFloat, bool, bool, dict[str, Any]]

Performs a step in the environment. It adds the action to the action history and updates the terminal flag.

Parameters:: action – action to perform in the environment
Returns:: the step tuple of the environment (obs, reward, terminated, truncated, info)
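The bookkeeping described for step can be sketched with a small stand-in class. HistoryRecorder and TwoStepEnv are hypothetical names, and treating the terminal flag as "terminated or truncated" is an assumption, not something confirmed by the gymcts source.

```python
class TwoStepEnv:
    """Hypothetical toy environment that terminates after two steps."""

    def __init__(self):
        self.t = 0

    def step(self, action: int):
        self.t += 1
        return self.t, 0.0, self.t >= 2, False, {}


class HistoryRecorder:
    """Illustrative stand-in for the wrapper's bookkeeping (not the gymcts source)."""

    def __init__(self, env):
        self.env = env
        self.history = []
        self.terminal = False

    def step(self, action: int):
        step_tuple = self.env.step(action)
        self.history.append(action)  # record the action in the history
        _, _, terminated, truncated, _ = step_tuple
        # Assumption: the episode counts as terminal if it terminated or was truncated.
        self.terminal = terminated or truncated
        return step_tuple


recorder = HistoryRecorder(TwoStepEnv())
recorder.step(0)
flag_after_first = recorder.terminal
recorder.step(1)
```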