DeepCopy Wrapper

class gymcts.gymcts_deepcopy_wrapper.DeepCopyMCTSGymEnvWrapper(env, action_mask_fn: str | Callable[[Env], ndarray] | None = None, buffer_length: int = 100, record_video: bool = False)

Bases: GymctsABC, Wrapper

A wrapper for gym environments that implements the GymctsABC interface. It uses deep copies of the environment as its state representation. Note that this is not the most efficient state representation: it is intended as a quick way to check whether your use case works well with the MCTS algorithm. If it does, consider implementing all GymctsABC methods in a more efficient way.
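The idea of using a deep copy as a state can be illustrated with the standard library alone. The following is a minimal sketch, not the wrapper's actual code; `CounterEnv` is a hypothetical toy class that is not part of gymcts.

```python
import copy


class CounterEnv:
    """Hypothetical toy environment (assumption, not part of gymcts)."""

    def __init__(self):
        self.position = 0
        self.history = []  # nested mutable attribute

    def step(self, action):
        self.position += action
        self.history.append(action)
        return self.position


env = CounterEnv()
env.step(1)

# A deep copy is a full, independent snapshot of the environment.
snapshot = copy.deepcopy(env)

# Advancing the original leaves the snapshot untouched,
# including its nested mutable attributes.
env.step(5)

print(snapshot.position, snapshot.history)  # 1 [1]
print(env.position, env.history)            # 6 [1, 5]
```

Copying every attribute on each snapshot is what makes this approach simple but comparatively slow for large environments.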

action_masks() → ndarray | None

Returns the action masks for the environment, or None if action_mask_fn is not set.

Returns:

the action masks for the environment

get_state() → Any

Returns the current state of the environment as a deep copy of the environment.

Returns:

a deep copy of the environment

get_valid_actions() → list[int]

Returns a list of valid actions for the current state of the environment. This is used to obtain the potential actions (and thus the subsequent states) for the MCTS tree.

Returns:

the list of valid actions
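One plausible way to turn an action mask into such a list is sketched below. This is an assumption about how the two methods relate, not the wrapper's confirmed implementation; `valid_actions_from_mask` and the `n_actions` fallback are hypothetical, and a plain list stands in for the ndarray mask.

```python
def valid_actions_from_mask(mask, n_actions):
    """Return the indices of allowed actions.

    Assumption: a truthy mask entry means the action is allowed, and a
    missing mask (None) means every action in range(n_actions) is valid.
    """
    if mask is None:
        return list(range(n_actions))
    return [action for action, allowed in enumerate(mask) if allowed]


# Example: actions 1 and 2 are masked out.
mask = [1, 0, 0, 1, 1]
valid = valid_actions_from_mask(mask, n_actions=5)
print(valid)  # [0, 3, 4]
```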

is_terminal() → bool

Returns True if the environment is in a terminal state, False otherwise.

Returns:

True if the environment is in a terminal state, False otherwise.

load_state(state: Any) → None

The load_state method is not implemented. Instead, a state is loaded by replacing the env with the ‘state’ (the copy provided by ‘get_state’), because ‘self’ cannot be replaced with another object from inside a method.

Parameters:

state – a deepcopy of the environment

Returns:

None
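The remark about ‘self’ can be checked with a small experiment. The sketch below uses a hypothetical `ToyEnv` class (not part of gymcts) to show that rebinding `self` inside a method has no effect on the caller's object, and that restoring a state therefore means rebinding the variable that holds the environment.

```python
import copy


class ToyEnv:
    """Hypothetical toy environment (assumption, not part of gymcts)."""

    def __init__(self):
        self.position = 0

    def step(self, action):
        self.position += action

    def broken_load_state(self, state):
        # Rebinds only the local name `self`; the caller's object is unchanged.
        self = state


env = ToyEnv()
state = copy.deepcopy(env)   # snapshot taken at position 0
env.step(3)

env.broken_load_state(state)
position_after_broken_load = env.position    # still 3

# Restore on the caller side by replacing the environment object.
# Copying again keeps `state` reusable for later restores.
env = copy.deepcopy(state)
position_after_replacement = env.position    # back to 0
```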

rollout() → float

Performs a rollout from the current state of the environment and returns the return (sum of rewards) of the rollout.

Returns:

the return of the rollout
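A rollout of this kind can be sketched as follows. The uniformly random action choice is an assumption (the description above does not specify the rollout policy), and `CountdownEnv` and `random_rollout` are hypothetical names used only for illustration.

```python
import random


class CountdownEnv:
    """Hypothetical toy environment: the episode ends after a fixed number of steps."""

    def __init__(self, steps_left=5):
        self.steps_left = steps_left

    def get_valid_actions(self):
        return [0, 1]

    def is_terminal(self):
        return self.steps_left == 0

    def step(self, action):
        self.steps_left -= 1
        return float(action)  # reward equals the chosen action


def random_rollout(env, rng):
    """Play random valid actions until termination and sum the rewards."""
    episode_return = 0.0
    while not env.is_terminal():
        action = rng.choice(env.get_valid_actions())
        episode_return += env.step(action)
    return episode_return


rollout_env = CountdownEnv(steps_left=5)
rollout_return = random_rollout(rollout_env, random.Random(0))
```

In an MCTS setting, such a rollout would typically be run on a copy of the environment so that the node's own state is not consumed.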

step(action: WrapperActType) → tuple[WrapperObsType, SupportsFloat, bool, bool, dict[str, Any]]

Performs a step in the environment. The wrapper overrides this method so that it can record the new state and the action taken, which is what allows it to report whether the environment has reached a terminal state.

Parameters:

action – action to perform in the environment

Returns:

the step tuple of the environment (obs, reward, terminated, truncated, info)
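How a wrapper might use step to support is_terminal is sketched below. This is an assumption about the mechanism, not the wrapper's confirmed code; `ThreeStepEnv` and `TerminalTrackingWrapper` are hypothetical, and treating truncation as terminal is a design choice made for this sketch.

```python
class ThreeStepEnv:
    """Hypothetical toy environment that terminates after three steps."""

    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        terminated = self.t >= 3
        # (obs, reward, terminated, truncated, info)
        return self.t, 1.0, terminated, False, {}


class TerminalTrackingWrapper:
    """Hypothetical wrapper that remembers whether the episode has ended."""

    def __init__(self, env):
        self.env = env
        self._terminal = False

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Assumption: both termination and truncation end the episode.
        self._terminal = terminated or truncated
        return obs, reward, terminated, truncated, info

    def is_terminal(self):
        return self._terminal


wrapped = TerminalTrackingWrapper(ThreeStepEnv())
terminal_flags = []
for _ in range(3):
    wrapped.step(0)
    terminal_flags.append(wrapped.is_terminal())
```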