DeepCopy Wrapper
- class gymcts.gymcts_deepcopy_wrapper.DeepCopyMCTSGymEnvWrapper(env, action_mask_fn: str | Callable[[Env], ndarray] | None = None, buffer_length: int = 100, record_video: bool = False)
Bases:
GymctsABC, Wrapper
A wrapper for gym environments that implements the GymctsABC interface. It uses deep copies of the environment as its state representation. This is not the most efficient way to represent state; the wrapper is meant for checking whether your use case works well with the MCTS algorithm. If it does, consider implementing all GymctsABC methods in a more efficient way.
- action_masks() → ndarray | None
Returns the action masks for the environment. If action_mask_fn is not set, it returns None.
- Returns:
the action masks for the environment
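The constructor signature allows action_mask_fn to be a string, a callable, or None. The sketch below shows one plausible way such an argument could be resolved. It is an illustration, not the library's code: `ToyEnv`, `valid_action_mask`, and `resolve_action_masks` are hypothetical names, plain lists stand in for ndarray, and treating a string as the name of a method on the environment is an assumption.

```python
from typing import Callable, Optional, Sequence, Union

MaskFn = Union[str, Callable[[object], Sequence[bool]], None]


class ToyEnv:
    """Hypothetical environment with three actions; action 1 is blocked."""

    def valid_action_mask(self) -> list:
        return [True, False, True]


def resolve_action_masks(env: object, action_mask_fn: MaskFn) -> Optional[Sequence[bool]]:
    """Return the mask for env, or None when no mask function is configured."""
    if action_mask_fn is None:
        return None
    if isinstance(action_mask_fn, str):
        # Assumption: a string names a zero-argument method on the environment.
        return getattr(env, action_mask_fn)()
    # Otherwise treat it as a callable that receives the environment.
    return action_mask_fn(env)


toy_env = ToyEnv()
mask_by_name = resolve_action_masks(toy_env, "valid_action_mask")
mask_by_callable = resolve_action_masks(toy_env, lambda e: e.valid_action_mask())
no_mask = resolve_action_masks(toy_env, None)
```

In this sketch the string and callable forms yield the same mask, and the None case passes straight through.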
- get_state() → Any
Returns the current state of the environment as a deepcopy of the environment.
- Returns:
a deepcopy of the environment
- get_valid_actions() → list[int]
Returns a list of valid actions for the current state of the environment. This is used to obtain the potential actions/subsequent states for the MCTS tree.
- Returns:
the list of valid actions
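As a rough illustration of how a boolean action mask can be turned into a list of valid action indices, consider the sketch below. The helper `valid_actions_from_mask` and its `n_actions` parameter are hypothetical, and the fallback to "all actions are valid" when no mask exists is an assumption rather than documented behavior.

```python
from typing import Optional, Sequence


def valid_actions_from_mask(mask: Optional[Sequence[bool]], n_actions: int) -> list:
    """Return indices of allowed actions; with no mask, every action is allowed."""
    if mask is None:
        # Assumption: without a mask, all actions of a discrete space are valid.
        return list(range(n_actions))
    return [action for action, allowed in enumerate(mask) if allowed]


masked_actions = valid_actions_from_mask([True, False, True, False], n_actions=4)
unmasked_actions = valid_actions_from_mask(None, n_actions=3)
```

Each index returned this way would correspond to one candidate child node when the tree is expanded.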
- is_terminal() → bool
Returns True if the environment is in a terminal state, False otherwise.
- Returns:
True if the environment is in a terminal state, False otherwise.
- load_state(state: Any) → None
The load_state method is not implemented as a true in-place restore of the wrapper. Instead, the state is loaded by replacing the env with the 'state' (the copy provided by 'get_state'), because 'self' cannot be replaced with another object from inside a method (as far as I know).
- Parameters:
state – a deepcopy of the environment
- Returns:
None
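The snapshot-and-restore idea behind get_state and load_state can be sketched with the standard library alone. `CounterEnv` and `SnapshotWrapper` are hypothetical stand-ins, not the wrapper's actual code; the sketch only assumes that the state is a deep copy of the wrapped env and that loading swaps the `env` attribute.

```python
import copy


class CounterEnv:
    """Hypothetical stand-in for an environment with mutable internal state."""

    def __init__(self) -> None:
        self.position = 0
        self.history = []

    def step(self, action: int) -> None:
        self.position += action
        self.history.append(action)


class SnapshotWrapper:
    """Illustrative wrapper that uses deep copies of the env as its state."""

    def __init__(self, env: CounterEnv) -> None:
        self.env = env

    def get_state(self) -> CounterEnv:
        # The snapshot shares no mutable objects with the live env.
        return copy.deepcopy(self.env)

    def load_state(self, state: CounterEnv) -> None:
        # Rebinding `self` inside a method would only change a local name,
        # so the wrapped env attribute is swapped instead.
        self.env = state


snapshot_wrapper = SnapshotWrapper(CounterEnv())
snapshot_wrapper.env.step(2)
snapshot = snapshot_wrapper.get_state()   # state after one step
snapshot_wrapper.env.step(5)              # move on; the snapshot is unaffected
position_before_restore = snapshot_wrapper.env.position
snapshot_wrapper.load_state(snapshot)
position_after_restore = snapshot_wrapper.env.position
```

One consequence of this particular sketch: load_state stores the passed object itself, so stepping afterwards mutates the snapshot. A caller who wants to restore the same snapshot several times would copy it again before loading.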
- rollout() → float
Performs a rollout from the current state of the environment and returns the return (sum of rewards) of the rollout.
- Returns:
the return of the rollout
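To make "return of the rollout" concrete, here is a minimal sketch of a rollout that plays until the episode ends and sums the rewards. The documentation above does not say how actions are chosen during a rollout, so the uniformly random policy is an assumption; `CountdownEnv` and `random_rollout` are hypothetical names.

```python
import random


class CountdownEnv:
    """Hypothetical episodic env: each step pays 1.0, and the episode ends after `horizon` steps."""

    def __init__(self, horizon: int = 5) -> None:
        self.horizon = horizon
        self.steps_taken = 0

    def valid_actions(self) -> list:
        return [0, 1]

    def is_terminal(self) -> bool:
        return self.steps_taken >= self.horizon

    def step(self, action: int) -> float:
        self.steps_taken += 1
        return 1.0


def random_rollout(env: CountdownEnv, rng: random.Random) -> float:
    """Play uniformly random valid actions until the episode ends; return the summed reward."""
    episode_return = 0.0
    while not env.is_terminal():
        action = rng.choice(env.valid_actions())
        episode_return += env.step(action)
    return episode_return


rollout_return = random_rollout(CountdownEnv(horizon=5), random.Random(0))
```

Because every step in this toy env pays the same reward, the return depends only on the horizon, which makes the sketch easy to check by hand.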
- step(action: WrapperActType) → tuple[WrapperObsType, SupportsFloat, bool, bool, dict[str, Any]]
Performs a step in the environment. The wrapper also uses this method to record the new state and the action taken, which is what makes the terminal state functionality (is_terminal) possible.
- Parameters:
action – action to perform in the environment
- Returns:
the step tuple of the environment (obs, reward, terminated, truncated, info)
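The link between step and is_terminal can be pictured as a wrapper that remembers whether the most recent step ended the episode. This is a sketch under stated assumptions, not the library's implementation: `TwoStepEnv` and `TerminalTrackingWrapper` are hypothetical, and counting both termination and truncation as terminal is an assumption.

```python
class TwoStepEnv:
    """Hypothetical env that terminates on its second step."""

    def __init__(self) -> None:
        self.t = 0

    def step(self, action: int):
        self.t += 1
        terminated = self.t >= 2
        # Step tuple layout: (obs, reward, terminated, truncated, info)
        return self.t, 1.0, terminated, False, {}


class TerminalTrackingWrapper:
    """Illustrative wrapper that remembers whether the last step ended the episode."""

    def __init__(self, env: TwoStepEnv) -> None:
        self.env = env
        self._terminal = False
        self._last_action = None

    def step(self, action: int):
        step_tuple = self.env.step(action)
        _obs, _reward, terminated, truncated, _info = step_tuple
        self._last_action = action
        # Assumption: both termination and truncation count as terminal.
        self._terminal = terminated or truncated
        return step_tuple

    def is_terminal(self) -> bool:
        return self._terminal


tracking_wrapper = TerminalTrackingWrapper(TwoStepEnv())
terminal_flags = [tracking_wrapper.is_terminal()]
for chosen_action in (0, 1):
    tracking_wrapper.step(chosen_action)
    terminal_flags.append(tracking_wrapper.is_terminal())
```

Here the step tuple is passed through unchanged, so code that calls step directly sees the same five values the underlying env produced.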