MCTS Tree Components¶
- class gymcts.gymcts_node.GymctsNode(action: int | None, parent: TGymctsNode | None, env_reference: GymctsABC)[source]¶
Bases:
object- best_action_weight: float = 0.05¶
- get_best_action() int[source]¶
Returns the best action of the node. The best action is the action that has the highest score. The score is calculated using the get_score() method. The best action is the action that has the highest score. The best action is the action that has the highest score.
- Returns:
the best action of the node.
- get_max_value() float[source]¶
Returns the maximum value of the node. The maximum value is the maximum value of the node.
- Returns:
the maximum value of the node.
- get_random_child() TGymctsNode[source]¶
Returns a random child of the node. A random child is a child that is selected randomly from the list of children. :return:
- get_root() TGymctsNode[source]¶
Returns the root node of the tree. The root node is the node that has no parent.
- Returns:
the root node of the tree.
- get_score() float[source]¶
Returns the score of the node. The score is calculated using the mean value and the maximum value of the node. The score is calculated using the formula: score = (1 - a) * mean_value + a * max_value where a is the best action weight.
- Returns:
the score of the node.
- is_leaf() bool[source]¶
Returns true if the node is a leaf node. A leaf node is a node that has no children. A leaf node is a node that has no children.
- Returns:
true if the node is a leaf node, false otherwise.
- is_root() bool[source]¶
Returns true if the node is a root node. A root node is a node that has no parent.
- Returns:
true if the node is a root node, false otherwise.
- max_tree_depth()[source]¶
Returns the maximum depth of the tree. The depth of a node is the number of edges from the node to the root node.
- Returns:
the maximum depth of the tree.
- max_value: float = -inf¶
- mean_value: float = 0¶
- min_value: float = inf¶
- n_children_recursively()[source]¶
Returns the number of children of the node recursively. The number of children of a node is the number of children of the node plus the number of children of all children of the node.
- Returns:
the number of children of the node recursively.
- score_variate: Literal['UCT_v0', 'UCT_v1', 'UCT_v2'] = 'UCT_v0'¶
- state: Any = None¶
- terminal: bool = False¶
- traverse_nodes() Generator[TGymctsNode, None, None][source]¶
Traverse the tree and yield all nodes in the tree.
- Returns:
a generator that yields all nodes in the tree.
- tree_policy_score()[source]¶
TODO: update docstring
The score for an action that would transition between the parent and child. For vanilla MCTS, this is the UCB1 score.
The UCB1 score is calculated using the formula:
UCT (Upper Confidence Bound applied to Trees) exploration terms:
- UCT_v0:
c * √( 2 * ln(N(s)) / N(s,a) )
- UCT_v1:
c * √( ln(N(s)) / (1 + N(s,a)) )
- UCT_v2:
c * ( √(N(s)) / (1 + N(s,a)) )
- Where:
N(s) = number of times state s has been visited N(s,a) = number of times action a was taken from state s c = exploration constant
where: - mean_value is the mean value of the node - c is a constant that controls the exploration-exploitation trade-off (GymctsNode.ubc_c) - parent_visit_count is the number of times the parent node has been visited - visit_count is the number of times the node has been visited
If the node has not been visited yet, the score is set to infinity.
prior_score = child.prior * math.sqrt(parent.visit_count) / (child.visit_count + 1)
- if child.visit_count > 0:
# The value of the child is from the perspective of the opposing player value_score = -child.value()
- else:
value_score = 0
return value_score + prior_score
- Returns:
- ubc_c = 0.707¶
UCT (Upper Confidence Bound applied to Trees) exploration terms:
- UCT 0:
c * √( 2 * ln(N(s)) / N(s,a) )
- UCT 1:
c * √( ln(N(s)) / (1 + N(s,a)) )
- UCT 2:
c * ( √(N(s)) / (1 + N(s,a)) )
- Where:
N(s) = number of times state s has been visited N(s,a) = number of times action a was taken from state s c = exploration constant
- visit_count: int = 0¶