BlueFixedActionWrapper
Bases: BaseWrapper
Maintains action spaces with fixed sizes and ordering across episodes.
On initialization, this wrapper creates a sorted list of all the hosts and subnets each agent can interact with in the CC4 EnterpriseScenario.
On reset, the action space is populated using these sorted lists, translating hostnames to IP addresses where needed, such that any given action index will always correspond to a specific host. If a host does not exist in the current episode, the action will be replaced with a no-op (Sleep) action. Agents can check whether an action corresponds to an active host by consulting action_mask().
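The fixed-index behaviour described above can be illustrated with a small standalone sketch. It does not use CybORG. The `Slot` structure, the helper names, the host names, and the action-major ordering are all assumptions made for illustration; the wrapper's real internal ordering may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One fixed position in the action list (hypothetical structure)."""
    action: str  # action name, e.g. "Restore"
    host: str    # hostname this index is reserved for


def build_slots(all_hosts, action_names):
    """Create one slot per (action, host) pair, with hosts in sorted order."""
    return [Slot(a, h) for a in action_names for h in sorted(all_hosts)]


def populate(slots, active_hosts):
    """Return (labels, mask) for one episode.

    Slots whose host is absent become "Sleep" and are masked out,
    so every index keeps the same meaning across episodes.
    """
    labels, mask = [], []
    for slot in slots:
        present = slot.host in active_hosts
        labels.append(f"{slot.action} {slot.host}" if present else "Sleep")
        mask.append(present)
    return labels, mask


# Built once, like the sorted lists created on initialization.
slots = build_slots({"host_b", "host_a", "host_c"}, ["Analyse", "Restore"])

# Episode 1: every host exists. Episode 2: host_b is missing.
labels_ep1, mask_ep1 = populate(slots, {"host_a", "host_b", "host_c"})
labels_ep2, mask_ep2 = populate(slots, {"host_a", "host_c"})
```

In this sketch index 1 means "Analyse host_b" in the first episode and degrades to "Sleep" with a `False` mask entry in the second, while every other index keeps its meaning.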
Note: This wrapper does not change the observation space. See the companion wrapper BlueFlatWrapper for vector observations of fixed length and order.
Attributes
Functions
__init__
Initialize the BlueFixedActionWrapper for blue agents.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
env | CybORG | An instance of CybORG. Must not modify action_space. | required |
pad_spaces | bool | Ensure all observation and action spaces are the same size across all agents by padding the space with the Sleep action. This is a requirement for some RL libraries. | False |
*args | | Extra arguments are ignored. | () |
**kwargs | | Extra arguments are ignored. | () |
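The effect of pad_spaces can be sketched as follows. This is a minimal illustration rather than the wrapper's implementation: `pad_action_labels` is a hypothetical helper, the agent names and labels are made up, and how the real wrapper masks padded entries is not covered here.

```python
def pad_action_labels(labels_by_agent, pad_label="Sleep"):
    """Pad each agent's action list with no-ops so all lists share one length."""
    width = max(len(labels) for labels in labels_by_agent.values())
    return {
        agent: labels + [pad_label] * (width - len(labels))
        for agent, labels in labels_by_agent.items()
    }


# Two agents with differently sized action lists (illustrative values).
unpadded = {
    "agent_a": ["Sleep", "Analyse host_a", "Restore host_a"],
    "agent_b": ["Sleep", "Analyse host_b"],
}
padded = pad_action_labels(unpadded)
```

After padding, both agents expose three actions, which is the uniform shape some RL libraries expect.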
action_labels
Returns an ordered list of human-readable actions.
action_mask
Returns an ordered list corresponding to whether an action is valid or not.
action_space
cached
Returns the discrete space corresponding to the given agent.
action_spaces
cached
Returns discrete space with optional padding for each agent.
get_action_space
Returns all information about an agent's action space.
hosts
Returns an ordered list of names of hosts the agent can interact with.
reset
Reset the environment and update the action space.
Parameters: All arguments are forwarded to the env provided to init.
Returns:
Name | Type | Description |
---|---|---|
observation | dict[str, Any] | The observations corresponding to each agent. Forwarded directly from the env provided to init. |
info | dict[str, dict] | Information dictionaries corresponding to each agent. Each dictionary contains the key "action_mask" that maps to a list[bool] where each element corresponds to whether the action at the element's index targets a host or subnet that exists for the duration of the episode. |
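One way to use the "action_mask" entry is to restrict random exploration to actions that target existing hosts. The sketch below assumes only the info layout documented above; `sample_valid_action` is a hypothetical helper, and the agent name and mask values are made up.

```python
import random


def sample_valid_action(action_mask, rng=random):
    """Pick a random index whose mask entry is True."""
    valid = [i for i, ok in enumerate(action_mask) if ok]
    if not valid:
        raise ValueError("action_mask contains no valid actions")
    return rng.choice(valid)


# Same shape as the documented info return value; contents are illustrative.
info = {"blue_agent_0": {"action_mask": [True, False, True]}}
actions = {
    agent: sample_valid_action(agent_info["action_mask"])
    for agent, agent_info in info.items()
}
```

The resulting `actions` dictionary maps each agent to an integer index, which is the form step() accepts.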
step
step(actions: dict[str, int | Action] = None, messages: dict[str, Any] = None, **kwargs: dict[str, Any]) -> tuple[dict[str, Any], dict[str, float], dict[str, bool], dict[str, bool], dict[str, dict]]
Take a step in the environment using action indices.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
actions | dict[str, int] | The action index corresponding to each agent. These indices will be mapped to CybORG actions using the equivalent of | None |
messages | dict[str, Any] | Messages from each agent. If an agent does not specify a message, it will send an empty message. | None |
**kwargs | dict[str, Any] | Extra keywords are forwarded. | {} |
Returns:
Name | Type | Description |
---|---|---|
observation | dict[str, Any] | The observations corresponding to each agent. Forwarded directly from the env provided to init. |
rewards | dict[str, float] | Rewards for each agent. |
terminated | dict[str, bool] | Flags whether the agent finished normally. |
truncated | dict[str, bool] | Flags whether the agent was stopped by env. |
info | dict[str, dict] | Information dictionaries corresponding to each agent. Each dictionary contains the key "action_mask" that maps to a list[bool] where each element corresponds to whether the action at the element's index targets a host or subnet that exists for the duration of the episode. |
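The reset and step return values combine into an episode loop along the following lines. This is a sketch under stated assumptions: `run_episode`, `first_valid`, and `StubEnv` are hypothetical names, and `StubEnv` is a stand-in that only mimics the documented return shapes so the example runs without CybORG installed. A wrapped CybORG environment would take its place in practice.

```python
def run_episode(env, choose_action, max_steps=500):
    """Run one episode and return the total reward per agent.

    env is any object whose reset() and step(actions=...) return the
    shapes documented above.
    """
    observations, info = env.reset()
    totals = {agent: 0.0 for agent in observations}
    for _ in range(max_steps):
        actions = {
            agent: choose_action(observations[agent], info[agent]["action_mask"])
            for agent in observations
        }
        observations, rewards, terminated, truncated, info = env.step(actions=actions)
        for agent, reward in rewards.items():
            totals[agent] = totals.get(agent, 0.0) + reward
        # Stop once every agent has either terminated or been truncated.
        if all(terminated.get(a, False) or truncated.get(a, False) for a in rewards):
            break
    return totals


class StubEnv:
    """Stand-in with the documented return shapes; not part of CybORG."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {"blue_agent_0": [0]}, {"blue_agent_0": {"action_mask": [True, False]}}

    def step(self, actions=None, messages=None, **kwargs):
        self.t += 1
        agent = "blue_agent_0"
        done = self.t >= self.horizon
        return (
            {agent: [self.t]},                      # observation
            {agent: -1.0},                          # rewards
            {agent: False},                         # terminated
            {agent: done},                          # truncated
            {agent: {"action_mask": [True, False]}},  # info
        )


def first_valid(observation, action_mask):
    """Trivial policy: always take the first unmasked action index."""
    return action_mask.index(True)


totals = run_episode(StubEnv(horizon=3), first_valid)
```

Keeping the policy as a plain callable that receives the observation and the mask makes it straightforward to swap in a learned policy later.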