✨ Features
Wrappers
Following the Gym interface, GEM provides wrappers that make it easy to add and change functionality. Wrappers are registered in the WRAPPER_FACTORY.
The main wrapper types are observation wrappers, tool wrappers, and episode tracking wrappers. They are composed in the order: tool env wrapper (optional) → observation wrapper (optional) → episode tracking wrapper (optional).
Observation Wrappers
Observation wrappers convert the sequence of game states and agent actions into a string, which is used as the prompt for the LLM agent at the next step.
Observation Wrapper Examples
Wrapper name | Description | Example (Mastermind) |
---|---|---|
no wrapper | The observation string from the environment. | "At turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)." |
concat | The sequence of environment observation strings from all previous steps concatenated together. | "You are playing Mastermind. [instructions]... Enter your first guess to start the game.\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)." |
concat_with_action | The sequence of [environment observation string, agent action, environment observation string, etc.] from all previous steps concatenated together. | "You are playing Mastermind. [instructions]... Enter your first guess to start the game.\nOkay, I will guess a random 3 digit number for now. My first guess will be \\boxed{123}.\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).\nOkay, let's think. One digit is in the correct place. Perhaps this is 3. One digit is completely incorrect. Let's try switching 1 for 4 and moving the 2. My next guess will be \\boxed{243}.\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)." |
concat_chat (default) | The sequence of [environment observation string, agent action, environment observation string, etc.] from all previous steps concatenated together with the chat template applied to denote "user" (environment) vs "assistant" (agent) turns. | "<|im_start|>user\nYou are playing Mastermind. [instructions]... Enter your first guess to start the game.<|im_end|>\n<|im_start|>assistant\nOkay, I will guess a random 3 digit number for now. My first guess will be \\boxed{123}<|im_end|> <|im_start|>user\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).<|im_end|>\n<|im_start|>assistant\nOkay, let's think. One digit is in the correct place. Perhaps this is 3. One digit is completely incorrect. Let's try switching 1 for 4 and moving the 2. My next guess will be \\boxed{243}.<|im_end|>\n<|im_start|>user\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s).<|im_end|>\n<|im_start|>assistant" |
concat_chat_on_reset | Same as concat_with_action, but the chat template is applied only once, at reset (the start of the episode). | "<|im_start|>user\nYou are playing Mastermind. [instructions]... Enter your first guess to start the game.\nOkay, I will guess a random 3 digit number for now. My first guess will be \\boxed{123}.\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).\nOkay, let's think. One digit is in the correct place. Perhaps this is 3. One digit is completely incorrect. Let's try switching 1 for 4 and moving the 2. My next guess will be \\boxed{243}.\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)." |
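As a rough usage sketch (not verbatim API): the WRAPPER_FACTORY import path and dictionary-style lookup below are assumptions, and some wrappers (e.g. the chat variants) may require extra arguments such as a tokenizer, so adapt the names to your GEM version:

```python
import gem
# Assumed import path; the text above only states that wrappers are
# registered in WRAPPER_FACTORY.
from gem.wrappers.wrapper_factory import WRAPPER_FACTORY

env = gem.make("game:GuessTheNumber-v0")

# Wrap the base environment with the "concat" observation wrapper so that
# the prompt at each step contains all previous environment observations.
env = WRAPPER_FACTORY["concat"](env)

obs, info = env.reset()
# The action format is environment-specific; \boxed{...} follows the
# Mastermind examples in the table above.
obs, reward, terminated, truncated, info = env.step("\\boxed{50}")
```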
Tool Env Wrapper
GEM supports integrating multiple tools with the same agent. Tools are handled by the tool env wrapper.
The input to env.step() is "action", a string that is typically the response from the LLM. With the tool env wrapper, when env.step(action) is called, the wrapper iterates through each tool and attempts to parse and execute the action. If any tool executes successfully, the observation from that tool is returned. If no tool executes successfully, the action is passed to the wrapped environment.
gem.tools.tool_env_wrapper.ToolEnvWrapper
Attributes
- env: The wrapped environment.
- tools (List[BaseTool]): A list of tools.
- tool_reward (float = 0.05): Reward given if a tool is called.
- tool_success_reward (float = 0.05): Additional reward if the tool call is executed without errors.
- max_tool_uses (int = 10): Maximum number of tool uses allowed.
reset()
Returns
- obs (str): The output of ToolEnvWrapper.env.reset() (i.e. the environment question), with a list of the available tools and usage instructions appended.
- info (dict): Extra info about the episode state.
step(action)
Parameters
- action (str): The response from the LLM agent.
Returns
- observation (str): The output of the tool call if a tool call is found, otherwise the observation from ToolEnvWrapper.env.step().
- reward (float): tool_reward if a tool call is found (plus tool_success_reward if the call executes without errors), otherwise the reward from ToolEnvWrapper.env.step().
- terminated (bool): Whether the episode is terminated.
- truncated (bool): Whether the episode is truncated.
- info (dict): Extra info about the episode state.
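A minimal sketch of wrapping an environment with ToolEnvWrapper. The constructor arguments mirror the attributes listed above; the PythonCodeTool import and the tool-call syntax in the action string are assumptions, so substitute whatever BaseTool subclasses and formats your installation provides:

```python
import gem
from gem.tools.tool_env_wrapper import ToolEnvWrapper
# Assumed example tool and import path; substitute any BaseTool subclass.
from gem.tools.python_code_tool import PythonCodeTool

base_env = gem.make("game:GuessTheNumber-v0")
env = ToolEnvWrapper(
    base_env,
    tools=[PythonCodeTool()],
    tool_reward=0.05,          # reward for any parsed tool call
    tool_success_reward=0.05,  # extra reward if the call runs without errors
    max_tool_uses=10,
)

# The reset observation ends with the list of available tools and instructions.
obs, info = env.reset()

# If the action contains a valid tool call, the wrapper executes it and
# returns the tool output; otherwise the action is passed to the base env.
action = "<python>print(7 * 6)</python>"  # assumed tool-call format
obs, reward, terminated, truncated, info = env.step(action)
```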
Episode Tracking Wrapper
The tracking wrapper logs statistics over the episode, such as cumulative_rewards. It is not required, but can be useful for debugging.
Vectorization
GEM supports collecting multiple episodes in parallel, including asynchronously stepping each of the environments (which may involve tool calls, etc.). VectorEnv environments auto-reset: when an episode in one of the parallel environments ends, that environment is automatically reset and begins its next episode.
Benefits
- Improved Throughput: Run multiple environments simultaneously for faster data collection
- Automatic Reset: Environments automatically reset when episodes end, ensuring continuous operation
- Asynchronous Execution: Each environment can step independently, maximizing efficiency
- Tool Support: Vectorized environments fully support tool usage across all parallel instances
Usage
Use make_vec() instead of make() when creating environments:
```python
import gem

# Create a vectorized environment with 8 parallel instances
vec_env = gem.make_vec("game:GuessTheNumber-v0", num_envs=8)

# Reset all environments
observations, infos = vec_env.reset()

# Sample one random action per sub-environment and step all environments
actions = [vec_env.sample_random_action() for _ in range(8)]
observations, rewards, terminated, truncated, infos = vec_env.step(actions)
```
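Because finished sub-environments reset automatically, a data-collection loop can keep stepping without handling episode boundaries by hand. Continuing the snippet above (and assuming, as it does, that the vectorized env exposes sample_random_action):

```python
# Collect transitions continuously; finished sub-episodes reset on their own.
for _ in range(100):
    actions = [vec_env.sample_random_action() for _ in range(8)]
    observations, rewards, terminated, truncated, infos = vec_env.step(actions)
    # terminated[i] / truncated[i] flag which sub-episodes just ended;
    # their next observations already come from freshly reset episodes.
```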
Key Features
- Automatic Management: No need to manually handle environment resets
- Scalable: Easily adjust the number of parallel environments based on your computational resources
- Compatible: Works with all GEM environments, tools, and wrappers
- Efficient: Optimized for minimal overhead in parallel execution
Use Cases
Vectorization is particularly useful for:
- Training reinforcement learning agents
- Collecting large datasets efficiently
- Running evaluation experiments across multiple episodes
- Testing agent performance with statistical significance