✨ Features
Wrappers
Following the Gym interface, GEM provides wrappers that make it easy to add and change functionality. Wrappers are registered in the WRAPPER_FACTORY.
The main wrapper types are observation wrappers, tool wrappers, and episode tracking wrappers. They are composed in the order: tool env wrapper (optional) → observation wrapper (optional) → episode tracking wrapper (optional).
Observation Wrappers
Observation wrappers convert the sequence of game states and agent actions into a string, which is used as the prompt for the LLM agent at the next step.
Observation Wrapper Examples
Wrapper name | Description | Example (Mastermind) |
---|---|---|
no wrapper | The observation string from the environment. | "At turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)." |
concat | The sequence of environment observation strings from all previous steps concatenated together. | "You are playing Mastermind. [instructions]... Enter your first guess to start the game.\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)." |
concat_with_action | The sequence of [environment observation string, agent action, environment observation string, etc.] from all previous steps concatenated together. | "You are playing Mastermind. [instructions]... Enter your first guess to start the game.\nOkay, I will guess a random 3 digit number for now. My first guess will be \\boxed{123}.\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).\nOkay, let's think. One digit is in the correct place. Perhaps this is 3. One digit is completely incorrect. Let's try switching 1 for 4 and moving the 2. My next guess will be \\boxed{243}.\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)." |
concat_chat (default) | The sequence of [environment observation string, agent action, environment observation string, etc.] from all previous steps concatenated together with the chat template applied to denote "user" (environment) vs "assistant" (agent) turns. | "<|im_start|>user\nYou are playing Mastermind. [instructions]... Enter your first guess to start the game.<|im_end|>\n<|im_start|>assistant\nOkay, I will guess a random 3 digit number for now. My first guess will be \\boxed{123}<|im_end|> <|im_start|>user\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).<|im_end|>\n<|im_start|>assistant\nOkay, let's think. One digit is in the correct place. Perhaps this is 3. One digit is completely incorrect. Let's try switching 1 for 4 and moving the 2. My next guess will be \\boxed{243}.<|im_end|>\n<|im_start|>user\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s).<|im_end|>\n<|im_start|>assistant" |
concat_chat_on_reset | Same as concat_with_action, but the chat template is applied only once, at reset (the start of the episode). | "<|im_start|>user\nYou are playing Mastermind. [instructions]... Enter your first guess to start the game.\nOkay, I will guess a random 3 digit number for now. My first guess will be \\boxed{123}.\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).\nOkay, let's think. One digit is in the correct place. Perhaps this is 3. One digit is completely incorrect. Let's try switching 1 for 4 and moving the 2. My next guess will be \\boxed{243}.\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)." |
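As a rough usage sketch (not verbatim API): the WRAPPER_FACTORY import path and dictionary-style lookup below are assumptions, and some wrappers (e.g. the chat variants) may require extra arguments such as a tokenizer, so adapt the names to your GEM version:

```python
import gem
# Assumed import path; the text above only states that wrappers are
# registered in WRAPPER_FACTORY.
from gem.wrappers.wrapper_factory import WRAPPER_FACTORY

env = gem.make("game:GuessTheNumber-v0")

# Wrap the base environment with the "concat" observation wrapper so that
# the prompt at each step contains all previous environment observations.
env = WRAPPER_FACTORY["concat"](env)

obs, info = env.reset()
# The action format is environment-specific; \boxed{...} follows the
# Mastermind examples in the table above.
obs, reward, terminated, truncated, info = env.step("\\boxed{50}")
```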
Tool Env Wrapper
GEM supports integrating multiple tools with the same agent. Tools are handled by the tool env wrapper.
The input to env.step() is "action", a string that is typically the response from the LLM. With the tool env wrapper, when env.step(action) is called, the wrapper iterates through each tool and attempts to parse and execute the action. If any tool executes successfully, the observation from that tool is returned. If no tool executes successfully, the action is passed to the wrapped environment.
gem.tools.tool_env_wrapper.ToolEnvWrapper
Attributes
- env: The wrapped environment.
- tools (List[BaseTool]): A list of tools.
- tool_reward (float = 0.05): Reward given if a tool is called.
- tool_success_reward (float = 0.05): Additional reward if the tool call is executed without errors.
- max_tool_uses (int = 10): Maximum number of tool uses allowed.
reset()
Returns
- obs (str): The output of ToolEnvWrapper.env.reset() (i.e. the environment question), with a list of the available tools and usage instructions appended.
- info (dict): Extra info about the episode state.
step(action)
Parameters
- action (str): The response from the LLM agent.
Returns
- observation (str): The output of the tool call if a tool call is found, otherwise the observation from ToolEnvWrapper.env.step().
- reward (float): tool_reward if a tool call is found (plus tool_success_reward if the call executes without errors), otherwise the reward from ToolEnvWrapper.env.step().
- terminated (bool): Whether the episode is terminated.
- truncated (bool): Whether the episode is truncated.
- info (dict): Extra info about the episode state.
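A minimal sketch of wrapping an environment with ToolEnvWrapper. The constructor arguments mirror the attributes listed above; the PythonCodeTool import and the tool-call syntax in the action string are assumptions, so substitute whatever BaseTool subclasses and formats your installation provides:

```python
import gem
from gem.tools.tool_env_wrapper import ToolEnvWrapper
# Assumed example tool and import path; substitute any BaseTool subclass.
from gem.tools.python_code_tool import PythonCodeTool

base_env = gem.make("game:GuessTheNumber-v0")
env = ToolEnvWrapper(
    base_env,
    tools=[PythonCodeTool()],
    tool_reward=0.05,          # reward for any parsed tool call
    tool_success_reward=0.05,  # extra reward if the call runs without errors
    max_tool_uses=10,
)

# The reset observation ends with the list of available tools and instructions.
obs, info = env.reset()

# If the action contains a valid tool call, the wrapper executes it and
# returns the tool output; otherwise the action is passed to the base env.
action = "<python>print(7 * 6)</python>"  # assumed tool-call format
obs, reward, terminated, truncated, info = env.step(action)
```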
Episode Tracking Wrapper
The tracking wrapper logs statistics over the episode, such as cumulative_rewards. It is not required, but can be useful for debugging.
Vectorization
GEM supports collecting multiple episodes in parallel, including asynchronously stepping each of the environments (which may involve tool calls, etc.). VectorEnv environments auto-reset: when an episode in one of the parallel environments ends, that environment is automatically reset and begins its next episode.
Benefits
- Improved Throughput: Run multiple environments simultaneously for faster data collection
- Automatic Reset: Environments automatically reset when episodes end, ensuring continuous operation
- Asynchronous Execution: Each environment can step independently, maximizing efficiency
- Tool Support: Vectorized environments fully support tool usage across all parallel instances
Usage
Use make_vec() instead of make() when creating environments:
```python
import gem

# Create a vectorized environment with 8 parallel instances
vec_env = gem.make_vec("game:GuessTheNumber-v0", num_envs=8)

# Reset all environments
observations, infos = vec_env.reset()

# Sample one random action per sub-environment and step all environments
actions = [vec_env.sample_random_action() for _ in range(8)]
observations, rewards, terminated, truncated, infos = vec_env.step(actions)
```
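Because finished sub-environments reset automatically, a data-collection loop can keep stepping without handling episode boundaries by hand. Continuing the snippet above (and assuming, as it does, that the vectorized env exposes sample_random_action):

```python
# Collect transitions continuously; finished sub-episodes reset on their own.
for _ in range(100):
    actions = [vec_env.sample_random_action() for _ in range(8)]
    observations, rewards, terminated, truncated, infos = vec_env.step(actions)
    # terminated[i] / truncated[i] flag which sub-episodes just ended;
    # their next observations already come from freshly reset episodes.
```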
Key Features
- Automatic Management: No need to manually handle environment resets
- Scalable: Easily adjust the number of parallel environments based on your computational resources
- Compatible: Works with all GEM environments, tools, and wrappers
- Efficient: Optimized for minimal overhead in parallel execution
Use Cases
Vectorization is particularly useful for:
- Training reinforcement learning agents
- Collecting large datasets efficiently
- Running evaluation experiments across multiple episodes
- Testing agent performance with statistical significance