# 🌍 Environments

## Overview
GEM supports a diverse range of environments and makes it easy to add your own. It provides four main categories of environments, each designed for a different type of agent training and evaluation.
All GEM environments follow a consistent interface pattern:
- `env.reset()` - Initialize/reset the environment
- `env.step(action)` - Take an action and get the next state
- `env.sample_random_action()` - Get a random valid action
This design closely follows the Gymnasium standard, making it easy to integrate with existing RL frameworks and tools.
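For example, a random-agent rollout looks like the following. This is a minimal sketch: the `gem` import name and the Gymnasium-style return signature are assumptions based on the interface above, not a verbatim API reference.

```python
import gem  # assumed package import name

# Environment ids follow the category tables below, e.g. "game:Wordle".
env = gem.make("game:Wordle")
obs, info = env.reset()

terminated = truncated = False
while not (terminated or truncated):
    action = env.sample_random_action()  # replace with your agent's policy
    obs, reward, terminated, truncated, info = env.step(action)
```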
## Games
Interactive game environments including Sudoku, Minesweeper, Wordle, and more from the TextArena collection.
We maintain local versions of many of the TextArena games with (i) improved dense game reward design and (ii) a compatible gym-style interface.
### Available Game Environments
| Environment | Description |
|---|---|
| `game:GuessTheNumber` | The agent has a limited number of guesses to find a hidden number. After each guess it is told whether the hidden number is higher or lower. |
| `game:Mastermind` | The agent has a limited number of guesses to find a hidden code. Feedback is given as black and white pegs for correct digits in the right and wrong positions, respectively. |
| `game:Minesweeper` | The agent must reveal all safe grid squares without revealing a mine. Each revealed square shows the number of adjacent squares that contain mines. |
| `game:Wordle` | The agent must guess a hidden word. After each turn it receives per-letter feedback ("G" = correct letter, correct position; "Y" = correct letter, incorrect position; "X" = incorrect letter). |
| `game:FifteenPuzzle` | Arrange tiles on the board into ascending order by sliding them into the empty space. |
| `game:Hangman` | Guess the hidden word by proposing single letters or the entire word. |
| `game:Sudoku` | Classic Sudoku. The `easy` version renders a 4x4 board. |
| `game:TowerofHanoi` | A classic single-player puzzle: move a stack of disks from one tower to another following specific rules. |
### Difficulty Variants
Each environment additionally has `-easy`, `-hard`, and `-random` variants, where `-random` denotes that the environment is set to a random difficulty level at each reset.
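For example (illustrative ids, assuming the suffix is appended to the base environment name):

```python
import gem  # assumed package import name

env_easy = gem.make("game:Sudoku-easy")    # fixed easy difficulty (4x4 board)
env_rand = gem.make("game:Wordle-random")  # difficulty re-sampled at each reset
```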
### Adding New Games
Adding new games is easy: implement `.reset()` and `.step()` methods and register the environment under a new name.
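A hypothetical sketch of a new game is below. The return signatures follow the Gymnasium convention described above, and the registration helper name is an assumption; consult the existing game environments for the exact API.

```python
import random

class GuessParityEnv:
    """Toy game: guess whether a hidden number is odd or even."""

    def reset(self):
        self._hidden = random.randint(1, 100)
        # observation, info
        return "Is the hidden number 'odd' or 'even'?", {}

    def step(self, action):
        correct = (action.strip().lower() == "odd") == (self._hidden % 2 == 1)
        reward = 1.0 if correct else 0.0
        # observation, reward, terminated, truncated, info
        return f"The number was {self._hidden}.", reward, True, False, {}

    def sample_random_action(self):
        return random.choice(["odd", "even"])

# Register under a new name (helper name assumed):
# gem.register("game:GuessParity", GuessParityEnv)
```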
## Math
Mathematical reasoning environments with automatic answer parsing and checking, compatible with various math datasets.
GEM’s math environment class includes automatic answer parsing and checking and is designed to be compatible with any math dataset. To add a new environment, simply register the dataset. A typical use case combines these environments with access to the Python tool to train the agent to utilize code.
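Usage follows the same loop as other environments; here is a single-turn sketch (the answer format and reward convention are assumptions):

```python
import gem  # assumed package import name

env = gem.make("math:GSM8k")       # dataset ids from the table below
question, info = env.reset()       # the observation is the problem statement

answer = r"... the final answer is \boxed{42}"  # your model's response (illustrative)
obs, reward, terminated, truncated, info = env.step(answer)
print(reward)  # assumed: 1.0 if the parsed answer matches the ground truth
```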
### Available Math Environments
| Environment | Dataset |
|---|---|
| `math:ASDIV2k` | ASDIV-2k |
| `math:GSM8k` | GSM-8k |
| `math:Math12k` | MATH-12k |
| `math:ORZ57k` | ORZ-57k |
### Features
- Automatic Answer Parsing: Built-in parsing for mathematical expressions and numerical answers
- Answer Checking: Automatic validation of agent responses against ground truth
- Dataset Compatibility: Works with any math dataset that follows the standard format
- Tool Integration: Designed to work seamlessly with the Python tool for computational assistance
## Code
Code generation and evaluation environments that automatically test solutions in sandboxed environments.
GEM’s code environment class automatically evaluates success by running the test cases in a sandbox. This class can be used with any code dataset consisting of tasks and test cases.
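A single-turn usage sketch (ids from the table below; the reward convention is an assumption):

```python
import gem  # assumed package import name

env = gem.make("code:Taco8k")
problem, info = env.reset()        # task description; test cases stay hidden

solution = """
def solve(x):
    return x * 2
"""                                # your model's generated code (illustrative)
obs, reward, terminated, truncated, info = env.step(solution)
print(reward)  # assumed: reflects how many test cases the solution passed
```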
### Available Code Environments
| Environment | Dataset |
|---|---|
| `code:CodeContest` | CodeContest |
| `code:Taco8k` | TACO-8k |
### Features
- Automatic Code Evaluation: Runs test cases in a secure sandbox environment
- Test Case Validation: Compares agent-generated code against provided test cases
- Sandbox Diversity: Two execution options are available (see the sketch after this list):
  - A sandboxed environment using bubblewrap
  - A lighter-weight implementation built on Python's `subprocess` module
- Dataset Diversity: Compatible with any code dataset that includes problems and test cases
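For illustration, here is a minimal subprocess-based runner in the spirit of the second option. This is not GEM's internal implementation, just a sketch of the technique using only the standard library.

```python
import subprocess
import sys

def run_candidate(code: str, stdin_data: str, timeout: float = 5.0) -> str:
    """Run candidate code in a fresh Python process and capture its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        input=stdin_data,
        capture_output=True,
        text=True,
        timeout=timeout,  # guards against infinite loops
    )
    return result.stdout

# Compare against a test case's expected output:
assert run_candidate("print(int(input()) * 2)", "21\n").strip() == "42"
```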
## Question-Answering
QA environments designed for integrated search tool usage to train agents in information retrieval and reasoning.
GEM’s question-answering environments support integrated search tool usage, so agents can be trained to retrieve information before answering. Additional question-answering environments can be added by simply registering the dataset.
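A sketch of the loop (the search-tool wiring and observation format are configuration-dependent and assumed here):

```python
import gem  # assumed package import name

env = gem.make("qa:HotpotQA")
question, info = env.reset()

# In a tool-integrated setup, the agent's action may invoke search before
# committing to a final answer; the exact answer syntax is an assumption.
obs, reward, terminated, truncated, info = env.step("final answer: ...")
```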
### Available QA Environments
| Environment | Dataset |
|---|---|
| `qa:NaturalQuestions` | NaturalQuestions |
| `qa:HotpotQA` | HotpotQA |
| `logic:RuleTaker-d0` | RuleTaker-d0-70k |
| `logic:RuleTaker-d1` | RuleTaker-d1-70k |
| `logic:RuleTaker-d2` | RuleTaker-d2-70k |
| `logic:RuleTaker-d3` | RuleTaker-d3-70k |
| `logic:RuleTaker-d5` | RuleTaker-d5-70k |
### Environment Types
- Natural Questions: Real-world questions that people ask search engines, requiring factual knowledge and reasoning
- HotpotQA: Multi-hop reasoning questions that require gathering information from multiple sources
- RuleTaker: Logical reasoning environments with varying complexity levels (d0 through d5), where agents must apply rules to derive conclusions
## Reasoning Gym
We include all tasks from Reasoning Gym in our package; they can be used simply by calling `make("rg:[sub_task_name]")`.
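For example (`leg_counting` is an illustrative Reasoning Gym sub-task name; see the Reasoning Gym package for the full task list):

```python
import gem  # assumed package import name

env = gem.make("rg:leg_counting")  # illustrative sub-task name
obs, info = env.reset()
```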