Current Framework Version: 0.1.0
PCG Benchmark is a framework to test and compare different content generators over different problems. The framework follows the same design methodology as OpenAI Gym, which makes it easy to use, test, and extend with new problems.
This repo contains the framework used in our paper: https://arxiv.org/abs/2503.21474. For the experiments, see the following repo: https://github.com/amidos2006/benchmark_experiments. To cite the framework or the paper, use the following BibTeX entry:
@inproceedings{khalifa2025pcgbenchmark,
title={The Procedural Content Generation Benchmark: An Open-source Testbed for Generative Challenges in Games},
author={Khalifa, Ahmed and Gallotta, Roberto and Barthet, Matthew and Liapis, Antonios and Togelius, Julian and Yannakakis, Georgios N.},
booktitle={Foundations of Digital Games Conference},
year={2025},
publisher={ACM}
}
There are two ways to install this repo: directly from GitHub, or by cloning it locally and then installing it.
To install directly from GitHub:
- To install the package, run `pip install git+https://github.com/amidos2006/pcg_benchmark.git`. (Don't worry, it will automatically install all the dependencies, which are `numpy` and `PIL`.)
- If everything goes fine, the PCG Benchmark is ready to be used. Check the following section on how to use it.
To install from a local clone:
- Clone this repo to your local machine.
- To install the package, run `pip install -e .` from inside the repo folder. (Don't worry, it will automatically install all the dependencies, which are `numpy` and `PIL`.)
- If everything goes fine, the PCG Benchmark is ready to be used. Check the following section on how to use it.
The PCG Benchmark follows the same design considerations as OpenAI Gym in its simplicity and ease of use. The PCG Benchmark is an interface to a multitude of problems. Each problem has its own representation, control parameters, and functions to test quality, diversity, and controllability. To learn more about the problems, check the problems section below. Each problem has a name that is used to construct the environment. The name usually follows this pattern:
{problem_name}-{variant_name}-{version}
The default variant omits the variant name: `{problem_name}-{version}`. For example, the Zelda problem is named `zelda-v0`. It has two variants, one with lots of enemies called `zelda-enemie-v0` and one with a large map size called `zelda-large-v0`.
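The naming pattern above can be made concrete with a small parser. This is only a sketch: `parse_env_name` is a hypothetical helper that is not part of `pcg_benchmark`, and it assumes that each part of the name is lowercase alphanumeric (no hyphens inside a part) and that the version is `v` followed by digits.

```python
import re

# Hypothetical helper (not part of pcg_benchmark): splits an environment name
# of the form {problem_name}-{variant_name}-{version}; the variant is optional.
# Assumption: each part is lowercase alphanumeric and the version looks like v0.
NAME_PATTERN = re.compile(
    r"^(?P<problem>[a-z0-9_]+)(?:-(?P<variant>[a-z0-9_]+))?-(?P<version>v\d+)$"
)

def parse_env_name(name):
    """Return (problem, variant, version); variant is None for the default."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized environment name: {name!r}")
    return match.group("problem"), match.group("variant"), match.group("version")

print(parse_env_name("zelda-v0"))        # ('zelda', None, 'v0')
print(parse_env_name("zelda-large-v0"))  # ('zelda', 'large', 'v0')
```

A helper like this is mainly useful for grouping results by problem and variant in your own experiment scripts; the framework itself only needs the full name string.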
To construct a problem to solve, import the framework `pcg_benchmark` and use the `make` function to create an environment. The `make` function takes the environment name and returns a problem environment. For example, to create an environment for `zelda-v0`, use the following code:
import pcg_benchmark
env = pcg_benchmark.make('zelda-v0')
The framework also provides two important functions, `list` and `register`. The `list` function returns all the problems that exist in the framework. `register`, on the other hand, is used to register a new problem with the framework. For more details on how to create a new problem, see the problems README.md.
The created problem environment provides multiple functions that can be used to test whether content passes the `quality`, `diversity`, and `controlability` criteria. All of these functions can be called at once through a single function called `evaluate`. The `evaluate` function takes either one input (the `contents` to evaluate), in which case it only returns `quality` and `diversity`, or two inputs (the `contents` to evaluate and the `controls` to evaluate against for controllability), in which case it returns all the metrics (`quality`, `diversity`, and `controlability`).
The environment also provides a function called `info` to get details about content, and spaces similar to OpenAI Gym Spaces. There are two spaces: `content_space`, which defines the content search space (the representation space of all the content), and `control_space`, which defines the parameters that can be used to control the generated content and their possible values. You can use the `sample` function of a space to sample random content or control parameters. You can also use the `range` function of a space to find the minimum and maximum values for the content and control parameters. Finally, you can render the content using the `render` function. Here is an example of sampling 100 random content items, evaluating them, and then rendering them.
import pcg_benchmark
# create a problem environment for the zelda problem
env = pcg_benchmark.make('zelda-v0')
# generate 100 random content from the content_space
contents = [env.content_space.sample() for _ in range(100)]
# generate 1 random set of control parameters from the control_space to evaluate all the content against
control = env.control_space.sample()
# evaluate the contents against the control for the quality, diversity, and controlability metrics
# quality is the percentage of the 100 levels that passed the quality criteria
# diversity is the percentage of the 100 levels that are different from each other
# controlability is the percentage of the 100 levels that fit the control parameters
# details is a dictionary with "quality", "diversity", and "controlability" keys, each holding a float array of 100 numbers between 0 and 1 that represent how close each content is to solving the problem
# infos is an array of dictionaries that contain details about each content
quality, diversity, controlability, details, infos = env.evaluate(contents, control)
# generate images for each content
imgs = env.render(contents)
This example uses one control parameter for all the content. If you prefer, you can sample a separate control parameter for each content item by changing the control line, and then call the evaluate function as usual:
# generate 100 random control parameters from the control_space so each content is evaluated against its own control
controls = [env.control_space.sample() for _ in range(100)]
# evaluate contents and control from quality, diversity, controlability metrics
quality, diversity, controlability, details, infos = env.evaluate(contents, controls)
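The comments above describe each aggregate score as the share of content that passes a criterion, while `details` holds a per-content value between 0 and 1. The sketch below shows how such a share could be computed from a list of per-content values. It assumes that "passing" means a value of 1, which is an assumption and not something confirmed by the framework; `pass_rate` is a hypothetical helper and is not part of `pcg_benchmark`.

```python
# Hypothetical helper (not part of pcg_benchmark).
# Assumption: a content item "passes" when its per-content score reaches 1.
def pass_rate(scores, threshold=1.0):
    """Fraction of per-content scores that reach the threshold."""
    scores = list(scores)
    if not scores:
        return 0.0
    return sum(1 for s in scores if s >= threshold) / len(scores)

# made-up per-content scores, standing in for something like details["quality"]
example_scores = [1.0, 0.4, 1.0, 0.75]
print(pass_rate(example_scores))  # 0.5
```

The per-content values are often more useful than the aggregate during search, because they indicate how far each item is from passing.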
If you want to test only one thing, such as `quality`, `diversity`, or `controlability`, you can use the function with the same name. These functions accept one content item, an array of content, one info dictionary, or an array of info dictionaries. The `info` function is very useful because it generates all the information that the other functions need. You can cache these values and pass them instead of the content so that the exhaustive calculations or simulations are not repeated (this is useful during optimization). Finally, if you want to fix the random number generator, use the `seed` function and provide a seed value to make sure that all the random number generators are set.
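The caching idea described above can be sketched as follows. To keep the snippet runnable without the framework installed, it uses `StubEnv`, a hypothetical stand-in that only mimics the `info` and `quality` methods; with the real framework you would pass the object returned by `pcg_benchmark.make` instead. `InfoCache` is also a hypothetical helper, and keying the cache on `repr(content)` is an assumption that suits small nested lists; large NumPy arrays would need a different key because their `repr` is abbreviated.

```python
# StubEnv is a hypothetical stand-in so the sketch runs on its own;
# it is NOT the framework's environment class.
class StubEnv:
    def __init__(self):
        self.info_calls = 0

    def info(self, content):
        self.info_calls += 1              # counts the "expensive" analysis
        return {"total": sum(content)}

    def quality(self, info):
        return min(1.0, info["total"] / 10.0)


class InfoCache:
    """Stores one info dictionary per content item, keyed by its repr."""

    def __init__(self, env):
        self.env = env
        self._store = {}

    def get(self, content):
        key = repr(content)
        if key not in self._store:
            self._store[key] = self.env.info(content)
        return self._store[key]


env = StubEnv()
cache = InfoCache(env)
contents = [[1, 2, 3], [4, 6], [1, 2, 3]]   # the third item repeats the first
scores = [env.quality(cache.get(c)) for c in contents]
print(scores)          # [0.6, 1.0, 0.6]
print(env.info_calls)  # 2, because the repeated item reused its cached info
```

In a search loop, the same cache lets you call several metric functions on one individual while paying for the analysis only once.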
The framework supports a multitude of problems, which can be found in `pcg_benchmark.probs`. To understand more about each problem, go to its folder and check the README file. Here is a list of the current problems:
Name | Description | Problem Name |
---|---|---|
Arcade Rules | create a small rule set for a simple arcade game | arcade-v0 |
Binary | create a simple 2D fully connected maze | binary-v0 |
Building | create an isometric building using different falling cubes | building-v0 |
Dangerous Dave | create a playable dangerous dave level | ddave-v0 |
Elimination | create a playable elimination word game level | elimination-v0 |
Isaac | create a playable binding of isaac dungeon | isaac-v0 |
Lode Runner | create a playable lode runner level using 2x2 tile patterns | loderunner-v0 |
Lode Runner Tile | create a playable lode runner level using single tiles | loderunnertile-v0 |
MiniDungeons | create a puzzle roguelike playable dungeon for mini dungeons | mdungeons-v0 |
Super Mario Bros | create a playable super mario bros level using vertical slices | smb-v0 |
Super Mario Bros Tile | create a playable super mario bros level using single tiles | smbtile-v0 |
Sokoban | create a playable sokoban level | sokoban-v0 |
Talakat | create a bullet pattern for bullet hell games | talakat-v0 |
Zelda | create a simple playable arcade dungeon crawler game | zelda-v0 |
To understand how to add new problems to the framework, please check the main README.md in the probs folder.
You can check the example generators from our paper in the following repository, inside the generators folder: https://github.com/amidos2006/benchmark_experiments. It contains the 3 generators that were tested in the paper: `random` (Random Search), `es` (Mu + Lambda Evolution Strategy), and `ga` (Genetic Algorithm). To create any generator other than these, you usually need a way to navigate the search space.
In optimization algorithms, this can be done through crossover or mutation. The spaces module has a global function called `contentSwap` that helps with moving in the representation space. The function takes two content items and a probability value, and generates new content that combines both. If the probability is 50%, you get a uniform crossover function. The same function can be used for mutation or small changes: make the second content a new random content and keep the probability low, such as 0.1 or 0.05. This creates a uniform mutation function. If you want to limit the number of swaps, set `maxSwaps` to any value above 0, and if you want to seed the random number generator, set the `seed` parameter to any value. Here is an example of both a crossover function and a mutation function.
from pcg_benchmark.spaces import contentSwap

# uniform crossover: each element comes from either parent with equal probability
def uniform_crossover(prob_env, content1, content2):
    return contentSwap(content1, content2, 0.5)

# 5% uniform mutation by default: swap in parts of a fresh random sample
def uniform_mutation(prob_env, content, percentage=0.05):
    return contentSwap(content, prob_env.content_space.sample(), percentage)
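To build intuition for the swap-based crossover and mutation described above, here is a simplified illustration that works on flat Python lists only. `flat_swap` is a hypothetical function and is not the framework's `contentSwap`: how the real function handles structured content, and the exact semantics of its swap limit and seed, may differ. The parameter names `max_swaps` and `seed` here are chosen for the sketch.

```python
import random

# Simplified illustration on flat lists only; NOT the framework's contentSwap.
# Assumption: a non-positive max_swaps means "no limit" in this sketch.
def flat_swap(first, second, probability, max_swaps=-1, seed=None):
    """Copy `first`, replacing each element with the one from `second`
    at the same position with the given probability."""
    rng = random.Random(seed)
    result = list(first)
    swaps = 0
    for i in range(len(result)):
        if 0 < max_swaps <= swaps:
            break                      # swap budget used up
        if rng.random() < probability:
            result[i] = second[i]
            swaps += 1
    return result

parent_a = [0, 0, 0, 0, 0]
parent_b = [1, 1, 1, 1, 1]
print(flat_swap(parent_a, parent_b, 0.5, seed=3))         # a mix of both parents
print(flat_swap(parent_a, parent_b, 1.0, max_swaps=2))    # [1, 1, 0, 0, 0]
```

Because this sketch scans positions in order, a swap limit favors the earliest positions; a less biased variant would visit the positions in random order.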
Besides sampling randomly and discovering neighboring content, the other function needed to create a generator is one that evaluates the content with respect to `quality`, `diversity`, or `controlability`. We recommend generating the info data for every content item and using it instead of the content in all the calculations.
def fitness(env, info):
    return env.quality(info)
Here is a full example of a simple mu+lambda ES algorithm with mu=lambda=50 that generates content for the `zelda-v0` problem for 100 generations.
import pcg_benchmark
from pcg_benchmark.spaces import contentSwap

# uniform mutation: swap about 5% of the content with a fresh random sample
def uniform_mutation(prob_env, content, percentage=0.05):
    return contentSwap(content, prob_env.content_space.sample(), percentage)

# calculate the fitness of an individual, which is a (content, info) tuple
def fitness(env, individual):
    return env.quality(individual[1])

# create the problem environment for zelda
env = pcg_benchmark.make("zelda-v0")
# create a random starting population of 50 individuals (content, info)
content = [env.content_space.sample() for _ in range(50)]
population = [(c, env.info(c)) for c in content]
# run for 100 generations
for _ in range(100):
    # create a new child from each individual in the population
    new_content = [uniform_mutation(env, c) for c, _ in population]
    # create the new population of size mu+lambda (50+50)
    new_population = population + [(c, env.info(c)) for c in new_content]
    # keep the best 50 individuals (sort by fitness, highest first)
    new_population.sort(key=lambda ind: fitness(env, ind), reverse=True)
    population = new_population[:50]
    # stop if the best individual solves the problem
    if fitness(env, population[0]) >= 1:
        break
Finally, if you want to evolve content under the assumption that it is always a flat float array, the `Space` class has two helpful functions for that: `sampleFlat` and `restructure`. `sampleFlat` returns a float array that represents the content instead of the structured shape, while `restructure` takes a float array, converts it back to the content shape, and fixes any invalid values in it.
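A flat representation makes real-valued operators straightforward to apply. The sketch below shows a Gaussian perturbation on a flat list of floats. `gaussian_mutation` is a hypothetical helper that is not part of `pcg_benchmark`, and the comments show how it might be combined with `sampleFlat` and `restructure`, assuming `env` was created with `pcg_benchmark.make` and that `restructure` accepts the returned sequence (wrap it with `numpy.asarray` if it does not).

```python
import random

# Hypothetical helper (not part of pcg_benchmark): adds Gaussian noise to
# every value of a flat float sequence and returns a new list.
def gaussian_mutation(flat, sigma=0.1, seed=None):
    rng = random.Random(seed)
    return [value + rng.gauss(0.0, sigma) for value in flat]

# Intended use with the framework (assumption, not run here):
#   flat = env.content_space.sampleFlat()
#   child = env.content_space.restructure(gaussian_mutation(flat))
# restructure is described as repairing invalid values, so the noisy
# values do not need to be clipped by hand in that case.

parent = [0.2, 0.5, 0.9]                    # made-up flat content
child = gaussian_mutation(parent, sigma=0.05, seed=1)
print(len(child) == len(parent))            # True
```

Keeping the operator independent of the environment means the same function can be reused across problems, with `restructure` handling the problem-specific shape.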
Thanks to Kenney for creating the 1-Bit Pack, which was used for most of the 12 problems in the benchmark. Even the problems that didn't use it were inspired by the color palette used in that pack.
Bug reports and pull requests are welcome on GitHub at https://github.com/amidos2006/pcg_benchmark/.
This code is available as open source under the terms of the MIT License.