Glyph ID Handling

**This is a migrated copy of the original feature request here: facebookresearch/nle#21**

Currently, each dungeon tile (ignoring the char/color/specials observation that's also available) is an int16 between 0 and nethack.MAX_GLYPH == 5976. We use an embedding lookup table of that size with embedding_dim == 32. That's 5976 * 32 == 191232 floating-point values, or 191232 * 16 == 3059712 bits, or ~0.38MB. That doesn't seem like too much, but see the PyTorch issue at https://github.com/pytorch/pytorch/issues/24912. Also, it does not give the agent a cue that certain ids (e.g., dog and large dog) are more related than others (large dog vs wall).
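As a quick sanity check of the arithmetic above, here is a minimal sketch in plain Python. The 16 bits per value is the figure used in this issue; with float32 weights the table would be twice as large.

```python
# Values quoted in the issue text.
MAX_GLYPH = 5976       # nethack.MAX_GLYPH
EMBEDDING_DIM = 32
BITS_PER_VALUE = 16    # figure used in the issue; float32 would be 32

num_values = MAX_GLYPH * EMBEDDING_DIM
num_bits = num_values * BITS_PER_VALUE
num_megabytes = num_bits / 8 / 1e6

print(num_values)               # 191232
print(num_bits)                 # 3059712
print(round(num_megabytes, 2))  # 0.38
```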

The glyphs are organized as follows: first come all the monsters (NUMMONS many, which is 381), then pets (again NUMMONS many, because in theory every monster can be tame), then a single glyph for an invisible monster (GLYPH_INVIS_OFF, which is 762), then a glyph for each "detected" monster (again NUMMONS many). For some obscure reason, corpses come next, which are not monsters (but there are NUMMONS many), followed by ridden monsters, which are monsters (NUMMONS many). The check glyph_is_monster(glyph) does [this](https://github.com/fairinternal/NetHack/blob/rl/include/display.h#L407):

    #define glyph_is_monster(glyph)                            \
        (glyph_is_normal_monster(glyph) || glyph_is_pet(glyph) \
         || glyph_is_ridden_monster(glyph) || glyph_is_detected_monster(glyph))

This makes a list like [i for i in range(nethack.MAX_GLYPH) if nethack.glyph_is_monster(i)] have length nethack.NUMMONS*4 == 1524, but the ids in it are not contiguous.

Cf. https://github.com/fairinternal/NetHack/blob/rl/win/rl/helper.cc#L37 for a list of the offsets and take a look at the comment in https://github.com/fairinternal/NetHack/blob/rl/include/display.h#L235 explaining this.
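The layout can be reproduced in a few lines of plain Python, which makes the gaps in the monster ids easy to see. This is only a sketch: NUMMONS, MAXPCHARS, NUM_ZAP, WARNCOUNT and MAX_GLYPH are the values quoted in this issue, while NUM_OBJECTS, MAXEXPCHARS and EXPL_MAX are assumptions chosen so that the offsets add up to MAX_GLYPH == 5976. The range checks in `glyph_is_monster` are likewise an assumed reading of the C sub-macros, not a copy of them.

```python
# Constants quoted in the issue.
NUMMONS = 381
MAXPCHARS = 96
NUM_ZAP = 8
WARNCOUNT = 6

# ASSUMPTIONS: not stated in the issue, chosen so the total is 5976.
NUM_OBJECTS = 453
MAXEXPCHARS = 9
EXPL_MAX = 7

GLYPH_MON_OFF = 0
GLYPH_PET_OFF = GLYPH_MON_OFF + NUMMONS
GLYPH_INVIS_OFF = GLYPH_PET_OFF + NUMMONS
GLYPH_DETECT_OFF = GLYPH_INVIS_OFF + 1
GLYPH_BODY_OFF = GLYPH_DETECT_OFF + NUMMONS
GLYPH_RIDDEN_OFF = GLYPH_BODY_OFF + NUMMONS
GLYPH_OBJ_OFF = GLYPH_RIDDEN_OFF + NUMMONS
GLYPH_CMAP_OFF = GLYPH_OBJ_OFF + NUM_OBJECTS
GLYPH_EXPLODE_OFF = GLYPH_CMAP_OFF + (MAXPCHARS - MAXEXPCHARS)
GLYPH_ZAP_OFF = GLYPH_EXPLODE_OFF + MAXEXPCHARS * EXPL_MAX
GLYPH_SWALLOW_OFF = GLYPH_ZAP_OFF + (NUM_ZAP << 2)
GLYPH_WARNING_OFF = GLYPH_SWALLOW_OFF + (NUMMONS << 3)
GLYPH_STATUE_OFF = GLYPH_WARNING_OFF + WARNCOUNT
MAX_GLYPH = GLYPH_STATUE_OFF + NUMMONS


def glyph_is_monster(glyph):
    """Assumed range-check equivalent of the C macro quoted above."""
    return (
        GLYPH_MON_OFF <= glyph < GLYPH_PET_OFF           # normal monster
        or GLYPH_PET_OFF <= glyph < GLYPH_INVIS_OFF      # pet
        or GLYPH_DETECT_OFF <= glyph < GLYPH_BODY_OFF    # detected monster
        or GLYPH_RIDDEN_OFF <= glyph < GLYPH_OBJ_OFF     # ridden monster
    )


monster_glyphs = [g for g in range(MAX_GLYPH) if glyph_is_monster(g)]
# Ids at which a new run starts after a gap in the monster ids.
gaps = [b for a, b in zip(monster_glyphs, monster_glyphs[1:]) if b != a + 1]

print(GLYPH_INVIS_OFF, MAX_GLYPH)  # 762 5976
print(len(monster_glyphs))         # 1524
print(gaps)                        # [763, 1525]
```

Under these assumptions the monster ids form three runs: normal monsters and pets together, then detected monsters (after the single invisible-monster glyph), then ridden monsters (after the corpses).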

After the monster-related groups come the object glyphs, then the cmap entries for dungeon features (MAXPCHARS == 96), the explosion glyphs, and the zap beams (NUM_ZAP << 2 == 8 << 2 == 32 many). Then there are NUMMONS << 3 == 3048 (!) "swallow" glyphs. That's a lot for stuff that basically never happens to our agents. Finally, there are WARNCOUNT == 6 warning glyphs and NUMMONS statue glyphs.
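The group sizes above, and their share of all glyph ids, can be checked with a short sketch that uses only constants quoted in this issue:

```python
# Constants quoted in the issue.
NUMMONS = 381
NUM_ZAP = 8
WARNCOUNT = 6
MAX_GLYPH = 5976

group_sizes = {
    "zap": NUM_ZAP << 2,
    "swallow": NUMMONS << 3,
    "warning": WARNCOUNT,
    "statue": NUMMONS,
}

# Share of the full glyph id range, in percent.
shares = {
    name: round(100 * size / MAX_GLYPH, 2) for name, size in group_sizes.items()
}

for name, size in group_sizes.items():
    print(f"{name:8s} {size:5d} glyphs  {shares[name]:5.2f}%")
```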

As a graphic representation, the glyph ids are:

    MMMMMMPPPPPPDDDDDD%%%%%RRRRRROOOOOOOCXZSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSTTTTTT
    MonstsPets--DetectBody-RiddenObjectsCXZSwaaaaaaalllllllllllooooooooooowwww-----------Statue

Where

    glyph_labels = {
        GLYPH_MON_OFF: "M",  # 6.38%
        GLYPH_PET_OFF: "P",  # 6.38%
        GLYPH_INVIS_OFF: " ",  # 0.02%
        GLYPH_DETECT_OFF: "D",  # 6.38%
        GLYPH_BODY_OFF: "%",  # 6.38%
        GLYPH_RIDDEN_OFF: "R",  # 6.38%
        GLYPH_OBJ_OFF: "O",  # 7.58%
        GLYPH_CMAP_OFF: "C",  # 1.46%
        GLYPH_EXPLODE_OFF: "X",  # 1.05%
        GLYPH_ZAP_OFF: "Z",  # 0.54%
        GLYPH_SWALLOW_OFF: "S",  # 51.00%
        GLYPH_WARNING_OFF: "W",  # 0.10%
        GLYPH_STATUE_OFF: "T",  # 6.38%
        MAX_GLYPH: "-",
    }

More than half of all glyph ids are swallow glyphs!

We should rethink the featurization of the glyph ids.
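One possible direction, sketched here purely as an illustration and not as a proposal anyone has agreed on, is to split each glyph id into a group and an index within that group, so that the same monster seen as a normal monster, pet, corpse or statue shares a monster id. The function name `split_glyph` and the group names are hypothetical. The hard-coded offsets assume NUM_OBJECTS == 453, MAXEXPCHARS == 9 and EXPL_MAX == 7, none of which are stated in this issue, and the `>> 3` decoding of swallow glyphs assumes they are stored as eight consecutive glyphs per monster.

```python
from bisect import bisect_right

MAX_GLYPH = 5976  # from the issue

# (group name, start offset) in id order. Offsets past "ridden" are ASSUMED.
GROUPS = [
    ("mon", 0), ("pet", 381), ("invis", 762), ("detect", 763),
    ("body", 1144), ("ridden", 1525), ("obj", 1906), ("cmap", 2359),
    ("explode", 2446), ("zap", 2509), ("swallow", 2541),
    ("warning", 5589), ("statue", 5595),
]
STARTS = [start for _, start in GROUPS]

# Groups with one glyph per monster, so the index is a monster id.
PER_MONSTER = {"mon", "pet", "detect", "body", "ridden", "statue"}


def split_glyph(glyph):
    """Return (group_name, index_in_group, monster_id_or_None)."""
    if not 0 <= glyph < MAX_GLYPH:
        raise ValueError(f"glyph id out of range: {glyph}")
    name, start = GROUPS[bisect_right(STARTS, glyph) - 1]
    index = glyph - start
    if name in PER_MONSTER:
        monster = index
    elif name == "swallow":
        monster = index >> 3  # ASSUMED: 8 swallow glyphs per monster
    else:
        monster = None
    return name, index, monster


print(split_glyph(5))     # ('mon', 5, 5)
print(split_glyph(386))   # ('pet', 5, 5)
print(split_glyph(2584))  # ('swallow', 43, 5)
print(split_glyph(762))   # ('invis', 0, None)
```

A model could then embed the group and the monster id separately and combine them, which needs far fewer rows than one row per glyph id. Note that this only ties together the same monster across groups; it does not by itself relate different monsters such as dog and large dog.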
