Skip to content

The microest optimization wrt pointer tagging #84

Open
@alt-romes

Description

@alt-romes

I had a thought recently about something that affects runtime performance, but possibly in such a way so minimal that it might not even be worth adding to the book.

Nonetheless...

GHC tags pointers to evaluated heap-allocated data types with their tag. So, for example, pointers to heap-allocated Maybes will be tagged with +1 if the allocated-value is Nothing and with +2 if it is Just ..., (and with +0 if it is a thunk, in which case we evaluate the thunk which will return a tagged pointer (with either +1 or +2)).

{-# LANGUAGE MagicHash #-}
import Prelude(print)
import GHC.Exts

data Maybe a = Nothing
             |  Just a

{-# NOINLINE f #-}
f x = case x of
        Nothing -> 1#
        Just _    -> 2#

main = print (I# (f (Just 4)))

Then, to pattern match on x in the body of f, we don't need to dereference the pointer to the heap and check the constructor info, for we can simply (when x is evaluated) look at its tag.
The above example compiles to the following cmm code with ghc -dno-typeable-binds -fforce-recomp -ddump-opt-cmm -ddump-to-file X.hs:

f_rib_entry() { //  [R2]
      ...
      cOo: // global
          I64[Sp - 8] = block_cOf_info;
          R1 = _sO4::P64;
          Sp = Sp - 8;
          // Check if tag is 0
          if (R1 & 7 != 0) goto cOf; else goto cOg;
      cOg: // tag==0 -> evaluate thunk
          call (I64[R1])(R1) returns to cOf, args: 8, res: 8, upd: 8;
      cOf: // tag/=0 -> pattern match through constructor tag
          _sO5::P64 = R1;
          _cOl::P64 = _sO5::P64 & 7;
          // Check if tag is 1 or 2
          if (_cOl::P64 != 1) goto cOk; else goto cOj;
      cOk: // Tag is 2, con is Just, return unboxed 2
          R1 = 2;
          Sp = Sp + 8;
          call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
      cOj: // Tag is 1, con is Nothing, return unboxed 1
          R1 = 1;
          Sp = Sp + 8;
          call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
    }
}

The question is what happens when we have more than 7 constructors (the tag can only be in the last 3 bits of the pointer, and 0 means unevaluated data type).
In the case of a datatype with more than 7 constructors, a tag of 7 means that the constructor index is 7 or higher, and therefore we have to dereference the pointer and look at the constructor info. The other tags from 0-6 retain their meaning.

Therefore, the most common constructors of a datatype with >7 constructors are better off being in the first 6 constructors, and the least common should come later. So if you have a datatype of which 80% of the times is constructed with some Con1, it is more performant to have it be one of the first 6.

This could possibly matter in really, REALLY, tight loops 😝, though I would love to see such a case. Or a gigantic code base in which all the useful constructors are defined last.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions