The microest optimization wrt pointer tagging

I had a thought recently about something that affects runtime performance, but possibly in such a way so minimal that it might not even be worth adding to the book.

Nonetheless...

GHC tags pointers to evaluated heap-allocated data types with their tag. So, for example, pointers to heap-allocated Maybes will be tagged with `+1` if the allocated-value is `Nothing` and with `+2` if it is `Just ...`, (and with `+0` if it is a thunk, in which case we evaluate the thunk which will return a tagged pointer (with either +1 or +2)).
```haskell
{-# LANGUAGE MagicHash #-}
import Prelude(print)
import GHC.Exts

data Maybe a = Nothing
             |  Just a

{-# NOINLINE f #-}
f x = case x of
        Nothing -> 1#
        Just _    -> 2#

main = print (I# (f (Just 4)))
```
Then, to pattern match on `x` in the body of `f`, we don't need to dereference the pointer to the heap and check the constructor info, for we can simply (when x is evaluated) look at its tag.
The above example compiles to the following cmm code with `ghc -dno-typeable-binds -fforce-recomp -ddump-opt-cmm -ddump-to-file X.hs`:
```c
f_rib_entry() { //  [R2]
      ...
      cOo: // global
          I64[Sp - 8] = block_cOf_info;
          R1 = _sO4::P64;
          Sp = Sp - 8;
          // Check if tag is 0
          if (R1 & 7 != 0) goto cOf; else goto cOg;
      cOg: // tag==0 -> evaluate thunk
          call (I64[R1])(R1) returns to cOf, args: 8, res: 8, upd: 8;
      cOf: // tag/=0 -> pattern match through constructor tag
          _sO5::P64 = R1;
          _cOl::P64 = _sO5::P64 & 7;
          // Check if tag is 1 or 2
          if (_cOl::P64 != 1) goto cOk; else goto cOj;
      cOk: // Tag is 2, con is Just, return unboxed 2
          R1 = 2;
          Sp = Sp + 8;
          call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
      cOj: // Tag is 1, con is Nothing, return unboxed 1
          R1 = 1;
          Sp = Sp + 8;
          call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
    }
}
```

The question is what happens when we have more than 7 constructors (the tag can only be in the last 3 bits of the pointer, and 0 means unevaluated data type).
In the case of a datatype with more than 7 constructors, a tag of 7 means that the constructor index is 7 or higher, and therefore we have to dereference the pointer and look at the constructor info. The other tags from 0-6 retain their meaning.

Therefore, **the most common constructors of a datatype with >7 constructors are better off being in the first 6 constructors**, and the least common should come later. So if you have a datatype of which 80% of the times is constructed with some `Con1`, it is more performant to have it be one of the first 6.

This could possibly matter in really, REALLY, tight loops 😝, though I would love to see such a case. Or a gigantic code base in which all the useful constructors are defined last.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

The microest optimization wrt pointer tagging #84

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

The microest optimization wrt pointer tagging #84

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions