Description
I had a thought recently about something that affects runtime performance, though possibly so slightly that it might not even be worth adding to the book.
Nonetheless...
GHC tags pointers to evaluated heap-allocated data with the tag of the constructor they point to. So, for example, a pointer to a heap-allocated Maybe is tagged with +1 if the allocated value is Nothing and with +2 if it is Just ... (and with +0 if it is a thunk, in which case we evaluate the thunk, which returns a pointer tagged with either +1 or +2).
{-# LANGUAGE MagicHash #-}
import Prelude(print)
import GHC.Exts
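
-- A local copy of Maybe, so the example does not depend on the Prelude's.
data Maybe a = Nothing
             | Just a

-- NOINLINE keeps f as a separate function, so the case on x stays visible in the Cmm.
{-# NOINLINE f #-}
f x = case x of
  Nothing -> 1#
  Just _  -> 2#

main = print (I# (f (Just 4)))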
Then, to pattern match on x in the body of f, we don't need to dereference the pointer to the heap and check the constructor info, for we can simply (when x is evaluated) look at its tag.
The above example compiles to the following Cmm code with ghc -dno-typeable-binds -fforce-recomp -ddump-opt-cmm -ddump-to-file X.hs:
f_rib_entry() { // [R2]
        ...
    cOo: // global
        I64[Sp - 8] = block_cOf_info;
        R1 = _sO4::P64;
        Sp = Sp - 8;
        // Check if tag is 0
        if (R1 & 7 != 0) goto cOf; else goto cOg;
    cOg: // tag==0 -> evaluate thunk
        call (I64[R1])(R1) returns to cOf, args: 8, res: 8, upd: 8;
    cOf: // tag/=0 -> pattern match through constructor tag
        _sO5::P64 = R1;
        _cOl::P64 = _sO5::P64 & 7;
        // Check if tag is 1 or 2
        if (_cOl::P64 != 1) goto cOk; else goto cOj;
    cOk: // Tag is 2, con is Just, return unboxed 2
        R1 = 2;
        Sp = Sp + 8;
        call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
    cOj: // Tag is 1, con is Nothing, return unboxed 1
        R1 = 1;
        Sp = Sp + 8;
        call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
}
}
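The R1 & 7 checks in the Cmm above can be modelled in plain Haskell. This is only a sketch of the bit arithmetic on an ordinary Word64, not how GHC exposes pointers: tagPointer, getTag, untag and the example address are hypothetical names and values made up for illustration, and it assumes a 64-bit platform with 8-byte-aligned heap objects.

```haskell
module Main (main) where

import Data.Bits (complement, (.&.), (.|.))
import Data.Word (Word64)

-- Low 3 bits of a pointer hold the tag (assumption: 64-bit platform).
tagMask :: Word64
tagMask = 7

-- Hypothetical helper: attach a tag to an 8-byte-aligned address.
-- Assumes tag <= 7 and that the low 3 bits of addr are zero.
tagPointer :: Word64 -> Word64 -> Word64
tagPointer addr tag = addr .|. tag

-- Hypothetical helper: the same operation as `R1 & 7` in the Cmm.
getTag :: Word64 -> Word64
getTag ptr = ptr .&. tagMask

-- Hypothetical helper: recover the real address by clearing the tag bits.
untag :: Word64 -> Word64
untag ptr = ptr .&. complement tagMask

main :: IO ()
main = do
  let addr = 0x4200001000 :: Word64        -- made-up aligned address
  print (getTag (tagPointer addr 2))       -- tag of a "Just" pointer: 2
  print (untag (tagPointer addr 2) == addr) -- True
  print (getTag addr)                      -- untagged pointer: 0
```

Reading the tag is a single mask on a value already in a register, which is why the case on x above needs no memory access once x is evaluated.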
The question is what happens when we have more than 7 constructors (on a 64-bit platform the tag only has the low 3 bits of the pointer to live in, and 0 means the value is not known to be evaluated).
In the case of a datatype with more than 7 constructors, a tag of 7 means that the constructor index is 7 or higher, and therefore we have to dereference the pointer and look at the constructor info to find out which one it is. The other tags, 0 through 6, retain their meaning.
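A small model of the rule just described, as I understand it. pointerTag and needsInfoTable are hypothetical names, constructor indices are 0-based here (so "constructor 7" in the text is index 6), and 3 tag bits are assumed.

```haskell
module Main (main) where

-- Largest tag that fits in 3 bits (assumption: 64-bit platform).
maxTag :: Int
maxTag = 7

-- | Hypothetical model: the tag carried by an evaluated pointer to the
-- constructor at 0-based index ix of a datatype with nCons constructors.
pointerTag :: Int -> Int -> Int
pointerTag nCons ix
  | nCons <= maxTag = ix + 1   -- up to 7 constructors: one tag each
  | ix < maxTag - 1 = ix + 1   -- more than 7: the first 6 keep their own tag
  | otherwise       = maxTag   -- the rest share tag 7

-- | True when the tag alone cannot identify the constructor, so the
-- constructor info behind the pointer has to be consulted.
needsInfoTable :: Int -> Int -> Bool
needsInfoTable nCons ix = nCons > maxTag && pointerTag nCons ix == maxTag

main :: IO ()
main = do
  print (map (pointerTag 2) [0, 1])         -- [1,2], as for Maybe
  print (map (pointerTag 9) [0 .. 8])       -- [1,2,3,4,5,6,7,7,7]
  print (map (needsInfoTable 9) [0 .. 8])
```

Note that with exactly 7 constructors every constructor still gets its own tag; the shared tag only appears from 8 constructors on.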
Therefore, the most common constructors of a datatype with more than 7 constructors are better off being among the first 6, and the least common should come later. So if 80% of the values of some datatype are built with a constructor Con1, it is more performant to have Con1 be one of the first 6.
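To make the advice concrete, here is a sketch with a made-up instruction type of 9 constructors, where Add is assumed (purely for illustration) to be by far the most frequent. InstrCold, InstrHot and ownTag are hypothetical names, and ownTag only restates the "first 6" rule via the derived Enum index; it does not inspect real pointer tags.

```haskell
module Main (main) where

-- Hot constructor declared last: it would fall under the shared tag 7.
data InstrCold
  = NopC | LoadC | StoreC | JmpC | CallC | RetC | PushC | PopC | AddC
  deriving (Show, Enum)

-- Same constructors with the hot one moved to the front: it would get tag 1.
data InstrHot
  = AddH | NopH | LoadH | StoreH | JmpH | CallH | RetH | PushH | PopH
  deriving (Show, Enum)

-- | Hypothetical check: is this constructor among the first 6 declared,
-- i.e. would its tag alone identify it in a type with more than 7 constructors?
ownTag :: Enum a => a -> Bool
ownTag c = fromEnum c < 6

main :: IO ()
main = do
  print (ownTag AddC)  -- False: matching on it needs the constructor info
  print (ownTag AddH)  -- True: matching on it only needs the pointer tag
```

Whether the reordering is measurable in a real program is exactly the open question here, so this shows only the declaration change, not a benchmark.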
This could possibly matter in really, REALLY, tight loops 😝, though I would love to see such a case. Or a gigantic code base in which all the useful constructors are defined last.