Switch hashmap implementation to open addressing #301

icgmilk · 2025-09-12T04:47:18Z

Previously, the chaining implementation incurred pointer chasing and per-node allocations, leading to poor cache locality and longer chains under collisions.

Open addressing uses a single contiguous array and removes per-node allocations. We cap the load factor at 50% to keep probe lengths short and reduce clustering.
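
As a rough illustration of the layout described above, here is a minimal sketch of open addressing with linear probing and a 50% load-factor cap. It is not shecc's code: the `oa_*` names, the power-of-two capacity, the FNV-1a hash, and the choice to borrow the key pointer instead of copying it are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration; not shecc's definitions. */
typedef struct {
    const char *key; /* NULL marks an empty slot; there is no deleted state */
    int val;
} oa_slot_t;

typedef struct {
    oa_slot_t *slots; /* one contiguous array, no per-node allocation */
    size_t cap;       /* assumed to be a power of two */
    size_t size;      /* number of occupied slots */
} oa_map_t;

/* FNV-1a, used here only as an example hash function. */
static uint32_t oa_hash(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char) *s;
        h *= 16777619u;
    }
    return h;
}

/* cap must be a power of two so that (cap - 1) works as an index mask. */
static oa_map_t *oa_create(size_t cap)
{
    oa_map_t *m = malloc(sizeof(*m));
    assert(m);
    m->cap = cap;
    m->size = 0;
    m->slots = calloc(cap, sizeof(oa_slot_t));
    assert(m->slots);
    return m;
}

/* Linear probing: walk forward from the home slot until the key or an
 * empty slot is found. Terminates because the table is never full. */
static oa_slot_t *oa_find_slot(oa_slot_t *slots, size_t cap, const char *key)
{
    size_t i = oa_hash(key) & (cap - 1);
    while (slots[i].key && strcmp(slots[i].key, key))
        i = (i + 1) & (cap - 1);
    return &slots[i];
}

/* Double the array and reinsert every occupied slot. */
static void oa_grow(oa_map_t *m)
{
    size_t new_cap = m->cap * 2;
    oa_slot_t *new_slots = calloc(new_cap, sizeof(oa_slot_t));
    assert(new_slots);
    for (size_t i = 0; i < m->cap; i++) {
        if (m->slots[i].key)
            *oa_find_slot(new_slots, new_cap, m->slots[i].key) = m->slots[i];
    }
    free(m->slots);
    m->slots = new_slots;
    m->cap = new_cap;
}

static void oa_put(oa_map_t *m, const char *key, int val)
{
    /* Grow before an insert could push the load factor above 50%. */
    if ((m->size + 1) * 2 > m->cap)
        oa_grow(m);
    oa_slot_t *s = oa_find_slot(m->slots, m->cap, key);
    if (!s->key) {
        s->key = key; /* borrowed pointer: the caller keeps the key alive */
        m->size++;
    }
    s->val = val;
}

/* Returns a pointer to the stored value, or NULL if the key is absent. */
static int *oa_get(oa_map_t *m, const char *key)
{
    oa_slot_t *s = oa_find_slot(m->slots, m->cap, key);
    return s->key ? &s->val : NULL;
}
```

Because at least half of the slots stay empty, every probe sequence reaches an empty slot, so lookups for missing keys stop early and the loop needs no bound check.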

This change trades about 320 kB of peak resident memory (1206400 to 1206720 kbytes) for about 30 ms less time in main() (1.114 s to 1.084 s under uftrace), primarily due to better cache locality.

Validated with shecc’s test suite (make check).

Note that the current implementation has no deleted-slot (tombstone) state, because there is no in-place delete use case.
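
For context on why a delete state would matter if in-place deletion were ever added, the sketch below shows the general problem with linear probing: blanking a slot cuts the probe chain for keys stored after it, while a separate deleted marker keeps them reachable. The table, the `SLOT_*` states, and the `probe_*` helpers are hypothetical and use small non-negative integer keys for brevity; none of this is taken from shecc.

```c
#include <assert.h>
#include <stddef.h>

/* SLOT_DELETED is the extra state that this PR deliberately leaves out. */
enum { SLOT_EMPTY, SLOT_USED, SLOT_DELETED };

#define CAP 8

typedef struct {
    int state;
    int key; /* assumed non-negative in this sketch */
} slot_t;

/* Probe from key % CAP: stop at an empty slot, skip over deleted ones.
 * Returns the slot index, or -1 if the key is not found. */
static int probe_find(const slot_t *t, int key)
{
    size_t i = (size_t) key % CAP;
    for (size_t n = 0; n < CAP; n++, i = (i + 1) % CAP) {
        if (t[i].state == SLOT_EMPTY)
            return -1;
        if (t[i].state == SLOT_USED && t[i].key == key)
            return (int) i;
    }
    return -1;
}

/* Insert into the first slot that is not in use. Assumes the table is
 * not full and the key is not already present. */
static void probe_insert(slot_t *t, int key)
{
    size_t i = (size_t) key % CAP;
    while (t[i].state == SLOT_USED)
        i = (i + 1) % CAP;
    t[i].state = SLOT_USED;
    t[i].key = key;
}
```

With keys 1 and 9 inserted, both hash to slot 1 and key 9 lands in slot 2. Resetting slot 1 to `SLOT_EMPTY` makes `probe_find(t, 9)` return -1, whereas marking it `SLOT_DELETED` still finds key 9 at index 2. Without any delete operation, neither case can arise, which is why a two-state table is sufficient here.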

Performance analysis for out/shecc src/main.c

Metric	Chaining (LF 75%)	Linear probing (LF 50%)
Maximum resident set size (kbytes)	1206400	1206720
System time (seconds)	0.33	0.34
main() execution time	1.114 s	1.084 s

Memory usage and execution time were measured with /usr/bin/time -v and uftrace.

Chaining

/usr/bin/time -v

        Command being timed: "./out/shecc src/main.c"
        User time (seconds): 0.11
        System time (seconds): 0.33
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.44
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1206400
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 301391
        Voluntary context switches: 1
        Involuntary context switches: 8
        Swaps: 0
        File system inputs: 0
        File system outputs: 704
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

uftrace

   Total time   Self time       Calls  Function
  ==========  ==========  ==========  ====================
    1.114  s    5.873 us           1  main
  633.083 ms    0.867 us           1  parse
  623.992 ms   66.924 us           1  parse_internal
  620.610 ms  251.757 us         526  read_global_statement
  617.063 ms  137.666 us         463  read_global_decl
  606.524 ms  125.851 us         370  read_func_body
  599.514 ms    1.122 ms        1540  read_code_block
  594.806 ms    6.983 ms        9511  read_body_statement
  412.373 ms    2.901 ms       13021  read_expr
  379.043 ms  898.769 us        1376  handle_if_statement
  342.273 ms   20.672 ms      176786  arena_calloc
  308.542 ms  308.542 ms      176786  memset
  307.577 ms    3.275 ms       36367  bb_create
  208.327 ms  342.770 us        3682  read_func_call
  206.120 ms    1.958 ms        3686  read_func_parameters
  198.791 ms    9.433 ms       15729  read_expr_operand
  133.820 ms    1.210 ms           1  liveness_analysis
  127.500 ms    1.952 ms        2581  read_body_assignment
  121.725 ms    9.599 ms       34028  recompute_live_out
  103.999 ms   58.735 ms       43644  compute_live_in
   83.013 ms   54.940 us          98  handle_while_statement
   78.150 ms    0.730 us           1  ssa_build
   73.236 ms   36.065 ms      574269  lex_accept
   67.422 ms    5.402 ms           1  optimize
   65.316 ms  714.586 us       10549  find_var
   63.436 ms    2.064 ms           1  code_generate
   61.352 ms    7.735 ms       64380  emit_ph2_ir
   56.479 ms   24.992 ms       78465  lex_token_impl
   52.328 ms   30.924 ms       13130  find_local_var
   52.112 ms   17.449 ms       85973  bb_forward_traversal
   50.139 ms    7.962 ms       10298  read_lvalue
   46.384 ms    4.419 ms       82858  emit
   42.225 ms   42.225 ms     1351256  strcmp
   41.966 ms   12.696 ms       82862  elf_write_int
   40.226 ms    9.838 ms           1  reg_alloc
   37.682 ms   15.019 ms      595563  lex_accept_internal
   37.195 ms    1.510 us           1  global_release
   37.148 ms  456.684 us           5  arena_free
   36.705 ms  805.692 us        4830  arena_block_free
   35.959 ms   35.959 ms        9706  free
   33.728 ms    8.973 ms       15325  get_operator
   32.003 ms    2.056 ms       37347  lex_expect
   30.072 ms  160.096 us           1  solve_phi_params
   29.951 ms    2.023 ms       37350  lex_expect_internal
   29.607 ms    6.775 ms       10747  bb_solve_phi_params
   29.417 ms   21.346 ms      332990  strbuf_putc
   27.161 ms    3.766 ms       59100  require_var
   26.829 ms  184.378 us         651  handle_return_statement
   25.885 ms   15.733 ms           1  elf_generate
   24.918 ms   24.918 ms      807189  add_live_in
   24.464 ms   15.048 ms           1  peephole
   24.224 ms   15.077 ms        8312  find_type
   23.232 ms    6.350 ms      110908  hashmap_get
   23.200 ms   23.200 ms      852851  var_check_killed
   21.970 ms   17.679 ms       53735  bb_backward_traversal
   20.657 ms   12.144 ms        3914  find_global_var
   18.932 ms   32.070 us           1  solve_globals
   17.773 ms    3.528 ms       31132  new_name
   16.887 ms   11.501 ms      110931  hashmap_get_node
   16.484 ms    5.193 ms       10747  bb_solve_globals
   16.391 ms    8.025 ms      307165  arena_alloc
   13.040 ms    2.836 ms           1  use_chain_build
   12.441 ms  479.844 us        2308  read_full_var_decl
   10.204 ms    3.605 ms       39268  use_chain_add_tail
   10.108 ms    6.414 ms       48495  const_folding
   10.104 ms    8.868 ms      267909  read_char
    9.988 ms    9.988 ms      357813  fputc
    9.751 ms    2.913 ms        7211  read_preproc_directive
    9.538 ms   59.346 us           1  build_reversed_rpo
    9.090 ms    2.856 ms          35  load_source_file
    8.956 ms    5.436 ms           1  cfg_flatten
    8.744 ms    4.776 ms       10744  bb_reset_and_solve_locals
    8.744 ms  330.606 us         428  finalize_logical
    8.542 ms    8.447 ms      350847  strbuf_extend
    8.484 ms    5.138 ms       45417  refresh
    8.378 ms  746.800 us        4830  arena_block_create
    8.032 ms    6.206 ms       73320  bb_add_ph2_ir
    7.771 ms    2.844 ms       30258  var_add_killed_bb
    7.650 ms    7.650 ms        9691  malloc
    7.583 ms    7.583 ms       43644  merge_live_in
    7.485 ms    4.718 ms       45224  add_insn
    7.373 ms    6.780 ms      246253  lex_peek
    7.341 ms   23.418 us           1  build_rdom
    7.127 ms    1.559 ms       27520  lookup_keyword
    7.064 ms    1.485 ms       27516  find_alias
    6.953 ms    3.811 ms       10747  dce_insn
    6.949 ms  735.574 us       13638  lex_ident
    6.875 ms   73.751 us           1  build_rpo
    6.665 ms    1.216 ms       22455  find_func
    6.251 ms    1.485 ms       13715  lex_ident_internal
    6.188 ms  601.464 us       10412  read_ternary_operation
    6.006 ms  768.786 us        2387  read_inner_var_decl
    5.731 ms  220.847 us         396  read_parameter_list_decl
    5.533 ms    2.137 ms       33933  prepare_operand
    5.392 ms    5.392 ms      217159  is_alnum
    5.295 ms   30.288 us           1  build_df
    5.222 ms  109.621 us         608  read_logical
    5.217 ms   44.863 us           1  build_dom
    5.200 ms   25.578 us           1  build_rdf
    5.150 ms   36.989 us           1  unwind_phi
    4.952 ms    3.059 ms       17857  strbuf_puts
    4.571 ms    3.931 ms           1  build_idom
    4.432 ms    3.384 ms       46974  cse
    4.105 ms  626.805 us        3086  read_numeric_param
    3.811 ms    3.811 ms      106982  check_live_out
    3.714 ms    1.844 ms       26027  gen_name_to
    3.465 ms    3.465 ms      114404  hashmap_hash_index
    3.408 ms  598.971 us        7014  intern_string
    3.247 ms  813.598 us       15084  find_macro
    3.225 ms    3.225 ms       10747  bb_build_df
    3.203 ms    2.250 ms       35852  rename_var
    3.142 ms    3.142 ms       48495  dce_init_mark
    2.991 ms    2.289 ms       26100  prepare_dest
    2.941 ms    1.000 ms       17485  load_var
    2.836 ms    1.310 ms       10377  spill_live_out
    2.825 ms    1.524 ms       10747  bb_build_rdom
    2.816 ms  679.915 us        5060  spill_alive
    2.768 ms    1.851 ms       10747  bb_build_dom
    2.668 ms  574.648 us       10713  find_constant
    2.636 ms    1.499 ms       10747  bb_unwind_phi
    2.456 ms  409.389 us        1103  read_literal_param
    2.288 ms    2.288 ms       84375  skip_whitespace
    2.286 ms    1.054 ms       10261  spill_var
    2.285 ms    2.285 ms       56745  bb_add_killed_var
    2.161 ms  237.846 us        4147  require_typed_var
    2.064 ms  806.155 us           1  dce_sweep
    2.034 ms    1.039 ms           1  solve_phi_insertion
    1.953 ms    1.180 ms        4193  find_member
    1.934 ms   35.434 us         137  handle_address_of_operator
    1.887 ms  267.060 us           3  cppd_control_flow_skip_lines
    1.870 ms    1.870 ms       26027  __sprintf_chk
    1.842 ms    7.629 us           2  parse_array_init
    1.769 ms    1.769 ms       64010  update_elf_offset
    1.701 ms  126.526 us        2232  lex_token
    1.685 ms    1.157 ms           1  build_r_idom
    1.664 ms   19.622 us          28  parse_struct_field_init
    1.656 ms    1.656 ms       64317  add_existed_ph2_ir
    1.619 ms    1.179 ms       17965  __lw
    1.528 ms    1.528 ms       36209  update_consumed
    1.501 ms    1.501 ms           1  arm_lower
    1.452 ms   82.580 us         396  add_func
    1.431 ms    1.043 ms       16046  __mov_i
    1.360 ms    1.360 ms       38249  strcpy
    1.357 ms    1.306 ms       47436  insn_fusion
    1.318 ms    1.318 ms       48495  mark_const
    1.301 ms    1.301 ms       19305  rdom_connect
    1.256 ms    1.256 ms       10747  is_block_unreachable
    1.254 ms  726.873 us        8453  resize_var
    1.224 ms    1.224 ms       47436  triple_pattern_optimization
    1.223 ms  892.273 us       13599  __sw
    1.203 ms    1.203 ms       45990  redundant_move_elim
    1.197 ms    1.197 ms       47116  eval_const_arithmetic
    1.188 ms    1.188 ms       46369  find_in_regs
    1.186 ms    1.186 ms       45989  eliminate_load_store_pairs
    1.178 ms    1.178 ms       46978  eval_const_unary
    1.138 ms    1.138 ms       45990  strength_reduction
    1.137 ms  430.191 us        3403  append_unwound_phi_insn
    1.106 ms    1.106 ms       45990  comparison_optimization
    1.099 ms    1.099 ms       45990  bitwise_optimization
    1.099 ms    1.099 ms       45990  algebraic_simplification
    1.088 ms    1.088 ms         370  simple_sccp
    1.073 ms    1.073 ms       38294  get_stack_top_subscript_var
    1.047 ms    1.047 ms       44040  is_cse_candidate
    1.044 ms  326.601 us        3403  append_phi_operand
    1.039 ms  213.020 us        1973  hashmap_put
    1.009 ms    1.009 ms       32374  strncmp
  948.133 us  948.133 us       17816  memcpy
  933.284 us  400.138 us        9072  fn_add_global
  932.521 us  219.092 us        3877  arena_alloc_symbol
  931.329 us   22.150 us         372  arena_alloc_func
  917.164 us  917.164 us       20356  dom_connect
  893.859 us  893.859 us       37659  vreg_clear_phys
  889.200 us  889.200 us       16018  fgets
  872.570 us  872.570 us       36283  opstack_push
  870.687 us  870.687 us       36283  opstack_pop
  848.910 us   39.826 us         107  handle_single_dereference
  846.182 us  846.182 us       18227  strncpy
  809.591 us  809.591 us       31132  pop_name
  788.218 us  788.218 us       23860  strlen
  780.987 us  780.987 us       31992  arm_transfer
  758.171 us  758.171 us       31299  __mov
  679.540 us  357.995 us        2884  insert_phi_insn
  642.894 us  642.894 us       10747  bb_build_rdf
  640.779 us  640.779 us        6110  intersect
  638.235 us  391.500 us        1973  hashmap_node_new
  632.189 us   98.061 us        1103  write_symbol
  616.890 us  417.446 us        6599  extend_liveness
  604.697 us  604.697 us       23957  get_size
  583.533 us  583.533 us       10567  add_live_gen
  567.075 us  413.413 us        6292  __add_r
  564.439 us  564.439 us         370  optimize_constant_casts
  528.511 us  528.511 us        6570  reverse_intersect
  506.020 us  106.645 us        1635  add_symbol
  500.465 us  500.465 us       19337  track_var_use
  492.937 us   53.741 us           1  libc_generate
  468.617 us  468.617 us       13465  memcmp
  439.196 us   70.687 us         797  __c
  431.147 us  297.428 us       11950  perform_side_effect
  407.251 us   31.192 us          64  handle_sizeof_operator
  385.449 us  385.449 us       13608  bb_connect
  381.015 us  380.295 us       15819  find_macro_param_src_idx
  376.940 us  376.940 us        9516  __strcpy_chk
  373.667 us   53.780 us         266  read_char_param
  369.566 us  260.690 us        2113  add_block
  364.517 us   62.093 us        1111  elf_write_str
  315.800 us  315.800 us        4705  var_check_in_scope
  305.067 us   29.750 us         184  add_constant
  289.084 us  211.198 us        3215  __teq
  283.018 us  283.018 us       10747  bb_reverse_reversed_index
  282.360 us   42.313 us         311  truncate_unchecked
  278.235 us  278.235 us       10747  bb_build_rpo
  275.439 us  275.439 us       10747  bb_build_reversed_rpo
  264.791 us  264.791 us       10747  bb_reverse_index
  264.414 us  264.414 us       10747  bb_index_reversed_rpo
  259.675 us  259.675 us       10747  bb_index_rpo
  259.547 us    0.556 us           4  read_indirect_call
  224.740 us   83.681 us        1586  __zero
  199.402 us  145.334 us        2235  __mov_r
  177.863 us   64.923 us        1138  elf_write_byte
  170.777 us  120.051 us        1563  is_numeric
  158.970 us    3.855 us          65  read_partial_var_decl
  152.359 us  138.127 us          15  arena_free_trailing_blocks
  145.487 us   17.391 us          72  add_alias
  144.538 us  105.983 us        1594  __cmp_r
  143.794 us    5.917 us          17  read_global_assignment
  134.327 us    6.243 us          28  compute_element_address
  124.396 us   68.307 us           6  hashmap_rehash
  105.948 us  105.948 us        4435  __movw
  105.651 us  105.651 us        4435  __movt
   89.998 us    0.568 us           2  compact_all_arenas
   81.729 us   81.729 us          36  fopen
   73.432 us   53.548 us         370  add_ph2_ir
   67.343 us   67.343 us          36  fclose
   66.514 us    6.065 us         107  require_deref_var
   63.813 us    0.884 us           3  compact_arenas_selective
   56.768 us   41.457 us         634  __add_i
   55.928 us    5.423 us          97  require_ref_var
   51.010 us   51.010 us        2151  is_fusible_insn
   50.392 us    3.746 us           1  global_init
   47.697 us    1.987 us           7  skip_macro_body
   47.494 us   13.005 us          10  elf_write_blk
   46.781 us    7.714 us          56  compute_field_address
   46.348 us   33.444 us         536  __sub_r
   46.158 us    0.325 us           4  read_constant_expr
   46.016 us    1.788 us           1  elf_generate_sections
   45.833 us    3.581 us           5  read_constant_infix_expr
   44.768 us    8.113 us         131  lookup_directive
   43.023 us   43.023 us        1492  get_operator_prio
   42.744 us   42.744 us        1667  is_pointer_operation
   42.190 us    1.949 us           8  strbuf_free
   41.778 us   41.778 us        1578  arm_get_cond
   37.882 us   27.636 us         424  __cmp_i
   30.729 us    1.505 us           7  add_macro
   28.706 us    7.401 us          27  find_best_spill
   21.305 us   15.501 us         184  calculate_spill_cost
   19.134 us   13.910 us         214  __sb
   18.977 us   13.766 us         214  __lb
   18.027 us   13.610 us         184  arena_alloc_constant
   17.868 us    0.404 us           7  arena_alloc_macro
   15.978 us    1.378 us           5  read_constant_expr_operand
   15.203 us    0.962 us           5  arena_init
   14.377 us   14.377 us         582  is_hex
   13.991 us   10.059 us         163  __rsb_i
   13.817 us    1.669 us           8  strbuf_create
   13.710 us   13.710 us          15  calloc
   13.703 us    3.538 us          17  read_primary_constant
   12.938 us   12.938 us         548  align_size
   10.742 us    0.739 us           1  elf_generate_header
   10.623 us    2.469 us           9  hashmap_create
    8.253 us    0.684 us           1  lex_init_keywords
    8.019 us    8.019 us         301  size_var
    7.600 us    1.123 us           7  promote_unchecked
    7.282 us    0.633 us           4  add_named_type
    7.102 us    5.364 us          72  arena_alloc_alias
    7.078 us    1.378 us           9  arena_alloc_traversal_args
    6.578 us    6.578 us         262  promote
    6.226 us    1.568 us          23  hashmap_contains
    5.664 us    0.483 us           1  lex_init_directives
    5.160 us    3.770 us          57  __and_i
    4.660 us    4.660 us         163  read_numeric_constant
    4.381 us    1.155 us           4  handle_pointer_arithmetic
    4.252 us    3.115 us          47  __and_r
    3.705 us    3.705 us          34  __snprintf_chk
    3.674 us    3.674 us          10  initialize_struct_field
    3.596 us    1.827 us           9  hashmap_free
    3.172 us    3.172 us          34  snprintf
    3.050 us    0.340 us           1  elf_add_symbol
    2.412 us    1.759 us          27  __or_r
    2.290 us    1.669 us          26  __eor_r
    2.172 us    2.172 us          91  __mul
    1.760 us    1.760 us          73  add_type
    1.453 us    0.519 us           1  elf_align
    1.099 us    0.185 us           3  check_def
    0.960 us    0.960 us          16  bb_disconnect
    0.848 us    0.100 us           1  lexer_cleanup
    0.358 us    0.358 us           9  round_up_pow2
    0.284 us    0.212 us           3  __mvn_r
    0.170 us    0.170 us           7  __sxtb
    0.119 us    0.119 us           5  get_unary_operator_prio
    0.118 us    0.118 us           4  get_pointer_element_size
                                   1  exit

Open Addressing

/usr/bin/time -v

        Command being timed: "./out/shecc src/main.c"
        User time (seconds): 0.13
        System time (seconds): 0.34
        Percent of CPU this job got: 100%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.47
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1206720
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 301485
        Voluntary context switches: 1
        Involuntary context switches: 6
        Swaps: 0
        File system inputs: 0
        File system outputs: 704
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

uftrace

   Total time   Self time       Calls  Function
  ==========  ==========  ==========  ====================
    1.084  s    9.430 us           1  main
  621.999 ms    0.416 us           1  parse
  607.400 ms   63.173 us           1  parse_internal
  604.310 ms  241.352 us         525  read_global_statement
  601.036 ms  137.410 us         462  read_global_decl
  591.046 ms  132.196 us         369  read_func_body
  584.208 ms    1.027 ms        1538  read_code_block
  579.750 ms    6.693 ms        9508  read_body_statement
  403.099 ms    2.841 ms       13025  read_expr
  372.188 ms  898.344 us        1376  handle_if_statement
  345.385 ms   20.297 ms      177119  arena_calloc
  311.782 ms  311.782 ms      177119  memset
  311.127 ms    3.273 ms       36371  bb_create
  203.578 ms  317.608 us        3678  read_func_call
  201.296 ms    1.799 ms        3682  read_func_parameters
  190.213 ms    8.692 ms       15743  read_expr_operand
  123.988 ms    1.914 ms        2581  read_body_assignment
  122.526 ms    1.036 ms           1  liveness_analysis
  108.669 ms    8.585 ms       34073  recompute_live_out
   93.638 ms   48.265 ms       43706  compute_live_in
   81.809 ms   56.336 us          99  handle_while_statement
   75.290 ms    1.398 us           1  ssa_build
   72.330 ms    8.030 ms           1  optimize
   69.115 ms   33.418 ms      574457  lex_accept
   58.628 ms    1.683 ms           1  code_generate
   58.573 ms  699.686 us       10555  find_var
   56.928 ms    7.145 ms       64450  emit_ph2_ir
   52.893 ms   22.285 ms       78501  lex_token_impl
   52.781 ms   17.339 ms       85941  bb_forward_traversal
   48.642 ms    7.694 ms       10306  read_lvalue
   46.911 ms   25.274 ms       13136  find_local_var
   42.659 ms   42.659 ms     1349898  strcmp
   42.503 ms    3.426 ms       82935  emit
   39.240 ms    1.278 us           1  global_release
   39.193 ms  202.671 us           5  arena_free
   39.078 ms   10.721 ms       82939  elf_write_int
   39.002 ms  634.973 us        4831  arena_block_free
   38.423 ms   38.423 ms        9713  free
   38.204 ms    9.084 ms           1  reg_alloc
   36.218 ms   15.042 ms      595742  lex_accept_internal
   31.466 ms    8.053 ms       15345  get_operator
   29.569 ms    1.622 ms       37365  lex_expect
   28.572 ms  184.974 us           1  solve_phi_params
   28.495 ms   20.324 ms      333296  strbuf_putc
   28.074 ms    6.422 ms       10743  bb_solve_phi_params
   27.951 ms    1.796 ms       37368  lex_expect_internal
   26.196 ms    2.987 ms       59268  require_var
   25.827 ms  180.208 us         651  handle_return_statement
   24.942 ms   24.942 ms      807660  add_live_in
   23.383 ms   23.383 ms      853434  var_check_killed
   23.029 ms    5.595 ms      110912  hashmap_get
   22.791 ms   12.544 ms           1  elf_generate
   22.671 ms   18.085 ms       53715  bb_backward_traversal
   22.414 ms   12.813 ms           1  peephole
   21.894 ms   12.741 ms        8309  find_type
   18.517 ms   10.049 ms        3911  find_global_var
   18.391 ms   44.372 us           1  solve_globals
   17.442 ms   11.877 ms      110935  hashmap_get_node
   16.792 ms    2.962 ms       31216  new_name
   16.615 ms    8.007 ms      305662  arena_alloc
   15.757 ms    4.755 ms       10743  bb_solve_globals
   14.598 ms    4.090 ms          35  load_source_file
   12.287 ms    2.947 ms           1  use_chain_build
   11.746 ms  459.509 us        2304  read_full_var_decl
   10.347 ms    9.083 ms      267735  read_char
   10.098 ms    5.313 ms       10740  bb_reset_and_solve_locals
   10.085 ms   10.085 ms      358069  fputc
    9.689 ms   81.491 us           1  build_reversed_rpo
    9.585 ms    5.721 ms       48568  const_folding
    9.340 ms    2.840 ms       39364  use_chain_add_tail
    9.300 ms    2.840 ms        7208  read_preproc_directive
    8.972 ms    8.882 ms      351139  strbuf_extend
    8.627 ms  715.733 us        4831  arena_block_create
    8.613 ms  329.204 us         428  finalize_logical
    8.213 ms    4.632 ms           1  cfg_flatten
    8.069 ms    4.651 ms       45487  refresh
    7.937 ms    7.937 ms        9693  malloc
    7.825 ms    5.982 ms       73387  bb_add_ph2_ir
    7.804 ms    4.551 ms       17843  strbuf_puts
    7.662 ms   34.438 us           1  build_rdom
    7.500 ms    2.617 ms       30342  var_add_killed_bb
    7.417 ms    4.537 ms       45301  add_insn
    7.389 ms    6.787 ms      246387  lex_peek
    7.382 ms    4.594 ms       10743  dce_insn
    6.824 ms    1.104 ms       22450  find_func
    6.649 ms  639.780 us       13654  lex_ident
    6.557 ms    1.361 ms       27522  find_alias
    6.446 ms    1.353 ms       27526  lookup_keyword
    6.147 ms   74.865 us           1  build_rpo
    6.109 ms  575.602 us       10406  read_ternary_operation
    6.044 ms    1.448 ms       13731  lex_ident_internal
    5.899 ms    5.899 ms       43706  merge_live_in
    5.861 ms  727.821 us        2382  read_inner_var_decl
    5.473 ms   35.749 us           1  build_rdf
    5.448 ms  110.401 us         608  read_logical
    5.438 ms    5.438 ms      216995  is_alnum
    5.365 ms   36.743 us           1  build_df
    5.325 ms   48.512 us           1  unwind_phi
    5.257 ms  209.499 us         395  read_parameter_list_decl
    5.161 ms    1.922 ms       34022  prepare_operand
    5.012 ms   49.665 us           1  build_dom
    4.512 ms    3.855 ms           1  build_idom
    4.377 ms    3.311 ms       47046  cse
    4.083 ms  617.493 us        3097  read_numeric_param
    3.909 ms    3.909 ms      107351  check_live_out
    3.679 ms    1.723 ms       26116  gen_name_to
    3.646 ms  582.780 us        7004  intern_string
    3.522 ms    3.522 ms      115274  hashmap_hash_index
    3.303 ms    3.303 ms       10743  bb_build_df
    3.187 ms  758.267 us       15087  find_macro
    3.071 ms    1.238 ms           1  dce_sweep
    3.048 ms    2.099 ms       35952  rename_var
    3.017 ms    1.737 ms       10743  bb_build_rdom
    2.991 ms    2.279 ms       26178  prepare_dest
    2.816 ms  566.535 us       10718  find_constant
    2.788 ms    2.788 ms       48568  dce_init_mark
    2.785 ms    1.220 ms       10374  spill_live_out
    2.782 ms  801.092 us       17491  load_var
    2.776 ms    1.640 ms       10743  bb_unwind_phi
    2.702 ms    1.889 ms       10743  bb_build_dom
    2.654 ms  619.036 us        5057  spill_alive
    2.363 ms  364.880 us        1102  read_literal_param
    2.331 ms    2.331 ms       84413  skip_whitespace
    2.294 ms    2.294 ms       56912  bb_add_killed_var
    2.205 ms    2.205 ms       36302  update_consumed
    2.173 ms  930.355 us       10266  spill_var
    2.092 ms  224.204 us        4142  require_typed_var
    2.005 ms    1.432 ms           1  build_r_idom
    1.967 ms   36.127 us         138  handle_address_of_operator
    1.961 ms    1.024 ms           1  solve_phi_insertion
    1.956 ms    1.956 ms       26116  __sprintf_chk
    1.871 ms    1.052 ms        4208  find_member
    1.830 ms    1.830 ms       10743  is_block_unreachable
    1.810 ms    1.810 ms       32347  strncmp
    1.804 ms    1.804 ms       64081  update_elf_offset
    1.732 ms    1.732 ms           1  arm_lower
    1.712 ms  217.291 us           3  cppd_control_flow_skip_lines
    1.695 ms    1.695 ms       16005  fgets
    1.690 ms    6.467 us           2  parse_array_init
    1.676 ms    1.676 ms       64387  add_existed_ph2_ir
    1.627 ms    1.186 ms       17978  __lw
    1.606 ms   16.423 us          28  parse_struct_field_init
    1.558 ms    1.558 ms         369  simple_sccp
    1.557 ms   97.905 us        2232  lex_token
    1.471 ms   85.044 us         395  add_func
    1.456 ms    1.064 ms       16092  __mov_i
    1.414 ms    1.414 ms       18212  strncpy
    1.387 ms    1.387 ms       48568  mark_const
    1.350 ms    1.298 ms       47520  insn_fusion
    1.328 ms    1.328 ms       36293  strcpy
    1.313 ms    1.313 ms       47520  triple_pattern_optimization
    1.279 ms    1.279 ms       19294  rdom_connect
    1.259 ms    1.259 ms       23839  strlen
    1.254 ms    1.254 ms       47188  eval_const_arithmetic
    1.250 ms  728.139 us        8446  resize_var
    1.221 ms    1.221 ms       47050  eval_const_unary
    1.220 ms    1.220 ms       46069  redundant_move_elim
    1.219 ms  886.959 us       13591  __sw
    1.207 ms    1.207 ms         369  optimize_constant_casts
    1.204 ms    1.204 ms       46453  find_in_regs
    1.202 ms    1.202 ms       46068  eliminate_load_store_pairs
    1.154 ms    1.154 ms       46069  strength_reduction
    1.135 ms  452.828 us        3399  append_unwound_phi_insn
    1.123 ms    1.123 ms       46069  comparison_optimization
    1.117 ms    1.117 ms       46069  bitwise_optimization
    1.117 ms    1.117 ms       46069  algebraic_simplification
    1.071 ms  268.821 us        1971  hashmap_put
    1.067 ms    1.067 ms       38391  get_stack_top_subscript_var
    1.065 ms    1.065 ms       44104  is_cse_candidate
    1.052 ms    1.052 ms       19801  memcpy
  975.208 us  271.964 us        3399  append_phi_operand
  933.960 us   22.906 us         371  arena_alloc_func
  905.563 us  205.707 us        3869  arena_alloc_symbol
  901.981 us  901.981 us       37748  vreg_clear_phys
  885.572 us  380.938 us        9080  fn_add_global
  882.267 us  882.267 us       36375  opstack_push
  882.142 us  882.142 us       36375  opstack_pop
  835.463 us  835.463 us       31216  pop_name
  812.740 us  812.740 us       20351  dom_connect
  784.135 us  784.135 us       31997  arm_transfer
  782.114 us   38.563 us         107  handle_single_dereference
  764.874 us  764.874 us       31369  __mov
  761.674 us   87.006 us           1  libc_generate
  737.097 us  737.097 us       10743  bb_build_rdf
  674.668 us   89.334 us         797  __c
  658.589 us  658.589 us       10573  add_live_gen
  657.589 us  657.589 us        6111  intersect
  631.020 us  328.077 us        2881  insert_phi_insn
  612.157 us  612.157 us       23978  get_size
  607.076 us   87.685 us        1102  write_symbol
  588.174 us  386.463 us        6592  extend_liveness
  573.114 us  573.114 us        6590  reverse_intersect
  560.350 us  405.456 us        6317  __add_r
  511.865 us  511.865 us       19435  track_var_use
  507.803 us  106.874 us        1631  add_symbol
  469.737 us  284.953 us        1971  arena_strdup
  462.371 us  462.371 us       13404  memcmp
  435.739 us  299.101 us       11942  perform_side_effect
  408.820 us  408.820 us        9502  __strcpy_chk
  400.741 us  400.741 us       13606  bb_connect
  384.924 us  384.180 us       15821  find_macro_param_src_idx
  378.162 us  265.050 us        2110  add_block
  369.141 us   55.842 us        1110  elf_write_str
  368.948 us   29.427 us          63  handle_sizeof_operator
  358.390 us   50.845 us         266  read_char_param
  311.507 us   25.045 us         184  add_constant
  306.670 us  306.670 us        4718  var_check_in_scope
  291.801 us  213.389 us        3216  __teq
  288.882 us  288.882 us       10743  bb_reverse_reversed_index
  276.094 us  276.094 us       10743  bb_build_reversed_rpo
  276.093 us   40.161 us         311  truncate_unchecked
  274.520 us  274.520 us       10743  bb_build_rpo
  266.951 us  266.951 us       10743  bb_reverse_index
  266.718 us  266.718 us       10743  bb_index_reversed_rpo
  261.176 us  261.176 us       10743  bb_index_rpo
  256.129 us    0.567 us           4  read_indirect_call
  229.984 us   99.803 us          11  hashmap_rehash
  211.755 us   67.271 us        1597  __zero
  200.962 us  146.843 us        2230  __mov_r
  169.879 us  114.468 us        1560  is_numeric
  166.554 us  154.122 us          15  arena_free_trailing_blocks
  157.862 us   51.492 us        1136  elf_write_byte
  148.440 us  109.383 us        1605  __cmp_r
  143.993 us    3.666 us          64  read_partial_var_decl
  138.113 us   14.780 us          72  add_alias
  134.734 us    5.628 us          17  read_global_assignment
  107.166 us  107.166 us        4433  __movt
  107.122 us  107.122 us        4433  __movw
  104.535 us    0.856 us           2  compact_all_arenas
  102.887 us  102.887 us          36  fopen
   95.521 us    5.642 us           1  global_init
   80.847 us   80.847 us          36  fclose
   78.597 us   78.597 us          20  calloc
   78.341 us   58.115 us         369  add_ph2_ir
   65.619 us    6.296 us         107  require_deref_var
   63.544 us    0.669 us           3  compact_arenas_selective
   59.536 us    5.519 us          97  require_ref_var
   58.786 us   43.579 us         627  __add_i
   52.045 us   52.045 us        2154  is_fusible_insn
   49.141 us   35.997 us         538  __sub_r
   45.569 us    1.675 us           7  skip_macro_body
   45.168 us    6.397 us         131  lookup_directive
   45.126 us   45.126 us        1589  arm_get_cond
   43.534 us   43.534 us        1683  is_pointer_operation
   42.874 us    0.404 us           4  read_constant_expr
   42.470 us    1.403 us           5  read_constant_infix_expr
   42.349 us    6.650 us          56  compute_field_address
   42.313 us   42.313 us        1490  get_operator_prio
   42.142 us    9.936 us          10  elf_write_blk
   41.795 us    1.331 us           8  strbuf_free
   41.111 us    1.625 us           1  elf_generate_sections
   40.866 us    5.502 us          28  compute_element_address
   37.769 us   27.587 us         418  __cmp_i
   33.742 us    3.625 us           9  hashmap_create
   29.341 us    1.436 us           7  add_macro
   27.105 us    6.383 us          27  find_best_spill
   23.290 us    1.581 us           5  arena_init
   21.797 us    2.716 us           8  strbuf_create
   20.994 us   15.752 us         214  __lb
   20.740 us   15.531 us         214  __sb
   20.722 us   14.781 us         184  calculate_spill_cost
   18.280 us   13.802 us         184  arena_alloc_constant
   17.655 us    0.426 us           7  arena_alloc_macro
   15.086 us    1.334 us           5  read_constant_expr_operand
   14.408 us   14.408 us         582  is_hex
   13.526 us    9.552 us         163  __rsb_i
   12.962 us   12.962 us         541  align_size
   12.750 us    3.066 us          17  read_primary_constant
   10.575 us    2.414 us          23  hashmap_contains
    9.184 us    0.203 us           1  elf_generate_header
    8.406 us    0.601 us           1  lex_init_keywords
    7.986 us    7.986 us         301  size_var
    7.426 us    0.593 us           4  add_named_type
    7.308 us    0.626 us           9  arena_alloc_traversal_args
    7.149 us    0.459 us           1  lex_init_directives
    6.775 us    5.016 us          72  arena_alloc_alias
    6.656 us    6.656 us         261  promote
    6.512 us    6.512 us          34  __snprintf_chk
    5.970 us    0.885 us           7  promote_unchecked
    5.417 us    5.417 us          34  snprintf
    5.145 us    3.770 us          57  __and_i
    5.126 us    5.126 us         163  read_numeric_constant
    4.527 us    3.301 us          50  __and_r
    4.179 us    1.016 us           4  handle_pointer_arithmetic
    3.798 us    1.331 us           9  hashmap_free
    3.451 us    3.451 us          10  initialize_struct_field
    2.767 us    0.357 us           1  elf_add_symbol
    2.568 us    2.568 us         108  __mul
    2.449 us    1.789 us          27  __or_r
    2.439 us    1.801 us          26  __eor_r
    1.759 us    1.759 us          73  add_type
    1.643 us    1.643 us          16  bb_disconnect
    1.228 us    0.452 us           1  elf_align
    1.102 us    0.187 us           3  check_def
    1.034 us    0.115 us           1  lexer_cleanup
    0.561 us    0.561 us           9  round_up_pow2
    0.289 us    0.216 us           3  __mvn_r
    0.176 us    0.176 us           7  __sxtb
    0.122 us    0.122 us           4  get_pointer_element_size
    0.121 us    0.121 us           5  get_unary_operator_prio
                                   1  exit

Summary by cubic

Switch hashmap from separate chaining to open addressing with linear probing, rehashing at 50% load. This improves cache locality and delivers a small, consistent speedup with similar memory usage.

Refactors
- Replace buckets + linked lists with a flat table and a state flag; remove next pointers.
- Rehash doubles capacity and reinserts with linear probing; size tracked correctly.
- hashmap_put probes and updates existing keys; keys stored via arena_strdup; rehash at 50% load.
- Public API remains the same: create, put, get, contains, free.

Performance
- shecc src/main.c: main() time 1.114s → 1.084s (~2.7% faster); MRSS ~1206400kB → ~1206720kB.
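
The refactor summarized above can be illustrated with a minimal sketch of open addressing with linear probing. This is not shecc's actual code: every name here (`slot_t`, `map_t`, `map_put`, and so on) is hypothetical, `malloc`/`calloc` stand in for the arena allocator, FNV-1a is assumed only as a placeholder hash, and allocation failures are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration; not shecc's definitions. */
typedef struct {
    char *key;     /* owned copy of the key */
    void *val;
    bool occupied; /* state flag; no "deleted" state because nothing is removed */
} slot_t;

typedef struct {
    slot_t *slots; /* one contiguous array, no per-node allocations */
    size_t cap;    /* always a power of two */
    size_t size;   /* number of occupied slots */
} map_t;

/* FNV-1a, assumed here only as a stand-in hash function. */
static uint32_t hash_str(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char) *s;
        h *= 16777619u;
    }
    return h;
}

map_t *map_create(size_t cap)
{
    assert(cap != 0 && (cap & (cap - 1)) == 0); /* power of two required */
    map_t *m = malloc(sizeof(*m));
    m->slots = calloc(cap, sizeof(slot_t));
    m->cap = cap;
    m->size = 0;
    return m;
}

/* Linear probing: return the slot holding `key`, or the first empty slot.
 * Terminates because the load factor never exceeds 50%. */
static slot_t *map_probe(map_t *m, const char *key)
{
    size_t i = hash_str(key) & (m->cap - 1);
    while (m->slots[i].occupied && strcmp(m->slots[i].key, key) != 0)
        i = (i + 1) & (m->cap - 1);
    return &m->slots[i];
}

/* Double the capacity and reinsert every occupied slot. */
static void map_rehash(map_t *m)
{
    slot_t *old = m->slots;
    size_t old_cap = m->cap;
    m->cap *= 2;
    m->slots = calloc(m->cap, sizeof(slot_t));
    for (size_t i = 0; i < old_cap; i++) {
        if (old[i].occupied)
            *map_probe(m, old[i].key) = old[i];
    }
    free(old);
}

void map_put(map_t *m, const char *key, void *val)
{
    slot_t *s = map_probe(m, key);
    if (s->occupied) { /* existing key: update in place */
        s->val = val;
        return;
    }
    if ((m->size + 1) * 2 > m->cap) { /* inserting would exceed 50% load */
        map_rehash(m);
        s = map_probe(m, key);
    }
    size_t len = strlen(key) + 1;
    s->key = memcpy(malloc(len), key, len);
    s->val = val;
    s->occupied = true;
    m->size++;
}

void *map_get(map_t *m, const char *key)
{
    slot_t *s = map_probe(m, key);
    return s->occupied ? s->val : NULL;
}

void map_free(map_t *m)
{
    for (size_t i = 0; i < m->cap; i++)
        if (m->slots[i].occupied)
            free(m->slots[i].key);
    free(m->slots);
    free(m);
}
```

Because the sketch never removes entries, a single `occupied` flag is enough; supporting deletion would require a third "deleted" (tombstone) state so that probe sequences are not cut short.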

src/defs.h

src/globals.c

visitorckw

I noticed the PR description includes some performance numbers, which is great. However, the commit message only vaguely mentions "slightly higher memory usage" and "significantly improved execution time." Could you include the actual measurements in the commit message as well?

visitorckw · 2025-09-18T19:00:59Z

Just out of curiosity, how did you test the correctness of this patch?

icgmilk · 2025-09-24T15:14:17Z

> Just out of curiosity, how did you test the correctness of this patch?

I validated this patch with shecc's test suite (make check).
I’ll follow up in this PR.

jserv · 2025-10-19T05:18:46Z

> I validated this patch with shecc's test suite (make check). I’ll follow up in this PR.

Rebase and validate carefully.

visitorckw

The last sentence of the commit message specifically mentions that delete is not supported. However, IIUC, we've never had a requirement for this functionality. Therefore, I'm not sure why it's so special that we need to explicitly mention something that has never been used or implemented.

Additionally, I saw in the PR's detailed description that your "saved 30 ms" was measured by running ./out/shecc src/main.c. However, the commit message doesn't mention this; it only states it's "30 ms faster." IMHO, without clarifying how this 30 ms was measured, this number is meaningless.

icgmilk · 2025-10-21T16:24:02Z

> The last sentence of the commit message specifically mentions that delete is not supported. However, IIUC, we've never had a requirement for this functionality. Therefore, I'm not sure why it's so special that we need to explicitly mention something that has never been used or implemented.
>
> Additionally, I saw in the PR's detailed description that your "saved 30 ms" was measured by running ./out/shecc src/main.c. However, the commit message doesn't mention this; it only states it's "30 ms faster." IMHO, without clarifying how this 30 ms was measured, this number is meaningless.

I originally mentioned the delete state in my commit message because of a comment from cubic-dev-ai. I'll remove that note from the commit message later.

I'll also add the measurement details in the commit message.

Previously, the chaining implementation incurred pointer chasing and per-node allocations, leading to poor cache locality and longer chains under collisions. Open addressing uses a single contiguous array and removes per-node allocations. We cap the load factor at 50% to keep probe lengths short and reduce clustering. This change trades ~320 kB of memory for ~30 ms faster execution (measured with uftrace and /usr/bin/time -v on out/shecc src/main.c), primarily due to better cache locality.
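
For context on the 50% cap: the classical estimates for linear probing (due to Knuth, and assuming a uniformly distributed hash, which is an idealization not verified here for shecc's hash function) give the expected number of probes as a function of the load factor $\alpha$:

```latex
% Expected probes for linear probing at load factor \alpha (Knuth's estimates)
\[
  E_{\text{hit}}(\alpha) \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right),
  \qquad
  E_{\text{miss}}(\alpha) \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^{2}}\right)
\]
% Worked values:
%   \alpha = 0.50:  E_hit = 1.5,  E_miss = 2.5
%   \alpha = 0.75:  E_hit = 2.5,  E_miss = 8.5
%   \alpha = 0.90:  E_hit = 5.5,  E_miss = 50.5
```

Under these estimates, an unsuccessful lookup or insertion at 50% load needs about 2.5 probes, versus about 8.5 if the same linear-probing table were allowed to fill to 75%.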

jserv · 2025-10-26T17:39:02Z

Thanks, @icgmilk, for contributing!

jserv requested review from ChAoSUnItY and DrXiao September 12, 2025 04:49

jserv reviewed Sep 12, 2025

jserv reviewed Sep 12, 2025