Skip to content

RIF40_NUM_DENOM view performance #150

@peterhambly

Description

@peterhambly

The performance of the RIF40_NUM_DENOM view has been observed to be poor if there are lots of geographies on Postgres, talking 2.8s to run with 6 geographies and 9 pairs.

sahsuland=> \timing on
Timing is on.
sahsuland=> SELECT geography, numerator_table, denominator_table FROM rif40_num_denom;
   geography   |   numerator_table    |     denominator_table
---------------+----------------------+---------------------------
 HALLAND       | HALLAND_CANCER       | HALLAND_POPULATION
 SAHSULAND     | NUM_SAHSULAND_CANCER | POP_SAHSULAND_POP
 STOCKHOLM     | STOCKHOLM_CANCER     | STOCKHOLM_POPULATION
 SWEDEN_COUNTY | SWEDEN_CANCER        | SWEDEN_POPULATION
 SYDVAST       | SYDVAST_TEST_1       | SYDVAST_TEST_1_POPULATION
 SYDVAST       | SYDVAST_TEST_2       | SYDVAST_TEST_2_POPULATION
 SYDVAST       | SYDVAST_TEST_3       | SYDVAST_TEST_3_POPULATION
 SYDVAST       | SYDVAST_TEST_4       | SYDVAST_TEST_4_POPULATION
 USA_2014      | SEER_CANCER          | SEER_POPULATION
(9 rows)

Time: 2840.897 ms

Analysing the query gives:

sahsuland=> EXPLAIN ANALYZE VERBOSE SELECT geography, numerator_table, denominator_table FROM rif40_num_denom;
                                                                                                                                                                               QUERY PLAN                                                                                                                                                                               
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on rif40_num_denom  (cost=29.98..30.00 rows=2 width=274) (actual time=2672.372..2672.373 rows=9 loops=1)
   Output: rif40_num_denom.geography, rif40_num_denom.numerator_table, rif40_num_denom.denominator_table
   ->  Sort  (cost=29.98..29.98 rows=2 width=2322) (actual time=2672.371..2672.372 rows=9 loops=1)
         Output: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_table, "*SELECT* 1".numerator_description, "*SELECT* 1".theme_name, "*SELECT* 1".theme_description, "*SELECT* 1".denominator_table, "*SELECT* 1".denominator_description, (("*SELECT* 1".automatic)::integer)
         Sort Key: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_description
         Sort Method: quicksort  Memory: 27kB
         CTE n
           ->  Nested Loop  (cost=0.00..9.19 rows=1 width=143) (actual time=72.631..836.368 rows=6 loops=1)
                 Output: g_1.geography, g_1.description, n_2.table_name, n_2.description, n_2.automatic, t.theme, t.description
                 Join Filter: ((n_2.theme)::text = (t.theme)::text)
                 Rows Removed by Join Filter: 12
                 ->  Nested Loop  (cost=0.00..8.17 rows=1 width=111) (actual time=72.622..836.311 rows=6 loops=1)
                       Output: g_1.geography, g_1.description, n_2.table_name, n_2.description, n_2.automatic, n_2.theme
                       Join Filter: (rif40_num_denom_validate(g_1.geography, n_2.table_name) = 1)
                       Rows Removed by Join Filter: 36
                       ->  Seq Scan on rif40.rif40_tables n_2  (cost=0.00..6.35 rows=1 width=64) (actual time=12.540..84.342 rows=6 loops=1)
                             Output: n_2.theme, n_2.table_name, n_2.description, n_2.year_start, n_2.year_stop, n_2.total_field, n_2.isindirectdenominator, n_2.isdirectdenominator, n_2.isnumerator, n_2.automatic, n_2.sex_field_name, n_2.age_group_field_name, n_2.age_sex_group_field_name, n_2.age_group_id, n_2.validation_date
                             Filter: ((n_2.isnumerator = 1) AND (n_2.automatic = 1) AND (rif40_is_object_resolvable(n_2.table_name) = 1))
                             Rows Removed by Filter: 14
                       ->  Seq Scan on rif40.rif40_geographies g_1  (cost=0.00..1.03 rows=3 width=47) (actual time=0.003..0.006 rows=7 loops=6)
                             Output: g_1.geography, g_1.description, g_1.hierarchytable, g_1.srid, g_1.defaultcomparea, g_1.defaultstudyarea, g_1.postal_population_table, g_1.postal_point_column, g_1.partition, g_1.max_geojson_digits, g_1.geometrytable, g_1.tiletable, g_1.minzoomlevel, g_1.maxzoomlevel, g_1.adjacencytable, g_1.map_background
                 ->  Seq Scan on rif40.rif40_health_study_themes t  (cost=0.00..1.01 rows=1 width=42) (actual time=0.002..0.003 rows=3 loops=6)
                       Output: t.theme, t.description
         CTE d
           ->  Nested Loop  (cost=0.00..13.17 rows=1 width=61) (actual time=283.976..1809.279 rows=5 loops=1)
                 Output: g_2.geography, d_2.table_name, d_2.description
                 Join Filter: (rif40_num_denom_validate(g_2.geography, d_2.table_name) = 1)
                 Rows Removed by Join Filter: 30
                 ->  Seq Scan on rif40.rif40_tables d_2  (cost=0.00..11.35 rows=1 width=52) (actual time=228.131..1152.516 rows=5 loops=1)
                       Output: d_2.theme, d_2.table_name, d_2.description, d_2.year_start, d_2.year_stop, d_2.total_field, d_2.isindirectdenominator, d_2.isdirectdenominator, d_2.isnumerator, d_2.automatic, d_2.sex_field_name, d_2.age_group_field_name, d_2.age_sex_group_field_name, d_2.age_group_id, d_2.validation_date
                       Filter: ((d_2.isindirectdenominator = 1) AND (d_2.automatic = 1) AND (rif40_auto_indirect_checks(d_2.table_name) IS NULL) AND (rif40_is_object_resolvable(d_2.table_name) = 1))
                       Rows Removed by Filter: 15
                 ->  Seq Scan on rif40.rif40_geographies g_2  (cost=0.00..1.03 rows=3 width=9) (actual time=0.003..0.007 rows=7 loops=5)
                       Output: g_2.geography, g_2.description, g_2.hierarchytable, g_2.srid, g_2.defaultcomparea, g_2.defaultstudyarea, g_2.postal_population_table, g_2.postal_point_column, g_2.partition, g_2.max_geojson_digits, g_2.geometrytable, g_2.tiletable, g_2.minzoomlevel, g_2.maxzoomlevel, g_2.adjacencytable, g_2.map_background
         ->  Unique  (cost=7.56..7.61 rows=2 width=2322) (actual time=2672.354..2672.357 rows=9 loops=1)
               Output: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_table, "*SELECT* 1".numerator_description, "*SELECT* 1".theme_name, "*SELECT* 1".theme_description, "*SELECT* 1".denominator_table, "*SELECT* 1".denominator_description, (("*SELECT* 1".automatic)::integer)
               ->  Sort  (cost=7.56..7.56 rows=2 width=2322) (actual time=2672.353..2672.354 rows=9 loops=1)
                     Output: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_table, "*SELECT* 1".numerator_description, "*SELECT* 1".theme_name, "*SELECT* 1".theme_description, "*SELECT* 1".denominator_table, "*SELECT* 1".denominator_description, (("*SELECT* 1".automatic)::integer)
                     Sort Key: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_table, "*SELECT* 1".numerator_description, "*SELECT* 1".theme_name, "*SELECT* 1".theme_description, "*SELECT* 1".denominator_table, "*SELECT* 1".denominator_description, (("*SELECT* 1".automatic)::integer)
                     Sort Method: quicksort  Memory: 27kB
                     ->  Append  (cost=0.00..7.55 rows=2 width=2322) (actual time=356.618..2672.329 rows=9 loops=1)
                           ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..0.07 rows=1 width=2322) (actual time=356.618..2645.697 rows=5 loops=1)
                                 Output: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_table, "*SELECT* 1".numerator_description, "*SELECT* 1".theme_name, "*SELECT* 1".theme_description, "*SELECT* 1".denominator_table, "*SELECT* 1".denominator_description, "*SELECT* 1".automatic
                                 ->  Nested Loop  (cost=0.00..0.05 rows=1 width=2320) (actual time=356.617..2645.692 rows=5 loops=1)
                                       Output: n.geography, n.geography_description, n.numerator_table, n.numerator_description, n.theme_name, n.theme_description, d.denominator_table, d.denominator_description, n.automatic
                                       Join Filter: ((n.geography)::text = (d.geography)::text)
                                       Rows Removed by Join Filter: 25
                                       ->  CTE Scan on n  (cost=0.00..0.02 rows=1 width=1726) (actual time=72.636..836.381 rows=6 loops=1)
                                             Output: n.geography, n.geography_description, n.numerator_table, n.numerator_description, n.automatic, n.theme_name, n.theme_description
                                       ->  CTE Scan on d  (cost=0.00..0.02 rows=1 width=712) (actual time=47.330..301.549 rows=5 loops=6)
                                             Output: d.geography, d.denominator_table, d.denominator_description
                           ->  Nested Loop Left Join  (cost=4.50..7.47 rows=1 width=201) (actual time=26.607..26.631 rows=4 loops=1)
                                 Output: nd.geography, g.description, nd.numerator_table, n_1.description, h.theme, h.description, nd.denominator_table, d_1.description, 0
                                 ->  Nested Loop Left Join  (cost=4.37..6.72 rows=1 width=165) (actual time=26.597..26.616 rows=4 loops=1)
                                       Output: nd.geography, nd.numerator_table, nd.denominator_table, g.description, n_1.description, n_1.theme, d_1.description
                                       Join Filter: ((g.geography)::text = (nd.geography)::text)
                                       Rows Removed by Join Filter: 24
                                       ->  Hash Right Join  (cost=4.37..5.66 rows=1 width=127) (actual time=26.588..26.593 rows=4 loops=1)
                                             Output: nd.geography, nd.numerator_table, nd.denominator_table, n_1.description, n_1.theme, d_1.description
                                             Hash Cond: ((d_1.table_name)::text = (nd.denominator_table)::text)
                                             ->  Seq Scan on rif40.rif40_tables d_1  (cost=0.00..1.20 rows=20 width=52) (actual time=0.008..0.008 rows=20 loops=1)
                                                   Output: d_1.theme, d_1.table_name, d_1.description, d_1.year_start, d_1.year_stop, d_1.total_field, d_1.isindirectdenominator, d_1.isdirectdenominator, d_1.isnumerator, d_1.automatic, d_1.sex_field_name, d_1.age_group_field_name, d_1.age_sex_group_field_name, d_1.age_group_id, d_1.validation_date
                                             ->  Hash  (cost=4.36..4.36 rows=1 width=93) (actual time=26.575..26.575 rows=4 loops=1)
                                                   Output: nd.geography, nd.numerator_table, nd.denominator_table, n_1.description, n_1.theme
                                                   Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                   ->  Hash Right Join  (cost=3.07..4.36 rows=1 width=93) (actual time=26.568..26.572 rows=4 loops=1)
                                                         Output: nd.geography, nd.numerator_table, nd.denominator_table, n_1.description, n_1.theme
                                                         Hash Cond: ((n_1.table_name)::text = (nd.numerator_table)::text)
                                                         ->  Seq Scan on rif40.rif40_tables n_1  (cost=0.00..1.20 rows=20 width=62) (actual time=0.008..0.009 rows=20 loops=1)
                                                               Output: n_1.theme, n_1.table_name, n_1.description, n_1.year_start, n_1.year_stop, n_1.total_field, n_1.isindirectdenominator, n_1.isdirectdenominator, n_1.isnumerator, n_1.automatic, n_1.sex_field_name, n_1.age_group_field_name, n_1.age_sex_group_field_name, n_1.age_group_id, n_1.validation_date                                                         ->  Hash  (cost=3.06..3.06 rows=1 width=49) (actual time=26.553..26.553 rows=4 loops=1)
                                                               Output: nd.geography, nd.numerator_table, nd.denominator_table
                                                               Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                               ->  Seq Scan on peter.t_rif40_num_denom nd  (cost=0.00..3.06 rows=1 width=49) (actual time=6.796..26.549 rows=4 loops=1)
                                                                     Output: nd.geography, nd.numerator_table, nd.denominator_table
                                                                     Filter: ((rif40_is_object_resolvable(nd.numerator_table) = 1) AND (rif40_is_object_resolvable(nd.denominator_table) = 1))
                                       ->  Seq Scan on rif40.rif40_geographies g  (cost=0.00..1.03 rows=3 width=47) (actual time=0.002..0.002 rows=7 loops=4)
                                             Output: g.geography, g.description, g.hierarchytable, g.srid, g.defaultcomparea, g.defaultstudyarea, g.postal_population_table, g.postal_point_column, g.partition, g.max_geojson_digits, g.geometrytable, g.tiletable, g.minzoomlevel, g.maxzoomlevel, g.adjacencytable, g.map_background
                                 ->  Index Scan using rif40_health_study_themes_pk on rif40.rif40_health_study_themes h  (cost=0.13..0.74 rows=1 width=42) (actual time=0.003..0.003 rows=1 loops=4)
                                       Output: h.theme, h.description
                                       Index Cond: ((h.theme)::text = (n_1.theme)::text)
 Planning time: 0.838 ms
 Execution time: 2672.509 ms
(83 rows)

Analysis of the output shows that

a) The denominator common table expression took 0.3s to initialise and 1.8s to run

CTE d
  ->  Nested Loop  (cost=0.00..13.17 rows=1 width=61) (actual time=283.976..1809.279 rows=5 loops=1)

And b) that the subsequent denominator/numerator join is OK:

->  Nested Loop  (cost=0.00..0.05 rows=1 width=2320) (actual time=356.617..2645.692 rows=5 loops=1)
       Output: n.geography, n.geography_description, n.numerator_table, n.numerator_description, n.theme_name, n.theme_description, d.denominator_table, d.denominator_description, n.automatic
       Join Filter: ((n.geography)::text = (d.geography)::text)
       Rows Removed by Join Filter: 25
       ->  CTE Scan on n  (cost=0.00..0.02 rows=1 width=1726) (actual time=72.636..836.381 rows=6 loops=1)
             Output: n.geography, n.geography_description, n.numerator_table, n.numerator_description, n.automatic, n.theme_name, n.theme_description
       ->  CTE Scan on d  (cost=0.00..0.02 rows=1 width=712) (actual time=47.330..301.549 rows=5 loops=6)
             Output: d.geography, d.denominator_table, d.denominator_description

This is confirmed by tunning the SQL for d:

sahsuland=> SELECT d1.geography,
sahsuland->            d1.denominator_table,
sahsuland->            d1.denominator_description
sahsuland->       FROM ( SELECT g.geography,
sahsuland(>                     d.table_name AS denominator_table,
sahsuland(>                     d.description AS denominator_description
sahsuland(>                FROM rif40_geographies g,
sahsuland(>                     rif40_tables d
sahsuland(>                   WHERE ((d.isindirectdenominator = 1) AND (d.automatic = 1) AND (rif40_is_object_resolvable(d.table_name) = 1))) d1
sahsuland->      WHERE ((rif40_num_denom_validate(d1.geography, d1.denominator_table) = 1) AND (rif40_auto_indirect_checks(d1.denominator_table) IS NULL))
sahsuland-> ;
   geography   |  denominator_table   |                                 denominator_description
---------------+----------------------+------------------------------------------------------------------------------------------
 SAHSULAND     | POP_SAHSULAND_POP    | population health file
 USA_2014      | SEER_POPULATION      | SEER Population 1972-2013. Georgia starts in 1975, Washington in 1974. 9 States in total
 SWEDEN_COUNTY | SWEDEN_POPULATION    | Sweden Population 19??-20??.
 HALLAND       | HALLAND_POPULATION   | Halland Population 19??-20??.
 STOCKHOLM     | STOCKHOLM_POPULATION | Stockholm Population 19??-20??.
(5 rows)


Time: 1799.655 ms

Furtger tests showed that inner query d1 was efficient and the worst performing function was rif40_auto_indirect_checks.

Potential solutions:

  • Re-write the query so that n and d1 are joined are joined in d and rif40_auto_indirect_checks
    is rewritten to use this join. There is a lot of repeating code in rif40_num_denom_validate and
    rif40_auto_indirect_checks which needs to be called once.
  • Add a new entity RIF40_TABLE_GEOGGRAPHIES to remove the need for outer joins

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions