The performance of the RIF40_NUM_DENOM view has been observed to be poor if there are lots of geographies on Postgres, talking 2.8s to run with 6 geographies and 9 pairs.
sahsuland=> EXPLAIN ANALYZE VERBOSE SELECT geography, numerator_table, denominator_table FROM rif40_num_denom;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on rif40_num_denom (cost=29.98..30.00 rows=2 width=274) (actual time=2672.372..2672.373 rows=9 loops=1)
Output: rif40_num_denom.geography, rif40_num_denom.numerator_table, rif40_num_denom.denominator_table
-> Sort (cost=29.98..29.98 rows=2 width=2322) (actual time=2672.371..2672.372 rows=9 loops=1)
Output: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_table, "*SELECT* 1".numerator_description, "*SELECT* 1".theme_name, "*SELECT* 1".theme_description, "*SELECT* 1".denominator_table, "*SELECT* 1".denominator_description, (("*SELECT* 1".automatic)::integer)
Sort Key: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_description
Sort Method: quicksort Memory: 27kB
CTE n
-> Nested Loop (cost=0.00..9.19 rows=1 width=143) (actual time=72.631..836.368 rows=6 loops=1)
Output: g_1.geography, g_1.description, n_2.table_name, n_2.description, n_2.automatic, t.theme, t.description
Join Filter: ((n_2.theme)::text = (t.theme)::text)
Rows Removed by Join Filter: 12
-> Nested Loop (cost=0.00..8.17 rows=1 width=111) (actual time=72.622..836.311 rows=6 loops=1)
Output: g_1.geography, g_1.description, n_2.table_name, n_2.description, n_2.automatic, n_2.theme
Join Filter: (rif40_num_denom_validate(g_1.geography, n_2.table_name) = 1)
Rows Removed by Join Filter: 36
-> Seq Scan on rif40.rif40_tables n_2 (cost=0.00..6.35 rows=1 width=64) (actual time=12.540..84.342 rows=6 loops=1)
Output: n_2.theme, n_2.table_name, n_2.description, n_2.year_start, n_2.year_stop, n_2.total_field, n_2.isindirectdenominator, n_2.isdirectdenominator, n_2.isnumerator, n_2.automatic, n_2.sex_field_name, n_2.age_group_field_name, n_2.age_sex_group_field_name, n_2.age_group_id, n_2.validation_date
Filter: ((n_2.isnumerator = 1) AND (n_2.automatic = 1) AND (rif40_is_object_resolvable(n_2.table_name) = 1))
Rows Removed by Filter: 14
-> Seq Scan on rif40.rif40_geographies g_1 (cost=0.00..1.03 rows=3 width=47) (actual time=0.003..0.006 rows=7 loops=6)
Output: g_1.geography, g_1.description, g_1.hierarchytable, g_1.srid, g_1.defaultcomparea, g_1.defaultstudyarea, g_1.postal_population_table, g_1.postal_point_column, g_1.partition, g_1.max_geojson_digits, g_1.geometrytable, g_1.tiletable, g_1.minzoomlevel, g_1.maxzoomlevel, g_1.adjacencytable, g_1.map_background
-> Seq Scan on rif40.rif40_health_study_themes t (cost=0.00..1.01 rows=1 width=42) (actual time=0.002..0.003 rows=3 loops=6)
Output: t.theme, t.description
CTE d
-> Nested Loop (cost=0.00..13.17 rows=1 width=61) (actual time=283.976..1809.279 rows=5 loops=1)
Output: g_2.geography, d_2.table_name, d_2.description
Join Filter: (rif40_num_denom_validate(g_2.geography, d_2.table_name) = 1)
Rows Removed by Join Filter: 30
-> Seq Scan on rif40.rif40_tables d_2 (cost=0.00..11.35 rows=1 width=52) (actual time=228.131..1152.516 rows=5 loops=1)
Output: d_2.theme, d_2.table_name, d_2.description, d_2.year_start, d_2.year_stop, d_2.total_field, d_2.isindirectdenominator, d_2.isdirectdenominator, d_2.isnumerator, d_2.automatic, d_2.sex_field_name, d_2.age_group_field_name, d_2.age_sex_group_field_name, d_2.age_group_id, d_2.validation_date
Filter: ((d_2.isindirectdenominator = 1) AND (d_2.automatic = 1) AND (rif40_auto_indirect_checks(d_2.table_name) IS NULL) AND (rif40_is_object_resolvable(d_2.table_name) = 1))
Rows Removed by Filter: 15
-> Seq Scan on rif40.rif40_geographies g_2 (cost=0.00..1.03 rows=3 width=9) (actual time=0.003..0.007 rows=7 loops=5)
Output: g_2.geography, g_2.description, g_2.hierarchytable, g_2.srid, g_2.defaultcomparea, g_2.defaultstudyarea, g_2.postal_population_table, g_2.postal_point_column, g_2.partition, g_2.max_geojson_digits, g_2.geometrytable, g_2.tiletable, g_2.minzoomlevel, g_2.maxzoomlevel, g_2.adjacencytable, g_2.map_background
-> Unique (cost=7.56..7.61 rows=2 width=2322) (actual time=2672.354..2672.357 rows=9 loops=1)
Output: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_table, "*SELECT* 1".numerator_description, "*SELECT* 1".theme_name, "*SELECT* 1".theme_description, "*SELECT* 1".denominator_table, "*SELECT* 1".denominator_description, (("*SELECT* 1".automatic)::integer)
-> Sort (cost=7.56..7.56 rows=2 width=2322) (actual time=2672.353..2672.354 rows=9 loops=1)
Output: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_table, "*SELECT* 1".numerator_description, "*SELECT* 1".theme_name, "*SELECT* 1".theme_description, "*SELECT* 1".denominator_table, "*SELECT* 1".denominator_description, (("*SELECT* 1".automatic)::integer)
Sort Key: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_table, "*SELECT* 1".numerator_description, "*SELECT* 1".theme_name, "*SELECT* 1".theme_description, "*SELECT* 1".denominator_table, "*SELECT* 1".denominator_description, (("*SELECT* 1".automatic)::integer)
Sort Method: quicksort Memory: 27kB
-> Append (cost=0.00..7.55 rows=2 width=2322) (actual time=356.618..2672.329 rows=9 loops=1)
-> Subquery Scan on "*SELECT* 1" (cost=0.00..0.07 rows=1 width=2322) (actual time=356.618..2645.697 rows=5 loops=1)
Output: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_table, "*SELECT* 1".numerator_description, "*SELECT* 1".theme_name, "*SELECT* 1".theme_description, "*SELECT* 1".denominator_table, "*SELECT* 1".denominator_description, "*SELECT* 1".automatic
-> Nested Loop (cost=0.00..0.05 rows=1 width=2320) (actual time=356.617..2645.692 rows=5 loops=1)
Output: n.geography, n.geography_description, n.numerator_table, n.numerator_description, n.theme_name, n.theme_description, d.denominator_table, d.denominator_description, n.automatic
Join Filter: ((n.geography)::text = (d.geography)::text)
Rows Removed by Join Filter: 25
-> CTE Scan on n (cost=0.00..0.02 rows=1 width=1726) (actual time=72.636..836.381 rows=6 loops=1)
Output: n.geography, n.geography_description, n.numerator_table, n.numerator_description, n.automatic, n.theme_name, n.theme_description
-> CTE Scan on d (cost=0.00..0.02 rows=1 width=712) (actual time=47.330..301.549 rows=5 loops=6)
Output: d.geography, d.denominator_table, d.denominator_description
-> Nested Loop Left Join (cost=4.50..7.47 rows=1 width=201) (actual time=26.607..26.631 rows=4 loops=1)
Output: nd.geography, g.description, nd.numerator_table, n_1.description, h.theme, h.description, nd.denominator_table, d_1.description, 0
-> Nested Loop Left Join (cost=4.37..6.72 rows=1 width=165) (actual time=26.597..26.616 rows=4 loops=1)
Output: nd.geography, nd.numerator_table, nd.denominator_table, g.description, n_1.description, n_1.theme, d_1.description
Join Filter: ((g.geography)::text = (nd.geography)::text)
Rows Removed by Join Filter: 24
-> Hash Right Join (cost=4.37..5.66 rows=1 width=127) (actual time=26.588..26.593 rows=4 loops=1)
Output: nd.geography, nd.numerator_table, nd.denominator_table, n_1.description, n_1.theme, d_1.description
Hash Cond: ((d_1.table_name)::text = (nd.denominator_table)::text)
-> Seq Scan on rif40.rif40_tables d_1 (cost=0.00..1.20 rows=20 width=52) (actual time=0.008..0.008 rows=20 loops=1)
Output: d_1.theme, d_1.table_name, d_1.description, d_1.year_start, d_1.year_stop, d_1.total_field, d_1.isindirectdenominator, d_1.isdirectdenominator, d_1.isnumerator, d_1.automatic, d_1.sex_field_name, d_1.age_group_field_name, d_1.age_sex_group_field_name, d_1.age_group_id, d_1.validation_date
-> Hash (cost=4.36..4.36 rows=1 width=93) (actual time=26.575..26.575 rows=4 loops=1)
Output: nd.geography, nd.numerator_table, nd.denominator_table, n_1.description, n_1.theme
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Hash Right Join (cost=3.07..4.36 rows=1 width=93) (actual time=26.568..26.572 rows=4 loops=1)
Output: nd.geography, nd.numerator_table, nd.denominator_table, n_1.description, n_1.theme
Hash Cond: ((n_1.table_name)::text = (nd.numerator_table)::text)
-> Seq Scan on rif40.rif40_tables n_1 (cost=0.00..1.20 rows=20 width=62) (actual time=0.008..0.009 rows=20 loops=1)
Output: n_1.theme, n_1.table_name, n_1.description, n_1.year_start, n_1.year_stop, n_1.total_field, n_1.isindirectdenominator, n_1.isdirectdenominator, n_1.isnumerator, n_1.automatic, n_1.sex_field_name, n_1.age_group_field_name, n_1.age_sex_group_field_name, n_1.age_group_id, n_1.validation_date -> Hash (cost=3.06..3.06 rows=1 width=49) (actual time=26.553..26.553 rows=4 loops=1)
Output: nd.geography, nd.numerator_table, nd.denominator_table
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on peter.t_rif40_num_denom nd (cost=0.00..3.06 rows=1 width=49) (actual time=6.796..26.549 rows=4 loops=1)
Output: nd.geography, nd.numerator_table, nd.denominator_table
Filter: ((rif40_is_object_resolvable(nd.numerator_table) = 1) AND (rif40_is_object_resolvable(nd.denominator_table) = 1))
-> Seq Scan on rif40.rif40_geographies g (cost=0.00..1.03 rows=3 width=47) (actual time=0.002..0.002 rows=7 loops=4)
Output: g.geography, g.description, g.hierarchytable, g.srid, g.defaultcomparea, g.defaultstudyarea, g.postal_population_table, g.postal_point_column, g.partition, g.max_geojson_digits, g.geometrytable, g.tiletable, g.minzoomlevel, g.maxzoomlevel, g.adjacencytable, g.map_background
-> Index Scan using rif40_health_study_themes_pk on rif40.rif40_health_study_themes h (cost=0.13..0.74 rows=1 width=42) (actual time=0.003..0.003 rows=1 loops=4)
Output: h.theme, h.description
Index Cond: ((h.theme)::text = (n_1.theme)::text)
Planning time: 0.838 ms
Execution time: 2672.509 ms
(83 rows)
a) The denominator common table expression took 0.3s to initialise and 1.8s to run
-> Nested Loop (cost=0.00..0.05 rows=1 width=2320) (actual time=356.617..2645.692 rows=5 loops=1)
Output: n.geography, n.geography_description, n.numerator_table, n.numerator_description, n.theme_name, n.theme_description, d.denominator_table, d.denominator_description, n.automatic
Join Filter: ((n.geography)::text = (d.geography)::text)
Rows Removed by Join Filter: 25
-> CTE Scan on n (cost=0.00..0.02 rows=1 width=1726) (actual time=72.636..836.381 rows=6 loops=1)
Output: n.geography, n.geography_description, n.numerator_table, n.numerator_description, n.automatic, n.theme_name, n.theme_description
-> CTE Scan on d (cost=0.00..0.02 rows=1 width=712) (actual time=47.330..301.549 rows=5 loops=6)
Output: d.geography, d.denominator_table, d.denominator_description
Furtger tests showed that inner query d1 was efficient and the worst performing function was rif40_auto_indirect_checks.
The performance of the RIF40_NUM_DENOM view has been observed to be poor if there are lots of geographies on Postgres, talking 2.8s to run with 6 geographies and 9 pairs.
Analysing the query gives:
Analysis of the output shows that
a) The denominator common table expression took 0.3s to initialise and 1.8s to run
And b) that the subsequent denominator/numerator join is OK:
This is confirmed by tunning the SQL for d:
Furtger tests showed that inner query d1 was efficient and the worst performing function was rif40_auto_indirect_checks.
Potential solutions:
is rewritten to use this join. There is a lot of repeating code in rif40_num_denom_validate and
rif40_auto_indirect_checks which needs to be called once.