RIF40_NUM_DENOM view performance

The performance of the RIF40_NUM_DENOM view has been observed to be poor if there are lots of geographies on Postgres, talking 2.8s to run with 6 geographies and 9 pairs.

```
sahsuland=> \timing on
Timing is on.
sahsuland=> SELECT geography, numerator_table, denominator_table FROM rif40_num_denom;
   geography   |   numerator_table    |     denominator_table
---------------+----------------------+---------------------------
 HALLAND       | HALLAND_CANCER       | HALLAND_POPULATION
 SAHSULAND     | NUM_SAHSULAND_CANCER | POP_SAHSULAND_POP
 STOCKHOLM     | STOCKHOLM_CANCER     | STOCKHOLM_POPULATION
 SWEDEN_COUNTY | SWEDEN_CANCER        | SWEDEN_POPULATION
 SYDVAST       | SYDVAST_TEST_1       | SYDVAST_TEST_1_POPULATION
 SYDVAST       | SYDVAST_TEST_2       | SYDVAST_TEST_2_POPULATION
 SYDVAST       | SYDVAST_TEST_3       | SYDVAST_TEST_3_POPULATION
 SYDVAST       | SYDVAST_TEST_4       | SYDVAST_TEST_4_POPULATION
 USA_2014      | SEER_CANCER          | SEER_POPULATION
(9 rows)

Time: 2840.897 ms
```

Analysing the query gives:
```
sahsuland=> EXPLAIN ANALYZE VERBOSE SELECT geography, numerator_table, denominator_table FROM rif40_num_denom;
                                                                                                                                                                               QUERY PLAN                                                                                                                                                                               
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on rif40_num_denom  (cost=29.98..30.00 rows=2 width=274) (actual time=2672.372..2672.373 rows=9 loops=1)
   Output: rif40_num_denom.geography, rif40_num_denom.numerator_table, rif40_num_denom.denominator_table
   ->  Sort  (cost=29.98..29.98 rows=2 width=2322) (actual time=2672.371..2672.372 rows=9 loops=1)
         Output: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_table, "*SELECT* 1".numerator_description, "*SELECT* 1".theme_name, "*SELECT* 1".theme_description, "*SELECT* 1".denominator_table, "*SELECT* 1".denominator_description, (("*SELECT* 1".automatic)::integer)
         Sort Key: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_description
         Sort Method: quicksort  Memory: 27kB
         CTE n
           ->  Nested Loop  (cost=0.00..9.19 rows=1 width=143) (actual time=72.631..836.368 rows=6 loops=1)
                 Output: g_1.geography, g_1.description, n_2.table_name, n_2.description, n_2.automatic, t.theme, t.description
                 Join Filter: ((n_2.theme)::text = (t.theme)::text)
                 Rows Removed by Join Filter: 12
                 ->  Nested Loop  (cost=0.00..8.17 rows=1 width=111) (actual time=72.622..836.311 rows=6 loops=1)
                       Output: g_1.geography, g_1.description, n_2.table_name, n_2.description, n_2.automatic, n_2.theme
                       Join Filter: (rif40_num_denom_validate(g_1.geography, n_2.table_name) = 1)
                       Rows Removed by Join Filter: 36
                       ->  Seq Scan on rif40.rif40_tables n_2  (cost=0.00..6.35 rows=1 width=64) (actual time=12.540..84.342 rows=6 loops=1)
                             Output: n_2.theme, n_2.table_name, n_2.description, n_2.year_start, n_2.year_stop, n_2.total_field, n_2.isindirectdenominator, n_2.isdirectdenominator, n_2.isnumerator, n_2.automatic, n_2.sex_field_name, n_2.age_group_field_name, n_2.age_sex_group_field_name, n_2.age_group_id, n_2.validation_date
                             Filter: ((n_2.isnumerator = 1) AND (n_2.automatic = 1) AND (rif40_is_object_resolvable(n_2.table_name) = 1))
                             Rows Removed by Filter: 14
                       ->  Seq Scan on rif40.rif40_geographies g_1  (cost=0.00..1.03 rows=3 width=47) (actual time=0.003..0.006 rows=7 loops=6)
                             Output: g_1.geography, g_1.description, g_1.hierarchytable, g_1.srid, g_1.defaultcomparea, g_1.defaultstudyarea, g_1.postal_population_table, g_1.postal_point_column, g_1.partition, g_1.max_geojson_digits, g_1.geometrytable, g_1.tiletable, g_1.minzoomlevel, g_1.maxzoomlevel, g_1.adjacencytable, g_1.map_background
                 ->  Seq Scan on rif40.rif40_health_study_themes t  (cost=0.00..1.01 rows=1 width=42) (actual time=0.002..0.003 rows=3 loops=6)
                       Output: t.theme, t.description
         CTE d
           ->  Nested Loop  (cost=0.00..13.17 rows=1 width=61) (actual time=283.976..1809.279 rows=5 loops=1)
                 Output: g_2.geography, d_2.table_name, d_2.description
                 Join Filter: (rif40_num_denom_validate(g_2.geography, d_2.table_name) = 1)
                 Rows Removed by Join Filter: 30
                 ->  Seq Scan on rif40.rif40_tables d_2  (cost=0.00..11.35 rows=1 width=52) (actual time=228.131..1152.516 rows=5 loops=1)
                       Output: d_2.theme, d_2.table_name, d_2.description, d_2.year_start, d_2.year_stop, d_2.total_field, d_2.isindirectdenominator, d_2.isdirectdenominator, d_2.isnumerator, d_2.automatic, d_2.sex_field_name, d_2.age_group_field_name, d_2.age_sex_group_field_name, d_2.age_group_id, d_2.validation_date
                       Filter: ((d_2.isindirectdenominator = 1) AND (d_2.automatic = 1) AND (rif40_auto_indirect_checks(d_2.table_name) IS NULL) AND (rif40_is_object_resolvable(d_2.table_name) = 1))
                       Rows Removed by Filter: 15
                 ->  Seq Scan on rif40.rif40_geographies g_2  (cost=0.00..1.03 rows=3 width=9) (actual time=0.003..0.007 rows=7 loops=5)
                       Output: g_2.geography, g_2.description, g_2.hierarchytable, g_2.srid, g_2.defaultcomparea, g_2.defaultstudyarea, g_2.postal_population_table, g_2.postal_point_column, g_2.partition, g_2.max_geojson_digits, g_2.geometrytable, g_2.tiletable, g_2.minzoomlevel, g_2.maxzoomlevel, g_2.adjacencytable, g_2.map_background
         ->  Unique  (cost=7.56..7.61 rows=2 width=2322) (actual time=2672.354..2672.357 rows=9 loops=1)
               Output: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_table, "*SELECT* 1".numerator_description, "*SELECT* 1".theme_name, "*SELECT* 1".theme_description, "*SELECT* 1".denominator_table, "*SELECT* 1".denominator_description, (("*SELECT* 1".automatic)::integer)
               ->  Sort  (cost=7.56..7.56 rows=2 width=2322) (actual time=2672.353..2672.354 rows=9 loops=1)
                     Output: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_table, "*SELECT* 1".numerator_description, "*SELECT* 1".theme_name, "*SELECT* 1".theme_description, "*SELECT* 1".denominator_table, "*SELECT* 1".denominator_description, (("*SELECT* 1".automatic)::integer)
                     Sort Key: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_table, "*SELECT* 1".numerator_description, "*SELECT* 1".theme_name, "*SELECT* 1".theme_description, "*SELECT* 1".denominator_table, "*SELECT* 1".denominator_description, (("*SELECT* 1".automatic)::integer)
                     Sort Method: quicksort  Memory: 27kB
                     ->  Append  (cost=0.00..7.55 rows=2 width=2322) (actual time=356.618..2672.329 rows=9 loops=1)
                           ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..0.07 rows=1 width=2322) (actual time=356.618..2645.697 rows=5 loops=1)
                                 Output: "*SELECT* 1".geography, "*SELECT* 1".geography_description, "*SELECT* 1".numerator_table, "*SELECT* 1".numerator_description, "*SELECT* 1".theme_name, "*SELECT* 1".theme_description, "*SELECT* 1".denominator_table, "*SELECT* 1".denominator_description, "*SELECT* 1".automatic
                                 ->  Nested Loop  (cost=0.00..0.05 rows=1 width=2320) (actual time=356.617..2645.692 rows=5 loops=1)
                                       Output: n.geography, n.geography_description, n.numerator_table, n.numerator_description, n.theme_name, n.theme_description, d.denominator_table, d.denominator_description, n.automatic
                                       Join Filter: ((n.geography)::text = (d.geography)::text)
                                       Rows Removed by Join Filter: 25
                                       ->  CTE Scan on n  (cost=0.00..0.02 rows=1 width=1726) (actual time=72.636..836.381 rows=6 loops=1)
                                             Output: n.geography, n.geography_description, n.numerator_table, n.numerator_description, n.automatic, n.theme_name, n.theme_description
                                       ->  CTE Scan on d  (cost=0.00..0.02 rows=1 width=712) (actual time=47.330..301.549 rows=5 loops=6)
                                             Output: d.geography, d.denominator_table, d.denominator_description
                           ->  Nested Loop Left Join  (cost=4.50..7.47 rows=1 width=201) (actual time=26.607..26.631 rows=4 loops=1)
                                 Output: nd.geography, g.description, nd.numerator_table, n_1.description, h.theme, h.description, nd.denominator_table, d_1.description, 0
                                 ->  Nested Loop Left Join  (cost=4.37..6.72 rows=1 width=165) (actual time=26.597..26.616 rows=4 loops=1)
                                       Output: nd.geography, nd.numerator_table, nd.denominator_table, g.description, n_1.description, n_1.theme, d_1.description
                                       Join Filter: ((g.geography)::text = (nd.geography)::text)
                                       Rows Removed by Join Filter: 24
                                       ->  Hash Right Join  (cost=4.37..5.66 rows=1 width=127) (actual time=26.588..26.593 rows=4 loops=1)
                                             Output: nd.geography, nd.numerator_table, nd.denominator_table, n_1.description, n_1.theme, d_1.description
                                             Hash Cond: ((d_1.table_name)::text = (nd.denominator_table)::text)
                                             ->  Seq Scan on rif40.rif40_tables d_1  (cost=0.00..1.20 rows=20 width=52) (actual time=0.008..0.008 rows=20 loops=1)
                                                   Output: d_1.theme, d_1.table_name, d_1.description, d_1.year_start, d_1.year_stop, d_1.total_field, d_1.isindirectdenominator, d_1.isdirectdenominator, d_1.isnumerator, d_1.automatic, d_1.sex_field_name, d_1.age_group_field_name, d_1.age_sex_group_field_name, d_1.age_group_id, d_1.validation_date
                                             ->  Hash  (cost=4.36..4.36 rows=1 width=93) (actual time=26.575..26.575 rows=4 loops=1)
                                                   Output: nd.geography, nd.numerator_table, nd.denominator_table, n_1.description, n_1.theme
                                                   Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                   ->  Hash Right Join  (cost=3.07..4.36 rows=1 width=93) (actual time=26.568..26.572 rows=4 loops=1)
                                                         Output: nd.geography, nd.numerator_table, nd.denominator_table, n_1.description, n_1.theme
                                                         Hash Cond: ((n_1.table_name)::text = (nd.numerator_table)::text)
                                                         ->  Seq Scan on rif40.rif40_tables n_1  (cost=0.00..1.20 rows=20 width=62) (actual time=0.008..0.009 rows=20 loops=1)
                                                               Output: n_1.theme, n_1.table_name, n_1.description, n_1.year_start, n_1.year_stop, n_1.total_field, n_1.isindirectdenominator, n_1.isdirectdenominator, n_1.isnumerator, n_1.automatic, n_1.sex_field_name, n_1.age_group_field_name, n_1.age_sex_group_field_name, n_1.age_group_id, n_1.validation_date                                                         ->  Hash  (cost=3.06..3.06 rows=1 width=49) (actual time=26.553..26.553 rows=4 loops=1)
                                                               Output: nd.geography, nd.numerator_table, nd.denominator_table
                                                               Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                               ->  Seq Scan on peter.t_rif40_num_denom nd  (cost=0.00..3.06 rows=1 width=49) (actual time=6.796..26.549 rows=4 loops=1)
                                                                     Output: nd.geography, nd.numerator_table, nd.denominator_table
                                                                     Filter: ((rif40_is_object_resolvable(nd.numerator_table) = 1) AND (rif40_is_object_resolvable(nd.denominator_table) = 1))
                                       ->  Seq Scan on rif40.rif40_geographies g  (cost=0.00..1.03 rows=3 width=47) (actual time=0.002..0.002 rows=7 loops=4)
                                             Output: g.geography, g.description, g.hierarchytable, g.srid, g.defaultcomparea, g.defaultstudyarea, g.postal_population_table, g.postal_point_column, g.partition, g.max_geojson_digits, g.geometrytable, g.tiletable, g.minzoomlevel, g.maxzoomlevel, g.adjacencytable, g.map_background
                                 ->  Index Scan using rif40_health_study_themes_pk on rif40.rif40_health_study_themes h  (cost=0.13..0.74 rows=1 width=42) (actual time=0.003..0.003 rows=1 loops=4)
                                       Output: h.theme, h.description
                                       Index Cond: ((h.theme)::text = (n_1.theme)::text)
 Planning time: 0.838 ms
 Execution time: 2672.509 ms
(83 rows)
```

Analysis of the output shows that 

a) The denominator common table expression took 0.3s to initialise and 1.8s to run 
```
CTE d
  ->  Nested Loop  (cost=0.00..13.17 rows=1 width=61) (actual time=283.976..1809.279 rows=5 loops=1)
```

And b) that the subsequent denominator/numerator join is OK:

``` 
->  Nested Loop  (cost=0.00..0.05 rows=1 width=2320) (actual time=356.617..2645.692 rows=5 loops=1)
       Output: n.geography, n.geography_description, n.numerator_table, n.numerator_description, n.theme_name, n.theme_description, d.denominator_table, d.denominator_description, n.automatic
       Join Filter: ((n.geography)::text = (d.geography)::text)
       Rows Removed by Join Filter: 25
       ->  CTE Scan on n  (cost=0.00..0.02 rows=1 width=1726) (actual time=72.636..836.381 rows=6 loops=1)
             Output: n.geography, n.geography_description, n.numerator_table, n.numerator_description, n.automatic, n.theme_name, n.theme_description
       ->  CTE Scan on d  (cost=0.00..0.02 rows=1 width=712) (actual time=47.330..301.549 rows=5 loops=6)
             Output: d.geography, d.denominator_table, d.denominator_description
```

This is confirmed by tunning the SQL for d:

```
sahsuland=> SELECT d1.geography,
sahsuland->            d1.denominator_table,
sahsuland->            d1.denominator_description
sahsuland->       FROM ( SELECT g.geography,
sahsuland(>                     d.table_name AS denominator_table,
sahsuland(>                     d.description AS denominator_description
sahsuland(>                FROM rif40_geographies g,
sahsuland(>                     rif40_tables d
sahsuland(>                   WHERE ((d.isindirectdenominator = 1) AND (d.automatic = 1) AND (rif40_is_object_resolvable(d.table_name) = 1))) d1
sahsuland->      WHERE ((rif40_num_denom_validate(d1.geography, d1.denominator_table) = 1) AND (rif40_auto_indirect_checks(d1.denominator_table) IS NULL))
sahsuland-> ;
   geography   |  denominator_table   |                                 denominator_description
---------------+----------------------+------------------------------------------------------------------------------------------
 SAHSULAND     | POP_SAHSULAND_POP    | population health file
 USA_2014      | SEER_POPULATION      | SEER Population 1972-2013. Georgia starts in 1975, Washington in 1974. 9 States in total
 SWEDEN_COUNTY | SWEDEN_POPULATION    | Sweden Population 19??-20??.
 HALLAND       | HALLAND_POPULATION   | Halland Population 19??-20??.
 STOCKHOLM     | STOCKHOLM_POPULATION | Stockholm Population 19??-20??.
(5 rows)


Time: 1799.655 ms
```
Furtger tests showed that inner query d1 was efficient and the worst performing function was *rif40_auto_indirect_checks*.

Potential solutions:

* Re-write the query so that n and d1 are joined are joined in d and *rif40_auto_indirect_checks*
  is rewritten to use this join. There is a lot of repeating code in *rif40_num_denom_validate* and 
  *rif40_auto_indirect_checks* which needs to be called once.
* Add a new entity *RIF40_TABLE_GEOGGRAPHIES* to remove the need for outer joins

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RIF40_NUM_DENOM view performance #150

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

RIF40_NUM_DENOM view performance #150

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions