@@ -72,6 +72,249 @@ The following permissions are required for materialized view operations when
7272 * For DEFINER mode: User needs ``SELECT `` permission on the view itself
7373 * For INVOKER mode: User needs ``SELECT `` permission on all underlying base tables
7474
75+ Data Consistency Modes
76+ ----------------------
77+
78+ Materialized views support three data consistency modes that control how queries are optimized
79+ when the view's data may be stale:
80+
81+ **USE_STITCHING ** (default)
82+ Reads fresh data from storage, recomputes stale data from base tables,
83+ and combines results via UNION.
84+
85+ **USE_DATA_TABLE **
86+ Reads directly from storage, ignoring staleness. Fastest but may return stale data.
87+
88+ **USE_VIEW_QUERY **
89+ Executes the view query against base tables. Always fresh but highest cost.
90+
91+ Set via session property::
92+
93+ SET SESSION materialized_view_skip_storage = 'USE_STITCHING';
94+
95+ Predicate Stitching (USE_STITCHING Mode)
96+ ----------------------------------------
97+
98+ Overview
99+ ^^^^^^^^
100+
101+ Predicate stitching recomputes only stale data rather than the entire view. When base
102+ tables change, Presto identifies which data is affected and generates a UNION query
103+ that combines:
104+
105+ * **Storage scan **: Reads unchanged (fresh) data from the materialized view's storage
106+ * **Recompute branch **: Recomputes changed (stale) data from base tables using the view's
107+ defining query
108+
109+ This avoids full recomputation when only a subset of data is stale, though there is
110+ overhead from the UNION operation and predicate-based filtering.
111+
112+ How It Works
113+ ^^^^^^^^^^^^
114+
115+ **Staleness Detection **
116+
117+ For each base table referenced in the materialized view, a connector may track which data
118+ has changed since the last refresh and return predicates identifying the stale data. The
119+ specific mechanism depends on the connector:
120+
121+ 1. At refresh time, metadata is recorded (implementation varies by connector)
122+ 2. When the view is queried, the current state is compared with the recorded state
123+ 3. Predicates are built that identify exactly which data is stale
124+
125+ See the connector-specific documentation for details on how staleness is tracked.
126+ For Iceberg tables, see :doc: `/connector/iceberg ` (Materialized Views section).
127+
128+ **Query Rewriting **
129+
130+ When a query uses a materialized view with stale data, the optimizer rewrites the query
131+ to use UNION::
132+
133+ -- Original query
134+ SELECT * FROM my_materialized_view WHERE order_date >= '2024-01-01'
135+
136+ -- Rewritten with predicate stitching (example using partition predicates)
137+ SELECT * FROM (
138+ -- Fresh partitions from storage
139+ SELECT * FROM my_materialized_view_storage
140+ WHERE order_date >= '2024-01-01'
141+ AND order_date NOT IN ('2024-01-15', '2024-01-16') -- Exclude stale
142+ UNION ALL
143+ -- Stale partitions recomputed
144+ SELECT o.order_id, c.customer_name, o.order_date
145+ FROM orders o
146+ JOIN customers c ON o.customer_id = c.customer_id
147+ WHERE o.order_date IN ('2024-01-15', '2024-01-16') -- Stale partition filter
148+ AND c.reg_date IN ('2024-01-15', '2024-01-16') -- Propagated via equivalence
149+ AND o.order_date >= '2024-01-01' -- Original filter preserved
150+ )
151+
152+ The partition predicate is propagated to equivalent columns in joined tables (in this case,
153+ ``c.reg_date ``), allowing partition pruning on the ``customers `` table as well.
154+
155+ Requirements
156+ ^^^^^^^^^^^^
157+
158+ For predicate stitching to work effectively, the following requirements must be met:
159+
160+ **Predicate Mapping Requirement **
161+
162+ The connector must be able to express staleness as predicates that can be mapped to the
163+ materialized view's columns. The specific requirements depend on the connector implementation.
164+ For partition-based connectors (like Iceberg), this typically means:
165+
166+ * Base table partition columns must appear in the SELECT list or be equivalent to columns that do
167+ * The materialized view should be partitioned on the same or equivalent columns
168+ * Partition columns must use compatible data types
169+
170+ See connector-specific documentation for details on staleness tracking requirements.
171+
172+ **Unsupported Query Patterns **
173+
174+ Predicate stitching does not work with:
175+
176+ * **Outer joins **: LEFT, RIGHT, and FULL OUTER joins
177+ * **Non-deterministic functions **: ``RANDOM() ``, ``NOW() ``, ``UUID() ``, etc.
178+
179+ **Security Constraints **
180+
181+ For SECURITY INVOKER materialized views, predicate stitching requires that:
182+
183+ * No column masks are defined on base tables (or the view is treated as fully stale)
184+ * No row filters are defined on base tables (or the view is treated as fully stale)
185+
186+ This is because column masks and row filters can vary by user, making it impossible to
187+ determine staleness in a user-independent way.
188+
189+ Column Equivalences and Passthrough Columns
190+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
191+
192+ Predicate stitching supports **passthrough columns ** through **column equivalences **,
193+ which allows tracking staleness even when predicate columns from base tables
194+ are not directly in the materialized view's output.
195+
196+ **Column Equivalence **
197+
198+ When tables are joined with equality predicates, those columns become equivalent for
199+ predicate propagation purposes. This applies to any type of staleness predicate
200+ (partition-based, snapshot-based, etc.). For example with partition predicates::
201+
202+ CREATE TABLE orders (order_id BIGINT, customer_id BIGINT, order_date VARCHAR)
203+ WITH (partitioning = ARRAY['order_date']);
204+
205+ CREATE TABLE customers (customer_id BIGINT, name VARCHAR, reg_date VARCHAR)
206+ WITH (partitioning = ARRAY['reg_date']);
207+
208+ -- MV with equivalence: order_date = reg_date
209+ CREATE MATERIALIZED VIEW order_summary
210+ WITH (partitioning = ARRAY['order_date'])
211+ AS
212+ SELECT o.order_id, c.name, o.order_date
213+ FROM orders o
214+ JOIN customers c ON o.customer_id = c.customer_id
215+ AND o.order_date = c.reg_date; -- Creates equivalence
216+
217+ In this example:
218+
219+ * ``orders.order_date `` and ``customers.reg_date `` are equivalent due to the equality join condition
220+ * Even though ``reg_date `` is not in the MV's SELECT list, staleness can be tracked through the equivalence to ``order_date ``
221+ * When ``customers `` table changes in partition ``reg_date='2024-01-15' ``, this maps to ``order_date='2024-01-15' `` for recomputation
222+
223+ **How Passthrough Mapping Works **
224+
225+ 1. **Equivalence Extraction **: During MV creation, Presto analyzes JOIN conditions to identify
226+ column equivalences
227+
228+ 2. **Staleness Detection **: When a base table changes:
229+
230+ * The connector detects which data changed in the base table and returns predicates
231+ * For passthrough columns, predicates are mapped through equivalences
232+ * Example: ``customers.reg_date='2024-01-15' `` → ``orders.order_date='2024-01-15' ``
233+
234+ 3. **Predicate Application **: The mapped predicates are used in:
235+
236+ * Storage scan: Exclude data where equivalent columns match stale values
237+ * Recompute branch: Filter the stale table using the staleness predicate
238+ * Joined tables: Propagate the predicate to equivalent columns in joined
239+ tables, enabling pruning on those tables as well
240+
241+ **Requirements for Passthrough Columns **
242+
243+ * Join must be an INNER JOIN (not LEFT, RIGHT, or FULL OUTER)
244+ * Equality must be direct (``col1 = col2 ``), not through expressions like ``col1 = col2 + 1 ``
245+ * At least one column in the equivalence class must be in the MV's output
246+ * Data types must be compatible
247+
248+ **Transitive Equivalences **
249+
250+ Multiple equivalences can be chained together. If ``A.x = B.y `` and ``B.y = C.z ``, then
251+ ``A.x ``, ``B.y ``, and ``C.z `` are all equivalent for predicate propagation.
252+
253+ Unsupported Patterns
254+ ^^^^^^^^^^^^^^^^^^^^
255+
256+ Predicate stitching is **not ** applied in the following cases:
257+
258+ * **No staleness predicates available **: If the connector cannot provide staleness predicates
259+ * **Predicate columns not preserved **: If predicate columns are transformed or not mappable to MV output
260+ * **Outer joins with passthrough **: LEFT, RIGHT, and FULL OUTER joins invalidate passthrough equivalences due to null handling
261+ * **Expression-based equivalences **: ``CAST(col1 AS DATE) = col2 `` or ``col1 = col2 + 1 ``
262+
263+ When predicate stitching cannot be applied, the behavior falls back to the configured consistency mode:
264+
265+ * If ``USE_STITCHING `` is set but stitching is not possible, the query falls back to full
266+ recompute (equivalent to ``USE_VIEW_QUERY ``)
267+ * A warning may be logged indicating why stitching was not possible
268+
269+ Performance Considerations
270+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^
271+
272+ **When Stitching is Most Effective **
273+
274+ * **Large materialized views **: More benefit from avoiding full recomputation
275+ * **Localized changes **: When only a small fraction of data is stale
276+ * **Frequently refreshed **: When most data remains fresh between queries
277+ * **Well-structured data **: When staleness predicates align with data modification patterns
278+
279+ **Cost Trade-offs **
280+
281+ Predicate stitching introduces a UNION operation, which has overhead:
282+
283+ * **Storage scan overhead **: Reading from storage + filtering fresh data
284+ * **Recompute overhead **: Querying base tables + filtering stale data
285+ * **Union overhead **: Combining results from both branches
286+
287+ However, this is typically much cheaper than:
288+
289+ * **Full recompute **: Reading all base table data
290+ * **Stale data **: Returning incorrect results
291+
292+ **Optimization Tips **
293+
294+ 1. **Predicate granularity **: For partition-based connectors, choose partition columns that align
295+ with data modification patterns
296+
297+ * Too coarse (e.g., partitioning by year): Recomputes too much data
298+ * Too fine (e.g., partitioning by second): Too many partitions to manage
299+
300+ 2. **Refresh frequency **: Balance freshness needs with refresh costs
301+
302+ * More frequent refreshes: Less recomputation per query, but higher refresh costs
303+ * Less frequent refreshes: More recomputation per query, but lower refresh costs
304+
305+ 3. **Query filters **: Include predicate columns in query filters when possible::
306+
307+ -- Good: Limits scan to relevant data
308+ SELECT * FROM mv WHERE order_date >= '2024-01-01'
309+
310+ -- Less optimal: Scans all data
311+ SELECT * FROM mv WHERE customer_id = 12345
312+
313+ 4. **Monitor metrics **: Track the ratio of storage scan vs recompute:
314+
315+ * High recompute ratio: Consider more frequent refreshes or better staleness granularity
316+ * High storage scan ratio: Stitching is working efficiently
317+
75318See Also
76319--------
77320
0 commit comments