prestodb
diff --git a/presto-docs/src/main/sphinx/admin/materialized-views.rst b/presto-docs/src/main/sphinx/admin/materialized-views.rst
Lines changed: 239 additions & 0 deletions
diff --git a/presto-docs/src/main/sphinx/connector/iceberg.rst b/presto-docs/src/main/sphinx/connector/iceberg.rst
Lines changed: 14 additions & 5 deletions
@@ -72,6 +72,245 @@ The following permissions are required for materialized view operations when
   * For DEFINER mode: User needs ``SELECT`` permission on the view itself
   * For INVOKER mode: User needs ``SELECT`` permission on all underlying base tables
 
+Data Consistency Modes
+----------------------
+
+Materialized views support three data consistency modes that control how queries are optimized
+when the view's data may be stale:
+
+**USE_STITCHING** (default)
+  Reads fresh partitions from storage, recomputes stale partitions from base tables,
+  and combines results via UNION.
+
+**USE_DATA_TABLE**
+  Reads directly from storage, ignoring staleness. Fastest but may return stale data.
+
+**USE_VIEW_QUERY**
+  Executes the view query against base tables. Always fresh but highest cost.
+
+Set via session property::
+
+    SET SESSION materialized_view_skip_storage = 'USE_STITCHING';
+
+Partition Stitching (USE_STITCHING Mode)
+----------------------------------------
+
+Overview
+^^^^^^^^
+
+Partition stitching recomputes only stale partitions rather than the entire view. When base
+tables change, Presto identifies which partitions are affected and generates a UNION query
+that combines:
+
+* **Storage scan**: Reads unchanged (fresh) partitions from the materialized view's storage
+* **Recompute branch**: Recomputes changed (stale) partitions from base tables using the view's
+  defining query
+
+This avoids full recomputation when only a subset of partitions is stale, at the cost of some
+overhead from the UNION operation and partition-level filtering.
+
+How It Works
+^^^^^^^^^^^^
+
+**Staleness Detection**
+
+For each base table referenced in the materialized view, a connector may track which partitions
+have changed since the last refresh and return predicates identifying the stale data. The
+specific mechanism depends on the connector:
+
+1. At refresh time, partition-level metadata is recorded (implementation varies by connector)
+2. When the view is queried, the current partition state is compared with the recorded state
+3. Partition constraints are built that identify exactly which data is stale
+
+See the connector-specific documentation for details on how staleness is tracked.
+For Iceberg tables, see :doc:`/connector/iceberg` (Materialized Views section).
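+
+The three staleness-detection steps listed above can be sketched roughly as follows. This is
+an illustrative Python model, not connector code: the ``recorded`` and ``current`` mappings
+(partition value to an opaque version token) and the name ``find_stale_partitions`` are
+assumptions made for this example, and real connectors may record different metadata.
+
```python
def find_stale_partitions(recorded, current):
    """Return partition values whose state differs from the state recorded at refresh.

    Both arguments are hypothetical dicts mapping a partition value to an
    opaque version token (for example, a snapshot identifier).
    """
    stale = set()
    for partition, version in current.items():
        # A partition is stale if it is new or its version token has changed.
        if recorded.get(partition) != version:
            stale.add(partition)
    return stale


# Step 1: state recorded at refresh time (hypothetical values).
recorded_at_refresh = {"2024-01-14": 7, "2024-01-15": 7, "2024-01-16": 7}
# Step 2: state observed when the view is queried.
current_state = {"2024-01-14": 7, "2024-01-15": 9, "2024-01-16": 8, "2024-01-17": 9}
# Step 3: the stale partition values become the partition constraint.
stale_partitions = find_stale_partitions(recorded_at_refresh, current_state)
print(sorted(stale_partitions))
```
+
+In this model, a partition that first appears after the refresh counts as stale, because the
+storage table holds no data for it.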
+
+**Query Rewriting**
+
+When a query uses a materialized view with stale partitions, the optimizer rewrites the query
+to use UNION::
+
+    -- Original query
+    SELECT * FROM my_materialized_view WHERE order_date >= '2024-01-01'
+
+    -- Rewritten with partition stitching
+    SELECT * FROM (
+        -- Fresh partitions from storage
+        SELECT * FROM my_materialized_view_storage
+        WHERE order_date >= '2024-01-01'
+          AND order_date NOT IN ('2024-01-15', '2024-01-16')  -- Exclude stale
+    UNION ALL
+        -- Stale partitions recomputed
+        SELECT o.order_id, c.customer_name, o.order_date
+        FROM orders o
+        JOIN customers c ON o.customer_id = c.customer_id
+                        AND o.order_date = c.reg_date      -- Equivalence from the view's join
+        WHERE o.order_date IN ('2024-01-15', '2024-01-16')  -- Stale partition filter
+          AND c.reg_date IN ('2024-01-15', '2024-01-16')    -- Propagated via equivalence
+          AND o.order_date >= '2024-01-01'  -- Original filter preserved
+    )
+
+The partition predicate is propagated to equivalent columns in joined tables (in this case,
+``c.reg_date``, which the view's join condition equates with ``o.order_date``), allowing
+partition pruning on the ``customers`` table as well.
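+
+The shape of the rewrite can be modeled with simple string assembly. The sketch below is only
+an approximation under stated assumptions: the function names are hypothetical, the view query
+is wrapped in a subquery rather than having predicates pushed into individual table scans, and
+the actual optimizer works on plan nodes, not SQL text.
+
```python
def sql_string_list(values):
    """Render values as a comma-separated list of SQL string literals."""
    return ", ".join("'" + v.replace("'", "''") + "'" for v in sorted(values))


def build_stitched_query(storage_table, view_query, partition_column,
                         stale_values, user_filter=None):
    """Combine fresh partitions from storage with recomputed stale partitions."""
    extra = f" AND {user_filter}" if user_filter else ""
    if not stale_values:
        # Nothing is stale: read storage only, no UNION needed.
        where = f" WHERE {user_filter}" if user_filter else ""
        return f"SELECT * FROM {storage_table}{where}"

    stale_list = sql_string_list(stale_values)
    # Fresh branch: storage scan that excludes the stale partitions.
    fresh_branch = (
        f"SELECT * FROM {storage_table} "
        f"WHERE {partition_column} NOT IN ({stale_list}){extra}"
    )
    # Recompute branch: the defining query restricted to the stale partitions.
    stale_branch = (
        f"SELECT * FROM ({view_query}) "
        f"WHERE {partition_column} IN ({stale_list}){extra}"
    )
    return f"{fresh_branch}\nUNION ALL\n{stale_branch}"


stitched = build_stitched_query(
    storage_table="my_materialized_view_storage",
    view_query="SELECT o.order_id, o.order_date FROM orders o",
    partition_column="order_date",
    stale_values={"2024-01-15", "2024-01-16"},
    user_filter="order_date >= '2024-01-01'",
)
print(stitched)
```
+
+Because the two branches select disjoint sets of partitions, ``UNION ALL`` is sufficient in
+this model and no deduplication is needed.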
+
+Requirements
+^^^^^^^^^^^^
+
+For partition stitching to work effectively, the following requirements must be met:
+
+**Partitioning Requirement**
+
+Both base tables and the materialized view must be partitioned, and partition columns must be
+preserved through the view's query:
+
+* Base table partition columns must appear in the SELECT list or be equivalent to columns that do
+* The materialized view should be partitioned on the same or equivalent columns
+* Partition columns must use compatible data types
+
+**Unsupported Query Patterns**
+
+Partition stitching does not work with:
+
+* **Outer joins**: LEFT, RIGHT, and FULL OUTER joins
+* **Non-deterministic functions**: ``RANDOM()``, ``NOW()``, ``UUID()``, etc.
+
+**Security Constraints**
+
+For SECURITY INVOKER materialized views, partition stitching requires that:
+
+* No column masks are defined on base tables
+* No row filters are defined on base tables
+
+If either is defined, the view is treated as fully stale. Column masks and row filters
+can vary by user, which makes it impossible to determine staleness in a user-independent way.
+
+Column Equivalences and Passthrough Partitions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Partition stitching supports **passthrough partitions** through **column equivalences**,
+which allows tracking partition staleness even when partition columns from base tables
+are not directly in the materialized view's output.
+
+**Column Equivalence**
+
+When tables are joined with equality predicates on partition columns, those columns become
+equivalent for partition tracking purposes::
+
+    CREATE TABLE orders (order_id BIGINT, customer_id BIGINT, order_date VARCHAR)
+      WITH (partitioning = ARRAY['order_date']);
+
+    CREATE TABLE customers (customer_id BIGINT, name VARCHAR, reg_date VARCHAR)
+      WITH (partitioning = ARRAY['reg_date']);
+
+    -- MV with equivalence: order_date = reg_date
+    CREATE MATERIALIZED VIEW order_summary
+    WITH (partitioning = ARRAY['order_date'])
+    AS
+      SELECT o.order_id, c.name, o.order_date
+      FROM orders o
+      JOIN customers c ON o.customer_id = c.customer_id
+                      AND o.order_date = c.reg_date;  -- Creates equivalence
+
+In this example:
+
+* ``orders.order_date`` and ``customers.reg_date`` are equivalent due to the equality join condition
+* Even though ``reg_date`` is not in the MV's SELECT list, staleness can be tracked through the equivalence to ``order_date``
+* When the ``customers`` table changes in partition ``reg_date='2024-01-15'``, this maps to ``order_date='2024-01-15'`` for recomputation
+
+**How Passthrough Mapping Works**
+
+1. **Equivalence Extraction**: During MV creation, Presto analyzes JOIN conditions to identify
+   partition column equivalences
+
+2. **Staleness Detection**: When a base table changes:
+
+   * Presto detects which partitions changed in the base table
+   * For passthrough columns, constraints are mapped through equivalences
+   * Example: ``customers.reg_date='2024-01-15'`` → ``orders.order_date='2024-01-15'``
+
+3. **Constraint Application**: The mapped constraints are used in:
+
+   * Storage scan: Exclude partitions where equivalent columns match stale values
+   * Recompute branch: Filter the stale table using its partition column
+   * Joined tables: Propagate the partition predicate to equivalent columns in joined
+     tables, enabling partition pruning on those tables as well
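+
+As a rough illustration of the mapping in step 2, the sketch below translates a stale
+constraint on a base-table partition column into a constraint on a column that the
+materialized view actually outputs. The data structures (sets of qualified column names) and
+the function name ``map_stale_constraint`` are assumptions for this example, not Presto
+internals.
+
```python
def map_stale_constraint(stale_column, stale_values, equivalence_classes,
                         mv_output_columns):
    """Map a stale partition constraint onto a column present in the MV output.

    Returns (output_column, stale_values), or None when no mapping exists,
    in which case this model would treat the view as fully stale.
    """
    # Direct case: the stale partition column is itself in the MV output.
    if stale_column in mv_output_columns:
        return stale_column, set(stale_values)
    # Passthrough case: look for an equivalent column that is in the output.
    for columns in equivalence_classes:
        if stale_column in columns:
            for candidate in sorted(columns):
                if candidate in mv_output_columns:
                    return candidate, set(stale_values)
    return None


equivalences = [{"orders.order_date", "customers.reg_date"}]
mv_outputs = {"orders.order_id", "customers.name", "orders.order_date"}

mapped = map_stale_constraint(
    "customers.reg_date", {"2024-01-15"}, equivalences, mv_outputs
)
print(mapped)
```
+
+Returning ``None`` when no equivalent output column exists mirrors the requirement that at
+least one column in the equivalence class must be in the view's output.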
+
+**Requirements for Passthrough Partitions**
+
+* Join must be an INNER JOIN (not LEFT, RIGHT, or FULL OUTER)
+* Equality must be direct (``col1 = col2``), not through expressions like ``col1 = col2 + 1``
+* Both columns must be partition columns in their respective tables
+* At least one column in the equivalence class must be in the MV's output
+* Data types must be compatible
+
+**Transitive Equivalences**
+
+Multiple equivalences can be chained together. If ``A.x = B.y`` and ``B.y = C.z``, then
+``A.x``, ``B.y``, and ``C.z`` are all equivalent for partition tracking.
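+
+Transitive chaining of this kind is commonly computed with a union-find (disjoint-set)
+structure. The sketch below shows that general technique in Python; it is not taken from the
+Presto implementation, and ``build_equivalence_classes`` is a hypothetical name.
+
```python
def build_equivalence_classes(equalities):
    """Group columns into equivalence classes from pairwise equality predicates."""
    parent = {}

    def find(col):
        # Register unseen columns as their own root, then walk up to the root.
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]  # path halving keeps trees shallow
            col = parent[col]
        return col

    for left, right in equalities:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right  # merge the two classes

    classes = {}
    for col in list(parent):
        classes.setdefault(find(col), set()).add(col)
    return list(classes.values())


# A.x = B.y and B.y = C.z chain into one class; D.p = E.q forms another.
classes = build_equivalence_classes(
    [("A.x", "B.y"), ("B.y", "C.z"), ("D.p", "E.q")]
)
print(classes)
```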
+
+Unsupported Patterns
+^^^^^^^^^^^^^^^^^^^^
+
+Partition stitching is **not** applied in the following cases:
+
+* **Non-partitioned tables**: If base tables or the materialized view lack partitioning
+* **Partition columns not preserved**: If partition columns are transformed or not in the output
+* **Outer joins with passthrough**: LEFT, RIGHT, and FULL OUTER joins invalidate passthrough equivalences due to null handling
+* **Expression-based equivalences**: ``CAST(col1 AS DATE) = col2`` or ``col1 = col2 + 1``
+
+When partition stitching cannot be applied, the query falls back as follows:
+
+* If ``USE_STITCHING`` is set but stitching is not possible, the query falls back to full
+  recompute (equivalent to ``USE_VIEW_QUERY``)
+* A warning may be logged indicating why stitching was not possible
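+
+The interaction between the consistency mode and this fallback can be summarized as a small
+decision function. This is a simplified model: the strategy labels (``READ_STORAGE``,
+``STITCH``, ``FULL_RECOMPUTE``) and the function name are hypothetical, and it assumes that a
+view with no stale data is read from storage in every mode.
+
```python
VALID_MODES = {"USE_STITCHING", "USE_DATA_TABLE", "USE_VIEW_QUERY"}


def choose_read_strategy(mode, view_is_stale, stitching_possible):
    """Pick a read strategy for a materialized view query (illustrative only)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown consistency mode: {mode}")
    # Assumption: a fresh view is always served from its storage table.
    if not view_is_stale or mode == "USE_DATA_TABLE":
        return "READ_STORAGE"
    if mode == "USE_VIEW_QUERY":
        return "FULL_RECOMPUTE"
    # USE_STITCHING: stitch when possible, otherwise fall back to full recompute.
    return "STITCH" if stitching_possible else "FULL_RECOMPUTE"


print(choose_read_strategy("USE_STITCHING", view_is_stale=True, stitching_possible=False))
```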
+
+Performance Considerations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+**When Stitching is Most Effective**
+
+* **Large materialized views**: More benefit from avoiding full recomputation
+* **Localized changes**: When only a small fraction of partitions are stale
+* **Frequently refreshed**: When most partitions remain fresh between queries
+* **Well-partitioned data**: When partition scheme aligns with data modification patterns
+
+**Cost Trade-offs**
+
+Partition stitching introduces a UNION operation, which has overhead:
+
+* **Storage scan overhead**: Reading from storage + filtering fresh partitions
+* **Recompute overhead**: Querying base tables + filtering stale partitions
+* **Union overhead**: Combining results from both branches
+
+However, this is typically preferable to the alternatives:
+
+* **Full recompute**: Reads all base table data, which is usually far more expensive
+* **Reading storage directly**: Cheaper, but may return stale results
+
+**Optimization Tips**
+
+1. **Partition granularity**: Choose partition columns that align with data modification patterns
+
+   * Too coarse (e.g., partitioning by year): Recomputes too much data
+   * Too fine (e.g., partitioning by second): Too many partitions to manage
+
+2. **Refresh frequency**: Balance freshness needs with refresh costs
+
+   * More frequent refreshes: Less recomputation per query, but higher refresh costs
+   * Less frequent refreshes: More recomputation per query, but lower refresh costs
+
+3. **Query filters**: Include partition column filters in queries when possible::
+
+       -- Good: Limits scan to relevant partitions
+       SELECT * FROM mv WHERE order_date >= '2024-01-01'
+
+       -- Less optimal: Scans all partitions
+       SELECT * FROM mv WHERE customer_id = 12345
+
+4. **Monitor metrics**: Track the ratio of storage scan vs recompute:
+
+   * High recompute ratio: Consider more frequent refreshes or better partitioning
+   * High storage scan ratio: Stitching is working efficiently
+
 See Also
 --------
 
 
@@ -2306,18 +2306,27 @@ Property Name                                              Description
 
 The storage table inherits standard Iceberg table properties for partitioning, sorting, and file format.
 
+Staleness Tracking
+^^^^^^^^^^^^^^^^^^
+
+The Iceberg connector tracks materialized view staleness at the partition level, enabling
+partition stitching to recompute only affected partitions rather than the entire view.
+
+.. note::
+    Partition-level staleness detection only works for append-only changes (INSERT).
+    DELETE or UPDATE operations on base tables cause the entire view to be treated
+    as stale, requiring full recomputation.
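+
+The rule in the note can be modeled as follows. This sketch is a simplification: it represents
+changes since the last refresh as a hypothetical list of ``(operation, partition)`` pairs,
+whereas the connector derives this information from Iceberg table metadata. The names
+``staleness_scope``, ``FRESH``, ``PARTIAL``, and ``FULL`` are assumptions for the example.
+
```python
def staleness_scope(operations_since_refresh):
    """Classify staleness from base-table changes made since the last refresh.

    Returns ("FRESH", set()), ("PARTIAL", stale_partitions), or ("FULL", set()).
    """
    stale_partitions = set()
    for operation, partition in operations_since_refresh:
        if operation != "INSERT":
            # DELETE or UPDATE: partition-level tracking no longer applies.
            return "FULL", set()
        stale_partitions.add(partition)
    if not stale_partitions:
        return "FRESH", set()
    return "PARTIAL", stale_partitions


print(staleness_scope([("INSERT", "2024-01-15"), ("INSERT", "2024-01-16")]))
print(staleness_scope([("INSERT", "2024-01-15"), ("DELETE", "2024-01-10")]))
```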
+
 Freshness and Refresh
 ^^^^^^^^^^^^^^^^^^^^^^
 
-Materialized views track the snapshot IDs of their base tables to determine staleness. When base tables are modified, the materialized view becomes stale and returns results by querying the base tables directly. After running ``REFRESH MATERIALIZED VIEW``, queries read from the pre-computed storage table.
-
-The refresh operation uses a full refresh strategy, replacing all data in the storage table with the current query results.
+After running ``REFRESH MATERIALIZED VIEW``, queries read from the pre-computed storage table. The refresh operation uses a full refresh strategy, replacing all data in the storage table with the current query results and recording the new snapshot IDs for all base tables.
 
 Limitations
 ^^^^^^^^^^^
 
-- All refreshes recompute the entire result set
-- REFRESH does not provide snapshot isolation across multiple base tables
+- All refreshes recompute the entire result set (incremental refresh not yet supported)
+- REFRESH does not provide snapshot isolation across multiple base tables (each base table's current snapshot is used independently)
 - Querying materialized views at specific snapshots or timestamps is not supported
 
 Example