
Conversation

@zyratlo zyratlo commented Oct 3, 2025

NOTE: This tool is still in development; the design choices and features currently present are not finalized.

PR Description

This PR reintroduces the migration tool branch to the Texera repository after it was removed during our transition to an Apache project. The code changes included in this PR are purely front-end GUI changes, as the back-end is currently a standalone micro-service separate from the Texera codebase.

Purpose

Currently, users who have existing code outside of Texera and want to migrate that code to Texera must create a workflow from scratch. Depending on the complexity of the code, this can take a long time. This tool aims to reduce the time needed to migrate to Texera by using large language models to convert Jupyter Notebooks into Texera workflows.

Tool Overview (Demo Videos Below)

The user uploads a Jupyter Notebook, which is sent to the OpenAI API to be migrated into a Texera workflow. Once the workflow is generated, the user can modify it alongside the original notebook until they are satisfied with the migration results.

Design

[image] The uploaded notebook is passed through the front-end to the migration micro-service in the back-end. The micro-service handles all communication with OpenAI. OpenAI returns the generated workflow to the micro-service, which passes it back to the front-end to render. The communication design with OpenAI is shown below: [image]
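
As a rough illustration of the first hop, the uploaded `.ipynb` file's code cells can be extracted before anything is handed to the micro-service. This is a hedged sketch, not the actual Texera implementation; only the standard Jupyter notebook JSON fields (`cells`, `cell_type`, `source`) are real.

```typescript
// Hedged sketch: not the actual Texera front-end code. It relies only on the
// standard Jupyter notebook JSON schema (cells / cell_type / source).
interface NotebookCell {
  cell_type: string; // "code", "markdown", ...
  source: string[];  // the cell's text, split into lines
}

interface Notebook {
  cells: NotebookCell[];
}

// Collect each code cell's source lines into one snippet per cell, e.g. to
// display the notebook next to the generated workflow or to build a prompt.
function extractCodeCells(notebook: Notebook): string[] {
  return notebook.cells
    .filter(cell => cell.cell_type === "code")
    .map(cell => cell.source.join(""));
}
```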

Future Work

  • The main concern is the reliability and accuracy of the workflow returned by the LLM. The current effort is to research methods to address this concern, such as relying more on algorithmic methods instead of black-box LLM results and reducing the dependency on OpenAI.
  • Another effort is to integrate the separate micro-service into the Texera back-end.

Demo

1. User starts with a Jupyter Notebook they want to migrate into Texera.

1.show.original.notebook.mp4

2. User uploads the Jupyter Notebook using the new tool button.

2.show.import.notebook.mp4

3. User can view the uploaded notebook from within Texera.

3.show.jupyter.window.mp4

4. Depending on the notebook's size and complexity, generation can take between one and three minutes. After the workflow is generated, the user can begin editing.

4.show.workflow.mp4

@zyratlo zyratlo marked this pull request as ready for review October 4, 2025 03:00
Yicong-Huang and others added 6 commits October 3, 2025 23:27
…pache#3795)

Bumps
[@babel/helpers](https://github.com/babel/babel/tree/HEAD/packages/babel-helpers)
from 7.25.7 to 7.28.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/babel/babel/releases"><code>@​babel/helpers</code>'s
releases</a>.</em></p>
<blockquote>
<h2>v7.28.4 (2025-09-05)</h2>
<p>Thanks <a
href="https://github.com/gwillen"><code>@​gwillen</code></a> and <a
href="https://github.com/mrginglymus"><code>@​mrginglymus</code></a> for
your first PRs!</p>
<h4>:house: Internal</h4>
<ul>
<li><code>babel-core</code>,
<code>babel-helper-check-duplicate-nodes</code>,
<code>babel-traverse</code>, <code>babel-types</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17493">#17493</a>
Update Jest to v30.1.1 (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
</li>
<li><code>babel-plugin-transform-regenerator</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17455">#17455</a>
chore: Clean up <code>transform-regenerator</code> (<a
href="https://github.com/liuxingbaoyu"><code>@​liuxingbaoyu</code></a>)</li>
</ul>
</li>
<li><code>babel-core</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17474">#17474</a>
Switch to <code>@​jridgewell/remapping</code> (<a
href="https://github.com/mrginglymus"><code>@​mrginglymus</code></a>)</li>
</ul>
</li>
</ul>
<h4>Committers: 5</h4>
<ul>
<li>Babel Bot (<a
href="https://github.com/babel-bot"><code>@​babel-bot</code></a>)</li>
<li>Bill Collins (<a
href="https://github.com/mrginglymus"><code>@​mrginglymus</code></a>)</li>
<li>Glenn Willen (<a
href="https://github.com/gwillen"><code>@​gwillen</code></a>)</li>
<li>Huáng Jùnliàng (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
<li><a
href="https://github.com/liuxingbaoyu"><code>@​liuxingbaoyu</code></a></li>
</ul>
<h2>v7.28.3 (2025-08-14)</h2>
<h4>:eyeglasses: Spec Compliance</h4>
<ul>
<li><code>babel-helper-create-class-features-plugin</code>,
<code>babel-plugin-proposal-decorators</code>,
<code>babel-plugin-transform-class-static-block</code>,
<code>babel-preset-env</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17443">#17443</a>
[static blocks] Do not inject new static fields after static code (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
</ul>
<h4>:bug: Bug Fix</h4>
<ul>
<li><code>babel-parser</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17465">#17465</a>
fix(parser/typescript): parse <code>import(&quot;./a&quot;,
{with:{},})</code> (<a
href="https://github.com/easrng"><code>@​easrng</code></a>)</li>
<li><a
href="https://redirect.github.com/babel/babel/pull/17478">#17478</a>
fix(parser): stop subscript parsing on async arrow (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
</li>
</ul>
<h4>:nail_care: Polish</h4>
<ul>
<li><code>babel-plugin-transform-regenerator</code>,
<code>babel-plugin-transform-runtime</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17363">#17363</a> Do
not save last yield in call in temp var (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
</ul>
<h4>:memo: Documentation</h4>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17448">#17448</a>
move eslint-{parser,plugin} docs to the website (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
<h4>:house: Internal</h4>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17454">#17454</a>
Enable type checking for <code>scripts</code> and
<code>babel-worker.cjs</code> (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
<h4>:microscope: Output optimization</h4>
<ul>
<li><code>babel-plugin-proposal-destructuring-private</code>,
<code>babel-plugin-proposal-do-expressions</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17444">#17444</a>
Optimize do expression output (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
</li>
</ul>
<h4>Committers: 5</h4>
<ul>
<li>Babel Bot (<a
href="https://github.com/babel-bot"><code>@​babel-bot</code></a>)</li>
<li>Huáng Jùnliàng (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
<li>Jam Balaya (<a
href="https://github.com/JamBalaya56562"><code>@​JamBalaya56562</code></a>)</li>
<li>Nicolò Ribaudo (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
<li>easrng (<a
href="https://github.com/easrng"><code>@​easrng</code></a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/babel/babel/blob/main/CHANGELOG.md"><code>@​babel/helpers</code>'s
changelog</a>.</em></p>
<blockquote>
<h2>v7.28.4 (2025-09-05)</h2>
<h4>:house: Internal</h4>
<ul>
<li><code>babel-core</code>,
<code>babel-helper-check-duplicate-nodes</code>,
<code>babel-traverse</code>, <code>babel-types</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17493">#17493</a>
Update Jest to v30.1.1 (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
</li>
<li><code>babel-plugin-transform-regenerator</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17455">#17455</a>
chore: Clean up <code>transform-regenerator</code> (<a
href="https://github.com/liuxingbaoyu"><code>@​liuxingbaoyu</code></a>)</li>
</ul>
</li>
<li><code>babel-core</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17474">#17474</a>
Switch to <code>@​jridgewell/remapping</code> (<a
href="https://github.com/mrginglymus"><code>@​mrginglymus</code></a>)</li>
</ul>
</li>
</ul>
<h2>v7.28.3 (2025-08-14)</h2>
<h4>:eyeglasses: Spec Compliance</h4>
<ul>
<li><code>babel-helper-create-class-features-plugin</code>,
<code>babel-plugin-proposal-decorators</code>,
<code>babel-plugin-transform-class-static-block</code>,
<code>babel-preset-env</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17443">#17443</a>
[static blocks] Do not inject new static fields after static code (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
</ul>
<h4>:bug: Bug Fix</h4>
<ul>
<li><code>babel-parser</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17465">#17465</a>
fix(parser/typescript): parse <code>import(&quot;./a&quot;,
{with:{},})</code> (<a
href="https://github.com/easrng"><code>@​easrng</code></a>)</li>
<li><a
href="https://redirect.github.com/babel/babel/pull/17478">#17478</a>
fix(parser): stop subscript parsing on async arrow (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
</li>
</ul>
<h4>:nail_care: Polish</h4>
<ul>
<li><code>babel-plugin-transform-regenerator</code>,
<code>babel-plugin-transform-runtime</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17363">#17363</a> Do
not save last yield in call in temp var (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
</ul>
<h4>:memo: Documentation</h4>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17448">#17448</a>
move eslint-{parser,plugin} docs to the website (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
<h4>:house: Internal</h4>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17454">#17454</a>
Enable type checking for <code>scripts</code> and
<code>babel-worker.cjs</code> (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
<h4>:microscope: Output optimization</h4>
<ul>
<li><code>babel-plugin-proposal-destructuring-private</code>,
<code>babel-plugin-proposal-do-expressions</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17444">#17444</a>
Optimize do expression output (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
</li>
</ul>
<h2>v7.28.2 (2025-07-24)</h2>
<h4>:bug: Bug Fix</h4>
<ul>
<li><code>babel-types</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17445">#17445</a>
[babel 7] Make <code>operator</code> param in
<code>t.tsTypeOperator</code> optional (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
<li><code>babel-helpers</code>,
<code>babel-plugin-transform-async-generator-functions</code>,
<code>babel-plugin-transform-regenerator</code>,
<code>babel-preset-env</code>, <code>babel-runtime-corejs3</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17441">#17441</a>
fix: <code>regeneratorDefine</code> compatibility with es5 strict mode
(<a
href="https://github.com/liuxingbaoyu"><code>@​liuxingbaoyu</code></a>)</li>
</ul>
</li>
</ul>
<h2>v7.28.1 (2025-07-12)</h2>
<h4>:bug: Bug Fix</h4>
<ul>
<li><code>babel-plugin-transform-async-generator-functions</code>,
<code>babel-plugin-transform-regenerator</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17426">#17426</a>
fix: <code>regenerator</code> correctly handles <code>throw</code>
outside of <code>try</code> (<a
href="https://github.com/liuxingbaoyu"><code>@​liuxingbaoyu</code></a>)</li>
</ul>
</li>
</ul>
<h4>:memo: Documentation</h4>
<ul>
<li><code>babel-types</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/17422">#17422</a> Add
missing FunctionParameter docs (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/babel/babel/commit/35055e392079a65830b7bf5b1d1c1fc4de90a78f"><code>35055e3</code></a>
v7.28.4</li>
<li><a
href="https://github.com/babel/babel/commit/18d88b83c67c8dbbe63e4ac423e6006c4c01b85c"><code>18d88b8</code></a>
Improve <code>@​babel/core</code> typings (<a
href="https://github.com/babel/babel/tree/HEAD/packages/babel-helpers/issues/17471">#17471</a>)</li>
<li><a
href="https://github.com/babel/babel/commit/ef155f5ca83c73dbc1ea8d95216830b7dc3b0ac2"><code>ef155f5</code></a>
v7.28.3</li>
<li><a
href="https://github.com/babel/babel/commit/741cbd2381ac0cda3afd42bc04454a87d9d8762a"><code>741cbd2</code></a>
chore: fix various typos across codebase (<a
href="https://github.com/babel/babel/tree/HEAD/packages/babel-helpers/issues/17476">#17476</a>)</li>
<li><a
href="https://github.com/babel/babel/commit/cac0ff4c3426eed30b4d27e7971b348da7c9f1e6"><code>cac0ff4</code></a>
v7.28.2</li>
<li><a
href="https://github.com/babel/babel/commit/f743094585b39bd9f7a9e3a3561215b2103e2474"><code>f743094</code></a>
fix: <code>regeneratorDefine</code> compatibility with es5 strict mode
(<a
href="https://github.com/babel/babel/tree/HEAD/packages/babel-helpers/issues/17441">#17441</a>)</li>
<li><a
href="https://github.com/babel/babel/commit/baa4cb8b9f8a551d7dae9042b19ea2f74df6b110"><code>baa4cb8</code></a>
v7.27.6</li>
<li><a
href="https://github.com/babel/babel/commit/fdbf1b32b3aa3705761ff820661e81c0aececab7"><code>fdbf1b3</code></a>
fix: <code>finally</code> causes unexpected return value (<a
href="https://github.com/babel/babel/tree/HEAD/packages/babel-helpers/issues/17366">#17366</a>)</li>
<li><a
href="https://github.com/babel/babel/commit/7d069309fdfcedda2928a043f6f7c98135c1242a"><code>7d06930</code></a>
v7.27.4</li>
<li><a
href="https://github.com/babel/babel/commit/5b9468d9bf1ab4f427241673e9f03593da115a69"><code>5b9468d</code></a>
Reduce <code>regenerator</code> size more (<a
href="https://github.com/babel/babel/tree/HEAD/packages/babel-helpers/issues/17287">#17287</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/babel/babel/commits/v7.28.4/packages/babel-helpers">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@babel/helpers&package-manager=npm_and_yarn&previous-version=7.25.7&new-version=7.28.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts page](https://github.com/apache/texera/network/alerts).

</details>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xinyuan Lin <[email protected]>
Co-authored-by: Chris <[email protected]>
…e#3818)

### **Purpose**
This PR fixes apache#3804, where the upload status panel behaved unexpectedly:
when there were no queued or active uploads, the UI still rendered empty
panels, which was confusing. This PR hides empty panels and restores a
clear empty state.

### **Changes**
- Introduce a flag `hasAnyActivity = queuedCount > 0 || activeCount > 0 || pendingChangesCount > 0`
- Conditionally render status panels:
  - Pending only when queuedCount > 0
  - Uploading only when activeCount > 0
  - Finished only when hasAnyActivity
- Restore the empty state: when no activity, render
`<texera-dataset-staged-objects-list>` outside the collapse so “No
pending changes” is visible
- Add a bottom divider beneath the staged list to improve visual
separation (the previous `[nzBorder]` was removed to avoid overlapping
with the vertical divider)
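
The panel-visibility rules above can be condensed into a small sketch (the counter names come from this PR description; the standalone function wrapper is illustrative, not the component's actual shape):

```typescript
// Illustrative sketch of the visibility rules; not the actual component code.
function visiblePanels(
  queuedCount: number,
  activeCount: number,
  pendingChangesCount: number
): string[] {
  const hasAnyActivity =
    queuedCount > 0 || activeCount > 0 || pendingChangesCount > 0;
  const panels: string[] = [];
  if (queuedCount > 0) panels.push("Pending");   // Pending only when queued
  if (activeCount > 0) panels.push("Uploading"); // Uploading only when active
  if (hasAnyActivity) panels.push("Finished");   // Finished when any activity
  return panels;
}
```

With all counters at zero the list is empty, which is exactly the state where the component instead renders `<texera-dataset-staged-objects-list>` with its "No pending changes" message.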

### **Demonstration**
**Datasets page:**
<img width="1315" height="870" alt="main"
src="https://github.com/user-attachments/assets/9112999c-24bc-4076-b139-eb3b405c2288"
/>

**Finished panel:** 
| Collapsed | Expanded (delete) | Expanded (adds) |
|---|---|---|
| <img width="260"
src="https://github.com/user-attachments/assets/58fe8ba4-2b13-45a6-b049-318b288ec37e"
alt="collapsed" /> | <img width="260"
src="https://github.com/user-attachments/assets/21dada5c-0e39-4a9f-b1ee-ab0ede345aca"
alt="delete" /> | <img width="260"
src="https://github.com/user-attachments/assets/d3a25bea-bee3-4bb7-8023-b407741b01fe"
alt="expand" /> |


**Uploading files:**


https://github.com/user-attachments/assets/413f9480-330e-456a-8a0a-7f87511fbf13

**Remove files:** 


https://github.com/user-attachments/assets/04fc0dc0-4f59-4e18-bb1d-72b0132ac943

Co-authored-by: Xinyuan Lin <[email protected]>
…#3597)

### Purpose
This PR fixes an issue with text wrapping in workflow comments, where
words would be broken up across lines, reducing readability.

closes apache#3595

### Changes
- Edited the CSS in `nz-modal-comment-box.component.scss` to fix wrapping

### Before:
<img width="565" height="434" alt="Screenshot (229)"
src="https://github.com/user-attachments/assets/e05d44b5-c11d-45a8-88ed-a60d5bd48776"
/>

### After:
<img width="530" height="433" alt="Screenshot (230)"
src="https://github.com/user-attachments/assets/970a8aa7-54d6-41ba-a774-0fb47b93db52"
/>

Co-authored-by: Xinyuan Lin <[email protected]>
This PR improves the template for creating GitHub issues of type "Bug",
based on the feedback provided in [this
comment](apache#3812 (comment))

1. Added a Pre-release Version option to the version selection field.
2. Added an optional Commit Hash field for developers to specify the
exact commit associated with the issue.
@yunyad yunyad self-requested a review October 6, 2025 18:25
…pache#3635)

Bumps [transformers](https://github.com/huggingface/transformers) from
4.44.2 to 4.53.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/huggingface/transformers/releases">transformers's
releases</a>.</em></p>
<blockquote>
<h2>Release v4.53.0</h2>
<h3>Gemma3n</h3>
<p>Gemma 3n models are designed for efficient execution on low-resource
devices. They are capable of multimodal input, handling text, image,
video, and audio input, and generating text outputs, with open weights
for pre-trained and instruction-tuned variants. These models were
trained with data in over 140 spoken languages.</p>
<p>Gemma 3n models use selective parameter activation technology to
reduce resource requirements. This technique allows the models to
operate at an effective size of 2B and 4B parameters, which is lower
than the total number of parameters they contain. For more information
on Gemma 3n's efficient parameter management technology, see the <a
href="https://ai.google.dev/gemma/docs/gemma-3n#parameters">Gemma 3n</a>
page.</p>
<p><img
src="https://github.com/user-attachments/assets/858cb034-364d-4eb6-8de8-4a0b5eaff3d7"
alt="image" /></p>
<pre lang="python"><code>from transformers import pipeline
import torch

pipe = pipeline(
    "image-text-to-text",
    torch_dtype=torch.bfloat16,
    model="google/gemma-3n-e4b",
    device="cuda",
)
output = pipe(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg",
    text="&lt;image_soft_token&gt; in this image, there is",
)

print(output)
</code></pre>
<h3>Dia</h3>
<p><img
src="https://github.com/user-attachments/assets/bf86e887-e4f4-4222-993d-f5eac58f8040"
alt="image" /></p>
<p>Dia is an opensource text-to-speech (TTS) model (1.6B parameters)
developed by <a href="https://huggingface.co/nari-labs">Nari Labs</a>.
It can generate highly realistic dialogue from transcript including
nonverbal communications such as laughter and coughing.
Furthermore, emotion and tone control is also possible via audio
conditioning (voice cloning).</p>
<p><strong>Model Architecture:</strong>
Dia is an encoder-decoder transformer based on the original transformer
architecture. However, some more modern features such as
rotational positional embeddings (RoPE) are also included. For its text
portion (encoder), a byte tokenizer is utilized while
for the audio portion (decoder), a pretrained codec model <a
href="https://github.com/huggingface/transformers/blob/HEAD/dac.md">DAC</a>
is used - DAC encodes speech into discrete codebook
tokens and decodes them back into audio.</p>
<ul>
<li>Add Dia model by <a
href="https://github.com/buttercrab"><code>@​buttercrab</code></a> in <a
href="https://redirect.github.com/huggingface/transformers/issues/38405">#38405</a></li>
</ul>
<h3>Kyutai Speech-to-Text</h3>
<!-- raw HTML omitted -->
<p>Kyutai STT is a speech-to-text model architecture based on the <a
href="https://huggingface.co/docs/transformers/en/model_doc/mimi">Mimi
codec</a>, which encodes audio into discrete tokens in a streaming
fashion, and a <a
href="https://huggingface.co/docs/transformers/en/model_doc/moshi">Moshi-like</a>
autoregressive decoder. Kyutai’s lab has released two model
checkpoints:</p>
<ul>
<li><a
href="https://huggingface.co/kyutai/stt-1b-en_fr">kyutai/stt-1b-en_fr</a>:
a 1B-parameter model capable of transcribing both English and
French</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/huggingface/transformers/commit/67ddc82fbc7e52c6f42a395b4a6d278c55b77a39"><code>67ddc82</code></a>
Release: v4.53.0</li>
<li><a
href="https://github.com/huggingface/transformers/commit/0a8081b03d118da9a8c3fa143a03afe54a5c624e"><code>0a8081b</code></a>
[Modeling] Fix encoder CPU offloading for whisper (<a
href="https://redirect.github.com/huggingface/transformers/issues/38994">#38994</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/c63cfd6a833d629a74c098933017c61dd755969d"><code>c63cfd6</code></a>
Gemma 3n (<a
href="https://redirect.github.com/huggingface/transformers/issues/39059">#39059</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/3e5cc1285503bbdb6a0a3e173b5ae90566862215"><code>3e5cc12</code></a>
[tests] remove tests from libraries with deprecated support (flax,
tensorflow...</li>
<li><a
href="https://github.com/huggingface/transformers/commit/cfff7ca9a27280338c6a57dfa7722dcf44f51a87"><code>cfff7ca</code></a>
[Whisper] Pipeline: handle long form generation (<a
href="https://redirect.github.com/huggingface/transformers/issues/35750">#35750</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/02ecdcfc0f7d81e90a9c8e7f9e6d636123a84254"><code>02ecdcf</code></a>
add _keep_in_fp32_modules_strict (<a
href="https://redirect.github.com/huggingface/transformers/issues/39058">#39058</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/d973e62fdd86d64259f87debc46bbcbf6c7e5de2"><code>d973e62</code></a>
fix condition where torch_dtype auto collides with model_kwargs. (<a
href="https://redirect.github.com/huggingface/transformers/issues/39054">#39054</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/44b231671db25974cfebcdae34402ad5099bf37a"><code>44b2316</code></a>
[qwen2-vl] fix vision attention scaling (<a
href="https://redirect.github.com/huggingface/transformers/issues/39043">#39043</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/ae15715df138949328d18e1dd95fd9cb4efb8e09"><code>ae15715</code></a>
polishing docs: error fixes for clarity (<a
href="https://redirect.github.com/huggingface/transformers/issues/39042">#39042</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/3abeaba7e53512ef9c1314163dd7e462ab405ce6"><code>3abeaba</code></a>
Create test for <a
href="https://redirect.github.com/huggingface/transformers/issues/38916">#38916</a>
(custom generate from local dir with imports) (<a
href="https://redirect.github.com/huggingface/transformers/issues/39015">#39015</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/huggingface/transformers/compare/v4.44.2...v4.53.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=transformers&package-manager=pip&previous-version=4.44.2&new-version=4.53.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

You can trigger a rebase of this PR by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts page](https://github.com/apache/texera/network/alerts).

</details>

> **Note**
> Automatic rebases have been disabled on this pull request as it has
been open for over 30 days.

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xinyuan Lin <[email protected]>

@yunyad yunyad left a comment


  1. Hardcoded Localhost URLs and Tokens
    There are multiple instances of hardcoded URLs and ports (e.g., http://localhost:5000/, http://localhost:8889/). It is recommended to move these into a configuration file or environment variables for better maintainability and portability.

  2. Use of fetch Instead of Angular HttpClient
    The current implementation uses native fetch inside Angular components. It is recommended to use Angular’s HttpClient service instead.

  3. Move Mapping Logic to Backend
    The mapping logic appears to reside on the frontend. For consistency and scalability, consider moving it to the backend so that it can be reused across sessions and clients.

bobbai00 and others added 14 commits October 7, 2025 15:09
When a PR is created or updated, automatically set the author as the
assignee.

Signed-off-by: Yicong Huang <[email protected]>
## Purpose
This PR sets user system to be enabled by default in the configuration.
Currently, this flag is by default set to be disabled (a.k.a. the
non-user mode). As no one is using the non-user mode and we are
requiring all the developers to enable the user system, we have decided
to abandon the non-user mode.

## Challenge & Design

The major blocker to setting the flag to enabled by default is that two
e2e test suites rely on the non-user mode. These two test suites
execute a workflow in the Amber engine in each of their test cases.
Enabling the user mode would require texera_db in the test environment,
as in the user-system mode, the execution of a workflow requires an
`eid` (and subsequently a `vid`, `wid`, and `uid`) in `texera_db`.

We could use `MockTexeraDB`, which is currently used by many unit tests.
`MockTexeraDB` creates an embedded postgres instance per test suite, and
the embedded db is destroyed at the end of each such test suite.

However, a complexity of the two e2e test suites is that they both access a
singleton resource `WorkflowExecutionsResource`, which caches the DSL
context from `SqlServer` (i.e., it only gets evaluated once per JVM):

```scala
final private lazy val context = SqlServer
  .getInstance()
  .createDSLContext()
```

In fact, most of the singleton resources in our current codebase cache
the `DSLContext` / DAO, as the `DSLContext` never gets updated in
the real Texera environment (i.e., the real `texera_db`'s address never
changes).

In the test environment, however, when working with `MockTexeraDB`, that
assumption does not hold, as each instance of `MockTexeraDB` has a
different address and gets destroyed before the other test suites run.
Since all the test suites are executed in the same JVM during a CI run,
using `MockTexeraDB` would cause the 2nd of the two e2e test suites to
fail because it still uses the DSL context from the 1st test suite's
`MockTexeraDB`.

The diagrams below show what happens when using the embedded
`MockTexeraDB` to run two e2e test suites that both need to access the
same singleton resource during their execution.

The 1st test suite creates an embedded DB (`DB1`) and lets the singleton
`SqlServer` object set its `DSLContext` to point to `DB1`. When the test
cases first access `WorkflowExecutionsResource` (`WER`), WER grabs the
`DSLContext` from `SqlServer` and caches it. `WER` then queries `DB1`
for all the test cases of test suite 1. When test suite 1 finishes,
`DB1` gets destroyed.
![DB and CI -
1](https://github.com/user-attachments/assets/0e405744-d2e4-4543-8c51-13abd88a6845)

Later, in the same JVM, when test suite 2 starts, it also creates its
own embedded DB (`DB2`) and lets `SqlServer` point to `DB2`. However, as
the `DSLContext` in `WER` is cached, it does not get updated when the
test cases access `WER`; `WER` still points to `DB1`, which is
already destroyed, causing failures.
![DB and CI -
2](https://github.com/user-attachments/assets/af364b16-93c5-463e-8a24-952347584b2e)


To solve this problem, we could either:

1. Avoid caching DSLContext/Dao in the codebase, or
2. Let the two e2e test cases use the same real, external database (same
as production environment) instead of `MockTexeraDB`.

**We choose the 2nd design, as these two are e2e tests which should
emulate production behavior with a real database.** To avoid polluting
the developer's local `texera_db`, we use a separate test database with
the same schema.


## Changes
- Sets `user-sys` to be enabled by default.
- Introduces a `texera_db_for_test_cases` specifically for test cases
and CIs. `texera_ddl.sql` is updated to allow creating the database with
a name other than `texera_db` (and still defaults to `texera_db`), and
CIs will automatically create `texera_db_for_test_cases` with the same
schema as `texera_db`.
- Updates `DataProcessingSpec` and `PauseSpec` to use
`texera_db_for_test_cases`. The two test suites now populate and clean
up this database during their run.
- `MockTexeraDB` is updated to incorporate the changes to the DDL
script.
- `SqlServer` is also updated with a `clearInstance` logic so that other
unit tests that use `MockTexeraDB` can clear their instance in
`SqlServer` properly so that they do not interfere with the two e2e
tests.

## Next Step

Remove the `user-sys`'s `enabled` flag and its `if-else` handling logic
completely.

---------

Co-authored-by: Xinyuan Lin <[email protected]>
…3796)

Bumps [prismjs](https://github.com/PrismJS/prism) from 1.29.0 to 1.30.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/PrismJS/prism/releases">prismjs's
releases</a>.</em></p>
<blockquote>
<h2>v1.30.0</h2>
<h2>What's Changed</h2>
<ul>
<li>check that <code>currentScript</code> is set by a script tag by <a
href="https://github.com/lkuechler"><code>@​lkuechler</code></a> in <a
href="https://redirect.github.com/PrismJS/prism/pull/3863">PrismJS/prism#3863</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/lkuechler"><code>@​lkuechler</code></a>
made their first contribution in <a
href="https://redirect.github.com/PrismJS/prism/pull/3863">PrismJS/prism#3863</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/PrismJS/prism/compare/v1.29.0...v1.30.0">https://github.com/PrismJS/prism/compare/v1.29.0...v1.30.0</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/PrismJS/prism/blob/v2/CHANGELOG.md">prismjs's
changelog</a>.</em></p>
<blockquote>
<h1>Prism Changelog</h1>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/PrismJS/prism/commit/76dde18a575831c91491895193f56081ac08b0c5"><code>76dde18</code></a>
Release 1.30.0</li>
<li><a
href="https://github.com/PrismJS/prism/commit/93cca40b364215210f23a9e35f085a682a2b8175"><code>93cca40</code></a>
npm pkg fix</li>
<li><a
href="https://github.com/PrismJS/prism/commit/99c5ca970f18f744d75e473573d4679100f87086"><code>99c5ca9</code></a>
Add release script</li>
<li><a
href="https://github.com/PrismJS/prism/commit/8e8b9352dac64457194dd9e51096b4772532e53d"><code>8e8b935</code></a>
check that currentScript is set by a script tag (<a
href="https://redirect.github.com/PrismJS/prism/issues/3863">#3863</a>)</li>
<li><a
href="https://github.com/PrismJS/prism/commit/f894dc2cbb507f565a046fed844fd541f07aa191"><code>f894dc2</code></a>
Fix logo in the footer</li>
<li><a
href="https://github.com/PrismJS/prism/commit/ac38dcec9bea6bac064a7264b7aeba086e3102bf"><code>ac38dce</code></a>
Delete CNAME</li>
<li><a
href="https://github.com/PrismJS/prism/commit/9b5b09aef4dc2c18c28d2f5a6244d4efcc6ab5cb"><code>9b5b09a</code></a>
Enable CORS</li>
<li>See full diff in <a
href="https://github.com/PrismJS/prism/compare/v1.29.0...v1.30.0">compare
view</a></li>
</ul>
</details>
<details>
<summary>Maintainer changes</summary>
<p>This version was pushed to npm by <a
href="https://www.npmjs.com/~dmitrysharabin">dmitrysharabin</a>, a new
releaser for prismjs since your current version.</p>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=prismjs&package-manager=npm_and_yarn&previous-version=1.29.0&new-version=1.30.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts page](https://github.com/apache/texera/network/alerts).

</details>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xinyuan Lin <[email protected]>
Co-authored-by: yunyad <[email protected]>
Reverts apache#3835. The added action
`technote-space/assign-author@v1` is not approved by apache.
## Update
This PR fixes formatting issues that introduce redundant file changes in
the core [PR](apache#3598).
…ze the requests to `/wsapi` and `Computing Unit` endpoints (apache#3598)

## Access Control Service
This service is currently used only by Envoy as an authorization
service. It acts as a third-party service that authorizes any request
sent to the computing unit to establish a socket connection through
`/wsapi`. It parses the `user-token` from the URL parameters, checks
the user's access to the computing unit against the database, and adds
the corresponding information to the following headers:
- x-user-cu-access
- x-user-id
- x-user-name
- x-user-email

If the service cannot parse the token or fails for any reason, Envoy
denies access to the computing unit. If authorization succeeds, the
user is connected directly to the computing unit using `Upgrade` on the
first `HTTP` handshake request, so latency does not change.

## The new connection flow 

<img width="1282" height="577"
alt="489656839-e09b06ee-3915-4c18-9584-e880bc06011d"
src="https://github.com/user-attachments/assets/f7b0d29e-f30b-4e7f-9a0d-966f52d8d48a"
/>


1. A user initiates an `HTTP` request to connect to a specific Computing
Unit.
2. The request is first routed through the **Gateway** to **Envoy**.
3. Envoy pauses the request and sends a query to the **Access Control
Service** to get an authorization decision.
4. The Access Control Service verifies the user's token and checks a
PostgreSQL database to see if the user has the necessary permissions for
the target Computing Unit.
5. **If authorized**, the service injects specific HTTP headers
(`x-user-cu-access`, `x-user-id`, `x-user-name`) into the request and
sends an approval back to Envoy.
6. Envoy then forwards the approved request to the Computing Unit.
7. The connection is then upgraded to a WebSocket, establishing a
secure, interactive session.

If authorization fails at any point, Envoy immediately denies the
connection request, and the user is prevented from accessing the
Computing Unit. This new process provides **enhanced security** and
**centralized authorization logic**, and is designed to have **no
performance impact** on the established WebSocket connection, since the
check is performed only on the initial handshake.
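
The authorization decision in steps 3–5 can be sketched as a pure
function (a hypothetical illustration, not the actual service code;
`verifyToken` and `lookupAccess` stand in for the JWT check and the
PostgreSQL query):

```typescript
type Access = "READ" | "WRITE" | "NONE";

interface AuthUser { id: number; name: string; email: string; }

interface AuthDecision {
  allowed: boolean;
  // Headers injected into the request on approval.
  headers: Record<string, string>;
}

function authorize(
  userToken: string | undefined,
  verifyToken: (token: string) => AuthUser | null,
  lookupAccess: (userId: number, cuId: number) => Access,
  cuId: number
): AuthDecision {
  // No token or an invalid token -> Envoy denies the request.
  if (!userToken) return { allowed: false, headers: {} };
  const user = verifyToken(userToken);
  if (!user) return { allowed: false, headers: {} };

  const access = lookupAccess(user.id, cuId);
  if (access === "NONE") return { allowed: false, headers: {} };

  // On success, inject the headers that downstream services read
  // during the WebSocket handshake.
  return {
    allowed: true,
    headers: {
      "x-user-cu-access": access,
      "x-user-id": String(user.id),
      "x-user-name": user.name,
      "x-user-email": user.email,
    },
  };
}
```

Because the decision only runs on the initial handshake, the upgraded
WebSocket traffic bypasses this check entirely.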

## Summary of file changes

| Component/Flow | File | Description |
| :--- | :--- | :--- |
| **Database Access Logic** |
`core/auth/src/main/scala/edu/uci/ics/texera/auth/util/ComputingUnitAccess.scala`
| Implements the logic to query the PostgreSQL database and determine a
user's access privilege (`READ`, `WRITE`, `NONE`) for a given Computing
Unit. |
| |
`core/auth/src/main/scala/edu/uci/ics/texera/auth/util/HeaderField.scala`
| Defines constants for the custom HTTP headers (`x-user-cu-access`,
`x-user-id`, etc.) that are injected by the Access Control Service. |
| **WebSocket Connection Handling** |
`core/amber/src/main/scala/edu/uci/ics/texera/web/ServletAwareConfigurator.scala`
| Modified to read the new authorization headers during the WebSocket
handshake. If headers are present, it creates the `User` object from
them; otherwise, it falls back to the old method of parsing the JWT from
URL parameters for single-node mode. |
| |
`core/amber/src/main/scala/edu/uci/ics/texera/web/SessionState.scala` |
Updated to store the user's access privilege level for the current
computing unit within the session. |
| |
`core/amber/src/main/scala/edu/uci/ics/texera/web/resource/WorkflowWebsocketResource.scala`
| Enforces the access control by checking if the user has `WRITE`
privilege before allowing a `WorkflowExecuteRequest`. |
| **Deployment & Routing** |
`deployment/access-control-service.dockerfile` | New Dockerfile for
building and containerizing the Access Control Service. |
| |
`deployment/k8s/texera-helmchart/templates/access-control-service-deployment.yaml`
| New Kubernetes manifest to deploy the Access Control Service. |
| |
`deployment/k8s/texera-helmchart/templates/access-control-service-service.yaml`
| New Kubernetes service manifest to expose the Access Control Service
within the cluster. |
| | `deployment/k8s/texera-helmchart/templates/envoy-config.yaml` |
**Key change:** Configures Envoy to use the new service as an external
authorization filter (`ext_authz`). It intercepts relevant requests,
forwards them for an authorization check, and then passes the injected
headers to the upstream service (AmberMaster). |
| | `deployment/k8s/texera-helmchart/values.yaml` | Adds the
configuration parameters for the new Access Control Service to the Helm
chart. |
| **Frontend UI** |
`core/gui/src/app/workspace/component/menu/menu.component.ts` & `.html`|
The frontend is updated to disable the "Run" button if the connected
user does not have `WRITE` access to the selected Computing Unit,
providing immediate visual feedback. |
| **Build & Configuration** | `core/build.sbt` | The root SBT build file
is updated to include the new `AccessControlService` module. |
| | `core/config/src/main/scala/edu/uci/ics/amber/util/PathUtils.scala`
| Adds a path helper for the new service's directory structure. |

---------

Co-authored-by: Ali Risheh <[email protected]>
…ough parameters (apache#3820)

## Summary
- Fixed a non-deterministic parameter-ordering issue when creating
`Dataset` objects from JOOQ records
- Used `createdDataset.into(classOf[Dataset])` to convert the
`DatasetRecord` to a `Dataset` POJO instead of calling the constructor
manually

Fixes apache#3821

---------

Co-authored-by: Claude <[email protected]>
# Purpose

This PR is a successor of apache#3782. As the non-user-system mode is
no longer used or maintained, we can remove the flag that switches the
user system between enabled and disabled, and keep only the enabled
mode.

# Content

- Removed the `user-sys.enabled` flag, both in the frontend and backend.
- Removed all the if-else statements based on this flag in the codebase.
Only the cases of user system being enabled are kept.
- Removed `ExecutionResourceMapping` in the backend as it is no longer
needed.
- Removed `WorkflowCacheService` in the frontend as it is no longer
needed.

---------

Co-authored-by: Xinyuan Lin <[email protected]>
…pache#3836)

## Purpose

apache#3571 disabled frontend undo/redo due to an existing bug with the
undo/redo manager during shared editing. This PR fixes that bug and
re-enables undo/redo.

## Bug with shared editing

The bug can be minimally reproduced as follows with two users editing
the same workflow (or two tabs opened by the same user):

1. User A deletes a link E from operator X to Y on the canvas,
2. User B deletes operator Y.
3. User A clicks "undo", and the workflow reaches an erroneous state,
where there is a link E that connects to an operator Y that no longer
exists. Note E exists in the frontend data but is not visible on the UI.

The following gif shows this process.
![Screen Recording 2025-10-08 at 10 38
58](https://github.com/user-attachments/assets/e890f86e-33e8-48be-b3b2-9f95a7460fde)

## Shared-editing Architecture

Shared editing (apache#1674) is achieved by letting the frontend rely on data
structures from yjs (a CRDT library) as its data model, as any
manipulation to these data structures can be propagated to other users
with automatic conflict-resolution.

There are two layers of data in each user's Texera frontend: the UI
data (jointjs) and the shared "Y data". The two layers in each user's
UI are synced by our application code, and the Y data between users of
a shared-editing session is kept in sync with automatic conflict
resolution by yjs. The following diagram shows what happens when a user
adds a link and how the other user sees this change in real time.


![shared-editing-process](https://github.com/user-attachments/assets/d81ed158-f7fc-4842-8e64-5436add3d221)

Yjs's CRDT guarantees the eventual **consistency** of this underlying
data model among concurrent editors, i.e., it makes sure this data model
is correctly synced in each editor's frontend.

## The core problem

Yjs does not offer a "graph" data structure, and currently in Texera,
the shared data structures for operators and links are two separate
`Map`s:

- `operatorIDMap`: `operatorID`->`Operator`
- `operatorLinkMap`: `linkID`-> `OperatorLink`

There is an application-specific "referential constraint" in Texera's
frontend that "a link must connect to an operator that exists", and this
kind of sanity checking on the data is not the concern of CRDT. It can
only be enforced by the application (i.e., ourselves). Ideally, before
making any changes to the shared data model, we should do sanity
checking and reject changes that violate our application-specific
constraints.


As shown below, in each user's frontend, there are 3 paths where the
shared data model can be modified.


![shared-editing-issue](https://github.com/user-attachments/assets/9a8da72c-90bd-412a-8f0a-384cf9388c8c)

**Path 1**: The first path includes changes initiated by a user's UI
actions (e.g., adding a link on the UI). For this path, we have
existing sanity-checking logic:
```typescript
public addLink(link: OperatorLink): void {
  this.assertLinkNotExists(link);
  this.assertLinkIsValid(link);
  this.sharedModel.operatorLinkMap.set(link.linkID, link);
}
```

**Path 2**: Another path is undo/redo, which is purely managed by an
`UndoManager`, also offered by Yjs. This module is local to each user's
frontend, and it automatically tracks local changes to the shared data
model. When a user clicks "undo", `UndoManager` directly applies changes
to the shared data model. **The core of the problem is there is no
sanity checking on this path.**

**Path 3**: The third path is remote changes from another collaborator.
There is also no sanity checking on this path, but the correctness of
such changes depends on whether the change was sanity-checked on the
collaborator's side (i.e., if it is a UI change from User A, the change
propagated to User B's frontend has been sanity-checked; if it is an
undo change, however, the change propagated to User B has not been
sanity-checked and could cause issues).

## Cause of the bug

The following diagram shows how the bug happens from the perspective of
the shared model.

![shared-editing-steps](https://github.com/user-attachments/assets/066c4989-1b3c-412f-bcfa-f1d3689cb5de)

When user A clicks "Undo" after 2), the `UndoManager` simply applies
the reverse operation of "Delete E" and adds the link `E` back to
`operatorLinkMap`. As there is no sanity checking during this process,
the operation succeeds, and the shared model reaches a state that
violates the constraint.

## Solution

Unfortunately, due to the limitations of Yjs's APIs, it is not possible
to add sanity checking to Path 2 or 3 **before** a change is applied, as
an undo/redo operation on the `UndoManager`'s stack is not exposed as a
meaningful action (i.e., there is no way to tell that an action to be
applied to the shared model is an `addLink` if it is an undo operation).

Nevertheless, we can react to a change to the shared model that is
initiated from Path 2 or Path 3 after the change has been applied, and
add sanity checking logic there to "repair" unsanitary changes.

This place (`SharedModelChangeHandler`) is exactly where we sync
changes from the shared model to the UI: any change to the shared model
not initiated by the UI (i.e., a change from the `UndoManager` or a
remote change by another user) goes through this place, and such
changes are parsed into meaningful actions such as "add a link",
"delete an operator", etc.

![shared-editing-solution](https://github.com/user-attachments/assets/d3fb4d5b-d9c3-4993-ab9b-2158ecacd5be)

Currently, the only sanity checking needed is to verify that a newly
added link connects to an operator / ports that exist and that it is
not a duplicate link. We add such checking logic in
`SharedModelChangeHandler` and revert unsanitary operations before they
are reflected on the UI.
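
The repair step can be sketched as follows (a simplified illustration,
assuming plain `Map`s rather than Yjs types; the function name
`repairAddedLink` and the duplicate check by endpoints are assumptions):

```typescript
interface OperatorLink { linkID: string; source: string; target: string; }

// Called after an un-sanity-checked change (undo/redo or a remote
// change) adds `link` to the shared model. Returns true if the link is
// valid; otherwise reverts it before the UI layer syncs.
function repairAddedLink(
  operatorIDMap: Map<string, unknown>,
  operatorLinkMap: Map<string, OperatorLink>,
  link: OperatorLink
): boolean {
  const valid =
    // both endpoints must refer to operators that still exist
    operatorIDMap.has(link.source) &&
    operatorIDMap.has(link.target) &&
    // and no other link may already connect the same endpoints
    !Array.from(operatorLinkMap.values()).some(
      l =>
        l.linkID !== link.linkID &&
        l.source === link.source &&
        l.target === link.target
    );
  if (!valid) {
    operatorLinkMap.delete(link.linkID); // revert the unsanitary change
  }
  return valid;
}
```

In the bug scenario above, the re-added link `E` points at the deleted
operator `Y`, so the check fails and the link is removed again.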

## Demo

The following gif shows the experience after the fix. When an
unsanitary action caused by undo happens, it fails and we log it in the
console. The workflow JSON no longer gets damaged.

![Screen Recording 2025-10-08 at 15 27
35](https://github.com/user-attachments/assets/81629e7f-2657-4b58-8a2c-0d159c1cef1c)

---------

Co-authored-by: Yicong Huang <[email protected]>
)

### **Purpose**
This PR resolves apache#3844, where pending uploads could not be
removed before they started. It enables removing/canceling items
directly from the Pending panel, improving queue control and
flexibility in managing uploads.

### **Changes**
- Add a Remove action to Pending items; behavior and styling match the
Uploading panel’s remove action.
- Refactor `cancelExistingUpload(fileName: string)`:
  - Uploading/Initializing → reuse the abort path to properly finalize
server-side state and prevent leaks.
  - Pending → front-end cleanup only (remove from queue and tasks) with
no backend abort call.
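
A minimal sketch of the two-branch cancel logic (the state names, the
`UploadTask` shape, and the `abortOnServer` callback are assumptions
for illustration, not the actual component code):

```typescript
type UploadState = "pending" | "initializing" | "uploading";

interface UploadTask { fileName: string; state: UploadState; }

function cancelExistingUpload(
  tasks: Map<string, UploadTask>,
  fileName: string,
  abortOnServer: (fileName: string) => void
): void {
  const task = tasks.get(fileName);
  if (!task) return;
  if (task.state === "uploading" || task.state === "initializing") {
    // Reuse the abort path so the server finalizes the upload and
    // releases its resources.
    abortOnServer(fileName);
  }
  // Pending items never reached the backend, so a front-end cleanup
  // (dropping the task from the queue) is sufficient in every case.
  tasks.delete(fileName);
}
```

Pending items skip the server round-trip entirely, which is what makes
removing them from the queue cheap and safe.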

### **Demonstration**


https://github.com/user-attachments/assets/aa4aa40c-bf7a-45fd-9257-fcfac4a00da9
This PR changes the following:

- remove `version` attribute
- update container names to avoid conflicts
- set default named volumes for data persistence

resolves apache#3816

---------

Co-authored-by: Jiadong Bai <[email protected]>
…e#3772)

### Description:
Implemented a restriction on `export result` to prevent users from
exporting workflow results that depend on non-downloadable datasets
they don't own. This ensures the dataset download restriction cannot be
circumvented through workflow execution and result export.

Closes apache#3766 

### Changes:
**Backend**
- Added server-side validation to analyze workflow dependencies and
block export of operators that depend on non-downloadable datasets
- Implemented algorithm to propagate restrictions to downstream
operators

**Frontend**
- Updated the export dialog component to show restriction warnings,
filter exportable operators, and display blocking dataset information
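
The backend's downstream propagation can be sketched as a breadth-first
traversal of the workflow DAG (a hypothetical illustration; the
edge-list representation and function name are assumptions):

```typescript
// Given the workflow's links and the set of operators that read
// non-downloadable datasets, return every operator whose result must
// be blocked from export (the sources plus everything downstream).
function restrictedOperators(
  edges: Array<[string, string]>, // [from, to] operator links
  restrictedSources: Set<string>
): Set<string> {
  // Build an adjacency list of downstream neighbors.
  const downstream = new Map<string, string[]>();
  for (const [from, to] of edges) {
    const list = downstream.get(from);
    if (list) list.push(to);
    else downstream.set(from, [to]);
  }

  // BFS from every restricted source, marking operators as blocked.
  const blocked = new Set(restrictedSources);
  const queue = [...restrictedSources];
  while (queue.length > 0) {
    const op = queue.shift()!;
    for (const next of downstream.get(op) ?? []) {
      if (!blocked.has(next)) {
        blocked.add(next);
        queue.push(next);
      }
    }
  }
  return blocked;
}
```

Operators not reachable from any restricted source remain exportable,
which matches the mixed-dataset case shown in the video.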

### Video:
The video demonstrates how `export result` behaves on:
- workflows with downloadable datasets
- workflows with non-downloadable datasets
- workflows with both downloadable and non-downloadable datasets


https://github.com/user-attachments/assets/56b78aeb-dbcc-40fc-89b4-9c4238f8bc56

---------

Signed-off-by: Seongjin Yoon <[email protected]>
Co-authored-by: Seongjin Yoon <[email protected]>
Co-authored-by: Xinyuan Lin <[email protected]>
Co-authored-by: Seongjin Yoon <[email protected]>
Co-authored-by: Seongjin Yoon <[email protected]>
Co-authored-by: Seongjin Yoon <[email protected]>
Co-authored-by: Jiadong Bai <[email protected]>
LJX2017 and others added 13 commits December 1, 2025 19:22
…4087)

<!--
Thanks for sending a pull request (PR)! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
[Contributing to
Texera](https://github.com/apache/texera/blob/main/CONTRIBUTING.md)
  2. Ensure you have added or run the appropriate tests for your PR
  3. If the PR is work in progress, mark it a draft on GitHub.
  4. Please write your PR title to summarize what this PR proposes, we 
    are following Conventional Commits style for PR titles as well.
  5. Be sure to keep the PR description updated to reflect all changes.
-->

### What changes were proposed in this PR?
<!--
Please clarify what changes you are proposing. The purpose of this
section
is to outline the changes. Here are some tips for you:
  1. If you propose a new API, clarify the use case for a new API.
  2. If you fix a bug, you can clarify why it is a bug.
  3. If it is a refactoring, clarify what has been changed.
  3. It would be helpful to include a before-and-after comparison using 
     screenshots or GIFs.
  4. Please consider writing useful notes for better and faster reviews.
-->
This PR adds pre-configured IntelliJ run configurations for:
- launching all 8 backend microservices,
- the frontend service,
- and lakeFS via Docker Compose.

With these changes, developers can now launch the backend services,
lakeFS, and frontend directly from IntelliJ’s run menu, eliminating the
need to manually locate and configure each relevant class or compose
file. This leverages IntelliJ’s built-in Compound and individual run
configurations, so no additional plugins are required.


https://github.com/user-attachments/assets/9ef8fb13-2dc3-4598-ba44-0540d37202db



### Any related issues, documentation, discussions?
<!--
Please use this section to link other resources if not mentioned
already.
1. If this PR fixes an issue, please include `Fixes apache#1234`, `Resolves
apache#1234`
or `Closes apache#1234`. If it is only related, simply mention the issue
number.
  2. If there is design documentation, please add the link.
  3. If there is a discussion in the mailing list, please add the link.
-->
Fixes apache#4045

### How was this PR tested?
<!--
If tests were added, say they were added here. Or simply mention that if
the PR
is tested with existing test cases. Make sure to include/update test
cases that
check the changes thoroughly including negative and positive cases if
possible.
If it was tested in a way different from regular unit tests, please
clarify how
you tested step by step, ideally copy and paste-able, so that other
reviewers can
test and check, and descendants can verify in the future. If tests were
not added,
please describe why they were not added and/or why it was difficult to
add.
-->
Verified on a local IntelliJ IDEA environment. The Compound run config
cleanly launches all backend microservices in parallel.

### Was this PR authored or co-authored using generative AI tooling?
<!--
If generative AI tooling has been used in the process of authoring this
PR,
please include the phrase: 'Generated-by: ' followed by the name of the
tool
and its version. If no, write 'No'. 
Please refer to the [ASF Generative Tooling
Guidance](https://www.apache.org/legal/generative-tooling.html) for
details.
-->
No

---------

Co-authored-by: Xinyuan Lin <[email protected]>
Co-authored-by: Chen Li <[email protected]>
…est architecture (apache#4077)

### What changes were proposed in this PR?

This PR improves the single-node docker-compose configuration with the
following changes:


1. **Added microservices**:
- `config-service` (port 9094): Provides endpoints for configuration
management
- `access-control-service` (port 9096): Handles user permissions and
access control
- `workflow-computing-unit-managing-service` (port 8888): Provides
endpoints for managing computing units
- All services are added with proper health checks and dependencies on
postgres
- Nginx reverse proxy routes are configured for `/api/config` and
`/api/computing-unit`

2. **Removed outdated environment variables** from `.env`:
   - `USER_SYS_ENABLED=true`
   - `STORAGE_ICEBERG_CATALOG_TYPE=postgres`

3. **Removed unused example data loader**: the example data will now be
loaded in other ways, no longer via a container.

### Any related issues, documentation, discussions?

Closes apache#4083 

### How was this PR tested?

docker-compose tested locally.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-5-20250101)

---------

Co-authored-by: Claude <[email protected]>
Bumps [pg8000](https://github.com/tlocke/pg8000) from 1.31.2 to 1.31.5.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/tlocke/pg8000/commits">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pg8000&package-manager=pip&previous-version=1.31.2&new-version=1.31.5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts page](https://github.com/apache/texera/network/alerts).

</details>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xiaozhen Liu <[email protected]>
### What changes were proposed in this PR?
Add a configuration option to automatically shorten file paths for
Windows users when the original path exceeds the system’s maximum
length.

After this PR, Windows users should not see this error anymore.

<img width="612" height="157" alt="image"
src="https://github.com/user-attachments/assets/73a23ef2-0fad-4f2f-bc99-c7f2e576a4d9"
/>


### Any related issues, documentation, discussions?
Follow-up of PR apache#4087


### How was this PR tested?
Tested manually.


### Was this PR authored or co-authored using generative AI tooling?
No
### What changes were proposed in this PR?
Removed official support for R-UDF. The frontend is unchanged, but
during execution users will receive an error stating that R-UDF is no
longer officially supported. We plan to move R-UDF support to a
third-party hosted repo so that users can install it as a plugin.

### Any related issues, documentation, discussions?
This change is due to the fact that the R-UDF runtime requires `rpy2`,
whose license is not Apache-friendly.
resolves apache#4084 

### How was this PR tested?
Added test suite `TestExecutorManager`.

### Was this PR authored or co-authored using generative AI tooling?
Tests generated by Cursor.

---------

Co-authored-by: Yicong Huang <[email protected]>
Co-authored-by: Chen Li <[email protected]>
<!--
Thanks for sending a pull request (PR)! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
[Contributing to
Texera](https://github.com/apache/texera/blob/main/CONTRIBUTING.md)
  2. Ensure you have added or run the appropriate tests for your PR
  3. If the PR is work in progress, mark it a draft on GitHub.
  4. Please write your PR title to summarize what this PR proposes, we 
    are following Conventional Commits style for PR titles as well.
  5. Be sure to keep the PR description updated to reflect all changes.
-->

### What changes were proposed in this PR?
<!--
Please clarify what changes you are proposing. The purpose of this
section
is to outline the changes. Here are some tips for you:
  1. If you propose a new API, clarify the use case for a new API.
  2. If you fix a bug, you can clarify why it is a bug.
  3. If it is a refactoring, clarify what has been changed.
  3. It would be helpful to include a before-and-after comparison using 
     screenshots or GIFs.
  4. Please consider writing useful notes for better and faster reviews.
-->
1. Replace flake8 and black with Ruff in CI.
2. Format existing code using Ruff

Basic Ruff commands (run under `amber/src/main/python`):

```shell
cd amber/src/main/python
# Run Ruff's formatter in dry-run mode
ruff format --check .
# Run Ruff's formatter
ruff format .
# Run Ruff's linter
ruff check .
```

### Any related issues, documentation, discussions?
<!--
Please use this section to link other resources if not mentioned
already.
1. If this PR fixes an issue, please include `Fixes apache#1234`, `Resolves
apache#1234`
or `Closes apache#1234`. If it is only related, simply mention the issue
number.
  4. If there is design documentation, please add the link.
  5. If there is a discussion in the mailing list, please add the link.
-->
Closes apache#4078

### How was this PR tested?
<!--
If tests were added, say they were added here. Or simply mention that if
the PR
is tested with existing test cases. Make sure to include/update test
cases that
check the changes thoroughly including negative and positive cases if
possible.
If it was tested in a way different from regular unit tests, please
clarify how
you tested step by step, ideally copy and paste-able, so that other
reviewers can
test and check, and descendants can verify in the future. If tests were
not added,
please describe why they were not added and/or why it was difficult to
add.
-->
I created a PR on my own fork to ensure CI is working.

### Was this PR authored or co-authored using generative AI tooling?
<!--
If generative AI tooling has been used in the process of authoring this
PR,
please include the phrase: 'Generated-by: ' followed by the name of the
tool
and its version. If no, write 'No'. 
Please refer to the [ASF Generative Tooling
Guidance](https://www.apache.org/legal/generative-tooling.html) for
details.
-->
No

---------

Co-authored-by: Xinyuan Lin <[email protected]>
### What changes were proposed in this PR?

This PR bumps the project version from `1.0.0` to `1.1.0-incubating`
across all relevant configuration files:

- **`build.sbt`**: Updated `version := "1.0.0"` to `version :=
"1.1.0-incubating"`
- **`bin/single-node/docker-compose.yml`**:
- Updated project name from `texera-single-node-release-1-0-0` to
`texera-single-node-release-1-1-0-incubating`
- Updated network name from `texera-single-node-release-1-0-0` to
`texera-single-node-release-1-1-0-incubating`
- Updated all 7 Texera service image tags from `:latest` to
`:1.1.0-incubating`
  - Updated the R operator comment reference
- **`bin/k8s/values.yaml`**: Updated all 8 Texera service image tags
from `:latest` to `:1.1.0-incubating`

### Any related issues, documentation, discussions?

Closes apache#4082

### How was this PR tested?

This is a configuration-only change.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.5)

Co-authored-by: Claude Opus 4.5 <[email protected]>
<!--
Thanks for sending a pull request (PR)! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
[Contributing to
Texera](https://github.com/apache/texera/blob/main/CONTRIBUTING.md)
  2. Ensure you have added or run the appropriate tests for your PR
  3. If the PR is work in progress, mark it a draft on GitHub.
  4. Please write your PR title to summarize what this PR proposes, we 
    are following Conventional Commits style for PR titles as well.
  5. Be sure to keep the PR description updated to reflect all changes.
-->

### What changes were proposed in this PR?
<!--
Please clarify what changes you are proposing. The purpose of this
section
is to outline the changes. Here are some tips for you:
  1. If you propose a new API, clarify the use case for a new API.
  2. If you fix a bug, you can clarify why it is a bug.
  3. If it is a refactoring, clarify what has been changed.
  3. It would be helpful to include a before-and-after comparison using 
     screenshots or GIFs.
  4. Please consider writing useful notes for better and faster reviews.
-->

This PR renames the `BigObject` type to `LargeBinary`. The original
feature was introduced in apache#4067, but we decided to adopt the
`LargeBinary` terminology to align with naming conventions used in other
systems (e.g., Arrow).

This change is purely a renaming/terminology update and does not modify
the underlying functionality.


### Any related issues, documentation, discussions?
<!--
Please use this section to link other resources if not mentioned
already.
1. If this PR fixes an issue, please include `Fixes apache#1234`, `Resolves
apache#1234`
or `Closes apache#1234`. If it is only related, simply mention the issue
number.
  2. If there is design documentation, please add the link.
  3. If there is a discussion in the mailing list, please add the link.
-->
apache#4100 (comment)


### How was this PR tested?
<!--
If tests were added, say they were added here. Or simply mention that if
the PR
is tested with existing test cases. Make sure to include/update test
cases that
check the changes thoroughly including negative and positive cases if
possible.
If it was tested in a way different from regular unit tests, please
clarify how
you tested step by step, ideally copy and paste-able, so that other
reviewers can
test and check, and descendants can verify in the future. If tests were
not added,
please describe why they were not added and/or why it was difficult to
add.
-->
Ran the workflow below and verified that it completes successfully and that three objects are created in the MinIO console.
[Java
UDF.json](https://github.com/user-attachments/files/23976766/Java.UDF.json)



### Was this PR authored or co-authored using generative AI tooling?
<!--
If generative AI tooling has been used in the process of authoring this
PR,
please include the phrase: 'Generated-by: ' followed by the name of the
tool
and its version. If no, write 'No'. 
Please refer to the [ASF Generative Tooling
Guidance](https://www.apache.org/legal/generative-tooling.html) for
details.
-->

No.

---------

Signed-off-by: Chris <[email protected]>
Co-authored-by: Copilot <[email protected]>
…4124)

### What changes were proposed in this PR?

This PR removes the `WITH_R_SUPPORT` build argument and all R-related
installation logic from the Docker build configuration:

1. **Dockerfiles** (`computing-unit-master.dockerfile` and
`computing-unit-worker.dockerfile`):
   - Removed `ARG WITH_R_SUPPORT` build argument
   - Removed conditional R runtime dependencies installation
   - Removed R compilation and installation steps (R 4.3.3)
   - Removed R packages installation (arrow, coro, dplyr)
   - Removed `LD_LIBRARY_PATH` environment variable for R libraries
   - Removed `r-requirements.txt` copy in worker dockerfile
   - Simplified to Python-only dependencies

2. **GitHub Actions Workflow**
(`.github/workflows/build-and-push-images.yml`):
   - Removed `with_r_support` workflow input parameter
   - Removed `with_r_support` from job outputs and parameter passing
- Removed `WITH_R_SUPPORT` build args from both AMD64 and ARM64 build
steps
   - Removed R Support from build summary

### Any related issues, documentation, discussions?

Related to apache#4090

### How was this PR tested?

Verified that the Dockerfile and CI YAML syntax are valid.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) via Claude
Code CLI
<!--
Thanks for sending a pull request (PR)! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
[Contributing to
Texera](https://github.com/apache/texera/blob/main/CONTRIBUTING.md)
  2. Ensure you have added or run the appropriate tests for your PR
  3. If the PR is work in progress, mark it a draft on GitHub.
  4. Please write your PR title to summarize what this PR proposes, we 
    are following Conventional Commits style for PR titles as well.
  5. Be sure to keep the PR description updated to reflect all changes.
-->

### What changes were proposed in this PR?
<!--
Please clarify what changes you are proposing. The purpose of this
section
is to outline the changes. Here are some tips for you:
  1. If you propose a new API, clarify the use case for a new API.
  2. If you fix a bug, you can clarify why it is a bug.
  3. If it is a refactoring, clarify what has been changed.
  3. It would be helpful to include a before-and-after comparison using 
     screenshots or GIFs.
  4. Please consider writing useful notes for better and faster reviews.
-->
This PR introduces Python support for the `large_binary` attribute type,
enabling Python UDF operators to process data larger than 2 GB. Data is
offloaded to MinIO (S3), and the tuple retains only a pointer (URI).
This mirrors the existing Java LargeBinary implementation, ensuring
cross-language compatibility. (See apache#4067 for system diagram and apache#4111
for renaming)

## Key Features

### 1. MinIO/S3 Integration
- Utilizes the shared `texera-large-binaries` bucket.
- Implements lazy initialization of S3 clients and automatic bucket
creation.

### 2. Streaming I/O
- **`LargeBinaryOutputStream`:** Writes data to S3 using multipart
uploads (64KB chunks) to prevent blocking the main execution.
- **`LargeBinaryInputStream`:** Lazily downloads data only when the read
operation begins. Implements standard Python `io.IOBase`.
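
The chunked-streaming idea can be sketched independently of S3 (the function name and in-memory streams here are illustrative, not Texera's actual API):

```python
import io

def copy_in_chunks(src, dst, chunk_size=64 * 1024):
    """Forward data from src to dst in fixed-size pieces so the full
    payload never has to reside in memory at once."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)

# Usage with in-memory streams standing in for a local source and an S3 sink:
src = io.BytesIO(b"x" * 200_000)
dst = io.BytesIO()
copy_in_chunks(src, dst)
```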

### 3. Tuple & Iceberg Compatibility
- `largebinary` instances are automatically serialized to URI strings
for Iceberg storage and Arrow tables.
- Uses a magic suffix (`__texera_large_binary_ptr`) to distinguish
pointers from standard strings.

### 4. Serialization
- Pointers are stored as strings with metadata (`texera_type:
LARGE_BINARY`). Auto-conversion ensures UDFs always see `largebinary`
instances, not raw strings.
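
The suffix-based pointer detection described above can be illustrated with a tiny sketch (the constant and helper names are hypothetical, not Texera's actual identifiers):

```python
# Hypothetical names for illustration; Texera's actual identifiers may differ.
LARGE_BINARY_PTR_SUFFIX = "__texera_large_binary_ptr"

def is_large_binary_pointer(value) -> bool:
    """True if the value looks like a serialized LargeBinary pointer string."""
    return isinstance(value, str) and value.endswith(LARGE_BINARY_PTR_SUFFIX)

def auto_convert(value):
    """Auto-conversion hook: wrap pointer strings, pass everything else through.
    The tuple here is a stand-in for a real largebinary instance."""
    if is_large_binary_pointer(value):
        return ("largebinary", value)
    return value
```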

## User API Usage

### 1. Creating & Writing (Output)
Use `LargeBinaryOutputStream` to stream large data into a new object.

```python
from pytexera import largebinary, LargeBinaryOutputStream

# Create a new handle
large_binary = largebinary()

# Stream data to S3
with LargeBinaryOutputStream(large_binary) as out:
    out.write(my_large_data_bytes)
    # Supports bytearray, bytes, etc.
```

### 2. Reading (Input)
Use `LargeBinaryInputStream` to read data back. It supports all standard
Python stream methods.

```python
from pytexera import LargeBinaryInputStream

with LargeBinaryInputStream(large_binary) as stream:
    # Option A: Read everything
    all_data = stream.read()

    # Option B: Chunked reading
    chunk = stream.read(1024)

    # Option C: Iteration
    for line in stream:
        process(line)
```

## Dependencies
- `boto3`: Required for S3 interactions.
- `StorageConfig`: Uses existing configuration for
endpoints/credentials.

## Future Direction
- Support for R UDF Operators
- Check apache#4123


### Any related issues, documentation, discussions?
<!--
Please use this section to link other resources if not mentioned
already.
1. If this PR fixes an issue, please include `Fixes apache#1234`, `Resolves
apache#1234`
or `Closes apache#1234`. If it is only related, simply mention the issue
number.
  2. If there is design documentation, please add the link.
  3. If there is a discussion in the mailing list, please add the link.
-->
Design: apache#3787

### How was this PR tested?
<!--
If tests were added, say they were added here. Or simply mention that if
the PR
is tested with existing test cases. Make sure to include/update test
cases that
check the changes thoroughly including negative and positive cases if
possible.
If it was tested in a way different from regular unit tests, please
clarify how
you tested step by step, ideally copy and paste-able, so that other
reviewers can
test and check, and descendants can verify in the future. If tests were
not added,
please describe why they were not added and/or why it was difficult to
add.
-->
Tested by running the workflow below multiple times and checking the MinIO dashboard to verify that six objects are created and deleted. Set the file scan operator's property to use any file larger than 2 GB.
[Large Binary
Python.json](https://github.com/user-attachments/files/24062982/Large.Binary.Python.json)

### Was this PR authored or co-authored using generative AI tooling?
<!--
If generative AI tooling has been used in the process of authoring this
PR,
please include the phrase: 'Generated-by: ' followed by the name of the
tool
and its version. If no, write 'No'. 
Please refer to the [ASF Generative Tooling
Guidance](https://www.apache.org/legal/generative-tooling.html) for
details.
-->
No.

---------

Signed-off-by: Chris <[email protected]>
<!--
Thanks for sending a pull request (PR)! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
[Contributing to
Texera](https://github.com/apache/texera/blob/main/CONTRIBUTING.md)
  2. Ensure you have added or run the appropriate tests for your PR
  3. If the PR is work in progress, mark it a draft on GitHub.
  4. Please write your PR title to summarize what this PR proposes, we 
    are following Conventional Commits style for PR titles as well.
  5. Be sure to keep the PR description updated to reflect all changes.
-->

### What changes were proposed in this PR?
<!--
Please clarify what changes you are proposing. The purpose of this
section
is to outline the changes. Here are some tips for you:
  1. If you propose a new API, clarify the use case for a new API.
  2. If you fix a bug, you can clarify why it is a bug.
  3. If it is a refactoring, clarify what has been changed.
  3. It would be helpful to include a before-and-after comparison using 
     screenshots or GIFs.
  4. Please consider writing useful notes for better and faster reviews.
-->
This PR removes the unused `retrieveDatasetSingleFile()` endpoint (GET /api/dataset/file), which allowed unauthenticated downloads of non-downloadable datasets.

### Any related issues, documentation, discussions?
<!--
Please use this section to link other resources if not mentioned
already.
1. If this PR fixes an issue, please include `Fixes apache#1234`, `Resolves
apache#1234`
or `Closes apache#1234`. If it is only related, simply mention the issue
number.
  2. If there is design documentation, please add the link.
  3. If there is a discussion in the mailing list, please add the link.
-->

The endpoint was introduced in PR apache#2391, which added dataset APIs to the webserver, and later modified in PR apache#2719, which removed the concept of `Environment`.

### How was this PR tested?
<!--
If tests were added, say they were added here. Or simply mention that if
the PR
is tested with existing test cases. Make sure to include/update test
cases that
check the changes thoroughly including negative and positive cases if
possible.
If it was tested in a way different from regular unit tests, please
clarify how
you tested step by step, ideally copy and paste-able, so that other
reviewers can
test and check, and descendants can verify in the future. If tests were
not added,
please describe why they were not added and/or why it was difficult to
add.
-->
Manually tested:
<img width="690" height="404" alt="Screenshot 2025-12-27 at 1 15 21 AM"
src="https://github.com/user-attachments/assets/91bea787-d447-4abe-ad39-74eb581fa657"
/>

### Was this PR authored or co-authored using generative AI tooling?
<!--
If generative AI tooling has been used in the process of authoring this
PR,
please include the phrase: 'Generated-by: ' followed by the name of the
tool
and its version. If no, write 'No'. 
Please refer to the [ASF Generative Tooling
Guidance](https://www.apache.org/legal/generative-tooling.html) for
details.
-->
No.
### What changes were proposed in this PR?

* Added support in `parseTimestamp(fieldValue: Any)` for additional
`java.time` input types:

  * `LocalDateTime` → `Timestamp.valueOf(ldt)`
  * `Instant` → `Timestamp.from(inst)`
  * `OffsetDateTime` → `Timestamp.from(odt.toInstant)`
  * `ZonedDateTime` → `Timestamp.from(zdt.toInstant)`
  * `LocalDate` → `Timestamp.valueOf(ld.atStartOfDay())`
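
The dispatch above is a type-based match in Scala; for intuition, here is a rough Python analogue of the same normalization rules (illustrative only, not the actual Texera code):

```python
from datetime import date, datetime, timezone

def parse_timestamp(value):
    """Normalize several date/time input types to a naive UTC datetime,
    mirroring the conversions listed above (illustrative analogue only)."""
    if isinstance(value, datetime):
        if value.tzinfo is not None:
            # OffsetDateTime / ZonedDateTime analogue: convert via the instant
            return value.astimezone(timezone.utc).replace(tzinfo=None)
        return value  # LocalDateTime analogue: taken as-is
    if isinstance(value, date):
        # LocalDate analogue: start of day
        return datetime(value.year, value.month, value.day)
    raise TypeError(f"unsupported timestamp input: {type(value).__name__}")
```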

### Any related issues, documentation, discussions?

* N/A.

### How was this PR tested?

* Added unit tests covering the new `java.time` cases for timestamp
parsing:

* Positive cases for `LocalDateTime`, `Instant`, `OffsetDateTime`,
`ZonedDateTime`, and `LocalDate`
* Negative case verifying unsupported/invalid inputs throw
`AttributeTypeException`

### Was this PR authored or co-authored using generative AI tooling?

No
…#4138)

### What changes were proposed in this PR?
Currently the endpoint `/api/workflow/owner_user/?wid=X` returns all the
information about the user, including name, email, etc. But the frontend
only needs the name. This PR limits the information returned from the
backend to the user name only.

The main changes are as follows:

1. Change the endpoint name from `/api/workflow/owner_user` to
`/api/workflow/owner_name`
2. Change the SQL query to return only the name as plain text.
3. Change related uses of the endpoint in the frontend to match the new
signature.
4. Added a new `WorkflowResourceDashboardUserSpec` to test this endpoint
and support future testing of related endpoints.

### Any related issues, documentation, discussions?
No


### How was this PR tested?
Tests:
<img width="1778" height="429" alt="image"
src="https://github.com/user-attachments/assets/81b91d73-7396-4d97-a53f-80d4ed5ca724"
/>


Manually tested:
<img width="1348" height="658" alt="image"
src="https://github.com/user-attachments/assets/1aeba8b4-343c-407f-b1ef-ffc5b4afee1a"
/>



### Was this PR authored or co-authored using generative AI tooling?
No
<!--
Thanks for sending a pull request (PR)! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
[Contributing to
Texera](https://github.com/apache/texera/blob/main/CONTRIBUTING.md)
  2. Ensure you have added or run the appropriate tests for your PR
  3. If the PR is work in progress, mark it a draft on GitHub.
  4. Please write your PR title to summarize what this PR proposes, we 
    are following Conventional Commits style for PR titles as well.
  5. Be sure to keep the PR description updated to reflect all changes.
-->

### What changes were proposed in this PR?
<!--
Please clarify what changes you are proposing. The purpose of this
section
is to outline the changes. Here are some tips for you:
  1. If you propose a new API, clarify the use case for a new API.
  2. If you fix a bug, you can clarify why it is a bug.
  3. If it is a refactoring, clarify what has been changed.
  3. It would be helpful to include a before-and-after comparison using 
     screenshots or GIFs.
  4. Please consider writing useful notes for better and faster reviews.
-->

This PR removes the automatic labeling of dynamically added input and
output ports. Previously, when users added new ports to operators, the
ports were automatically labeled with their port IDs, which produced
inconsistent names (e.g., "input-1", "input-2", "output-1", "output-2").

**Changes:**
- Modified the `addPort` method in `WorkflowActionService` to set the
`displayName` to an empty string instead of using the `portID`
- This change affects all dynamically added ports (both input and
output)

**Before:**
<img width="441" height="198" alt="Screenshot 2025-12-29 at 4 07 24 PM"
src="https://github.com/user-attachments/assets/ced2f0a9-bdcf-4eff-9865-becf63e4466f"
/>

**After:**
<img width="270" height="181" alt="Screenshot 2025-12-29 at 4 07 31 PM"
src="https://github.com/user-attachments/assets/f98a27de-31a4-45f2-a75e-5b61fd658468"
/>

### Any related issues, documentation, discussions?
<!--
Please use this section to link other resources if not mentioned
already.
1. If this PR fixes an issue, please include `Fixes apache#1234`, `Resolves
apache#1234`
or `Closes apache#1234`. If it is only related, simply mention the issue
number.
  2. If there is design documentation, please add the link.
  3. If there is a discussion in the mailing list, please add the link.
-->
This PR resolves apache#3894.

### How was this PR tested?
<!--
If tests were added, say they were added here. Or simply mention that if
the PR
is tested with existing test cases. Make sure to include/update test
cases that
check the changes thoroughly including negative and positive cases if
possible.
If it was tested in a way different from regular unit tests, please
clarify how
you tested step by step, ideally copy and paste-able, so that other
reviewers can
test and check, and descendants can verify in the future. If tests were
not added,
please describe why they were not added and/or why it was difficult to
add.
-->
Tested manually.

### Was this PR authored or co-authored using generative AI tooling?
<!--
If generative AI tooling has been used in the process of authoring this
PR,
please include the phrase: 'Generated-by: ' followed by the name of the
tool
and its version. If no, write 'No'. 
Please refer to the [ASF Generative Tooling
Guidance](https://www.apache.org/legal/generative-tooling.html) for
details.
-->
No.
# What changes were proposed in this PR?
## Summary
To gather more user information for a better overview of users, this PR introduces a new `affiliation` column in the `user` table. Now when a user logs in to Texera for the first time (after being approved to the REGULAR role), they are prompted to enter their affiliation. The answer is recorded in the database and retrieved when admins open the admin dashboard.

## For Developers
Please do the following steps to incorporate the new changes:
- Apply sql/updates/16.sql to your local postgres instance
- Run
common/dao/src/main/scala/org/apache/texera/dao/JooqCodeGenerator.scala
to generate jooq tables

## Sample Video


https://github.com/user-attachments/assets/61e895db-8e30-4c59-8e98-fa527995b486


## Design of the Feature
When a user logs in to the system for the first time after being approved to the REGULAR role, they are prompted to enter their affiliation. The user can submit their affiliation, and the frontend sends this information to the backend to save in the database. Users can either enter an affiliation or skip the prompt; the system remembers whether the user has already been prompted by checking the user data in the database. Depending on the user's answer, the `affiliation` column holds different values (more details are included in "Backend Changes"). The system prompts only once, on the user's first login, and never asks again. Admins can view users' affiliations in the admin dashboard.

## Backend Changes
Introduced an `affiliation` column in the `user` table. This column can hold three kinds of values:
1. `null`: the user has never been prompted. On their next login, they will be asked the affiliation question.
2. Empty string `""`: the user was prompted and did not answer the question (whether by hitting the Skip button, pressing ESC, clicking the X, or clicking outside the prompt).
3. An actual affiliation value.

`16.sql` adds the column to the `user` table and sets existing users' affiliation to `null`. The ddl file is changed as well.
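
The tri-state semantics above can be summarized in a small sketch (function names are illustrative, not Texera's actual code):

```python
def should_prompt_affiliation(affiliation):
    """Only a NULL affiliation means the user has never been prompted;
    an empty string means prompted-but-skipped, so never ask again."""
    return affiliation is None

def display_affiliation(affiliation):
    """What an admin dashboard cell might show for each state."""
    if affiliation is None:
        return "(not prompted yet)"
    if affiliation == "":
        return "(skipped)"
    return affiliation
```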

Added a `UserResource.scala` file containing the functions/APIs for retrieving and updating user data. It currently only contains functions related to this PR, but other related functions can be added to this file in the future.

### Original `user` Schema
<img width="300" height="400" alt="image"
src="https://github.com/user-attachments/assets/5be89398-583e-486c-96af-448fffbbf2d5"
/>

### Proposed `user` Schema
<img width="300" height="400" alt="image"
src="https://github.com/user-attachments/assets/b1522ce0-f905-4865-a62d-813770eef3d7"
/>

## Frontend Changes
Added a prompt window that pops up on the main page after login.
Added an `affiliation` column to the admin dashboard to display the new data.
Updated files that use the `User` class, since the new `affiliation` attribute was added to it.

### Any related issues, documentation, discussions?
<!--
Please use this section to link other resources if not mentioned
already.
1. If this PR fixes an issue, please include `Fixes apache#1234`, `Resolves
apache#1234`
or `Closes apache#1234`. If it is only related, simply mention the issue
number.
  5. If there is design documentation, please add the link.
  6. If there is a discussion in the mailing list, please add the link.
-->
Closes Issue apache#4118.

### How was this PR tested?
<!--
If tests were added, say they were added here. Or simply mention that if
the PR
is tested with existing test cases. Make sure to include/update test
cases that
check the changes thoroughly including negative and positive cases if
possible.
If it was tested in a way different from regular unit tests, please
clarify how
you tested step by step, ideally copy and paste-able, so that other
reviewers can
test and check, and descendants can verify in the future. If tests were
not added,
please describe why they were not added and/or why it was difficult to
add.
-->
Manually tested.

### Was this PR authored or co-authored using generative AI tooling?
<!--
If generative AI tooling has been used in the process of authoring this
PR,
please include the phrase: 'Generated-by: ' followed by the name of the
tool
and its version. If no, write 'No'. 
Please refer to the [ASF Generative Tooling
Guidance](https://www.apache.org/legal/generative-tooling.html) for
details.
-->
Generated-by: ChatGPT 5.1 (bug fixing)
<!--
Thanks for sending a pull request (PR)! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
[Contributing to
Texera](https://github.com/apache/texera/blob/main/CONTRIBUTING.md)
  2. Ensure you have added or run the appropriate tests for your PR
  3. If the PR is a work in progress, mark it as a draft on GitHub.
  4. Please write your PR title to summarize what this PR proposes. We 
    are following the Conventional Commits style for PR titles as well.
  5. Be sure to keep the PR description updated to reflect all changes.
-->

### What changes were proposed in this PR?
<!--
Please clarify what changes you are proposing. The purpose of this
section
is to outline the changes. Here are some tips for you:
  1. If you propose a new API, clarify the use case for a new API.
  2. If you fix a bug, you can clarify why it is a bug.
  3. If it is a refactoring, could you explain what has been changed?
  3. It would be helpful to include a before-and-after comparison using 
     screenshots or GIFs.
4. Please consider writing helpful notes for better and faster reviews.
-->
This PR adds UI features that enable Texera to efficiently handle tables
with large numbers of columns in the result panel and related data
preview components.

Specifically, this PR introduces:

1. Horizontal column pagination
    1. "Next Columns" and "Previous Columns" buttons have been added.
    2. Columns are now loaded in column windows (configurable size, default 25).
    3. This prevents UI freezing or overflow when dealing with tables containing hundreds or thousands of columns.
2. Column search bar
    1. A new search box allows users to filter or jump directly to column names.
    2. When a column is found, its column window is automatically loaded and the column is highlighted.
    3. Useful for wide schemas such as:
        1. large scientific datasets
        2. logs with hundreds of attributes
        3. denormalized tables or wide joins
3. Improvements to rendering performance
    1. The frontend now renders only the visible subset of columns.
    2. This reduces DOM load and improves change detection speed.
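
The windowing arithmetic behind the pagination and search-jump behavior can be sketched as follows (window size and function names are illustrative, not the actual frontend code):

```python
def column_window(columns, window_index, window_size=25):
    """Return the subset of column names rendered for one window."""
    start = window_index * window_size
    return columns[start:start + window_size]

def window_for_column(columns, name, window_size=25):
    """Index of the window containing a searched column (ValueError if absent)."""
    return columns.index(name) // window_size

cols = [f"col_{i}" for i in range(1000)]
# Jump to a searched column, then render only its window:
w = window_for_column(cols, "col_137")   # window 5 (columns 125..149)
visible = column_window(cols, w)         # only these 25 columns get rendered
```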

### Any related issues, documentation, or discussions?
<!--
Please use this section to link to other resources if not mentioned
already.
1. If this PR fixes an issue, please include `Fixes apache#1234`, `Resolves
apache#1234.`
     or `Closes apache#1234`. If it is only related, mention the issue number.
  2. If there is design documentation, please add the link.
  3. If there is a discussion on the mailing list, please add the link.
-->
Fixes: apache#3825
   
### How was this PR tested?
<!--
If tests were added, say so here. Or mention that if the PR 
is tested with existing test cases. Please include/update test cases
that
check the changes thoroughly, including negative and positive cases if
possible.
If it was tested in a way different from regular unit tests, please
clarify how
you tested step by step, ideally copy and paste-able, so that other
reviewers can
test and check, and descendants can verify in the future. If tests were
not added,
please describe why they were not added and/or why it was difficult to
add.
-->

1. Ran a couple of CSV scan operators that produced a wide output table
2. Clicked through column windows in both directions
3. Used search to jump to:
    - first column
    - random middle column
    - last column

### Was this PR authored or co-authored using generative AI tooling?
<!--
If generative AI tooling has been used in the process of authoring this
PR,
please include the phrase: 'Generated-by: ' followed by the name of the
tool
and its version. If no, write 'No'. 
Please refer to the [ASF Generative Tooling
Guidance](https://www.apache.org/legal/generative-tooling.html) for
details.
-->
No 

### New Layout:
<img width="1905" height="733" alt="image"
src="https://github.com/user-attachments/assets/18d14c2d-c134-422f-a5d1-2c826cf3a8e9"
/>

---------

Co-authored-by: Chen Li <[email protected]>
Co-authored-by: Chris <[email protected]>
Co-authored-by: Xinyuan Lin <[email protected]>
<!--
Thanks for sending a pull request (PR)! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
[Contributing to
Texera](https://github.com/apache/texera/blob/main/CONTRIBUTING.md)
  2. Ensure you have added or run the appropriate tests for your PR
  3. If the PR is work in progress, mark it a draft on GitHub.
  4. Please write your PR title to summarize what this PR proposes, we 
    are following Conventional Commits style for PR titles as well.
  5. Be sure to keep the PR description updated to reflect all changes.
-->

### What changes were proposed in this PR?
<!--
Please clarify what changes you are proposing. The purpose of this
section
is to outline the changes. Here are some tips for you:
  1. If you propose a new API, clarify the use case for a new API.
  2. If you fix a bug, you can clarify why it is a bug.
  3. If it is a refactoring, clarify what has been changed.
  3. It would be helpful to include a before-and-after comparison using 
     screenshots or GIFs.
  4. Please consider writing useful notes for better and faster reviews.
-->
This PR adds support for WebP and GIF image formats in the dataset file
preview feature.

**Changes:**
- Backend:
  - Added MIME type mappings for WebP (`image/webp`) and GIF (`image/gif`) in the `retrieveDatasetSingleFile` endpoint
- Frontend: 
  - Added WEBP and GIF to MIME_TYPES constants
  - Added size limits: 5 MB for WebP, 10 MB for GIF

**Demonstration:**

<img width="667" height="423" alt="demo"
src="https://github.com/user-attachments/assets/86c73106-8f33-4121-a886-60996c013669"
/>

### Any related issues, documentation, discussions?
<!--
Please use this section to link other resources if not mentioned
already.
1. If this PR fixes an issue, please include `Fixes apache#1234`, `Resolves
apache#1234`
or `Closes apache#1234`. If it is only related, simply mention the issue
number.
  2. If there is design documentation, please add the link.
  3. If there is a discussion in the mailing list, please add the link.
-->
apache#4119 

### How was this PR tested?
<!--
If tests were added, say they were added here. Or simply mention that if
the PR
is tested with existing test cases. Make sure to include/update test
cases that
check the changes thoroughly including negative and positive cases if
possible.
If it was tested in a way different from regular unit tests, please
clarify how
you tested step by step, ideally copy and paste-able, so that other
reviewers can
test and check, and descendants can verify in the future. If tests were
not added,
please describe why they were not added and/or why it was difficult to
add.
-->
Manually tested, and existing automated tests passed.

### Was this PR authored or co-authored using generative AI tooling?
<!--
If generative AI tooling has been used in the process of authoring this
PR,
please include the phrase: 'Generated-by: ' followed by the name of the
tool
and its version. If no, write 'No'. 
Please refer to the [ASF Generative Tooling
Guidance](https://www.apache.org/legal/generative-tooling.html) for
details.
-->
No

Signed-off-by: Xuan Gu <[email protected]>
Co-authored-by: Chen Li <[email protected]>
…ache#4136)

### What changes were proposed in this PR?

* **DB / schema**

  * Add `dataset_upload_session` to track multipart upload sessions, including:

    * `(uid, did, file_path)` as the primary key
    * `upload_id` (**UNIQUE**), `physical_address`
    * **`num_parts_requested`** to enforce the expected part count

  * Add `dataset_upload_session_part` to track per-part completion for a multipart upload:

    * `(upload_id, part_number)` as the primary key
    * `etag` (`TEXT NOT NULL DEFAULT ''`) to persist per-part ETags for finalize
    * `CHECK (part_number > 0)` for sanity
    * `FOREIGN KEY (upload_id) REFERENCES dataset_upload_session(upload_id) ON DELETE CASCADE`

* **Backend (`DatasetResource`)**

  * Multipart upload API (server-side streaming to S3; LakeFS manages multipart state):

    * `POST /dataset/multipart-upload?type=init`

      * Validates permissions and input.
      * Creates a LakeFS multipart upload session.
      * Inserts a DB session row including `num_parts_requested`.
      * **Pre-creates placeholder rows** in `dataset_upload_session_part` for part numbers `1..num_parts_requested` with `etag = ''` (enables deterministic per-part locking and simple completeness checks).
      * **Rejects init if a session already exists** for `(uid, did, file_path)` (409 Conflict). Races are handled via PK/duplicate handling plus a best-effort LakeFS abort for the losing initializer.
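The init step's "session row plus empty-ETag placeholders, 409 on duplicate" behavior can be sketched against an in-memory SQLite stand-in. The table and column names follow the PR's schema, but the `init_session` helper and its return codes are illustrative assumptions, not the actual Scala code:

```python
import sqlite3

# In-memory stand-in; schema loosely mirrors the PR's DDL.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dataset_upload_session (
  uid INTEGER, did INTEGER, file_path TEXT,
  upload_id TEXT UNIQUE, num_parts_requested INTEGER,
  PRIMARY KEY (uid, did, file_path)
);
CREATE TABLE dataset_upload_session_part (
  upload_id TEXT, part_number INTEGER, etag TEXT NOT NULL DEFAULT '',
  PRIMARY KEY (upload_id, part_number),
  CHECK (part_number > 0),
  FOREIGN KEY (upload_id) REFERENCES dataset_upload_session (upload_id)
);
""")

def init_session(uid, did, file_path, upload_id, num_parts):
    """Create a session row and pre-create empty-ETag placeholder rows.

    Returns 201 on success, or 409 when a session already exists for
    (uid, did, file_path) -- mirroring the PR's conflict handling.
    """
    try:
        db.execute(
            "INSERT INTO dataset_upload_session VALUES (?, ?, ?, ?, ?)",
            (uid, did, file_path, upload_id, num_parts),
        )
    except sqlite3.IntegrityError:
        return 409  # losing initializer; the real code also aborts the LakeFS upload
    db.executemany(
        "INSERT INTO dataset_upload_session_part (upload_id, part_number) VALUES (?, ?)",
        [(upload_id, n) for n in range(1, num_parts + 1)],
    )
    return 201
```

Pre-creating exactly `num_parts_requested` rows up front is what makes the later completeness check a simple count-and-nonempty scan instead of a gap search.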

    * `POST /dataset/multipart-upload/part?filePath=...&partNumber=...`

      * Requires dataset write access and an existing upload session.
      * **Requires `Content-Length`** for streaming uploads.
      * Enforces `partNumber <= num_parts_requested`.
      * **Per-part locking**: locks the `(upload_id, part_number)` row using `SELECT … FOR UPDATE NOWAIT` to prevent concurrent uploads of the same part.
      * Uploads the part to S3 and **persists the returned ETag** into `dataset_upload_session_part.etag` (upsert/overwrite for retries).
      * Implements idempotency for retries by returning success if the ETag is already present for that part.
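The retry idempotency above can be sketched independently of the storage layer. The dict-backed `parts` table, the `upload_part` helper, and its status-code returns are hypothetical stand-ins for the real endpoint:

```python
import hashlib

# Hypothetical stand-in: (upload_id, part_number) -> etag ('' = placeholder row).
parts = {("u1", 1): "", ("u1", 2): ""}

def upload_part(upload_id, part_number, data, num_parts_requested=2):
    """Store one part's ETag; a retried part succeeds without re-upload."""
    if not 1 <= part_number <= num_parts_requested:
        return 400                      # enforces partNumber <= num_parts_requested
    key = (upload_id, part_number)
    if parts.get(key):                  # ETag already persisted: this is a retry
        return 200                      # idempotent success, no second upload
    parts[key] = hashlib.md5(data).hexdigest()  # pretend S3 returned this ETag
    return 200
```

Because the placeholder rows exist from init, "is the ETag non-empty?" cleanly distinguishes a first attempt from a retry.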

    * `POST /dataset/multipart-upload?type=finish`

      * Locks the session row using `SELECT … FOR UPDATE NOWAIT` to prevent concurrent finalize/abort.
      * Validates completeness using DB state:

        * Confirms the part table has `num_parts_requested` rows for the `upload_id`.
        * Confirms **all parts have non-empty ETags** (no missing parts).
        * Optionally surfaces a bounded list of missing part numbers (without relying on error-message asserts in tests).

      * Fetches `(part_number, etag)` ordered by `part_number` from the DB and completes the multipart upload via LakeFS.
      * Deletes the DB session row; part rows are cleaned up via `ON DELETE CASCADE`.
      * **NOWAIT lock contention is handled** (mapped to “already being finalized/aborted”, 409).
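The finish-time completeness validation is a pure function of the DB state; a minimal sketch (the `check_completeness` helper and its return shape are assumptions, not the PR's API):

```python
def check_completeness(etags_by_part, num_parts_requested, max_report=10):
    """Validate that every expected part has a non-empty ETag.

    etags_by_part: dict of part_number -> etag ('' means placeholder only).
    Returns (True, []) when complete, else (False, bounded missing-part list).
    """
    missing = [n for n in range(1, num_parts_requested + 1)
               if not etags_by_part.get(n)]
    return (not missing, missing[:max_report])
```

Bounding the reported list keeps the 4xx response small even when thousands of parts are missing.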

    * `POST /dataset/multipart-upload?type=abort`

      * Locks the session row using `SELECT … FOR UPDATE NOWAIT`.
      * Aborts the multipart upload via LakeFS and deletes the DB session row (parts cascade-delete).
      * **NOWAIT lock contention is handled** similarly to `finish`.

  * Access control and dataset permissions remain enforced on all endpoints.

* **Frontend service (`dataset.service.ts`)**

  * `multipartUpload(...)` updated to reflect the server flow and return values (ETag persistence is server-side; the frontend does not need to track ETags).

* **Frontend component (`dataset-detail.component.ts`)**

  * Uses the same init/part/finish flow.
  * Abort triggers the backend `type=abort` to clean up the upload session.
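On the client side, the flow mostly amounts to slicing the file and walking init → part → finish. A rough sketch of the chunking arithmetic only; the chunk size and the `plan_parts` helper are assumptions, not the actual `dataset.service.ts` code:

```python
import math

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB; an assumed client chunk size

def plan_parts(file_size, chunk_size=CHUNK_SIZE):
    """Return (num_parts, [(offset, length), ...]) for a multipart upload.

    An empty file still gets one (empty) part so init/finish stay uniform.
    """
    num_parts = max(1, math.ceil(file_size / chunk_size))
    ranges = [(i * chunk_size, min(chunk_size, file_size - i * chunk_size))
              for i in range(num_parts)]
    return num_parts, ranges
```

`num_parts` is what the client would send as `num_parts_requested` at init time, so the server can pre-create exactly that many placeholder rows.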

---

### Any related issues, documentation, discussions?

Closes apache#4110

---

### How was this PR tested?

* **Unit tests added/updated** (multipart upload spec):

  * Init validation (invalid numParts, invalid filePath, permission denied).
  * Upload part validation (missing/invalid Content-Length, partNumber bounds, minimum size enforcement for non-final parts).
  * **Per-part lock behavior** under contention (no concurrent streams for the same part; deterministic assertions).
  * Finish/abort locking behavior (NOWAIT contention returns 409).
  * Successful end-to-end path (init → upload parts → finish) with DB cleanup assertions.
  * **Integrity checks**: positive and negative SHA-256 tests by downloading the finalized object and verifying it matches (or does not match) the expected concatenated bytes.

* Manual testing via the dataset detail page (single and multiple uploads), verified:

  * Progress, speed, and ETA updates.
  * Abort behavior (UI state + DB session cleanup).
  * Successful completion path (all expected parts uploaded, LakeFS object present, dataset version creation works).

---

### Was this PR authored or co-authored using generative AI tooling?

GPT was partially used in authoring this PR.

---------

Co-authored-by: Chen Li <[email protected]>
…access on shared workflows (apache#4143)

<!--
Thanks for sending a pull request (PR)! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
[Contributing to
Texera](https://github.com/apache/texera/blob/main/CONTRIBUTING.md)
  2. Ensure you have added or run the appropriate tests for your PR
  3. If the PR is work in progress, mark it a draft on GitHub.
  4. Please write your PR title to summarize what this PR proposes, we 
    are following Conventional Commits style for PR titles as well.
  5. Be sure to keep the PR description updated to reflect all changes.
-->

### What changes were proposed in this PR?
<!--
Please clarify what changes you are proposing. The purpose of this
section
is to outline the changes. Here are some tips for you:
  1. If you propose a new API, clarify the use case for a new API.
  2. If you fix a bug, you can clarify why it is a bug.
  3. If it is a refactoring, clarify what has been changed.
  3. It would be helpful to include a before-and-after comparison using 
     screenshots or GIFs.
  4. Please consider writing useful notes for better and faster reviews.
-->

This PR fixes a permission issue where users with READ access to a
workflow could not revoke their own access.

**Changes:**
- Updated `revokeAccess()` method in `WorkflowAccessResource.scala` to
allow users to revoke their own access regardless of privilege level
(READ or WRITE).
- Added owner protection which prevents workflow owners from revoking
their own access to avoid orphaned workflows.
- Added test cases for the `revokeAccess()` method in
`WorkflowAccessResourceSpec.scala`.

**Before:**
- The backend required WRITE privilege for self-revocation.
- READ users received an error when revoking their own access.

**After:**
- READ users can revoke their own access to a shared workflow (leave
shared workflows).
- Owners cannot revoke their own access (prevent orphaned workflows).
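The resulting permission rule is small enough to state as a predicate. A minimal sketch of the decision logic described above; the `can_revoke` function name and signature are assumptions, not the actual `WorkflowAccessResource.scala` code:

```python
def can_revoke(requester, target, owner, privilege_of_requester):
    """Decide whether `requester` may revoke `target`'s workflow access.

    Mirrors the rules in this PR: self-revocation is allowed at any
    privilege level, except for the owner (to avoid orphaned workflows);
    revoking someone else still requires WRITE.
    """
    if target == owner:
        return False                  # owner access can never be revoked
    if requester == target:
        return True                   # leave a shared workflow (READ or WRITE)
    return privilege_of_requester == "WRITE"
```

Note the owner check comes first, so even the owner's own self-revocation request is rejected.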

**Demo:**


https://github.com/user-attachments/assets/4fa57eb0-9218-4715-bf8d-aec26f039174

### Any related issues, documentation, discussions?

Fixes apache#4141.

### How was this PR tested?

Run `sbt "WorkflowExecutionService/testOnly *WorkflowAccessResourceSpec"`

**The test cases cover the following scenarios:**
- Users with WRITE access can revoke other users' access.
- Users with READ access cannot revoke other users' access.
- Users can revoke their own access regardless of access level.
- Owner's access cannot be revoked by others.
- Owner cannot revoke their own access.
- Error handling for non-existing users.
- Revoking access does not affect other users' access level.
- Revoke access of a user who does not have access.

### Was this PR authored or co-authored using generative AI tooling?

No.
…le.MIN_VALUE (apache#4145)

### What changes were proposed in this PR?

This PR fixes the value returned by `minValue` for
`AttributeType.DOUBLE` in `AggregationOperation.scala`.

Previously, the code used `Double.MIN_VALUE`, which is the smallest
positive non-zero double, not the most-negative value.


https://github.com/apache/texera/blob/07c35d004a1185e4098b56591166b62cc9ab4856/common/workflow-operator/src/main/scala/org/apache/amber/operator/aggregate/AggregationOperation.scala#L335-L345

The fix replaces `Double.MIN_VALUE` with `Double.NEGATIVE_INFINITY` and adds new tests.
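The bug class reproduces in any IEEE-754 language. A minimal Python sketch (the seeded-max loop is illustrative, not the operator's actual code; `5e-324` is the same value as Java's `Double.MIN_VALUE`):

```python
# Java's Double.MIN_VALUE is the smallest *positive* double (~4.9e-324),
# not the most negative value; seeding a running max with it silently
# breaks aggregation over all-negative inputs.
JAVA_DOUBLE_MIN_VALUE = 5e-324          # smallest positive subnormal
values = [-3.5, -1.2, -7.0]

buggy = JAVA_DOUBLE_MIN_VALUE           # wrong identity element for max
fixed = float("-inf")                   # Double.NEGATIVE_INFINITY: correct identity
for v in values:
    buggy = max(buggy, v)
    fixed = max(fixed, v)
# buggy is still the tiny positive seed; fixed finds the true maximum, -1.2.
```

Negative infinity is the identity element for `max`, which is why it is the correct "minimum value" to return here.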

### Any related issues, documentation, discussions?

Closes apache#4144  
Related discussion: apache#4049 (Clarify minValue intent in
AggregationOperation)

### How was this PR tested?

- AggregateOpSpec.scala
- AttributeTypeUtilsSpec.scala
- Frontend Manual Test

### Was this PR authored or co-authored using generative AI tooling?
No

---------

Co-authored-by: Xiaozhen Liu <[email protected]>
…numbers for mappings, and moved separate migration database into texera_db