[SPARK-51223][CONNECT] Always use an ephemeral port for local connect #49965

Open · wants to merge 1 commit into master
Conversation

@Kimahriman (Contributor) commented Feb 15, 2025

What changes were proposed in this pull request?

Always use an ephemeral port when automatically starting a local Spark Connect server. This prevents port conflicts when a Connect server is started purely to back the local Spark environment, both with --remote local and with --conf spark.api.mode=connect in PySpark.
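
To make the mechanics concrete, here is a minimal sketch of the ephemeral-port idea. It is my own illustration rather than this PR's actual code; find_free_port is a hypothetical helper, and it assumes spark.connect.grpc.binding.port is the setting the local Connect server reads for its listening port:

```python
# Illustration only: grab an OS-assigned ephemeral port and hand it to the
# automatically started local Spark Connect server instead of the default 15002.
import socket

def find_free_port() -> int:
    # Binding to port 0 lets the OS pick any currently free ephemeral port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

port = find_free_port()
# Assumption: the local server honors spark.connect.grpc.binding.port; note the
# small race window between releasing the socket and the server binding it.
overrides = {"spark.connect.grpc.binding.port": str(port)}
print(overrides)
```

An alternative that avoids the race between releasing the probe socket and the server binding it is to let the server itself bind to port 0 and report back the port it actually got.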

Why are the changes needed?

Trying to launch multiple PySpark sessions with either --remote local or spark.api.mode=connect fails with port conflicts. Additionally, using a cluster deploy mode with PySpark would lead to port conflicts if two drivers start on the same node.

Does this PR introduce any user-facing change?

Yes, it allows you to run multiple automatically launched local Spark Connect servers without manually specifying a port for each one.

How was this patch tested?

Existing UTs which were already using ephemeral ports.

Also manually tested by running two simultaneous pyspark --remote local sessions and two simultaneous spark-submit --conf spark.api.mode=connect test.py jobs.
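
For reference, this is the kind of reproduction I had in mind (my own sketch, not the PR's test code): two independent PySpark processes that each auto-start a local Connect server, which before this change could collide on the default port 15002. It assumes a PySpark build with the connect extras installed:

```python
# Reproduction sketch: run two PySpark-Connect processes side by side.
import subprocess
import sys
import textwrap

child = textwrap.dedent(
    """
    from pyspark.sql import SparkSession
    # remote("local") triggers the automatically started local Connect server.
    spark = SparkSession.builder.remote("local").getOrCreate()
    print(spark.range(5).count())
    spark.stop()
    """
)

# Launch both children concurrently; with ephemeral ports both should succeed
# instead of the second failing with an "address already in use" style error.
procs = [subprocess.Popen([sys.executable, "-c", child]) for _ in range(2)]
assert all(p.wait() == 0 for p in procs)
```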

Was this patch authored or co-authored using generative AI tooling?

No

@Kimahriman (Contributor, Author)

@HyukjinKwon @hvanhovell

The Scala local remote setup works quite differently, so I wasn't sure what, if anything, could be done there. Currently, if you start two spark-shell --remote local sessions, the second one just silently skips creating a new Spark Connect server and connects to the first one instead. Not sure if this is intentional or not. I'm also not sure how the Connect API mode is supposed to work in Scala with a cluster deploy mode, like on YARN, since it uses the start-connect-server script, which I don't think exists in the uploaded artifacts? Not totally sure on that one.

@HyukjinKwon (Member) left a comment

I quite like this change, but can we do this in 4.1? I am actually thinking about replacing the Py4J server with a Spark Connect server - once we land that change, this change alone makes much more sense without having to think about the Spark Connect case.

@HyukjinKwon (Member)

I am also thinking about using a Unix Domain Socket instead - I have a draft here: https://github.com/apache/spark/compare/master...HyukjinKwon:spark:SPARK-51156-2?expand=1 - but this will likely happen in 4.1.
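
For readers unfamiliar with the idea, here is a generic gRPC sketch of what a Unix Domain Socket endpoint looks like; this is not the draft's code, and the socket path and any Spark-side wiring are hypothetical:

```python
# Generic gRPC illustration of a Unix Domain Socket endpoint (socket path is hypothetical).
from concurrent import futures
import grpc

# Server side: listen on a socket file instead of a TCP port, so there is
# no TCP port allocation to conflict on at all.
server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
server.add_insecure_port("unix:///tmp/spark-connect.sock")
server.start()

# Client side: dial the same socket file.
channel = grpc.insecure_channel("unix:///tmp/spark-connect.sock")
```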

@HyukjinKwon (Member)

@Kimahriman actually, are you interested in picking https://github.com/apache/spark/compare/master...HyukjinKwon:spark:SPARK-51156-2?expand=1 up and opening a PR? I am currently stuck on some other work, so I haven't had time to get it working.

@Kimahriman (Contributor, Author)

I don't have a strong preference about including this. I was mostly thinking that port conflicts in cluster deploy mode would be an unpleasant surprise for users of a distro with the new Connect API mode enabled by default.

I don't have time right now to look into the UDS stuff, but I have been playing around with a slightly different approach to securing local connections: a more generic config that could also be used for remote authentication, if that would help at all.

@HyukjinKwon (Member)

oh if you have another approach, please go ahead and open a PR 👍
