SSHCluster - localhost to localhost - The command line is too long #8138

Open
jcus0006 opened this issue Aug 28, 2023 · 3 comments · May be fixed by #8994
Labels
bug Something is broken

jcus0006 commented Aug 28, 2023

I am trying to run a multi-node, multi-host SSH cluster on Windows. To simplify things for now, I am running both the scheduler and the workers on localhost. Following the Dask documentation instructions, I set up public-key SSH access, in this case from localhost to localhost. I encountered this issue and resolved it with the fix recommended at the same link. I then hit the next issue: the command being run exceeds the character limit imposed by Windows.

set_env = "set DASK_INTERNAL_INHERIT_CONFIG={} &&".format(
    dask.config.serialize(dask.config.global_config)
)

The above line, from "distributed\deploy\ssh.py", generates a string of more than 9,000 characters, which seems to be the problem.
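As a rough way to check whether a given config will fit, the sketch below estimates the length of the `set DASK_INTERNAL_INHERIT_CONFIG=... &&` prefix and compares it with the 8191-character command-line limit that Microsoft documents for cmd.exe. It uses only the standard library: `serialize_config` is a stand-in that assumes `dask.config.serialize` is JSON followed by URL-safe base64, and `env_prefix_length`, `fits_cmd_exe`, and the two sample configs are hypothetical names introduced for illustration.

```python
import base64
import json

# Assumption: cmd.exe rejects command lines longer than 8191 characters.
CMD_EXE_MAX_CHARS = 8191


def serialize_config(config: dict) -> str:
    """Stand-in for dask.config.serialize (assumed: JSON, then URL-safe base64)."""
    return base64.urlsafe_b64encode(json.dumps(config).encode()).decode()


def env_prefix_length(config: dict) -> int:
    """Length of the 'set DASK_INTERNAL_INHERIT_CONFIG=... &&' prefix."""
    prefix = "set DASK_INTERNAL_INHERIT_CONFIG={} &&".format(serialize_config(config))
    return len(prefix)


def fits_cmd_exe(config: dict, rest_of_command: str = "") -> bool:
    """True if the prefix, a joining space, and the rest of the command fit the limit."""
    total = env_prefix_length(config) + 1 + len(rest_of_command)
    return total <= CMD_EXE_MAX_CHARS


# Hypothetical configs: one tiny, one padded to mimic a large global config.
small_config = {"distributed": {"scheduler": {"work-stealing": True}}}
large_config = {"kubernetes": {"padding": "x" * 9000}}

print(env_prefix_length(small_config), fits_cmd_exe(small_config))
print(env_prefix_length(large_config), fits_cmd_exe(large_config))
```

Base64 expands its input by about a third, so a config whose JSON form is a little over 6,000 characters is already enough to exceed the limit once the rest of the command is appended.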

The next line of code builds the command "cmd", and the line after that starts the process:
self.proc = await self.connection.create_process(cmd)

The line below then reads this error from stderr, 'The command line is too long.\r\n':
line = await self.proc.stderr.readline()

In an attempt to reduce the size of the serialized config, I removed the Kubernetes key from dask.config.global_config and re-added it with an empty dict as its value, on the assumption that I do not need Kubernetes since I am using SSHCluster rather than KubeCluster. The serialized config is then shorter than the limit and, sure enough, I get past the 'The command line is too long' error, but I get stuck on the error below instead:

2023-08-28 21:10:06,883 - distributed.deploy.ssh - INFO - raise JSONDecodeError("Expecting value", s, err.value) from None
2023-08-28 21:10:06,883 - distributed.deploy.ssh - INFO - json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
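One plausible reading of this traceback, consistent with the quoting change suggested later in the thread, is that cmd.exe does not treat single quotes as quoting characters, so the `--spec` value reaches the JSON parser with a stray leading `'`. The sketch below only reproduces the error message with the standard library; the sample spec and the `as_received` string are assumptions, not captured output from a Windows host.

```python
import json

# Hypothetical spec, shaped like the one passed via --spec.
spec = json.dumps({"cls": "distributed.Scheduler", "opts": {"port": 0}})

# On a POSIX shell the surrounding single quotes are stripped, so this parses.
json.loads(spec)

# Assumption: on cmd.exe the single quote is passed through as ordinary text.
as_received = "'" + spec + "'"

message = None
try:
    json.loads(as_received)
except json.JSONDecodeError as err:
    message = str(err)

print(message)  # Expecting value: line 1 column 1 (char 0)
```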

I am on Windows right now and am considering installing a Linux VM to try this out. Has anyone else had this issue on Windows, and what can be done to work around it?

This is the code I am using in the main module:

import dask
from dask.distributed import Client, SSHCluster

cluster = SSHCluster(
    ["localhost", "localhost"],
    connect_options={"known_hosts": None},
    worker_options={"n_workers": 10},
    scheduler_options={"port": 0, "dashboard_address": ":8797"},
)
client = Client(cluster)

Environment:

  • Dask version 2023.8.1
  • Python version 3.11.2
  • OS: Windows 10
  • Installed via Pip
@onurarpacioglu

Hello,

I encountered the same issue. I found a fix for the JSON issue and a way to shorten the command somewhat. With the changes below (the three commented lines), I no longer see the issue:

    cmd = " ".join(
        [
            #set_env, -> Removed this to shorten cmd; it is executed before cmd to preserve functionality
            self.remote_python,
            "-m",
            "distributed.cli.dask_spec",
            "--spec",
            '"%s"' % dumps({"cls": "distributed.Scheduler", "opts": self.kwargs}).replace('"', '\\"'), # exchanged places of ' and " at the beginning to fix the json issue
        ]
    )
    await self.connection.run(set_env) # added this due to removal above
    self.proc = await self.connection.create_process(cmd)

Thanks
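The quoting change in the snippet above can be tried in isolation. The following is a minimal sketch, not the code in distributed: `quote_spec_for_cmd` is a hypothetical helper that wraps the JSON in double quotes and backslash-escapes the inner ones, and `shlex.split` is used only as a stand-in parser, on the assumption that it treats `\"` inside double quotes the same way the Windows argument parser does.

```python
import json
import shlex


def quote_spec_for_cmd(spec: dict) -> str:
    """Wrap a JSON spec in double quotes, escaping the inner double quotes."""
    return '"%s"' % json.dumps(spec).replace('"', '\\"')


# Hypothetical spec, using the scheduler options from the original report.
spec = {
    "cls": "distributed.Scheduler",
    "opts": {"port": 0, "dashboard_address": ":8797"},
}

quoted = quote_spec_for_cmd(spec)
print(quoted)

# Stand-in for the receiving side: split into arguments, then parse the JSON.
(recovered,) = shlex.split(quoted)
print(json.loads(recovered) == spec)
```

Options whose values contain backslashes, such as Windows paths, would likely need additional escaping that this sketch does not attempt.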

jcus0006 commented Dec 9, 2024

Hi @onurarpacioglu, I ended up using Linux, but thanks for replying with your workaround. Hopefully it will be useful to someone else.

@holtvogt

I am seeing the same issue on Windows as well.
