
Install non-pure package dependencies with micropip #179

Open · juntyr opened this issue Jan 25, 2025 · 9 comments

Comments

@juntyr commented Jan 25, 2025

I'd like to experiment with the following setup:

  • pure Python packages without patches (where installing with micropip works) are removed from the Pyodide distro

  • for packages inside the Pyodide distro, micropip refuses to install a separate version, even if the package isn't loaded yet, i.e. if the package is requested as a dependency it's either the Pyodide-built version or nothing (even for pure wheels, since they [may] contain patches)

  • micropip is used to install the dependencies (except for unpackaged dynlib dependencies) of Pyodide-built packages (using the metadata as with any other wheel), so pure and unpatched dependencies are fetched from PyPI and non-pure or patched dependencies come from Pyodide as described above

Would it be possible to hack this together? If so, could you give some pointers for the micropip side?

My motivation is that I want to explore allowing users to bring a requirements file and install it. Any non-pure or patched packages would be "anchors" that cannot be overridden, but any pure package with overlapping constraints could be installed (and since pure packages would no longer be part of the Pyodide distro, this group would be much larger).

Thanks for your help!

@hoodmane (Member)

I'm having trouble understanding your proposal. Can you highlight the ways in which what you are suggesting is distinct from the current behavior?

@hoodmane (Member)

> micropip is used to install the dependencies (except for unpackaged dynlib dependencies) of Pyodide-built packages (using the metadata as with any other wheel), so pure and unpatched dependencies are fetched from PyPI and non-pure or patched dependencies come from Pyodide as described above

Are you asking for a Pyodide package to depend on a package from PyPI but not on a locked version of it?

@juntyr (Author) commented Jan 26, 2025

> I'm having trouble understanding your proposal. Can you highlight the ways in which what you are suggesting is distinct from the current behavior?

Sorry for being unclear. The two changes that I would like to experiment with are:

  1. Use micropip resolution when installing the dependencies of a wheel in the Pyodide distribution (so when you run e.g. await micropip.install("xarray"), after calling loadPackage("xarray"), look at its dist metadata to get the requirements and install them with micropip as well). One effect is that dependencies are then only as locked as specified in the source wheel.

  2. When micropip installs a package that is included in the Pyodide distribution, it currently checks whether the version is compatible, and if not falls back to PyPI. I'd like it to fail if the version is incompatible (even if the package isn't loaded yet). If I have e.g. a patched version of fsspec in my distro and some package I install depends on a different version of fsspec, I want that install to fail instead of silently installing an unpatched version of fsspec from PyPI.

The combined effect would be a simultaneous relaxation and hardening of dependency resolution: Pyodide-built packages would be hard requirements, but their PyPI dependencies would have the same relaxed bounds (as specified e.g. in their pyproject.toml) that you'd get when installing them with pip.
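
Concretely, the hardening in (2) would behave something like this (a sketch of the intended behavior, with hypothetical versions; this is not what micropip currently does):

```python
import micropip

# Suppose the distro ships a patched fsspec locked at (hypothetical) 2024.6.0.
await micropip.install("fsspec==2024.6.0")  # OK: matches the Pyodide-built wheel

# Today this silently falls back to an unpatched wheel from PyPI;
# under the proposed behaviour it should raise instead.
await micropip.install("fsspec==2023.1.0")
```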

@juntyr
Copy link
Author

juntyr commented Jan 26, 2025

My ask here is mostly for pointers as to how I could hack this behaviour together. I first want to see how this would feel in practice before proposing it as a change to upstream Pyodide + micropip.

@juntyr
Copy link
Author

juntyr commented Jan 26, 2025

I think I could change the following

```python
def _add_requirement_from_pyodide_lock(self, req: Requirement) -> bool:
    """
    Find requirement from pyodide-lock.json. If the requirement is found,
    add it to the package list and return True. Otherwise, return False.
    """
    if req.name in REPODATA_PACKAGES and req.specifier.contains(
        REPODATA_PACKAGES[req.name]["version"], prereleases=True
    ):
        version = REPODATA_PACKAGES[req.name]["version"]
        self.pyodide_packages.append(
            PackageMetadata(name=req.name, version=str(version), source="pyodide")
        )
        return True
    return False
```

to

  1. raise an exception if `req.name in REPODATA_PACKAGES and not req.specifier.contains(...)` (see the sketch after this list)
  2. maybe fetch the *.whl.metadata file to get the Requires-Dist metadata? (unsure)
  3. reuse some of the following code to add the requirements to the transaction:

```python
if self.deps:
    # Case 1) If metadata file is available,
    # we can gather requirements without waiting for the wheel to be downloaded.
    if wheel.pep658_metadata_available():
        try:
            await wheel.download_pep658_metadata(self.fetch_kwargs)
        except OSError:
            # If something goes wrong while downloading the metadata,
            # we have to wait for the wheel to be downloaded.
            await wheel_download_task

        await asyncio.gather(
            self.gather_requirements(wheel.requires(extras)),
            wheel_download_task,
        )
    # Case 2) If metadata file is not available,
    # we have to wait for the wheel to be downloaded.
    else:
        await wheel_download_task
        await self.gather_requirements(wheel.requires(extras))
```

  4. load the pyodide package with loadPackage, but only automatically load its dynlib dependencies (e.g. openssl); everything else should be handled by micropip
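
For step 1, a minimal sketch of how the modified lookup could look (the exception type and message are hypothetical):

```python
class LockedVersionConflictError(Exception):
    """Hypothetical: the requirement conflicts with the Pyodide-built version."""


def _add_requirement_from_pyodide_lock(self, req: Requirement) -> bool:
    """
    Find requirement from pyodide-lock.json. If the package is present with a
    compatible version, add it to the package list and return True. If it is
    present but the version is incompatible, fail hard instead of falling
    back to PyPI. Otherwise, return False.
    """
    if req.name not in REPODATA_PACKAGES:
        return False
    version = REPODATA_PACKAGES[req.name]["version"]
    if not req.specifier.contains(version, prereleases=True):
        # Hard-fail instead of silently resolving from PyPI.
        raise LockedVersionConflictError(
            f"{req.name}{req.specifier} conflicts with the "
            f"Pyodide-built version {version}"
        )
    self.pyodide_packages.append(
        PackageMetadata(name=req.name, version=str(version), source="pyodide")
    )
    return True
```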

@ryanking13 (Member)

Thanks for the suggestion. There was a similar discussion about this (pyodide/pyodide#2580). I just noticed that you were also in the loop, so you're probably already aware of it.

> Use micropip resolution when installing the dependencies of a wheel in the Pyodide distribution (so when you run e.g. await micropip.install("xarray"), after calling loadPackage("xarray"), look at its dist metadata to get the requirements and install them with micropip as well).

Because dependency resolution is a heavy process, we rely on the lock file to speed up package installation, and micropip uses a very heuristic dependency resolver that is fast but inaccurate.

I am +1 on making it more flexible, so that users can control the dependency resolution process. There is a proposal about that (#112). I am still trying to figure out what kind of interface will satisfy all users without being too complicated, but if you have any thoughts on this, please feel free to comment.

> My ask here is mostly for pointers as to how I could hack this behaviour together. I first want to see how this would feel in practice before proposing it as a change to upstream Pyodide + micropip.

Currently, I think one possible option to satisfy your need without hacking micropip is to:

  1. Host your personal package index, and publish Pyodide-built packages into that index.
  2. Remove all packages (except for a few packages like micropip) from pyodide-lock.json, so that micropip does not rely on the lockfile.
  3. Call `micropip.set_index_urls("...your personal package index...")` so that micropip looks for packages only in that package index.
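
For example (a sketch; the index URL and package name are placeholders):

```python
import micropip

# Placeholder index URL; micropip substitutes {package_name} when querying.
micropip.set_index_urls("https://my-index.example.org/simple/{package_name}/")

# Packages are now resolved against the personal index only.
await micropip.install("my-patched-package")  # hypothetical package name
```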

@juntyr (Author) commented Jan 27, 2025

> 1. Host your personal package index, and publish Pyodide-built packages into that index.

That's a good idea actually, I hadn't thought of it before.

Do you perhaps have a link to share for how to set one up? Given that this index would only need to host the few patched or binary wheels, it wouldn't need to be anything fancy, just enough for the (basic?) JSON API (I'm in unfamiliar territory here).

I would then run micropip with my index first, then PyPI, and then the stripped Pyodide distro, and I might still patch micropip so that if a package exists in my custom index by name (but not with the right version), it fails instead of checking PyPI.

Does micropip already check that packages from an index have a compatible ABI (to deal with the ~yearly ABI breakage from Pyodide)?
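
For context, I imagine the check would look roughly like this with `packaging` (the wheel filename and platform tag below are illustrative):

```python
from packaging.tags import sys_tags
from packaging.utils import parse_wheel_filename


def wheel_is_compatible(filename: str) -> bool:
    # parse_wheel_filename returns (name, version, build, tags);
    # the wheel is installable if any of its tags is supported here.
    _name, _version, _build, tags = parse_wheel_filename(filename)
    return not set(sys_tags()).isdisjoint(tags)


# A wheel built against a different Emscripten/Pyodide ABI should be rejected.
print(wheel_is_compatible("numpy-1.26.4-cp312-cp312-pyodide_2024_0_wasm32.whl"))
```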

@ryanking13 (Member) commented Jan 27, 2025

> Do you perhaps have a link to share for how to set one up? Given that this index would only need to host the few patched or binary wheels, it wouldn't need to be anything fancy, just enough for the (basic?) JSON API (I'm in unfamiliar territory here).

  1. devpi: one possible open-source, PyPI-compatible package index. I am not sure whether it allows you to host Pyodide wheels, but since it is open source, patching it would not be hard.

  2. Anaconda.org: allows you to host Pyodide wheels. We are actually trying to use it to provide Pyodide wheels, but sadly it does not allow cross-origin requests (see micropip.install("<...>") does not uphold custom PyPI indices #101 (comment)), so you'll need to set up a CORS proxy to use it.

  3. pypa/warehouse: the same code that runs PyPI, but I have heard that it is quite complex to set up.

  4. If you don't want to set up a server: micropip supports the HTML Simple API, so you can set up a static webpage (e.g. GitHub Pages) and host HTML files (the API responses) and wheels there. I recently saw that llama-cpp-python uses that approach.
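
For option 4, a sketch of what generating such a static index at build time could look like (directory names and the base URL are hypothetical; one PEP 503-style page per project):

```python
# Sketch: generate minimal PEP 503 ("Simple API") HTML pages for a directory
# of wheels, suitable for static hosting such as GitHub Pages.
from pathlib import Path

WHEEL_DIR = Path("dist")                                  # hypothetical
OUT_DIR = Path("index")                                   # hypothetical
BASE_URL = "https://example.github.io/my-index/wheels"    # hypothetical


def project_name(wheel: Path) -> str:
    # PEP 427: the first dash-separated component of a wheel filename is the
    # distribution name; normalize it roughly per PEP 503.
    return wheel.name.split("-")[0].lower().replace("_", "-").replace(".", "-")


projects: dict[str, list[Path]] = {}
for wheel in sorted(WHEEL_DIR.glob("*.whl")):
    projects.setdefault(project_name(wheel), []).append(wheel)

for name, wheels in projects.items():
    links = "\n".join(
        f'    <a href="{BASE_URL}/{w.name}">{w.name}</a><br/>' for w in wheels
    )
    page = f"<!DOCTYPE html>\n<html>\n  <body>\n{links}\n  </body>\n</html>\n"
    out = OUT_DIR / name / "index.html"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(page)
```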

@juntyr (Author) commented Jan 27, 2025

> 4. If you don't want to set up a server: micropip supports the HTML Simple API, so you can set up a static webpage (e.g. GitHub Pages) and host HTML files (the API responses) and wheels there. I recently saw that [llama-cpp-python](https://abetlen.github.io/llama-cpp-python/whl/cpu/llama-cpp-python/) uses that approach.

This was a fantastic suggestion! At Pyodide build time, I now create a simple JSON API index that just refers to the existing Pyodide-built wheels. Installing from it works well; a great start!
