Question on NumPy 2.x support #488

Closed
davidmezzetti opened this issue Dec 2, 2024 · 23 comments
Labels
question Further information is requested


@davidmezzetti

I've found a couple of discussions related to NumPy 1.x vs 2.x support (#136, #283, #420). It seems like the project is going to pin NumPy < 2.0 for Python 3.9 - 3.12 and NumPy >= 2.0 for Python 3.13.

Is there something in particular causing the project to depend on NumPy 1.x for Python 3.9 - 3.12? Could the project instead support either for Python 3.9 - 3.12? Currently, installing Docling on those platforms uninstalls NumPy 2.x and downgrades to 1.x. That seems extreme unless something specifically isn't supported.

@davidmezzetti davidmezzetti added the question Further information is requested label Dec 2, 2024
@dolfim-ibm
Contributor

I think it could be possible, but we will still have to limit Python 3.9 to <2.1.0.

@davidmezzetti
Author

I force-upgraded my environment locally to 2.x, and Docling seems to work in the limited cases I tested.

My usual strategy is to set the requirement to the lowest version known to work with the lowest supported Python version. For example, if Docling works with NumPy 1.24.4, then you could say >= 1.24.4, and the dependency manager will figure it out from there. The last version of NumPy that supports Python 3.9 is 2.0.2, so there should be no need to limit it to < 2.1.0: on Python 3.9, the dependency manager won't select 2.1.x or later.

If there is an incompatibility with a newer version, you could also add an upper bound. Keeping the requirements as loose as possible helps avoid these dependency loops.
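To illustrate why a plain lower bound is enough, here is a minimal sketch of what a resolver effectively does: keep the releases that satisfy the requirement and whose minimum supported Python matches the interpreter, then pick the newest. The release table is a small hand-picked subset for illustration (NumPy 1.24.4 needs Python >= 3.8, 1.26.4 and 2.0.2 need >= 3.9, 2.1.3 needs >= 3.10). This is not how pip or uv are implemented; they read this metadata from the index and handle far more cases.

```python
from __future__ import annotations

# Illustrative subset of NumPy releases -> minimum supported Python version.
RELEASES = {
    (1, 24, 4): (3, 8),
    (1, 26, 4): (3, 9),
    (2, 0, 2): (3, 9),
    (2, 1, 3): (3, 10),
}


def parse(version: str) -> tuple[int, ...]:
    """Turn '1.24.4' into (1, 24, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(python: str, lower: str, upper: str | None = None) -> str | None:
    """Pick the newest release with lower <= version (< upper, if given)
    whose minimum Python is satisfied by the given interpreter version."""
    py = parse(python)
    lo = parse(lower)
    hi = parse(upper) if upper else None
    candidates = [
        version
        for version, min_python in RELEASES.items()
        if version >= lo and (hi is None or version < hi) and py >= min_python
    ]
    if not candidates:
        return None
    return ".".join(map(str, max(candidates)))


print(resolve("3.9", "1.24.4"))            # 2.1.x is never a candidate on 3.9
print(resolve("3.12", "1.24.4"))           # newest release is allowed
print(resolve("3.12", "1.24.4", "2.0.0"))  # explicit upper bound caps it at 1.x
```

The same ">= 1.24.4" requirement yields 2.0.2 on Python 3.9 and 2.1.3 on Python 3.12, so an extra "< 2.1.0" cap for 3.9 changes nothing; an upper bound is only needed for a real incompatibility.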

@dolfim-ibm dolfim-ibm self-assigned this Dec 2, 2024
@dolfim-ibm
Contributor

We have to update a few packages with the proper pinning, but I think it is doable. See DS4SD/docling-ibm-models#57.

@davidmezzetti
Author

Fantastic. This helps limit the footprint in larger environments!

@simjak

simjak commented Dec 3, 2024

I ran into some issues with NumPy 2.x:

  File ".venv/lib/python3.12/site-packages/torchvision/transforms/functional.py", line 167, in to_tensor
    img = torch.from_numpy(np.array(pic, mode_to_nptype.get(pic.mode, np.uint8), copy=True))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Numpy is not available
(.venv) 09:48:40 docling_test [simonas/base-options](+5) pip install numpy                                                                           11467ms
Requirement already satisfied: numpy in ./.venv/lib/python3.12/site-packages (2.1.3)

@simjak

simjak commented Dec 3, 2024

Downgrading to 1.26.4 solved the issue.

uv add numpy==1.26.4
                                                                  
Resolved 98 packages in 562ms
Prepared 1 package in 1ms
Uninstalled 1 package in 70ms
Installed 1 package in 25ms
 - numpy==2.1.3
 + numpy==1.26.4

@dolfim-ibm
Contributor

I was just testing on a fresh install, and I cannot reproduce the error above.

$ python3.12 -m venv venv
$ source ./venv/bin/activate

$ pip install docling

$ docling --version
Docling version: 2.8.1
Docling Core version: 2.6.1
Docling IBM Models version: 2.0.7
Docling Parse version: 2.1.2


$ pip freeze|grep numpy
numpy==2.1.3

# launch a convert
$ docling testdoc.pdf
# ...all good here

@simjak can you please share a few more details? Which OS? Which Docling version? Which Python version?
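A small script along these lines could collect the requested details in one go. It is a sketch using only the standard library (platform, sys, importlib.metadata); the package list is an assumption based on the packages mentioned in this thread.

```python
from __future__ import annotations

import platform
import sys
from importlib import metadata

# Packages discussed in this thread (assumed list; adjust as needed).
PACKAGES = ("docling", "docling-core", "docling-ibm-models", "docling-parse",
            "numpy", "torch", "torchvision")


def installed_version(name: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages: tuple[str, ...] = PACKAGES) -> dict[str, str]:
    report = {
        "python": platform.python_version(),
        "os": f"{platform.system()} {platform.release()}",
        # On macOS, 'x86_64' means an Intel build of Python
        # (possibly running under Rosetta on Apple silicon).
        "machine": platform.machine(),
        "executable": sys.executable,
    }
    for name in packages:
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key:20} {value}")
```

Reporting the interpreter architecture alongside the OS matters here because available torch wheels depend on both.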

@dolfim-ibm
Contributor

dolfim-ibm commented Dec 3, 2024

@simjak which torch version did you install?

$ pip freeze|grep torch
torch==2.5.1
torchvision==0.20.1

@simjak

simjak commented Dec 3, 2024

@dolfim-ibm

torch==2.2.2
torchvision==0.17.2

Python 3.12, Docling (forked main branch), macOS Ventura 13.0.1

@dolfim-ibm
Contributor

I guess this is the newest version one can use on an Intel Mac, right? If so, we might consider some conditional markers for that case.
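As a sketch of what such a conditional marker would express, the code below computes a few PEP 508 marker variables the way the spec defines them (sys.platform, platform.system(), platform.machine()) and picks between two hypothetical NumPy requirements. The pins are illustrative only, not Docling's actual requirements; in pyproject terms they would correspond to lines such as "numpy>=1.24.4,<2.0.0; platform_system == 'Darwin' and platform_machine == 'x86_64'".

```python
from __future__ import annotations

import platform
import sys


def marker_environment() -> dict[str, str]:
    """A few PEP 508 marker variables, computed as the spec defines them."""
    return {
        "sys_platform": sys.platform,
        "platform_system": platform.system(),
        "platform_machine": platform.machine(),
        "python_version": ".".join(platform.python_version_tuple()[:2]),
    }


def is_intel_mac(env: dict[str, str]) -> bool:
    """Equivalent of: platform_system == 'Darwin' and platform_machine == 'x86_64'."""
    return env["platform_system"] == "Darwin" and env["platform_machine"] == "x86_64"


# Hypothetical conditional requirements: keep NumPy 1.x where only torch < 2.3
# is available (Intel Macs), allow NumPy 2.x everywhere else.
REQUIREMENTS = [
    ("numpy>=1.24.4,<2.0.0", is_intel_mac),
    ("numpy>=1.24.4", lambda env: not is_intel_mac(env)),
]


def active_requirements(env: dict[str, str]) -> list[str]:
    """Return the requirement strings whose marker is true in this environment."""
    return [req for req, applies in REQUIREMENTS if applies(env)]


print(active_requirements(marker_environment()))
```

Exactly one of the two requirements applies in any environment, which is what lets a single package ship different constraints per platform.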

@simjak

simjak commented Dec 3, 2024

I have an M1; I'll test with the newest torch version.
Error when installing torch 2.5.1:

error: distribution torch==2.5.1 @ registry+https://pypi.org/simple can't be installed because it doesn't have a source distribution or wheel for the current platform

UV:

dependencies = [
    "torch==2.5.1; sys_platform == 'darwin' or platform_machine == 'aarch64' or platform_machine == 'x86_64'",
    "torchvision==0.20.1; sys_platform == 'darwin' or platform_machine == 'aarch64' or platform_machine == 'x86_64'",
    "requests>=2.31.0",
    "certifi>=2024.7.4",
    "docling",
    "rapidocr-onnxruntime>=1.4.0",
    # "ocrmac>=1.0.0",
    # "numpy==1.26.4",
    # "docling>=2.8.1",
]
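For reference, the marker on the torch and torchvision lines above can be read as a plain boolean expression. This sketch evaluates it for a few (sys_platform, platform_machine) pairs: it is true on every macOS interpreter, including an x86_64 (Intel or Rosetta) one, so the torch==2.5.1 pin is never filtered out on a Mac. If, as suggested earlier in the thread, torch 2.2.x is the newest release for Intel Macs, an x86_64 Python on macOS would find no matching wheel; that is a hypothesis, not something confirmed here.

```python
def torch_marker(sys_platform: str, platform_machine: str) -> bool:
    """Python equivalent of the marker used above:
    sys_platform == 'darwin' or platform_machine == 'aarch64'
    or platform_machine == 'x86_64'."""
    return (
        sys_platform == "darwin"
        or platform_machine == "aarch64"
        or platform_machine == "x86_64"
    )


# (sys_platform, platform_machine) pairs as an interpreter might report them.
for env in [("darwin", "arm64"), ("darwin", "x86_64"),
            ("linux", "aarch64"), ("linux", "x86_64"), ("win32", "AMD64")]:
    print(env, torch_marker(*env))
```

Only the Windows pair evaluates to False; the marker does not distinguish Apple silicon from Intel on macOS.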

@cau-git
Contributor

cau-git commented Dec 3, 2024

@simjak this error installing torch==2.5.1 seems odd to me; there is definitely a wheel for macOS 11+ and Python 3.12. Can you outline the changes you made on your fork? From your uv dependencies, it looks like you are using the PyPI version of docling.

I just ran uv pip install docling in a clean venv, and I also get torch==2.5.1 by default.

@davidmezzetti
Author

davidmezzetti commented Dec 3, 2024

Nothing in this change prevents one from limiting NumPy to 1.x. It just gives the dependency manager more flexibility to work in more environments.

Looking at this error:

  File ".venv/lib/python3.12/site-packages/torchvision/transforms/functional.py", line 167, in to_tensor
    img = torch.from_numpy(np.array(pic, mode_to_nptype.get(pic.mode, np.uint8), copy=True))

It seems like this is a torch 2.2.2/torchvision 0.17.2 compatibility issue with NumPy 2.x that has nothing to do with Docling. It looks like NumPy 2.x support was added in torch 2.3.
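A quick way to spot this combination before it fails at runtime is to compare the installed versions. This is a hypothetical helper, not part of Docling; it relies on the threshold suggested in this thread (torch >= 2.3 for NumPy 2.x) rather than an official compatibility table.

```python
from __future__ import annotations

import re
from importlib import metadata


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like '2.2.2' or '2.5.1+cpu'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def numpy2_mismatch(torch_version: str, numpy_version: str) -> bool:
    """True when NumPy 2.x is paired with a torch older than 2.3."""
    return major_minor(numpy_version)[0] >= 2 and major_minor(torch_version) < (2, 3)


def check_installed() -> str:
    """Describe the installed torch/numpy pair, if both are present."""
    try:
        torch_v = metadata.version("torch")
        numpy_v = metadata.version("numpy")
    except metadata.PackageNotFoundError as exc:
        return f"missing package: {exc}"
    if numpy2_mismatch(torch_v, numpy_v):
        return (f"torch {torch_v} with numpy {numpy_v}: may fail with errors such as "
                "'Numpy is not available'; upgrade torch or use numpy<2")
    return f"torch {torch_v} with numpy {numpy_v}: no known conflict"


print(check_installed())
```

With the versions reported above (torch 2.2.2, numpy 2.1.3) the check flags a mismatch, while torch 2.5.1 with the same NumPy passes.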

In general, I'd suggest only worrying about Docling's own dependencies, not about the dependencies of those projects. For example, you wouldn't want to get into the business of managing torch's dependencies here unless it's specifically a requirement for this project. If Docling works with both torch 2.2.2/NumPy 1.x AND torch 2.5.1/NumPy 2.x, that should be decided naturally by the dependency manager, not hardcoded into the project's dependencies.

@simjak

simjak commented Dec 3, 2024

@davidmezzetti agree
@cau-git I tried with 2.3, the latest I can install with uv, and NumPy 2.x failed.

[project]
name = "docling-test"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"

dependencies = [
    "requests>=2.31.0",
    "certifi>=2024.7.4",
    "docling",
    "rapidocr-onnxruntime>=1.4.0",
    "torch<2.3.0",
    "torchvision<0.18.0",

    # "ocrmac>=1.0.0",
    # "numpy==1.26.4",
    # "docling>=2.8.1",
]

# https://docs.astral.sh/uv/guides/integration/pytorch/#using-a-pytorch-index
# https://github.com/astral-sh/uv/issues/8358#issuecomment-2424808369
[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true



[tool.uv.sources]
torch = { index = "pytorch-cpu", marker = "platform_system != 'Darwin'" }
torchvision = { index = "pytorch-cpu", marker = "platform_system != 'Darwin'" }
docling = { git = "https://github.com/simjak/docling", branch = "simonas/base-options" }

@simjak

simjak commented Dec 3, 2024

> @simjak this error installing torch==2.5.1 seems odd to me, there is definitely a wheel for MacOS 11+, for python 3.12. Can you outline the changes you made on your fork? From your UV dependencies it looks like you use a pypi version of docling in it.
>
> I just used uv pip install docling in a clean venv to install docling and I also get torch==2.5.1 by default.

Can you try this setup?

[project]
name = "docling-test"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"

dependencies = [
    "torch==2.5.1",
    "torchvision==0.20.1",
]

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cpu", marker = "platform_system != 'Darwin'" }
torchvision = { index = "pytorch-cpu", marker = "platform_system != 'Darwin'" }

I'm getting

uv sync                                           
Resolved 16 packages in 1.26s
error: Distribution `torch==2.5.1 @ registry+https://pypi.org/simple` can't be installed because it doesn't have a source distribution or wheel for the current platform

@davidmezzetti
Author

> @davidmezzetti agree @cau-git I tried with 2.3, the latest which I can install with UV, Numpy failed 2.x

What happens if you change to this:

     "torch>=2.3.0",
     "torchvision>=0.18.0",

@simjak

simjak commented Dec 3, 2024

@davidmezzetti

Resolved 95 packages in 1ms
error: Distribution `torch==2.5.1 @ registry+https://pypi.org/simple` can't be installed because it doesn't have a source distribution or wheel for the current platform

@cau-git
Contributor

cau-git commented Dec 3, 2024

@simjak I need to check again, on which platform are you? I am on a MacBook Pro M3, and macOS Sequoia 15.1.1, and all of this works fine.

@simjak

simjak commented Dec 3, 2024

> @simjak I need to check again, on which platform are you? I am on a MacBook Pro M3, and macOS Sequoia 15.1.1, and all of this works fine.

M1, macOS Ventura 13.0.1

@simjak

simjak commented Dec 3, 2024

Upgrading to Sequoia

@cau-git
Copy link
Contributor

cau-git commented Dec 6, 2024

@simjak did you see any change since?

@simjak

simjak commented Dec 6, 2024

> @simjak did you see any change since?

Using <2.3.0 :)

@cau-git
Contributor

cau-git commented Dec 9, 2024

Closing this for now, until further issues are reported with this.

@cau-git cau-git closed this as completed Dec 9, 2024
4 participants