Closed
Labels
bug: Something isn't working
Description
Describe the bug
Deep-copying an HfFileSystem that has an error cached in _repo_and_revision_exists_cache fails, because the cached HfHubHTTPError does not implement __reduce_ex__ correctly.
__reduce_ex__ returns response in the state dictionary (the third element of the tuple) instead of passing it to the constructor. Because response is a keyword-only argument with no default value, reconstructing the error raises:
TypeError: HfHubHTTPError.__init__() missing 1 required keyword-only argument: 'response'
This is fatal whenever an HfFileSystem instance has to be copied or serialized.
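The mechanism can be shown without huggingface_hub. The sketch below uses a hypothetical stand-in class, ToyHTTPError (not part of the library), with the same constructor shape and the same __reduce_ex__ return value as the errors.py snippet. It assumes only the standard copy module.

```python
import copy


class ToyHTTPError(Exception):
    """Hypothetical stand-in for HfHubHTTPError: `response` is keyword-only with no default."""

    def __init__(self, message, *, response, server_message=None):
        self.response = response
        self.server_message = server_message
        super().__init__(message)

    def __reduce_ex__(self, protocol):
        # Same shape as the errors.py snippet: (callable, args, state).
        # The third element is state applied AFTER construction; it is not
        # passed to the constructor as keyword arguments.
        return (
            self.__class__,
            (str(self),),
            {"response": self.response, "server_message": self.server_message},
        )


err = ToyHTTPError("404 Client Error.", response={"status": 404})

try:
    copy.deepcopy(err)
    failure = None
except TypeError as exc:
    # deepcopy called ToyHTTPError("404 Client Error.") without `response`.
    failure = str(exc)

print(failure)
```

The copy fails while the object is being reconstructed, before the state dictionary is ever applied, which matches the traceback ending in _reconstruct.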
errors.py
class HfHubHTTPError(HTTPError, OSError):
    def __init__(
        self,
        message: str,
        *,
        response: Response,
        server_message: Optional[str] = None,
    ):
        self.request_id = response.headers.get("x-request-id") or response.headers.get("X-Amzn-Trace-Id")
        self.server_message = server_message
        self.response = response
        self.request = response.request
        super().__init__(message)

    def __reduce_ex__(self, protocol):
        """Fix pickling of Exception subclass with kwargs. We need to override __reduce_ex__ of the parent class"""
        return (self.__class__, (str(self),), {"response": self.response, "server_message": self.server_message})
Reproduction
To keep the repro script minimal, we deep-copy only _repo_and_revision_exists_cache, as HfFileSystem does.
# test_hf_cloudpickle_bug.py
from copy import deepcopy
from huggingface_hub import HfFileSystem
from huggingface_hub.utils import RepositoryNotFoundError
from requests import Response, Request
# Mock an error
resp = Response()
resp.status_code = 404
resp.url = "https://huggingface.co/api/datasets/rotten_tomatoes/test.parquet"
resp.request = Request("GET", "https://huggingface.co/api/datasets/rotten_tomatoes/test.parquet")
resp._content = b'{"error": "Repository Not Found"}'
err = RepositoryNotFoundError(
    "404 Client Error. Repository Not Found.",
    response=resp,
    server_message="Repository Not Found",
)
fs = HfFileSystem()
# Simulate the error in cache
fs._repo_and_revision_exists_cache = {
    ("dataset", "rotten_tomatoes/test.parquet", None): (False, err),
}
# Now try to deepcopy the cache: this is exactly what _get_instance_state does.
cache_copy = deepcopy(fs._repo_and_revision_exists_cache) # <- expected to fail on buggy behavior
Logs
❯ python test_hf.py (myenv)
Traceback (most recent call last):
  File "/Users/youchenglin/ray/test_hf.py", line 30, in <module>
    cache_copy = deepcopy(fs._repo_and_revision_exists_cache) # <- expected to fail on buggy behavior
  File "/Users/youchenglin/miniconda3/envs/myenv/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/Users/youchenglin/miniconda3/envs/myenv/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/Users/youchenglin/miniconda3/envs/myenv/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/Users/youchenglin/miniconda3/envs/myenv/lib/python3.10/copy.py", line 211, in _deepcopy_tuple
    y = [deepcopy(a, memo) for a in x]
  File "/Users/youchenglin/miniconda3/envs/myenv/lib/python3.10/copy.py", line 211, in <listcomp>
    y = [deepcopy(a, memo) for a in x]
  File "/Users/youchenglin/miniconda3/envs/myenv/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/Users/youchenglin/miniconda3/envs/myenv/lib/python3.10/copy.py", line 265, in _reconstruct
    y = func(*args)
TypeError: HfHubHTTPError.__init__() missing 1 required keyword-only argument: 'response'
System info
- huggingface_hub version: 1.1.5
- python 3.10.19
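One possible direction for a fix, sketched on a toy class rather than on the library itself: have __reduce_ex__ return a module-level helper that forwards the keyword-only arguments explicitly. The names ToyHTTPError and _rebuild_error are hypothetical and are not part of huggingface_hub. Whether this is the right fix for HfHubHTTPError is an assumption, since this sketch does not carry over any extra instance state.

```python
import copy


def _rebuild_error(cls, message, response, server_message):
    # Hypothetical helper: passes the keyword-only arguments to the constructor.
    return cls(message, response=response, server_message=server_message)


class ToyHTTPError(Exception):
    """Hypothetical stand-in with the same constructor shape as HfHubHTTPError."""

    def __init__(self, message, *, response, server_message=None):
        self.response = response
        self.server_message = server_message
        super().__init__(message)

    def __reduce_ex__(self, protocol):
        # Two-element form: (callable, args). deepcopy copies each argument
        # and then calls _rebuild_error(*args), so `response` reaches __init__.
        return (
            _rebuild_error,
            (self.__class__, str(self), self.response, self.server_message),
        )


err = ToyHTTPError(
    "404 Client Error. Repository Not Found.",
    response={"status": 404},
    server_message="Repository Not Found",
)

# Same cache layout as in the reproduction script.
key = ("dataset", "rotten_tomatoes/test.parquet", None)
cache = {key: (False, err)}

cache_copy = copy.deepcopy(cache)
exists, clone = cache_copy[key]
print(type(clone).__name__, clone.response, clone.server_message)
```

Because the arguments travel through the args tuple, deepcopy copies the response as well, so the clone does not share it with the original error.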