
OverflowError: string longer than 2147483647 bytes for large datasets #1223

@israel-cj

Description


Hi, I want to upload a large dataset (2.7 GB, 4.6M features), but I get the following error:

```
Traceback (most recent call last):
  File "publish_dataset.py", line 62, in <module>
    publish_dataset()
  File "publish_dataset.py", line 49, in publish_dataset
    openml_dataset.publish()
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\openml\base.py", line 135, in publish
    response_text = openml._api_calls._perform_api_call(
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\openml\_api_calls.py", line 118, in _perform_api_call
    response = _read_url_files(url, data=data, file_elements=file_elements)
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\openml\_api_calls.py", line 325, in _read_url_files
    return _send_request(
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\openml\_api_calls.py", line 383, in _send_request
    response = session.post(url, data=data, files=files, headers=_HEADERS)
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\requests\sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\requests\adapters.py", line 667, in send
    resp = conn.urlopen(
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\urllib3\connectionpool.py", line 789, in urlopen
    response = self._make_request(
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\urllib3\connectionpool.py", line 495, in _make_request
    conn.request(
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\urllib3\connection.py", line 455, in request
    self.send(chunk)
  File "C:\Users\20210595\.conda\envs\tableshift\lib\http\client.py", line 972, in send
    self.sock.sendall(data)
  File "C:\Users\20210595\.conda\envs\tableshift\lib\ssl.py", line 1237, in sendall
    v = self.send(byte_view[count:])
  File "C:\Users\20210595\.conda\envs\tableshift\lib\ssl.py", line 1206, in send
    return self._sslobj.write(data)
OverflowError: string longer than 2147483647 bytes
```
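For context, the 2147483647 in the error is the maximum value of a signed 32-bit integer, so the failure happens on the client side: the request body is assembled as one in-memory string and handed to a single SSL write, which cannot accept more than 2**31 - 1 bytes. A minimal sketch of one possible client-side workaround is below — streaming the file from disk instead of loading it into one string, so no single write exceeds the limit. The `upload_streaming` helper and its URL are hypothetical illustrations, not part of the openml API:

```python
import requests

# The threshold from the traceback is the signed 32-bit integer maximum,
# i.e. the largest length a single SSL write will accept.
SSL_WRITE_LIMIT = 2**31 - 1
assert SSL_WRITE_LIMIT == 2147483647

def upload_streaming(url: str, path: str) -> requests.Response:
    """Hypothetical sketch: POST a large file without building the whole
    body in memory. Passing a file object makes requests send the body
    in small chunks, so no single socket write approaches 2**31 bytes."""
    with open(path, "rb") as f:
        return requests.post(url, data=f)
```

This only avoids the local `OverflowError`; whatever server-side size limit OpenML enforces would still apply.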

What is the size limit for datasets on OpenML? I could not find it documented (maybe I did not look long enough).
Is there a way to avoid this limitation?
Thanks
