Add warning for large datasets on CPU and reference to tabpfn-client API #245

Open
wants to merge 1 commit into
base: main

Krishnadubey1008

This PR addresses issue #192.

Changes

  • Added a check in the fit method that detects when the code is running on CPU and the dataset exceeds 1000 samples.
  • In that case, a warning informs users of potential performance issues and suggests using the tabpfn-client API or a GPU.
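
A minimal sketch of the check described above, written as a standalone helper so it can run without the package. The helper name `warn_if_large_dataset_on_cpu`, the constant `CPU_LARGE_DATASET_THRESHOLD`, and the full message text are assumptions for illustration; only the CPU condition, the 1000-sample threshold, the `RuntimeWarning` category, and the message prefix appear in this thread.

```python
import warnings

# Hypothetical constant name; the PR description only states "1000 samples".
CPU_LARGE_DATASET_THRESHOLD = 1000


def warn_if_large_dataset_on_cpu(device: str, n_samples: int) -> bool:
    """Warn when fitting on CPU with more samples than the threshold.

    Returns True if a warning was issued, which keeps the helper easy to test.
    """
    if device == "cpu" and n_samples > CPU_LARGE_DATASET_THRESHOLD:
        warnings.warn(
            # Only the prefix "Running on CPU with large dataset" comes from
            # the thread; the rest of the wording is assumed.
            "Running on CPU with large dataset may be slow. "
            "Consider using a GPU or the tabpfn-client API.",
            RuntimeWarning,
            stacklevel=2,  # report the caller's line, not this helper's
        )
        return True
    return False
```

Inside a fit method this would presumably be called as `warn_if_large_dataset_on_cpu(self.device, X.shape[0])`. Note that the comparison is strict, so exactly 1000 samples does not trigger the warning.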

@CLAassistant

CLAassistant commented Mar 18, 2025

CLA assistant check
All committers have signed the CLA.

@noahho
Collaborator

noahho commented Mar 18, 2025

Hi! Please sign the CLA to make your PR usable in our package! :-)

@noahho noahho requested a review from Copilot March 18, 2025 12:42

Pull Request Overview

This PR adds a runtime warning for users when running the fit method on a CPU with a large dataset, suggesting the use of a GPU or the tabpfn-client API to improve performance.

  • Added a conditional check for CPU usage and dataset size in the fit method.
  • Displays a warning message with a reference to the tabpfn-client API.
Comments suppressed due to low confidence (2)

src/tabpfn/regressor.py:417

  • Consider adding a 'stacklevel' argument to warnings.warn (e.g., stacklevel=2) so that the warning message points to the caller's line, improving traceability for the user.
warnings.warn(

src/tabpfn/regressor.py:416

  • Please add corresponding unit tests to validate the warning behavior when running on CPU with a large dataset.
if self.device == 'cpu' and X.shape[0] > 1000:
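
On the 'stacklevel' suggestion in the first suppressed comment, a small sketch of what `stacklevel=2` changes. The functions `fit_like` and `user_code` are hypothetical stand-ins, not code from the PR; `warnings.warn` and `warnings.catch_warnings` are standard-library calls.

```python
import warnings


def fit_like(n_samples: int) -> None:
    """Hypothetical stand-in for a fit method that warns about its input."""
    if n_samples > 1000:
        warnings.warn(
            "Running on CPU with large dataset",
            RuntimeWarning,
            stacklevel=2,  # attribute the warning to whoever called fit_like
        )


def user_code() -> None:
    fit_like(5000)  # with stacklevel=2 the warning is reported at this line


# Record warnings instead of printing them so the location can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    user_code()

# The recorded location is the call site inside user_code.
print(caught[0].filename, caught[0].lineno)
```

With the default `stacklevel=1`, the recorded line would instead be the `warnings.warn` call inside `fit_like`, which is less helpful to a user looking for the offending call.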
@Krishnadubey1008 Krishnadubey1008 force-pushed the larger-datasets-warning branch from 0cda59a to 04d1f59 Compare March 18, 2025 23:41
@Krishnadubey1008
Author

@noahho Earlier, an unverified email was connected to my git commits. I have now fixed that and also signed the CLA. Please review the PR.

@Krishnadubey1008
Author

@noahho Following the Copilot overview, should I add unit tests (in test_regressor_interface.py) to validate the warning behavior when running on CPU with a large dataset? For example:

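import numpy as np
import pytest
from tabpfn import TabPFNRegressor


def test_cpu_large_dataset_runtime_warning():
    # 1001 samples is just above the 1000-sample threshold checked in fit.
    model = TabPFNRegressor(device='cpu')
    X_large = np.random.rand(1001, 10)
    y_large = np.random.rand(1001)

    with pytest.warns(RuntimeWarning, match="Running on CPU with large dataset"):
        model.fit(X_large, y_large)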

3 participants