Conversation

@morgsmss7 morgsmss7 commented Feb 24, 2020

Reference Issues/PRs

Fixes Issue 16370 in scikit-learn. Also see Issue 2 in tealeaf.

What does this implement/fix? Explain your changes.

This PR adds simulations and plots that compare split criteria on several nonlinear regression simulations, including sinusoidal, logarithmic, multiplicative, and independence. scikit-learn's documentation gives little guidance on which value (mse, mae, or friedman mse) to choose for the criterion parameter. This example demonstrates how to look for differences between the criteria and shows that the choice may not always matter.

Any other comments?

This PR in sklearn will include these files:

sklearn/datasets/tests/test_samples_generator.py 
sklearn/datasets/samples_generator.py
sklearn/datasets/__init__.py
examples/ensemble/plot_random_forest_regression_criteria_comparison.py
examples/datasets/plot_nonlinear_regression_datasets.py 
doc/modules/classes.rst 
doc/datasets/index.rst

The other files that were changed for Vivek's PR will not be changed in sklearn.

Author

morgsmss7 commented Feb 24, 2020

Split Criteria Comparison Experiment and Results

For each simulation type (logarithmic, sine, square, multiplicative, independence) and for each split criterion (mse, mae, friedman mse):

  1. Generate 30 noisy training sets (10 dimensions, 50 samples each).
  2. Generate 1 noisy test set (10 dimensions, 1000 samples).
  3. For each of the 30 training sets, train random forests (500 trees) while varying the number of training samples ( np.arange(5, 51, 3) ), and evaluate each forest by its mse on the test set.

[Figure: nonlinearSimPlots]
[Figure: splitter_comparison_02_17]

@morgsmss7 morgsmss7 requested review from eigenvivek, j1c and jheiko1 March 8, 2020 17:22

@eigenvivek eigenvivek left a comment

Overall, it looks great! I made some minor comments asking for stylistic changes to the comments.

@eigenvivek eigenvivek changed the title from "Nonlinear regression simulations for Existing Split Criteria" to "Nonlinear regression simulations for existing split criteria" Mar 17, 2020
Member

j1c commented Mar 30, 2020

This looks great. I think you wanted to make a PR to real sklearn, right? My only concern is that the current NDD master has a bunch of changes from other people that would be merged in as well. For this, I think the best course of action is to make a new branch, fetch the latest sklearn, and add in these two examples along with the data generation code. Then make the PR from that branch.

@morgsmss7 morgsmss7 dismissed eigenvivek’s stale review May 14, 2020 19:30

These changes have been made in both the real sklearn version and the NDD version

4 participants