-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Description
Describe the issue
In the albeit unlikely (but not impossible) situation when the root of a tree has one leaf and one branch child, which happen to both have the same index in leaf_*
and node_*
respectively, the check to make sure that this isn't just a tree with just one node in line 312 of tree_ensemble_attribute.h will incorrectly decide that the root is a leaf rather than a branch.
I think it would be enough to change
nodes_falsenodeids[tree_root_size_t] == nodes_truenodeids[tree_root_size_t],
to something like
(nodes_falsenodeids[tree_root_size_t] == nodes_truenodeids[tree_root_size_t]) &&
nodes_trueleafs[tree_root_size_t] == true && nodes_falseleafs[tree_root_size_t] == true,
or even
(nodes_falsenodeids[tree_root_size_t] == nodes_truenodeids[tree_root_size_t]) &&
(nodes_trueleafs[tree_root_size_t] == nodes_falseleafs[tree_root_size_t]),
to ensure that in addition to the indices being the same, they are in fact indices to the same vector.
I have not checked providers other than cpu
, similar issues may be present in all of them.
To reproduce
Attached in human readable form below is a very simple forest I made by hand to try and illustrate this issue. Considering e.g. the input [7,7,4] and going through manually, it's clear the result should be 16 (25 from the first tree and -9 from the second), but at the moment onnxruntime (both in C++ and python) is returning 50.
It does this because the code looks at the root of the second tree, (index 2 in node_*
), "sees" that both its true and false children have index 3, and assumes this means this is functionally a leaf. However, one of the children is index 3 in node_*
, and the other is index 3 in leaf_*
, so this is not a leaf. This would be fixed by looking at nodes_falseleafs
and nodes_trueleafs
as well.
<
ir_version: 10,
opset_import: ["" : 22]
>
BDT_Forest (float[?,3] X) => (float[?,1] Y) {
Y = ai.onnx.ml.TreeEnsemble <aggregate_function: int = 1, leaf_targetids: ints = [0, 0, 0, 0, 0, 0, 0], leaf_weights: tensor = float[7] leaf_weights {100,0,25,0.5,-0.5,-5,-9}, n_targets: int = 1, nodes_falseleafs: ints = [1, 1, 0, 0, 1], nodes_falsenodeids: ints = [2, 1, 3, 4, 6], nodes_featureids: ints = [0, 1, 0, 1, 2], nodes_missing_value_tracks_true: ints = [0, 0, 0, 0, 0], nodes_modes: tensor = uint8[5] nodes_modes {0,0,0,0,0}, nodes_splits: tensor = float[5] nodes_splits {2,2,3,2,1}, nodes_trueleafs: ints = [0, 1, 1, 1, 1], nodes_truenodeids: ints = [1, 0, 3, 4, 5], tree_roots: ints = [0, 2]> (X)
}
Urgency
This will be rare, but I've now found at least one "real" tree where this caused an issue, in addition to the test example above.
Platform
Linux
OS Version
Ubuntu 24
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
ONNX Runtime API
C++
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response