Skip to content

Conversation

ben-schwen
Copy link
Member

Towards #6180
Depends on #6694
Supersedes #6697

aitap and others added 14 commits December 29, 2024 19:10
The resulting data.tables now have GROWABLE_BIT set, therefore:
 - the finalizer is not needed on R >= 3.4
 - duplicates of data.tables (which are not over-allocated) now have
   TRUELENGTH of 0 instead of whatever it was before, which is detected
   earlier in selfrefok()

As a result, assign.c only uses TRUELENGTH on R < 3.4.
Since dogroups() relies on being able to shrink vectors it hasn't
allocated, introduce make_growable() and is_growable() to adapt. Since
dogroups() relies on SD and SDall having shared columns, use the
setgrowable() wrapper on the R side at the time when SD and SDall are
being created.

(In the ALTREP case, setgrowable() will re-create the columns.)
This helps avoid false positive warnings about unreachable places in the code.
Currently broken:
- 1 errors out of 11637
 - won't work with expression vectors at all
Co-Authored-By: Benjamin Schwendinger <[email protected]>
@ben-schwen ben-schwen changed the base branch from master to truehash September 25, 2025 20:47
@jangorecki
Copy link
Member

jangorecki commented Sep 25, 2025

SETLENGTH is left in frollapply.c rather than growable_resize(), is it intentional?
It runs for every observation so would be nice to have as little overhead like it is here now. Especially when we know what's the max size and that we will not attempt to resize to larger than that...

remove growable_altrep define

Co-authored-by: aitap <[email protected]>
@ben-schwen
Copy link
Member Author

SETLENGTH left in frollapply.c rather than growable_resize(), is it intentional? It runs for every observation so would be nice to have as little overhead as possible. Especially when we know what's the max size and that we will not attempt to resize to larger than that...

Didn't touch that yet. Not sure if exchanging SETLENGTH with growable_resize will do the full trick since the function dispatch in frollapply is quite cumbersome and its hard to grasp whether we end up with a DT or a vector.

@jangorecki
Copy link
Member

jangorecki commented Sep 26, 2025

SETLENGTH left in frollapply.c rather than growable_resize(), is it intentional? It runs for every observation so would be nice to have as little overhead as possible. Especially when we know what's the max size and that we will not attempt to resize to larger than that...

Didn't touch that yet. Not sure if exchanging SETLENGTH with growable_resize will do the full trick since the function dispatch in frollapply is quite cumbersome and its hard to grasp whether we end up with a DT or a vector.

by.column=FALSE is when DT branches are in use, otherwise vector.
Note that functions there are very similar to each other, and may seem that could be merged into one by branching if (.) 1 else 2. It is kept like that to reduce the overhead. We want as few ifs as possible because those functions are called for every single observation of the input to forllapply().

@ben-schwen
Copy link
Member Author

by.column=FALSE is when DT branches are in use, otherwise vector. Note that functions there are very similar to each other, and may seem that could be merged into one by branching if (.) 1 else 2. It is kept like that to reduce the overhead. We want as few ifs as possible because those functions are called for every single observation of the input to forllapply().

You were right 👍. Exchanging SETLENGTH with growable_resize and patching your version of setgrowable using make_growable works!

All 16120 tests (last 7003.76) in tests/froll.Rraw completed ok in 00:01:39 elapsed (00:01:28 cpu)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants