Discussion/Thoughts: Mutation of series or dataframes #108
Comments
What are the motivations/goals of this issue?
I started off from an API design point of view. But I believe the bigger question is performance.
Having immutable data has some benefits. For instance, when chaining multiple operations over a series, we don't have to manually clone the operands in advance. However, I don't oppose the idea of supplementary APIs for in-place updates. Regarding performance, I propose that we stop exposing the individual elements of a series. That way, we can store the elements in a flat memory layout (except for strings), rather than in a slice of interfaces pointing to heap values. Edit: I have an experimental PR in my local repo, which has some preliminary refactoring for the aforementioned proposal. I thought it would break the APIs so much that I didn't expect to upstream it.
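To make the layout proposal above concrete, here is a minimal sketch in Go. It is not the library's code: `boxedSeries`, `floatSeries`, and all of their methods are hypothetical names invented for illustration. It contrasts a slice of interfaces with a flat `[]float64` whose elements are reachable only through value-returning accessors.

```go
package main

import "fmt"

// boxedSeries is a hypothetical layout in which every element is an
// interface value that may point to a separately allocated heap value.
type boxedSeries struct {
	elems []interface{}
}

// Sum has to type-assert every element before it can use it.
func (s boxedSeries) Sum() float64 {
	total := 0.0
	for _, e := range s.elems {
		if v, ok := e.(float64); ok {
			total += v
		}
	}
	return total
}

// floatSeries is a hypothetical flat layout: values live contiguously in
// one slice, and a parallel slice marks missing entries.
type floatSeries struct {
	vals    []float64
	missing []bool // missing[i] is true when element i has no value
}

// newFloatSeries copies the input so the caller cannot alias the storage.
func newFloatSeries(vals []float64) floatSeries {
	cp := make([]float64, len(vals))
	copy(cp, vals)
	return floatSeries{vals: cp, missing: make([]bool, len(vals))}
}

// Len reports the number of elements.
func (s floatSeries) Len() int { return len(s.vals) }

// Float returns element i by value, so the backing storage stays unexposed.
// The second result is false when the element is missing.
func (s floatSeries) Float(i int) (float64, bool) {
	return s.vals[i], !s.missing[i]
}

// Sum walks contiguous memory without any type assertions.
func (s floatSeries) Sum() float64 {
	total := 0.0
	for i, v := range s.vals {
		if !s.missing[i] {
			total += v
		}
	}
	return total
}

func main() {
	boxed := boxedSeries{elems: []interface{}{1.5, 2.5, 4.0}}
	flat := newFloatSeries([]float64{1.5, 2.5, 4})
	fmt.Println(boxed.Sum(), flat.Sum(), flat.Len())
}
```

Because callers never receive a pointer or interface to an individual element, the flat layout could change later without touching the accessor signatures. Strings would still need a separate representation, as noted above.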
Immutability was a conscious decision during the API design. I understand the potential benefits of mutating in place in terms of performance and memory usage. However, this library is not necessarily focused on extracting the maximum amount of performance, but rather on providing a somewhat safe API for data manipulation. If there is a real need for more performance, a lot more thought should be put into the memory layout of the data and into other operations. As @typeless mentions, performance is bottlenecked by the way the current memory model works. I initially designed for 'code reusability', but after a lot more experience with low-level programming, I'm not sure it was the right call.
This is a placeholder to discuss how we should treat operations on series and dataframes that amend the underlying datatype.
For example, for series we have the following functions:
Currently, only Append actually amends the series in place. The rest return a new series.
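The difference between the two behaviours can be sketched as follows. This is an illustration only: `intSeries` and its methods are hypothetical stand-ins, not the library's actual types or signatures. A method that returns a new series leaves its receiver untouched, while an in-place method needs a pointer receiver and changes what every holder of that variable sees.

```go
package main

import "fmt"

// intSeries is a hypothetical stand-in for a series.
type intSeries struct {
	vals []int
}

// Copy returns an independent series with its own backing array.
func (s intSeries) Copy() intSeries {
	cp := make([]int, len(s.vals))
	copy(cp, s.vals)
	return intSeries{vals: cp}
}

// Concat leaves the receiver untouched and returns a new series.
func (s intSeries) Concat(other intSeries) intSeries {
	out := make([]int, 0, len(s.vals)+len(other.vals))
	out = append(out, s.vals...)
	out = append(out, other.vals...)
	return intSeries{vals: out}
}

// Append mutates the receiver in place, hence the pointer receiver.
func (s *intSeries) Append(v ...int) {
	s.vals = append(s.vals, v...)
}

func main() {
	a := intSeries{vals: []int{1, 2}}
	b := a.Concat(intSeries{vals: []int{3}}) // a is unchanged here
	a.Append(9)                              // a itself grows
	fmt.Println(a.vals, b.vals)
}
```

With a mix like this, a caller has to remember which methods mutate. Choosing one convention for both series and dataframes, or naming the in-place variants distinctly, would remove that guesswork.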
Question: Is there merit in discussing how the project should treat such operations for series and dataframes? Or is there already an understanding?