Discussion/Thoughts: Mutation of series or dataframes #108

mannharleen · 2020-01-05T10:29:27Z

This is a placeholder to discuss how we should treat operations on series and dataframes that amend the underlying data.

For example, for series we have the following functions:

Append
Map
Order
Set

Currently, only Append actually amends the series in place. The rest return a new series.
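
To make the distinction concrete, here is a minimal sketch of the two behaviours. `IntSeries` and its methods are hypothetical stand-ins written for illustration; they are not gota's actual types or signatures.

```go
package main

import "fmt"

// IntSeries is a hypothetical toy series, not gota's Series type.
type IntSeries struct {
	elems []int
}

// Append mutates the receiver in place.
func (s *IntSeries) Append(vals ...int) {
	s.elems = append(s.elems, vals...)
}

// Map leaves the receiver untouched and returns a new series.
func (s *IntSeries) Map(f func(int) int) *IntSeries {
	out := make([]int, len(s.elems))
	for i, v := range s.elems {
		out[i] = f(v)
	}
	return &IntSeries{elems: out}
}

func main() {
	s := &IntSeries{elems: []int{1, 2, 3}}
	s.Append(4) // s itself changes
	doubled := s.Map(func(v int) int { return v * 2 })

	fmt.Println(s.elems)       // [1 2 3 4]
	fmt.Println(doubled.elems) // [2 4 6 8]
}
```

A caller who sees only the method names cannot tell which of the two conventions applies, which is the inconsistency this issue is about.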

Question: Is there merit in discussing how the project should treat such operations for series and dataframes? Or is there already an understanding?

typeless · 2020-01-06T02:45:46Z

What are the motivations/goals of this issue?
Are you concerned about, say, performance or API design?

mannharleen · 2020-01-06T04:12:43Z

I started off from an API design point of view. But I believe the bigger question is performance.
For instance, having a Map operation for Series is great, but should it return a new Series or map in place? What works for gota, and why?

typeless · 2020-01-06T05:05:42Z

Having immutable data has some benefits. For instance, when chaining multiple operations over a series, we don't have to manually clone the operands in advance. However, I don't oppose the idea of supplementary APIs for in-place updates.
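
A rough sketch of that trade-off, assuming a hypothetical `FloatSeries` type with both an immutable `Map` and a supplementary `MapInPlace` (neither name is taken from gota):

```go
package main

import "fmt"

// FloatSeries is a hypothetical toy series used only for illustration.
type FloatSeries struct {
	elems []float64
}

// Copy returns a series backed by a fresh slice.
func (s FloatSeries) Copy() FloatSeries {
	return FloatSeries{elems: append([]float64(nil), s.elems...)}
}

// MapInPlace overwrites the elements of the receiver's backing slice
// and returns the same series so that calls can be chained.
func (s FloatSeries) MapInPlace(f func(float64) float64) FloatSeries {
	for i, v := range s.elems {
		s.elems[i] = f(v)
	}
	return s
}

// Map is the immutable variant: it copies first, then maps the copy.
func (s FloatSeries) Map(f func(float64) float64) FloatSeries {
	return s.Copy().MapInPlace(f)
}

func main() {
	double := func(v float64) float64 { return v * 2 }
	addOne := func(v float64) float64 { return v + 1 }

	raw := FloatSeries{elems: []float64{1, 2, 3}}

	// Immutable chain: no manual cloning, but one allocation per step.
	a := raw.Map(double).Map(addOne)

	// In-place chain: the caller must remember to clone once up front.
	b := raw.Copy().MapInPlace(double).MapInPlace(addOne)

	fmt.Println(raw.elems) // [1 2 3]
	fmt.Println(a.elems)   // [3 5 7]
	fmt.Println(b.elems)   // [3 5 7]
}
```

Forgetting the `Copy()` in the second chain would silently overwrite `raw`, which is the kind of mistake the immutable API rules out.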

Regarding performance, I propose that we make the individual elements of a series unexposed. That way, we can store the elements in a flat memory layout (except for strings), rather than as a slice of interfaces pointing to heap values.
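
As an illustration of the two layouts, the sketch below contrasts a slice of interface values with a flat `[]float64` hidden behind an unexported field. Both types are hypothetical and simplified; they are not the code from the experimental PR or gota's current implementation.

```go
package main

import "fmt"

// boxedSeries keeps every element behind an interface value, so each
// read goes through a type assertion and usually a pointer to the heap.
type boxedSeries struct {
	elems []interface{}
}

// Sum adds up the elements that hold a float64.
func (s boxedSeries) Sum() float64 {
	total := 0.0
	for _, e := range s.elems {
		if v, ok := e.(float64); ok {
			total += v
		}
	}
	return total
}

// flatSeries keeps the values contiguously in one slice. The field is
// unexported, so callers never depend on how elements are stored.
type flatSeries struct {
	vals []float64
}

// Sum walks the contiguous slice directly.
func (s flatSeries) Sum() float64 {
	total := 0.0
	for _, v := range s.vals {
		total += v
	}
	return total
}

// At returns a copy of the element at index i instead of exposing storage.
func (s flatSeries) At(i int) float64 {
	return s.vals[i]
}

func main() {
	boxed := boxedSeries{elems: []interface{}{1.5, 2.5, 3.0}}
	flat := flatSeries{vals: []float64{1.5, 2.5, 3.0}}

	fmt.Println(boxed.Sum()) // 7
	fmt.Println(flat.Sum())  // 7
	fmt.Println(flat.At(1))  // 2.5
}
```

Because callers only see `Sum` and `At`, the storage behind `flatSeries` could change later without touching the public API, which is the point of keeping elements unexposed.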

Edit: I have an experimental PR in my local repo with some preliminary refactoring for the aforementioned proposal. I thought it would break the APIs so much that I didn't expect to upstream it.

kniren · 2020-01-20T10:31:34Z

Immutability was a conscious decision during the API design. I understand the potential benefits of mutating in place in terms of performance and memory usage. However, this library is not necessarily focused on extracting the maximum amount of performance, but rather on providing a somewhat safe API for data manipulation.

If there is a real need for more performance, a lot more thought should be put into the memory layout of the data and other operations. As @typeless mentions, performance is bottlenecked by the way the current memory model works. I initially designed for 'code reusability', but after a lot more experience with low-level programming, I'm not sure it was the right call.

julienrbrt mentioned this issue Jun 10, 2020

Facilitate basic series operations #121

Closed
