Discussion/Thoughts: Mutation of series or dataframes #108

Open
mannharleen opened this issue Jan 5, 2020 · 4 comments

Comments

@mannharleen

This is a placeholder to discuss how we should treat operations on series and dataframes that amend the underlying datatype.

For example, for series we have the following functions:

  • Append
  • Map
  • Order
  • Set

Currently, only Append actually amends the series in place. The rest return a new series.
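
To make the distinction concrete, here is a minimal sketch of the two styles side by side. The `Series` type, `NewSeries`, and the method bodies below are hypothetical stand-ins written for this illustration; they are not gota's actual implementation and only mirror the behaviour described above (Append mutates, Map and Set return a new value).

```go
package main

import "fmt"

// Series is a hypothetical, simplified stand-in for a gota series.
type Series struct {
	elems []float64
}

// NewSeries copies vals so the caller's slice is never aliased.
func NewSeries(vals ...float64) Series {
	return Series{elems: append([]float64(nil), vals...)}
}

// Append mutates the receiver in place (note the pointer receiver).
func (s *Series) Append(vals ...float64) {
	s.elems = append(s.elems, vals...)
}

// Map returns a new Series and leaves the receiver untouched.
func (s Series) Map(f func(float64) float64) Series {
	out := make([]float64, len(s.elems))
	for i, v := range s.elems {
		out[i] = f(v)
	}
	return Series{elems: out}
}

// Set returns a copy of the receiver with the element at index i replaced.
func (s Series) Set(i int, v float64) Series {
	out := NewSeries(s.elems...)
	out.elems[i] = v
	return out
}

func main() {
	s := NewSeries(1, 2, 3)
	s.Append(4) // s itself grows
	doubled := s.Map(func(v float64) float64 { return v * 2 })
	patched := s.Set(0, 10)
	// prints: [1 2 3 4] [2 4 6 8] [10 2 3 4]
	fmt.Println(s.elems, doubled.elems, patched.elems)
}
```

In this sketch the receiver type signals the behaviour: the pointer receiver on `Append` is what allows it to change the caller's series, while the value-receiver methods allocate fresh storage.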

Question: Is there merit in discussing how the project should treat such operations for series and dataframes? Or is there already an understanding?

@typeless
Contributor

typeless commented Jan 6, 2020

What are the motivations/goals of this issue?
Are you concerned about, say, performance or API design?

@mannharleen
Author

I started off from an API design point of view. But I believe the bigger question is performance.
For instance, having a Map operation for Series is great, but should it return a new Series or map in place? What works for gota, and why?

@typeless
Contributor

typeless commented Jan 6, 2020

Having immutable data has some benefits. For instance, when chaining multiple operations over a series, we don't have to manually clone the operands in advance. However, I don't oppose the idea of supplementary APIs for in-place updates.
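
As a rough illustration of the chaining point, the sketch below compares a Map that returns a new series with a supplementary in-place variant. The `Series` type and the `Copy`, `Map`, and `MapInPlace` methods are hypothetical names chosen for this example, not gota's API.

```go
package main

import "fmt"

// Series is a hypothetical toy type used only for this illustration.
type Series struct {
	elems []int
}

// Copy returns an independent clone of the series.
func (s Series) Copy() Series {
	return Series{elems: append([]int(nil), s.elems...)}
}

// Map returns a new series; the receiver is never modified.
func (s Series) Map(f func(int) int) Series {
	out := make([]int, len(s.elems))
	for i, v := range s.elems {
		out[i] = f(v)
	}
	return Series{elems: out}
}

// MapInPlace overwrites the shared backing array and returns the
// same series so that calls can still be chained.
func (s Series) MapInPlace(f func(int) int) Series {
	for i, v := range s.elems {
		s.elems[i] = f(v)
	}
	return s
}

func main() {
	inc := func(v int) int { return v + 1 }
	double := func(v int) int { return v * 2 }

	base := Series{elems: []int{1, 2, 3}}

	// Immutable style: base can feed two chains without cloning.
	a := base.Map(inc).Map(double)
	b := base.Map(double)

	// In-place style: clone first, or base would be overwritten.
	work := base.Copy()
	c := work.MapInPlace(inc).MapInPlace(double)

	// prints: [1 2 3] [4 6 8] [2 4 6] [4 6 8]
	fmt.Println(base.elems, a.elems, b.elems, c.elems)
}
```

The in-place chain avoids the intermediate allocation that `Map` makes at each step, at the cost of the explicit `Copy` whenever the original is still needed.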

Regarding performance, I propose that we make the individual elements of a series unexposed. That way, we can store the elements in a flat memory layout (except for strings), rather than as a slice of interfaces pointing to heap values.
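
The difference between the two layouts can be sketched as follows. All names here (`Element`, `boxedSeries`, `flatSeries`, and their constructors) are assumptions made up for this example; they do not reproduce gota's internal types or the experimental PR mentioned below.

```go
package main

import "fmt"

// Element is a hypothetical per-element interface.
type Element interface {
	Float() float64
}

type floatElement struct{ v float64 }

func (e *floatElement) Float() float64 { return e.v }

// boxedSeries stores a slice of interfaces, each pointing to its own
// heap-allocated value.
type boxedSeries struct {
	elems []Element
}

func newBoxed(vals ...float64) boxedSeries {
	elems := make([]Element, len(vals))
	for i, v := range vals {
		elems[i] = &floatElement{v: v} // one allocation per element
	}
	return boxedSeries{elems: elems}
}

func (s boxedSeries) Sum() float64 {
	total := 0.0
	for _, e := range s.elems {
		total += e.Float() // dynamic dispatch plus pointer chase
	}
	return total
}

// flatSeries keeps values contiguous and unexported; callers only see
// accessor methods, so the layout can change without breaking them.
type flatSeries struct {
	vals []float64
}

func newFlat(vals ...float64) flatSeries {
	return flatSeries{vals: append([]float64(nil), vals...)}
}

func (s flatSeries) Len() int            { return len(s.vals) }
func (s flatSeries) Float(i int) float64 { return s.vals[i] }

func (s flatSeries) Sum() float64 {
	total := 0.0
	for _, v := range s.vals {
		total += v // sequential reads over one backing array
	}
	return total
}

func main() {
	b := newBoxed(1, 2, 3)
	f := newFlat(1, 2, 3)
	fmt.Println(b.Sum(), f.Sum(), f.Len(), f.Float(1))
}
```

Hiding the elements behind `Len` and `Float` is what makes the flat layout possible in this sketch: once callers can no longer hold a reference to an individual element, the series is free to store plain values back to back.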

Edit: I have an experimental PR in my local repo with some preliminary refactoring for the aforementioned proposal. I thought it would break the APIs so much that I didn't expect to upstream it.

@kniren
Collaborator

kniren commented Jan 20, 2020

Immutability was a conscious decision during the API design. I understand the potential benefits of mutating in place in terms of performance and memory usage. However, this library is not necessarily focused on extracting the maximum amount of performance, but rather on providing a somewhat safe API for data manipulation.

If there is a real need for more performance, a lot more thought should be put in the memory layout of the data and other operations. As @typeless mentions, performance is bottlenecked by the way that the current memory model works. I initially designed for 'code reusability', but after a lot more experience with low level programming, I'm not sure it was the right call.
