`ADR Suggestion` DescriptorArray #95

henrikjacobsenfys · 2025-01-17T12:58:41Z

henrikjacobsenfys
Jan 17, 2025
Maintainer

For EasyCrystallography and EasyDiffraction, we need to have Descriptors that are arrays. We agreed that it should be named DescriptorArray.

It will inherit from DescriptorBase, and will be implemented as scipp variables using scipp.array.

It will have all the same methods as DescriptorNumbers, including element-wise +, -, abs, and neg.

We will follow NumPy conventions. This means that * and / can only be used with Numbers or DescriptorNumbers, and will act element-wise.

Matrix multiplication and division will be implemented as well, using Numpy. We will follow the Numpy naming scheme. Numpy Arrays and DescriptorArrays will be accepted inputs.
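As a rough illustration of the NumPy conventions referred to above, the sketch below uses plain NumPy arrays (not DescriptorArray, whose API is not shown in this thread) to contrast scalar multiplication, which acts on every element, with the matrix product, which NumPy exposes as `np.matmul` and the `@` operator. The variable names are only for illustration.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])

# Multiplying by a scalar acts on every element independently.
scaled = a * 2.0

# The matrix product follows NumPy naming: np.matmul, or equivalently "@".
product = a @ b
same_product = np.matmul(a, b)
```

Here `b` swaps the two columns of `a`, so the matrix product differs visibly from any element-wise result.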

rozyczko · 2025-01-19T15:39:19Z

rozyczko
Jan 19, 2025
Maintainer

I think this is all correct but should be converted to a task in the EasyScience project https://github.com/orgs/EasyScience/projects/21
We have all agreed that this should be done and the new class should have the same behaviour as the other derived classes for different types.

New features, especially extensions of existing functionality, are best described on the project board, as they are then easy to manage in the workflow TODO -> IN PROGRESS -> DONE.

0 replies

henrikjacobsenfys · 2025-01-27T10:03:28Z

henrikjacobsenfys
Jan 27, 2025
Maintainer Author

We will also allow addition etc. with DescriptorNumbers

0 replies

elindgren · 2025-02-20T14:58:45Z

elindgren
Feb 20, 2025
Collaborator

Broadcasting operations will be allowed. However, when the operands carry variances, the covariances that broadcasting introduces between elements of the resulting array will not be tracked.

For example, addition of a DescriptorArray a with a DescriptorNumber b, using the syntax a+b, will be allowed. In the case when a and b have variances, the variance of each element of the sum will simply be the sum of the variances. The covariance $\mathrm{cov}(a_i + b, a_j + b) = \mathrm{var}(b)$ (for $i \neq j$, with independent $a_i$, $a_j$ and $b$) that would be introduced will not be considered. A warning will be raised to indicate to the user that the covariances introduced by the broadcasting operation have been suppressed.
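The rule described above can be sketched in plain Python. This is only an illustration under stated assumptions: `add_broadcast` is a hypothetical helper operating on lists, not the DescriptorArray implementation, and the warning text is made up.

```python
import warnings


def add_broadcast(values, variances, b_value, b_variance):
    """Add a scalar b to every element, summing variances element by element.

    Hypothetical helper: the covariance var(b) that the shared term b
    introduces between different output elements is deliberately dropped.
    """
    if b_variance > 0 and len(values) > 1:
        warnings.warn(
            "Covariances introduced by broadcasting are ignored.",
            UserWarning,
            stacklevel=2,
        )
    summed = [v + b_value for v in values]
    summed_variances = [var + b_variance for var in variances]
    return summed, summed_variances


# Record the warning so it can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result, result_variances = add_broadcast(
        [1.0, 2.0, 3.0], [0.5, 0.5, 0.5], b_value=10.0, b_variance=0.25
    )
```

Each output variance is 0.5 + 0.25, while the off-diagonal covariance of 0.25 between any two outputs is not stored anywhere, which is what the warning signals.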

0 replies

elindgren · 2025-02-24T14:48:48Z

elindgren
Feb 24, 2025
Collaborator

We need to define what the resulting type should be when performing operations with Numpy arrays. There are two alternatives:

Option 1: The result is always a DescriptorArray

a = np.array(...)
b = DescriptorArray(...)

c = a + b  # c is a DescriptorArray
c = b + a  # c is a DescriptorArray

Option 2: The result type is determined by the operand to the left of the operator

a = np.array(...)
b = DescriptorArray(...)

c = a + b  # c is a Numpy array
c = b + a  # c is a DescriptorArray

Option 1 might initially be the most intuitive, but it adds complexity: we would have to implement NumPy's __array_ufunc__ protocol to intercept ufuncs such as np.add.

Option 2 is consistent with how unit conversions work, but is not consistent with how operations on other types of objects with DescriptorArrays work. For example, addition with a list always yields a DescriptorArray. In order to implement Option 2, we should implement the __array__ method to allow Numpy to handle things.

This discussion was sparked by an issue I encountered when implementing __radd__ for DescriptorArrays with Numpy arrays. See this StackOverflow thread for someone else who has encountered the same issue.
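For reference, here is a minimal sketch of one way to make the reflected addition fire for a NumPy array on the left. By default NumPy typically treats an unknown object as a scalar and broadcasts over it, calling `__radd__` once per element. Setting `__array_ufunc__ = None` is NumPy's documented opt-out: `ndarray` binary operators then return `NotImplemented` and Python falls back to `__radd__`. The `Wrapped` class is a hypothetical stand-in and not the DescriptorArray implementation.

```python
import numpy as np


class Wrapped:
    """Hypothetical stand-in for DescriptorArray; not the EasyScience class."""

    # Documented NumPy opt-out: ndarray binary operators return NotImplemented
    # for this type, so Python falls back to the reflected method below.
    __array_ufunc__ = None

    def __init__(self, values):
        self.values = np.asarray(values, dtype=float)

    def __add__(self, other):
        if isinstance(other, Wrapped):
            other = other.values
        elif not isinstance(other, (np.ndarray, list, int, float)):
            return NotImplemented
        return Wrapped(self.values + np.asarray(other, dtype=float))

    # Addition is commutative here, so the reflected form can reuse __add__.
    __radd__ = __add__


a = np.array([1.0, 2.0])
b = Wrapped([10.0, 20.0])

left = a + b   # ndarray.__add__ defers, so Wrapped.__radd__ runs
right = b + a  # Wrapped.__add__ runs directly
```

With this opt-out both orderings return the wrapper type, which corresponds to Option 1 for operators only; calling `np.add(a, b)` directly would raise a `TypeError` instead of being handled.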

0 replies

damskii9992 · 2025-02-25T10:25:53Z

damskii9992
Feb 25, 2025
Maintainer

Looking at the kind of rabbit hole this is turning into, I am re-considering the potential use-case for this functionality.

While it is easy to try to make a "perfect" class that interfaces with everything, I don't think we want our users to use numpy arrays with our descriptors and parameters. We should encourage, and here force, the users to use scipp arrays or our own classes instead.

So I propose to not implement arithmetic operations for numpy arrays.

0 replies

elindgren · 2025-02-27T13:56:46Z

elindgren
Feb 27, 2025
Collaborator

We should also support a trace operation for the descriptor array

0 replies

elindgren · 2025-02-27T16:06:48Z

elindgren
Feb 27, 2025
Collaborator

Should the DescriptorArray take an argument dims if anything other than a 2D array is supplied?

0 replies

henrikjacobsenfys · 2025-02-28T08:29:36Z

henrikjacobsenfys
Feb 28, 2025
Maintainer Author

I am a bit late to this, but I think we need to discuss what the purpose of DescriptorArray is. It was originally meant to be used in EasyCrystallography to handle space group operations, which at first sight consist of a list of 3x3 matrices and 1x3 (or 3x1?) vectors.

However, EasyCrystallography already has a class to handle space group operations, including all the required matrix operations and things I had not even thought to consider such as tolerance*. This is why I made the DescriptorAnyType instead, to function basically like the old Descriptor did in EasyCrystallography.
We discussed that it may be ideal to rewrite EasyCrystallography to use more specialized Descriptors, but I'm not sure if it's worth the effort, especially considering things like tolerance.

I am not sure if we have any other use cases for DescriptorArray at the moment.

*I believe it's for the following: using all the space group operations might create duplicate atom positions, within some numerical tolerance. For the sake of example, it may create atoms at (0,0,0) and (0.01,0,0), which would be considered equal.
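To make the tolerance idea concrete, here is a small sketch of tolerance-based de-duplication of positions. It is an assumption about the general technique, not a description of what EasyCrystallography actually does: `deduplicate` is a hypothetical helper, and it ignores periodic wrapping of fractional coordinates.

```python
def deduplicate(positions, tolerance):
    """Keep the first of any positions that agree within `tolerance` per axis.

    Hypothetical helper for illustration; no periodic boundary handling.
    """
    unique = []
    for pos in positions:
        is_duplicate = any(
            all(abs(p - k) <= tolerance for p, k in zip(pos, kept_pos))
            for kept_pos in unique
        )
        if not is_duplicate:
            unique.append(pos)
    return unique


atoms = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.5, 0.5, 0.5)]
kept = deduplicate(atoms, tolerance=0.02)
```

With a tolerance of 0.02, the atoms at (0,0,0) and (0.01,0,0) from the example above collapse into one.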

8 replies

damskii9992 Feb 28, 2025
Maintainer

The old Descriptor, before we made DescriptorNumber, DescriptorString and DescriptorBool was made to be universal and take any types. But because of this it did not implement much functionality, mostly just type handling.
We started working on DescriptorArray because in EasyCrystallography we have these rotation matrices and we needed a type to handle them, hence we figured we should extend our suite of Descriptor types with an array type.
Henrik then found out that the way the Descriptor was used in EasyCrystallography could not simply be replaced by the DescriptorArray we had started designing; it would require some rework of EasyCrystallography too.
But instead of doing this work we just implemented the DescriptorAnyType, which works like the old Descriptor, and used that instead, meaning we now don't "need" a DescriptorArray, because we have the DescriptorAnyType.
But because this DescriptorAnyType is just a band-aid and a temporary workaround, we do still need to implement the DescriptorArray, after which we will also need to work on EasyCrystallography so that it no longer needs the DescriptorAnyType.

elindgren Feb 28, 2025
Collaborator

Alright, thanks for the clarification @damskii9992. Does this mean that the spec for DescriptorArray is still as initially decided upon (i.e., what I've implemented + matrix multiplication), or should the spec be changed?

damskii9992 Feb 28, 2025
Maintainer

The spec is still the same :)

henrikjacobsenfys Feb 28, 2025
Maintainer Author

DescriptorAnyType has no methods beyond the basics like repr and value.

The original Descriptor could contain anything, and had, if I recall correctly, quite complicated logic in the methods since it had to check if it was a string, a number and so on. We therefore refactored it into DescriptorString, DescriptorNumber etc., with the idea that they would be easier to maintain and the logic would be simpler.
It turns out that we missed how EasyCrystallography used Descriptor, which was as a container of symmetry operations in its SpaceGroup class. The quickest fix was to add DescriptorAnyType and let it play the role of the old Descriptor.

The difficulty is that the symmetry operations are not, as I initially thought, simply an array. I thought I could make a DescriptorArray to replace this array and call it a day. However, a symmetry operation, as used in SpaceGroup, is an object containing two arrays and a couple of other properties as well. We can replace the arrays in SpaceGroup with DescriptorArrays, but what about the other properties? Ultimately, I'm not sure the solution would be significantly better than what's currently implemented.

I'm sorry for not writing these thoughts down earlier.

henrikjacobsenfys Feb 28, 2025
Maintainer Author

If the goal is to not use DescriptorAnyType anywhere, then I think the correct approach is to start from EasyCrystallography and work out a set of requirements from there. Otherwise we risk making a DescriptorArray which is either overengineered to solve problems we don't have, or isn't quite what we need.

elindgren · 2025-03-03T15:02:02Z

elindgren
Mar 3, 2025
Collaborator

We will not support editing part of a DescriptorArray by calling __setitem__. Instead, one should either create a new DescriptorArray or edit the underlying scipp.Variable array.

The reason for this change has to do with us not implementing views for the DescriptorArray. That means that slicing currently returns a new DescriptorArray, e.g.,

arr = DescriptorArray(...)
sub_arr = arr['dim0', 0]  # also a DescriptorArray

Slicing is done using scipp syntax. sub_arr is now a NEW DescriptorArray, with a new unique_name, but also without a View of the underlying scipp.Variable array. This means that the following assignment does not work as expected:

arr = DescriptorArray(values=[1.0, 1.0])
arr['dim0', 0][0] = 1000.0  # arr['dim0', 0] is a completely new DescriptorArray and scipp array
arr['dim0', 0][0]  # Still 1.0

What we would want is for the above syntax to modify the scipp array of arr, but that requires working with views of the underlying scipp data and either deleting the old arr and replacing it with the new one, or coming up with a way to modify it in place.

For now we decided to drop __setitem__ functionality, as we judged it to be of limited use.

See numpy and scipp docs for further details on their respective systems for slicing and views.
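The copy-versus-view behaviour described above can be reproduced with a toy container. This is a sketch under stated assumptions: `CopyOnSlice` is a hypothetical class backed by a Python list, it only accepts slice indices, and it does not use the scipp slicing syntax of the real DescriptorArray.

```python
class CopyOnSlice:
    """Hypothetical container whose slicing returns an independent copy."""

    def __init__(self, values):
        self.values = list(values)

    def __getitem__(self, index):
        # Only slices are supported in this sketch. Slicing a list copies it,
        # so the new object shares no storage with the original.
        return CopyOnSlice(self.values[index])


arr = CopyOnSlice([1.0, 1.0])

sub = arr[0:1]
sub.values[0] = 1000.0           # modifies the copy only
after_copy_edit = list(arr.values)

arr.values[0] = 1000.0           # editing the underlying storage directly works
```

Writing through the sliced object leaves `arr` untouched, whereas editing the underlying storage changes it, which mirrors the recommendation to edit the underlying scipp.Variable or create a new DescriptorArray.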

0 replies
