Description
The docstring of the corresponding add_*_constraint method claims the following:
Since we expect aggregate_column to be a numeric column, this leads to a multiset of aggregated values. These values should correspond to the integers ranging from start_value to the cardinality of the multiset.
Hence, if we have, for a given key, n rows (in other words, n is the cardinality of the multiset) and a start_value of k, I would expect a range to be complete if exactly the following rows exist (see the sketch after the list):
(key, k)
(key, k+1)
...
(key, k+n-1)
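For illustration, this is a minimal sketch of that reading in plain Python (the helper name is mine and not part of the datajudge API):

```python
def is_complete_range(values, start_value):
    # Docstring reading: with n = len(values) observed values for a key, the
    # multiset must consist of exactly the integers
    # start_value, start_value + 1, ..., start_value + n - 1.
    return sorted(values) == list(range(start_value, start_value + len(values)))
```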
Yet, the implementation checks the following (datajudge/src/datajudge/constraints/groupby.py, lines 36 to 37 at commit 0350318):
First, it revolves around the maximal encountered value instead of the cardinality of the multiset. Second, the start value is added to said maximum.
It is easy to come up with an example where the two outlined behaviours diverge. Assume start_value equals k and the observed rows are the following:
(key, k)
(key, k+1)
(key, k+2)
According to the former definition, as described in the docstring, this would be a legitimate key.
According to the latter definition, we would expect the following rows to exist:
(key, k)
(key, k+1)
(key, k+2)
...
(key, k+2+k)
Since the rows beyond (key, k+2) do not exist, the current key would be flagged as a failure for some k.
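To make the divergence concrete, here is a small plain-Python illustration of both readings for k = 5 (the variable names are mine; none of this is datajudge code):

```python
k = 5                          # start_value
observed = [k, k + 1, k + 2]   # observed values for the key: 5, 6, 7

# Docstring reading: n = 3 rows, so the expected values are k .. k + n - 1 = 5 .. 7.
expected_docstring = list(range(k, k + len(observed)))
print(sorted(observed) == expected_docstring)  # True -> the key is complete

# Reading of the implementation described above: the upper end is anchored at
# the maximal observed value plus start_value, i.e. (k + 2) + k = 12, so rows
# up to (key, 12) would be required and the key is flagged.
print(max(observed) + k)  # 12
```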
We do not notice this diverging behaviour in our tests because they only use start_value=1.
What is the intended behaviour?