Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source priority input #2022

Merged
merged 1 commit into from
Feb 11, 2025
Merged

Source priority input #2022

merged 1 commit into from
Feb 11, 2025

Conversation

SouthEndMusic
Copy link
Collaborator

@SouthEndMusic SouthEndMusic commented Jan 29, 2025

Fixes #1505.

To do:

  • Update QGIS plugin based on table changes
  • Allow for setting source priority with add API and/or separate function
  • Process the source priority data in the core
  • Make column renaming priority -> demand_priority a non-breaking migration step
  • Fix tests
  • Update docs

@SouthEndMusic SouthEndMusic requested a review from evetion January 29, 2025 12:34
@SouthEndMusic SouthEndMusic marked this pull request as draft January 29, 2025 12:34
@SouthEndMusic SouthEndMusic added the breaking A change that breaks existing models label Jan 29, 2025
Copy link
Member

@evetion evetion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you describe the design in this PR? From what I remember we need a configurable priority per nodetype only? And if we needed to override that (not sure), there can be a column in the Node table.

Given that context, why do we need a python/ribasim/ribasim/allocation/source_priorities.py?

core/test/docs.toml Outdated Show resolved Hide resolved
@SouthEndMusic
Copy link
Collaborator Author

Can you describe the design in this PR?

@evetion It's in the issue: #1505

@evetion
Copy link
Member

evetion commented Jan 29, 2025

Can you describe the design in this PR?

@evetion It's in the issue: #1505

It's still nice to summarize what the changes are in a PR description (and later commit message). Can be very short, as in rename field to demand_field, adds table X, adds config option, especially given the number of files changed here.

@SouthEndMusic SouthEndMusic removed the breaking A change that breaks existing models label Jan 29, 2025
@SouthEndMusic
Copy link
Collaborator Author

A remark on tests that were changed: In some cases the allocated amounts actually changed because a change in order of sources and how they are clustered together. But more generally, it is a bad idea to test based on obtaining the values from the variables (e.g. allocation_model.problem[:F]) directly, since these are the values from the last optimization only, and in particular only for one source priority. It is better to look at allocation_model.flow, which is cumulative over all flows for a certain demand priority.

@SouthEndMusic SouthEndMusic marked this pull request as ready for review February 6, 2025 14:43
@SouthEndMusic SouthEndMusic requested review from evetion and visr February 6, 2025 14:43
Copy link
Member

@evetion evetion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! 💯 Could you comment on your approach in general?

I'm having a hard time to understand the refactor, mostly because of the rename with AllocationSourceType (going from the old boundary_node basin main_to_sub user_return buffer to user_demand, boundary, level_demand, flow_demand and subnetwork_inlet?). It would be good to document that. Same for the AllocationSource functions using the Val.

The changes in the tests slightly worry me, do you/we fully understand the changes in the source order? Are there other tests that assert correct behavior (if any)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with all the added newlines? Best to remove them (from the generator script).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def migration_func(df: DataFrame, schema_version: int) -> DataFrame:
if schema_version < 4:
warnings.warn(
f"Migrating outdated {node_type} / {table_type} table.", UserWarning
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That warning is not so important that we need to adhere to a specific format, feel free to move the function outside of the loop.

core/test/docs.toml Outdated Show resolved Hide resolved
core/src/allocation_init.jl Outdated Show resolved Hide resolved
docs/reference/usage.qmd Outdated Show resolved Hide resolved
python/ribasim/ribasim/config.py Outdated Show resolved Hide resolved
if !(
(type isa DataType) &&
type <: Legolas.AbstractRecord &&
hasfield(type, :demand_priority)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I appreciate this refactor, I'd say the previous code was more readable. I now have to look up all tables with demand_priority to get the same understanding. Just a general remark, it's a fine line between preventing duplication bugs and readability.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. On the other hand, if we add a new table with a demand_priority_column, this is typically something that we would forget to update and then the problem might be hard to locate

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably something that would have to be undone if we were to prepare our code for static compilation?

core/src/read.jl Outdated Show resolved Hide resolved
core/src/read.jl Outdated Show resolved Hide resolved
@SouthEndMusic
Copy link
Collaborator Author

It seems that with these changes we're back in a situation where the allocation results are not deterministic. This might be due to the fact that now again in certain situations the best result (as measured by the loss function) is not unique. For instance, say there are two sources in a subnetwork that have the same priority and thus are used within the same optimization problem. If these two sources can both supply to a certain demand, there is no constraint that pins down which fraction of the demand comes from which of the 2 sources.

We can solve this by going back to optimizing per source. At the moment we do not enforce unique source priorities so in order to have a fixed order for the sources, per source priority the sources are ordered by node ID. If desired the modeler can set specific priorities for nodes to deviate from this behavior.

@FaridAlavi, is there a general way in which you avoid non-uniqueness problems like this? I just read about using a secondary objective. So first optimize for the allocation of demands, and then when those are fixed, solve for a secondary objective, say retrieving the same amount from every source, to make the solution unique.

@@ -36,614 +36,779 @@ def migrate(cls, df: Any, schema_version: int) -> Any:

class BasinConcentrationExternalSchema(_BaseSchema):
fid: Index[Int32] = pa.Field(default=1, check_name=True, coerce=True)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure where all these white spaces come from

core/test/docs.toml Outdated Show resolved Hide resolved
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if !(
(type isa DataType) &&
type <: Legolas.AbstractRecord &&
hasfield(type, :demand_priority)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. On the other hand, if we add a new table with a demand_priority_column, this is typically something that we would forget to update and then the problem might be hard to locate

if !(
(type isa DataType) &&
type <: Legolas.AbstractRecord &&
hasfield(type, :demand_priority)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably something that would have to be undone if we were to prepare our code for static compilation?

@SouthEndMusic
Copy link
Collaborator Author

SouthEndMusic commented Feb 10, 2025

@evetion a comment on the approach in general:

In Ribasim Python, source priorities can now be added via the source_priority field of the Node class passed to the add API. This optional information is then added to the new source_priority column of the Node table. To void confusion, the old priority was renamed to demand_priority.

In the core the default or overridden source priorities are assigned to the source nodes in each subnetwork. Currently the sources within a subnetwork with the same priority are solved for at the same time which yield (more) situations with non-uniqueness of an optimal solution.

Regarding the rename of the AllocationSourceType: I noticed that there was a discrepancy between the source types we agreed upon in the design session for this issue and what was already present. I figured that it was nice to keep the internal enumeration and the names in the configuration in sync by defining on explicitly from the other.

Regarding the AllocationSource constructors: that was a design decision, not explicitly required for this PR. I figured it would be nice to dispatch on the source type in stead of having a lot of if ... elseif ... end. I accepted the price of having runtime dispatch in this initialization code. I had a discussion with @jakobnissen on slack a while back on this topic, that you should be able to do something like this in a type stable manner if you know at compile time what all possible options are.

@visr visr force-pushed the source_priority_input branch from 125889f to b70895f Compare February 11, 2025 12:56
@SouthEndMusic
Copy link
Collaborator Author

My latest commit reverts back to solving allocation per source to reduce the likelihood of non-unique optimal solutions which can lead to non-deterministic behavior. Because of this the allocation tests give the same results as main again so @evetion can sleep soundly 😴 The changes by @visr tried to fix a problem with CI which turned out to be unrelated to this PR.

Regarding non-unique solutions in general: they are still possible even now, for instance if flow can take 2 different routes from a source to a demand. There is a new issue for addressing this problem rigorously: #2051

@SouthEndMusic SouthEndMusic merged commit 193d793 into main Feb 11, 2025
4 of 23 checks passed
@SouthEndMusic SouthEndMusic deleted the source_priority_input branch February 11, 2025 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handling default and non-default source priorities in allocation
2 participants