53 changes: 53 additions & 0 deletions src/libexpr/primops.cc
@@ -3398,6 +3398,59 @@ static RegisterPrimOp primop_mapAttrs({
    .fun = prim_mapAttrs,
});


static void prim_concatMapAttrs(EvalState & state, const PosIdx pos, Value ** args, Value & v)
{
    state.forceAttrs(*args[1], pos,
        "while evaluating the second argument passed to builtins.concatMapAttrs");
    auto inAttrs = args[1]->attrs();

    std::map<SymbolIdx, Value*> attrMap;
Contributor

I think this must use traceable_allocator so that the map's allocations are visible to the GC. Otherwise they are invisible to Boehm, and the values they point to can be freed.

Contributor

@xokdvium, Sep 18, 2025

It might make sense to use a small buffer optimization to avoid unnecessary allocations. As it stands now this will be very costly actually.
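
To illustrate the small buffer idea raised here, the following is a minimal, self-contained sketch. `SmallVec` is a hypothetical name introduced only for this example; it is not the container used in the PR or in Nix, and a real patch would more plausibly reach for an existing small-buffer container such as `boost::container::small_vector`. The sketch only shows the mechanism: the first `N` elements live inline, and a heap allocation happens only on overflow.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Nix: stores up to N elements inline
// and spills to a heap-allocated std::vector only when it grows past N.
template<typename T, std::size_t N>
class SmallVec
{
    std::array<T, N> inlineBuf{}; // lives wherever the SmallVec lives (e.g. the stack)
    std::vector<T> heapBuf;       // used only after more than N elements
    std::size_t count = 0;

public:
    void push_back(const T & x)
    {
        if (count < N) {
            inlineBuf[count] = x; // common case: no allocation
        } else {
            if (count == N)       // first overflow: copy inline elements to the heap
                heapBuf.assign(inlineBuf.begin(), inlineBuf.end());
            heapBuf.push_back(x);
        }
        ++count;
    }

    bool onHeap() const { return count > N; }
    std::size_t size() const { return count; }

    const T & operator[](std::size_t i) const
    {
        assert(i < count);
        return onHeap() ? heapBuf[i] : inlineBuf[i];
    }
};
```

With a suitably chosen `N`, attribute sets that yield only a handful of entries would never touch the allocator. Whether the heap fallback also needs a GC-aware allocator is the separate question raised in the comment above.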


    for (auto &i : *inAttrs) {
        Value * vName = Value::toPtr(state.symbols[i.name]);
        Value * funApp = state.allocValue();
        funApp->mkApp(args[0], vName);
        Value * result = state.allocValue();
        result->mkApp(funApp, i.value);
Comment on lines +3414 to +3415
Contributor

What's the point of allocating the result on the heap? It could just be a variable on the stack, no?

        state.forceAttrs(*result, pos,
            "while evaluating the result of the function passed to builtins.concatMapAttrs");

        // Direct insertion into map - automatically handles deduplication
        // If duplicate keys exist, this overwrites with the later value (last-write-wins)
        for (auto &j : *result->attrs()) {
            attrMap[j.name] = j.value;
        }
    }

    auto attrs = state.buildBindings(attrMap.size());
    for (const auto& [name, value] : attrMap) {
        attrs.insert(name, value);
    }
Comment on lines +3427 to +3429
Contributor

std::ranges::copy?

Member

For large inputs, a k-way merge would make more sense.
The pkgs/by-name attrset, if* implemented with concatMapAttrs, would become a mere linear-time operation, possibly faster than the current in-expression binary merge "tree" algorithm.

*: Nixpkgs could decide based on `builtins ? concatMapAttrs`.

Contributor

I think if we can expose the Bindings::iterator as a standalone k-way merge utility this is pretty easy to implement. We can resurrect your old k-way merge branch for this @roberth.
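
For readers unfamiliar with the technique, here is a minimal sketch of a k-way merge with last-write-wins semantics. It is not the code from the branch mentioned above and does not use Nix's `Bindings` type: `Attr`, `AttrList`, and `kWayMerge` are hypothetical stand-ins, with plain strings as names and ints as values. It assumes each input list is already sorted by name with unique names, and uses a binary heap, which gives O(n log k) for n attributes across k lists.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for Nix types (assumption): a name is a string, a value is an int.
using Attr = std::pair<std::string, int>;
using AttrList = std::vector<Attr>; // sorted by name, names unique within a list

// Merge k sorted attribute lists into one sorted list.
// On duplicate names, the attribute from the later list wins.
AttrList kWayMerge(const std::vector<AttrList> & inputs)
{
    // One cursor per input list: which list, and how far into it we are.
    struct Cursor { const std::string * name; std::size_t list; std::size_t pos; };

    // Min-heap on name; ties pop the earlier list first,
    // so that later lists get to overwrite it.
    auto greater = [](const Cursor & a, const Cursor & b) {
        if (*a.name != *b.name) return *a.name > *b.name;
        return a.list > b.list;
    };
    std::priority_queue<Cursor, std::vector<Cursor>, decltype(greater)> heap(greater);

    for (std::size_t i = 0; i < inputs.size(); ++i)
        if (!inputs[i].empty())
            heap.push({&inputs[i][0].first, i, 0});

    AttrList out;
    while (!heap.empty()) {
        Cursor c = heap.top();
        heap.pop();
        const Attr & attr = inputs[c.list][c.pos];
        if (!out.empty() && out.back().first == attr.first)
            out.back().second = attr.second; // same name: later list wins
        else
            out.push_back(attr);
        if (c.pos + 1 < inputs[c.list].size())
            heap.push({&inputs[c.list][c.pos + 1].first, c.list, c.pos + 1});
    }
    return out;
}
```

Because the output is produced in sorted order, a builder fed from such a merge would not need an intermediate `std::map` at all. When the inputs are disjoint and few, the heap could be replaced by a simpler scan.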

Contributor

> if* implemented with concatMapAttrs

Wouldn't this also imply some overhead for the function calls? We could maybe implement an optimization for identity functions though

Member

> resurrect [...] @roberth

https://github.com/NixOS/nix/pull/11290/files#diff-f118e4c6f6e02148b887fdf627352311fca5a3a4eadf0b4a9d9f348e0be464ffR1904

> imply some overhead for the function calls?

I think it's naturally a concatMapAttrs, something like

concatMapAttrs
  (shard: type:
    lib.optionalAttrs (type == "directory") (
      readShard shard
    )
  )
  (readDir ../by-name)

No identity function needed.

Member

> We could maybe implement an optimization for identity functions though

Might be useful for other cases, idk. Not the most promising, but tracking in


    v.mkAttrs(attrs.alreadySorted());
}


static RegisterPrimOp primop_concatMapAttrs({
    .name = "__concatMapAttrs",
    .args = {"f", "attrset"},
    .doc = R"(
      Apply function *f* to the name and value of every attribute of *attrset*,
      and merge the resulting attribute sets into one. For example,

      ```nix
      builtins.concatMapAttrs (name: value: { ${name} = value; "${name}-${value}" = value; }) { foo = "fizz"; bar = "buzz"; }
      ```

      evaluates to `{ foo = "fizz"; foo-fizz = "fizz"; bar = "buzz"; bar-buzz = "buzz"; }`.

      If multiple applications of *f* return attribute sets with the same attribute name,
      the last write wins: the value from the attribute set that was processed later is kept in the final result.
    )",
    .fun = prim_concatMapAttrs,
});
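
The documented semantics can be modelled outside the evaluator. The sketch below is a stand-alone illustration, not the primop itself: `AttrSet`, `concatMapAttrs`, and `docExample` are hypothetical names, values are plain strings, and the model visits the input in lexicographic name order because that is how `std::map` iterates. The order in which the real primop visits attributes is that of the input `Bindings`, which this model does not attempt to reproduce.

```cpp
#include <functional>
#include <map>
#include <string>

// Stand-in for a Nix attribute set (assumption): string names, string values.
using AttrSet = std::map<std::string, std::string>;
using AttrFn = std::function<AttrSet(const std::string &, const std::string &)>;

// Apply f to every (name, value) pair and merge the resulting sets.
// A name produced more than once keeps the value written last.
AttrSet concatMapAttrs(const AttrFn & f, const AttrSet & input)
{
    AttrSet out;
    for (const auto & attr : input)
        for (const auto & produced : f(attr.first, attr.second))
            out[produced.first] = produced.second; // last write wins
    return out;
}

// The example from the doc string, expressed against the model.
AttrSet docExample()
{
    AttrFn f = [](const std::string & name, const std::string & value) {
        AttrSet produced;
        produced[name] = value;
        produced[name + "-" + value] = value;
        return produced;
    };
    AttrSet input;
    input["foo"] = "fizz";
    input["bar"] = "buzz";
    return concatMapAttrs(f, input);
}
```

In this model `docExample()` yields the four attributes `bar`, `bar-buzz`, `foo`, and `foo-fizz`. A function that always returns the same name collapses the result to a single attribute holding the value from whichever input attribute was visited last.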


static void prim_zipAttrsWith(EvalState & state, const PosIdx pos, Value ** args, Value & v)
{
    // we will first count how many values are present for each given key.