Conversation

@lefou
Member

@lefou lefou commented Nov 27, 2025

Fix: #6226

I just hardcoded the limit to 2 for now.

Pull request: #6260

@lefou lefou force-pushed the tr-scalajs-linker-parallel branch from 72beb07 to d2aa908 Compare November 27, 2025 17:26
Contributor

@davesmith00000 davesmith00000 left a comment

LGTM, thanks for picking this up @lefou. 🙏

It's kind of unsatisfying having to hardcode a value here, but I believe that setting a sensible hardcoded value is better than people accidentally falling into an OutOfMemory black hole by performing a common action, such as running all their tests.

def scalaJSWorker: Worker[ScalaJSWorker] = Task.Worker {
  new ScalaJSWorker(
    jobs = Task.ctx().jobs,
    linkerJobs = 2
  )
}
Contributor

Just a suggestion: perhaps this value could be exposed on ScalaJSModuleAPI? It could have a nice low default but allow people to tweak it to their needs, or based on some environmental heuristic, e.g. they have a massive CI server and can afford to open up the parallelism.

Exposing it on the API also slightly improves the transparency around what's going on here, but perhaps this will need to be documented somehow?
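
A rough sketch of what that might look like (the task name scalaJSLinkerJobs and its exact signature are assumptions here, not Mill's actual API):

trait ScalaJSModule extends mill.scalalib.ScalaModule {
  // Hypothetical config task: conservative default, overridable per build,
  // e.g. raised on a large CI machine that can afford more parallel link jobs.
  def scalaJSLinkerJobs: Task[Int] = Task { 2 }

  def scalaJSWorker: Worker[ScalaJSWorker] = Task.Worker {
    new ScalaJSWorker(
      jobs = Task.ctx().jobs,
      linkerJobs = scalaJSLinkerJobs()
    )
  }
}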

Member Author

@lefou lefou Nov 28, 2025

Yeah, I already thought about how to configure it, but didn't want to overengineer it.

The natural place for a config task would be the ScalaJSWorker, which is currently not designed to be customized in the way other workers are, for example the JvmWorkerModule. Also, since there is potentially more than one ScalaJSWorker, we would need to introduce a new shared worker, so this route isn't a trivial change.

What would be somewhat easier is accepting an environment variable.
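
Roughly something like this (the variable name MILL_SCALAJS_LINKER_JOBS is made up for the sketch):

// Hypothetical: allow an opt-in override via the environment, falling back
// to the conservative default of 2 parallel linker jobs.
val linkerJobs: Int =
  sys.env.get("MILL_SCALAJS_LINKER_JOBS").flatMap(_.toIntOption).getOrElse(2)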

Also, we should converge on a "sensible default". I don't work with Scala.js often, so I have no "feeling" for what a good value might be. We might also apply some logic based on heuristics, but I don't have any.

Contributor

I'm not sure what heuristics you could sensibly apply, and I suspect attempting that might be a lot of work for not much reward. 🤷

FWIW, @lolgab was suggesting a concurrency of 1 in a discussion on Discord, and I'm using 2 in CI:
https://github.com/PurpleKingdomGames/indigoengine/blob/main/ci.sh#L9-L10

@lefou
Member Author

lefou commented Dec 5, 2025

I'm just witnessing a fatal OOM in the coursier release process: https://github.com/coursier/coursier/actions/runs/19949228402/job/57231194898

 [3549] core.js[2.12.20].resolvedMvnDeps 658s
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid2482.hprof ...
Heap dump file created [6755211786 bytes in 76.676 secs]
Exception in thread "Process ID Checker Thread" java.lang.OutOfMemoryError: Java heap space
	at java.base/jdk.internal.misc.Unsafe.allocateInstance(Native Method)
	at java.base/java.lang.invoke.DirectMethodHandle.allocateInstance(DirectMethodHandle.java:501)
	at java.base/java.lang.invoke.DirectMethodHandle$Holder.newInvokeSpecial(DirectMethodHandle$Holder)
	at java.base/java.lang.invoke.Invokers$Holder.linkToTargetMethod(Invokers$Holder)
	at mill.server.Server$.checkProcessIdFile(Server.scala:484)
	at mill.server.Server$.$anonfun$7(Server.scala:511)
	at mill.server.Server$$$Lambda/0x00007f12b00ffd38.run(Unknown Source)
	at java.base/java.lang.Thread.runWith(Thread.java:1596)
	at java.base/java.lang.Thread.run(Thread.java:1583)

I think I want to merge this.

@davesmith00000
Contributor

On the bright side: At least it is reproducible, and we know what the problem is. 🙂

@lolgab
Member

lolgab commented Dec 5, 2025

@lefou Are we sure this solves the problem? The problem is not about limiting the parallelism, but about limiting the memory consumption of the Scala.js linkers. Even if we limit it to 2, we still have a cache that stores up to ctx.jobs linkers. So we could still end up allocating the same amount of memory, no?

@davesmith00000
Contributor

@lolgab I guess that depends on the implementation. Currently I work around the problem by manually forcing a concurrency limit, in order (to my naive understanding) to avoid the system choking / memory thrashing.

https://github.com/PurpleKingdomGames/indigoengine/blob/main/ci.sh#L9-L10

@lefou
Member Author

lefou commented Dec 5, 2025

TBH, I have no idea. I'm just trying to solve a blocking issue based on the provided input.

We probably also need to limit the cached jobs to the same size. Alternatively, or in addition, we could try to hold the cache in a soft or weak reference, so that the garbage collector has a chance to evict unused instances. (We did this before for some caches, but I'm not sure that code is still in place, since there have been many rounds of refactoring since.)

Question: What is a good default for parallel linker jobs? This PR uses 2, mostly because that was reported as a good number, but I have no idea what's reasonable.

We should also add some metrics, so we better understand the error cases.

@lolgab
Member

lolgab commented Dec 5, 2025

Alternatively, or in addition, we could try to hold the cache in a soft or weak reference, so that the garbage collector has a chance to evict unused instances. (We did this before for some caches, but I'm not sure that code is still in place, since there have been many rounds of refactoring since.)

This unfortunately doesn't work. We did it before, but it wasn't working because Scala.js needs a cleanup method to be called to clean the cache, otherwise it gets leaked. SoftReference caches can't call finalizers when they get garbage collected.
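
To make the constraint concrete: a size-limited cache would have to invoke the linker's cleanup explicitly on eviction, rather than rely on the GC. A minimal sketch of that shape, where the cleanup callback merely stands in for whatever the real linker cleanup would be:

import java.util.{LinkedHashMap, Map => JMap}

// Hypothetical: a bounded cache that runs an explicit cleanup on eviction,
// instead of relying on GC-driven SoftReference collection.
class EvictingCache[K, V](maxEntries: Int, cleanup: V => Unit)
    extends LinkedHashMap[K, V](16, 0.75f, true) {
  // LinkedHashMap consults this after each put; returning true evicts the
  // eldest entry, so we clean it up before it is dropped.
  override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean = {
    val evict = size() > maxEntries
    if (evict) cleanup(eldest.getValue)
    evict
  }
}

Whether this maps onto the actual ScalaJSWorker caches is a separate question; it only illustrates the eviction-with-cleanup shape.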

@lefou
Member Author

lefou commented Dec 5, 2025

@davesmith00000 Could you by any chance check if this PR as-is fixes your issue (without applying your other workarounds, like limiting --jobs)?

@lefou
Member Author

lefou commented Dec 5, 2025

Regarding the linker state: I don't know what the benefits of not clearing the linker are, but we should be able to auto-clean it after each use. That hopefully means we don't hog unneeded memory, but still keep the JIT-ed classes around.

@lefou
Member Author

lefou commented Dec 5, 2025

But maybe you don't mean the linker state, but the IRFileCache.Cache. I don't know what's best to do here.

@lolgab
Member

lolgab commented Dec 5, 2025

Regarding the linker state: I don't know what the benefits of not clearing the linker are, but we should be able to auto-clean it after each use. That hopefully means we don't hog unneeded memory, but still keep the JIT-ed classes around.

This basically kills the benefit of having a worker, since the Scala.js linker is no longer incremental.

I'm thinking about what the best approach is to avoid OOMs while keeping good parallelism and the incremental state.

@lefou
Member Author

lefou commented Dec 5, 2025

I guess we need some runtime stats, and then decide, based on total and/or relative memory consumption, which caches to keep and which to drop. Theoretically, there are various kinds of data a worker can keep, but not all state might provide the same benefit from being kept. E.g. intermediate compile results can be written to and read from disk, but still provide a benefit over re-computation of the whole result. In the end, a cache so large that it causes OOMs is worse than no cache at all.

A classloader cache is much cheaper than some in-memory cache of intermediate binary results, while still ensuring high performance thanks to JIT-ed bytecode.

@davesmith00000
Contributor

@lefou I thought I'd try quickly testing this during my lunch break, but my efforts have been hampered by the forced upgrade to Scala 3.8.0-RC1 that Mill requires:

  1. 3.8.0-RC1 seems to have some weird behaviours around unused code (sometimes it's wrong...).
  2. 3.8.0-RC1 has reclassified some warnings, it seems, so a lot of patching was required.
  3. I don't understand the relationship between Mill's Scala 3 version, my plugin's Scala 3 version, and my main project's Scala 3 version. Currently, if they aren't aligned, bad things happen.
  4. One of my modules' tests now refuses to compile.

Anyway, in terms of concurrently running fastOptJS, it seems better. I can't be 100% sure until I fix point (4) above, but it was happily linking 8-10 modules concurrently at one point.

@lefou
Member Author

lefou commented Dec 5, 2025

Thank you @davesmith00000! I assume that before, you were not able to have 8-10 link tasks running in parallel.

I'll merge this PR in the hope it helps. At least, it shouldn't make things worse. We can address the ScalaJS worker cache in a separate PR.

@lolgab
Member

lolgab commented Dec 5, 2025

I've been trying to wrap my head around the ScalaJSWorker caching many times. It's complicated.
To recap, this is what we have.

We have a first layer of caching where we have an instance of ScalaJSWorkerImpl for every different Scala.js classloader.
So, more or less, we have an entry for every different scalaJSVersion we have in the process, with a maximum of ctx.jobs.

Then we have a second layer of caching, where we cache the linkers. For every ScalaJSWorkerImpl instance, we have up to ctx.jobs linkers, one for ~every entry of the module/isFullLinkJS matrix.

On top of this, the way mill.util.CachedFactory works is that if you request more entries than maxCacheSize, it allocates a new linker, links, and then disposes it right away.

What I would want is a single limit that is somehow shared by the two caches, so we keep control over the total number of linkers we instantiate, not only over those held in any one of the instantiated ScalaJSWorkerImpls.

Moreover, maybe the behavior we have in mill.util.CachedFactory of creating an instance and dropping it right away is part of the problem. Maybe it should keep an internal semaphore, like the one you implemented for Scala.js, and block whoever tries to create more entries than maxCacheSize?
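
As a rough sketch of that idea (BoundedFactory and its setup/teardown hooks are made-up stand-ins for mill.util.CachedFactory, and the actual caching is omitted to keep it short):

import java.util.concurrent.Semaphore

// Hypothetical: once maxCacheSize entries are in use, additional callers
// block on the semaphore instead of allocating throwaway instances.
class BoundedFactory[K, V](maxCacheSize: Int, setup: K => V, teardown: V => Unit) {
  private val permits = new Semaphore(maxCacheSize)

  def withValue[R](key: K)(use: V => R): R = {
    permits.acquire()
    val value = setup(key)
    try use(value)
    finally {
      teardown(value)
      permits.release()
    }
  }
}

A real version would also keep up to maxCacheSize entries alive for reuse; the point here is only the back-pressure, i.e. that excess callers wait rather than create and immediately dispose extra linkers.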

@davesmith00000
Contributor

I assume that before, you were not able to have 8-10 link tasks running in parallel.

Correct. Previously it would start 8-10 in parallel but grind to a halt.

The behaviour I observe now is that it is completing what it can complete and only getting stuck on the troublesome module, which is some unrelated issue.

Comment on lines 300 to 305
val res = Await.result(resultFuture, Duration.Inf)
linker match {
  case cl: ClearableLinker => cl.clear()
  case _ => // no-op
}
res
Member

This change breaks Scala.js incremental linking.

Member Author

I can revert it. The API docs don't say that this is related to incremental linking.

Member Author

I can revert it.

}
}

private val linkerJobLimiter = ParallelismLimiter(linkerJobs)
Member

@lolgab lolgab Dec 5, 2025

I think you should pass linkerJobs instead of jobs to

    val bridge = cl
      .loadClass("mill.scalajslib.worker.ScalaJSWorkerImpl")
      .getDeclaredConstructor(classOf[Int])
      .newInstance(jobs)
      .asInstanceOf[workerApi.ScalaJSWorkerApi]

Since we are running two linker jobs at a time to save memory, if we store 8 different ones in memory, we aren't saving as much memory as we want.
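
In other words, something like this (assuming linkerJobs is in scope at that point):

    val bridge = cl
      .loadClass("mill.scalajslib.worker.ScalaJSWorkerImpl")
      .getDeclaredConstructor(classOf[Int])
      .newInstance(linkerJobs)
      .asInstanceOf[workerApi.ScalaJSWorkerApi]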

Member Author

My working hypothesis was that the high memory usage is required while the linking is in progress, but most of it gets freed afterwards. That means that by delaying/synchronizing linking jobs, we already reduce the memory pressure. #6260 (comment) seems to support, or at least not contradict, this hypothesis.

Member

Consider that the test was performed with the code that clears the linker after every link step, which means the linker's memory gets cleaned up afterwards. If we keep the linkers in memory and do not clear them, the test could give different results.

Development

Successfully merging this pull request may close these issues.

Limit the number of parallel running ScalaJS linker processes
