
[WIP] Guidelines for pipelines#58

Open
AlexVCaron wants to merge 4 commits into nf-neuro:main from
AlexVCaron:feat/pipeline_install

Conversation

@AlexVCaron (Contributor) commented Feb 6, 2026

New pages for basic pipeline usage and common configuration, plus guidelines for download using both nf-core and our script.

Build the documentation and run the development server to get a view!

@AlexVCaron force-pushed the feat/pipeline_install branch from 32f3340 to 4fe7c2b on February 6, 2026 at 20:04
@gagnonanthony (Member) left a comment

Great job! It's clear and easy to read. I've made a few comments, but before going further, I think we might need a discussion as a team. Here are my current thoughts:

While the doc is clear and super nice, your angle is tailored to a user's perspective (someone who wants to run a pipeline). You made it super general, which is perfect, but I have a hard time finding arguments as to why we should put this doc on the nf-neuro website and not in the pipelines' documentation (each would be more tailored to its own parameters). Otherwise, users would need to go back and forth between two docs.

What I think would be cool is to change the angle of this doc to help pipeline developers create their own pipeline documentation. This way we would get interaction on the nf-neuro website from developers, which means the doc can be more complex and "geeky". Developers will digest this information into a more user-friendly version tailored to their own pipeline, which users will refer to when launching their stuff.

Otherwise, I feel we duplicate the information a lot, which might confuse users as to which doc to refer to. Also, the rest of the nf-neuro website is mostly designed for pipeline developers, not pipeline users, so having a section that targets users is a bit strange imo.

Anyway, those are just thoughts, feel free to jump in @arnaudbore @levje @AlexVCaron

data and the `input` directory.

:::caution
**Symbolic links must be carefully verified on HPC**, as so they are accessible by computing nodes
Member

Suggested change
**Symbolic links must be carefully verified on HPC**, as so they are accessible by computing nodes
**Symbolic links must be carefully verified on HPC** to validate that they will be accessible by computing nodes.

Comment on lines +63 to +78
4. **Create or edit `nextflow.config`** in the directory created at **step 1**. In it, set or replace :

```groovy
params.input = 'input'
params.outdir = 'results'
```

Refer to the documentation of the pipeline you are running for any other **input parameters** needing to be set, and for **execution
parameters** that might be of interest to set given your data, project or research question.

:::tip[Centralize your configuration !]
You can specify configuration for the pipeline in many different ways. **We cannot recommend enough that you centralize everything in the
`nextflow.config` created above, for debugging purposes, but also for reuse, sharing and safekeeping**. A rule of thumb is to
compile all **static** configuration in that file, and supply parameters at the command line only to slightly adapt execution
to specific use-cases. **Overriding parameters using the `-c` nextflow argument should be avoided at all costs !**
:::
Member

I am not sure I agree with this method for setting pipeline parameters. Asking users to create nextflow.config files is a bit too much for me, especially since they can conflict with each pipeline's config files.

I think we should promote the use of params-file.yaml rather than config files (or simply ask them to add parameters via the CLI). The usage is a bit more intuitive, and we make a clear separation of what is config/parameters.

In any case, the specific parameters are shown in the QC report, so they can refer to that for future analyses.

Contributor Author

I'm not sure I completely agree here. I do think input via the CLI is preferable, but I don't understand using YAML for input configuration. It makes sense from a developer's standpoint, as they are using the language to implement components, but we don't assume users know YAML. We assume they read documentation and know Nextflow. Nowhere, except maybe deep in some nf-core guidelines, will they be asked to provide configuration as YAML. Can you explain why the popularity of YAML would make it preferable here?

Contributor

Personally, I don't even know what the params-file.yaml is. I see the argument for having a file that can't interfere with some essential parameters of the pipeline. However, I think I'm more in favor of using the "imperfect" nextflow.config file (even though, let's be honest, YAML is much cleaner than the nextflow config).

However, I am not sure I agree with the line saying that "Overriding parameters using the -c nextflow argument should be avoided at all costs!". Does that mean I have to create a nextflow.config for each dataset I'm processing? Since I'm running the same pipeline with the same custom parameters on several datasets, can't I just have a single configuration file that's common to all my runs? With that phrasing, I feel like I'd be committing a crime by using something that's clearly well supported by Nextflow. I would suggest to either:

  1. Remove this if we decide it's ok to use the -c option.
  2. Rephrase this so we don't feel terrible using that option and/or a small explanation on why this would be bad practice.

@AlexVCaron (Contributor Author) commented Feb 9, 2026

From what I know, all you need to override configuration is a single nextflow.config file in the working directory of the pipeline. It takes precedence over everything else no matter what. Only the -c option overwrites it... @gagnonanthony is it different when nextflow downloads the pipeline? Do you absolutely need the -c option?

Contributor

I meant that if there are multiple working directories (I would probably have one working directory per dataset I'm running), I'd have to create a nextflow.config file in each working directory?

Contributor Author

You should always keep a trace of the parameters you use, in a script with CLI arguments (with --) or in a nextflow.config where you launch your study. I think it's better to keep those local to each run, for history and replicability. So I think it's better to avoid -c if you can.

Member

Regarding the override configuration, @AlexVCaron and @levje, I don't think it's different when nextflow downloads the pipeline. The pipeline is downloaded to $HOME/.nextflow/..., so it's very unlikely this will also be your working directory. I prefer suggesting to users that they specify additional configs with -c (using a config file named something meaningful, not just nextflow.config). In my mind, if they do so, they need to understand that they are changing the pipeline's default configs (or supplying configs for their cluster).

Regarding the params.yml file, it's documented in the Nextflow docs and is recommended for providing parameters when not using the CLI (JSON is also supported, but I personally prefer YAML). It's also how nf-core recommends doing this (it's part of their boilerplate documentation for pipelines). I see two advantages of using the params.yml file:

  1. It's cleaner for users who do not have a lot of programming experience (and more intuitive). Here's an example:

```yaml
input: '/path/to/BIDS/'
outdir: '/path/to/results/'
participant_label:
    - sub-01
    - sub-02
dti_shells: "0 1500"
fodf_shells: "0 1500 3000"
```

  2. It creates a clear separation between what is configuration (like resources, executors, etc.) and what are pipeline parameters. I feel that if inexperienced users start playing in a nextflow.config-like file (most certainly with the help of ChatGPT or something), we could end up with hard-to-track issues because they rewrote something that was not meant to be changed in the pipeline. Having them create a YAML file for parameters limits the possible problems.

Also, it gets supplied differently, using the -params-file option in Nextflow. Let me know what you think!
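As a minimal sketch of what this would look like for users (the pipeline name below is hypothetical; `-params-file` is the Nextflow option discussed above), supplying the YAML file could be as simple as:

```shell
# Illustrative only: replace nf-neuro/example-pipeline with the actual pipeline name.
# -params-file loads pipeline parameters (params.*) from a YAML or JSON file,
# keeping them separate from execution configuration.
nextflow run nf-neuro/example-pipeline -params-file params.yml
```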

Contributor

The location of the downloaded pipeline ($HOME/.nextflow/...) doesn't matter for configuration overriding. The location of the downloaded pipeline is labeled as the "project directory", but if we have a nextflow.config in the launch directory (from where we launch the pipeline), it will override the configuration without having to specify the -c option.

Personally, I prefer to manually input the custom configuration in the CLI with the -c option. I see two good points in doing that:

  1. It's transparent to the user that we are using a custom configuration file (or not).
  2. The configuration file name appears in the QC report. So if I want to reproduce results I ran 2 years ago and I forgot that I inputted a custom configuration, I can just have a look at the command I ran in the QC report and see the exact input I used for my pipeline.
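Concretely, that `-c` workflow might look like the following sketch (the pipeline and config file names are illustrative; the meaningful file name is the point, as suggested above):

```shell
# Illustrative only: a meaningfully-named custom config supplied explicitly with -c,
# so the launch command itself records that a custom configuration was used.
nextflow run nf-neuro/example-pipeline -c hpc_cluster.config --input input --outdir results
```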

On the -params-file, you're slowly convincing me (with good arguments) to prefer this way of specifying parameters. I think the idea is great. The only problem I have is that the YAML format feels disconnected from the other technologies/languages used in the pipelines (and using a JSON file would be a nightmare). Other than that, I'm all for it.

Member

I would personally put the custom download script as the first proposed solution, then the nf-core one.

Contributor Author

The custom script was and will always remain a bugfix to me. So in my book, placing it first doesn't make sense. Happy to be proven wrong. But feel free to advertise otherwise in your pipeline documentation!

Contributor

I agree with @gagnonanthony. Even though it's more an ease-of-use script to circumvent some issues, I think the simpler solution should be the first one given to the user here. If I'm a user who has little knowledge about Nextflow and its associated libraries, I don't really care about installing nf-core and all that stuff if there's a script that takes care of everything with a single command.

@levje (Contributor) left a comment

Great job. I have the same opinion as @gagnonanthony with regard to where this type of documentation belongs.

Comment on lines +54 to +55
be exact, when in doubt ensure you have **a lot** of it). The pipeline's execution should not be affected if no more
space is available to write results, but you won't have access to them easily. In which case, you might need to re-execute
Contributor

Not so sure this sentence makes sense: "The pipeline's execution should not be affected if no more space is available to write results." If there's no more space left on my device, the pipeline does crash/stop.

Contributor Author

I will clarify. I had an HPC use-case in mind when I wrote this. You could write results to the project space and process in scratch, and run out of space in the results space while still processing correctly... my bad 😅

some of the steps or the whole pipeline a second time, wasting time and computing resources.

:::tip
Pipeline's usually work in **overwrite mode**, meaning **subsequent pipeline runs will write over previous ones for the same
Contributor

Pipeline's -> Pipelines


:::tip
Pipeline's usually work in **overwrite mode**, meaning **subsequent pipeline runs will write over previous ones for the same
input subjects**. If unsure, consult the documentation for the specific pipeline you want to use
Contributor

Add period at the end of sentence.

This is your **main task**. You need to prepare the spaces where your input data lives, according to
your pipeline's input specification. You also need to allocate a space for the pipeline's outputs. You'll
need to refer to its own documentation to get everything in order, as each pipeline has its own specificities.
Nomatterwhat, here is a quick checklist to get everything in good shape, ready to address any pipeline's I/O
Contributor

Nomatterwhat -> No matter what



3 participants
