Conversation
32f3340 to
4fe7c2b
Compare
e9271cd to
53681dd
Compare
gagnonanthony
left a comment
There was a problem hiding this comment.
Great job! It's clear and easy to read. I've made a few comments, but before going further, I think we might need a discussion as a team. Here's my current thoughts:
While the doc is clear and super nice, your angle is tailored for a user's perspective (someone who wants to run a pipeline). You made it super general, which is perfect, but I have a hard time finding arguments as to why we should put this doc on the nf-neuro website and not on the pipelines' documentation (each would be more tailored to their own parameters). Otherwise, users would need to do back and forth between two docs.
What I think would be cool is to change the angle of this doc to support pipeline developers create their own pipeline documentation. This way we would get interaction on the nf-neuro website from developers, which means the doc can be more complex and "geeky". Developers will digest this information into a more user-friendly version tailored to their own pipeline, which users will refer to when launching their stuff.
Otherwise, I feel we duplicate the information a lot, which might confuse users as to which doc to refer to. Also, the rest of the nf-neuro website is mostly designed for pipelines developers, not pipeline users, so having a section that targets users is a bit strange imo.
Anyway, those are just thoughts, feel free to jump in @arnaudbore @levje @AlexVCaron
| data and the `input` directory. | ||
|
|
||
| :::caution | ||
| **Symbolic links must be carefully verified on HPC**, as so they are accessible by computing nodes |
There was a problem hiding this comment.
| **Symbolic links must be carefully verified on HPC**, as so they are accessible by computing nodes | |
| **Symbolic links must be carefully verified on HPC** to validate that they will be accessible by computing nodes. |
| 4. **Create or edit `nextflow.config`** in the directory created at **step 1**. In it, set or replace : | ||
|
|
||
| ```groovy | ||
| params.input = 'input' | ||
| params.outdir = 'results' | ||
| ``` | ||
|
|
||
| Refer to the documentation of the pipeline you are running for any other **input parameters** needing to be set, and for **execution | ||
| parameters** that might be of interest to set given your data, project or research question. | ||
|
|
||
| :::tip[Centralize your configuration !] | ||
| You can specify configuration for the pipeline in many different ways. **We cannot recommend enough you centralize everything in the | ||
| `nextflow.config` we made you create above, for debugging purposes, but also for reuse, sharing and safekeeping**. A rule of thumb is | ||
| compiling all **static** configuration in that file, and supply parameters at the command line only to slightly execution | ||
| to specific use-cases. **Overriding parameters using the `-c` nextflow argument should be avoided at all costs !** | ||
| ::: |
There was a problem hiding this comment.
I am not sure I agree with this method for setting pipeline parameters. Asking users to create nextflow.config files is a bit too much for me, especially since they can conflict with each pipeline's config files.
I think we should promote the use of params-file.yaml rather than config files (or simply ask them to add parameters via the CLI). The usage is a bit more intuitive, and we make a clear separation of what is config/parameters.
In any case, the specific parameters are shown in the QC report, so they can refer to that for future analyses.
There was a problem hiding this comment.
I'm not sure I completely agree here. I do think input via the CLI is preferable, but don't understand using yaml for input configuration. It makes sense on a developer standpoint, as they are using the language to implement components, but we don't assume users to know yaml. We assume they read documentation and know Nextflow. Nowhere, except maybe deep in some nf-core guidelines, will they be asked to provide configuration as yaml. Can you argue me why the popularity of yaml would make it preferential here ?
There was a problem hiding this comment.
Personally, I don't even know what is the params-files.yaml. I see the argument to have a file that can't interfere with some essential parameters of the pipeline. However, I think I'm more in favor of using the "imperfect" nextflow.config file (even though, let's be honest, YAML is much cleaner than the nextflow config).
However, I am not sure I agree with the line saying that "Overriding parameters using the -c nextflow argument should be avoided at all costs!". That means I have to create a nextflow.config for each dataset I'm processing? Since I'm running the same pipeline with the same custom parameters on several parameters, can't I just have a single configuration file that's common to all my runs? With that phrasing, I feel like I'd be committing a crime by using something that's clearly well supported by nextflow. I would suggest to either:
- Remove this if we decide it's ok to use the
-coption. - Rephrase this so we don't feel terrible using that option and/or a small explanation on why this would be bad practice.
There was a problem hiding this comment.
From what I know, all you need to override configuration is a single nextflow.config file in the working directory of the pipeline. It takes precedence on everything else no matter what. Only the -c option overwrites it ... @gagnonanthony is it different when nextflow downloads the pipeline ? Do you absolutely need the -c option ?
There was a problem hiding this comment.
I meant that if there's multiple working directories (I would probably have a working directory per-dataset I'm running), I'd have to create a nextflow.config file in each working directory?
There was a problem hiding this comment.
You should always keep a trace of the parameters you use, in a script with CLI arguments (with --) or in a nextflow.config where you launch your study. I think it's better to keep those locally to each run, for history and replicability. So I think it's better to avoid -c if you can.
There was a problem hiding this comment.
Regarding the override configuration, @AlexVCaron and @levje, I don't think it's different when nextflow downloads the pipeline. The pipeline is downloaded to $HOME/.nextflow/..., so it's very unlikely this will also be your working directory. I prefer suggesting to users that they specify additional configs with -c (using a config file named something meaningful, not just nextflow.config). In my mind, if they do so, they need to understand that they are changing the pipeline's default configs (or supplying configs for their cluster).
Regarding the params.yml file, it's documented in the Nextflow docs and is recommended for providing parameters when not using the CLI (JSON is also supported, but I personally prefer YAML). It's also how nf-core recommends doing this (it's part of their boilerplate documentation for pipelines). I see two advantages of using the params.yml file:
- It's cleaner for users who do not have a lot of programming experience (and more intuitive). Here's an example:
input: '/path/to/BIDS/'
outdir: '/path/to/results/'
participant_label:
- sub-01
- sub-02
dti_shells: "0 1500"
fodf_shells: "0 1500 3000"- It creates a clear separation between what is configurations (like resources, executors, etc.) and pipeline parameters. I feel that if inexperienced users start playing in a
nextflow.config-like file (most certainly with the help of ChatGPT or something), we could end up with issues that are hard-to-track because they rewrote something that was not meant to be in the pipeline. Having them create ayamlfor parameters limits the possible problems.
Also, it gets supplied differently, using the -params-file options in nextflow. Let me know what you think!
There was a problem hiding this comment.
The location of the pipeline downloaded ($HOME/.nextflow/...) doesn't matter in the case of the configuration overriding. The location of the downloaded pipeline is labeled as the "project directory", but if we have a nextflow.config in the launch directory (from where we launch the pipeline), it will override the configuration without having to specify the -c option.
Personally, I prefer to manually input the custom configuration in the CLI with the -c option. I see 2 good points on doing that:
- It's transparent to the user that we are using a custom configuration file (or not).
- The configuration file name appears in the QC report. So if I want to reproduce results I ran 2 years ago and I forgot that I inputted a custom configuration, I can just have a look at the command I ran in the QC report and see the exact input I used for my pipeline.
On the -params-file, you're slowly convincing me (with good arguments) to prefer this way of specifying parameters. I think the idea is great. Only problem I have is that the YAML format feels disconnected to the other technologies/languages used in the pipelines (and using a JSON file would be a nightmare). Other than that, I'm all for it.
There was a problem hiding this comment.
I would personally put the custom download script as the first proposed solution, then the nf-core one.
There was a problem hiding this comment.
The custom script was and will always remain a bugfix to me. So in my books, placing it first doesn't make sense. Happy to be proven wrong. But feel free to advertise otherwise in your pipeline documentation !
There was a problem hiding this comment.
I agree with @gagnonanthony. Even thought it's more an ease-of-use script to circumvent some issues, I think the simpler solution should be the first one given to the user here. If I'm a user who has little knowledge about nextflow and its associated libraries, I don't really care about installing nf-core and all that stuff if there's a script that takes care of everything by running a single command.
levje
left a comment
There was a problem hiding this comment.
Great job. I do have the same opinion as @gagnonanthony wrt where this type of documentation belongs.
| be exact, when in doubt ensure you have **a lot** of it). The pipeline's execution should not be affected if no more | ||
| space is available to write results, but you won't have access to them easily. In which case, you might need to re-execute |
There was a problem hiding this comment.
Not so sure about this sentence makes sense: The pipeline's execution should not be affected if no more space is available to write results. If there's no more space left on my device, the pipeline does crash/stop.
There was a problem hiding this comment.
I will precise. I had a use-case on HPC in head when I wrote this. You could write results to project and process in scratch, and run out of space in the results space, while still processing correctly ... my bad 😅
| some of the steps or the whole pipeline a second time, wasting time and computing resources. | ||
|
|
||
| :::tip | ||
| Pipeline's usually work in **overwrite mode**, meaning **subsequent pipeline runs will write over previous ones for the same |
|
|
||
| :::tip | ||
| Pipeline's usually work in **overwrite mode**, meaning **subsequent pipeline runs will write over previous ones for the same | ||
| input subjects**. If unsure, consult the documentation for the specific pipeline you want to use |
There was a problem hiding this comment.
Add period at the end of sentence.
| This is your **main task**. You need to prepare the spaces where your input data lives, according to | ||
| your pipeline's input specification. You also need to allocate a space for the pipeline's outputs. You'll | ||
| need to refer to its own documentation to get everything in order, as each pipeline has its own specificities. | ||
| Nomatterwhat, here is a quick checklist to get everything in good shape, ready to address any pipeline's I/O |
There was a problem hiding this comment.
Nomatterwhat -> No matter what
| 4. **Create or edit `nextflow.config`** in the directory created at **step 1**. In it, set or replace : | ||
|
|
||
| ```groovy | ||
| params.input = 'input' | ||
| params.outdir = 'results' | ||
| ``` | ||
|
|
||
| Refer to the documentation of the pipeline you are running for any other **input parameters** needing to be set, and for **execution | ||
| parameters** that might be of interest to set given your data, project or research question. | ||
|
|
||
| :::tip[Centralize your configuration !] | ||
| You can specify configuration for the pipeline in many different ways. **We cannot recommend enough you centralize everything in the | ||
| `nextflow.config` we made you create above, for debugging purposes, but also for reuse, sharing and safekeeping**. A rule of thumb is | ||
| compiling all **static** configuration in that file, and supply parameters at the command line only to slightly execution | ||
| to specific use-cases. **Overriding parameters using the `-c` nextflow argument should be avoided at all costs !** | ||
| ::: |
There was a problem hiding this comment.
Personally, I don't even know what is the params-files.yaml. I see the argument to have a file that can't interfere with some essential parameters of the pipeline. However, I think I'm more in favor of using the "imperfect" nextflow.config file (even though, let's be honest, YAML is much cleaner than the nextflow config).
However, I am not sure I agree with the line saying that "Overriding parameters using the -c nextflow argument should be avoided at all costs!". That means I have to create a nextflow.config for each dataset I'm processing? Since I'm running the same pipeline with the same custom parameters on several parameters, can't I just have a single configuration file that's common to all my runs? With that phrasing, I feel like I'd be committing a crime by using something that's clearly well supported by nextflow. I would suggest to either:
- Remove this if we decide it's ok to use the
-coption. - Rephrase this so we don't feel terrible using that option and/or a small explanation on why this would be bad practice.
New pages for basic pipeline usage and common configuration, plus guidelines for download using both nf-core and our script.
Build the documentation and run the development server to get a view !