Profile data dumping #6723

Merged
merged 86 commits into from
Jun 6, 2025
135a24e
Add config pydantic model
GeigerJ2 May 1, 2025
22ed3c6
Add detect.py
GeigerJ2 May 1, 2025
7925b61
Add group-node-mapping
GeigerJ2 May 1, 2025
7d74cb5
Add dump logger
GeigerJ2 May 1, 2025
be3e9ea
Add dump engine
GeigerJ2 May 1, 2025
6d6706a
Add dump managers
GeigerJ2 May 1, 2025
635e244
Add facades
GeigerJ2 May 1, 2025
ed7e0a4
Add utils
GeigerJ2 May 1, 2025
24cabe7
Add changes to CLI
GeigerJ2 May 1, 2025
0323514
Add changes to init, disable mypy for feature for now
GeigerJ2 May 1, 2025
4a83c11
Add changes to docs
GeigerJ2 May 1, 2025
134f65d
Add changes to and additional tests
GeigerJ2 May 1, 2025
43c22f4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 1, 2025
893f3be
Fix bug in explicitly-grouped sub-workflows being filtered out for pro…
GeigerJ2 May 5, 2025
4e43196
Fix group validation exception on `verdi profile dump -G` and creatio…
GeigerJ2 May 6, 2025
41f0839
Remove default prefix and appendix for group and profile dump paths
GeigerJ2 May 16, 2025
1126d74
Split up ProfileDumpManager
GeigerJ2 May 19, 2025
10413ff
Centralize all path handling in instance-based DumpPathPolicy
GeigerJ2 May 19, 2025
b4a4707
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 19, 2025
e8899d2
Revert name to DumpPaths
GeigerJ2 May 20, 2025
64be6f4
Minor changes to CLI interfaces.
GeigerJ2 May 20, 2025
640d899
Rename logger module
GeigerJ2 May 20, 2025
ef6af68
Rename dump logging infrastructure
GeigerJ2 May 20, 2025
0310d08
Singledispatch for default dump path method.
GeigerJ2 May 20, 2025
125a308
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 20, 2025
86c5da5
Update docs, and remove ProfileDumpSelection enum
GeigerJ2 May 21, 2025
1d5e76d
Remove facades (apart from ProcessDumper). This breaks integration te…
GeigerJ2 May 21, 2025
d6af449
Remove `dump_data` and `dump_processes` flags and not yet implemented…
GeigerJ2 May 23, 2025
84f4751
Fix integration tests to use ORM dump methods rather than Dumper clas…
GeigerJ2 May 23, 2025
22849b1
Rename Managers to Executors
GeigerJ2 May 23, 2025
9db90ad
Fix unwanted group dump nesting
GeigerJ2 May 23, 2025
a4bd051
Finalize rename to executors
GeigerJ2 May 23, 2025
25570ae
Remove config YAML file. Re-introduce later on
GeigerJ2 May 26, 2025
fe8e5ce
Remove bare exception handlers
GeigerJ2 May 26, 2025
94cffcf
Remove DumpRegistryCollection
GeigerJ2 May 26, 2025
4ad1d9f
Simplify DumpConfig (remove serialization to click options)
GeigerJ2 May 26, 2025
de1b418
Fix pre-commit
GeigerJ2 May 26, 2025
ea9db92
Start cleaning up unnecessary logging.
GeigerJ2 May 26, 2025
a85c3bc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 26, 2025
2c9f728
Clean up and streamline detect.py
GeigerJ2 May 27, 2025
115433f
Fix ungrouped path missing on group deletion
GeigerJ2 May 27, 2025
1ce9404
Start finalizing engine.py
GeigerJ2 May 27, 2025
fad3fe7
Clean up and finalize engine.py
GeigerJ2 May 27, 2025
36e54f5
Simplify access to DumpStore (now ProcessingQueue) and DumpTracker re…
GeigerJ2 May 27, 2025
ca02573
Pre executor refactor
GeigerJ2 May 27, 2025
4d4c1d3
Fix RTD and bugs
GeigerJ2 May 27, 2025
9c38f61
Done with all executors apart from Process. Fix bug with overwrite mode.
GeigerJ2 May 28, 2025
eb04a74
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 28, 2025
c29071f
Go through `ProcessDumpExecutor`
GeigerJ2 May 28, 2025
2561d4e
Make module private
GeigerJ2 May 28, 2025
11c1d8e
Finalize `tracking.py`. Fix last and current group_mapping issue.
GeigerJ2 May 30, 2025
986bc87
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 30, 2025
ee58740
Finalize `utils.py` and solve dry-run issue
GeigerJ2 May 30, 2025
46d898e
Remove old, failing ProcessDumpExecutor unit tests for now.
GeigerJ2 May 30, 2025
6066aab
Remove unused type ignore
GeigerJ2 May 30, 2025
6f32d27
Expose arguments via `dump` methods of orm types explicitly, and intr…
GeigerJ2 May 30, 2025
0e9c70f
Finalize config.py and make loggers reflect private module
GeigerJ2 Jun 2, 2025
5127112
Update docs for data dumping
GeigerJ2 Jun 2, 2025
5393d93
Fix failing RTD references
GeigerJ2 Jun 2, 2025
70a392c
Fix dump_mode evaluation via config. For group dump, build mapping on…
GeigerJ2 Jun 2, 2025
1acecee
See if using `field_validators` and simpler type annotations fixes sp…
GeigerJ2 Jun 2, 2025
05e962c
Fix codes and computers type annotation
GeigerJ2 Jun 2, 2025
add67ea
Remove offending annotations for RTD
GeigerJ2 Jun 2, 2025
5a2b92b
Fix type annotation for groups input of config model.
GeigerJ2 Jun 2, 2025
e109917
Expose ProcessDumper via 'public' API. Remove some `aiida_profile_cle…
GeigerJ2 Jun 2, 2025
920161e
Fix RTD
GeigerJ2 Jun 2, 2025
ca3133a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2025
ec1df86
Ignore `_dumping` module from codecov in `pyproject.toml`
GeigerJ2 Jun 4, 2025
8c23bfb
Add minimal CLI tests for dumping feature
GeigerJ2 Jun 4, 2025
4188200
Fix broken dry-run for process dump
GeigerJ2 Jun 4, 2025
35638a3
Expand process dump API and CLI tests
GeigerJ2 Jun 4, 2025
65b7ec9
Fix rtd
GeigerJ2 Jun 4, 2025
0008f8d
Add tests for group dump CLI
GeigerJ2 Jun 4, 2025
864ca34
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 4, 2025
4cea549
Add tests to orm.Group `dump` endpoint
GeigerJ2 Jun 4, 2025
5d53642
Start adding dump API tests to profile
GeigerJ2 Jun 4, 2025
48499d4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 4, 2025
e85d63c
Almost done with tests for `Profile.dump()`
GeigerJ2 Jun 5, 2025
ec0706c
Finalize `Profile.dump` tests and make `_safe_delete_directory` stati…
GeigerJ2 Jun 5, 2025
2647320
Merge remote-tracking branch 'upstream/main' into feature/verdi-profi…
GeigerJ2 Jun 5, 2025
58b4370
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 5, 2025
56e8ae3
Fix tests breaking for groups through past accidental search-and-replace
GeigerJ2 Jun 5, 2025
4193ebd
Fix mypy for group types
GeigerJ2 Jun 5, 2025
e5e90ec
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 5, 2025
435a044
Add minimal CLI interface tests for `verdi profile dump`
GeigerJ2 Jun 5, 2025
0b77ca1
Merge remote-tracking branch 'upstream/main' into feature/verdi-profi…
GeigerJ2 Jun 5, 2025
3 changes: 3 additions & 0 deletions codecov.yml
@@ -15,3 +15,6 @@ coverage:
patch:
default:
threshold: 0.1%

ignore:
- src/aiida/tools/_dumping/**/*
425 changes: 379 additions & 46 deletions docs/source/howto/data.rst

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions docs/source/reference/command_line.rst
@@ -223,6 +223,7 @@ Below is a list with all available subcommands.
create Create an empty group with a given label.
delete Delete groups and (optionally) the nodes they contain.
description Change the description of a group.
dump Dump data of an AiiDA group to disk.
list Show a list of existing groups.
move-nodes Move the specified NODES from one group to another.
path Inspect groups of nodes, with delimited label paths.
@@ -397,6 +398,7 @@ Below is a list with all available subcommands.
Commands:
configure-rabbitmq Configure RabbitMQ for a profile.
delete Delete one or more profiles.
dump Dump all data in an AiiDA profile's storage to disk.
list Display a list of all available profiles.
set-default Set a profile as the default profile.
setdefault (Deprecated) Set a profile as the default profile.
5 changes: 5 additions & 0 deletions pyproject.toml
@@ -279,6 +279,11 @@ Documentation = 'https://aiida.readthedocs.io'
Home = 'http://www.aiida.net/'
Source = 'https://github.com/aiidateam/aiida-core'

[tool.coverage.run]
omit = [
"src/aiida/tools/_dumping/**/*"
]

[tool.flit.module]
name = 'aiida'

110 changes: 109 additions & 1 deletion src/aiida/cmdline/commands/cmd_group.py
@@ -137,7 +137,7 @@ def group_move_nodes(source_group, target_group, force, nodes, all_entries):

if not force:
click.confirm(
f'Are you sure you want to move {len(nodes)} nodes from {source_group} ' f'to {target_group}?', abort=True
f'Are you sure you want to move {len(nodes)} nodes from {source_group} to {target_group}?', abort=True
)

source_group.remove_nodes(nodes)
@@ -325,6 +325,11 @@ def group_relabel(group, label):
echo.echo_critical(str(exception))
else:
echo.echo_success(f"Label changed to '{label}'")
msg = (
'Note that if you are dumping your profile data to disk, to reflect the relabeling of the group, '
'run your `verdi profile dump` command again.'
)
echo.echo_report(msg)


@verdi_group.command('description')
@@ -632,3 +637,106 @@ def group_path_ls(path, type_string, recursive, as_table, no_virtual, with_descr
if no_virtual and child.is_virtual:
continue
echo.echo(child.path, bold=not child.is_virtual)


@verdi_group.command('dump')
@arguments.GROUP()
@options.PATH()
@options.DRY_RUN()
@options.OVERWRITE()
@options.PAST_DAYS()
@options.START_DATE()
@options.END_DATE()
@options.FILTER_BY_LAST_DUMP_TIME()
@options.ONLY_TOP_LEVEL_CALCS()
@options.ONLY_TOP_LEVEL_WORKFLOWS()
@options.DELETE_MISSING()
@options.SYMLINK_CALCS()
@options.INCLUDE_INPUTS()
@options.INCLUDE_OUTPUTS()
@options.INCLUDE_ATTRIBUTES()
@options.INCLUDE_EXTRAS()
@options.FLAT()
@options.DUMP_UNSEALED()
@click.pass_context
@with_dbenv()
def group_dump(
    ctx,
    group,
    path,
    dry_run,
    overwrite,
    past_days,
    start_date,
    end_date,
    filter_by_last_dump_time,
    delete_missing,
    only_top_level_calcs,
    only_top_level_workflows,
    symlink_calcs,
    include_inputs,
    include_outputs,
    include_attributes,
    include_extras,
    flat,
    dump_unsealed,
):
    """Dump data of an AiiDA group to disk."""

    import traceback
    from pathlib import Path

    from aiida.cmdline.utils import echo
    from aiida.tools._dumping.utils import DumpPaths

    warning_msg = (
        'This is a new feature which is still in its testing phase. '
        'If you encounter unexpected behavior or bugs, please report them via Discourse or GitHub.'
    )
    echo.echo_warning(warning_msg)

    try:
        if path is None:
            group_path = DumpPaths.get_default_dump_path(group)
            dump_base_output_path = Path.cwd() / group_path
            echo.echo_report(f'No output path specified. Using default: `{dump_base_output_path}`')
        else:
            dump_base_output_path = Path(path).resolve()
            echo.echo_report(f'Using specified output path: `{dump_base_output_path}`')

        # --- Logical Checks ---
        if dry_run and overwrite:
            msg = (
                '`--dry-run` and `--overwrite` selected (or set in config). Overwrite operation will NOT be performed.'
            )
            echo.echo_warning(msg)

        # Run the dumping
        group.dump(
            output_path=dump_base_output_path,
            dry_run=dry_run,
            overwrite=overwrite,
            past_days=past_days,
            start_date=start_date,
            end_date=end_date,
            filter_by_last_dump_time=filter_by_last_dump_time,
            only_top_level_calcs=only_top_level_calcs,
            only_top_level_workflows=only_top_level_workflows,
            symlink_calcs=symlink_calcs,
            include_inputs=include_inputs,
            include_outputs=include_outputs,
            include_attributes=include_attributes,
            include_extras=include_extras,
            flat=flat,
            dump_unsealed=dump_unsealed,
        )

        if not dry_run:
            msg = f'Raw files for group `{group.label}` dumped into folder `{dump_base_output_path.name}`.'
            echo.echo_success(msg)
        else:
            echo.echo_success('Dry run completed.')

    except Exception as e:
        msg = f'Unexpected error during dump of group {group.label}:\n ({e!s}).\n'
        echo.echo_critical(msg + traceback.format_exc())
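
For readers following the path handling in `group_dump`, the logic can be isolated into a small standalone sketch. This is not code from the PR: `resolve_dump_path` and `needs_overwrite_warning` are hypothetical helper names, and `default_name` merely stands in for whatever `DumpPaths.get_default_dump_path(group)` returns, whose exact format is not shown in this diff.

```python
from pathlib import Path
from typing import Optional


def resolve_dump_path(path: Optional[str], default_name: str, cwd: Optional[Path] = None) -> Path:
    """Return the directory a dump would be written to.

    `default_name` is a placeholder for the value produced by
    `DumpPaths.get_default_dump_path(...)` in the diff (assumption).
    """
    base = cwd if cwd is not None else Path.cwd()
    if path is None:
        # No path given: fall back to a default directory below the working directory.
        return base / default_name
    # Explicit path given: make it absolute, as the command does with `Path(path).resolve()`.
    return Path(path).resolve()


def needs_overwrite_warning(dry_run: bool, overwrite: bool) -> bool:
    """Mirror the 'Logical Checks' step: warn when both flags are set."""
    return dry_run and overwrite


if __name__ == '__main__':
    print(resolve_dump_path(None, 'my-group'))
    print(resolve_dump_path('dumps/my-group', 'my-group'))
    print(needs_overwrite_warning(dry_run=True, overwrite=True))
```

Note that in `group_dump` the combination of `--dry-run` and `--overwrite` only triggers a warning; `group.dump` is still called with both values passed through.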
121 changes: 59 additions & 62 deletions src/aiida/cmdline/commands/cmd_process.py
@@ -14,6 +14,7 @@
from aiida.cmdline.params import arguments, options, types
from aiida.cmdline.params.options.overridable import OverridableOption
from aiida.cmdline.utils import decorators, echo
from aiida.cmdline.utils.decorators import with_dbenv
from aiida.common.log import LOG_LEVELS, capture_logging

REPAIR_INSTRUCTIONS = """\
@@ -583,98 +584,94 @@ def process_repair(manager, broker, dry_run):
@verdi_process.command('dump')
@arguments.PROCESS()
@options.PATH()
@options.DRY_RUN()
@options.OVERWRITE()
@click.option(
'--include-inputs/--exclude-inputs',
default=True,
show_default=True,
help='Include the linked input nodes of the `CalculationNode`(s).',
)
@click.option(
'--include-outputs/--exclude-outputs',
default=False,
show_default=True,
help='Include the linked output nodes of the `CalculationNode`(s).',
)
@click.option(
'--include-attributes/--exclude-attributes',
default=True,
show_default=True,
help='Include attributes in the `.aiida_node_metadata.yaml` written for every `ProcessNode`.',
)
@click.option(
'--include-extras/--exclude-extras',
default=True,
show_default=True,
help='Include extras in the `.aiida_node_metadata.yaml` written for every `ProcessNode`.',
)
@click.option(
'-f',
'--flat',
is_flag=True,
default=False,
show_default=True,
help='Dump files in a flat directory for every step of the workflow.',
)
@click.option(
'--dump-unsealed',
is_flag=True,
default=False,
show_default=True,
help='Also allow the dumping of unsealed process nodes.',
)
@options.INCREMENTAL()
@options.INCLUDE_INPUTS()
@options.INCLUDE_OUTPUTS()
@options.INCLUDE_ATTRIBUTES()
@options.INCLUDE_EXTRAS()
@options.FLAT()
@options.DUMP_UNSEALED()
@click.pass_context
@with_dbenv()
def process_dump(
ctx,
process,
path,
dry_run,
overwrite,
include_inputs,
include_outputs,
include_attributes,
include_extras,
flat,
dump_unsealed,
incremental,
) -> None:
"""Dump process input and output files to disk.

Child calculations/workflows (also called `CalcJob`s/`CalcFunction`s and `WorkChain`s/`WorkFunction`s in AiiDA
jargon) run by the parent workflow are contained in the directory tree as sub-folders and are sorted by their
creation time. The directory tree thus mirrors the logical execution of the workflow, which can also be queried by
creation time. The directory tree thus mirrors the logical execution of the workflow, which can also be queried by
running `verdi process status <pk>` on the command line.

By default, input and output files of each calculation can be found in the corresponding "inputs" and
"outputs" directories (the former also contains the hidden ".aiida" folder with machine-readable job execution
settings). Additional input and output files (depending on the type of calculation) are placed in the "node_inputs"
and "node_outputs", respectively.

Lastly, every folder also contains a hidden, human-readable `.aiida_node_metadata.yaml` file with the relevant AiiDA
Lastly, every folder also contains a hidden, human-readable `aiida_node_metadata.yaml` file with the relevant AiiDA
node data for further inspection.
"""
import traceback
from pathlib import Path

from aiida.cmdline.utils import echo
from aiida.tools._dumping.utils import DumpPaths
from aiida.tools.archive.exceptions import ExportValidationError
from aiida.tools.dumping.processes import ProcessDumper

process_dumper = ProcessDumper(
include_inputs=include_inputs,
include_outputs=include_outputs,
include_attributes=include_attributes,
include_extras=include_extras,
overwrite=overwrite,
flat=flat,
dump_unsealed=dump_unsealed,
incremental=incremental,

warning_msg = (
'This is a new feature which is still in its testing phase. '
'If you encounter unexpected behavior or bugs, please report them via Discourse or GitHub.'
)
echo.echo_warning(warning_msg)

# Check for dry_run + overwrite
if overwrite and dry_run:
msg = 'Both `dry_run` and `overwrite` set to true. Operation will NOT be performed.'
echo.echo_warning(msg)
return

if path is None:
process_path = DumpPaths.get_default_dump_path(process)
dump_base_output_path = Path.cwd() / process_path
msg = f'No output path specified. Using default: `{dump_base_output_path}`'
echo.echo_report(msg)
else:
echo.echo_report(f'Using specified output path: `{path}`')
dump_base_output_path = Path(path).resolve()

if dry_run:
echo.echo_success('Dry run completed.')
return

# Execute dumping
try:
dump_path = process_dumper.dump(process_node=process, output_path=path)
except FileExistsError:
echo.echo_critical(
'Dumping directory exists and overwrite is False. Set overwrite to True, or delete directory manually.'
process.dump(
output_path=dump_base_output_path,
dry_run=dry_run,
overwrite=overwrite,
include_inputs=include_inputs,
include_outputs=include_outputs,
include_attributes=include_attributes,
include_extras=include_extras,
flat=flat,
dump_unsealed=dump_unsealed,
)

msg = f'Raw files for process `{process.pk}` dumped into folder `{dump_base_output_path.name}`.'
echo.echo_success(msg)
except ExportValidationError as e:
echo.echo_critical(f'{e!s}')
echo.echo_critical(f'Data validation error during dump: {e!s}')
except Exception as e:
echo.echo_critical(f'Unexpected error while dumping {process.__class__.__name__} <{process.pk}>:\n ({e!s}).')

echo.echo_success(f'Raw files for {process.__class__.__name__} <{process.pk}> dumped into folder `{dump_path}`.')
msg = f'Unexpected error during dump of process {process.pk}:\n ({e!s}).\n'
echo.echo_critical(msg + traceback.format_exc())
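
Because the removed and added lines of this hunk are interleaved without `+`/`-` markers, the resulting control flow of `process_dump` is hard to read off. The sketch below is one reading of the new version, assuming the `dry_run` checks and the `process.dump(...)` call are the added lines. `DumpAction` and `plan_process_dump` are hypothetical names used only for illustration and are not part of aiida-core.

```python
from enum import Enum


class DumpAction(Enum):
    """Possible outcomes of the flag checks (hypothetical labels)."""

    ABORT_CONFLICT = 'abort-conflict'  # both flags set: warn and return early
    DRY_RUN_ONLY = 'dry-run-only'      # report the resolved path, write nothing
    DUMP = 'dump'                      # call `process.dump(...)`


def plan_process_dump(dry_run: bool, overwrite: bool) -> DumpAction:
    """Reproduce the order of the checks as they appear in the new function body."""
    if overwrite and dry_run:
        # The command warns that the operation will not be performed and returns.
        return DumpAction.ABORT_CONFLICT
    if dry_run:
        # The output path is resolved and reported, then the command returns.
        return DumpAction.DRY_RUN_ONLY
    return DumpAction.DUMP


if __name__ == '__main__':
    for dry_run in (False, True):
        for overwrite in (False, True):
            print(dry_run, overwrite, plan_process_dump(dry_run, overwrite).value)
```

Under this reading, `process_dump` returns before reaching `process.dump` whenever `dry_run` is set, whereas `group_dump` forwards `dry_run` to `group.dump`.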