Commit 8ca3c7a

grass.tools: Add raster pack files IO to Tools (#5877)
This adds r.pack files (aka native GRASS raster files) as input and output to tools when called through the Tools object. Tool calls such as r_grow can take r.pack files as input or output. The format is distinguished by the file extension.

Notably, tool calls such as r_mapcalc don't pass input or output data as separate parameters (they use expressions or base names), so they can be used this way only when a wrapper exists (r_mapcalc_simple) or, in the future, when more information is included in the interface or passed between the tool and the Tools class Python code. Similarly, tools with multiple inputs or outputs in a single parameter are currently not supported.

The code uses --json with the tool to find out what is input and what is output, because all are files which may or may not exist (this is different from NumPy arrays, where the user-provided parameters clearly say what is input (object) and what is output (class)). Consequently, the whole import-export machinery is only started when there are files in the parameters as identified by the parameter converter class.

Currently, the in-project raster names are driven by the file names. This will break for parallel usage and will not work for vector as is. While it is good for guessing the right (and nice) name, e.g., for an r.mapcalc expression, ultimately, unique names retrieved with an API function are likely the way to go.

When caching is enabled (either through use of a context manager or explicitly), import of inputs is skipped when they were already imported or when they are known outputs. Without cache, data is deleted after every tool (function) call. Caching keeps the in-project data in the project (as opposed to a hidden cache or deleting them). The parameter to explicitly drive this is called use_cache (originally keep_data). The objects track what is imported and also track import and cleaning tasks at function call versus object level.
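
The rule for skipping imports when caching is enabled can be sketched in isolation. This is only an illustration under stated assumptions, not the Tools implementation: the class name ImportTracker and its methods are hypothetical, and the real import step is replaced by a counter.

```python
from pathlib import Path


class ImportTracker:
    """Hypothetical stand-in that tracks imported and created rasters."""

    def __init__(self):
        self.input_rasters = []  # (file path, in-project name) already imported
        self.output_rasters = []  # (file path, in-project name) already created
        self.import_count = 0  # counts simulated imports

    def request_input(self, file_path):
        """Import a file unless it is already present in the project."""
        record = (Path(file_path), Path(file_path).stem)
        if record in self.input_rasters or record in self.output_rasters:
            return False  # cached, so nothing to import
        self.import_count += 1  # a real implementation would import the file here
        self.input_rasters.append(record)
        return True

    def register_output(self, file_path):
        """Remember a raster created by a tool call."""
        record = (Path(file_path), Path(file_path).stem)
        if record not in self.output_rasters:
            self.output_rasters.append(record)

    def cleanup(self):
        """Forget all records, as would happen after each call without cache."""
        self.input_rasters = []
        self.output_rasters = []


tracker = ImportTracker()
first = tracker.request_input("elevation.grass_raster")
second = tracker.request_input("elevation.grass_raster")  # already imported
tracker.register_output("slope.grass_raster")
third = tracker.request_input("slope.grass_raster")  # known output
print(first, second, third, tracker.import_count)
```

Only the first request triggers an import; the repeated input and the known output are both skipped until the records are cleared.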
The data is cleaned even in case of exceptions. The interface was clarified by creating a private/protected version of run_cmd which has the internal-only parameters. This function uses a single try-finally block to trigger the cleaning in case of exceptions.

While generally the code supports paths as both strings and Path objects, the actual decisions about import are made from the list-of-strings form of the command. From the caller's perspective, overwrite is supported in the same way as for in-project GRASS rasters.

The tests use module scope to reduce fixture setup by a couple of seconds. Changes include a minor cleanup of comments in tests related to testing results without format=json and with, e.g., the --json option.

The class documentation discusses overhead and parallelization because the calls are more costly and the object now carries significant state with the cache and the rasters created in the background. This includes discussion of the NumPy arrays, too, and slightly improves the wording in the part discussing arrays.

This builds on top of #2923 (Tools API), and it is parallel with #5878 (NumPy array IO), although it runs at a different stage than NumPy array conversions and uses cache for the imported data (it may be connected more with the arrays in the future). This can be used efficiently in Python with Tools (caching, assuming project) and in a limited way also with the experimental run subcommand in CLI (no caching, still needs an explicit project). There is more potential use of this with the standalone tools concept (#5843). The big picture is also discussed in #5830.
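
The try-finally cleanup described in the message can be illustrated with a minimal sketch. The function run_with_cleanup and both callables are hypothetical placeholders; the actual private run_cmd variant is not reproduced here.

```python
def run_with_cleanup(tool, cleanup, *, use_cache=False):
    """Run `tool` and call `cleanup` afterwards unless caching is on.

    Both callables are hypothetical placeholders for a tool call and
    for the removal of temporary in-project data.
    """
    try:
        return tool()
    finally:
        # This branch runs for normal returns and for exceptions alike.
        if not use_cache:
            cleanup()


events = []


def failing_tool():
    raise RuntimeError("tool failed")


try:
    run_with_cleanup(failing_tool, lambda: events.append("cleaned"))
except RuntimeError:
    events.append("error propagated")

cached_result = run_with_cleanup(
    lambda: "ok", lambda: events.append("cleaned"), use_cache=True
)
print(events, cached_result)
```

The cleanup runs before the exception reaches the caller, and it is skipped entirely when the cache is kept.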
1 parent 7990003 commit 8ca3c7a

8 files changed

+1328
-13
lines changed

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
+import json
+import sys
+import subprocess
+
+import pytest
+
+
+def test_run_with_crs_as_pack_as_input(pack_raster_file4x5_rows):
+    """Check that we accept pack as input."""
+    result = subprocess.run(
+        [
+            sys.executable,
+            "-m",
+            "grass.app",
+            "run",
+            "--crs",
+            str(pack_raster_file4x5_rows),
+            "r.univar",
+            f"map={pack_raster_file4x5_rows}",
+            "format=json",
+        ],
+        capture_output=True,
+        text=True,
+        check=True,
+    )
+    assert (
+        json.loads(result.stdout)["cells"] == 1
+    )  # because we don't set the computational region
+
+
+@pytest.mark.parametrize("crs", ["EPSG:3358", "EPSG:4326"])
+@pytest.mark.parametrize("extension", [".grass_raster", ".grr", ".rpack"])
+def test_run_with_crs_as_pack_as_output(tmp_path, crs, extension):
+    """Check outputting pack with different CRSs and extensions"""
+    raster = tmp_path / f"test{extension}"
+    subprocess.run(
+        [
+            sys.executable,
+            "-m",
+            "grass.app",
+            "run",
+            "--crs",
+            crs,
+            "r.mapcalc.simple",
+            "expression=row() + col()",
+            f"output={raster}",
+        ],
+        check=True,
+    )
+    assert raster.exists()
+    assert raster.is_file()
+    result = subprocess.run(
+        [
+            sys.executable,
+            "-m",
+            "grass.app",
+            "run",
+            "--crs",
+            str(raster),
+            "g.proj",
+            "-p",
+            "format=json",
+        ],
+        capture_output=True,
+        text=True,
+        check=True,
+    )
+    assert json.loads(result.stdout)["srid"] == crs
+
+
+def test_run_with_crs_as_pack_with_multiple_steps(tmp_path):
+    """Check that we accept pack as both input and output.
+
+    The extension is only tested for the output.
+    Tests basic properties of the output.
+    """
+    crs = "EPSG:3358"
+    extension = ".grass_raster"
+    raster_a = tmp_path / f"test_a{extension}"
+    raster_b = tmp_path / f"test_b{extension}"
+    subprocess.run(
+        [
+            sys.executable,
+            "-m",
+            "grass.app",
+            "run",
+            "--crs",
+            crs,
+            "r.mapcalc.simple",
+            "expression=row() + col()",
+            f"output={raster_a}",
+        ],
+        check=True,
+    )
+    assert raster_a.exists()
+    assert raster_a.is_file()
+    subprocess.run(
+        [
+            sys.executable,
+            "-m",
+            "grass.app",
+            "run",
+            "--crs",
+            crs,
+            "r.mapcalc.simple",
+            "expression=1.5 * A",
+            f"a={raster_a}",
+            f"output={raster_b}",
+        ],
+        check=True,
+    )
+    assert raster_b.exists()
+    assert raster_b.is_file()
+    result = subprocess.run(
+        [
+            sys.executable,
+            "-m",
+            "grass.app",
+            "run",
+            "--crs",
+            crs,
+            "r.univar",
+            f"map={raster_b}",
+            "format=json",
+        ],
+        capture_output=True,
+        text=True,
+        check=True,
+    )
+    assert (
+        json.loads(result.stdout)["cells"] == 1
+    )  # because we don't set the computational region
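
Each test above assembles the same prefix for the experimental run subcommand. As a reading aid, that pattern can be captured in a small helper. The helper name grass_run_command is hypothetical and is not part of the commit; only the argument order is taken from the tests.

```python
import sys


def grass_run_command(crs, tool, *parameters):
    """Build the argument list for the experimental run subcommand.

    The helper name is hypothetical; the argument order mirrors the tests.
    `crs` may be a CRS identifier or a path to a raster pack file.
    """
    return [
        sys.executable,
        "-m",
        "grass.app",
        "run",
        "--crs",
        str(crs),
        tool,
        *parameters,
    ]


command = grass_run_command(
    "EPSG:3358", "r.mapcalc.simple", "expression=row() + col()", "output=test.grr"
)
print(command[1:])
```

The resulting list could be passed to subprocess.run in the same way the tests do; running it requires a GRASS installation, so the sketch only builds the list.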

python/grass/tools/Makefile

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ include $(MODULE_TOPDIR)/include/Make/Python.make
 DSTDIR = $(ETC)/python/grass/tools
 
 MODULES = \
+	importexport \
 	session_tools \
 	support
 
python/grass/tools/importexport.py

Lines changed: 168 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,168 @@
+from __future__ import annotations
+
+import subprocess
+from pathlib import Path
+from typing import Literal
+
+
+class ImporterExporter:
+    """Imports and exports data while keeping track of it
+
+    This is a class for internal use, but it may mature into a generally useful tool.
+    """
+
+    raster_pack_suffixes = (".grass_raster", ".pack", ".rpack", ".grr")
+
+    @classmethod
+    def is_recognized_file(cls, value):
+        """Return `True` if file type is a recognized type, `False` otherwise"""
+        return cls.is_raster_pack_file(value)
+
+    @classmethod
+    def is_raster_pack_file(cls, value):
+        """Return `True` if file type is GRASS raster pack, `False` otherwise"""
+        if isinstance(value, str):
+            return value.endswith(cls.raster_pack_suffixes)
+        if isinstance(value, Path):
+            return value.suffix in cls.raster_pack_suffixes
+        return False
+
+    def __init__(self, *, run_function, run_cmd_function):
+        self._run_function = run_function
+        self._run_cmd_function = run_cmd_function
+        # At least for reading purposes, public access to the lists makes sense.
+        self.input_rasters: list[tuple[Path, str]] = []
+        self.output_rasters: list[tuple[Path, str]] = []
+        self.current_input_rasters: list[tuple[Path, str]] = []
+        self.current_output_rasters: list[tuple[Path, str]] = []
+
+    def process_parameter_list(self, command, **popen_options):
+        """Ingests any file for later imports and exports and replaces arguments
+
+        This function is relatively costly as it calls a subprocess to digest the parameters.
+
+        Returns the list of parameters with inputs and outputs replaced so that a tool
+        will understand that, i.e., file paths into data names in a project.
+        """
+        # Get processed parameters to distinguish inputs and outputs.
+        # We actually don't know the type of the inputs or outputs because that is
+        # currently not included in --json. Consequently, we are only assuming that the
+        # files are meant to be used as in-project data. So, we need to deal with cases
+        # where that's not true one by one, such as r.unpack taking file,
+        # not raster (cell), so the file needs to be left as is.
+        parameters = self._process_parameters(command, **popen_options)
+        tool_name = parameters["module"]
+        args = command.copy()
+        # We will deal with inputs right away
+        if "inputs" in parameters:
+            for item in parameters["inputs"]:
+                if tool_name != "r.unpack" and self.is_raster_pack_file(item["value"]):
+                    in_project_name = self._to_name(item["value"])
+                    record = (Path(item["value"]), in_project_name)
+                    if (
+                        record not in self.output_rasters
+                        and record not in self.input_rasters
+                        and record not in self.current_input_rasters
+                    ):
+                        self.current_input_rasters.append(record)
+                    for i, arg in enumerate(args):
+                        if arg.startswith(f"{item['param']}="):
+                            arg = arg.replace(item["value"], in_project_name)
+                            args[i] = arg
+        if "outputs" in parameters:
+            for item in parameters["outputs"]:
+                if tool_name != "r.pack" and self.is_raster_pack_file(item["value"]):
+                    in_project_name = self._to_name(item["value"])
+                    record = (Path(item["value"]), in_project_name)
+                    # Following the logic of r.slope.aspect, we don't deal with one output repeated
+                    # more than once, but this would be the place to address it.
+                    if (
+                        record not in self.output_rasters
+                        and record not in self.current_output_rasters
+                    ):
+                        self.current_output_rasters.append(record)
+                    for i, arg in enumerate(args):
+                        if arg.startswith(f"{item['param']}="):
+                            arg = arg.replace(item["value"], in_project_name)
+                            args[i] = arg
+        return args
+
+    def _process_parameters(self, command, **popen_options):
+        """Get parameters processed by the tool itself"""
+        popen_options["stdin"] = None
+        popen_options["stdout"] = subprocess.PIPE
+        # We respect whatever is in the stderr option because that's what the user
+        # asked for and will expect to get in case of error (we pretend that it was
+        # the intended run, not our special run before the actual run).
+        return self._run_cmd_function([*command, "--json"], **popen_options)
+
+    def _to_name(self, value, /):
+        return Path(value).stem
+
+    def import_rasters(self, rasters, *, env):
+        for raster_file, in_project_name in rasters:
+            # Overwriting here is driven by the run function.
+            self._run_function(
+                "r.unpack",
+                input=raster_file,
+                output=in_project_name,
+                superquiet=True,
+                env=env,
+            )
+
+    def export_rasters(
+        self, rasters, *, env, delete_first: bool, overwrite: Literal[True] | None
+    ):
+        # Pack the output raster
+        for raster_file, in_project_name in rasters:
+            # Overwriting a file is a warning, so to avoid it, we delete the file first.
+            # This creates a behavior consistent with command line tools.
+            if delete_first:
+                Path(raster_file).unlink(missing_ok=True)
+
+            # Overwriting here is driven by the run function and env.
+            self._run_function(
+                "r.pack",
+                input=in_project_name,
+                output=raster_file,
+                flags="c",
+                superquiet=True,
+                env=env,
+                overwrite=overwrite,
+            )
+
+    def import_data(self, *, env):
+        # We import the data, make records for later, and then clear the current list.
+        self.import_rasters(self.current_input_rasters, env=env)
+        self.input_rasters.extend(self.current_input_rasters)
+        self.current_input_rasters = []
+
+    def export_data(
+        self, *, env, delete_first: bool = False, overwrite: Literal[True] | None = None
+    ):
+        # We export the data, make records for later, and then clear the current list.
+        self.export_rasters(
+            self.current_output_rasters,
+            env=env,
+            delete_first=delete_first,
+            overwrite=overwrite,
+        )
+        self.output_rasters.extend(self.current_output_rasters)
+        self.current_output_rasters = []
+
+    def cleanup(self, *, env):
+        # We don't track in what mapset the rasters are, and we assume
+        # the mapset was not changed in the meantime.
+        remove = [name for (unused, name) in self.input_rasters]
+        remove.extend([name for (unused, name) in self.output_rasters])
+        if remove:
+            self._run_function(
+                "g.remove",
+                type="raster",
+                name=remove,
+                superquiet=True,
+                flags="f",
+                env=env,
+            )
+        self.input_rasters = []
+        self.output_rasters = []
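
The core of the parameter handling above, suffix detection followed by replacing file paths with in-project names, can be tried without GRASS. The following is a simplified sketch, not the class itself: the function rewrite_arguments and the sample items are hypothetical, with the items imitating the "param" and "value" entries the code reads from the --json output.

```python
from pathlib import Path

RASTER_PACK_SUFFIXES = (".grass_raster", ".pack", ".rpack", ".grr")


def is_raster_pack_file(value):
    """Return True when a str or Path ends with a raster pack suffix."""
    if isinstance(value, str):
        return value.endswith(RASTER_PACK_SUFFIXES)  # endswith accepts a tuple
    if isinstance(value, Path):
        return value.suffix in RASTER_PACK_SUFFIXES
    return False


def rewrite_arguments(args, items):
    """Replace pack file paths with in-project names in key=value arguments.

    `items` is a hypothetical list of dictionaries with "param" and "value" keys.
    """
    result = list(args)
    for item in items:
        if not is_raster_pack_file(item["value"]):
            continue
        name = Path(item["value"]).stem  # file name without directory and suffix
        for i, arg in enumerate(result):
            if arg.startswith(f"{item['param']}="):
                result[i] = arg.replace(item["value"], name)
    return result


args = ["r.grow", "input=/data/a/elevation.grr", "output=/data/grown.pack"]
items = [
    {"param": "input", "value": "/data/a/elevation.grr"},
    {"param": "output", "value": "/data/grown.pack"},
]
rewritten = rewrite_arguments(args, items)
print(rewritten)

# Files with the same base name in different directories map to one name.
collision = Path("/data/a/elevation.grr").stem == Path("/data/b/elevation.grr").stem
print(collision)
```

The last check shows why names driven by file names are fragile: two different files can resolve to the same in-project name, which is the limitation the commit message points out for parallel usage.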
