
Commit

fix spelling errors, configure codespell
Zeitsperre committed Feb 10, 2025
1 parent 7307add commit 5dc9b20
Showing 17 changed files with 80 additions and 70 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -71,7 +71,7 @@ lint: lint/flake8 lint/black ## check style
test: ## run tests quickly with the default Python
python -m pytest

test-distributed: ## run tests quickly with the default Python and distibuted workers
test-distributed: ## run tests quickly with the default Python and distributed workers
python -m pytest --num-processes=logical

test-notebooks: ## run tests on notebooks and compare outputs
2 changes: 1 addition & 1 deletion docs/installation.rst
@@ -2,7 +2,7 @@
Installation
============

We strongly recommend installing `xhydro` in an Anaconda Python environment. Futhermore, due to the complexity of some packages, the default dependency solver can take a long time to resolve the environment. If `mamba` is not already your default solver, consider running the following commands in order to speed up the process:
We strongly recommend installing `xhydro` in an Anaconda Python environment. Furthermore, due to the complexity of some packages, the default dependency solver can take a long time to resolve the environment. If `mamba` is not already your default solver, consider running the following commands in order to speed up the process:

.. code-block:: console
@@ -319,8 +319,8 @@ msgstr "Incertitudes de l'analyse fréquentielle locale"

#: ../../notebooks/regional_frequency_analysis.ipynb:485
msgid ""
"To add some uncertainities, we will work with only one catchment and two "
"distributions as uncertainities can be intensive in computation. We "
"To add some uncertainties, we will work with only one catchment and two "
"distributions as uncertainties can be intensive in computation. We "
"select the station 023401, and distribution 'genextreme' and 'pearson3'."
msgstr ""
"Pour ajouter l'incertitude, nous travaillerons avec un seul "
@@ -342,7 +342,7 @@ msgid "Bootstraping the observations"
msgstr "Rééchantillonnage des observations"

#: ../../notebooks/regional_frequency_analysis.ipynb:514
msgid "A way to get uncertainities is to bootstrap the observations 200 times."
msgid "A way to get uncertainties is to bootstrap the observations 200 times."
msgstr ""
"Une façon d’obtenir des incertitudes est de rééchantillonner les observations "
"200 fois."
@@ -365,10 +365,10 @@ msgstr "Incertitudes de l'analyse fréquentielle régionale"

#: ../../notebooks/regional_frequency_analysis.ipynb:642
msgid ""
"For the regional analysis, we again use ``boostrap_obs`` to resample the "
"For the regional analysis, we again use ``bootstrap_obs`` to resample the "
"observations, but, this time, it's much faster as no fit is involved."
msgstr ""
"Pour l'analyse régionale, nous utilisons à nouveau ``boostrap_obs`` pour "
"Pour l'analyse régionale, nous utilisons à nouveau ``bootstrap_obs`` pour "
"rééchantillonner les observations, mais, cette fois, c'est beaucoup plus "
"rapide, car aucun ajustement n'est impliqué."

4 changes: 2 additions & 2 deletions docs/notebooks/extreme_value_analysis.ipynb
@@ -167,7 +167,7 @@
"source": [
"### Parameter estimation for non-stationary model\n",
"\n",
"For this example the location paramerter vary as linear funcion of the year. To do this, a new dimension containing the year is created."
"For this example the location parameter varies as a linear function of the year. To do this, a new dimension containing the year is created."
]
},
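The cell above only describes the covariate in prose, so here is a minimal, hedged sketch of one way to attach a year covariate to annual-maximum data with xarray. The synthetic Gumbel data, the variable names, and the choice of `assign_coords` are assumptions for illustration; the notebook's actual mechanism is not visible in this diff.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic annual maxima; only meant to show the shape of the operation.
time = pd.date_range("1950-01-01", periods=70, freq="YS")
rng = np.random.default_rng(1)
da = xr.DataArray(
    rng.gumbel(loc=100.0, scale=20.0, size=time.size),
    coords={"time": time},
    dims="time",
    name="streamflow_max_annual",
)

# Attach the year as a covariate so the location parameter can be modelled
# as a linear function of it in the non-stationary fit.
da = da.assign_coords(year=("time", da["time"].dt.year.data))
```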
{
@@ -217,7 +217,7 @@
"source": [
"### Return level estimation for non-stationary model\n",
"\n",
"100-year return level with the location paramerter vary as linear funcion of the year."
"100-year return level with the location parameter varies as a linear function of the year."
]
},
{
10 changes: 5 additions & 5 deletions docs/notebooks/gis.ipynb
@@ -889,7 +889,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also extract the surface properties for the same `gpd.GeoDataFrame` : "
"We can also extract the surface properties for the same `gpd.GeoDataFrame` :"
]
},
{
@@ -1019,7 +1019,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, for convenience, we can output the results in `xarray.Dataset` format : "
"Again, for convenience, we can output the results in `xarray.Dataset` format :"
]
},
{
@@ -1464,7 +1464,7 @@
"metadata": {},
"source": [
"### b) Land-use classification\n",
"Land use classification is powered by the Planetary Computer's STAC catalog. It uses the `10m Annual Land Use Land Cover` dataset by default (\"io-lulc-annual-v02\"), but other collections can be specified by using the collection argument. "
"Land use classification is powered by the Planetary Computer's STAC catalog. It uses the `10m Annual Land Use Land Cover` dataset by default (\"io-lulc-annual-v02\"), but other collections can be specified by using the collection argument."
]
},
{
@@ -1977,7 +1977,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the next few steps use [xclim](https://xclim.readthedocs.io/en/stable/index.html) under the hood, the dataset is required to be [CF-compliant](http://cfconventions.org/cf-conventions/cf-conventions.html). At a minimum, the `xarray.DataArray` used must follow these principles:\n",
"Because the next few steps use [xclim](https://xclim.readthedocs.io/en/stable/index.html) under the hood, the dataset is required to be [CF-compliant](https://cfconventions.org/cf-conventions/cf-conventions.html). At a minimum, the `xarray.DataArray` used must follow these principles:\n",
"\n",
"- The dataset needs a time dimension, usually at a daily frequency with no missing timesteps (NaNs are supported). If your data differs from that, you'll need to be extra careful on the results provided.\n",
"- If there is a spatial dimension, such as \"``Station``\" in the example below, it needs an attribute ``cf_role`` with ``timeseries_id`` as its value.\n",
@@ -3965,7 +3965,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The same data can also be visualized as a `pd.DataFrame` as well : "
"The same data can also be visualized as a `pd.DataFrame` as well :"
]
},
{
2 changes: 1 addition & 1 deletion docs/notebooks/hydrological_modelling.ipynb
@@ -91,7 +91,7 @@
"Hydrological models can differ from one another in terms of required inputs and available functions, but an effort will be made to homogenize them as much as possible as new models get added. Currently, all models have these 3 functions:\n",
"- `.run()` which will execute the model, reformat the outputs to be compatible with analysis tools in `xhydro`, then return the simulated streamflows as a `xarray.Dataset`.\n",
" - The streamflow will be called `streamflow` and have units in `m3 s-1`.\n",
" - In the case of 1D data (such as hydrometric stations), that dimension in the dataset will be identified trough a `cf_role: timeseries_id` attribute.\n",
" - In the case of 1D data (such as hydrometric stations), that dimension in the dataset will be identified through a `cf_role: timeseries_id` attribute.\n",
"- `.get_inputs()` to retrieve the meteorological inputs.\n",
"- `.get_streamflow()` to retrieve the simulated streamflow.\n",
"\n",
8 changes: 4 additions & 4 deletions docs/notebooks/optimal_interpolation.ipynb
@@ -13,7 +13,7 @@
"source": [
"Optimal interpolation is a tool that allows combining a spatially distributed field (i.e. the \"background field\") with point observations in such a way that the entire field can be adjusted according to deviations between the observations and the field at the point of observations. For example, it can be used to combine a field of reanalysis precipitation (e.g. ERA5) with observation records, and thus adjust the reanalysis precipitation over the entire domain in a statistically optimal manner.\n",
"\n",
"This page demonstrates how to use `xhydro` to perform optimal interpolation using field-like simulations and point observations for hydrological modelling. In this case, the background field is a set of outputs from a distributed hydrological model and the observations correspond to real hydrometric stations. The aim is to correct the background field (i.e. the distributed hydrological simulations) using optimal interpolation, as in Lachance-Cloutier et al (2017).\n",
"This page demonstrates how to use `xhydro` to perform optimal interpolation using field-like simulations and point observations for hydrological modelling. In this case, the background field is a set of outputs from a distributed hydrological model and the observations correspond to real hydrometric stations. The aim is to correct the background field (i.e. the distributed hydrological simulations) using optimal interpolation, as in Lachance-Cloutier et al. (2017).\n",
"\n",
"*Lachance-Cloutier, S., Turcotte, R. and Cyr, J.F., 2017. Combining streamflow observations and hydrologic simulations for the retrospective estimation of daily streamflow for ungauged rivers in southern Quebec (Canada). Journal of hydrology, 550, pp.294-306.*"
]
@@ -99,7 +99,7 @@
"\n",
"We now have the basic data required to start processing using optimal interpolation. However, before doing so, we must provide some hyperparameters. Some are more complex than others, so let's break down the main steps.\n",
"\n",
"The first is the need to compute differences (also referred to as \"departures\" between observations and simulations where they both occur simultaneously. We also need to scale the data by the catchment area to ensure errors are relative and can then be interpolated. We then take the logarithm of these values to ensure extrapolation does not cause negative streamflow. We will reverse the transformation later."
"The first is the need to compute differences (also referred to as \"departures\") between observations and simulations where they both occur simultaneously. We also need to scale the data by the catchment area to ensure errors are relative and can then be interpolated. We then take the logarithm of these values to ensure extrapolation does not cause negative streamflow. We will reverse the transformation later."
]
},
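A tiny numerical sketch of the transformation described above, using made-up numbers rather than the notebook's datasets; it only illustrates the scale-by-area / log / exponentiate-back round trip, not the optimal interpolation itself.

```python
import numpy as np

# Invented values: 2 days x 2 stations, drainage areas in km2.
area = np.array([250.0, 1200.0])
qobs = np.array([[12.0, 150.0], [9.0, 130.0]])   # observed streamflow (m3/s)
qsim = np.array([[10.0, 170.0], [11.0, 120.0]])  # simulated streamflow at the same points

# Scale by drainage area so the errors are relative, then take the logarithm so
# that interpolated/extrapolated corrections can never yield negative streamflow.
departures = np.log(qobs / area) - np.log(qsim / area)

# After the departures have been interpolated over the domain, the transformation
# is reversed with an exponential; at the observation points this recovers qobs.
qsim_corrected = qsim * np.exp(departures)
print(np.allclose(qsim_corrected, qobs))  # True
```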
{
@@ -429,7 +429,7 @@
"\n",
"Notice that there are again 274 stations, like in the \"qobs\" dataset. This is because this specific dataset was used to perform leave-one-out cross validation to assess the optimal interpolation performance, and as such, only simulations at gauged sites is of interest. In an operational setting, there is no limit on the number of stations for \"qsim\".\n",
"\n",
"Now let's take a look at the correspondance tables and the observed station dataset."
"Now let's take a look at the correspondence tables and the observed station dataset."
]
},
{
@@ -521,7 +521,7 @@
"# If we do a leave-one-out cross-validation over the 96 catchments, the entire optimal interpolation process is repeated 96 times but\n",
"# only over the observation sites, each time leaving one station out and kept independent for validation. This is time-consuming and\n",
"# can be parallelized by adjusting this flag and setting an appropriate number of CPU cores according to your computer. By default,\n",
"# the code will only use 1 core. However, if increased, the maximum number tht will be actually used is ([number-of-available-cores / 2] - 1)\n",
"# the code will only use 1 core. However, if increased, the maximum number that will be actually used is ([number-of-available-cores / 2] - 1)\n",
"# CPU cores as to not overexert the computer.\n",
"parallelize = False\n",
"max_cores = 1\n",
42 changes: 21 additions & 21 deletions docs/notebooks/regional_frequency_analysis.ipynb
@@ -178,7 +178,7 @@
"metadata": {},
"source": [
"### b) Clustering\n",
"In this example we'll use `AgglomerativeClustering`, but other methods would also provide valid results. The regional clustering itself is performed using xhfa.regional.get_group_from_fit, which can take the arguments of the skleanr functions as a dictionnary."
"In this example we'll use `AgglomerativeClustering`, but other methods would also provide valid results. The regional clustering itself is performed using xhfa.regional.get_group_from_fit, which can take the arguments of the skleanr functions as a dictionary."
]
},
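For readers unfamiliar with the underlying call, here is a hedged sketch of the scikit-learn step that the wrapper forwards its dictionary of arguments to. The feature matrix is random and `n_clusters=3` is an arbitrary choice; the exact signature of `xhfa.regional.get_group_from_fit` is not reproduced here.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy catchment descriptors (30 catchments x 4 explanatory variables).
rng = np.random.default_rng(42)
features = rng.normal(size=(30, 4))

# The keyword arguments passed as a dictionary to get_group_from_fit end up
# as constructor arguments of the chosen scikit-learn estimator.
labels = AgglomerativeClustering(n_clusters=3).fit_predict(features)
print(np.bincount(labels))  # catchments per homogeneous region
```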
{
@@ -261,15 +261,15 @@
"- **Interpretation**:\n",
"\n",
" - **Low Z-Score**: A good fit of the model to the observed data. Typically, an absolute value of the Z-Score below 1.64 suggests that the model is appropriate and the fit is statistically acceptable.\n",
" \n",
"\n",
" - **High Z-Score**: Indicates significant discrepancies between the observed and expected values. An absolute value above 1.64 suggests that the model may not fit the data well, and adjustments might be necessary.\n"
]
},
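The acceptance rule stated above boils down to a single comparison; the helper below is just a restatement of that threshold (1.64 is roughly the 5% critical value of the standard normal), not a function from xhydro.

```python
def z_fit_is_acceptable(z_score: float, threshold: float = 1.64) -> bool:
    """Return True when |Z-Score| indicates a statistically acceptable fit."""
    return abs(z_score) < threshold
```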
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To calculate H and Z, we also need a `KappaGen` object from the lmoment3 library. This library is not part of the default xhydro environment and will need to be installed seperately."
"aTo calculate H and Z, we also need a `KappaGen` object from the lmoments3 library. This library is not part of the default xhydro environment and will need to be installed separately."
]
},
{
@@ -392,8 +392,8 @@
"source": [
"# Uncertainties\n",
"## Local frequency analysis uncertainties\n",
"To add some uncertainities, we will work with only one catchment and two distributions, as uncertainities can be intensive in computation.\n",
"We select the station 023401, and distribution 'genextreme' and 'pearson3'. \n",
"To add some uncertainties, we will work with only one catchment and two distributions, as uncertainties can be intensive in computation.\n",
"We select the station 023401, and distribution 'genextreme' and 'pearson3'.\n",
"\n",
"For the local frequency analysis, we need to fit the distribution so the calulting time can be long."
]
@@ -414,8 +414,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Bootstraping the observations\n",
"A way to get uncertainities is to bootstrap the observations. For this example, we will boostrap the observations a low amount of times, although a higher number (e.g. 5000) would be preferable in practice."
"### Bootstrapping the observations\n",
"A way to get uncertainties is to bootstrap the observations. For this example, we will bootstrap the observations a low amount of times, although a higher number (e.g. 5000) would be preferable in practice."
]
},
{
@@ -424,7 +424,7 @@
"metadata": {},
"outputs": [],
"source": [
"ds_4fa_iter = xhfa.uncertainities.boostrap_obs(ds_4fa_one_station, 35)\n",
"ds_4fa_iter = xhfa.uncertainties.bootstrap_obs(ds_4fa_one_station, 35)\n",
"params_boot_obs = xhfa.local.fit(ds_4fa_iter, distributions=[\"genextreme\", \"pearson3\"])"
]
},
@@ -454,10 +454,10 @@
"metadata": {},
"outputs": [],
"source": [
"values = xhfa.uncertainities.boostrap_dist(\n",
"values = xhfa.uncertainties.bootstrap_dist(\n",
" ds_4fa_one_station, params_loc_one_station, 35\n",
")\n",
"params_boot_dist = xhfa.uncertainities.fit_boot_dist(values)"
"params_boot_dist = xhfa.uncertainties.fit_boot_dist(values)"
]
},
{
@@ -523,9 +523,9 @@
"metadata": {},
"source": [
"## Regional frequency analysis uncertainties\n",
"### Bootstraping the observations\n",
"### Bootstrapping the observations\n",
"\n",
"For the regional analysis, we again use `boostrap_obs` to resample the observations, but, this time, it's much faster as no fit is involved."
"For the regional analysis, we again use `bootstrap_obs` to resample the observations, but, this time, it's much faster as no fit is involved."
]
},
{
@@ -534,8 +534,8 @@
"metadata": {},
"outputs": [],
"source": [
"ds_reg_samples = xhfa.uncertainities.boostrap_obs(ds_4fa, 35)\n",
"ds_moments_iter = xhfa.uncertainities.calc_moments_iter(ds_reg_samples).load()"
"ds_reg_samples = xhfa.uncertainties.bootstrap_obs(ds_4fa, 35)\n",
"ds_moments_iter = xhfa.uncertainties.calc_moments_iter(ds_reg_samples).load()"
]
},
{
@@ -544,7 +544,7 @@
"metadata": {},
"outputs": [],
"source": [
"Q_reg_boot = xhfa.uncertainities.calc_q_iter(\n",
"Q_reg_boot = xhfa.uncertainties.calc_q_iter(\n",
" \"023401\", \"streamflow_max_annual\", ds_groups_H1, ds_moments_iter, return_periods\n",
")"
]
@@ -562,7 +562,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Since we'll do a few plots to illustrate the results, let's make a function to somplify things a litle."
"Since we'll do a few plots to illustrate the results, let's make a function to simplify things a little."
]
},
{
@@ -620,7 +620,7 @@
"metadata": {},
"source": [
"### Multiple regions\n",
"Another way to get the uncertainty is to have many regions for one catchement of interest. We can achive this by trying different clustering methods. Or by performing a jackknife on the station list. We dont do too many tests here since it can take quite a while to run and the goal is just to illustrate the possibilities"
"Another way to get the uncertainty is to have many regions for one catchment of interest. We can achieve this by trying different clustering methods. Or by performing a jackknife on the station list. It can take quite a while to run, so we show here a simplified example; The goal is just to illustrate the possibilities."
]
},
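A hedged sketch of the jackknife-style resampling mentioned above: build every station subset obtained by dropping at most n stations. Only station 023401 comes from the notebook; the other IDs are invented, and this illustrates the combinatorics rather than xhydro's `generate_combinations` implementation.

```python
from itertools import combinations

stations = ["023401", "023402", "023403", "023422"]  # only 023401 is from the notebook
n = 2  # drop at most 2 stations

subsets = [
    list(kept)
    for k in range(len(stations) - n, len(stations) + 1)
    for kept in combinations(stations, k)
]
print(len(subsets))  # 1 (drop none) + 4 (drop one) + 6 (drop two) = 11
```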
{
@@ -647,7 +647,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We now generaste stations combination by removing 0-n stations. "
"We now generaste stations combination by removing 0-n stations."
]
},
{
@@ -657,7 +657,7 @@
"outputs": [],
"source": [
"n = 2\n",
"combinations_list = xhfa.uncertainities.generate_combinations(data, n)"
"combinations_list = xhfa.uncertainties.generate_combinations(data, n)"
]
},
{
@@ -692,7 +692,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The following steps are similar to the previous one, just with more regions. "
"The following steps are similar to the previous one, just with more regions."
]
},
{
@@ -781,7 +781,7 @@
"metadata": {},
"outputs": [],
"source": [
"Q_reg_boot = xhfa.uncertainities.calc_q_iter(\n",
"Q_reg_boot = xhfa.uncertainties.calc_q_iter(\n",
" \"023401\", \"streamflow_max_annual\", ds_groups_H1, ds_moments_iter, return_periods\n",
")\n",
"Q_reg_boot"
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -166,7 +166,7 @@ values = [
]

[tool.codespell]
ignore-words-list = "astroid,socio-economic"
ignore-words-list = "ans,astroid,nd,parametre,projet,socio-economic"
skip = "*.po"

[tool.coverage.paths]
source = ["src/xhydro/", "*/site-packages/xhydro/"]
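With the `[tool.codespell]` table added above, recent codespell releases (2.2+) pick the settings up automatically when run from the repository root. The sketch below simply invokes the CLI from Python and mirrors the two settings as explicit flags for older versions; how this project actually wires codespell into its Makefile or CI is not shown in this diff.

```python
import subprocess

# Equivalent explicit invocation; plain `codespell` suffices when the
# [tool.codespell] table in pyproject.toml is being read.
subprocess.run(
    [
        "codespell",
        "--ignore-words-list=ans,astroid,nd,parametre,projet,socio-economic",
        "--skip=*.po",
    ],
    check=False,
)
```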
12 changes: 6 additions & 6 deletions src/xhydro/extreme_value_analysis/julia_import.py
@@ -1,4 +1,4 @@
"""Load and install Julia dependancies into python environment."""
"""Load and install Julia dependencies into python environment."""

import contextlib
import io
@@ -80,14 +80,14 @@ def check_function_output(func, expected_output, *args, **kwargs) -> bool:
return expected_output in output


# It was not necessary to add a dependancy dictionary as we only need Extremes.jl, however this mechanism is more
# scalable in case we need to add many other julia dependancies in the future
# It was not necessary to add a dependency dictionary as we only need Extremes.jl, however this mechanism is more
# scalable in case we need to add many other julia dependencies in the future
deps = {
"Extremes": "fe3fe864-1b39-11e9-20b8-1f96fa57382d",
}
for dependancy, uuid in deps.items():
if not check_function_output(juliapkg.deps.status, dependancy):
juliapkg.add(dependancy, uuid)
for dependency, uuid in deps.items():
if not check_function_output(juliapkg.deps.status, dependency):
juliapkg.add(dependency, uuid)
juliapkg.resolve()
jl = cast(ModuleType, jl)
jl_version = (
2 changes: 1 addition & 1 deletion src/xhydro/extreme_value_analysis/structures/util.py
@@ -143,7 +143,7 @@ def return_level_cint(
nobsperblock_pareto: int | None = None,
) -> dict[str, list[float]]:
r"""
Return a list of retun level and confidence intervals for a given Julia fitted model.
Return a list of return levels and confidence intervals for a given Julia fitted model.
Parameters
----------