diff --git a/docs/index.rst b/docs/index.rst
index b504a57..ffa9abd 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -16,12 +16,15 @@
    :caption: Data Source Examples

    notebooks/data_sources/NOAA_API.ipynb
+   notebooks/data_sources/es_covid.ipynb
+   notebooks/data_sources/HDX_API.ipynb

 .. toctree::
    :caption: Development Resources
    :hidden:

    contributing
+   maintainer_guide
    code_of_conduct
    style_guide
    data_model
diff --git a/docs/maintainer_guide.rst b/docs/maintainer_guide.rst
new file mode 100644
index 0000000..a3da2ff
--- /dev/null
+++ b/docs/maintainer_guide.rst
@@ -0,0 +1,220 @@
+Maintainer Guide
+================
+
+The purpose of this document is to define the tasks related to maintaining the GitHub
+repository and to describe each of them in detail, so that it serves as both:
+
+1. A reference for maintainers of the repository.
+2. A guide for users interested in improving their contribution workflow.
+
+
+Issues
+------
+
+Issues can be classified into four types:
+
+- **Bug reports**
+
+- **Feature requests**
+
+- **Support requests**
+
+- **Tasks**
+
+Let's see them in more detail:
+
+Bug reports
+***********
+
+Bug reports are communications that something is not working as intended. They should follow
+these guidelines:
+
+- **Title:**
+  Short but descriptive; it must refer to the failing component, the type of error, and the
+  context. Examples of good titles are::
+
+      XXXX Exception on function path.to.function
+      XXXX Exception on function path.to.function when calling with parameter param=value
+      Error when doing $something on OS
+
+  Examples of bad titles are::
+
+      Error         # not descriptive
+      xxx crash     # doesn't refer to a component
+      wrong values  # doesn't refer to a component
+
+  The general idea is that reading the title of an issue should give a rough idea of what
+  the bug is. This makes it easier for maintainers to triage the bug reports, and for users
+  to find reports related to a problem they may have.
+
+- **General information:**
+
+  * Commit SHA or package version, when available. (This will help us locate the error and
+    make sure it hasn't already been fixed.)
+  * OS version: This can help replicate OS-specific issues.
+  * Python version: This can help replicate Python-version-specific issues.
+
+- **Description:** The main explanation of what happened. Here the user must explain not
+  only what they did, but also what they intended to do, so that we avoid fixing something
+  that works properly but is not being used properly.
+
+- **Code snippet:** It should provide a `minimal, complete and reproducible example`_ of the
+  issue being reported (see the sketch after this list).
+
+- **Traceback:** If the bug reported causes a crash, the traceback should be posted in the
+  *Additional Content* section.
+
+- **Dependencies:** If the bug reported caused a crash, or we suspect that the issue may be
+  happening in some call to an external dependency, the actual dependency list the user has
+  installed may be required. (It can be obtained with ``pip freeze``.)
+
+All the contents listed above serve a single purpose: helping reproduce and identify the
+issue.
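+
+As an illustration, a minimal reproducible example attached to a bug report can often be
+just a few lines. The data source and arguments below are purely hypothetical::
+
+    from datetime import datetime
+
+    from task_geo.data_sources.noaa import noaa_api
+
+    # Hypothetical failing call: a real report should use the exact
+    # arguments that trigger the problem.
+    data = noaa_api(['FR'], datetime(2020, 1, 1), metrics=['TMAX'])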
+
+Bug report workflow
+...................
+
+After a bug report is created, it passes through 4 stages:
+
+1. **Verification:** This is the stage where we make sure we are able to reproduce the
+   issue. We will ask for whatever information is needed in order to replicate the bug.
+   When this step is completed, we create a branch using the naming convention
+   ``gh-$(ISSUE_NUMBER)-$(TITLE OF THE ISSUE)``, add a test that replicates the bug being
+   reported, and push it to the main repository.
+
+   The build shall fail, meaning that we have been able to reproduce the bug. At this point
+   the ``Ready to start`` tag can be added. If the issue is especially easy to solve, the
+   ``good first issue`` tag can be added too.
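+
+   As a sketch, such a reproduction test could look like the following (the issue number,
+   data source, and assertion are illustrative)::
+
+       import pandas as pd
+
+       from task_geo.data_sources.covid.es_covid import es_covid
+
+
+       def test_gh_123_es_covid_returns_dataframe():
+           """Reproduce gh-123: es_covid() is expected to return a DataFrame."""
+           result = es_covid()
+           assert isinstance(result, pd.DataFrame)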
+
+   In the case that the bug is not really a bug (for example, bad usage from the user, or
+   correct, albeit unexpected, behavior), the ``wontfix`` tag can be added and the issue
+   closed.
+
+2. **Assignment:** After we create the branch, we create a Trello card linking to the
+   GitHub issue and explaining the problem, and the project manager will find someone to
+   fix the bug.
+
+3. **Development:** After the project manager finds somebody to fix the bug, this is the
+   stage where the bug is actually fixed.
+
+4. **Resolution:** After the bug has been fixed, the changes are submitted with a Pull
+   Request.
+
+
+Feature requests
+****************
+
+Feature requests are requests for improvements to our package.
+
+They should include two parts:
+
+- **Motivation:** Either something that is not correct and can be improved, or a new
+  feature the user considers the package should provide.
+
+- **Proposal of solution:** How they propose the issue to be solved, or, in general terms,
+  how they wish the package to behave once the issue is solved.
+
+Feature request workflow
+........................
+
+After a feature request is created, it passes through 4 steps:
+
+1. **Verification and design:** After receiving the feature request, we must make sure that
+   the proposal:
+
+   a) Makes sense in the scope of this project and its motivations.
+   b) Is doable with the resources available.
+   c) Is consistent with the behavior of the rest of the package.
+
+   If any of these requisites is not fulfilled, we will add the ``wontfix`` tag and close
+   the issue. If all three requisites are fulfilled, we will start studying the proposal of
+   solution, if one exists, or design one according to the reported desired behavior. To do
+   so we should take into consideration:
+
+   a) What structures will need to be created?
+   b) Which inputs, outputs, and functionality will each part have?
+   c) How will it be used by the final user?
+   d) Are there components in our codebase with similar functionality that can be used, or
+      modified, to fulfill the requirements?
+   e) In which part of the repository should these components be added?
+   f) Will there be any potential conflicts with other components?
+
+   We will add the design proposal in a comment.
+
+2. **Assignment:** After we create the branch, we create a Trello card linking to the
+   GitHub issue with a brief summary of the design proposal, and the project manager will
+   find someone to implement it.
+
+3. **Development:** After the project manager finds somebody to implement the feature, this
+   is the stage where coding happens.
+
+4. **Resolution:** After the feature has been implemented, the changes are submitted with a
+   Pull Request.
+
+
+Support
+*******
+
+Support issues come from users who tried to use our package but found that the
+documentation (or the lack thereof) wasn't helpful enough to get the results they wanted.
+
+Usually these issues don't require writing code, just explaining to the user how to use
+our package. In the case that documentation is missing or incomplete, a Pull Request can
+be opened to improve the documentation.
+
+Tasks
+*****
+
+Tasks are issues created from Trello. They contain improvements that have already been
+discussed in the Trello card, or for which general documentation already exists, so no
+further work from the maintainer is required.
+
+
+Pull requests
+-------------
+
+Pull Requests are opened by collaborators who wish their work to be included in the
+repository.
+
+The review process has the following steps:
+
+1. **Review of general format:** This is the first and preliminary stage, where we simply
+   check that the general format of the PR is correct. This includes answering the
+   following questions:
+
+   - Is there an issue related to the Pull Request?
+   - Are the items on the checklist checked? For the ones that are not, is there a reason?
+     Could the review process continue without them?
+   - Is there a description of the changes included?
+
+   If any of these questions is answered with a NO, we must go to the collaborator and ask
+   them to make the corresponding changes.
+
+2. **Continuous Integration / Build:** GitHub will automatically launch a build job for
+   every new Pull Request that is opened, which will run the tests and linters. If the
+   build fails, we will add the ``ci-fail`` label and ask the submitter to fix it. If the
+   build passes, we add the ``ci-ok`` label and move to the next step.
+
+3. **Output / notebooks / examples / documentation format:** Now we review that the
+   outputs and the components' behavior are as expected. Usually this can be done by:
+
+   - Cloning the fork / checking out the branch.
+   - Creating a new virtualenv and installing the module from scratch.
+   - Running the notebooks provided, if any, and comparing their outputs with the ones
+     submitted.
+   - Generating the docs and reviewing the docs added by the submission.
+   - Using the public API and checking that the output is compliant with the
+     specifications (see the sketch below).
+
+   If we find any issues, we will add the ``output-fail`` label and ask for the changes.
+   If everything is ok, we add the ``output-ok`` label and continue to the next step.
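+
+   As an illustration, checking a data source's output against the specifications can be
+   as simple as the sketch below (the data source is just an example, and we assume
+   ``check_dataset_format`` accepts the returned dataframe)::
+
+       from task_geo.data_sources.covid.es_covid import es_covid
+       from task_geo.testing import check_dataset_format
+
+       data = es_covid()
+       check_dataset_format(data)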
+
+4. **Peer review / Code review:** This is the final step, where we read the code and
+   analyze it looking for flaws. Things to look for include:
+
+   - Good naming practices
+   - Code redundancy
+   - Readability
+   - Coherent APIs (arguments, defaults, types)
+   - Structured code
+   - Good use of external libraries
+
+   All issues that are found should be explained, together with a proposed change. After
+   the changes are made, the PR can be merged.
+
+
+.. _minimal, complete and reproducible example: https://stackoverflow.com/help/minimal-reproducible-example
\ No newline at end of file
diff --git a/task_geo/data_sources/covid/es_covid/__init__.py b/task_geo/data_sources/covid/es_covid/__init__.py
new file mode 100644
index 0000000..1e0fe77
--- /dev/null
+++ b/task_geo/data_sources/covid/es_covid/__init__.py
@@ -0,0 +1,36 @@
+from task_geo.data_sources.covid.es_covid.es_covid_connector import es_covid_connector
+from task_geo.data_sources.covid.es_covid.es_covid_formatter import es_covid_formatter
+
+
+def es_covid():
+    """Daily updates for cases for Spain, joined with static demographic data.
+
+    The following sources are updated on each run:
+
+    1. https://covid19.isciii.es/ provides the following:
+
+       - autonomous_community_iso
+       - date
+       - cases
+       - hospitalized
+       - intensive care unit (icu)
+       - deceased
+       - recovered
+
+    The following sources were used for one-time access:
+
+    1. https://en.wikipedia.org/wiki/Autonomous_communities_of_Spain provides the
+       following for the 19 Spanish Autonomous Communities:
+
+       - area
+       - population
+       - density
+       - gdp_per_capita_euros
+
+    Arguments:
+        None
+
+    Returns:
+        pandas.DataFrame
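+
+    Example:
+        A minimal usage sketch; the exact values depend on the live source::
+
+            from task_geo.data_sources.covid.es_covid import es_covid
+
+            data = es_covid()
+            print(data.head())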
+
+    """
+    data = es_covid_connector()
+    return es_covid_formatter(data)
diff --git a/task_geo/data_sources/covid/spain/__main__.py b/task_geo/data_sources/covid/es_covid/__main__.py
similarity index 100%
rename from task_geo/data_sources/covid/spain/__main__.py
rename to task_geo/data_sources/covid/es_covid/__main__.py
diff --git a/task_geo/data_sources/covid/spain/audit.md b/task_geo/data_sources/covid/es_covid/audit.md
similarity index 100%
rename from task_geo/data_sources/covid/spain/audit.md
rename to task_geo/data_sources/covid/es_covid/audit.md
diff --git a/task_geo/data_sources/covid/spain/datapackage.json b/task_geo/data_sources/covid/es_covid/datapackage.json
similarity index 100%
rename from task_geo/data_sources/covid/spain/datapackage.json
rename to task_geo/data_sources/covid/es_covid/datapackage.json
diff --git a/task_geo/data_sources/covid/spain/es_covid_connector.py b/task_geo/data_sources/covid/es_covid/es_covid_connector.py
similarity index 82%
rename from task_geo/data_sources/covid/spain/es_covid_connector.py
rename to task_geo/data_sources/covid/es_covid/es_covid_connector.py
index ddc6d91..5dc8837 100644
--- a/task_geo/data_sources/covid/spain/es_covid_connector.py
+++ b/task_geo/data_sources/covid/es_covid/es_covid_connector.py
@@ -2,13 +2,14 @@

 def es_covid_connector():
-    """Retrieves data from https://covid19.isciii.es
+    """Retrieve data from https://covid19.isciii.es.

     Arguments:
         None

     Returns:
         pandas.DataFrame
+
     """
     url = "https://covid19.isciii.es/resources/serie_historica_acumulados.csv"
     return pd.read_csv(url, encoding="latin_1")
diff --git a/task_geo/data_sources/covid/spain/es_covid_formatter.py b/task_geo/data_sources/covid/es_covid/es_covid_formatter.py
similarity index 99%
rename from task_geo/data_sources/covid/spain/es_covid_formatter.py
rename to task_geo/data_sources/covid/es_covid/es_covid_formatter.py
index 961f1d4..433f5cd 100644
--- a/task_geo/data_sources/covid/spain/es_covid_formatter.py
+++ b/task_geo/data_sources/covid/es_covid/es_covid_formatter.py
@@ -2,13 +2,14 @@

 def es_covid_formatter(df):
-    """Formats data retrieved from https://covid19.isciii.es
+    """Format data retrieved from https://covid19.isciii.es.

     Arguments:
-        raw(pandas.DataFrame):
+        df(pandas.DataFrame): Data as returned by es_covid_connector.

     Returns:
         pandas.DataFrame
+
     """
     df.columns = df.columns.str.lower()
     df.rename(columns={'ccaa': 'autonomous_community_iso', 'fecha': 'date',
diff --git a/task_geo/data_sources/covid/fr_covid/__init__.py b/task_geo/data_sources/covid/fr_covid/__init__.py
new file mode 100644
index 0000000..86f7c0c
--- /dev/null
+++ b/task_geo/data_sources/covid/fr_covid/__init__.py
@@ -0,0 +1,31 @@
+"""Retrieve granular data of COVID-19 cases in France.
+
+Functions:
+    - fr_covid_connector: Extracts data from the CSV URL
+    - fr_covid_formatter: Cleans the CSV data
+    - fr_covidata: Combines the two previous functions
+
+Data Credits:
+    OpenCOVID19-fr
+    https://www.data.gouv.fr/en/datasets/chiffres-cles-concernant-lepidemie-de-covid19-en-france/
+    https://github.com/opencovid19-fr/data
+
+"""
+
+
+from task_geo.data_sources.covid.fr_covid.fr_covid_connector import fr_covid_connector
+from task_geo.data_sources.covid.fr_covid.fr_covid_formatter import fr_covid_formatter
+
+
+def fr_covidata():
+    """Retrieve and format granular data of COVID-19 cases in France.
+
+    Arguments:
+        None
+
+    Returns:
+        pandas.DataFrame
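+
+    Example:
+        A minimal usage sketch; the exact values depend on the live source::
+
+            from task_geo.data_sources.covid.fr_covid import fr_covidata
+
+            data = fr_covidata()
+            print(data.head())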
+
+    """
+    df = fr_covid_connector()
+    return fr_covid_formatter(df)
diff --git a/task_geo/data_sources/covid/fr_covidata/__main__.py b/task_geo/data_sources/covid/fr_covid/__main__.py
similarity index 100%
rename from task_geo/data_sources/covid/fr_covidata/__main__.py
rename to task_geo/data_sources/covid/fr_covid/__main__.py
diff --git a/task_geo/data_sources/covid/fr_covidata/audit.md b/task_geo/data_sources/covid/fr_covid/audit.md
similarity index 100%
rename from task_geo/data_sources/covid/fr_covidata/audit.md
rename to task_geo/data_sources/covid/fr_covid/audit.md
diff --git a/task_geo/data_sources/covid/fr_covidata/datapackage.json b/task_geo/data_sources/covid/fr_covid/datapackage.json
similarity index 100%
rename from task_geo/data_sources/covid/fr_covidata/datapackage.json
rename to task_geo/data_sources/covid/fr_covid/datapackage.json
diff --git a/task_geo/data_sources/covid/fr_covid/fr_covid_connector.py b/task_geo/data_sources/covid/fr_covid/fr_covid_connector.py
new file mode 100644
index 0000000..36ca6b3
--- /dev/null
+++ b/task_geo/data_sources/covid/fr_covid/fr_covid_connector.py
@@ -0,0 +1,28 @@
+import io
+
+import pandas as pd
+import requests
+
+url = (
+    'https://raw.githubusercontent.com/opencovid19-fr/'
+    'data/master/dist/chiffres-cles.csv'
+)
+
+
+def fr_covid_connector():
+    """Extract data from OpenCOVID19-fr's GitHub repository.
+
+    Description:
+        - Downloads the URL's data in a Unicode CSV format
+        - Unicode CSV format: ACS 5Y UTF-8
+
+    Arguments:
+        None
+
+    Returns:
+        pandas.DataFrame: The CSV data.
+
+    """
+    url_data = requests.get(url).content
+
+    return pd.read_csv(io.StringIO(url_data.decode('utf-8')))
diff --git a/task_geo/data_sources/covid/fr_covid/fr_covid_formatter.py b/task_geo/data_sources/covid/fr_covid/fr_covid_formatter.py
new file mode 100644
index 0000000..4bc7320
--- /dev/null
+++ b/task_geo/data_sources/covid/fr_covid/fr_covid_formatter.py
@@ -0,0 +1,81 @@
+import numpy as np
+import pandas as pd
+
+
+def fr_covid_formatter(dataset):
+    """Formatter for FR COVID-19 data.
+
+    Description:
+        - Drop rows with irrelevant regions' info, keeping only info related
+          to subregions in Metropolitan France, and drop repeated data.
+        - Check the dataset for instances where there is more than one source
+          of data for the same subregion and the same date, complement the
+          information from all sources, and take the highest value in case
+          there are different values for the same column, while aggregating
+          the sources' info.
+        - Rename/translate the column titles, and add a country column
+          (France).
+
+    Arguments:
+        dataset(pandas.DataFrame): Data as returned by fr_covid_connector.
+
+    Returns:
+        pandas.DataFrame
+
+    """
+    no_gr = ['region', 'monde', 'pays', 'collectivite-outremer']
+    no_mc = ['DEP-971', 'DEP-972', 'DEP-973', 'DEP-974', 'DEP-976']
+    dataset = dataset[
+        (~dataset.granularite.isin(no_gr)) & (~dataset.maille_code.isin(no_mc))
+    ]
+    dataset = dataset.drop(['depistes', 'granularite'], axis=1)
+    dataset = dataset.drop_duplicates(
+        subset=['date', 'maille_code', 'cas_confirmes', 'deces', 'reanimation',
+                'hospitalises', 'gueris'], keep=False)
+    dataset['date'] = pd.to_datetime(dataset['date'].astype(str)).dt.date
+
+    # Reset indices:
+    dataset = dataset.reset_index(drop=True)
+
+    # Turn source columns' values type to string:
+    str_columns = ['source_nom', 'source_url', 'source_archive', 'source_type']
+    dataset[str_columns] = dataset[str_columns].astype(str)
+
+    aggre = {
+        'cas_confirmes': np.max,
+        'cas_ehpad': np.max,
+        'cas_confirmes_ehpad': np.max,
+        'cas_possibles_ehpad': np.max,
+        'deces': np.max,
+        'deces_ehpad': np.max,
+        'reanimation': np.max,
+        'hospitalises': np.max,
+        'gueris': np.max,
+        'source_nom': ','.join,
+        'source_url': ','.join,
+        'source_archive': ','.join,
+        'source_type': ','.join
+    }
+
+    group_columns = ['date', 'maille_code', 'maille_nom']
+    dataset = dataset.groupby(group_columns).aggregate(aggre).reset_index()
+
+    # Rename/Translate the column titles:
+    renamed_columns = {
+        "maille_code": "subregion_code",
+        "maille_nom": "subregion_name", "cas_confirmes": "confirmed",
+        "deces": "deaths", "reanimation": "recovering",
+        "hospitalises": "hospitalized", "gueris": "recovered",
+        "source_nom": "source_name"
+    }
+    dataset = dataset.rename(columns=renamed_columns)
+    dataset['country'] = 'France'
+
+    # Select and order the output columns (a list selection, hence the
+    # double brackets):
+    frcovidata = dataset[[
+        'subregion_code', 'subregion_name', 'country', 'date', 'confirmed',
+        'hospitalized', 'recovering', 'recovered',
+        'deaths', 'source_name', 'source_url', 'source_archive',
+        'source_type']]
+
+    return frcovidata
diff --git a/task_geo/data_sources/covid/fr_covidata/__init__.py b/task_geo/data_sources/covid/fr_covidata/__init__.py
deleted file mode 100644
index 80d4d99..0000000
--- a/task_geo/data_sources/covid/fr_covidata/__init__.py
+++ /dev/null
@@ -1,3 +0,0 @@
-from task_geo.data_sources.covid.fr_covidata.fr_covidata import fr_covidata
-
-__all__ = ['fr_covidata']
diff --git a/task_geo/data_sources/covid/fr_covidata/fr_covidata.py b/task_geo/data_sources/covid/fr_covidata/fr_covidata.py
deleted file mode 100644
index b2656b4..0000000
--- a/task_geo/data_sources/covid/fr_covidata/fr_covidata.py
+++ /dev/null
@@ -1,124 +0,0 @@
-"""
-fr_covidata.py
-
-Functions:
-    - fr_covidata_connector: Extracts data from CSV URL
-    - fr_covidata_formatter: Cleans CSV data
-    - fr_covidata: Combines the two previous functions
-
-Data Credits:
-    OpenCOVID19-fr
-    https://www.data.gouv.fr/en/datasets/chiffres-cles-concernant-lepidemie-de-covid19-en-france/
-    https://github.com/opencovid19-fr/data
-"""
-
-import io
-
-import numpy as np
-import pandas as pd
-import requests
-
-url = (
-    'https://raw.githubusercontent.com/opencovid19-fr/'
-    'data/master/dist/chiffres-cles.csv'
-)
-
-
-def fr_covidata():
-    """Data Source for the French COVID-19 Data.
-    Arguments:
-        None
-    Returns:
-        pandas.DataFrame
-    """
-    df = fr_covidata_connector()
-    return fr_covidata_formatter(df)
-
-
-def fr_covidata_connector():
-    """Extract data from OpenCOVID19-fr's Github repository.
-    Description:
-    - Downloads the URL's data in a Unicode CSV Format
-    - Unicode CSV Format: ACS 5Y UTF-8
-    Returns:
-        dataset (DataFrame with CSV Data)
-    """
-
-    urlData = requests.get(url).content
-
-    dataset = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
-    return dataset
-
-
-def fr_covidata_formatter(dataset):
-    """Formatter for FR COVID-19 Data.
-    Arguments:
-        dataset(pandas.DataFrame): Data as returned by fr_covidata_connector.
-    Description:
-    - Drop unnecessary rows with irrelevant regions' info and only keep
-    info related to subregions in Metropolitan France, as well as
-    repetitive data
-    - Check the dataset for instances where there are more than one source
-    of data in the same subregion for the same date, then complement all
-    the sources information, and take the highest value in case there are
-    different values for the same column, while aggregating the sources
-    info
-    - Rename/Translate the column titles, and add a country column (France)
-    Returns:
-        frcovidata(pandas.DataFrame)
-    """
-
-    no_gr = ['region', 'monde', 'pays', 'collectivite-outremer']
-    no_mc = ['DEP-971', 'DEP-972', 'DEP-973', 'DEP-974', 'DEP-976']
-    dataset = dataset[
-        (~dataset.granularite.isin(no_gr)) & (~dataset.maille_code.isin(no_mc))
-    ]
-    dataset = dataset.drop(['depistes', 'granularite'], axis=1)
-    dataset = dataset.drop_duplicates(
-        subset=['date', 'maille_code', 'cas_confirmes', 'deces',
-                'reanimation',
-                'hospitalises', 'gueris'], keep=False)
-    dataset['date'] = pd.to_datetime(dataset['date'].astype(str)).dt.date
-
-    # Reset indices:
-    dataset = dataset.reset_index(drop=True)
-
-    # Turn source columns' values type to string:
-    str_columns = ['source_nom', 'source_url',
-                   'source_archive', 'source_type']
-    dataset[str_columns] = dataset[str_columns].astype(str)
-
-    aggre = {
-        'cas_confirmes': np.max,
-        'cas_ehpad': np.max,
-        'cas_confirmes_ehpad': np.max,
-        'cas_possibles_ehpad': np.max,
-        'deces': np.max,
-        'deces_ehpad': np.max,
-        'reanimation': np.max,
-        'hospitalises': np.max,
-        'gueris': np.max,
-        'source_nom': ','.join,
-        'source_url': ','.join,
-        'source_archive': ','.join,
-        'source_type': ','.join
-    }
-    dataset = dataset.groupby(['date',
-                               'maille_code',
-                               'maille_nom']).aggregate(aggre).reset_index()
-
-    # Rename/Translate the column titles:
-    dataset = dataset.rename(
-        columns={"maille_code": "subregion_code",
-                 "maille_nom": "subregion_name", "cas_confirmes": "confirmed",
-                 "deces": "deaths", "reanimation": "recovering",
-                 "hospitalises": "hospitalized", "gueris": "recovered",
-                 "source_nom": "source_name"})
-    dataset['country'] = 'France'
-    frcovidata = dataset[
-        'subregion_code', 'subregion_name', 'country', 'date', 'confirmed',
-        'hospitalized', 'recovering', 'recovered',
-        'deaths', 'source_name', 'source_url', 'source_archive',
-        'source_type']
-
-    return frcovidata
diff --git a/task_geo/data_sources/covid/spain/__init__.py b/task_geo/data_sources/covid/spain/__init__.py
deleted file mode 100644
index 6713bd5..0000000
--- a/task_geo/data_sources/covid/spain/__init__.py
+++ /dev/null
@@ -1,38 +0,0 @@
-from task_geo.data_sources.covid.spain.es_covid_connector import es_covid_connector
-from task_geo.data_sources.covid.spain.es_covid_formatter import es_covid_formatter
-
-
-def es_covid():
-    """
-    Daily updates for cases for Spain, joined with static demographic data.
-
-    Discovery
-    Dynamic
-    The following sources are updated on each run:
-    1. https://covid19.isciii.es/ provides the following
-    - autonomous_community_iso
-    - date
-    - cases
-    - hospitalized
-    - intensive care unit (icu)
-    - deceased
-    - recovered
-
-    Static
-    The following sources were used for one time access:
-    1. https://en.wikipedia.org/wiki/Autonomous_communities_of_Spain provides the
-    following 19 Spanish Autonomous Communities
-    - area
-    - population
-    - density
-    - gdp_per_capita_euros
-
-
-    Arguments:
-        N/A
-
-    Returns:
-        pandas.DataFrame
-    """
-    data = es_covid_connector()
-    return es_covid_formatter(data)
diff --git a/task_geo/data_sources/noaa/__init__.py b/task_geo/data_sources/noaa/__init__.py
index c5315ee..e57548d 100644
--- a/task_geo/data_sources/noaa/__init__.py
+++ b/task_geo/data_sources/noaa/__init__.py
@@ -6,11 +6,12 @@ def noaa_api(countries, start_date, end_date=None, metrics=None, country_aggr=Fa
     """NOAA API Data Source.

     Please, note the following:
-    - The metrics variable will only filter out available metrics, if the metric is not available,
-      requesting it will have no effect.
+    - The metrics variable will only filter out available metrics; if the metric is not
+      available, requesting it will have no effect.
+
+    - Country_agg will only return the min for `TMIN`, that is the absolute minimum,
+      and the max for `TMAX`, the absolute maximum.

-    - Country_agg will only return the min for `TMIN`, that is the absolute minimum,
-      and the max for `TMAX`, the absolute maximum.

     Arguments:
         countries(list[str]):
@@ -26,6 +27,7 @@ def noaa_api(countries, start_date, end_date=None, metrics=None, country_aggr=Fa
             SNOW: Snowfall (mm).
             SNWD: Snow depth (mm).
             PRCP: Precipitation.
+        country_aggr(bool): When True, only an aggregate for each date/country will be returned.

     Example:
diff --git a/task_geo/dataset_builders/nasa/__init__.py b/task_geo/dataset_builders/nasa/__init__.py
index bdecaa8..01601ea 100644
--- a/task_geo/dataset_builders/nasa/__init__.py
+++ b/task_geo/dataset_builders/nasa/__init__.py
@@ -12,7 +12,6 @@ def nasa(df, start_date, end_date=None, parms=None, join=True):
         data at the location.

     Arguments:
-    ---------
         df(pandas.DataFrame): Dataset with columns lon, and lat
         start_date(datetime): Start date for the time series
-        end_date(datetime): End date fo rthe time series (optional)
+        end_date(datetime): End date for the time series (optional)
         parms(list of strings): Desired data, accepted are 'temperature',
             'humidity', and 'pressure' (optional)
         join(bool): Determine if the meteorologic data has to be joined to the
@@ -22,8 +21,7 @@ def nasa(df, start_date, end_date=None, parms=None, join=True):
             original dataset

-    Return:
-    ------
+    Returns:
         pandas.DataFrame:
             Columns are lon, lat, date, and the desired data, plus the columns
             of the original dataframe if join=True.
diff --git a/task_geo/dataset_builders/nasa/nasa_connector.py b/task_geo/dataset_builders/nasa/nasa_connector.py
index 8af33f8..3ce24c4 100644
--- a/task_geo/dataset_builders/nasa/nasa_connector.py
+++ b/task_geo/dataset_builders/nasa/nasa_connector.py
@@ -6,21 +6,18 @@
 from task_geo.dataset_builders.nasa.references import PARAMETERS


-def nasa_data_loc(lat, lon, str_start_date, str_end_date, parms_str):
-    """
-    Extract data for a single location.
+def nasa_data_loc(lat, lon, str_start_date, str_end_date, params_str):
+    """Extract data for a single location.
-    Parameters
-    ----------
-    lat : string
-    lon : string
-    str_start_date : string
-    str_end_date : string
-    parms_str : string
+    Parameters:
+        lat(string)
+        lon(string)
+        str_start_date(string)
+        str_end_date(string)
+        params_str(string)

-    Returns
-    -------
-    df : pandas.DataFrame
+    Returns:
+        pandas.DataFrame

     """
     base_url = "https://power.larc.nasa.gov/cgi-bin/v1/DataAccess.py"
@@ -32,7 +29,7 @@ def nasa_data_loc(lat, lon, str_start_date, str_end_date, parms_str):
     user = "user=anonymous"

     url = (
-        f"{base_url}?request=execute&{identifier}&{parms_str}&"
+        f"{base_url}?request=execute&{identifier}&{params_str}&"
         f"startDate={str_start_date}&endDate={str_end_date}&"
         f"lat={lat}&lon={lon}&{temporal_average}&{output_format}&"
         f"{user_community}&{user}"
     )
@@ -54,17 +51,16 @@
     data at the location.

     Arguments:
-    ---------
     df_locations(pandas.DataFrame): Dataset with columns lon, and lat
     start_date(datetime): Start date for the time series
     end_date(datetime): End date for the time series (optional)
     parms(list of strings): Desired data, accepted are 'temperature',
         'humidity', and 'pressure' (optional)

-    Return:
-    ------
+    Returns:
         pandas.DataFrame:
             Columns are country, region, sub_region (non-null), lon, lat, date,
             and the desired data.
+
     """
     if parms is None:
         parms = list(PARAMETERS.keys())
diff --git a/task_geo/dataset_builders/nasa/nasa_formatter.py b/task_geo/dataset_builders/nasa/nasa_formatter.py
index 2886833..2dc2dc6 100644
--- a/task_geo/dataset_builders/nasa/nasa_formatter.py
+++ b/task_geo/dataset_builders/nasa/nasa_formatter.py
@@ -17,7 +17,7 @@ def nasa_formatter(df_nasa, parms=None):
     Parameters
     ----------
     df_nasa : pandas.DataFrame
-    parms : list of strings
+    parms : list[str]

     Returns
     -------
diff --git a/tests/data_sources/covid/spain/__init__.py b/tests/data_sources/covid/es_covid/__init__.py
similarity index 100%
rename from tests/data_sources/covid/spain/__init__.py
rename to tests/data_sources/covid/es_covid/__init__.py
diff --git a/tests/data_sources/covid/spain/test_es_covid_formatter.py b/tests/data_sources/covid/es_covid/test_es_covid_formatter.py
similarity index 86%
rename from tests/data_sources/covid/spain/test_es_covid_formatter.py
rename to tests/data_sources/covid/es_covid/test_es_covid_formatter.py
index 7f3ab1d..d246741 100644
--- a/tests/data_sources/covid/spain/test_es_covid_formatter.py
+++ b/tests/data_sources/covid/es_covid/test_es_covid_formatter.py
@@ -2,7 +2,7 @@

 import pandas as pd

-from task_geo.data_sources.covid.spain import es_covid_formatter
+from task_geo.data_sources.covid.es_covid import es_covid_formatter

 from task_geo.testing import check_dataset_format