Python package to download data from the Australian Bureau of Statistics (ABS) and process it into tidy format.
It can be used either as a command-line tool, or as a library.
pip install abs@git+https://github.com/swsphn/abs.git
pipx install abs@git+https://github.com/swsphn/abs.git
# show available commands
abs --help
# show options for ascceg command
abs ascceg --help
# download ascceg file to current directory (defaults to parquet)
abs ascceg
# download ascceg file to a different directory as csv
abs ascceg -f csv some/other/directory
# download ascceg file with a specific name and filetype
abs ascceg some/file.csv
abs ascceg some/file.parquet
You can also use this package as a library. Data is returned as tidy Polars DataFrames. If you are more comfortable with Pandas, you can convert the DataFrame with the to_pandas method.
import abs
ascceg = abs.ascceg.df()
print(ascceg)
Here are the steps to add a new data source to this tool:
- Create a new module to fetch and transform the ABS data.
  - The module path should be src/abs/<module>.py, where <module> represents the name of the new module.
  - This module MUST contain a function named df() which returns a Polars DataFrame.
  - This module SHOULD contain a docstring as the first line of the file. The module docstring will be automatically used as the help for the subcommand.
  - The module name will be automatically used as the name of the subcommand, and also of the output file.
- There is no step 2. The module will be automatically added to the abs package as abs.<module>, and a CLI subcommand will be automatically created as abs <module>, where <module> represents the name of the new module.
For example, suppose you create a new module called sacc.py to fetch and tidy the Standard Australian Classification of Countries (SACC) data source. Assuming that you have added a docstring as the first line of sacc.py, and have defined a function df() in sacc.py which returns a Polars DataFrame, then you are done!