Download and convert climate PDFs from EPA and (in the future) other sources.
$ git clone https://github.com/jlnav/climpdfgetter.git
$ cd climpdfgetter; pip install -e .
Usage: climpdf crawl [OPTIONS] STOP_IDX START_IDX
Specify a source out of EPA, NOAA, or OSTI to climpdf crawl. Then specify
the stop index and start index out of the search results. For instance, to download
the first hundred documents:
$ climpdf crawl 100 0
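Because the stop index comes before the start index, it can help to generate the argument pairs rather than type them by hand. The following is a minimal sketch, not part of climpdf: `batch_args` is a hypothetical helper, and it assumes the indices form a half-open range (start included, stop excluded), which is consistent with `100 0` fetching the first hundred documents but is not confirmed by the text above.

```python
def batch_args(total: int, batch_size: int) -> list[tuple[int, int]]:
    """Return (stop_idx, start_idx) pairs covering results 0 .. total - 1.

    Hypothetical helper; assumes climpdf treats the range as half-open.
    """
    if total < 0 or batch_size <= 0:
        raise ValueError("total must be >= 0 and batch_size must be > 0")
    return [
        (min(start + batch_size, total), start)  # stop first, then start
        for start in range(0, total, batch_size)
    ]


if __name__ == "__main__":
    # Print one crawl command per batch, mirroring the example above.
    for stop_idx, start_idx in batch_args(250, 100):
        print(f"climpdf crawl {stop_idx} {start_idx}")
```

Each printed line follows the same argument order as the example above, so the batches can be run one after another.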
Usage: climpdf count [OPTIONS] SOURCE
Specify a source to count the number of downloaded files.
For instance:
$ climpdf count EPA
2342
Usage: climpdf resume [OPTIONS] SOURCE NUM_DOCS
Instruct climpdf to download NUM_DOCS additional documents from the
specified source.
For instance:
$ climpdf resume EPA 1000
Usage: climpdf convert [OPTIONS] SOURCE
Instruct climpdf to try converting downloaded files in a given directory
to JSON. Subdirectories are also searched.
For instance:
$ climpdf convert data/EPA_2024-12-18_15:09:27
or:
$ climpdf convert data
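To illustrate what searching subdirectories means in practice, here is a minimal sketch using only the Python standard library. It is not climpdf's implementation: `find_pdfs` is a hypothetical helper, and the assumption that the downloaded files carry a `.pdf` extension is ours.

```python
from pathlib import Path


def find_pdfs(root: Path) -> list[Path]:
    """Return every PDF under root, including nested subdirectories.

    Hypothetical helper; assumes downloaded files end in .pdf.
    """
    return sorted(
        path
        for path in root.rglob("*")  # rglob descends into subdirectories
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    # Passing "data" finds files in every run directory beneath it;
    # passing a single run directory restricts the search to that run.
    for pdf in find_pdfs(Path("data")):
        print(pdf)
```

This is why both example invocations are valid: a single run directory and the parent directory are walked the same way, only the starting point differs.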
Development and package management are done with Pixi.
Enter the development environment with:
$ pixi shell -e dev
climpdf uses: