Krawler-based jobs to scrape various data related to administrative entities.
This job relies on:
- osmium to extract administrative boundaries at different levels from OSM PBF files,
- ogr2ogr to generate sequential GeoJSON files to handle large datasets,
- mapshaper to simplify complex geometries,
- tippecanoe to generate MBTiles,
- turfjs to compute the position of toponyms.
**Important**

The osmium, ogr2ogr, mapshaper and tippecanoe command-line tools must be installed on your system.
To set up the regions to process, you must export the `REGIONS` environment variable
with the GeoFabrik regions. For instance:

```shell
export REGIONS="europe/france;europe/albania"
```
If you'd like to simplify geometries, you can set the simplification tolerance and algorithm:

```shell
export SIMPLIFICATION_TOLERANCE=500        # defaults to 128
export SIMPLIFICATION_ALGORITHM=visvalingam # defaults to 'db'
```
**Note**

The given simplification tolerance will be scaled according to the administrative level using this formula:

```
tolerance at level N = tolerance / 2^(N-2)
```
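As an illustration (this helper is our own sketch, not part of the job), the scaled tolerance at each level can be computed with plain shell arithmetic:

```shell
#!/bin/sh
# Compute the scaled simplification tolerance for a given administrative level,
# following: tolerance at level N = tolerance / 2^(N-2).
scaled_tolerance() {
  base=$1
  level=$2
  div=1
  n=2
  # Build the 2^(N-2) divisor without relying on shell bit-shift support.
  while [ "$n" -lt "$level" ]; do
    div=$((div * 2))
    n=$((n + 1))
  done
  echo $((base / div))
}

# With the default tolerance of 128:
scaled_tolerance 128 2   # level 2 -> 128
scaled_tolerance 128 4   # level 4 -> 32
scaled_tolerance 128 8   # level 8 -> 2
```

So the country level (2) keeps the full tolerance, while deeper levels get progressively finer simplification.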
For testing purposes you can also limit the processed administrative levels using the `MIN_LEVEL`/`MAX_LEVEL`
environment variables.
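For instance, to restrict a test run to levels 4 through 6 (the values here are illustrative):

```shell
export MIN_LEVEL=4
export MAX_LEVEL=6
```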
To generate the whole planet, first launch the osm-boundaries
job from level 3 to 8 using continent extracts like this:

```shell
export REGIONS="africa;asia;australia-oceania;central-america;europe;north-america;south-america"
```
As large files are generated for regions such as Europe, you might have to increase the default Node.js memory limit:

```shell
export NODE_OPTIONS=--max-old-space-size=8192
```
Then, launch the osm-planet-boundaries
job for level 2, which uses a planet extract, and the planet MBTiles generation. Indeed, the country level (i.e. administrative level 2) requires a whole-planet file to avoid missing relations between continental and island areas.
Last but not least, launch the generate-osm-boundaries-mbtiles.sh
script to generate an MBTiles file from the GeoJSON files produced by the job.
To avoid generating data multiple times you can easily dump/restore it from/to MongoDB databases:

```shell
mongodump --host=localhost --port=27017 --username=user --password=password --db=atlas --collection=osm-boundaries --gzip --out dump
mongorestore --db=atlas --gzip --host=mongodb.example.net --port=27018 --username=user --password=password dump/atlas
```
This job relies on archived shapefiles from IGN and on the mapshaper and 7z tools:
https://geoservices.ign.fr/documentation/diffusion/telechargement-donnees-libres.html#admin-express
An updated shell script is available that includes French Polynesia data in the resulting MBTiles file: generate-admin-express-with-french-polynesia.sh

Call it like this:

```shell
./generate-admin-express-with-french-polynesia.sh $PATH_TO_WORK_FOLDER
```

It will download the required data and build the resulting MBTiles file in $PATH_TO_WORK_FOLDER/admin-express.mbtiles
The script requires the following tools: mapshaper (can be installed with `npm install -g mapshaper`), `wget`, `7z`, `tippecanoe` and `tile-join`. All of those can probably be found as packages in your favorite distribution (`apt install 7z wget tippecanoe`).
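A quick way to verify these prerequisites before running the script (this helper is our own sketch, not part of the repository):

```shell
#!/bin/sh
# Report which of the given CLI tools are missing from the PATH.
check_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  echo "$missing"
}

missing=$(check_tools mapshaper wget 7z tippecanoe tile-join)
if [ -z "$missing" ]; then
  echo "All required tools are installed"
else
  echo "Missing tools:$missing"
fi
```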
The script relies on Admin Express data available from here (using the FRA zone). French Polynesia data is fetched from here, and here for the INSEE codes. The script reprojects and patches the French Polynesia data to match the Admin Express schema, adding properties like `INSEE_COM`, `INSEE_DEP`, `NOM`, `NOM_M` and `POPULATION` to it. It then merges it with the Admin Express dataset to build the final MBTiles file.
This job relies on archived shapefiles from IGN and on the mapshaper and 7z tools:
https://geoservices.ign.fr/documentation/diffusion/telechargement-donnees-libres.html#bdpr
To debug, you can run this command from a local krawler install:

```shell
node --inspect . ../k-atlas/jobfile-bdpr.js
```
To run it on the infrastructure we use Docker images based on the provided Docker files. If you'd like to test it manually, you can clone the repo and then do:

```shell
docker build --build-arg KRAWLER_TAG=latest -f dockerfile.bdpr -t k-atlas/bdpr:latest .
docker run --name bdpr --network=host --rm -e S3_ACCESS_KEY -e S3_SECRET_ACCESS_KEY -e S3_ENDPOINT -e S3_BUCKET -e "DEBUG=krawler*" k-atlas/bdpr:latest
```
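The `docker run` command above forwards the S3 credentials from the calling environment, so they must be exported beforehand (the values below are placeholders, not real endpoints or keys):

```shell
export S3_ACCESS_KEY=my-access-key
export S3_SECRET_ACCESS_KEY=my-secret-key
export S3_ENDPOINT=https://s3.example.com
export S3_BUCKET=atlas
```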