k-atlas

Latest Release CI License: MIT

Krawler based jobs to scrape various data related to administrative entities.

OSM boundaries

This job relies on:

  • osmium to extract administrative boundaries at different levels from OSM PBF files,
  • ogr2ogr to generate sequential GeoJSON files to handle large datasets,
  • mapshaper to simplify complex geometries,
  • tippecanoe to generate MBTiles,
  • turfjs to compute the position of toponyms.

Important

The osmium, ogr2ogr, mapshaper and tippecanoe command-line tools must be installed on your system.

To set up the regions to process, you must export the REGIONS environment variable with the GeoFabrik regions. For instance:

export REGIONS="europe/france;europe/albania"

If you'd like to simplify geometries, you can set the simplification tolerance and algorithm:

export SIMPLIFICATION_TOLERANCE=500 # defaults to 128
export SIMPLIFICATION_ALGORITHM=visvalingam # defaults to 'db'

Note

The given simplification tolerance will be scaled according to administrative level using this formula: tolerance at level N = tolerance / 2^(N-2)
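
As an illustration, the scaling can be reproduced with shell arithmetic (the loop below is purely illustrative, not part of the job):

```shell
# Illustrative only: print the effective tolerance per administrative level,
# using the documented formula: tolerance at level N = tolerance / 2^(N-2).
TOLERANCE=${SIMPLIFICATION_TOLERANCE:-128}
for LEVEL in 2 3 4 5 6 7 8; do
  echo "level $LEVEL -> tolerance $(( TOLERANCE / (1 << (LEVEL - 2)) ))"
done
```

With the default tolerance of 128, levels 2 through 8 thus get 128, 64, 32, 16, 8, 4 and 2 respectively.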

For testing purposes you can also limit the processed administrative levels using the MIN_LEVEL/MAX_LEVEL environment variables.
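
For instance, to restrict a test run to a couple of levels (the values below are arbitrary examples, not the job's defaults):

```shell
export MIN_LEVEL=4   # lowest administrative level to process
export MAX_LEVEL=6   # highest administrative level to process
```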

Planet generation

To generate the whole planet, first use continent extracts like this to launch the osm-boundaries job from levels 3 to 8:

export REGIONS="africa;asia;australia-oceania;central-america;europe;north-america;south-america"

As large files are generated, e.g. for Europe, you might have to increase the default Node.js memory limit:

export NODE_OPTIONS=--max-old-space-size=8192

Then, launch the osm-planet-boundaries job for level 2, which uses a planet extract, and planet MBTiles generation. Indeed, the country level (i.e. administrative level 2) requires a whole-planet file to avoid missing relations between continental and island areas.

Last but not least, launch the generate-osm-boundaries-mbtiles.sh script to generate an MBTiles file from the GeoJSON files produced by the job.

To avoid generating data multiple times you can easily dump/restore it from/to MongoDB databases:

mongodump --host=localhost --port=27017 --username=user --password=password --db=atlas --collection=osm-boundaries --gzip --out dump
mongorestore --db=atlas --gzip --host=mongodb.example.net --port=27018 --username=user --password=password dump/atlas

Admin-Express

This job relies on archived shapefiles from IGN and the mapshaper and 7z tools.

https://geoservices.ign.fr/documentation/diffusion/telechargement-donnees-libres.html#admin-express

with French Polynesia

An updated shell script is available that includes French Polynesia data in the resulting MBTiles: generate-admin-express-with-french-polynesia.sh

Call it like this:

./generate-admin-express-with-french-polynesia.sh $PATH_TO_WORK_FOLDER

It'll download the required data and build the resulting MBTiles in $PATH_TO_WORK_FOLDER/admin-express.mbtiles.
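
Since an MBTiles file is a SQLite database, you can sanity-check the result with sqlite3 (assuming it is installed; this check is not part of the script):

```shell
# List the tileset metadata (name/value pairs) of the generated file.
sqlite3 "$PATH_TO_WORK_FOLDER/admin-express.mbtiles" \
  "SELECT name, value FROM metadata;"
```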

The script requires the following tools:

  • mapshaper (can be installed with npm install -g mapshaper)
  • wget, 7z, tippecanoe and tile-join (tile-join ships with tippecanoe). All of these can probably be found as packages in your favorite distribution (e.g. apt install 7z wget tippecanoe).
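
A quick way to verify these tools are available before running the script (a convenience sketch, not part of the repository):

```shell
# Report any required tool missing from PATH; prints nothing when all are found.
for tool in mapshaper wget 7z tippecanoe tile-join; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done
```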

The script relies on Admin Express data available from here (using FRA zone). French Polynesia data is fetched from here and here for the INSEE codes.

The script reprojects and patches the French Polynesia data to match the Admin Express schema, adding properties like INSEE_COM, INSEE_DEP, NOM, NOM_M and POPULATION to it. It then merges it with the Admin Express dataset to build the final MBTiles.

BDPR

This job relies on archived shapefiles from IGN and the mapshaper and 7z tools.

https://geoservices.ign.fr/documentation/diffusion/telechargement-donnees-libres.html#bdpr

Development

To debug, you can run this command from a local krawler install: node --inspect . ../k-atlas/jobfile-bdpr.js

To run it on our infrastructure we use Docker images based on the provided Dockerfiles. If you'd like to test it manually, you can clone the repo and then do:

docker build --build-arg KRAWLER_TAG=latest -f dockerfile.bdpr -t k-atlas:bdpr-latest .
docker run --name bdpr --network=host --rm -e S3_ACCESS_KEY -e S3_SECRET_ACCESS_KEY -e S3_ENDPOINT -e S3_BUCKET -e "DEBUG=krawler*" k-atlas:bdpr-latest
