2. Datasets
The dataset is extracted from OpenStreetMap and contains all the points of interest (POIs) tagged with the key-value pair amenity:restaurant in the selected area. By default, the code we provide gets data from the Milano area (Italy); however, you can change the reference bounding box to get POI information from your favorite area in the world.
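To make the role of the bounding box concrete, here is a minimal sketch of how a restaurant query for a given area could be expressed in Overpass QL, the query language of the OpenStreetMap Overpass API. It is not the tutorial's R script: the `BoundingBox` type, the `build_overpass_query` helper and the Milano coordinates are illustrative assumptions, and the query only selects nodes (restaurants mapped as ways or relations would need additional statements).

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Geographic bounding box in decimal degrees (WGS84)."""
    south: float
    west: float
    north: float
    east: float


# Rough, illustrative box around Milano (assumption);
# these are NOT the values used by the tutorial's R script.
MILANO_BBOX = BoundingBox(south=45.39, west=9.04, north=45.54, east=9.28)


def build_overpass_query(bbox: BoundingBox,
                         key: str = "amenity",
                         value: str = "restaurant") -> str:
    """Return an Overpass QL query selecting nodes tagged key=value in bbox."""
    if not (-90 <= bbox.south < bbox.north <= 90):
        raise ValueError("south must be below north, both within [-90, 90]")
    if not (-180 <= bbox.west <= 180 and -180 <= bbox.east <= 180):
        raise ValueError("west and east must be within [-180, 180]")
    # Overpass expects the bounding box as (south, west, north, east).
    area = f"({bbox.south},{bbox.west},{bbox.north},{bbox.east})"
    return f'[out:json];node["{key}"="{value}"]{area};out;'


if __name__ == "__main__":
    print(build_overpass_query(MILANO_BBOX))
```

Changing the area then amounts to passing a different `BoundingBox`, and changing the kind of POI amounts to passing a different `value` (for example `"cafe"`).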
To follow this step of the tutorial, clone the GWAP Enabler Tutorial from GitHub.
The required DB tables can be generated as follows:
- Tables `resource`, `topic` and `resource_has_topic`: execute the R script to generate the INSERT queries, adopting one of the following solutions:
  - If you have R installed on your machine: run the R script named `script-query-generation.R` from the `dataset-creation` folder of this tutorial; please note that you also need the `osm-utilities-tutorial.r` script (which you already have once you've cloned the tutorial project).
  - If you don't have R installed on your machine but still want to understand the dataset creation process, follow the Jupyter notebook contained in this tutorial; to run it, you need a cloud Jupyter server with an R execution environment installed (examples are Microsoft Azure Notebook and Jupyter Notebook) and the `osm-utilities-tutorial.r` script (which you already have once you've cloned the tutorial project).
  - If you just want to continue the tutorial, directly download the .sql file containing the example Milano data; the dataset consists of the following files:

  As mentioned above, the script can be easily customized to generate new datasets for your specific needs. Starting from this script, you can download different POIs or focus on different geographical areas by setting different bounding boxes. Read the Jupyter notebook guide for insights into how to tune those parameters.
- Table `badges`: you can either use the `5_Insert_Badge.sql` SQL script to use standard badges, or customize it by modifying the name, thumbnail, etc., as explained in this guide.
- Table `configuration`: adjust the `6_Insert_Configuration.sql` SQL script by modifying the values according to your specific use case. Here is a brief explanation of the meaning of the parameters:
  - upperThreshold: when the classification score of a resource exceeds this threshold, the resource is considered classified. The higher the threshold, the higher the number of contributions and the time required to classify the resource.
  - positiveK: how much the classification score of a resource is incremented after each "correct" answer. The lower the increment, the higher the number of contributions required to reach the threshold. A higher number of contributions can guarantee more reliable classifications.
  - nOfLevels: maximum number of resources that can be classified in a round.
  - maxScore: maximum score that can be gained in a single round.
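
The trade-off between upperThreshold and positiveK can be illustrated with a small calculation. The sketch below is a simplification under stated assumptions, not the GWAP Enabler scoring logic: it assumes the score of a resource starts at zero, that only "correct" answers are received, and that the resource counts as classified as soon as the score reaches the threshold. The function name and the sample values are hypothetical.

```python
import math


def answers_to_classify(upper_threshold: float,
                        positive_k: float,
                        start_score: float = 0.0) -> int:
    """Number of consecutive "correct" answers needed to reach the threshold.

    Assumes each correct answer adds exactly positive_k to the score and
    that no other answer changes it (a simplification).
    """
    if positive_k <= 0:
        raise ValueError("positive_k must be positive")
    remaining = upper_threshold - start_score
    if remaining <= 0:
        return 0  # already at or above the threshold
    return math.ceil(remaining / positive_k)


if __name__ == "__main__":
    # Hypothetical values: halving the increment doubles the required answers.
    for k in (1.0, 0.5, 0.25):
        print(f"upperThreshold=2.0, positiveK={k}: "
              f"{answers_to_classify(2.0, k)} correct answers")
```

Under these assumptions, lowering positiveK or raising upperThreshold both increase the number of contributions collected per resource, which trades classification speed for reliability.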
- Once you have all 6 SQL scripts above, load the data into the `gwap-enabler-db` you have previously created on your MySQL