Loading data
- Input format: to get an example of the input CSV format, look under the `documentation` folder of the release, where a sample is provided
- Wikidata data model: a separate page depicts the target Wikidata data model for ISSNBot
The precise update algorithms are documented in the Algorithm page.
- Copy an input CSV file into the `input` directory
- Run `java -Dpassword=xxxxx -jar issnbot-app-<version>-onejar.jar load_issn`; this command computes all the updates to make by comparing the input CSV data with the existing Wikidata data, but does not send the actual updates
- While it is running, use `tail -f log/issnbot-all.log` to watch the log
- When the command stops, the report is printed in the console and in the log file
- Review the expected output in the `output/<filename>_output.csv` file, and review records in error in the `error` folder
- If you are happy with the expected output, re-run the command, this time with the `update` flag to send updates to Wikidata: `java -Dpassword=xxxxx -jar issnbot-app-<version>-onejar.jar load_issn update`
- View error records in the `error` subfolder, and re-process them if necessary. To reprocess them, use the same batch ID as the initial load:
  - Remove the original CSV file from the `input` folder
  - Copy `error/<filename>.csv_error_api.csv` into the `input` folder
  - Find the previous batchId in `log/issnbot-all.log` by looking for this line: `o.i.i.IssnBot - Initialized batch identifier to : 1396b5592exxxxxx`
  - Re-run the command, passing this value to the `batchId` option: `java -Dpassword=xxxxx -jar issnbot-app-<version>-onejar.jar load_issn update batchId=1396b5592exxxxxx`. Note that the batchId must be a 16-character hexadecimal string.
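The lookup of the previous batch identifier in the log can be scripted. The following is a minimal sketch, assuming the log line format quoted above and assuming that the run to reprocess is the most recent one recorded in the log. The class and method names are hypothetical and are not part of ISSNBot.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ISSNBot.
final class BatchIdFinder {

    // Assumes the log line format quoted in the steps above;
    // the identifier is captured as a run of hexadecimal digits.
    private static final Pattern BATCH_LINE =
            Pattern.compile("Initialized batch identifier to : ([0-9a-fA-F]+)");

    // Returns the identifier from the last matching line, because
    // issnbot-all.log accumulates the logs of all runs.
    static Optional<String> findLastBatchId(List<String> logLines) {
        String last = null;
        for (String line : logLines) {
            Matcher matcher = BATCH_LINE.matcher(line);
            if (matcher.find()) {
                last = matcher.group(1);
            }
        }
        return Optional.ofNullable(last);
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the steps above; pass another path as first argument.
        Path log = Path.of(args.length > 0 ? args[0] : "log/issnbot-all.log");
        System.out.println(
                findLastBatchId(Files.readAllLines(log)).orElse("no batch identifier found"));
    }
}
```

The printed value can then be passed to the `batchId` option of the re-run command. If several runs were made since the load you want to reprocess, check the log by hand rather than relying on the last match.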
The error folder will contain 2 files:
- `<filename>.csv_error_api.csv`: records/lines for which the call to Wikidata failed, and that can be reprocessed
- `<filename>.csv_error_data.csv`: records/lines for which there is a structural problem in the data, e.g. wrong language code or wrong country code; these records do not need to be reprocessed, as the same error would occur again
The output folder will contain one CSV file per input CSV file, with the following columns; the status values that can appear in each column are documented in the Wikidata Update Statuses page:
- Time: time the record was processed
- ISSN-L: ISSN-L of the record
- Wikidata QID: QID of associated Wikidata item
- Status: either
  - `SUCCESS`
  - `ERROR API` in case the record produced an error when calling the Wikidata API (in this case it will be in the corresponding file in the error folder)
  - `ERROR DATA` in case the record contains an error in the data (in this case it will be in the corresponding file in the error folder)
- Message: The error message if the record was in error, empty otherwise
- ISSN-L (P7363): Update status of the ISSN-L property
- Label: Update status of the label in Wikidata
- Alias: Update status of the alias in Wikidata
- Title (P1476): Update status of the title property
- Language (P407): Update status of the language property
- Place of Publication (P291): Update status of the place of publication property
- Official Website (P856): Update status of the official website property. Note this is multivalued, so a global update status is given, with individual update statuses in parentheses for each value
- ISSN1 (P236): Update status for the first ISSN value
- ISSN2 (P236): Update status for the second ISSN value
- ISSN3 (P236): Update status for the third ISSN value
- ISSN4 (P236): Update status for the fourth ISSN value
- Cancelled ISSNs (P236): Update status for the cancelled ISSN values. Note this is multivalued, so a global update status is given, with individual update statuses in parentheses for each value
- Previous values of Place of Publication (P291): Update status for the previously loaded values of place of publication
- Previous values of Place of Official Website (P856): Update status for the previously loaded values of official website
- Previous values of ISSN (P236): Update status for the previously loaded values of ISSN
- Previous Cancelled ISSNs (P236): Update status for the previously loaded values of cancelled ISSNs
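When reviewing a large output file, a quick count of the values in the Status column can help. The sketch below is illustrative only and rests on several assumptions not confirmed by this page: the first line of the file is a header containing a column named exactly `Status`, fields are separated by a single delimiter that you pass in, and no field before the Status column contains that delimiter. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ISSNBot.
final class OutputStatusSummary {

    // Counts how many records carry each value of the "Status" column.
    // Assumption: plain delimiter-separated lines with a header row;
    // quoted fields are not handled.
    static Map<String, Integer> countStatuses(List<String> csvLines, String delimiter) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        if (csvLines.isEmpty()) {
            return counts;
        }
        String splitter = Pattern.quote(delimiter);

        // Locate the Status column by name rather than by position.
        String[] header = csvLines.get(0).split(splitter, -1);
        int statusIndex = -1;
        for (int i = 0; i < header.length; i++) {
            if (header[i].trim().equals("Status")) {
                statusIndex = i;
                break;
            }
        }
        if (statusIndex < 0) {
            throw new IllegalArgumentException("No \"Status\" column found in header");
        }

        // Lines too short to reach the Status column (e.g. blank lines) are skipped.
        for (String line : csvLines.subList(1, csvLines.size())) {
            String[] fields = line.split(splitter, -1);
            if (fields.length > statusIndex) {
                counts.merge(fields[statusIndex].trim(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For example, `countStatuses(Files.readAllLines(Path.of("output/<filename>_output.csv")), ",")` would return a map keyed by `SUCCESS`, `ERROR API` and `ERROR DATA`, provided the file really is comma-separated; check the actual delimiter in your output file first.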
The log subfolder will contain:
- `issnbot-all.log`: the full log of this run and all previous runs (logs of a new run are appended to this file)
- `issnbot-errors.log`: only ERROR-level log messages of this run and all previous runs
- `issnbot-report.log`: only the final report message
You may want to clear the log folder before running the bot if you want to guarantee that the log files contain only the log of the last run of the bot.
To deal with Wikidata maxLag, refer to the available options.
Look at the help message of the ISSNBot for a full description of the options.
- You can customize the input, output and error folders.
- You can set a maximum limit of rows to be processed with `limit=500`
- You can set the `batchId` under which the updates will be tracked in EditGroups
ISSNBot is registered in the EditGroups tool, a tool that can roll back edits in Wikidata. Every time the bot is run, it generates a new batchId with this code, unless a batchId value is provided on the command line:
```java
Long.toString((new Random()).nextLong(), 16).replace("-", "");
```
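Since a batchId passed on the command line must be a 16-character hexadecimal string, it can be worth checking a value before re-running a load. The sketch below wraps the expression quoted above in a method and adds such a check. The class and method names are hypothetical, and accepting uppercase hexadecimal digits is an assumption; this page does not say whether ISSNBot does.

```java
import java.util.Random;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ISSNBot.
final class BatchIds {

    // Exactly 16 hexadecimal digits. Accepting uppercase is an assumption.
    private static final Pattern SIXTEEN_HEX = Pattern.compile("[0-9a-fA-F]{16}");

    // Same expression as the one quoted above, wrapped in a method.
    static String generate(Random random) {
        return Long.toString(random.nextLong(), 16).replace("-", "");
    }

    // True only if the value is a 16-character hexadecimal string.
    static boolean isValid(String batchId) {
        return batchId != null && SIXTEEN_HEX.matcher(batchId).matches();
    }
}
```

Note that `Long.toString(value, 16)` does not pad with leading zeros, so `generate` can return fewer than 16 digits when the random value is small; `isValid` would reject such a value. How ISSNBot itself treats a shorter identifier is not described here.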
If you want to undo a load, the simplest way is:
- Look in the output CSV file and pick one of the Wikidata QIDs that were updated
- Go to the corresponding Wikidata item page https://www.wikidata.org/wiki/Qxxxxxxxxx
- Go to the Page History
- Look for the ISSNBot modification in the history, and click on the "details" link at the end of the message
- This will bring you to the EditGroups tool where you can Undo all modifications tracked in this batch
Alternatively, you can go to https://editgroups.toolforge.org/?tool=ISSNBot to see all modifications performed by ISSNBot
Alternatively, if you know a batch ID, you can go to https://editgroups.toolforge.org/b/ISSNBot/xxxxxxxxxxxxxxxx/