Loading data

Thomas Francart edited this page Jul 9, 2020 · 12 revisions

Loading ISSN data to Wikidata using the ISSNbot

What data is imported in Wikidata?

  • Input format: a sample input CSV file is provided in the documentation folder of the release;
  • Wikidata data model: a separate page describes the target Wikidata data model for ISSNbot.

How is it imported?

The precise update algorithms are documented in the Algorithm page.

Typical usage scenario

Run once with no updates

  1. Copy an input CSV file into the input directory
  2. Run java -Dpassword=xxxxx -jar issnbot-app-<version>-onejar.jar load_issn; this command computes all the updates to make by comparing the input CSV data with the existing Wikidata data, but does not send them;
  3. While it runs, use tail -f log/issnbot-all.log to watch the log
  4. When the command stops, the report is printed to the console and to the log file
  5. Review the expected output in the output/<filename>_output.csv file, and review the records in error in the error folder

Then rerun with updates

  1. If you are happy with the expected output, re-run the command, this time with the update flag to send the updates to Wikidata: java -Dpassword=xxxxx -jar issnbot-app-<version>-onejar.jar load_issn update

Then reprocess records in error

  1. View the error records in the error subfolder, and reprocess them if necessary. To reprocess them, use the same batch ID as the initial load:
  2. Remove the original CSV file from the input folder
  3. Copy error/<filename>.csv_error_api.csv into the input folder
  4. Find the previous batchId in log/issnbot-all.log by looking for this line: o.i.i.IssnBot - Initialized batch identifier to : 1396b5592exxxxxx
  5. Re-run the command, passing this value to the batchId option: java -Dpassword=xxxxx -jar issnbot-app-<version>-onejar.jar load_issn update batchId=1396b5592exxxxxx. Note that the batchId must be a 16-character hexadecimal string.
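Looking up the previous batchId in the log can be automated. The following is a minimal sketch, not part of ISSNBot: the class and method names are hypothetical, and it assumes the log line has exactly the form quoted in step 4. Because issnbot-all.log accumulates all runs, the sketch keeps the last match, on the assumption that the most recent run is the one to reprocess.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ISSNBot: finds the batch identifier in log lines.
class BatchIdLogScanner {

    // Marker text copied from the log line quoted in step 4; the identifier is
    // assumed to be the run of hexadecimal characters that follows it.
    private static final Pattern BATCH_LINE =
            Pattern.compile("Initialized batch identifier to : ([0-9a-fA-F]+)");

    // Mirrors the documented requirement: exactly 16 hexadecimal characters.
    private static final Pattern VALID_BATCH_ID = Pattern.compile("[0-9a-fA-F]{16}");

    /** Returns the identifier found on the last matching line, if any. */
    static Optional<String> lastBatchId(List<String> logLines) {
        String found = null;
        for (String line : logLines) {
            Matcher m = BATCH_LINE.matcher(line);
            if (m.find()) {
                found = m.group(1); // later runs overwrite earlier ones
            }
        }
        return Optional.ofNullable(found);
    }

    /** Checks a candidate against the 16-character hexadecimal rule. */
    static boolean isValidBatchId(String candidate) {
        return candidate != null && VALID_BATCH_ID.matcher(candidate).matches();
    }
}
```

The log lines can be read with Files.readAllLines(Paths.get("log/issnbot-all.log")) before being passed to lastBatchId; checking the result with isValidBatchId before re-running the command catches a truncated copy.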

Error records

The error folder will contain two files:

  • <filename>.csv_error_api.csv: records/lines for which the call to Wikidata failed, and that can be reprocessed
  • <filename>.csv_error_data.csv: records/lines for which there is a structural problem in the data, e.g. a wrong language code or a wrong country code; these records do not need to be reprocessed, as the same error would occur again

Output file

The output folder will contain one CSV file per input CSV file, with the following columns; the possible update statuses reported in these columns are documented in the Wikidata Update Statuses page:

  1. Time: time the record was processed
  2. ISSN-L: ISSN-L of the record
  3. Wikidata QID: QID of associated Wikidata item
  4. Status: Either
    • SUCCESS
    • ERROR API in case the record produced an error when calling Wikidata API (in this case it will be in the corresponding file in the error folder)
    • ERROR DATA in case the record contains an error in the data (in this case it will be in the corresponding file in the error folder)
  5. Message: The error message if the record was in error, empty otherwise
  6. ISSN-L (P7363): Update status of the ISSN-L property
  7. Label: Update status of the label in Wikidata
  8. Alias: Update status of the alias in Wikidata
  9. Title (P1476): Update status of the title property
  10. Language (P407): Update status of the language property
  11. Place of Publication (P291): Update status of the place of publication property
  12. Official Website (P856): Update status of the official website property. Note this is multivalued, so a global update status is given, with the individual update status of each value in parentheses
  13. ISSN1 (P236): Update status for the first ISSN value
  14. ISSN2 (P236): Update status for the second ISSN value
  15. ISSN3 (P236): Update status for the third ISSN value
  16. ISSN4 (P236): Update status for the fourth ISSN value
  17. Cancelled ISSNs (P236): Update status for the cancelled ISSN values. Note this is multivalued, so a global update status is given, with the individual update status of each value in parentheses
  18. Previous values of Place of Publication (P291): Update status for the previously loaded values of place of publication
  19. Previous values of Place of Official Website (P856): Update status for the previously loaded values of official website
  20. Previous values of ISSN (P236): Update status for the previously loaded values of ISSN
  21. Previous Cancelled ISSNs (P236): Update status for the previously loaded values of cancelled ISSNs
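A quick way to review an output file is to count its rows per Status value. The sketch below is illustrative only and not part of ISSNBot: the class name is hypothetical, and it assumes that the first line is a header row, that Status is the fourth column as listed above, and that the delimiter (which this page does not specify) is passed in by the caller.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ISSNBot: tallies the Status column of an output file.
class OutputStatusTally {

    static Map<String, Integer> countByStatus(List<String> csvLines, String delimiter) {
        Map<String, Integer> counts = new TreeMap<>();
        String splitter = Pattern.quote(delimiter);
        // Assumption: the first line is a header row and is skipped.
        for (int i = 1; i < csvLines.size(); i++) {
            String line = csvLines.get(i);
            if (line.trim().isEmpty()) {
                continue;
            }
            // Limit 5: only the first four columns need to be isolated, so a
            // delimiter inside the Message column does not disturb the split.
            String[] cells = line.split(splitter, 5);
            if (cells.length < 4) {
                continue; // malformed row, ignored in this sketch
            }
            // Strip optional surrounding quotes from the Status cell.
            String status = cells[3].trim().replace("\"", "");
            counts.merge(status, 1, Integer::sum);
        }
        return counts;
    }
}
```

A non-zero count for ERROR API signals that the corresponding file in the error folder is worth reprocessing.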

Logs

The log subfolder will contain:

  • issnbot-all.log: the full log of this run and all previous runs (logs of a new run get appended in this file)
  • issnbot-errors.log: only the ERROR-level log messages of this run and all previous runs
  • issnbot-report.log: only the final report message

You may want to clear the log folder before running the bot if you want to guarantee that the log files contain only the log of the last run.

Deal with Wikidata maxLag

To deal with Wikidata maxLag, refer to the available options.

Other options

Look at the help message of the ISSNBot for a full description of the options.

  • You can customize the input, output and error folders.
  • You can cap the number of rows to be processed, e.g. with limit=500
  • You can set the batchId under which the updates will be tracked in EditGroups
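When the bot is launched from another program, the argument list can be assembled in one place. This is a rough sketch under stated assumptions, not part of ISSNBot: the class is hypothetical, only the arguments shown on this page (load_issn, update, batchId=, limit=) are used, and placing limit= after batchId= is an assumption, since this page does not show the two combined.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper, not part of ISSNBot: assembles the argument list for one run.
class IssnBotCommand {

    static List<String> build(String jarPath, String password, boolean sendUpdates,
                              String batchId, Integer limit) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Dpassword=" + password); // JVM system property, so it precedes -jar
        cmd.add("-jar");
        cmd.add(jarPath);
        cmd.add("load_issn");
        if (sendUpdates) {
            cmd.add("update"); // without this flag, nothing is sent to Wikidata
        }
        if (batchId != null) {
            cmd.add("batchId=" + batchId);
        }
        if (limit != null) {
            cmd.add("limit=" + limit); // argument order here is an assumption
        }
        return cmd;
    }
}
```

The resulting list can be handed to new ProcessBuilder(cmd).inheritIO().start() to run the bot with its console output visible.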

Undoing using EditGroups

ISSNBot is registered in the EditGroups tool, a tool that can roll back edits in Wikidata. Every time the bot is run, it generates a new batchId with this code, unless a batchId value is provided on the command line:

Long.toString((new Random()).nextLong(), 16).replace("-", "");
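To see what this expression produces, it can be wrapped in a small class. The sketch below is for illustration only: the class and method names are hypothetical, and toBatchId takes the long value as a parameter so that the result is reproducible. It shows that the hexadecimal string has at most 16 characters but can be shorter when the random value has a small magnitude, which is worth checking before passing an identifier back through the batchId option.

```java
import java.util.Random;

// Hypothetical illustration, not part of ISSNBot.
class BatchIdSketch {

    // Same expression as quoted above, applied to a given value:
    // signed base-16 rendering with the minus sign removed.
    static String toBatchId(long value) {
        return Long.toString(value, 16).replace("-", "");
    }

    // Generates an identifier the same way as the quoted line.
    static String randomBatchId(Random random) {
        return toBatchId(random.nextLong());
    }

    // True when the identifier has the full 16 characters.
    static boolean isFullLength(String batchId) {
        return batchId.length() == 16;
    }
}
```

For example, toBatchId(-255L) yields "ff", which is far shorter than 16 characters, while toBatchId(Long.MIN_VALUE) yields the full-length "8000000000000000".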

If you want to undo a load, the simplest way is:

  1. Look in the output CSV file and find one of the Wikidata QIDs that were updated
  2. Go to the corresponding Wikidata item page https://www.wikidata.org/wiki/Qxxxxxxxxx
  3. Go to the Page History
  4. Look for the ISSNBot modification in the history, and click on the "details" link at the end of the message
  5. This will bring you to the EditGroups tool where you can Undo all modifications tracked in this batch

Alternatively, you can go to https://editgroups.toolforge.org/?tool=ISSNBot to see all modifications performed by ISSNBot.

Or, if you know a batch ID, you can go directly to https://editgroups.toolforge.org/b/ISSNBot/xxxxxxxxxxxxxxxx/
