QPA example: python can't get CPD file from iucr directly #54

@Tieqiong

Problem:

The QPA example code currently fetches cpd-1a.prn by running

curl -O https://www.iucr.org/__data/iucr/powder/QARR/col/cpd-1a.prn

but the IUCr site is protected by Cloudflare. Instead of the raw .prn data, the request returns Cloudflare's interstitial HTML page (“Just a moment… Enable JavaScript and cookies to continue”).
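
One way to make this failure visible is to check whether a downloaded payload is HTML before treating it as data. Below is a minimal sketch, assuming simple heuristics: the function name `looks_like_html_interstitial` and the marker strings are illustrative choices, not part of the existing example code, and the numeric sample in the usage note is a made-up stand-in rather than real file content.

```python
def looks_like_html_interstitial(payload: bytes) -> bool:
    """Return True if the payload looks like an HTML page rather than raw data.

    Heuristic only (an assumption for illustration): inspect the first couple
    of kilobytes for common HTML markers or the Cloudflare challenge title.
    """
    head = payload[:2048].lstrip().lower()
    markers = (b"<!doctype html", b"<html", b"just a moment")
    return any(marker in head for marker in markers)
```

For example, `looks_like_html_interstitial(b"<!DOCTYPE html><title>Just a moment...</title>")` is true, while a payload of plain numeric columns such as `b"5.00 123.0\n5.02 130.0\n"` is not flagged.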

Proposed Solution:

Either

  1. use a Python package that bypasses Cloudflare's anti-bot page (for example, cloudscraper), or
  2. include the .prn file directly in the source.

I'd prefer option 2. Scraping or crawling the IUCr website with an extra dependency doesn't sound like a good idea, and Cloudflare will keep changing and hardening its protection page, so any bypass is likely to break over time.
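
With option 2, the example could resolve the file locally instead of downloading it. Below is a minimal sketch, assuming the file is placed in a `data/` directory next to the example script. The directory layout and the helper name `bundled_data_path` are hypothetical, not something the repository already defines.

```python
from pathlib import Path


def bundled_data_path(example_dir: Path, filename: str = "cpd-1a.prn") -> Path:
    """Return the path to a data file shipped alongside the example.

    Assumes (hypothetically) that data files live in `<example_dir>/data/`.
    Raises FileNotFoundError with a clear message if the file is missing.
    """
    path = Path(example_dir) / "data" / filename
    if not path.is_file():
        raise FileNotFoundError(f"Bundled data file not found: {path}")
    return path
```

In the example script this might be called as `bundled_data_path(Path(__file__).parent)`, which avoids any network access at run time.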

I'm not sure whether redistributing scientific data from IUCr raises any legal issues, but IUCr's policy states:

Copyright protection is not extended to files of scientific data (e.g. structural data CIFs, structure factors, primary diffraction images), and such data sets may be used freely for bona fide research purposes within the scientific community so long as proper attribution is given to the source from which they were obtained.

so proper attribution to the source would be needed if the data file is distributed with the package.
