QPA example: Python can't download the CPD file from IUCr directly #54

Problem:
---------

Currently, the QPA example fetches `cpd-1a.prn` with
```
curl -O https://www.iucr.org/__data/iucr/powder/QARR/col/cpd-1a.prn
```
However, the IUCr site is behind Cloudflare, so the request receives Cloudflare's interstitial challenge page (the "Just a moment… Enable JavaScript and cookies to continue" HTML) instead of the raw `.prn` data.
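Whichever fix is chosen, it helps if the example fails loudly instead of silently parsing an HTML page as diffraction data. Below is a minimal sketch of such a guard using only the standard library; the helper names `looks_like_cloudflare_challenge` and `fetch_prn` are hypothetical, and the check is a heuristic (it looks for an HTML prologue or the challenge-page wording) rather than an official way to identify Cloudflare responses.

```python
import urllib.request


def looks_like_cloudflare_challenge(payload: bytes) -> bool:
    """Heuristic: True if `payload` looks like an HTML page (for example
    Cloudflare's "Just a moment..." interstitial) rather than plain-text
    .prn data. Only the first 2 KiB are inspected."""
    head = payload[:2048].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return True
    # Fall back to the wording shown on the challenge page.
    return b"just a moment" in head or b"enable javascript and cookies" in head


def fetch_prn(url: str, timeout: float = 30.0) -> bytes:
    """Download `url` and refuse to return an HTML page in place of data.

    If the challenge page is served with an error status code, urlopen
    raises urllib.error.HTTPError before this check is reached."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = resp.read()
    if looks_like_cloudflare_challenge(payload):
        raise RuntimeError(
            f"{url} returned an HTML page (likely a Cloudflare challenge), "
            "not .prn data"
        )
    return payload
```

The check inspects only the start of the payload, so it stays cheap even for large data files.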

Proposed Solution:
-------------------

Either:

1. use a Python package such as cloudscraper to get past Cloudflare's anti-bot page, or
2. include the `.prn` file directly in the source tree.
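If option 2 is taken, the example could read the bundled copy through `importlib.resources` instead of the network. A minimal sketch, assuming Python 3.9+ and that the file is shipped as package data (the build backend must be configured to include it); the helper name `read_bundled_text` is hypothetical.

```python
from importlib.resources import files


def read_bundled_text(package: str, resource: str, encoding: str = "utf-8") -> str:
    """Read a data file shipped inside an importable package.

    `package` is a dotted import name and `resource` is a file name
    relative to that package's directory. Unlike building a path from
    __file__, importlib.resources also works when the package is
    imported from a zip archive.
    """
    return files(package).joinpath(resource).read_text(encoding=encoding)
```

With a hypothetical layout of `yourpkg/data/__init__.py` plus `yourpkg/data/cpd-1a.prn`, the example would call `read_bundled_text("yourpkg.data", "cpd-1a.prn")` and no longer need network access at all.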

I'd prefer option 2. Scraping or crawling the IUCr website with extra dependencies doesn't seem like a good idea, and since Cloudflare keeps changing and hardening its protection page, any bypass would likely break again.

I'm not sure whether redistributing scientific data from IUCr raises any legal issues, but [IUCr's copyright policy](https://journals.iucr.org/services/copyrightpolicy.html) states:

> Copyright protection is not extended to files of scientific data (e.g. structural data CIFs, structure factors, primary diffraction images), and such data sets may be used freely for bona fide research purposes within the scientific community so long as proper attribution is given to the source from which they were obtained.

So if the data file is distributed with the package, proper attribution to its source would be required.
