installation problems with pip 7+ #8
Comments
There might also be some magic flag I can set somewhere in the tool chain that tells pip or pypi or someone to never try to make a wheel for this package, even locally for caching purposes. I can't find such a thing right now, but that's another avenue of investigation.
I think that the first method is used in the moviepy package. When you
@fitnr hmm, that is definitely another option. I'm not a huge fan of that approach because making a network request on
Another option would be to simply include the corpora project data as part of the package source, then periodically release new versions of the package with updated data. (pro: simple; con: constantly answering the question, "why can't I use the data I just pull-requested into the corpora project with pycorpora?" etc)
I agree about auto-downloading on
The benefit of writing more software to deploy software is that it's potentially an infinite loop :)
After talking with Allison about this problem, I think it would be useful to include a version of the corpora zip without worrying too much about keeping it updated. Even an old version will satisfy most people. On top of that we can add a downloader for mirroring the most up-to-date version to a specific directory, and loading corpora from a directory. Each installation would need to decide which directory was right for it. And on top of that we could adapt the nltk implementation of default_download_dir (http://www.nltk.org/_modules/nltk/downloader.html) to pick a good default download directory. (nltk is Apache licensed.)
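A rough sketch of what picking a default download directory could look like. This is loosely inspired by the idea behind nltk's default_download_dir, not a copy of its logic; the function name, the "pycorpora_data" directory name, and the candidate locations are all assumptions made for illustration.

```python
import os
import sys
import tempfile


def default_download_dir(app_name="pycorpora_data"):
    """Pick a directory for downloaded corpora data (hypothetical helper).

    Tries a per-environment location first, then one under the user's
    home directory, and finally falls back to the system temp directory.
    """
    candidates = [
        os.path.join(sys.prefix, "share", app_name),
        os.path.join(os.path.expanduser("~"), app_name),
    ]
    for path in candidates:
        parent = os.path.dirname(path)
        if os.path.isdir(path):
            # Directory already exists: use it only if we can write to it.
            if os.access(path, os.W_OK):
                return path
        elif os.path.isdir(parent) and os.access(parent, os.W_OK):
            # Directory is missing, but we could create it under its parent.
            return path
    # Last resort: a directory under the system temp location.
    return os.path.join(tempfile.gettempdir(), app_name)
```

The function only chooses a path and never creates it, so each installation can still override the choice before anything is written.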
Thinking about it a bit more after your pull request, I've been mentally leaning toward just taking the corpora data completely out of the hands of the module and having an

I'm wary of engineering a situation where what's in this module and what's in the official repo are different, since the easiest way to browse corpora is poking through the GitHub repo, and I'm anticipating a bunch of GitHub issues where people will be like "I'm getting a file not found error for ((file just added to corpora yesterday)), what gives?" Forcing the user to download the files has the benefit of being explicit, admittedly with the drawback of requiring an extra step.

In my head I'm optimizing for two scenarios: first, the afternoon workshop tutorial, and second, the situation in which I've just submitted a pull request to Darius and want to use what I submitted immediately after the PR is accepted. In the former case, the workflow from my proposed implementation is a bit more complicated than the ideal, but still seems pretty simple; you just need to do something like
In the second scenario, you just need to merge upstream into your own local fork and then:
Or even something radical, like
... moving the functionality of the module into a class, which would have the additional (speculative) benefit of being able to use two different copies of corpora at once. Unfortunately any of these scenarios (aside from just including a copy of the corpora data in this repo) basically mean rewriting the module from scratch. shrug
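To make the class idea concrete, here is a minimal sketch of a loader bound to one data directory. The class name Corpora and its methods are hypothetical and are not the module's actual API; the sketch assumes the data directory holds category sub-directories containing .json files, as in the corpora repo.

```python
import json
import os


class Corpora:
    """Hypothetical loader bound to one directory of corpora JSON files."""

    def __init__(self, data_dir):
        self.data_dir = data_dir

    def categories(self):
        """Return the sorted names of the category sub-directories."""
        return sorted(
            name for name in os.listdir(self.data_dir)
            if os.path.isdir(os.path.join(self.data_dir, name))
        )

    def get_file(self, *parts):
        """Load a JSON file, e.g. get_file("animals", "dogs") reads animals/dogs.json."""
        path = os.path.join(self.data_dir, *parts) + ".json"
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
```

Because each instance carries its own data_dir, one instance could point at a bundled copy and another at the data directory of a freshly merged local fork, which is the "two copies at once" case.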
I'm going to try including a copy of corpora in olipy. I'll change my corpus-loading API to be compatible with pycorpora, so that when we come up with a better solution I can switch over easily.
After upgrading pip to the newest version, installation of pycorpora fails. Or, more specifically: the library installs fine, but the data files are missing. After some investigation, it appears that in recent versions of pip, packages downloaded from PyPI are locally cached as wheels when first installed; subsequent installations of cached packages circumvent the build process entirely. Right now, setup.py downloads and installs the corpora project data as part of the build process; if the build process doesn't run, no data is downloaded, and so you'll have sessions that look like:

It looks like you can tell pip to not use pre-cached wheels by invoking the command like so:
... so that's the short-term workaround. I'm not sure what a more permanent fix would be; it would probably take one of the following forms:
python -m pycorpora download), the same way (e.g.) nltk does it. (pro: never have to worry about this issue again; simplifies the build process considerably; con: you won't be able to make reproducible installations of pycorpora projects with a requirements.txt file alone; need a bunch of code to find the appropriate place to store the data on a per system/user/project basis)
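For the separate-download option, a sketch of what the download step might do, assuming the data ships as .json files under a data/ folder inside a zip archive of the corpora repo. The function names are hypothetical, and the archive URL is an assumption that should be checked against the corpora project before use.

```python
import io
import os
import zipfile
from urllib.request import urlopen

# Assumed archive location; verify against the corpora project.
ARCHIVE_URL = "https://github.com/dariusk/corpora/archive/master.zip"


def extract_data(archive_bytes, target_dir):
    """Copy every .json file under a 'data' folder of the zip into target_dir."""
    written = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            if "data" not in parts or not name.endswith(".json"):
                continue
            # Keep only the path below the 'data' folder.
            relative = parts[parts.index("data") + 1:]
            if not relative or ".." in relative:
                continue
            dest = os.path.join(target_dir, *relative)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with open(dest, "wb") as out:
                out.write(archive.read(name))
            written.append(dest)
    return written


def download(target_dir, url=ARCHIVE_URL):
    """Fetch the archive over the network and unpack its data files."""
    with urlopen(url) as response:
        return extract_data(response.read(), target_dir)
```

Keeping the network fetch in download() and the unpacking in extract_data() means the unpacking can be exercised without network access. Exposing this as python -m pycorpora download would additionally need a __main__.py in the package that calls download() with the chosen directory.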