Dashes in category/file names make retrieval difficult #11
Comments
Generally speaking, it'd be good to have corpora all nice and consistent, but a great thing about that project is that it gets contributions from people who aren't familiar with Git in the first place, which is already quite a hurdle. So it's probably better to have this tool deal with it. (It might be an idea to have a guideline to avoid dashes over in corpora. It may be worth converting existing filenames, but then it may break code already using them. And both those things are unnecessary if these tools deal with it.)
I merged a fix for this in #9 a few weeks ago, actually. It just hasn't made it to PyPI yet. For now, you can take advantage of the fix by installing directly from GitHub. I'll leave this open until I have a chance to make a new release, and close it when the fix is generally available.
@aparrish Is this on PyPI now? I took a look at it but it seems like it's still at
Ok!
At the moment there are categories in corpora like "film-tv" and files like "materials/abridged-body-fluids" which cannot be accessed using the standard syntax of
pycorpora.category_name.file_name['key']
because "-" is not a legal character in Python identifiers. I can work around this as follows:
getattr(pycorpora, 'film-tv').tv_shows['tv_shows']
pycorpora.materials.get_file('abridged-body-fluids')['abridged body fluids']
However, this isn't ideal, and probably either pycorpora should perform these workarounds internally (translating "-" to "_", for instance), or corpora should restrict category and file names to valid JS/Python/C (for example) identifiers. I've opened a similar issue in corpora: dariusk/corpora#236.
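To make the "translate internally" option concrete, here is a minimal sketch of how attribute access could map underscores back to dashed names. It is not pycorpora's actual code (the fix merged in #9 isn't shown in this thread); `to_identifier` and `DashTolerantNamespace` are hypothetical names invented for illustration, and the data is a placeholder.

```python
import keyword


def to_identifier(name):
    """Map a corpora-style name such as 'film-tv' to a Python identifier."""
    candidate = name.replace("-", "_")
    if not candidate.isidentifier() or keyword.iskeyword(candidate):
        raise ValueError(f"cannot derive an identifier from {name!r}")
    return candidate


class DashTolerantNamespace:
    """Hypothetical stand-in for a category object (not pycorpora's class)."""

    def __init__(self, entries):
        # entries maps the original names (which may contain dashes) to values.
        self._entries = dict(entries)
        # Alias table: identifier-safe name -> original name.
        # Assumption: no two names collide after translation
        # (e.g. 'film-tv' and 'film_tv'); the later one would win here.
        self._aliases = {to_identifier(k): k for k in self._entries}

    def __getattr__(self, attr):
        # Only invoked when normal attribute lookup fails.
        if attr.startswith("_"):
            # Avoid infinite recursion if private state is not set yet.
            raise AttributeError(attr)
        try:
            return self._entries[self._aliases[attr]]
        except KeyError:
            raise AttributeError(attr) from None


# Placeholder data standing in for the real corpora files.
film_tv = DashTolerantNamespace({"tv_shows": {"tv_shows": ["placeholder"]}})
root = DashTolerantNamespace({"film-tv": film_tv})
shows = root.film_tv.tv_shows["tv_shows"]
```

With something along these lines, `root.film_tv` resolves to the "film-tv" category without needing `getattr`. The collision case noted in the comment is the main design question such a translation would have to settle.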