-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Add encoding to file writing function #405
base: main
Are you sure you want to change the base?
Conversation
Right now, this is just adapted from #395. Not tested, just quickly done in GitHub editor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. Same as in #395, we would need some tests, primarily to prevent regressions. After those are added, the PR can be merged.
Sorry, am rather busy right now, so not sure when (or if) I'll have time for that. |
Thanks for the comment. I'll mark the PR to be ready for anyone to continue working on this. If anyone is willing to do so, or if @claell resumes his work: Please comment here to avoid having two people working on the same thing at the same time. p.s. Continuing to work on the PR can be done as follows: Add the fork of @claell as additional |
Hi, we're looking to add I am happy to take over from @claell and write a few tests, though the issue is that I fail to reproduce the original issue (#403) at this time: Using the core functionality, i.e., without any customizations to the formatting options for writing, I am able to write an entry: import bibtexparser
from bibtexparser import *
bib_library = bibtexparser.Library()
fields = []
fields.append(bibtexparser.model.Field("author", "ö"))
entry = bibtexparser.model.Entry("ARTICLE", "test", fields)
bib_library.add(entry)
bibtexparser.write_file("my_new_file.bib", bib_library) i.e., the same MWE as previously reported in #403 (comment), and parsing
@ARTICLE{test,
author = {ö}
} and reading this programmatically: import bibtexparser
library = bibtexparser.parse_file("my_new_file.bib") Therefore, {'test': Entry(entry_type=`article`, key=`test`, fields=`[Field(key=`author`, value=`ö`, start_line=1)]`, start_line=0)} which I am able to write to a new |
Great, thanks a lot for taking over! Strange to hear that the problem cannot be reproduced, I am not aware of any changes we merged recently (although I have not checked ;-) ). As we're talking about encoding, I would not be surprised if the behavior was somewhat system-dependent, which could explain the issue at hand. In either case, I think we should still merge this PR - to keep the interface consistent and generally applicable. While it's not ideal to do so without reproducing the problem above, I'd suggest implementing tests similar to #395 - this should, at the very least, protect us from some regressions. Do you agree?
That would be amazing. I'll be happy to help you along the way wherever I can (feedback, reviews, ...) |
I could not agree more. I am aware systems like Windows choose CP1252 by default for writing to files if UTF-8 isn't specified – maybe the reason I'm not seeing this error is because I'm on macOS? Is it fine if we can discuss things here itself before I proceed to write a PR? I couldn't wrap my head around the source code for the middlewares and customisers for now—very custom engineering TBF—but I did manage to write a simple test for def test_write_article_with_umlauts():
entry_block = Entry(
entry_type="article",
key="myKey",
fields=[
Field(key="title", value='"myTitle"'),
Field(key="author", value='"Müller, Max"'),
],
)
library = Library(blocks=[entry_block])
string = writer.write(library)
assert string == '@article{myKey,\n\ttitle = "myTitle",\n\tauthor = "Müller, Max"\n}\n' I would appreciate feedback on this, afterwards we should be able to Edit: removed some redundant print statements, was just debugging something to stdout |
Your suggested test, as I read it, would not actually test the method targeted in this PR. Instead, you would have to test the method |
Makes sense. I improved the test such that it parses a given BibTeX string with umlauts symbols, writes to a temporary file, and reads from it; as follows: def test_write_file_with_umlauts():
bibtex_str = """@article{umlauts,
author = {Müller, Hans},
title = {A title},
year = {2014},
journal = {A Journal}
}"""
library = parse_string(bibtex_str)
with tempfile.NamedTemporaryFile(mode="w", encoding="utf-8") as f:
write_file(f, library, encoding="utf-8")
f.seek(0)
library = parse_file(f.name, encoding="utf-8")
assert library.entries[0]["author"] == "Müller, Hans"
assert library.entries[0]["title"] == "A title"
assert library.entries[0]["year"] == "2014"
assert library.entries[0]["journal"] == "A Journal" I opted to use temporary files so as to not create any clutter. Does this look fair enough? I can then write a few more tests or parameterize this as needed, perhaps with a few more characters. |
the problem still exists on Windows (as outlined above). |
Fixes #403