-
Notifications
You must be signed in to change notification settings - Fork 193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IMAGING-168 installing package with Swedish characters adds junk char… #18
base: trunk
Are you sure you want to change the base?
Conversation
…acters to dc:title property
@@ -125,6 +127,9 @@ public PhotoshopApp13Data parsePhotoshopSegment(final byte[] bytes, | |||
protected List<IptcRecord> parseIPTCBlock(final byte[] bytes, final boolean verbose) | |||
throws IOException { | |||
final List<IptcRecord> elements = new ArrayList<IptcRecord>(); | |||
final String DEFAULT_ENCODING = "ISO-8859-1"; | |||
final int ENV_TAG_CODED_CHARACTER_SET = 90; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should both better be private static final
constants on this class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe ENV_TAG_CODED_CHARACTER_SET
could even be added to IptcConstants
.
… encoding of the IPTC tag values
Hello @khemkasudeep, after applying your changes I get:
Without your changes, all tests pass. Can you have a look? Please make sure to run |
@britter Tests are being passed in windows machine. What OS are you using? |
@khemkasudeep I'm on
Maybe you can try in a Linux VM? |
Hi @britter I tried on But it again passed. Can it depend on locale? |
Hello @khemkasudeep here is the build log: https://gist.github.com/britter/ec0b7b15128bf8931e6c I'm currently looking for a way to configure surefire to use english locale on my system... |
@britter I tried below. Even then the test passed. |
@khemkasudeep I also found that one and tried to set my build to your locale. Build still failing. I assume the configuration doesn't work or this problem is unrelated to locale. |
Hi @britter , @khemkasudeep |
return codedCharacterSetString; | ||
} | ||
}catch (IllegalCharsetNameException e){ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth to add at least one comment saying why these exceptions are not important, and that can be safely ignored.
I patched this library with this merge request myself and can verify that under windows and linux all tests are passing successfully and the bugfix is solved successfully, too. Please merge this to fix this bug!
PS: If you want to do this, too:
|
Hi @janArb ! Thanks for testing and reporting back here. Unfortunately there is some feedback not addressed in the pull request. I can sort out the conflict files, but would be better if @khemkasudeep could reply/address the pending review comments. Otherwise we would need a new clean pull request, but that would be bad as @khemkasudeep appears to have a working solution that only needs a bit of polishing. I'll see if I can have another look with more calm in the next days. Bruno |
final byte[] recordData = element.value.getBytes("ISO-8859-1"); | ||
if (!new String(recordData, "ISO-8859-1").equals(element.value)) { | ||
final byte[] recordData = element.value.getBytes(DEFAULT_ENCODING); | ||
if (!new String(recordData, DEFAULT_ENCODING).equals(element.value)) { | ||
throw new ImageWriteException( | ||
"Invalid record value, not ISO-8859-1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also use `DEFAULT_ENCODING
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, this way we won't forget changing the exception text 👍
BTW I think Latin1 is one of the 8bit transparent encodings, so the old code would never be unequal. This is one of the reasons why Latin1 is used if a encoding is not well defined. I do not understand iptc enough here to say if it is good or bad to use a different encoding, but it might need to be configurable... |
@ecki It was not configurable before and latin is still the fallback if no encoding is defined. Why should this bugfix here solve more features? I think an encoding configuration should be treated as feature. |
this.iptcType = iptcType; | ||
byte[] tempBytes; | ||
try { | ||
tempBytes = value.getBytes("ISO-8859-1"); | ||
tempBytes = value.getBytes(charsetName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think the code looks good to be merged. I would need to find some spec/docs for reference to clear up a few more questions about the issue.
But the main issue right now are the conflicts. The tempBytes
was removed from IptcRecord
, and now we have only the String value
. We need to look at the git history and past issues, and try to understand why it was changed this way.
Then we can resolve the conflicts and leave it ready to be merged 👍
I don't know much about IPTC, and this issue with transparent encodings. But looks to me like after this PR we would have ISO-8859-1 as default/fallback, but if within the metadata another value was defined, then the parser would now respect and use it... so in a certain way it would be configurable I think? |
Hi! And thanks again for your contribution. I just fetched the code from your branch as was in the process or rebasing against I will need to look at the commit log and understand better what changed why, and how to massage your code to get it in a state ready to be merged again (unless you have spare time to fix it?). Once that's done, I think it's ready to be merged - and @janArb kindly tested it as well, so looks like a good issue to be included in Bruno |
Yes, exactly, the code I commented did not look like it is configurable. If it is, it might be fine (however configurable is not very useful in scenarios where you work on collections of files where some have different encoding, a binary transparent encoding instead of encoding exceptions is preferred in that case (at least to me). |
Just realized that even after the patch for reading UTF8 the writing of IPTC tags is also still limited to Latin ( commons-imaging/src/main/java/org/apache/commons/imaging/formats/jpeg/iptc/IptcParser.java Line 489 in 2752dd7
|
…acters to dc:title property