Lots of little clean-up after running the code and seeing corner cases #59

Merged
2 commits merged on Feb 10, 2025


@dpp (Contributor) commented Feb 10, 2025

💻 Description of Change(s) (w/ context)

Catch exceptions thrown by Tika so that processing an artifact does not stop when Tika cannot determine the MIME type.
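A minimal sketch of the fallback, assuming the Tika call can be wrapped as a plain function. The MimeFallback object, the detect parameter, and the application/octet-stream default are illustrative assumptions, not the PR's actual code; in practice detect might wrap Tika's detect(byte[]) method.

```scala
import scala.util.control.NonFatal

// Hypothetical wrapper: `detect` stands in for the real Tika call.
object MimeFallback {
  // Assumed fallback value; any "unknown binary" marker would work.
  val DefaultMimeType: String = "application/octet-stream"

  def safeMimeType(
      bytes: Array[Byte],
      detect: Array[Byte] => String,
      log: String => Unit
  ): String =
    try detect(bytes)
    catch {
      case NonFatal(e) =>
        // Record the failure, substitute a default, and keep processing.
        log(s"MIME detection failed (${e.getClass.getSimpleName}): ${e.getMessage}")
        DefaultMimeType
    }
}
```

Passing the detector and logger in as functions keeps the sketch free of a Tika dependency and makes the failure path easy to exercise.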

Better logging for manifest parsing.
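One way to make manifest-parsing failures traceable, sketched under assumptions: the manifest is available as raw bytes, it is loaded with java.util.Properties (as the props.load call in the diff suggests), and the logger is a plain String => Unit. ManifestProps and artifactName are hypothetical names.

```scala
import java.io.ByteArrayInputStream
import java.util.Properties
import scala.util.control.NonFatal

object ManifestProps {
  // Parse manifest bytes into Properties. On failure, log which artifact
  // failed and why, then return an empty Properties instead of aborting.
  def load(manifestBytes: Array[Byte], artifactName: String, log: String => Unit): Properties =
    try {
      val props = new Properties()
      props.load(new ByteArrayInputStream(manifestBytes))
      props
    } catch {
      case NonFatal(e) =>
        log(s"Failed to parse manifest for $artifactName: ${e.getClass.getSimpleName}: ${e.getMessage}")
        new Properties()
    }
}
```

Properties.load throws IllegalArgumentException on a malformed \uxxxx escape, which is one concrete way a corrupt manifest can fail; including the artifact name in the log line is what makes such a failure actionable.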

Catch the OutOfMemoryError that can occur when parsing invalid bytecode.
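Catching an OutOfMemoryError is unusual, so here is a sketch of why it needs an explicit case in Scala: scala.util.control.NonFatal (and therefore scala.util.Try) treats VirtualMachineError, the parent of OutOfMemoryError, as fatal and will not match it. The parse thunk and the SafeClassParse object are hypothetical stand-ins for the cp.parse() / getSourceFilePath() calls in the diff.

```scala
import scala.util.control.NonFatal

object SafeClassParse {
  // `parse` is assumed to parse one classfile and return its source file path.
  def sourcePath(parse: () => String, log: String => Unit): Option[String] =
    try Option(parse())
    catch {
      // NonFatal does not match VirtualMachineError, so the OOME gets its own case.
      case e: OutOfMemoryError =>
        log(s"Classfile looks corrupt; parser ran out of memory: ${e.getMessage}")
        None
      case NonFatal(e) =>
        log(s"Could not parse classfile: ${e.getMessage}")
        None
    }
}
```

Swallowing the error is presumably only reasonable here because the allocation failure is triggered by one corrupt classfile (for example, a bogus length field causing a huge allocation) rather than by the JVM genuinely running out of heap.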

🧠 Rationale Behind Change(s)

Each of the issues above caused an entire artifact to go unprocessed. Catching the exceptions and errors, substituting a reasonable default value, and logging the problem seems to be a better approach.

📝 Test Plan

Existing tests pass.

📜 Documentation

What documentation did you add or update?
Was the documentation appropriate for the scope of this change?

💣 Quality Control

(All items must be checked before a PR is merged)
Did you…

  • Mention an issue number in the PR title?
  • Update the version # in the build file?
  • Create new and/or update relevant existing tests?
  • Create or update relevant documentation and/or diagrams?
  • Comment your code?
  • Fix any stray verbose logging (removing, or moving to debug / trace level)?

Before Merging…

  • Make sure the Quality Control boxes are all ticked
  • Make sure any open comments or conversations on the PR are resolved

… cease. Better logging for manifest parsing, Catch an out of memory error when trying to parse invalid bytecode

Signed-off-by: David Pollak
  props.load(manifest.asStream())
} catch {
  case e: Exception =>

🎊

@@ -178,6 +178,11 @@ object Helpers {

  val clz = cp.parse()
  clz.getSourceFilePath()
} catch {
  case e: OutOfMemoryError =>
    // if the classfile is corrupt, we may get an OOME, swallow it and just don't

Whoa.

@dpp merged commit 95675b3 into main on Feb 10, 2025
2 checks passed
@dpp deleted the better_tika branch on February 12, 2025 13:18

3 participants