@michaelsembwever
Member

https://github.com/riptano/cndb/issues/16022

Port into main-5.0 commit d19ed6d

CNDB-16022: CNDB-13162: Optimize Component.Type pattern matching by pre-compiling regex patterns 
https://github.com/riptano/cndb/issues/13612

Summary
This PR optimizes the Component.Type.fromRepresentation() method by pre-compiling regex patterns in the enum constructor instead of compiling them on every match operation. This improves performance for component type matching, which can occur frequently during SSTable operations.
Changes

Added pre-compiled Pattern field to Component.Type enum

  • Patterns are now compiled once during enum initialization
  • CUSTOM type correctly handles null pattern

Updated matching logic in fromRepresentation()

  • Changed from Pattern.matches() to reusing pre-compiled patterns
  • Added null check with IllegalArgumentException for invalid input
  • Made method package-private (was incorrectly public despite @VisibleForTesting)

Added comprehensive unit tests
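
A minimal sketch of the change described above, assuming a simplified enum. `ComponentKind`, its constants and its regexes are hypothetical stand-ins, not the real `Component.Type`, and the fallback to `CUSTOM` on no match is an assumption. It shows the regex being compiled once per constant and reused on each lookup, rather than recompiled by `Pattern.matches()` on every call.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for Component.Type; names and regexes are illustrative only.
final class PrecompiledPatternSketch
{
    enum ComponentKind
    {
        DATA("Data.db"),
        SUMMARY("Summary.db"),
        SECONDARY_INDEX("SI_.*.db"),
        CUSTOM(null); // no representation, so there is no pattern to compile

        final String repr;
        final Pattern pattern; // compiled once, when the enum constant is created

        ComponentKind(String repr)
        {
            this.repr = repr;
            this.pattern = repr != null ? Pattern.compile(repr) : null;
        }

        static ComponentKind fromRepresentation(String repr)
        {
            if (repr == null)
                throw new IllegalArgumentException("repr must not be null");

            for (ComponentKind kind : values())
            {
                // Previously: Pattern.matches(kind.repr, repr), which compiles the regex on every call.
                if (kind.pattern != null && kind.pattern.matcher(repr).matches())
                    return kind;
            }
            return CUSTOM; // assumption: unmatched names fall back to CUSTOM
        }
    }

    public static void main(String[] args)
    {
        System.out.println(ComponentKind.fromRepresentation("Data.db"));
        System.out.println(ComponentKind.fromRepresentation("SI_idx.db"));
        System.out.println(ComponentKind.fromRepresentation("unknown.bin"));
    }
}
```

`Pattern.matches(regex, input)` is documented as equivalent to `Pattern.compile(regex).matcher(input).matches()`, so the matching result is unchanged; only the repeated compilation is avoided.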

…re-compiling regex patterns (#2104)

riptano/cndb#13612

@github-actions

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@michaelsembwever
Member Author

@lesnik2u, can I get your eyeballs on this? A fair bit changed in the rebasing…

  {
- if (type.repr != null && Pattern.matches(type.repr, repr) && type.formatClass.isAssignableFrom(format.getClass()))
+ if (type.pattern != null && type.pattern.matcher(repr).matches()
+     && type.formatClass.isAssignableFrom((null != format ? format.getClass() : SSTableFormat.class)))

Could the ternary operator here cause us to check against the wrong class type?

type.formatClass.isAssignableFrom((null != format ? format.getClass() : SSTableFormat.class))
If format is a subclass of SSTableFormat, then format.getClass() returns the concrete subclass, but shouldn't we be checking against the base SSTableFormat.class that the type was registered with?

Other than this, LGTM

Member Author

We have no way to pass a format whose class is SSTableFormat.class itself, as SSTableFormat cannot be instantiated directly.

This is why we see the base components registered with null here:
https://github.com/datastax/cassandra/blob/mck-cndb-16022-main-5.0/src/java/org/apache/cassandra/io/sstable/format/SSTableFormat.java#L160-L176

That gets registered for a SSTableFormat.class here: https://github.com/datastax/cassandra/blob/mck-cndb-16022-main-5.0/src/java/org/apache/cassandra/io/sstable/Component.java#L98
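
A minimal sketch of the check under discussion, using hypothetical stand-in types (`Format`, `BtiLikeFormat`, `BigLikeFormat`, `accepts`) rather than the real Cassandra classes. It illustrates that a type registered for the base format class accepts any concrete format, because `Base.class.isAssignableFrom(Sub.class)` is true, and that a null format falls back to the base class as in the ternary quoted above.

```java
// Hypothetical stand-ins; none of these are the real Cassandra types.
final class FormatClassCheckSketch
{
    interface Format {}                                    // plays the role of SSTableFormat
    static final class BtiLikeFormat implements Format {}  // a concrete format
    static final class BigLikeFormat implements Format {}  // another concrete format

    // registeredFor is the format class a component type was registered with.
    static boolean accepts(Class<? extends Format> registeredFor, Format format)
    {
        // A null format falls back to the base type.
        Class<?> actual = format != null ? format.getClass() : Format.class;
        // True when 'actual' is registeredFor itself or one of its subtypes.
        return registeredFor.isAssignableFrom(actual);
    }

    public static void main(String[] args)
    {
        // Base-registered type, concrete format: accepted.
        System.out.println(accepts(Format.class, new BtiLikeFormat()));
        // Base-registered type, null format: accepted via the fallback.
        System.out.println(accepts(Format.class, null));
        // Format-specific type, a different concrete format: rejected.
        System.out.println(accepts(BtiLikeFormat.class, new BigLikeFormat()));
        // Format-specific type, null format: rejected, since the base is not a subtype.
        System.out.println(accepts(BtiLikeFormat.class, null));
    }
}
```

Under these assumptions, passing the concrete class from `format.getClass()` does not exclude base-registered types; it only narrows which format-specific types can match.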

To your question, should a call to

Component.Type.fromRepresentation("Data.db", BtiFormat.getInstance())

return the already registered SSTableFormat.Components.Types.DATA
(which is also the same as BtiFormat.Components.Types.DATA)?

This is tested in ComponentTest:140, but only with assertEquals(…) (perhaps it should be assertSame(…)?):

assertEquals(BtiFormat.Components.Types.DATA, Component.Type.fromRepresentation("Data.db", BtiFormat.getInstance()));
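
To illustrate the difference between the two assertions, here is a small sketch in plain Java. `Kind`, `lookup` and `lookupCopy` are hypothetical; whether the real `Component.Type` overrides `equals()` is not stated in this thread. JUnit's `assertEquals` relies on `equals()`, while `assertSame` requires reference identity (`==`), so only the latter would catch a lookup that returns an equal but distinct instance.

```java
// Hypothetical stand-in for a registered component type; not the real Component.Type.
final class SameVersusEqualSketch
{
    static final class Kind
    {
        final String name;

        Kind(String name) { this.name = name; }

        // Value-based equality: two distinct instances with the same name are equal.
        @Override
        public boolean equals(Object o)
        {
            return o instanceof Kind && ((Kind) o).name.equals(name);
        }

        @Override
        public int hashCode() { return name.hashCode(); }
    }

    static final Kind REGISTERED = new Kind("Data");

    // Returns the registered singleton, as a registry lookup would.
    static Kind lookup() { return REGISTERED; }

    // Returns an equal but distinct instance, as a faulty lookup might.
    static Kind lookupCopy() { return new Kind("Data"); }

    public static void main(String[] args)
    {
        // assertEquals(expected, actual) passes when expected.equals(actual).
        System.out.println(REGISTERED.equals(lookupCopy()));
        // assertSame(expected, actual) passes only when expected == actual.
        System.out.println(REGISTERED == lookupCopy());
        System.out.println(REGISTERED == lookup());
    }
}
```

If the type does not override `equals()`, the inherited `Object.equals()` is itself an identity check and the two assertions behave the same; `assertSame` simply states the intent explicitly.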

Member Author

pushed a squash commit to better align to that^
1fcf963

i'm not aware of any reasons why it shouldn't (or can't) be so…

Thanks for the clarification, makes sense!

@lesnik2u lesnik2u left a comment

LGTM
