-
Notifications
You must be signed in to change notification settings - Fork 21
CNDB-16022: CNDB-13162: Optimize Component.Type pattern matching by pre-compiling regex patterns #2128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main-5.0
Are you sure you want to change the base?
Conversation
…re-compiling regex patterns (#2104) riptano/cndb#13612 Summary This PR optimizes the Component.Type.fromRepresentation() method by pre-compiling regex patterns in the enum constructor instead of compiling them on every match operation. This improves performance for component type matching, which can occur frequently during SSTable operations. Changes Added pre-compiled Pattern field to Component.Type enum Patterns are now compiled once during enum initialization CUSTOM type correctly handles null pattern Updated matching logic in fromRepresentation() Changed from Pattern.matches() to reusing pre-compiled patterns Added null check with IllegalArgumentException for invalid input Made method package-private (was incorrectly public despite @VisibleForTesting) Added comprehensive unit tests
Checklist before you submit for review
|
|
@lesnik2u, can i get your eyeballs on this. a fair bit changed in the rebasing… |
| { | ||
| if (type.repr != null && Pattern.matches(type.repr, repr) && type.formatClass.isAssignableFrom(format.getClass())) | ||
| if (type.pattern != null && type.pattern.matcher(repr).matches() | ||
| && type.formatClass.isAssignableFrom((null != format ? format.getClass() : SSTableFormat.class))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the ternary operator here cause us to check against the wrong class type?
type.formatClass.isAssignableFrom((null != format ? format.getClass() : SSTableFormat.class))
If format is a subclass of SSTableFormat, then format.getClass() returns the concrete subclass, but shouldn't we be checking against the base SSTableFormat.class that the type was registered with?
Other than this, LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have no way to specify a format of SSTableFormat.class as it is not possible to instantiate such an abstract class.
This is why we see the base components registered with null here:
https://github.com/datastax/cassandra/blob/mck-cndb-16022-main-5.0/src/java/org/apache/cassandra/io/sstable/format/SSTableFormat.java#L160-L176
That gets registered for a SSTableFormat.class here: https://github.com/datastax/cassandra/blob/mck-cndb-16022-main-5.0/src/java/org/apache/cassandra/io/sstable/Component.java#L98
To your question, should a call to
Component.Type.fromRepresentation("Data.db", BtiFormat.getInstance())
return the already registered SSTableFormat.Components.Types.DATA
(which is also the same as BtiFormat.Components.Types.DATA)
??
This is tested here in ComponentTest:140
but only with assertEquals(…) (it should be assertSame(…) ?
assertEquals(BtiFormat.Components.Types.DATA, Component.Type.fromRepresentation("Data.db", BtiFormat.getInstance()));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed a squash commit to better align to that^
1fcf963
i'm not aware of any reasons why it shouldn't (or can't) be so…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification, makes sense!
|
lesnik2u
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM



https://github.com/riptano/cndb/issues/16022
Port into main-5.0 commit d19ed6d