Merge branch 'main' of https://github.com/jsinger67/scnr
jsinger67 committed Feb 5, 2025
2 parents 751a95e + 37ff35d commit 99f417f
Showing 18 changed files with 750 additions and 201 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -17,6 +17,13 @@ Be aware that this project is still v0.y.z which means that anything can change
 We defined for this project that while being on major version zero we mark incompatible changes with
 new minor version numbers. Please note that this is no version handling covered by `Semver`.

+## 0.7.2 - Not release yet
+
+- Refactor internal structs
+- Refactored `CompiledNfa` to `CompiledDfa` for clarity
+- Refactored `ScannerNfaImpl` to `ScannerImpl` for clarity
+- Minimization of the `CompiledDfa` for enhanced scanning performance
+
 ## 0.7.1 - 2025-01-21

 - Handle iterator exhaustion in FindMatchesImpl as partial fix for
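Note on the `CompiledDfa` minimization entry: the changelog does not say which algorithm `scnr` uses. As a rough illustration only, the sketch below applies Moore-style partition refinement to a small complete DFA. The `Dfa` struct and the `minimize_classes` function are hypothetical names made up for this example and are not part of `scnr`.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a compiled automaton; not scnr's `CompiledDfa`.
// `transitions[s][a]` is the successor of state `s` on symbol index `a`.
// Assumption: the DFA is complete (every state has a successor per symbol).
pub struct Dfa {
    pub transitions: Vec<Vec<usize>>,
    pub accepting: Vec<bool>,
}

// Returns the equivalence class of every state. States that share a class
// accept exactly the same inputs and can be merged into a single state.
pub fn minimize_classes(dfa: &Dfa) -> Vec<usize> {
    // Initial partition: non-accepting states (0) versus accepting states (1).
    let mut class: Vec<usize> = dfa.accepting.iter().map(|&a| a as usize).collect();
    let mut class_count = 0;
    loop {
        // A state's signature is its current class plus the classes of its successors.
        let mut ids: HashMap<(usize, Vec<usize>), usize> = HashMap::new();
        let mut refined = Vec::with_capacity(class.len());
        for (state, row) in dfa.transitions.iter().enumerate() {
            let signature = (class[state], row.iter().map(|&t| class[t]).collect::<Vec<_>>());
            let fresh = ids.len();
            refined.push(*ids.entry(signature).or_insert(fresh));
        }
        // Refinement only ever splits classes, so an unchanged count is a fixed point.
        if ids.len() == class_count {
            return refined;
        }
        class_count = ids.len();
        class = refined;
    }
}
```

States that end up with the same class index are interchangeable, so a minimized automaton needs only one state per class. Fewer states is one plausible reason for the "enhanced scanning performance" mentioned in the changelog, though the commit itself does not spell this out.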
6 changes: 3 additions & 3 deletions README.md
@@ -54,7 +54,7 @@ fn main() {
 ## Guard rails

 * The scanners should be built quickly.
-* Scanners are based on NFAs internally.
+* Scanners are based on finite automata internally.
 * The scanners only support `&str`, i.e. patterns are of type `&str` and the input is of type
 `&str`. `scnr` focuses on programming languages rather than byte sequences.

@@ -114,8 +114,8 @@ The lookahead patterns denoted above as `S` are not considered as part of the ma

 ## Greediness of repetitions

-The generated scanners work with *compact NFAs* in which all repetition patterns like `*` and `+`
-match **greedily**.
+The generated scanners work with *compact DFAs* in which all repetition patterns like `*`, `+` and
+`?` match **greedily**.

 The `scnr` scanner generator does not directly support non-greedy quantifiers like *? or +? found in
 some other regex engines. However, you can achieve non-greedy behavior by carefully structuring your
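Note on the greediness paragraph: the README passage is cut off in this view, so the following is an assumption about what "structuring" a pattern can look like, not a quote from it. A classic case is a block comment, where a pattern such as `/\*([^*]|\*+[^*/])*\*+/` stops at the first `*/` without any non-greedy quantifier. The sketch hand-codes the DFA for that pattern in plain Rust; `match_block_comment` is a hypothetical helper and does not use the `scnr` API.

```rust
// Hand-written DFA for the assumed pattern /\*([^*]|\*+[^*/])*\*+/ .
// Returns the byte length of a block comment at the start of `input`.
pub fn match_block_comment(input: &str) -> Option<usize> {
    // States: 0 = start, 1 = after '/', 2 = inside the body,
    // 3 = after one or more '*' inside the body.
    let mut state = 0u8;
    for (i, c) in input.char_indices() {
        state = match (state, c) {
            (0, '/') => 1,
            (1, '*') => 2,
            (2, '*') => 3,
            (2, _) => 2,
            // The accepting state has no outgoing transitions, so the match
            // ends at the first "*/" even though every repetition is greedy.
            (3, '/') => return Some(i + c.len_utf8()),
            (3, '*') => 3,
            (3, _) => 2,
            // Any other combination cannot start a block comment.
            _ => return None,
        };
    }
    // Input ended before the closing "*/".
    None
}
```

Because the body alternatives exclude the character sequence that closes the comment, the greedy repetition cannot run past the first terminator. That is the same effect a non-greedy `.*?` would give in engines that support it.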
18 changes: 9 additions & 9 deletions doc/scnr.puml
@@ -145,7 +145,7 @@ package internal {
 - advance_char_indices_beyond_match(char_indices: &mut CharIndices, matched: Match)

 }
-struct ScannerNfaImpl {
+struct ScannerImpl {
 - current_mode: usize
 ~ match_char_class(char_class_id: CharClassID, c: char) -> bool
 ~ mode_name(&self, index: usize) -> Option<&str>
@@ -161,7 +161,7 @@ package internal {
 object SCANNER_CACHE<<(S, #FF7700) Singleton>>
 struct ScannerCacheEntry<<tuple>> {
 - modes: Vec<ScannerMode>
-- nfa: ScannerNfaImpl
+- scanner: ScannerImpl
 }
 struct CharacterClassRegistry {
 }
@@ -183,7 +183,7 @@ package internal {
 struct CompiledScannerMode {
 ~ name: String
 }
-struct CompiledNfa {
+struct CompiledDfa {
 - pattern: String
 - end_states: Vec<StateSetID>
 }
@@ -205,10 +205,10 @@ package internal {
 ~ terminal: TerminalID,
 }

-FindMatchesImpl *--> ScannerNfaImpl: - scanner_impl
+FindMatchesImpl *--> ScannerImpl: - scanner_impl

-ScannerNfaImpl *--> CharacterClassRegistry: ~ character_classes
-ScannerNfaImpl *--> "*" CompiledScannerMode: ~ scanner_modes
+ScannerImpl *--> CharacterClassRegistry: ~ character_classes
+ScannerImpl *--> "*" CompiledScannerMode: ~ scanner_modes

 CharacterClassRegistry *--> "*" CharacterClass: - character_classes

@@ -233,8 +233,8 @@ package internal {
 CompiledScannerMode *--> "*" NfaWithTerminal: ~ nfas
 CompiledScannerMode *--> "*" ScannerModeTransition: ~ transitions

-CompiledNfa *--> "*" StateData: ~ states
-CompiledNfa "Option" *--> CompiledLookahead: ~ lookahead
+CompiledDfa *--> "*" StateData: ~ states
+CompiledDfa "Option" *--> CompiledLookahead: ~ lookahead
 ScannerCache <-- SCANNER_CACHE: + instance_of

 }
@@ -255,7 +255,7 @@ ScannerBuilder *--> "1*" ScannerMode: - scanner_modes
 ScannerBuilder .> Scanner: build()
 ScannerBuilder ...> SCANNER_CACHE: uses

-Scanner *--> internal.ScannerNfaImpl: - inner
+Scanner *--> internal.ScannerImpl: - inner
 Scanner -|> ScannerModeSwitcher: implements
 Scanner .> FindMatches: find_iter()
4 changes: 2 additions & 2 deletions src/find_matches.rs
@@ -1,7 +1,7 @@
 use log::trace;

 use crate::{
-internal::{find_matches_impl::FindMatchesImpl, ScannerNfaImpl},
+internal::{find_matches_impl::FindMatchesImpl, ScannerImpl},
 Match, Position, PositionProvider, ScannerModeSwitcher,
 };

@@ -34,7 +34,7 @@ pub struct FindMatches<'h> {

 impl<'h> FindMatches<'h> {
 /// Creates a new `FindMatches` iterator.
-pub(crate) fn new(scanner_impl: ScannerNfaImpl, input: &'h str) -> Self {
+pub(crate) fn new(scanner_impl: ScannerImpl, input: &'h str) -> Self {
 Self {
 inner: FindMatchesImpl::new(scanner_impl, input),
 }
8 changes: 7 additions & 1 deletion src/internal/character_class_registry.rs
@@ -62,6 +62,12 @@ impl CharacterClassRegistry {
 self.character_classes.is_empty()
 }

+/// Creates a match function for the character classes in the registry.
+///
+/// Safety:
+/// The callers ensure that the character classes in the registry are valid.
+/// All character classes in the registry are valid which is guaranteed by the construction
+/// of the registry.
 pub(crate) fn create_match_char_class(
 &self,
 ) -> Result<Box<dyn (Fn(CharClassID, char) -> bool) + 'static + Send + Sync>> {
@@ -76,7 +82,7 @@ impl CharacterClassRegistry {
 })?;
 Ok(Box::new(move |char_class, c| {
 // trace!("Match char class #{} '{}' -> {:?}", char_class.id(), c, res);
-match_functions[char_class].call(c)
+unsafe { match_functions.get_unchecked(char_class.as_usize()).call(c) }
 }))
 }
 }
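Note on the `get_unchecked` change: the doc comment in the hunk above justifies skipping the bounds check through an invariant established when the registry is built. A minimal sketch of that pattern follows, assuming that IDs are only ever handed out by the registry that owns the match functions. `Registry`, `ClassId`, `is_match` and `is_match_unchecked` are hypothetical names for this example, not scnr's types or methods.

```rust
// Hypothetical ID type for this sketch. The field is private, so code outside
// this module can only obtain an ID from `Registry::add`.
#[derive(Clone, Copy)]
pub struct ClassId(usize);

// Hypothetical registry holding one match function per character class.
pub struct Registry {
    match_functions: Vec<Box<dyn Fn(char) -> bool + Send + Sync>>,
}

impl Registry {
    pub fn new() -> Self {
        Registry { match_functions: Vec::new() }
    }

    // Registers a match function and returns the ID that indexes it.
    pub fn add(&mut self, f: impl Fn(char) -> bool + Send + Sync + 'static) -> ClassId {
        self.match_functions.push(Box::new(f));
        ClassId(self.match_functions.len() - 1)
    }

    // Checked lookup: an unknown ID simply does not match.
    pub fn is_match(&self, id: ClassId, c: char) -> bool {
        self.match_functions.get(id.0).map_or(false, |f| f(c))
    }

    // Unchecked lookup that skips the bounds check.
    //
    // # Safety
    // `id` must have been returned by `add` on this same registry, which
    // guarantees `id.0 < self.match_functions.len()`.
    pub unsafe fn is_match_unchecked(&self, id: ClassId, c: char) -> bool {
        debug_assert!(id.0 < self.match_functions.len());
        // SAFETY: the caller upholds the contract documented above.
        let f = unsafe { self.match_functions.get_unchecked(id.0) };
        f(c)
    }
}
```

The sketch keeps the checked variant next to the unchecked one and marks the latter `unsafe fn`, so the invariant is part of the function's contract. Whether the bounds check is measurable in scnr's hot path is not stated in the commit; the `debug_assert!` is one way to keep the invariant tested in debug builds.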