All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
Be aware that this project is still v0.y.z which means that anything can change anytime:
"4. Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable."
(Semantic Versioning Specification)
We defined for this project that, while on major version zero, we mark incompatible changes with
new minor version numbers. Please note that this version handling is not covered by Semver.
- Refactor internal structs
  - Refactored CompiledNfa to CompiledDfa for clarity
  - Refactored ScannerNfaImpl to ScannerImpl for clarity
- Minimization of the CompiledDfa for enhanced scanning performance
- Introduce the feature regex_automata. Using the default feature set, which is actually empty, usually results in a slower scanner, but it is faster at compiling the regexes. The regex_automata feature is faster at scanning the input, but it is possibly slower at compiling the regexes. This depends on the size of your scanner modes, i.e. the number of regexes you use. Both features are mutually exclusive.
  Using the default feature set is straightforward:
  scnr = "0.8.0"
  To enable the feature regex_automata use this variant:
  scnr = { version = "0.8.0", default-features = false, features = [ "regex_automata" ] }
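To illustrate what minimizing a DFA means in general, here is a minimal sketch using Moore-style partition refinement. It is not the code used for CompiledDfa; the types `Dfa` and the function `equivalence_classes` are hypothetical and assume a complete DFA with one transition per state and input symbol.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical complete DFA: `transitions[state][symbol]` is the next state.
/// Assumes `transitions.len() == accepting.len()`.
pub struct Dfa {
    pub transitions: Vec<Vec<usize>>,
    pub accepting: Vec<bool>,
}

/// Returns, for every state, the id of its equivalence class.
/// States sharing a class id can be merged into one state.
pub fn equivalence_classes(dfa: &Dfa) -> Vec<usize> {
    // Initial partition: accepting states versus non-accepting states.
    let mut class: Vec<usize> = dfa.accepting.iter().map(|&a| a as usize).collect();
    let mut count = class.iter().collect::<HashSet<_>>().len();
    loop {
        // Two states stay together only if they are in the same class and
        // all their transitions lead into the same classes.
        let mut ids: HashMap<(usize, Vec<usize>), usize> = HashMap::new();
        let mut next = Vec::with_capacity(class.len());
        for (state, row) in dfa.transitions.iter().enumerate() {
            let signature = (
                class[state],
                row.iter().map(|&t| class[t]).collect::<Vec<_>>(),
            );
            let fresh = ids.len();
            next.push(*ids.entry(signature).or_insert(fresh));
        }
        // Refinement only ever splits classes, so an unchanged count
        // means the partition is stable.
        if ids.len() == count {
            return next;
        }
        count = ids.len();
        class = next;
    }
}
```

Fewer states generally mean smaller transition tables, which is one plausible reason minimization can help scanning performance.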
- Handle iterator exhaustion in FindMatchesImpl as partial fix for jsinger67/parol#558. This fix updates the line offset vector when a newline character immediately precedes the end of input.
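The edge case behind this fix can be pictured with a small sketch of a line offset vector. This is not the FindMatchesImpl code; `line_offsets` and `position` are hypothetical helpers, and columns are assumed to be 1-based byte columns.

```rust
/// Returns the byte offset at which each line of `input` starts.
pub fn line_offsets(input: &str) -> Vec<usize> {
    let mut offsets = vec![0];
    for (i, b) in input.bytes().enumerate() {
        if b == b'\n' {
            // A new line starts right after every newline, including a
            // newline that is the very last byte of the input.
            offsets.push(i + 1);
        }
    }
    offsets
}

/// Converts a byte offset into a 1-based (line, column) pair.
/// `offsets` must come from `line_offsets`, so it always starts with 0.
pub fn position(offsets: &[usize], offset: usize) -> (usize, usize) {
    // Number of line starts that are <= offset is the 1-based line number.
    let line = offsets.partition_point(|&start| start <= offset);
    (line, offset - offsets[line - 1] + 1)
}
```

If the entry for the final newline were missing, the end-of-input position of "a\nb\n" would be reported on line 2 instead of at the start of line 3.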
- Improved benchmarking with throughput measurements
- Performance: several optimizations
- CompiledNfa is now able to handle multiple terminals in one automaton per scanner mode
  - To make creation feasible a new struct MultiPatternNfa was introduced as intermediate creation step
  - Each Nfa now contains a Pattern instead of a plain String to have terminal ids and optional lookahead data available
- Introduced a ScannerCache that saves time when the same scanner is built multiple times during the lifetime of a parser process
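The idea of caching built scanners can be sketched as follows. This is only an illustration under assumptions: `CompiledScanner`, `CacheSketch`, `get_or_build` and `build_count` are hypothetical names, and keying the cache by the pattern list is a guess, not a description of the real ScannerCache.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a scanner that is expensive to build.
#[derive(Debug)]
pub struct CompiledScanner {
    pub patterns: Vec<String>,
}

/// Hypothetical cache keyed by the list of patterns.
#[derive(Default)]
pub struct CacheSketch {
    entries: HashMap<Vec<String>, Arc<CompiledScanner>>,
    builds: usize,
}

impl CacheSketch {
    /// Returns a shared scanner, building it only on the first request.
    pub fn get_or_build(&mut self, patterns: &[&str]) -> Arc<CompiledScanner> {
        let key: Vec<String> = patterns.iter().map(|p| p.to_string()).collect();
        if let Some(hit) = self.entries.get(&key) {
            return Arc::clone(hit);
        }
        // Cache miss: this is where the costly compilation would happen.
        self.builds += 1;
        let scanner = Arc::new(CompiledScanner { patterns: key.clone() });
        self.entries.insert(key, Arc::clone(&scanner));
        scanner
    }

    /// Number of scanners that actually had to be built.
    pub fn build_count(&self) -> usize {
        self.builds
    }
}
```

Requesting the same pattern list twice returns the same Arc, so the second request costs only a hash lookup and a reference count increment.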
- Two public methods on the Scanner struct have changed their arguments, thus this release is potentially breaking
- performance: Refactor NFA end states representation to use a boolean vector for accepting states
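A boolean vector for accepting states might look like the sketch below. The type `EndStates` and its methods are hypothetical and only show the general technique: one flag per state, so the acceptance check is a single indexed read rather than a search.

```rust
/// Hypothetical representation: `accepting[state]` is true for end states.
pub struct EndStates {
    accepting: Vec<bool>,
}

impl EndStates {
    /// Builds the flag vector. Panics if an end state is >= `state_count`.
    pub fn new(state_count: usize, end_states: &[usize]) -> Self {
        let mut accepting = vec![false; state_count];
        for &state in end_states {
            accepting[state] = true;
        }
        Self { accepting }
    }

    /// Constant-time check; unknown states are treated as non-accepting.
    pub fn is_accepting(&self, state: usize) -> bool {
        self.accepting.get(state).copied().unwrap_or(false)
    }
}
```

The trade-off is memory proportional to the number of states in exchange for a lookup that does not depend on the number of end states.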
- doc: Update UML diagram
- Simplify internal structure and use the fact that only one implementation of ScannerImplTrait exists. The trait has lost its purpose.
- test: Add hundreds of match tests in integration test match_test.
- Cleanup of the library
- Removed the DFA scanner variant and made the NFA implementation the default, hence use_nfa() on ScannerBuilder is no longer needed and was removed
- Update of documentation
- Performance optimization of scanner phase of compiled NFA
- Support for lookahead, negative and positive. Please see README.md for details.
- Support for Scanners based on NFAs. These scanners can handle overlapping character classes. Call use_nfa() on the scanner builder before calling build().
let scanner = ScannerBuilder::new()
    .add_scanner_modes(&*MODES)
    .use_nfa()
    .build()
    .unwrap();
let find_iter = scanner.find_iter(INPUT).with_positions();
let matches: Vec<MatchExt> = find_iter.collect();
- Provide an iterator adapter WithPositions to convert the iterator over type Match to an iterator over type MatchExt, which contains line and column information for the start position as well as the end position of each match.
let scanner = ScannerBuilder::new().add_scanner_modes(&*MODES).build().unwrap();
let find_iter = scanner.find_iter(INPUT).with_positions();
let matches: Vec<MatchExt> = find_iter.collect();
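How such an adapter can work in principle is sketched below with standard library types only. `Span`, `Position`, `SpanExt` and `Positioned` are hypothetical stand-ins for Match, MatchExt and WithPositions; the 1-based line and byte column convention is an assumption and may differ from what the crate reports.

```rust
/// Hypothetical stand-in for a match: a byte range in the input.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// 1-based line and 1-based byte column (an assumption of this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Position {
    pub line: usize,
    pub column: usize,
}

/// Hypothetical stand-in for a match enriched with positions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SpanExt {
    pub span: Span,
    pub start: Position,
    pub end: Position,
}

/// Adapter that wraps any iterator over `Span` and yields `SpanExt`.
pub struct Positioned<I> {
    inner: I,
    line_starts: Vec<usize>,
}

impl<I: Iterator<Item = Span>> Positioned<I> {
    pub fn new(inner: I, input: &str) -> Self {
        // Record the byte offset at which every line starts.
        let mut line_starts = vec![0];
        line_starts.extend(
            input
                .bytes()
                .enumerate()
                .filter(|&(_, b)| b == b'\n')
                .map(|(i, _)| i + 1),
        );
        Self { inner, line_starts }
    }

    fn locate(&self, offset: usize) -> Position {
        let line = self.line_starts.partition_point(|&s| s <= offset);
        Position { line, column: offset - self.line_starts[line - 1] + 1 }
    }
}

impl<I: Iterator<Item = Span>> Iterator for Positioned<I> {
    type Item = SpanExt;

    fn next(&mut self) -> Option<SpanExt> {
        let span = self.inner.next()?;
        Some(SpanExt {
            span,
            start: self.locate(span.start),
            end: self.locate(span.end),
        })
    }
}
```

Because the adapter only wraps another iterator, callers that do not need line and column data pay nothing for it.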
- Fixed handling of current scanner mode. There was a bug that scanner mode switching from the outside had no effect on cloned ScannerImpl instances. This was fixed by removing the mode from the Scanner and leaving it only on the ScannerImpl.
- We now also allow setting the scanner mode on a FindMatches and even on a WithPositions by implementing the new trait ScannerModeSwitcher for both of them.
- Add some documentation like a PlantUML overview diagram to the doc folder. Also moved matching_state.dot into this folder to have everything in one place. For viewing the PlantUML diagram in Visual Studio Code I recommend the excellent PlantUML extension. Let me add that this overview diagram is in no way complete. It should just give a rough overview.
- Performance: Scanner no longer holds ScannerImpl in a Rc<RefCell<>> to save time during creation of a new find_iter. Instead ScannerImpl is now Clone by wrapping the match functions array in an Arc. This makes the Scanner usable as static global again and has the same effect regarding performance.
- Scanner::mode_name returns an Option<&str> again, instead of Option<String>, which saves an additional heap allocation.
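Both points can be illustrated with a small sketch. The names `MatchFn`, `ImplSketch` and `shares_functions_with` are hypothetical; the sketch only shows why cloning a struct whose large data sits behind an Arc is cheap, and how returning a borrowed string slice avoids an allocation.

```rust
use std::sync::Arc;

/// Hypothetical match function type.
pub type MatchFn = fn(char) -> bool;

/// Hypothetical implementation struct: cloning copies two pointers and
/// bumps two reference counts instead of copying the arrays.
#[derive(Clone)]
pub struct ImplSketch {
    match_functions: Arc<[MatchFn]>,
    mode_names: Arc<[String]>,
}

impl ImplSketch {
    pub fn new(functions: Vec<MatchFn>, names: Vec<String>) -> Self {
        Self {
            match_functions: functions.into(),
            mode_names: names.into(),
        }
    }

    /// Borrows the name instead of allocating a new String.
    pub fn mode_name(&self, index: usize) -> Option<&str> {
        self.mode_names.get(index).map(String::as_str)
    }

    /// True if both instances point at the same function array.
    pub fn shares_functions_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.match_functions, &other.match_functions)
    }
}
```

Unlike Rc<RefCell<>>, an Arc over immutable data is Send and Sync, which is what allows a value like this to live in a static global.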
- Add support for lots of unicode named classes like XID_Start and XID_Continue with the help of the seshat-unicode crate
- Performance: Scanner holds ScannerImpl in a Rc<RefCell<>> to save time during creation of a new find_iter
- Add support for generating compiled DFAs as DOT files to scanner implementation
- Renamed Scanner::trace_compiled_dfa_as_dot to Scanner::log_compiled_dfas_as_dot
- Fixed some help comments
- Fixed the Display implementation of DFA
- Added a new test to module internal::match_function
- Added new function FindMatches::with_offset to support resetting the input test
- Added new function FindMatches::offset to retrieve the total offset of the char indices iterator in bytes.
- Scanner::find_iter now returns a FindMatches directly instead of Result<FindMatches> because the construction is basically infallible.
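The offset handling can be pictured with a sketch over a plain char indices iterator. `CharsWithOffset` is a hypothetical type, not FindMatches itself; in this sketch `offset` returns the byte offset the iteration was started from, which is an assumption about the intended meaning.

```rust
use std::str::CharIndices;

/// Hypothetical iterator that can be restarted at a byte offset and
/// always reports indices relative to the whole input.
pub struct CharsWithOffset<'h> {
    input: &'h str,
    iter: CharIndices<'h>,
    offset: usize,
}

impl<'h> CharsWithOffset<'h> {
    pub fn new(input: &'h str) -> Self {
        Self { input, iter: input.char_indices(), offset: 0 }
    }

    /// Restarts iteration at `offset` bytes into the input.
    /// Panics if `offset` is not on a character boundary.
    pub fn with_offset(self, offset: usize) -> Self {
        Self {
            input: self.input,
            iter: self.input[offset..].char_indices(),
            offset,
        }
    }

    /// Byte offset at which the current iteration was started.
    pub fn offset(&self) -> usize {
        self.offset
    }
}

impl<'h> Iterator for CharsWithOffset<'h> {
    type Item = (usize, char);

    fn next(&mut self) -> Option<Self::Item> {
        // Shift the slice-relative index back to an input-relative index.
        self.iter.next().map(|(i, c)| (i + self.offset, c))
    }
}
```

Since the construction only slices the input and creates an iterator, nothing here can fail apart from the documented boundary panic, which fits returning the iterator directly rather than a Result.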
- Add a new API add_patterns to ScannerBuilder to support simple use cases with only one scanner state.
- Add derive Debug trait to Scanner
- Add CHANGELOG
- Changed description in Cargo.toml
- First release