Description
Hi
I can't get dafsa working with multi-character tokens.
With a simple test list defined as:
from dafsa import DAFSA
test = ["a b c", "a ab ac", "a ab ab c"]
dseq = DAFSA(test, delimiter=" ")
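For reference, here is a minimal plain-Python sketch of the intended tokenization. Iterating over a str yields single characters, spaces included. Splitting on the delimiter yields whole tokens such as "ab". The names char_seqs and token_seqs are illustrative only. Whether dafsa 1.0 accepts pre-split token lists like token_seqs as input is an assumption that is not checked here.

```python
test = ["a b c", "a ab ac", "a ab ab c"]

# Iterating over a str yields single characters, spaces included.
char_seqs = [list(s) for s in test]

# Splitting on the delimiter yields whole (possibly multi-character) tokens.
token_seqs = [s.split(" ") for s in test]

print(char_seqs[1])   # ['a', ' ', 'a', 'b', ' ', 'a', 'c']
print(token_seqs[1])  # ['a', 'ab', 'ac']
```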
The expected behavior would be for spaces to be treated as delimiters, but they are treated as tokens instead:
print(dseq)
DAFSA with 10 nodes and 11 edges (3 inserted sequences)
+-- #0: 0(#1/3:/3) [('a', 1)]
+-- #1: n(#2/3:< >/3) [(' ', 2)]
+-- #2: n(#3/3:/2|#7/3:/1) [('a', 3), ('b', 7)]
+-- #3: n(#4/2:/2) [('b', 4)]
+-- #4: n(#5/2:< >/2) [(' ', 5)]
+-- #5: n(#6/2:/2) [('a', 6)]
+-- #6: n(#7/2:/1|#9/2:/1) [('b', 7), ('c', 9)]
+-- #7: n(#8/2:< >/2) [(' ', 8)]
+-- #8: n(#9/2:/2) [('c', 9)]
+-- #9: F() []
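Every edge label in the output above is a single character, with ' ' shown as < >. This suggests each string is consumed character by character and the delimiter argument has no effect. The sketch below compares the two possible alphabets. It also builds a naive token-level prefix trie to show what the transitions after the first "a" would look like: "b" and "ab", rather than the 'a' and 'b' edges leaving node #2 above. The helpers alphabet and build_trie are hypothetical. The trie is not minimized, so it is not a DAFSA.

```python
from typing import Dict, List

test = ["a b c", "a ab ac", "a ab ab c"]

def alphabet(seqs: List[List[str]]) -> List[str]:
    """Sorted set of distinct symbols that could label automaton edges."""
    return sorted({sym for seq in seqs for sym in seq})

def build_trie(seqs: List[List[str]]) -> Dict[str, dict]:
    """Naive prefix trie keyed by whole symbols (no minimization, so not a DAFSA)."""
    root: Dict[str, dict] = {}
    for seq in seqs:
        node = root
        for sym in seq:
            node = node.setdefault(sym, {})
    return root

char_seqs = [list(s) for s in test]
token_seqs = [s.split(" ") for s in test]

print(alphabet(char_seqs))   # [' ', 'a', 'b', 'c']  (same labels as the output above)
print(alphabet(token_seqs))  # ['a', 'ab', 'ac', 'b', 'c']

# Transitions out of the state reached after reading the first token "a".
trie = build_trie(token_seqs)
print(sorted(trie["a"]))     # ['ab', 'b']
```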
The same issue occurs when the spaces are replaced with underscores and delimiter="_" is passed.
I probably did something stupidly wrong...
My system is Windows 11 with Python 3.9.16 and dafsa 1.0 installed.
Thanks!