Delimiter not taken into account in multi-character tokens #17

Hi,
I can't get dafsa to work with multi-character tokens.

With a simple test list defined as:
```
from dafsa import DAFSA

test = ["a b c", "a ab ac", "a ab ab c"]
dseq = DAFSA(test, delimiter=" ")  # expecting " " to separate tokens
```

I expected the spaces to be treated as delimiters, but they are treated as tokens:

`print(dseq)`

DAFSA with 10 nodes and 11 edges (3 inserted sequences)

>   +-- #0: 0(#1/3:<a>/3) [('a', 1)]
>   +-- #1: n(#2/3:< >/3) [(' ', 2)]
>   +-- #2: n(#3/3:<a>/2|#7/3:<b>/1) [('a', 3), ('b', 7)]
>   +-- #3: n(#4/2:<b>/2) [('b', 4)]
>   +-- #4: n(#5/2:< >/2) [(' ', 5)]
>   +-- #5: n(#6/2:<a>/2) [('a', 6)]
>   +-- #6: n(#7/2:<b>/1|#9/2:<c>/1) [('b', 7), ('c', 9)]
>   +-- #7: n(#8/2:< >/2) [(' ', 8)]
>   +-- #8: n(#9/2:<c>/2) [('c', 9)]
>   +-- #9: F() []
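
To make the expectation concrete, here is a minimal sketch in plain Python of the tokenization I had in mind. It makes no dafsa calls, and the helper name `tokenize` is hypothetical (it is not part of dafsa). It only shows which symbols I expected to see as edge labels.

```python
# Hypothetical helper, NOT part of dafsa: split each sequence on the delimiter
# so that every multi-character token becomes a single symbol.
def tokenize(sequences, delimiter=" "):
    return [tuple(seq.split(delimiter)) for seq in sequences]


test = ["a b c", "a ab ac", "a ab ab c"]
tokens = tokenize(test)
for seq in tokens:
    print(seq)

# Distinct symbols I expected as edge labels (the delimiter is not among them).
alphabet = sorted({tok for seq in tokens for tok in seq})
print(alphabet)
```

With this tokenization, `ab` and `ac` would each be a single symbol and the space would never appear as an edge label, unlike in the output above.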

The same issue occurs when the spaces are changed to underscores and delimiter="_" is passed.
I probably did something stupidly wrong...
My system is Windows 11 with Python 3.9.16 and dafsa 1.0 installed.
Thanks!

