-
Notifications
You must be signed in to change notification settings - Fork 296
Segment List
Is a list of parts used to construct a sequence of characters. Each part represents a range of
from the original sequence or text outside the original sequence implemented by SegmentList
with properties:
-
range: range of the resulting sequence in original sequence.Range.NULLfor a list which does not contain such information. ℹ️ null range has start =Integer.MAX_VALUEand end =Integer.MIN_VALUE -
startOffset:rangestart, start offset of the resulting sequence in original sequence, -
endOffset:rangeend, end offset of the resulting sequence in original sequence, -
lastSegment: last segment contained in the list, is null for empty list. -
segments: list of segments describing the resulting sequence and their position in the original sequence. -
length: length of resulting sequence. -
textLength: length of all text in the resulting sequence which is out of the original sequence.
STRING type and
TEXT no longer has any base offset information. BASE or ANCHOR segments before/after
TEXT provide the base sequence context. If the TEXT segment is first in the list then
SegmentBuilder startOffset is the base context for its start, if it is last then endOffset
of string builder is the endOffset for its base context
Each segment part consists of a range in the original sequence that it represents and text from
out of original sequence. It has span and length properties, with length being the length
of text it is contributing to the sequence being constructed. span is the number of characters
in the original sequence this segment represents or replaces. Either the span or length can
be 0. If both are zero then this is a NULL segment with no effect on the resulting sequence
and should not be part of the segment list.
A segment list can contain the following segment types:
-
NULL: null segment, should not exist in a normalized list. Ignored when building a list -
ANCHOR: range of span =0, text is empty,spanis span of range,lengthandspan= 0, position is the offset of this anchor in the original sequence. This segment does not exist in a normalized list. Used only to provide range information toSTRINGand list while building a list. -
STRING: range is null, text is not empty,lengthis length of text,spanis 0 -
TEXT: range of span >= 0, text is not empty,lengthis length of text,spanis span of range. Start isstartOffsetof list if it is the first element in the list. Otherwise it is equal to previousBASEend. End offset is equalendOffsetof list when the last segment in a list. Otherwise it is equal to start of followingBASE. -
BASE: range of span >0, text is empty,spanis span of range,lengthisspan
Only ANCHOR, BASE or STRING types can be appended to a segment list and always results in
a normalized segment list. Concrete ANCHOR and BASE are Range instances with the former
having Range.getSpan() == 0. Concrete STRING is String.
ANCHOR is can only set startOffset of a list which has no offset information or extend its
end offset if it is less than the ANCHOR end. It is only used while building list to mark the
start/end extent of a segment list in original sequence. It adds no content and is ignored when
it does not affect the extent of the resulting sequence.
-
lastSegmentis the last segment in the list or null for empty list -
endOffset,rangeend are equal, iflastSegmentis not null then its end is equal toendOffset -
startOffset,rangestart are equal, list is not empty then the first segment start is equal tostartOffset -
lengthequals the sum oflengthof segments in a list -
textLengthequals the sum oflengthof nonBASEsegments in the list. -
STRINGsegment can only be the the only element in a list with null range. -
TEXTstart will always bestartOffsetif first element or end of previousBASE. end offset will beendOffsetifTEXTis last segment in list or it will be start of nextBASE.
Result of an appending operation are determined only by the list range and lastSegment
properties, the segment being appended.
Only appending of a BASE can be: disjoint, adjacent or overlapped and only if the list
has original sequence offset information.
-
disjoint: appended segment has no range information, list has no offset information, or segment start >endOffset. -
adjacent: appended segment start ==endOffset -
overlapped: appended segment start <endOffset
The overlap requires resolving and depends on type of sequence and has the following outcomes based on result of the overlap resolution:
- null: appending range is thrown out, list unchanged, for example if the last segment contains the appending range.
-
BASE: last segment end set to appending range end. Overlap is removed from appending range and resulting adjacent range merged into the last segment. -
TEXT: overlap converted to text of original sequence with start set to last segment end. The overlap contains the appending range and is converted to original sequence characters. -
TEXT,BASE: converted to text of original sequence characters and non-overlapping range. The intersection of appending range and last segment is converted to original sequence characters and a range. with start == end of last segment. The range ofTEXTsegment is the end of last segment and start of converted non-overlapped range.
| Last | Appended | Result (= replaces last, + appended) | Description |
|---|---|---|---|
BASE |
+BASE
|
||
STRING |
+STRING
|
||
ANCHOR |
startOffset/endOffset set to anchor pos |
||
STRING |
ANCHOR |
=TEXT
|
startOffset/endOffset and TEXT start/end = anchor position |
STRING |
STRING |
=STRING
|
STRING appended text |
STRING |
BASE |
=TEXT +BASE
|
startOffset and TEXT start/end set to BASE start, endOffset set to range end |
TEXT |
STRING |
=TEXT
|
STRING appended to TEXT text |
TEXT |
BASE |
=TEXT +BASE
|
TEXT end, endOffset = range start |
BASE |
BASE |
=BASE
|
appending range start == endOffset, BASE end = appending range end |
BASE |
STRING |
+TEXT
|
TEXT start = range end, text = STRING text |
BASE |
BASE |
+BASE
|
appended range start is > endOffset
|
BASE |
BASE |
see overlapped append | appending range start < endOffset, overlap resolution required |