Operations in core lib:
Redact: Remove the identified feature from the source text. Takes no params.
Replace: Replace an identified feature with a pattern/value supplied via parameter or, for entities identified by spaCy, optionally with the entity label spaCy outputs (e.g. PERSON, ORG).
Mask: Substitute each character of the identified feature with a single character supplied as a parameter (e.g. "*"). Optionally mask only a subset of the characters (for example, leaving the first and last characters unmasked), based on supplied start and end character counts.
Hash: Replace the value with its SHA-256 or MD5 hash (the design should be extensible to other hash algorithms).
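The four operations could be sketched as plain functions over a character span `text[start:end]`. This is a minimal sketch, not an existing API: the function names, parameter names (`keep_start`, `keep_end`) and the `HASHERS` registry are hypothetical. spaCy `Span` objects do expose `start_char` and `end_char`, which could supply the offsets.

```python
import hashlib

# Hypothetical registry: adding an entry here extends the Hash operation
# to another algorithm without touching hash_value().
HASHERS = {
    "sha256": hashlib.sha256,
    "md5": hashlib.md5,
}


def redact(text: str, start: int, end: int) -> str:
    """Remove the feature at text[start:end] entirely (no params)."""
    return text[:start] + text[end:]


def replace(text: str, start: int, end: int, value: str) -> str:
    """Replace the feature with a supplied value, e.g. a spaCy label like "PERSON"."""
    return text[:start] + value + text[end:]


def mask(text: str, start: int, end: int, char: str = "*",
         keep_start: int = 0, keep_end: int = 0) -> str:
    """Mask each character of the feature, leaving keep_start leading and
    keep_end trailing characters unmasked."""
    if len(char) != 1:
        raise ValueError("mask char must be a single character")
    feature = text[start:end]
    n = len(feature)
    # Clamp the unmasked counts so they never exceed the feature length.
    keep_start = min(keep_start, n)
    keep_end = min(keep_end, n - keep_start)
    masked = feature[:keep_start] + char * (n - keep_start - keep_end) + feature[n - keep_end:]
    return text[:start] + masked + text[end:]


def hash_value(text: str, start: int, end: int, algorithm: str = "sha256") -> str:
    """Replace the feature with the hex digest of its UTF-8 bytes."""
    try:
        hasher = HASHERS[algorithm]
    except KeyError:
        raise ValueError(f"unsupported hash algorithm: {algorithm}") from None
    digest = hasher(text[start:end].encode("utf-8")).hexdigest()
    return text[:start] + digest + text[end:]


text = "Call Alice Smith today"  # "Alice Smith" spans characters 5..16
print(redact(text, 5, 16))               # Call  today
print(replace(text, 5, 16, "PERSON"))    # Call PERSON today
print(mask(text, 5, 16, "*", 1, 1))      # Call A*********h today
```

Because redact, replace and hash change the text length, applying several operations to one text from the rightmost span to the leftmost keeps the remaining offsets valid.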
Managing potential overlaps:
The spaCy EntityRecognizer should not produce overlapping entity spans. However, features identified by regex elsewhere in the pipeline may overlap with entities or with each other. Overlaps should (potentially) be tracked in the db and handled as follows:
No overlap: apply the requested operations as normal.
Full overlap: warn/log and apply operations in pipeline order (e.g. if a name is recognized by spaCy and a replacement is requested for named entities, do not apply any further operation requested for a pattern match on the same span). This also applies to substring matches, where one match is contained within another.
Partial overlap: apply the later requested operation only to the part of its span not already covered by an earlier match.
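The three overlap cases could be resolved roughly as follows. This is a minimal sketch under stated assumptions: each match is a character-offset span tagged with its requested operation, the input list order is the pipeline order (spaCy entities first, then regex matches), and "apply only to partial span" is read as trimming the later match to its uncovered remainder. `Match`, `classify` and `resolve` are hypothetical names, and the db tracking is not modeled.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class Match:
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    op: str      # hypothetical: requested operation, e.g. "replace", "mask"
    source: str  # hypothetical: "spacy" or "regex"


def classify(a: Match, b: Match) -> str:
    """Return 'none', 'full' (one span contains the other) or 'partial'."""
    if a.end <= b.start or b.end <= a.start:
        return "none"
    if (a.start <= b.start and b.end <= a.end) or (b.start <= a.start and a.end <= b.end):
        return "full"
    return "partial"


def resolve(matches: list[Match]) -> list[Match]:
    """Resolve overlaps, treating list order as pipeline order."""
    accepted: list[Match] = []
    for m in matches:
        # Full overlap (including identical spans and substrings): keep the earlier op only.
        if any(classify(m, prev) == "full" for prev in accepted):
            logger.warning("full overlap: skipping %s", m)
            continue
        # Partial overlap: cut away every part already covered by an accepted span.
        fragments = [(m.start, m.end)]
        for prev in accepted:
            remaining = []
            for s, e in fragments:
                if e <= prev.start or prev.end <= s:
                    remaining.append((s, e))
                    continue
                if s < prev.start:
                    remaining.append((s, prev.start))
                if prev.end < e:
                    remaining.append((prev.end, e))
            fragments = remaining
        if not fragments:
            logger.warning("covered by earlier matches: skipping %s", m)
            continue
        if fragments != [(m.start, m.end)]:
            logger.warning("partial overlap: trimming %s to %s", m, fragments)
        accepted.extend(Match(s, e, m.op, m.source) for s, e in fragments)
    return accepted
```

Note that each new match is compared against the spans already accepted, which may themselves have been trimmed; comparing against the untrimmed originals instead would be an equally valid design choice.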