What steps will reproduce the problem?
1. Have a corpus that contains mixed-case text or punctuation
2. Run any of the algorithms
What is the expected output? What do you see instead?
The output should have text lower-cased as needed and punctuation handled
according to user-specified rules.
Ideally, we could support some kind of filter that takes in a Document and
transforms it according to whatever rules the user specifies. Could this be
incorporated with the token filter and IteratorFactory? Or should it be a step
that lives entirely in GenericMain?
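
A rough sketch of what such a filter might look like, assuming a document can be treated as a plain string before tokenization. The `DocumentNormalizers` class and its method names are hypothetical and are not part of the existing code base; how this would hook into Document, IteratorFactory, or GenericMain is left open.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical helper, not an existing class in the project: each rule maps
// the raw text of one document to its normalized form.
final class DocumentNormalizers {
    private DocumentNormalizers() {}

    // Lower-cases the whole document with a locale-independent mapping.
    static UnaryOperator<String> lowerCase() {
        return text -> text.toLowerCase(Locale.ROOT);
    }

    // Replaces each ASCII punctuation character with a space, then
    // collapses runs of whitespace and trims the ends.
    static UnaryOperator<String> stripPunctuation() {
        return text -> text.replaceAll("\\p{Punct}", " ")
                           .replaceAll("\\s+", " ")
                           .trim();
    }

    // Pads each ASCII punctuation character with spaces so that it becomes
    // its own token, then collapses runs of whitespace and trims the ends.
    static UnaryOperator<String> separatePunctuation() {
        return text -> text.replaceAll("(\\p{Punct})", " $1 ")
                           .replaceAll("\\s+", " ")
                           .trim();
    }

    // Applies the user-specified rules in the order given.
    static UnaryOperator<String> chain(List<UnaryOperator<String>> rules) {
        return text -> {
            String result = text;
            for (UnaryOperator<String> rule : rules) {
                result = rule.apply(result);
            }
            return result;
        };
    }
}
```

A caller could then pick the rules it wants, for example `DocumentNormalizers.chain(List.of(DocumentNormalizers.lowerCase(), DocumentNormalizers.stripPunctuation()))`, and apply the result to each document's text before it reaches the tokenizer.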
Original issue reported on code.google.com by [email protected] on 17 Jul 2011 at 12:16