Build my own corpus for a specific language with part-of-speech annotation

If I want to use a specific tokenizer (for example, one for CJK languages) to build a corpus with part-of-speech annotation,
I think I should create my own tokenstream, assign it to a CorpusData object,
and call the encode method to write the corpus.
Then, with the help of the decode function in https://github.com/PolMine/polmineR,
I can run CQP queries on my own corpus.
(In that case I would only need to install cwbtools and polmineR, without needing the tools from
http://cwb.sourceforge.net/devs.php.)

Is my understanding correct?

Also, can the lexer used to parse CQP queries match the “pos” values defined by my own tokenizer?
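To make the question concrete, this is roughly the shape of data I expect my tokenizer to produce: one row per token, with a zero-based corpus position, the word form, and my own tag. It is plain Python and only illustrates the table, not any cwbtools call. `toy_tagger`, its lexicon, and the tag names are made up, and the column names `cpos`, `word`, and `pos` are my assumption about what the tokenstream should contain.

```python
from typing import Callable, Dict, List, Tuple

# A tagger takes raw text and returns (word, pos) pairs.
Tagger = Callable[[str], List[Tuple[str, str]]]


def build_tokenstream(text: str, tagger: Tagger) -> List[Dict[str, object]]:
    """Turn tagger output into one row per token.

    Each row carries a zero-based corpus position ("cpos"), the token
    ("word"), and the tag assigned by the custom tokenizer ("pos").
    """
    rows = []
    for cpos, (word, pos) in enumerate(tagger(text)):
        rows.append({"cpos": cpos, "word": word, "pos": pos})
    return rows


def toy_tagger(text: str) -> List[Tuple[str, str]]:
    # Hypothetical stand-in for a real CJK tokenizer: the input is
    # already split by spaces and tags come from a tiny lookup table.
    lexicon = {"我": "PN", "喜欢": "VV", "猫": "NN"}
    return [(tok, lexicon.get(tok, "UNK")) for tok in text.split()]


rows = build_tokenstream("我 喜欢 猫", toy_tagger)
for row in rows:
    print(row["cpos"], row["word"], row["pos"], sep="\t")
```

With a table like this, I would hope a query such as `[pos = "NN"]` finds the third token.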
