Using text2ngram with huge corpus files

I created a 2.7 GB corpus file for Turkish, but text2ngram does not seem able to handle a file that large. Could the program be optimized to work with large files?

On my system [1], the second iteration of this loop cannot finish:

`for i in 1 2 3; do text2ngram -n $i -l -f sqlite -o database_aa.db mytext.filtered; done`
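One workaround that might be worth trying is to split the corpus into smaller pieces and run the same loop on each piece. The sketch below is untested against text2ngram itself. The function names, the `chunk_` prefix, and the chunk size are hypothetical, and it uses only the text2ngram flags from the command above. It writes one database per chunk, so the counts stay separate and merging them is not covered here.

```shell
#!/bin/sh
# Sketch only: split a large corpus into line-based chunks with the
# standard `split` utility, then build n-gram databases per chunk.

split_corpus() {
    # $1: corpus file, $2: lines per chunk, $3: output prefix (assumed names)
    split -l "$2" "$1" "$3"
}

ngrams_for_chunk() {
    # $1: chunk file. Writes "$1.db", reusing the same database for
    # n = 1, 2, 3 exactly as the original loop does.
    for i in 1 2 3; do
        text2ngram -n "$i" -l -f sqlite -o "$1.db" "$1" || return 1
    done
}

# Example usage (chunk size of 1000000 lines is an arbitrary assumption):
#   split_corpus mytext.filtered 1000000 chunk_
#   for f in chunk_??; do ngrams_for_chunk "$f"; done
```

Whether smaller inputs actually avoid the problem depends on where text2ngram spends its memory and time, which I have not verified.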

By the way, thanks for the open-source alternative to XT9 and the good documentation on how to use it :) I have already started testing it with a small corpus [2].

[1] 5950HQ + 16 GB RAM
[2] https://pbs.twimg.com/media/DY_ftChXUAAQP3t.jpg:large
