Word generator 2 (wg2 for short) is a token-based text generator. A tokenizer slices the input text into small pieces (tokens) and remembers which tokens follow which. The generator then selects a random token and builds the next tokens in the chain from what was remembered.
THE MORE TEXT PROVIDED AS INPUT, THE BETTER THE RESULT WILL BE. I think an acceptable amount would be 500-600 kilobytes.
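The idea above can be sketched in a few lines. This is only an illustration in Python, not the wg2 source: the function names (`tokenize`, `build_followers`, `generate`) are hypothetical, it uses fixed-size tokens only, and each token remembers just the single token that came right after it.

```python
import random
from collections import defaultdict


def tokenize(text, token_size):
    """Slice text into consecutive pieces of token_size characters."""
    return [text[i:i + token_size] for i in range(0, len(text), token_size)]


def build_followers(tokens):
    """Map each token to the list of tokens seen immediately after it."""
    followers = defaultdict(list)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, count, rng):
    """Start from a random token, then keep picking a remembered follower."""
    if not followers or count <= 0:
        return ""
    starts = sorted(followers)  # sorted so a seeded rng gives repeatable runs
    token = rng.choice(starts)
    out = [token]
    while len(out) < count:
        options = followers.get(token)
        # Dead end (token was only seen at the very end): restart randomly.
        token = rng.choice(options) if options else rng.choice(starts)
        out.append(token)
    return "".join(out)


sample = "the more text provided as input, the better the result will be"
table = build_followers(tokenize(sample, 2))
print(generate(table, 20, random.Random()))
```

With such a short sample the output is mostly noise, which is why a large input matters: more text gives every token more followers to choose from.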
Note
The new version uses a console application interface. To use console arguments instead, download v1.0 or select the console-arguments branch.
Type the help command to view the commands and their usage.
- run [path/text] -- if a file with that name exists, process the file; otherwise process the given text as input
- ts [number] -- set token size to [number]; the smaller the [number], the smaller the pieces of text the tokenizer creates, and vice versa
- tmin [number] -- set minimum token size to [number]; added to support tokenizers with random token size
- tmax [number] -- set maximum token size to [number]; added to support tokenizers with random token size
- tsc [number] -- set subsequent tokens count to [number]. This number determines how many positions after itself a token will "remember", i.e. it specifies the size of the List[] SubsequentTokens array (SubsequentTokens = new List[count]). For example, when tsc=5, a token will remember the tokens at the next 5 positions.
- tn [number] -- number of next tokens that a token will remember. For example, if tsc=5 and tn=10, a token will remember 10 tokens at the next 5 positions
- tg [number] -- generate [number] tokens. Do not confuse characters with tokens: if ts=2 and tg=200, wg2 will generate 400 characters
- tr [number in range 0..1] -- chance of choosing a random next token (default 0)
- separate [size/space] -- "space": separate tokens at spaces, potentially works better with analytic languages (like English)
"size": default, split text into tokens of several characters
- fr [number] -- process the input into tokens, generate a response and tokenize it again ([number] times)
- hp -- set high process priority (requires admin rights!)
- np -- set normal process priority
- debinf [true/false] -- log debug info?
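To make tsc and tn more concrete, here is a small sketch of the "remembering" step. It is written in Python for illustration and is not the wg2 code: `build_subsequent` is a hypothetical name, and it assumes that tn caps how many tokens are stored for each of the tsc positions, which is one possible reading of the description above.

```python
from collections import defaultdict


def build_subsequent(tokens, tsc, tn):
    """For every token keep tsc lists; list k holds the tokens seen
    k + 1 positions later, limited to tn entries (assumed meaning of tn)."""
    table = defaultdict(lambda: [[] for _ in range(tsc)])
    for i, token in enumerate(tokens):
        for offset in range(1, tsc + 1):
            if i + offset >= len(tokens):
                break  # ran past the end of the input
            slot = table[token][offset - 1]
            if len(slot) < tn:
                slot.append(tokens[i + offset])
    return table


tokens = ["ab", "cd", "ef", "ab", "gh", "ij"]
table = build_subsequent(tokens, tsc=2, tn=10)
print(table["ab"][0])  # tokens seen one position after "ab": ['cd', 'gh']
print(table["ab"][1])  # tokens seen two positions after "ab": ['ef', 'ij']
```

In this reading, a larger tsc lets a token influence text further ahead, while a larger tn gives the generator more alternatives at each of those positions.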
- abracadabra.txt -- just abracadabra lol
- words.txt -- just some Russian words, separated by commas
- englishWords.txt -- words.txt translated with Google Translate
- source.txt -- the biggest standard source file, contains a large amount of Russian text (817KB)