Yoz75/WordGenerator2


Word generator 2 (wg2 for short) is a token-based text generator. A tokenizer slices the input text into small pieces (tokens) and remembers which tokens follow which. The generator then selects a random token and produces the next tokens in the chain.
THE MORE TEXT PROVIDED AS INPUT, THE BETTER THE RESULT WILL BE. I think an acceptable amount is 500-600 kilobytes.
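
The idea described above can be sketched in a few lines of Python. This is only an illustration, not wg2's actual code: the function names (tokenize, build_chain, generate) are hypothetical, the token size is fixed, and each token only remembers its immediate followers.

```python
import random
from collections import defaultdict


def tokenize(text, token_size=2):
    """Slice text into consecutive pieces of token_size characters."""
    return [text[i:i + token_size] for i in range(0, len(text), token_size)]


def build_chain(tokens):
    """Map each token to the list of tokens observed right after it."""
    followers = defaultdict(list)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, count, rng=None):
    """Start from a random token and follow remembered successors."""
    if not followers:
        return ""
    rng = rng or random.Random()
    starts = list(followers)
    token = rng.choice(starts)
    output = [token]
    while len(output) < count:
        options = followers.get(token)
        # Assumption: at a dead end (no known follower) restart from a random token.
        token = rng.choice(options) if options else rng.choice(starts)
        output.append(token)
    return "".join(output)


if __name__ == "__main__":
    chain = build_chain(tokenize("abracadabra abracadabra", token_size=2))
    print(generate(chain, count=10))
```

With more input text each token collects more followers, which is why larger sources tend to give more varied output.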

[Image]

Note

The new version uses an interactive console interface. To use console arguments instead, download v1.0 or select the console-arguments branch.

Type the help command to view the commands and their usage.

Short commands description

  • run [path/text] -- if a file with that name exists, process the file; otherwise process the argument as input text
  • ts [number] -- set the token size to [number]; the smaller the [number], the smaller the pieces of text the tokenizer creates, and vice versa
  • tmin [number] -- set the minimal token size to [number]; exists to support tokenizers with a random token size
  • tmax [number] -- set the maximal token size to [number]; exists to support tokenizers with a random token size
  • tsc [number] -- set the subsequent tokens count to [number]. This is how many positions after itself a token will "remember", i.e. this value specifies the size of the List[] SubsequentTokens array (SubsequentTokens = new List[count]). For example, when tsc=5, a token will remember the tokens at the next 5 positions.
  • tn [number] -- set the count of next tokens that a token will remember. For example, if tsc=5 and tn=10, a token will remember 10 tokens at the next 5 positions
  • tg [number] -- generate [number] tokens. Do not confuse characters with tokens: if ts=2 and tg=200, wg2 will generate 400 characters
  • tr [number in range 0..1] -- chance of picking a random next token (default 0)
  • separate [size/space] -- "space": split the text into tokens at spaces, which potentially works better with analytic languages (like English)
    "size": the default, split the text into tokens of several characters
  • fr [number] -- process the input into tokens, generate a response and tokenize it again ([number] times)
  • hp -- set high process priority (requires admin rights!)
  • np -- set normal process priority
  • debinf [true/false] -- enable or disable debug logging
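
To make the tsc, tn, tg and tr settings more concrete, here is a rough Python sketch. It is an interpretation of the descriptions above, not wg2's real implementation: the function names are hypothetical, and it assumes that tn caps how many tokens are stored per position and that tr is the probability of ignoring the remembered tokens.

```python
import random
from collections import defaultdict


def build_subsequent(tokens, tsc=5, tn=10):
    """For every token keep tsc lists; list k holds up to tn tokens
    seen k+1 positions after that token (assumed meaning of tsc/tn)."""
    table = defaultdict(lambda: [[] for _ in range(tsc)])
    for i, token in enumerate(tokens):
        for offset in range(1, tsc + 1):
            if i + offset >= len(tokens):
                break
            slot = table[token][offset - 1]
            if len(slot) < tn:  # assumed: stop remembering once tn tokens are stored
                slot.append(tokens[i + offset])
    return table


def pick_next(options, vocabulary, tr=0.0, rng=None):
    """With probability tr pick any known token, otherwise a remembered one."""
    rng = rng or random.Random()
    if not options or rng.random() < tr:
        return rng.choice(vocabulary)
    return rng.choice(options)


def generated_length(ts, tg):
    """Characters produced when every token has exactly ts characters."""
    return ts * tg


if __name__ == "__main__":
    tokens = ["a", "b", "c", "a", "b", "d"]
    table = build_subsequent(tokens, tsc=2, tn=10)
    print(table["a"])               # tokens seen 1 and 2 positions after "a"
    print(generated_length(2, 200))  # ts=2, tg=200
```

The length arithmetic only holds for a fixed token size; with tmin/tmax or space separation, the number of characters per token varies.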

Standard source files

  • abracadabra.txt -- just abracadabra lol
  • words.txt -- just some Russian words, separated by commas
  • englishWords.txt -- a Google-translated version of words.txt
  • source.txt -- the biggest standard source file, contains a large amount of Russian text (817 KB)