Primitive Parsers
Primitive parsers process a single character or word at the very start of the input text. These parsers are the building blocks of the more elaborate predefined parsers and of those built by client code. Typically they are passed as arguments to parser combinators. The samples below use run for easy testing at the REPL, so the result shown after the semicolons is what the function prints, not what it returns.
We first load the Kern core namespace.
(use 'blancas.kern.core)
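Since run only prints its result, code that needs the parsed value has to call something else. The sketch below contrasts run with parse and value; it assumes that value returns the parsed value directly and that the state returned by parse exposes the :ok and :value keys. The names nine and state are made up for this illustration.

```clojure
(use 'blancas.kern.core)

;; run prints the result (or the error message); use it at the REPL.
(run digit "9") ;; \9

;; value returns the parsed value, so it can be bound or passed along.
(def nine (value digit "9"))

;; parse returns the whole parser state; :ok and :value are two of its fields.
(def state (parse digit "9"))
(:ok state)    ;; true
(:value state) ;; \9
```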
return
always succeeds, producing the supplied value.
(run (return "foo") "xyz") ;; "foo"
fail
fails with the supplied error message.
(:ok (parse (fail "foo") "xyz")) ;; false
(run (fail "this is bad") "xyz")
;; line 1 column 1
;; this is bad
satisfy
applies the supplied predicate to the first character in the input; the parser succeeds if the predicate holds for that character and fails otherwise.
(run (satisfy #(= \x %)) "xyz") ;; \x
(run (satisfy #(= \x %)) "Z")
;; line 1 column 1
;; unexpected \Z
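Any one-argument predicate on characters can be passed to satisfy, which makes it a convenient way to define primitives that Kern does not supply. The following is a minimal sketch; vowel is a hypothetical parser defined here for illustration, not part of the library.

```clojure
(use 'blancas.kern.core)

;; Hypothetical parser (not part of Kern): accepts one lowercase vowel.
(def vowel (satisfy #(contains? #{\a \e \i \o \u} %)))

(run vowel "echo")        ;; \e
(:ok (parse vowel "xyz")) ;; false
```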
any-char
accepts the next character in the input.
(run any-char "x") ;; \x
letter
accepts any lowercase or uppercase letter.
(run letter "z") ;; \z
lower
accepts any lowercase letter.
(run lower "a") ;; \a
upper
accepts any uppercase letter.
(run upper "M") ;; \M
white-space
accepts a whitespace character.
(run white-space "\t") ;; \tab
space
accepts a space character.
(run space " ") ;; \space
tab
accepts a tab character.
(run tab "\t") ;; \tab
digit
accepts a decimal digit character \0 to \9.
(run digit "9") ;; \9
hex-digit
accepts a hex digit character \0 to \9, \A, \B, \C, \D, \E, \F.
(run hex-digit "F") ;; \F
oct-digit
accepts an octal digit character \0 to \7.
(run oct-digit "7") ;; \7
alpha-num
accepts any letter or digit.
(run (many alpha-num) "9A") ;; [\9 \A]
sym*
accepts a specific character.
(run (sym* \x) "x") ;; \x
sym-
accepts a specific character; not case-sensitive. Note that you get the character specified to sym-, not the one from the input.
(run (sym- \x) "X") ;; \x
token*
parses a specific sequence of characters, not necessarily delimited. If multiple target sequences are given, they're tried in turn until one succeeds or the parser fails.
(run (token* "foo") "football") ;; "foo"
(run (token* "foo" "bar" "baz") "bazaar") ;; "baz"
token-
is a version of token* that is not case-sensitive.
(run (token- "foo") "FOOTBALL") ;; "foo"
(run (token- "foo" "bar" "baz") "BaZaAr") ;; "baz"
word*
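works like token* but takes a parser as its first argument that specifies which characters may not immediately follow the word. If the word is followed by a character that parser accepts, word* fails.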
(run (word* (one-of* "|-/") "foobar") "foobar*") ;; "foobar"
(run (word* (one-of* "|-/") "football" "foobar") "foobar*") ;; "foobar"
(run (word* (one-of* "|-/") "foobar") "foobar/")
;; line 1 column 8
;; unexpected /
;; expecting end of foobar
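The difference between token* and word* shows most clearly when the target is a prefix of a longer word. The sketch below passes the letter primitive as the first argument to word*, on the assumption that any single-character parser is accepted there; it inspects :ok on the state returned by parse instead of printing.

```clojure
(use 'blancas.kern.core)

;; token* only matches the characters themselves, so a prefix is enough.
(:ok (parse (token* "foo") "football"))       ;; true

;; word* with letter as the delimiter parser rejects "foo" when a letter follows it.
(:ok (parse (word* letter "foo") "football")) ;; false

;; A following space is not a letter, so the word is accepted.
(:ok (parse (word* letter "foo") "foo bar"))  ;; true
```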
word-
is a version of word* that is not case-sensitive.
(run (word- (one-of* "|-/") "foobar") "FooBar*") ;; "foobar"
(run (word- (one-of* "|-/") "football" "foobar") "FOOBAR*") ;; "foobar"
(run (word- (one-of* "|-/") "foobar") "FoObAr/")
;; line 1 column 8
;; unexpected /
;; expecting end of foobar
one-of*
accepts one of the characters in the supplied string.
(run (one-of* "xyz") "zap") ;; \z
none-of*
accepts any character other than those in the supplied string.
(run (none-of* "xyz") "foo") ;; \f
new-line*
succeeds if the next character is \n.
(run new-line* "\n/**/") ;; \newline
eof
succeeds if the input sequence is empty.
(run (<*> letter digit eof) "U2") ;; [\U \2 nil]
field*
parses an unquoted string delimited by any character in the supplied string.
(run (field* ",;") "California,") ;; "California"
split-on
splits the input text on whitespace or any of the characters in the supplied string.
(run (split-on ",./") "Now, is the time") ;; ["Now" "is the time"]
split
splits the input text on whitespace.
(run split "Now is the time") ;; ["Now" "is" "the" "time"]
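As noted in the introduction, primitive parsers are mostly used as arguments to parser combinators. The sketch below builds a small identifier parser from letter and alpha-num. It assumes the <+> combinator from the same namespace, which applies its parsers in sequence and joins their results into a single string; identifier is a hypothetical name chosen for this example.

```clojure
(use 'blancas.kern.core)

;; Hypothetical parser (not part of Kern): a letter followed by
;; zero or more letters or digits, returned as one string.
(def identifier (<+> letter (many alpha-num)))

(run identifier "x25 = 0")         ;; "x25"
(:ok (parse identifier "25x = 0")) ;; false, the input starts with a digit
```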