Primitive Parsers

Armando Blancas edited this page Aug 23, 2017 · 20 revisions

Primitive parsers process a single character or word at the very start of the input text. They are the building blocks of the more elaborate predefined parsers and of those written by client code, and they typically appear as arguments to parser combinators. The samples below use run for easy testing at the REPL, so the result shown after the semicolons is what the function prints, not what it returns.

We first load the Kern core namespace.

(use 'blancas.kern.core)

return always succeeds, producing the supplied value.

(run (return "foo") "xyz")  ;; "foo"

fail fails with the supplied error message.

(:ok (parse (fail "foo") "xyz"))  ;; false
(run (fail "this is bad") "xyz")
;; line 1 column 1
;; this is bad
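
Because run only prints, code that needs the outcome can call parse, which returns the parser state; that is what the :ok lookup above relies on. A short sketch of inspecting that state, assuming it exposes the parsed value under a :value key alongside :ok:

```clojure
(use 'blancas.kern.core)

;; parse returns the final parser state instead of printing it.
(def st (parse (return "foo") "xyz"))

(:ok st)     ;; true, since return always succeeds
(:value st)  ;; the parsed value (assumes a :value key on the state)
```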

satisfy applies the supplied predicate to the first character in the input; the predicate's result determines success or failure.

(run (satisfy #(= \x %)) "xyz")  ;; \x
(run (satisfy #(= \x %)) "Z")
;; line 1 column 1
;; unexpected \Z
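
Since satisfy takes any predicate, it can be used to define new single-character parsers. A minimal sketch; vowel is a hypothetical name introduced here for illustration and is not part of Kern:

```clojure
(use 'blancas.kern.core)

;; vowel is a hypothetical parser defined for this example only.
(def vowel (satisfy #(contains? #{\a \e \i \o \u} %)))

(run vowel "apple")        ;; \a
(:ok (parse vowel "xyz"))  ;; false
```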

any-char accepts the next character in the input.

(run any-char "x")  ;; \x

letter accepts any lowercase or uppercase letter.

(run letter "z")  ;; \z

lower accepts any lowercase letter.

(run lower "a")  ;; \a

upper accepts any uppercase letter.

(run upper "M")  ;; \M

white-space accepts a whitespace character.

(run white-space "\t")  ;; \tab

space accepts a space character.

(run space " ")  ;; \space

tab accepts a tab character.

(run tab "\t")  ;; \tab

digit accepts a decimal digit character \0 to \9.

(run digit "9")  ;; \9

hex-digit accepts a hex digit character \0 to \9, \A, \B, \C, \D, \E, \F.

(run hex-digit "F")  ;; \F

oct-digit accepts an octal digit character \0 to \7.

(run oct-digit "7")  ;; \7

alpha-num accepts any letter or digit.

(run (many alpha-num) "9A")  ;; [\9 \A]

sym* accepts a specific character.

(run (sym* \x) "x")  ;; \x

sym- accepts a specific character; not case-sensitive. Note that you get the character specified to sym-, not the one from the input.

(run (sym- \x) "X")  ;; \x

token* parses a specific sequence of characters, not necessarily delimited. If multiple target sequences are given, they're tried in turn until one succeeds or the parser fails.

(run (token* "foo") "football")  ;; "foo"
(run (token* "foo" "bar" "baz") "bazaar")  ;; "baz"

token- is a version of token* that is not case-sensitive.

(run (token- "foo") "FOOTBALL")  ;; "foo"
(run (token- "foo" "bar" "baz") "BaZaAr")  ;; "baz"

word* works like token* but takes a parser as its first argument; the matched word must not be followed by a character that parser accepts.

(run (word* (one-of* "|-/") "foobar") "foobar*")  ;; "foobar"
(run (word* (one-of* "|-/") "football" "foobar") "foobar*")  ;; "foobar"
(run (word* (one-of* "|-/") "foobar") "foobar/")
;; line 1 column 8
;; unexpected /
;; expecting end of foobar
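
The difference from token* matters when a word could be the prefix of a longer one, as with a keyword and an identifier. A small sketch, assuming alpha-num is passed as the parser for characters that may not follow the word:

```clojure
(use 'blancas.kern.core)

;; token* matches the prefix even though the word continues.
(run (token* "let") "letter")                   ;; "let"

;; word* rejects the match when a letter or digit follows it.
(:ok (parse (word* alpha-num "let") "letter"))  ;; false
(:ok (parse (word* alpha-num "let") "let x"))   ;; true
```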

word- is a version of word* that is not case-sensitive.

(run (word- (one-of* "|-/") "foobar") "FooBar*")  ;; "foobar"
(run (word- (one-of* "|-/") "football" "foobar") "FOOBAR*")  ;; "foobar"
(run (word- (one-of* "|-/") "foobar") "FoObAr/")
;; line 1 column 8
;; unexpected /
;; expecting end of foobar

one-of* accepts one of the characters in the supplied string.

(run (one-of* "xyz") "zap")  ;; \z

none-of* accepts any character other than those in the supplied string.

(run (none-of* "xyz") "foo")  ;; \f
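
Combined with many, as in the alpha-num sample earlier, none-of* can collect characters up to a delimiter. A brief sketch of that usage:

```clojure
(use 'blancas.kern.core)

;; Collect every character before the first comma or semicolon.
(run (many (none-of* ",;")) "abc,def")  ;; [\a \b \c]
```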

new-line* succeeds if the next character is \n.

(run new-line* "\n/**/") ;; \newline

eof succeeds if the input sequence is empty.

(run (<*> letter digit eof) "U2") ;; [\U \2 nil]
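
Placing eof at the end of a sequence makes the parser reject trailing input. A short sketch comparing the same sequence with and without it:

```clojure
(use 'blancas.kern.core)

;; Without eof the leftover "!" is simply not consumed.
(:ok (parse (<*> letter digit) "U2!"))      ;; true

;; With eof the leftover "!" makes the parser fail.
(:ok (parse (<*> letter digit eof) "U2!"))  ;; false
```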

field* parses an unquoted string delimited by any character in the supplied string.

(run (field* ",;") "California,") ;; "California"

split-on splits the input text on whitespace or any of the characters in the supplied string.

(run (split-on ",./") "Now, is the time")  ;; ["Now" "is the time"]

split splits the input text on whitespace.

(run split "Now is the time")  ;; ["Now" "is" "the" "time"]