Getting Started
Parsers are higher-order functions that are applied indirectly to the input text. We start with various ways to execute primitive parsers using functions designed for interactive development.
(use 'blancas.kern.core)
Loading the library with `use` makes it easier if you'll be typing. Now let's parse a digit:
(run digit "123")
;; \1
`run` prints the result of the parsing; the rest of the input is left unprocessed. You can look at the parser state with the `run*` function:
(run* digit "123")
;; {:input (\2 \3),
;; :pos {:src "", :line 1, :col 2},
;; :value \1,
;; :ok true,
;; :empty false,
;; :user nil,
;; :error nil}
`run*` prints the parser state after the parser `digit` was applied. The fields are as follows:
| Field | Description |
| --- | --- |
| input | the remaining input text, as a sequence |
| pos | the parser's position within the input |
| value | the result of the parsing |
| ok | whether the parsing was successful |
| empty | true if the parser consumed no input |
| user | any arbitrary client data |
| error | a list of error-message records |
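To make these fields concrete, here is a toy model in plain Clojure. It is only a sketch, assuming a parser is a function from one state map to the next; the names `toy-state`, `toy-satisfy`, and `toy-digit` are hypothetical, and this is not Kern's actual implementation. It leaves out `:pos`, `:user`, and `:error`.

```clojure
;; Toy model only: hypothetical names, NOT Kern's implementation.

(defn toy-state
  "Builds an initial state map for the given text."
  [text]
  {:input (seq text) :value nil :ok true :empty true})

(defn toy-satisfy
  "Builds a parser that consumes one character when (pred c) is true."
  [pred]
  (fn [st]
    (let [[c & more] (:input st)]
      (if (and c (pred c))
        ;; Success: one character consumed, so :empty is false.
        (assoc st :input more :value c :ok true :empty false)
        ;; Failure: nothing consumed, the input is left as it was.
        (assoc st :value nil :ok false :empty true)))))

(def toy-digit (toy-satisfy #(Character/isDigit (char %))))

(toy-digit (toy-state "123"))
;; {:input (\2 \3), :value \1, :ok true, :empty false}
```

The parser never touches the text directly; it receives a state and returns a new one, which is what lets combinators chain parsers together.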
The full form of both `run` functions takes two additional arguments: a name for the input, which shows up as `:src` in the position, and a value for the `:user` field, for use by a custom parser.
(run* digit "123" "repl" {:foo 1 :bar 2})
;; {:input (\2 \3),
;; :pos {:src "repl", :line 1, :col 2},
;; :value \1,
;; :ok true,
;; :empty false,
;; :user {:foo 1, :bar 2},
;; :error nil}
Functions `runf` and `runf*` perform similar duties but with input coming from a file.
$ cat file
abc
(runf letter "file")
;; \a
(runf* letter "file")
;; {:input (\b \c \newline),
;; :pos {:src "file", :line 1, :col 2},
;; :value \a,
;; :ok true,
;; :empty false,
;; :user nil,
;; :error nil}
Instead of taking a string, these functions take a filename and an optional encoding. The default encoding is UTF-8, so the above `runf` call is the same as:
(runf letter "file" "UTF-8")
;; \a
If something goes wrong, `run` will display one or more error messages.
(run (<*> letter letter digit) "abc")
;; line 1 column 3
;; unexpected \c
;; expecting digit
The combinator `<*>` applies the given parsers in sequence and returns a vector with the corresponding results. This call failed because the letter c doesn't match the parser `digit`. The proper input fixes the problem.
(run (<*> letter letter digit) "os8")
;; [\o \s \8]
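The behavior of a sequencing combinator can be sketched in plain Clojure. This is a toy model under the assumption that a parser is a function from a state map to a state map; every name here (`toy-satisfy`, `toy-seq`, `toy-run`, and so on) is hypothetical, and the code is not how Kern implements `<*>`.

```clojure
;; Toy model only: hypothetical names, NOT Kern's implementation.

(defn toy-satisfy
  "Builds a parser that consumes one character when (pred c) is true."
  [pred]
  (fn [st]
    (let [[c & more] (:input st)]
      (if (and c (pred c))
        (assoc st :input more :value c :ok true)
        (assoc st :value nil :ok false)))))

(def toy-letter (toy-satisfy #(Character/isLetter (char %))))
(def toy-digit  (toy-satisfy #(Character/isDigit (char %))))

(defn toy-seq
  "Applies the parsers in order and collects their values in a vector.
   Stops at the first failure and returns that failed state."
  [& parsers]
  (fn [st]
    (loop [st st, ps parsers, vs []]
      (if-let [p (first ps)]
        (let [nxt (p st)]
          (if (:ok nxt)
            (recur nxt (rest ps) (conj vs (:value nxt)))
            nxt))
        (assoc st :value vs :ok true)))))

(defn toy-run
  "Applies parser p to the text and returns the final state map."
  [p text]
  (p {:input (seq text) :value nil :ok true}))

(:value (toy-run (toy-seq toy-letter toy-letter toy-digit) "os8"))
;; [\o \s \8]
(:ok (toy-run (toy-seq toy-letter toy-letter toy-digit) "abc"))
;; false
```

Each parser starts from the state left by the previous one, so the third parser in the failing call sees only the remaining character c.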
The string abc may be parsed whole or per-character as needed.
(run (many letter) "abc")
;; [\a \b \c]
(run (token* "abc") "abc")
;; "abc"
The combinator `many` will apply the given parser zero or more times on the input for as long as it succeeds. The parser `token*` expects to find the given string as a prefix of the input, whether or not the string is a whole word.
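Two consequences of these definitions are worth trying at the REPL. The sketch below assumes Kern is on the classpath and uses only functions named on this page; the inputs are made up for illustration, and the printed results are left for you to observe.

```clojure
(use 'blancas.kern.core)

;; `many` accepts zero matches, so this call succeeds
;; even though the input has no leading digit.
(run (many digit) "abc")

;; `token*` only needs its string to be a prefix of the input;
;; the trailing "def" is left unprocessed.
(run (token* "abc") "abcdef")
```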
Function `parse` works like `run` but returns the parser's state record. Its companion function `parse-file` takes a filename and an optional encoding string (which also defaults to UTF-8). Checking the `:ok` field is the proper way to determine success or failure. Use the `:value` selector to get the parser's result.
(let [st (parse (token* "abc") "abc")]
(when (:ok st) (str "got " (:value st))))
;; "got abc"
(let [st (parse-file (token* "abc") "file")]
(when (:ok st) (str "got " (:value st))))
;; "got abc"
Function `print-error` takes the parser state record and prints error messages like `run` does. Its printed output may also be captured in a string with the function `with-out-str`.
(let [st (parse digit "x")]
(when-not (:ok st)
(print-error st)))
;; line 1 column 1
;; unexpected \x
;; expecting digit
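One way to combine `parse`, the `:ok` check, and `print-error` is a small wrapper that returns the value on success and throws otherwise. This is a minimal sketch, assuming Kern is on the classpath; `parse-or-throw` is a hypothetical helper, not part of the library.

```clojure
(use 'blancas.kern.core)

(defn parse-or-throw
  "Hypothetical helper: returns the parsed value, or throws an
   exception whose message is the text printed by print-error."
  [p text]
  (let [st (parse p text)]
    (if (:ok st)
      (:value st)
      (throw (ex-info (with-out-str (print-error st))
                      {:pos (:pos st)})))))

(parse-or-throw digit "7")
;; \7
```

Calling `(parse-or-throw digit "x")` throws instead of returning, with the position of the failure attached as exception data.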
Function `value` works similarly to `parse` but returns the `:value` field of the resulting state record. If there is an error it returns nil. There's no similar function for file input, but you can just do `(value p (slurp "file"))`.
(value (many letter) "qwerty")
;; [\q \w \e \r \t \y]
When the input is data, whether in a string or a file, Kern will do more work than necessary, as it keeps track of token locations and possible error diagnostics. With text that is strictly serialized data, as opposed to language-like input, we may want to treat its parsing as just pass or fail, and diagnose later as needed, in exchange for a considerable performance improvement: from twice as fast to sometimes several times faster.
`parse-data` is a function that works like `parse` but whose diagnostic is limited to the value of the `:ok` field. Use it for serialized data that most of the time won't generate parsing errors. Here we use code from the JSON Sample:
(load "json")
(ns json)
(parse-data jvalue (slurp "src/main/resources/tweet.json"))
;; #blancas.kern.core.PState{ ... }
`parse-data-file` is a function that works like `parse-file` but whose diagnostic is limited to the value of the `:ok` field. Use it for serialized data that most of the time won't generate parsing errors.
(parse-data-file jvalue "src/main/resources/tweet.json")
;; #blancas.kern.core.PState{ ... }
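To see how the two entry points compare on your own machine, you can time them on the same input. The sketch below assumes Kern is on the classpath; `sample` is a made-up stand-in for serialized data and `(many letter)` stands in for a real data parser, so the timings it prints say nothing definitive about the JSON sample.

```clojure
(use 'blancas.kern.core)

;; Hypothetical input: a long run of letters standing in for serialized data.
(def sample (apply str (repeat 10000 "abc")))

;; Both calls return a state record; only the diagnostics differ.
(time (:ok (parse (many letter) sample)))
(time (:ok (parse-data (many letter) sample)))
```

If `:ok` comes back false from `parse-data`, one option is to rerun the same input through `parse` and pass the result to `print-error` to get the full diagnostic.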