
Getting Started

Armando Blancas edited this page Aug 23, 2017 · 10 revisions

Parsers are higher-order functions that are applied indirectly to the input text. We start with various ways to execute primitive parsers using functions designed for interactive development.

(use 'blancas.kern.core)

Loading the library with use saves typing namespace prefixes when working interactively. Now let's parse a digit:

(run digit "123")
;; \1

run prints the result of the parsing; the rest of the input is left unprocessed. You can look at the parser state with the run* function:

(run* digit "123")
;; {:input (\2 \3),
;;  :pos {:src "", :line 1, :col 2},
;;  :value \1,
;;  :ok true,
;;  :empty false,
;;  :user nil,
;;  :error nil}

run* prints the parser state after the parser digit was applied. The fields are as follows:

input: the input text as a sequence
pos: the parser's position within the input
value: the result of the parsing
ok: whether the parsing was successful
empty: whether the parser consumed any input
user: any arbitrary client data
error: a list of error-message records
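Since the state is a record, its fields can be read with plain keyword lookups. The following is a minimal sketch, assuming that parse (described further down this page) returns the same state that run* prints; the names st and summary are only illustrative.

```clojure
(use 'blancas.kern.core)

;; parse returns the state record instead of printing it.
(def st (parse digit "123"))

;; Pull out individual fields by keyword.
(def summary
  {:value     (:value st)
   :ok        (:ok st)
   :empty     (:empty st)
   :remaining (apply str (:input st))})
;; Going by the run* output above, summary should be
;; {:value \1, :ok true, :empty false, :remaining "23"}
```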

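The full form of both run functions takes two additional arguments: a name for the input, which shows up in the :src field of the position, and an arbitrary value stored in the :user field for use by a custom parser.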

(run* digit "123" "repl" {:foo 1 :bar 2})
;; {:input (\2 \3),
;;  :pos {:src "repl", :line 1, :col 2},
;;  :value \1,
;;  :ok true,
;;  :empty false,
;;  :user {:foo 1, :bar 2},
;;  :error nil}
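To show how a parser might read that user value, here is a small sketch. It relies on two names not introduced on this page, so treat them as assumptions: get-state, a Kern parser that yields the user value as its result, and >>, which runs two parsers in sequence and keeps the result of the second. It also assumes that parse accepts the same optional arguments as the run functions.

```clojure
(use 'blancas.kern.core)

;; Assumption: get-state yields the :user value; >> keeps the second result.
;; Skip one digit, then return the user value as the parser's result.
(def digit-then-user (>> digit get-state))

(def user-value
  (:value (parse digit-then-user "123" "repl" {:foo 1 :bar 2})))
;; user-value should be {:foo 1, :bar 2}
```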

Functions runf and runf* perform similar duties but with input coming from a file.

$ cat file
abc

(runf letter "file")
;; \a

(runf* letter "file")
;; {:input (\b \c \newline),
;;  :pos {:src "file", :line 1, :col 2},
;;  :value \a,
;;  :ok true,
;;  :empty false,
;;  :user nil,
;;  :error nil}

Instead of taking a string, these functions take a filename and an optional encoding. The default encoding is UTF-8, so the above runf call is the same as:

(runf letter "file" "UTF-8")
;; \a

If something goes wrong run will display one or more error messages.

(run (<*> letter letter digit) "abc")
;; line 1 column 3
;; unexpected \c
;; expecting digit

The combinator <*> applies the given parsers in sequence and returns a vector with the corresponding results. This call failed because the letter c doesn't match the parser digit. The proper input fixes the problem.

(run (<*> letter letter digit) "os8")
;; [\o \s \8]
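Because <*> yields a vector, the individual results can be destructured by position. This sketch uses value and parse, both described further down this page, so that the results can be bound to names instead of printed; code, joined and failed-ok are illustrative names.

```clojure
(use 'blancas.kern.core)

;; Two letters followed by a digit, as in the examples above.
(def code (<*> letter letter digit))

;; The result vector can be destructured positionally.
(def joined
  (let [[a b n] (value code "os8")]
    (str a b n)))
;; joined should be "os8"

;; The earlier failing input leaves :ok false in the state.
(def failed-ok (:ok (parse code "abc")))
```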

The string abc may be parsed whole or per-character as needed.

(run (many letter) "abc")
;; [\a \b \c]
(run (token* "abc") "abc")
;; "abc"

The combinator many will apply the given parser zero or more times on the input for as long as it succeeds. The parser token* expects to find the given string as a prefix of the input, whether or not the string is a whole word.
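Two consequences of the description above are worth trying out: many still succeeds when the parser matches zero times, and token* succeeds when the string is only a prefix of a longer word. A sketch using parse (described below) so the state can be inspected; none and prefix are illustrative names.

```clojure
(use 'blancas.kern.core)

;; many succeeds even when nothing matches: zero repetitions is allowed.
(def none (parse (many letter) "123"))
;; (:ok none) should be true, with an empty :value

;; token* only looks at the front of the input, so a longer word still matches.
(def prefix (parse (token* "abc") "abcdef"))
;; (:value prefix) should be "abc", with "def" left in :input
```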

Function parse works like run but returns the parser's state record. Its companion function parse-file takes a filename and an optional encoding string (which also defaults to UTF-8). Checking the :ok field is the proper way to determine success or failure. Use the :value selector to get the parser's result.

(let [st (parse (token* "abc") "abc")]
  (when (:ok st) (str "got " (:value st))))
;; "got abc"
(let [st (parse-file (token* "abc") "file")]
  (when (:ok st) (str "got " (:value st))))
;; "got abc"
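For a version that can be pasted into a REPL without preparing a file first, the sketch below writes a throwaway file and parses it with and without an explicit encoding. The temporary file and the names tmp, path, by-default and explicit are illustrative only.

```clojure
(use 'blancas.kern.core)

;; Create a throwaway file holding the same text as the example file.
(def tmp (java.io.File/createTempFile "kern-sample" ".txt"))
(spit tmp "abc\n")
(def path (.getPath tmp))

;; Passing "UTF-8" explicitly should be equivalent to relying on the default.
(def by-default (parse-file (token* "abc") path))
(def explicit   (parse-file (token* "abc") path "UTF-8"))
;; both should succeed with :value "abc"

(.delete tmp)
```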

Function print-error takes the parser state record and prints error messages like run does. Its output can also be captured in a string with the function with-out-str.

(let [st (parse digit "x")]
  (when-not (:ok st)
    (print-error st)))
;; line 1 column 1
;; unexpected \x
;; expecting digit
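Capturing the messages with with-out-str could look like the following sketch; error-text is a hypothetical helper, not part of Kern.

```clojure
(use 'blancas.kern.core)

(defn error-text
  "Returns the error messages of a failed parse as a string, or nil on success.
  Hypothetical helper for illustration."
  [p text]
  (let [st (parse p text)]
    (when-not (:ok st)
      (with-out-str (print-error st)))))

(def msg (error-text digit "x"))
;; msg holds the text that print-error would otherwise write to the console
```

This is handy for logging, or for returning diagnostics to a caller instead of printing them.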

Function value works similarly to parse but returns the :value field of the resulting state record. If there is an error it returns nil. There's no similar function for file input, but you can just do (value p (slurp "file")).

(value (many letter) "qwerty")
;; [\q \w \e \r \t \y]
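Since value returns nil on failure, it combines naturally with if-let. The sketch below assumes many1, Kern's one-or-more counterpart of many, which is not introduced on this page; letters-or-default is a hypothetical helper.

```clojure
(use 'blancas.kern.core)

(defn letters-or-default
  "Returns the leading letters of text as a string, or a default on failure.
  Hypothetical helper; assumes many1 means one or more repetitions."
  [text]
  (if-let [v (value (many1 letter) text)]
    (apply str v)
    "no letters"))

(def found   (letters-or-default "qwerty"))  ; should be "qwerty"
(def missing (letters-or-default "123"))     ; should be "no letters"
```

Note that this idiom cannot tell a failed parse apart from a parser that legitimately produces nil or false; in that case check the :ok field of the state returned by parse.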

Parsing Data

When the input is serialized data, whether in a string or in a file, Kern does more work than necessary because it keeps track of token locations and possible error diagnostics. With text that is strictly serialized data, as opposed to language-like input, we may want to treat its parsing as just pass or fail and diagnose later as needed, in exchange for a considerable performance improvement: from twice as fast to sometimes several times faster.

parse-data is a function that works like parse but whose diagnostic is limited to the value of the :ok field. Use it for serialized data that most of the time won't generate parsing errors. Here we use code from the JSON Sample:

(load "json")
(ns json)
(parse-data jvalue (slurp "src/main/resources/tweet.json"))
;; #blancas.kern.core.PState{ ... }

parse-data-file is the file-based counterpart: it works like parse-file, but its diagnostic is likewise limited to the value of the :ok field. Use it for data files that most of the time won't generate parsing errors.

(parse-data-file jvalue "src/main/resources/tweet.json")
;; #blancas.kern.core.PState{ ... }
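The "pass or fail, diagnose later" approach can be sketched as a wrapper that tries the fast path first and re-parses with parse only on failure, so full error messages are paid for only when needed. parse-fast is a hypothetical helper, and many1 (one or more repetitions) is assumed from Kern rather than taken from this page.

```clojure
(use 'blancas.kern.core)

(defn parse-fast
  "Parses with parse-data; on failure, parses again with parse to get
  complete diagnostics. Hypothetical helper for illustration."
  [p text]
  (let [st (parse-data p text)]
    (if (:ok st)
      st
      (parse p text))))

(def good (parse-fast (many1 digit) "2017"))
;; (:value good) should hold the characters \2 \0 \1 \7

(def bad (parse-fast (many1 digit) "abc"))
;; (:ok bad) should be false, and print-error can report on bad
```

The second pass doubles the cost of a failing input, which is a reasonable trade only when failures are rare, as the text above assumes.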