Add Readtables back #535
Ah, thought I had already made an issue for this. Indeed, they are missing and we need to add them back. We had them pre-1.0 (thanks @jlongster!) but then I rewrote the reader.
Could we use readtables to make Sweet.js whitespace-aware, say, like CoffeeScript or Ruby?
That'd be really, really, really hard.
Readtables are more intended for a case where you want to add a new construct to an existing language with minimal changes. An indentation-based syntax is a drastic change to the language so you really need to implement a parser for that.
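For concreteness, here is a rough sketch of the "minimal change" idea: a reader that consults a table keyed by trigger character before falling back to default behavior. All names here (`makeReader`, the entry signature) are hypothetical, not Sweet.js's actual API.

```javascript
// Hypothetical sketch: a reader that checks a readtable before
// falling back to default tokenization. Entry functions consume a
// prefix of the source and return [token, nextIndex].
function makeReader(table) {
  return function read(source) {
    const tokens = [];
    let i = 0;
    while (i < source.length) {
      const ch = source[i];
      const entry = table[ch];
      if (entry) {
        const [token, next] = entry(source, i);
        tokens.push(token);
        i = next;
      } else if (/\s/.test(ch)) {
        i += 1; // default behavior: skip whitespace
      } else {
        // default behavior: read a maximal run of ordinary characters
        let j = i;
        while (j < source.length && !/\s/.test(source[j]) && !table[source[j]]) j++;
        tokens.push({ type: 'ident', value: source.slice(i, j) });
        i = j;
      }
    }
    return tokens;
  };
}

// Adding a new construct touches only the table:
const read = makeReader({
  '@': (src, i) => [{ type: 'at-name', value: src.slice(i + 1, i + 2) }, i + 2]
});
```

The point is that the default tokenizer is untouched; the new construct lives entirely in one table entry.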
I implemented indent blocks in 0.7. The table used a simple regular expression to track the flow of indentation. The syntax was: […]

Unfortunately, outdenting two levels at once was blocked by an issue related to nested braces.
@disnet Are you looking to have the same API as the old readtables?
Not necessarily.
I'm looking to extend the utility of […].
So far I've come up with a primitive API that's similar to that of […]:

```js
lexeme ~> = function(reader) {
  reader.key();  // ~>
  reader.read(); // eats 'skipMe'; uses default reader behavior

  // Creates a reset point that is consumed when .reset is called.
  // Other possible names: shift, savePos, checkpoint
  reader.mark();

  const delim = reader.next().value; // [
  let brackets;
  if (delim.isLeftBracket()) {
    reader.reset();
    brackets = reader.read('Delimiter');
  } else {
    // throw
  }
  return `thread { ${brackets.inner()} }`; // .inner returns another reader instance
}
```

```js
foo ~> skipMe [bar, baz(1)] // foo thread { bar, baz(1) }
```

Implicitly, the current readtable is being extended. I do have questions about using modules at read time. Are syntactic extensions and other libraries available at read time?

```js
import m from './m' for reader
```
Things I liked about the old API are mostly whitespace-related: the ability to skip whitespace without gobbling up the next token, access to the full source code, access to the current index, and access to the current line number. A read-phase equivalent of […]
I disagree. The above doesn't display the full API I have in mind. I was thinking of moving many of the 0.7 readtable utility functions to […].

I'm also not sure why you'd need access to the entire source string. Your coffeetable implementation doesn't seem to use anything in the source before the triggering character anyway. I know I haven't thought things all the way through yet and I'd like to hear more useful criticism.
Mhmm, I distinctly remember having a use case for access to the source code, but I'll have to get back to you on that. Also, I didn't mean to say that a […]
My mistake. It looks like you used it to look behind one character for ellipsis. As with syntax extensions, I think we'll need limited lookbehind. I agree about consolidation.
What were the extensions to […]?
I was thinking

```js
const readtable = [];

readtable.push({
  token: "~>",
  read: function(reader) {
    // ...
  }
});

export default readtable;
```

But I'm guessing that doing multiple expansion passes so that readers could use syntax macros could be a lot of overhead.

As far as […], something like:

```js
const hm = #{ #foo 'foo', #bar 'bar' };
#foo(hm); // 'foo'
```

as well as some sort of lens literal:

```js
const hm = #{ #foo 'foo', #bar #{ #baz 'baz' } };
const l = #bar/baz; // maybe #/bar/baz, #bar/baz/ or #/bar/baz/
l(hm); // 'baz'
```

These could potentially work on regular JS objects as well:

```js
const obj = {foo: 'foo', bar: {baz: 'baz'}};
#bar/baz(obj); // 'baz'
```
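To show what such a lens literal could desugar to, here is a hedged sketch in plain JavaScript; the `lens` helper is invented for illustration and stands in for whatever `#bar/baz` would compile to.

```javascript
// Hypothetical desugaring target for a lens literal like #bar/baz:
// a function that walks a key path into nested objects. The `lens`
// helper is made up here, not part of any proposal in this thread.
function lens(...keys) {
  return obj => keys.reduce((o, k) => (o == null ? undefined : o[k]), obj);
}

// What `#bar/baz` might compile to:
const getBarBaz = lens('bar', 'baz');
```

Applied to the plain object above, `getBarBaz({foo: 'foo', bar: {baz: 'baz'}})` would yield `'baz'`.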
Oh yeah, […]. Have you started hacking on readtables? If not I might start; I have a few other things I want to clean up in the reader anyway.
I started working from the API back (so the reader that's passed to the transform function). But I haven't started integrating it yet. I wanted to wait on answers to the details. Can readtables make use of syntactic extensions? I guess I could start by just wedging them in there via CLI options and not worry about the other questions for now.
Is it feasible to implement […]?
What were your thoughts for cleaning up the reader?
Let's say no for now; we can think about this later.
My original intention was for them to be part of the […].
Yes, syntax templates should definitely be implementable via a readtable.
I just pushed a branch with some doodling I was doing the other day. Feel free to take or ignore as you see fit. The main things I wanted to do were to pull in the tokenizer directly, so we can own changes to it rather than relying on the shift parser dependency. I was also thinking of changing the […].

Also, I would love it if the […].
@disnet If you're considering removing the parser dependency anyway, I've been thinking about a more radical idea. It'd be a bunch of monkey work, but what if we implemented the reader to delegate completely to a readtable? A table entry could have a predicate (character sequence, regex, or function) to match the next token, paired with a tokenizer function. The function could return either a string or a token. If it's a string, the reader splices it back into the source and continues from the index before the function was called. Otherwise, the source is consumed and the token is added to the stack. Another possibility is to have either a tokenizer or a transform function in the table entry. I haven't thought out all of the implications, but the idea keeps coming back to me.
Most of the work would be in pulling apart the current tokenizer and populating the base readtable with it.
Yes absolutely!
@disnet We previously discussed limiting the horizon of macro contexts in #537 (comment). What are your thoughts on the same question for reader macros? My understanding is that pre-1.0 readtables allowed access to the entire source of a file. Do we still want everything to be visible?
@gabejohnson Yeah, they definitely should have access to everything. A macro needs to play nice with other macros (and binding forms), so they must be limited, but a reader is low-level and gets to do whatever it wants.
@disnet I'm currently favoring using a zipper as the reader abstraction, copying some of the […]. This would give the user a great deal of flexibility, allowing them to transform the matched character sequence in a number of ways.

In this example the sequence is replaced with a token. The actual […]:

```js
{
  matcher: '|>',
  tokenizer: (rdr) => rdr.replace({
    type: TokenType.IDENTIFIER,
    value: 'PIPE_MACRO'
  })
}
```

Here, one substring is replaced with another and […]:

```js
{
  matcher: '|>',
  tokenizer: (rdr) => {
    rdr = rdr.replace('PIPE_MACRO');
    return rdr.left();
  }
}
```

The following two examples illustrate using a function/method to remove boilerplate:

```js
{
  matcher: '|>',
  tokenizer: (rdr) => rdr.fromString('PIPE_MACRO')
}
```

Although more convenient, I'm hesitant to include […]:

```js
{
  matcher: '|>',
  tokenizer: (rdr) => ReaderUtils.fromString(rdr, 'PIPE_MACRO')
}
```

I'd particularly like feedback from you @elibarzilay and @jlongster as, I believe, you are both quite familiar with reader macros.
Another possibility is to have the […]:

```js
{
  matcher: '|>',
  tokenizer: (rdr) => 'PIPE_MACRO'
}
```

or even

```js
{
  matcher: '|>',
  tokenizer: 'PIPE_MACRO'
}
```
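A sketch of how a reader might normalize these entry shapes — a tokenizer function returning a token, a tokenizer function returning a string, or a plain-string shorthand. The splice-vs-emit semantics are assumptions drawn from this thread, not an implemented API.

```javascript
// Assumed semantics for the three entry shapes discussed above:
// - tokenizer returns a token object -> emit it directly
// - tokenizer returns a string       -> splice it back into the source
//                                       and resume default reading
// - entry value is itself a string   -> shorthand for the string case
function applyEntry(entry, rdr) {
  const result = typeof entry.tokenizer === 'function'
    ? entry.tokenizer(rdr)
    : entry.tokenizer; // plain-string shorthand
  if (typeof result === 'string') {
    return { action: 'splice', text: result };
  }
  return { action: 'emit', token: result };
}
```

Under this reading, the `tokenizer: 'PIPE_MACRO'` shorthand and the `(rdr) => 'PIPE_MACRO'` form behave identically.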
@gabejohnson 👍 on your plans so far. I think Racket has a way of specifying modes for each mapping: terminating vs. non-terminating macros, deferring a char to another mapping, etc. Does your current design have something like modes? Probably not critical for the first version, but I just want to make sure the API will have an obvious extension point.
@disnet I suppose modes in my scheme (no pun intended) are somewhat implicit. For instance, the last example above shows a string as the value of the table entry […].

Terminating macros would be implicit in that the author of the entry can do whatever they want once they match, say, […].

I honestly hadn't given any thought to non-terminating macros, though you could write a […].

We could certainly make some convenience functions/methods in the future. In fact maybe I should start w/ […].
So I've been thinking about your proposed API a bit more and have become more convinced that the API should be as low-level as possible (this of course applies to everything we put into core Sweet, not just readtables). It doesn't have to be nice as long as it is possible to build nice things on top of it. In particular, I don't think you want table entries to be strings at all. A char code should be sufficient since […]. As with everything in Sweet, we want to let a thousand APIs bloom.

I'm still not entirely following how the zipper API works for readtables. Could you explain how zippers work in this context? It seems to me the core things you need to be able to do inside the […]

Maybe the zipper API you have in mind does all that but I'm not following yet :)
Restricting […]. As an aside, I was thinking of a scheme to use tries to match keywords and then cache identifiers in the trie as well. Just a thought.

Now let me address the zipper API. A […]

- I haven't decided what happens at EOS.
- I haven't decided what happens if there are no more tokens to the left. Clojure would return […].

None of these traversal methods return nodes/characters. They all return a new, immutable focus. That's the traversal API. I do have a slight concern that users' intuition might be violated since […]
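The trie aside above can be sketched as follows; `makeTrie` and `matchTrie` are illustrative names, and longest-match semantics are an assumption on my part.

```javascript
// Sketch of keyword matching via a character trie: multi-character
// entries like '|>' and '|>>' share a prefix path instead of being
// tried one string at a time. Not Sweet.js code.
function makeTrie(words) {
  const root = {};
  for (const w of words) {
    let node = root;
    for (const ch of w) node = node[ch] || (node[ch] = {});
    node.end = w; // mark a complete entry at this node
  }
  return root;
}

// Longest entry matching at position i of src, or null if none.
function matchTrie(trie, src, i) {
  let node = trie;
  let found = null;
  for (let j = i; j < src.length && node[src[j]]; j++) {
    node = node[src[j]];
    if (node.end) found = node.end;
  }
  return found;
}
```

Caching identifiers in the same trie would then just mean inserting them as they are first read.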
Now for the node access and tree modification API: […]

Optionally, other zipper methods could be implemented, but I think everything else could be written as utilities (though I'm probably overlooking something).
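To make the traversal/modification discussion concrete, here is a minimal immutable zipper over a flat token list. The method names (`node`, `left`, `right`, `replace`) follow Clojure's zipper vocabulary; this is a sketch, not the proposed reader implementation.

```javascript
// Minimal immutable zipper over a flat token list. Every traversal
// or modification returns a new focus; the original is never mutated.
function zipper(tokens, pos = 0) {
  return {
    node: () => tokens[pos],
    left: () => (pos > 0 ? zipper(tokens, pos - 1) : null),
    right: () => (pos < tokens.length - 1 ? zipper(tokens, pos + 1) : null),
    replace: (tok) => {
      const copy = tokens.slice();
      copy[pos] = tok;
      return zipper(copy, pos);
    },
    toArray: () => tokens.slice()
  };
}
```

A real token tree would need `up`/`down` for delimiters as well, but the flat version is enough to show the immutable-focus idea.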
Also, I forgot about […].
As for the token constructors, I've just been hand-rolling them, but I see what you mean. Maybe the focus is a property on the reader and you can also call […].

I'll grant you that with only […].

I'd like to easily be able to create a delimiter, jump back/forward in the tree, insert it, and return to the "current" position.
If you used zippers you'd be able to do […].
I don't see how. You would just have entries for each char code in […]. Should be faster than an arbitrary-length string match, right? Lots of repetition for each entry, but we have macros for that :) Or maybe just have a way to declare a range of char codes?

Your proposed zipper API is appealing, but can you build it on top of a simpler set of primitives? Strawman proposal cribbed from Racket: […]

To be clear, I'm not saying we shouldn't do the zipper API; I just want to build the zipper API on top of a set of primitives.

Side note: if I understand correctly, the zipper API allows you to insert tokens at arbitrary locations in the already-read stream. That concerns me since it combines two concerns: tokenization (what tokens are) and token stream rewriting (where tokens go). Readtables should only be about the former, as that allows them to be (relatively) composable (i.e. mutually unaware readtable authors can both extend the base reader and things mostly work out).
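A sketch of the char-code-keyed dispatch being suggested: one entry per starting char code, with multi-character matches handled inside the entry by peeking ahead. The entry shape (`{token, next}` or `null` to fall through) is my own invention for illustration.

```javascript
// Hypothetical char-code-keyed readtable: the table is keyed by the
// first character's code; an entry may peek further and either
// produce a token or return null to defer to the default reader.
const PIPE = '|'.charCodeAt(0); // 124

const table = new Map([
  [PIPE, (src, i) =>
    src[i + 1] === '>'
      ? { token: { type: 'ident', value: 'PIPE_MACRO' }, next: i + 2 }
      : null // lone '|': fall through to the default reader
  ]
]);

function dispatch(src, i) {
  const entry = table.get(src.charCodeAt(i));
  return entry ? entry(src, i) : null;
}
```

Lookup is a single integer-keyed map access per character, which is the "faster than an arbitrary-length string match" property being claimed.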
I don't understand this […].
This was based on an (arguably bad) assumption that matching and tokenizing would be separate concerns. Though I suppose nothing is gained by this if a null result from […].

The complexity remarks were based on a complecting of tokenization and rewriting. I suppose it's actually O(nm): if there are n characters in the source, then constructing a zipper is O(n) in the worst case, and there are m readtable lookups. If a zipper is created upon each lookup, this is O(nm). Obviously, separating them changes things.

Now, I would argue that creating the token tree is already complecting tokenization and rewriting, but it is a simple transform and makes sense to do at the same time.
Are you suggesting that there could be a separate token stream rewrite phase? Or simply that it shouldn't be allowed?
Though I suppose the rewrite phase is during enforestation. There are still things that I would like to do that appear to be difficult w/o moving tokens around arbitrarily. Maybe I'm overlooking something. Suppose I wanted to write a language that looked like the following:

```js
#lang whatever

export {
  foo: x => ...,
  bar: (x, y) => ...,
}

local {
  baz: x => ...,
  qux: (x, y, z) => ...,
}

import {
  quux from './quux',
  corge from './corge',
}
```

this would translate to:

```js
import quux from './quux';
import corge from './corge';

const baz = x => ...;
const qux = (x, y, z) => ...;

export const foo = x => ...;
export const bar = (x, y) => ...;
```

I can maybe define […]:

```js
import { export, local, import } from './whatever' for syntax;

whatever

export {
  foo: x => ...,
  bar: (x, y) => ...,
}

local {
  baz: x => ...,
  qux: (x, y, z) => ...,
}

import {
  quux from './quux',
  corge from './corge',
}
```
This is exactly what the lang pragma is all about :) Following Racket, the lang pragma will have an API that allows you to install a readtable, set up new syntax and implicit form bindings, and in general rewrite the syntax stream. Your proposed zipper API would probably work really nicely here, but I'm pretty sure we want to keep it separate from readtables.
Sounds good! I'll start with the API you outlined above. I think I have a pretty clear idea of how to go about implementing most of it. The only mode I'm not sure of is […].
@disnet For […]?
Yes indeed! Good catch.
@disnet Two questions: […]

It could either be a list or a stream of […]. At first I was thinking that returning […]
Yes, I think it should.

Actions need a way to consume from the […].

You're right. In the fallback scenario I was going to have a special noop token, but if we're being explicit about passing down the action chain then […].
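The "passing down the action chain" idea could look like the following middleware-style sketch; every name here is made up for illustration, and explicit delegation via `next()` replaces the special noop token.

```javascript
// Sketch of an explicit action chain: each action receives the input
// and a `next` continuation. It either produces a result itself or
// delegates down the chain; the chain bottoms out in a fallback
// (the default reader behavior).
function chain(actions, fallback) {
  function runFrom(i, input) {
    if (i >= actions.length) return fallback(input);
    return actions[i](input, () => runFrom(i + 1, input));
  }
  return (input) => runFrom(0, input);
}

const read = chain([
  (s, next) => (s.startsWith('~>') ? 'THREAD' : next()),
  (s, next) => (s.startsWith('|>') ? 'PIPE' : next())
], () => 'DEFAULT');
```

With this shape there is no need for a noop token: an action that declines simply calls `next()`, and unmatched input reaches the fallback.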
Again, apologies for the vague issue, but: […]