Skip to content

Fix column definition COLLATE parsing #1986

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mvzink
Copy link
Contributor

@mvzink mvzink commented Jul 30, 2025

Since COLLATE has somewhat special parsing (not handled as an infix operator, since the right-hand side is not really an expression), it also needs special handling to not allow parsing it in column definitions, where it should instead be interpreted as a column option.

Comment on lines 1251 to 1255
// We would have exited early in `parse_prefix` before checking for `COLLATE`, and there's
// no infix operator handling for `COLLATE`, so we must return now.
if self.in_column_definition_state() && self.peek_keyword(Keyword::COLLATE) {
return Ok(expr);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I followed this requirement here, if we have no infix handling then i would imagine that this shouldn't be needed (we would end up returning the same expr)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no infix handling for a given token, and we try to parse_infix anyway, we get a No infix parser for token error.

In practice, we avoid this (I would say accidentally) for all dialects other than PostgreSQL, because the default precedence of COLLATE is 0. In PostgreSQL, it is 120.

Consider column options DEFAULT 'foo' COLLATE 'en-US' Without this early return, after parsing 'foo', we will flow through to checking the precedence of the next token. By default, it is 0, which is <= the current precedence (also 0), so we break and return 'foo'. But for PostgreSQL, it will be 120, and we will flow into the infix parsing (i.e. treating COLLATE as an infix operator, which we don't handle because technically it's not).

The result is this:

    2025-08-01T16:56:30.976Z DEBUG [sqlparser::parser] prefix: Value(ValueWithSpan { value: SingleQuotedString("foo"), span: Span(Location(0,0)..Location(0,0)) })
    2025-08-01T16:56:30.976Z DEBUG [sqlparser::dialect::postgresql] get_next_precedence() TokenWithSpan { token: Word(Word { value: "COLLATE", quote_style: None, keyword: COLLATE }), span: Span(Location(0,0)..Location(0,0)) }
    2025-08-01T16:56:30.976Z DEBUG [sqlparser::parser] next precedence: 120
    2025-08-01T16:56:30.976Z DEBUG [sqlparser::parser] infix: TokenWithSpan { token: Word(Word { value: "COLLATE", quote_style: None, keyword: COLLATE }), span: Span(Location(0,0)..Location(0,0)) }

    thread 'test_parse_default_with_collate_column_option' panicked at src/test_utils.rs:157:61:
    CREATE TABLE foo (abc TEXT DEFAULT 'foo' COLLATE 'en_US'): ParserError("No infix parser for token Word(Word { value: \"COLLATE\", quote_style: None, keyword: COLLATE })")

I am not 100% this special case is the best way to fix this, but in my understanding it is necessary so long as we have special handling for COLLATE in parse_prefix; and that, in turn, is necessary so long as we don't parse the RHS as an expression.

I could experiment with treating COLLATE as an infix operator, but I don't really know how that would go; at least PostgreSQL and MySQL don't allow anything other than a single collation name in the righthand side.

@mvzink mvzink force-pushed the push-wsrmmxykuvkn branch from d387576 to 443ac42 Compare August 1, 2025 17:10
Since `COLLATE` has somewhat special parsing (not handled as an infix
operator, since the right-hand side is not really an expression), it
also needs special handling to not allow parsing it in column
definitions, where it should instead be interpreted as a column option.
@mvzink mvzink force-pushed the push-wsrmmxykuvkn branch from 443ac42 to 112be4e Compare August 1, 2025 17:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants