fix(DATAGO-123655): Implement a fix to prevent LLM from embedding math in templates #932

Open
amir-ghasemi wants to merge 4 commits into main from amir/fix-template

@amir-ghasemi
Collaborator

This pull request introduces robust protection for custom embed directives (like «math:... | ...») within Liquid templates, preventing conflicts between embed syntax and Liquid's own template syntax. The changes ensure that embed directives are temporarily replaced with placeholders before Liquid rendering, then restored afterward. Comprehensive unit and integration tests are added to verify correctness and safety, including protection against regex-based denial-of-service (ReDoS) attacks.

Embed protection and template rendering:

  • Added _protect_embeds_from_liquid and _restore_embeds_after_liquid functions in liquid_renderer.py to safeguard embed directives from Liquid processing by replacing them with placeholders before rendering and restoring them afterward. This prevents Liquid from misinterpreting embed syntax (such as | and {{ }}) as its own.
  • Updated render_liquid_template to use these protection functions, ensuring that embed directives are preserved and not corrupted during template rendering.
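
The protect/render/restore flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from liquid_renderer.py: the names `protect_embeds`, `restore_embeds`, `render_with_protection`, the placeholder format, the `MAX_EMBED_LEN` cap, and the toy `stand_in_render` function (used instead of a real Liquid engine) are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, Tuple

# Assumption: embeds are delimited by « and » and do not nest.
MAX_EMBED_LEN = 2000  # hypothetical cap, not taken from the PR
EMBED_RE = re.compile(r"«[^«»]{1,%d}»" % MAX_EMBED_LEN)


def protect_embeds(template: str) -> Tuple[str, Dict[str, str]]:
    """Swap each «...» embed for an opaque placeholder and remember the original."""
    saved: Dict[str, str] = {}

    def stash(match: re.Match) -> str:
        key = f"__EMBED_PLACEHOLDER_{len(saved)}__"
        saved[key] = match.group(0)
        return key

    return EMBED_RE.sub(stash, template), saved


def restore_embeds(rendered: str, saved: Dict[str, str]) -> str:
    """Put the original embed text back in place of each placeholder."""
    for key, embed in saved.items():
        rendered = rendered.replace(key, embed)
    return rendered


def render_with_protection(template: str, render: Callable[[str], str]) -> str:
    """Protect embeds, run the supplied renderer, then restore the embeds."""
    protected, saved = protect_embeds(template)
    return restore_embeds(render(protected), saved)


def stand_in_render(text: str) -> str:
    """Toy substitute for Liquid: it only fills in {{ user }}."""
    return re.sub(r"\{\{\s*user\s*\}\}", "Ada", text)
```

With this sketch, `render_with_protection("Total: «math: 1 + 2 | .2f» for {{ user }}", stand_in_render)` leaves the embed untouched while the surrounding template variable is still rendered, because the renderer never sees the `|` inside the embed.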

Testing and safety:

  • Added a new test module test_liquid_renderer_embed_protection.py with unit and integration tests for the embed protection logic, covering scenarios with single/multiple embeds, embeds containing Liquid syntax, round-trip protection/restoration, and templates with/without embeds.
  • Included tests for regex safety to prevent ReDoS attacks, verifying that the regex used for embed detection is efficient, respects length limits, and handles edge cases like unclosed or nested delimiters.
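
As a rough illustration of the kind of regex-safety checks listed above, the sketch below exercises a bounded, non-nesting embed pattern against unclosed, oversized, and nested inputs. The pattern, the `MAX_EMBED_LEN` value, and the helper names are assumptions for the example and may differ from what test_liquid_renderer_embed_protection.py actually uses.

```python
import re
import time
from typing import List

MAX_EMBED_LEN = 200  # hypothetical cap, kept small for the demo
# A negated character class with a bounded quantifier: each candidate match
# can consume at most MAX_EMBED_LEN characters before giving up.
EMBED_RE = re.compile(r"«[^«»]{1,%d}»" % MAX_EMBED_LEN)


def find_embeds(text: str) -> List[str]:
    """Return every embed the pattern recognises in the text."""
    return EMBED_RE.findall(text)


def scan_seconds(text: str) -> float:
    """Measure how long a full scan of the text takes."""
    start = time.perf_counter()
    EMBED_RE.findall(text)
    return time.perf_counter() - start


# Inputs meant to provoke backtracking in a poorly bounded pattern.
ADVERSARIAL_INPUTS = {
    "many_unclosed": "«" * 50_000,
    "long_unclosed": "«" + "a" * 200_000,
    "repeated_unclosed": ("«" + "a" * 500) * 500,
}
```

A test module could then assert that unclosed or oversized embeds are simply not matched, and that scanning each adversarial input finishes well within a generous time budget.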

@amir-ghasemi amir-ghasemi requested a review from efunneko January 30, 2026 18:36

@amir-ghasemi amir-ghasemi changed the title fix(DATAGO-123655): Implement a temporary fix to prevent LLM from embedding math in templates fix(DATAGO-123655): Implement a fix to prevent LLM from embedding math in templates Jan 30, 2026
Collaborator

@efunneko efunneko left a comment

What is here is good, but it only fixes half the problem. I have seen many instances of:

«math: {{row[1]}} | .2f»

or something like that, where there is a templated value within the math embed. I am not sure if we can 100% solve this. It is like the placeholders need to be something like:

_PLACEHOLDER({{row[1]}})1

And then the replacement is:

«math: $1 | .2f»

Or something like that where the value substituted by the template is replaced within the math embed.

An alternative might be to use a format separator character other than "|". I think this is safe to do because math embeds are 'early' embeds that are resolved before we save the artifacts, so we don't need to worry about backwards compatibility.
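
One way the placeholder idea from this comment could look is sketched below: the Liquid expressions inside each embed are carried through rendering in the placeholder, and the rendered values are then substituted back into `$1`-style slots. This is only a sketch under stated assumptions, not a proposed implementation. The control-character delimiters, all function names, and the toy `stand_in_render` (which only understands `{{row[N]}}`) are hypothetical, and the sketch assumes the original embed text contains no literal `$1`-style tokens and that rendered values never contain the delimiter characters.

```python
import re
from typing import List, Tuple

# Hypothetical delimiters, assumed never to occur in templates or rendered values.
START, SEP, END = "\x02", "\x1f", "\x03"

EMBED_RE = re.compile(r"«[^«»]{1,2000}»")
LIQUID_OUTPUT_RE = re.compile(r"\{\{[^{}]*\}\}")
CARRIER_RE = re.compile(f"{START}(\\d+){SEP}([^{START}{END}]*){END}")
SLOT_RE = re.compile(r"\$(\d+)")


def protect_embeds(template: str) -> Tuple[str, List[Tuple[str, int]]]:
    """Replace each embed with a carrier that still exposes its {{ ... }} parts."""
    skeletons: List[Tuple[str, int]] = []

    def stash(match: re.Match) -> str:
        expressions: List[str] = []

        def to_slot(expr: re.Match) -> str:
            expressions.append(expr.group(0))
            return f"${len(expressions)}"

        # e.g. «math: {{row[1]}} | .2f»  becomes  «math: $1 | .2f»
        skeleton = LIQUID_OUTPUT_RE.sub(to_slot, match.group(0))
        skeletons.append((skeleton, len(expressions)))
        index = len(skeletons) - 1
        return f"{START}{index}{SEP}{SEP.join(expressions)}{END}"

    return EMBED_RE.sub(stash, template), skeletons


def restore_embeds(rendered: str, skeletons: List[Tuple[str, int]]) -> str:
    """Rebuild each embed, filling its slots with the values the renderer produced."""

    def rebuild(match: re.Match) -> str:
        skeleton, count = skeletons[int(match.group(1))]
        values = match.group(2).split(SEP) if count else []

        def fill(slot: re.Match) -> str:
            n = int(slot.group(1))
            return values[n - 1] if 1 <= n <= len(values) else slot.group(0)

        return SLOT_RE.sub(fill, skeleton)

    return CARRIER_RE.sub(rebuild, rendered)


def stand_in_render(text: str, row: list) -> str:
    """Toy substitute for Liquid: it only resolves {{row[N]}}."""
    return re.sub(
        r"\{\{\s*row\[(\d+)\]\s*\}\}",
        lambda m: str(row[int(m.group(1))]),
        text,
    )
```

Because each carrier holds its own rendered values, a carrier that a loop emits several times would be rebuilt once per occurrence, each with that iteration's value. The `|` inside the embed never reaches the renderer, since it lives only in the stored skeleton.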
