fix(DATAGO-123655): Implement a fix to prevent LLM from embedding math in templates #932
amir-ghasemi wants to merge 4 commits into main
efunneko
left a comment
What is here is good, but it only fixes half the problem. I have seen many instances of:
«math: {{row[1]}} | .2f»
or something like that, where there is a templated value within the math embed. I am not sure we can solve this 100%. The placeholders would probably need to carry the templated value, something like:
_PLACEHOLDER({{row[1]}})1
And then the replacement is:
«math: $1 | .2f»
Or something like that where the value substituted by the template is replaced within the math embed.
An alternative might be to use a different format separator character in place of "|". I think this is safe to do because math embeds are 'early' embeds that are resolved before we save the artifacts, so we don't need to worry about backwards compatibility.
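The placeholder idea in the comment above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function names, the `__EMBED_n__(...)__END__` placeholder format, and the `__SEP__` token are all hypothetical, and `fake_liquid_render` is a toy stand-in for a real Liquid engine that only resolves `{{ key }}` by dictionary lookup. It also assumes rendered values never contain the placeholder tokens themselves.

```python
import re

# Assumed embed syntax: «...» with no nested guillemets.
EMBED_RE = re.compile(r"«[^«»]*»")
# Liquid output tags inside an embed, e.g. {{row[1]}}.
LIQUID_OUTPUT_RE = re.compile(r"\{\{[^{}]*\}\}")
# Hypothetical placeholder format: __EMBED_<n>__(<args>)__END__
PLACEHOLDER_RE = re.compile(r"__EMBED_(\d+)__\((.*?)\)__END__", re.DOTALL)
SEP = "__SEP__"


def protect_embeds_with_args(template):
    """Hide each embed from the template engine but leave the Liquid
    expressions it contained visible, so they still get rendered."""
    skeletons = []

    def stash(match):
        exprs = []

        def number(expr_match):
            exprs.append(expr_match.group(0))
            return f"${len(exprs)}"  # $1, $2, ... mark where values go back

        skeletons.append(LIQUID_OUTPUT_RE.sub(number, match.group(0)))
        return f"__EMBED_{len(skeletons) - 1}__({SEP.join(exprs)})__END__"

    return EMBED_RE.sub(stash, template), skeletons


def restore_embeds_with_args(rendered, skeletons):
    """Put each embed back, substituting the rendered values for $1, $2, ..."""

    def unstash(match):
        index = int(match.group(1))
        if index >= len(skeletons):
            return match.group(0)  # not one of ours; leave untouched
        values = match.group(2).split(SEP) if match.group(2) else []

        def fill(slot):
            n = int(slot.group(1))
            return values[n - 1] if 1 <= n <= len(values) else slot.group(0)

        return re.sub(r"\$(\d+)", fill, skeletons[index])

    return PLACEHOLDER_RE.sub(unstash, rendered)


def fake_liquid_render(template, context):
    """Toy stand-in for Liquid: replaces {{ key }} via plain dict lookup."""
    return re.sub(
        r"\{\{\s*([^{}]*?)\s*\}\}",
        lambda m: str(context[m.group(1)]),
        template,
    )


if __name__ == "__main__":
    protected, skeletons = protect_embeds_with_args("Total: «math: {{row[1]}} | .2f»")
    rendered = fake_liquid_render(protected, {"row[1]": 3.14159})
    print(restore_embeds_with_args(rendered, skeletons))
```

The `| .2f` part never reaches the template engine because it lives only in the stored skeleton, while `{{row[1]}}` is still rendered normally. A literal `$1` already present in a math expression would collide with the slot markers, so a real implementation would need a less ambiguous marker.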




This pull request introduces robust protection for custom embed directives (like `«math:... | ...»`) within Liquid templates, preventing conflicts between embed syntax and Liquid's own template syntax. The changes ensure that embed directives are temporarily replaced with placeholders before Liquid rendering, then restored afterward. Comprehensive unit and integration tests are added to verify correctness and safety, including protection against regex-based denial-of-service (ReDoS) attacks.

Embed protection and template rendering:
- Added `_protect_embeds_from_liquid` and `_restore_embeds_after_liquid` functions in `liquid_renderer.py` to safeguard embed directives from Liquid processing by replacing them with placeholders before rendering and restoring them afterward. This prevents Liquid from misinterpreting embed syntax (such as `|` and `{{ }}`) as its own.
- Updated `render_liquid_template` to use these protection functions, ensuring that embed directives are preserved and not corrupted during template rendering.

Testing and safety:
- Added `test_liquid_renderer_embed_protection.py` with unit and integration tests for the embed protection logic, covering scenarios with single/multiple embeds, embeds containing Liquid syntax, round-trip protection/restoration, and templates with/without embeds.
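For readers unfamiliar with the protect/restore pattern described above, here is a minimal sketch of the round trip. It is an illustration under assumptions, not the code in `liquid_renderer.py`: the function names `protect_embeds` and `restore_embeds`, the `__EMBED_PLACEHOLDER_n__` format, and the assumption that embeds are delimited by `«` and `»` without nesting are all hypothetical. The embed pattern uses a single negated character class with no nested quantifiers, which is one common way to keep matching linear and avoid ReDoS.

```python
import re

# Assumed embed syntax: «...» with no nested guillemets. A negated
# character class with no nested quantifiers avoids catastrophic backtracking.
EMBED_RE = re.compile(r"«[^«»]*»")
PLACEHOLDER_RE = re.compile(r"__EMBED_PLACEHOLDER_(\d+)__")


def protect_embeds(template):
    """Replace every embed with an opaque placeholder; return the new
    template and the list of original embeds, indexed by placeholder number."""
    embeds = []

    def stash(match):
        embeds.append(match.group(0))
        return f"__EMBED_PLACEHOLDER_{len(embeds) - 1}__"

    return EMBED_RE.sub(stash, template), embeds


def restore_embeds(rendered, embeds):
    """Swap placeholders back for the original embed text."""

    def unstash(match):
        index = int(match.group(1))
        # Leave placeholder-looking text alone if it is not one of ours.
        return embeds[index] if index < len(embeds) else match.group(0)

    return PLACEHOLDER_RE.sub(unstash, rendered)


if __name__ == "__main__":
    template = "Sum: «math: 1 + 2 | .2f» for {{ name }}"
    protected, embeds = protect_embeds(template)
    # A real Liquid render of `protected` would happen here.
    print(protected)
    print(restore_embeds(protected, embeds))
```

In this sketch the template engine only ever sees the placeholder, so the `|` inside the embed cannot be parsed as a Liquid filter separator. The trade-off, as the review comment notes, is that any Liquid expression inside an embed is hidden as well.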