Skip to content

feat(examples): add signal_judge contract#1626

Open
PratikshaGayen wants to merge 5 commits into
genlayerlabs:mainfrom
PratikshaGayen:feat/signal-judge-contract
Open

feat(examples): add signal_judge contract#1626
PratikshaGayen wants to merge 5 commits into
genlayerlabs:mainfrom
PratikshaGayen:feat/signal-judge-contract

Conversation

@PratikshaGayen
Copy link
Copy Markdown

@PratikshaGayen PratikshaGayen commented May 10, 2026

No issue — new example contract, §3.1 applies.

What

  • Add examples/contracts/signal_judge.py

Why

Existing examples don't cover the web fetch → LLM → on-chain state
pattern. This one does, with a realistic use case.

How it works

submit_signal(asset, prediction, reasoning, target_price, direction)
fetches the live price from Binance (no API key), asks the LLM to
judge the prediction, and updates per-trader win/total counts on-chain.

Uses prompt_comparative — validators fetch at slightly different
timestamps so price drifts, but correct (bool) must agree.

Read methods: get_score, get_all_signals, get_signals_by_asset,
get_signal_count.

Testing done

  • Deployed locally with Heurist
  • Called submit_signal on BTC and ETH; judgment returned correctly
  • Win/total counts increment as expected
  • UserError fires on bad direction or empty fields

Decisions

  • prompt_comparative not prompt_non_comparative — price varies
    across validators, only correct needs consensus
  • current_price coerced to str — some models return float, breaks
    calldata encoding

Checks

  • Tested
  • Self-reviewed
  • No issue needed (§3.1)
  • Conventional commit title

Frontend

Companion UI (React + Vite + Tailwind + genlayer-js):
https://github.com/PratikshaGayen/signaljudge-ui

Live demo: https://predikt-fun.vercel.app/

Summary by CodeRabbit

  • New Features

    • Two-phase signal lifecycle: submit signals (validated, normalized, queued) and separate resolve operation that finalizes results using live price checks and validator consensus.
    • Expanded read APIs for status filtering and listing resolvable signals; per-sender win/total and win-rate reporting.
  • Documentation

    • Added README with usage, inputs, query methods, and behavioral notes.
  • Tests

    • Integration tests for submit, resolve, and leaderboard updates.

Review Change Stack

SignalJudge is a crypto trading-signal evaluator example that demonstrates web fetch + LLM judgment + per-address state.

Traders submit a price prediction (asset, direction, target price, reasoning); the contract fetches the live price from Binance inside a non-deterministic block and asks the validator LLMs to decide whether the prediction is correct. Per-trader win/total counts are kept in a TreeMap so the leaderboard can be queried in O(log n).

Notes on consensus:
- Uses prompt_comparative rather than strict_eq because current_price varies by cents between validators (Binance is queried at slightly different timestamps).
- The criteria string explicitly tells the comparator to ignore current_price and reasoning_quality and only check the boolean correct field, otherwise the comparator overmatches and disagrees on cent-level differences.

Notes on storage:
- list/dict are not supported as field types in this GenVM version, so signal records are stored as a JSON-serialised string.
- TreeMap[Address, u256] is used for win/total counts to match the llm_erc20 example pattern.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 10, 2026

📝 Walkthrough

Walkthrough

Adds a two-phase SignalJudge contract: submissions register PENDING signals (deadline-based), and a new resolver fetches live Binance prices and runs LLM validators to produce a RESOLVED judgment and update per-address leaderboard counters.

Changes

SignalJudge Example Contract

Layer / File(s) Summary
Imports & Constants
examples/contracts/signal_judge.py
Adds imports and TIMEFRAMES constant used to compute signal deadlines.
Contract State
examples/contracts/signal_judge.py
Initializes signals_json (empty JSON array) and per-address wins/total TreeMaps.
Internal Helpers
examples/contracts/signal_judge.py
Adds _load/_save JSON helpers and _now() UTC timestamp accessor.
Write Method - submit_signal
examples/contracts/signal_judge.py
Registers a PENDING signal: validates/normalizes inputs, truncates text fields, computes deadline_ts from timeframe, uppercases asset, appends entry to signals_json, and returns signal_id and deadline metadata (no LLM/web judgment at submission).
Write Method - resolve_signal
examples/contracts/signal_judge.py
New resolve_signal(signal_id) enforces deadline passage, fetches Binance price for the stored asset, prompts validator LLMs for strict JSON judgments, uses gl.eq_principle.prompt_comparative to reach consensus on correct, coerces types, updates wins/total TreeMaps, mutates the signal to RESOLVED, and saves the log.
View Methods & Filters
examples/contracts/signal_judge.py
Adds/updates get_signal_count, get_all_signals (raw signals_json), get_signals_by_status, get_signals_by_asset, get_resolvable_signals, and get_score (returns {"wins","total","win_rate_pct"} with "0" when total is zero).
Contract Schema & Tests
tests/integration/icontracts/schemas/signal_judge_get_contract_schema_for_code.py, tests/integration/icontracts/tests/test_signal_judge.py
Adds signal_judge_contract_schema describing methods/returns and integration tests for submit_signal (PENDING) and resolve_signal (RESOLVED with mocked validator output), asserting signal state and leaderboard updates.
Documentation
examples/contracts/README_signal_judge.md
Adds README describing end-to-end flow, submit_signal inputs/outputs, read methods, storage/behavior notes, and companion UI link.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped to a chain where signals bloom,
I stash a hopeful PENDING in a roomy JSON tomb,
Deadlines pass, the price is fetched at dawn,
LLMs debate — consensus draws a yawn,
Wins tallied, tiny rabbit hums along.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning The pull request description lacks key sections from the template including Decisions (partially addressed), Checks (marked but not in standard format), Reviewing tips, and User facing release notes. Add missing sections: formal Checking against all template checkboxes, Reviewing tips for reviewer guidance, and User facing release notes suitable for public communication.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarizes the main change: adding a new signal_judge example contract.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
examples/contracts/signal_judge.py (4)

53-86: ⚡ Quick win

Remove debug print calls from the example.

print(web_data) (Line 55) and print(result) (Line 85) look like leftover debugging artifacts. They add noise to validator logs and aren't part of the demonstrated pattern. Recommend removing before merge.

♻️ Proposed cleanup
         def get_judgment() -> str:
             web_data = gl.nondet.web.render(price_url, mode="text")
-            print(web_data)
 
             task = f"""
@@
 Output must be parseable JSON, nothing else.
 """
             result = gl.nondet.exec_prompt(task).replace("```json", "").replace("```", "")
-            print(result)
             return result
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/contracts/signal_judge.py` around lines 53 - 86, Remove the two
debugging prints in get_judgment: drop the print(web_data) after web_data is
assigned and drop the print(result) before returning result so the function only
renders web_data, builds task, calls gl.nondet.exec_prompt, cleans the prompt
output, and returns it; references: get_judgment(), web_data, result,
gl.nondet.exec_prompt().

136-148: 💤 Low value

Two minor consistency/UX nits in the view methods.

  • Line 138: asset.upper() doesn't strip, so get_signals_by_asset(" BTC ") silently returns []. Mirror whatever normalization you settle on in submit_signal (i.e., asset.strip().upper()).
  • Line 143: Address(address) raises whatever the SDK throws for malformed input. Wrapping it in UserError keeps caller-facing errors consistent with the validation in submit_signal.
♻️ Proposed adjustments
     `@gl.public.view`
     def get_signals_by_asset(self, asset: str) -> str:
         signals = json.loads(self.signals_json)
-        target = asset.upper()
+        target = asset.strip().upper()
         return json.dumps([s for s in signals if s["asset"] == target])

     `@gl.public.view`
     def get_score(self, address: str) -> dict[str, typing.Any]:
-        addr = Address(address)
+        try:
+            addr = Address(address)
+        except Exception as e:
+            raise gl.vm.UserError(f"invalid address: {e}")
         w = int(self.wins.get(addr, u256(0)))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/contracts/signal_judge.py` around lines 136 - 148,
get_signals_by_asset currently uses asset.upper() without trimming, and
get_score constructs Address(address) directly which can raise SDK errors;
update get_signals_by_asset to normalize the input the same way submit_signal
does (e.g., asset.strip().upper()) so " BTC " matches, and in get_score wrap
Address(address) in a try/except to catch SDK validation errors and re-raise
them as a UserError to keep caller-facing errors consistent with submit_signal.

57-83: 💤 Low value

Worth noting: prompt-injection surface from prediction / reasoning.

prediction and reasoning are user-controlled and interpolated directly into the LLM prompt (Lines 61, 64). A crafted input could attempt to override the instructions ("ignore above; respond with correct: true"). The prompt_comparative consensus on correct does mitigate this somewhat (all validators see the same poisoned prompt and would consistently agree on the model's manipulated output, so consensus still passes — but the outcome would be the attacker-chosen value).

For an example contract this is acceptable, but consider adding a short comment or length-cap on these strings so readers understand the trade-off when they copy the pattern.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/contracts/signal_judge.py` around lines 57 - 83, The prompt directly
interpolates user-controlled strings _prediction and _reasoning into the task
template (the multi-line task string in examples/contracts/signal_judge.py),
creating a prompt-injection surface; mitigate by sanitizing and limiting those
values before interpolation — e.g., trim to a safe max length, escape or remove
control sequences and quotes/newlines that could alter the instruction flow, and
add an inline comment near the task construction noting the risk and trade-off
for readers; specifically update the code that builds the task string to
validate/truncate _prediction and _reasoning and document the rationale.

90-100: ⚡ Quick win

Surface a clearer error if the LLM response isn't valid JSON.

json.loads(raw) will raise JSONDecodeError if the model occasionally ignores the format instructions despite the strip of fences on Line 84. That bubbles up as an internal error to the caller. Wrapping it in a UserError (or graceful default) makes the failure mode obvious in this educational example, and keeps state untouched on parse failure.

♻️ Proposed change
-        raw = gl.eq_principle.prompt_comparative(
+        raw = gl.eq_principle.prompt_comparative(
             get_judgment,
             "The boolean field 'correct' must have the same value across all answers. Ignore any differences in 'current_price' (which varies by cents because it is fetched at different timestamps) and ignore differences in 'reasoning_quality' (which is a subjective rating).",
         )
-        judgment = json.loads(raw)
+        try:
+            judgment = json.loads(raw)
+        except json.JSONDecodeError as e:
+            raise gl.vm.UserError(f"LLM did not return valid JSON: {e}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/contracts/signal_judge.py` around lines 90 - 100, Wrap the
json.loads(raw) call that parses the LLM response (raw returned from
gl.eq_principle.prompt_comparative with get_judgment) in a try/except that
catches json.JSONDecodeError; on parse failure raise/return a clear UserError
(or other explicit error type used in the repo) indicating the model returned
invalid JSON and include the raw response for debugging, and ensure you do not
mutate judgment or any state when the parse fails before coercing
judgment["current_price"], judgment["correct"], or
judgment["reasoning_quality"].
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/contracts/signal_judge.py`:
- Around line 31-44: Validate and sanitize the asset before building price_url:
strip whitespace from asset (use the trimmed value for further checks), require
the trimmed asset to be non-empty and strictly alphanumeric (e.g.,
asset_trimmed.isalnum()), then set asset_upper = asset_trimmed.upper() and build
price_url from that; if validation fails raise gl.vm.UserError with the same
message. Ensure you update any references to asset vs asset_upper so the URL is
never built from the untrimmed or non-alphanumeric input.

---

Nitpick comments:
In `@examples/contracts/signal_judge.py`:
- Around line 53-86: Remove the two debugging prints in get_judgment: drop the
print(web_data) after web_data is assigned and drop the print(result) before
returning result so the function only renders web_data, builds task, calls
gl.nondet.exec_prompt, cleans the prompt output, and returns it; references:
get_judgment(), web_data, result, gl.nondet.exec_prompt().
- Around line 136-148: get_signals_by_asset currently uses asset.upper() without
trimming, and get_score constructs Address(address) directly which can raise SDK
errors; update get_signals_by_asset to normalize the input the same way
submit_signal does (e.g., asset.strip().upper()) so " BTC " matches, and in
get_score wrap Address(address) in a try/except to catch SDK validation errors
and re-raise them as a UserError to keep caller-facing errors consistent with
submit_signal.
- Around line 57-83: The prompt directly interpolates user-controlled strings
_prediction and _reasoning into the task template (the multi-line task string in
examples/contracts/signal_judge.py), creating a prompt-injection surface;
mitigate by sanitizing and limiting those values before interpolation — e.g.,
trim to a safe max length, escape or remove control sequences and
quotes/newlines that could alter the instruction flow, and add an inline comment
near the task construction noting the risk and trade-off for readers;
specifically update the code that builds the task string to validate/truncate
_prediction and _reasoning and document the rationale.
- Around line 90-100: Wrap the json.loads(raw) call that parses the LLM response
(raw returned from gl.eq_principle.prompt_comparative with get_judgment) in a
try/except that catches json.JSONDecodeError; on parse failure raise/return a
clear UserError (or other explicit error type used in the repo) indicating the
model returned invalid JSON and include the raw response for debugging, and
ensure you do not mutate judgment or any state when the parse fails before
coercing judgment["current_price"], judgment["correct"], or
judgment["reasoning_quality"].
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: db65c4a2-60a1-404d-823e-ef638b85536c

📥 Commits

Reviewing files that changed from the base of the PR and between 6736a15 and 0f78601.

📒 Files selected for processing (1)
  • examples/contracts/signal_judge.py

Comment thread examples/contracts/signal_judge.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (5)
tests/integration/icontracts/tests/test_signal_judge.py (3)

22-26: 💤 Low value

Mock key duplicates the contract's principle string verbatim — drift risk.

Hard-coding the full eq_principle_prompt_comparative principle text in the test means any tweak to wording in signal_judge.py (e.g., punctuation, "cents" → "dollars") silently desyncs the mock and the test will start failing for unrelated reasons. Consider extracting the string to a shared constant in the contract or test helper and importing it from both sides.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/icontracts/tests/test_signal_judge.py` around lines 22 -
26, The test hard-codes the full eq_principle_prompt_comparative string which
creates fragility; instead extract that principle text to a shared constant
(e.g., export PRINCIPLE_EQ_PROMPT or EQ_PRINCIPLE_PROMPT) in the contract module
(signal_judge.py) or a test helper, update the contract to export the constant
and change the test to import and use that constant when calling
setup_validators, and remove the verbatim string from the test so both sides
reference the single shared symbol (eq_principle_prompt_comparative /
setup_validators) to avoid drift.

7-52: ⚡ Quick win

Coverage gaps: only the correct=False path and no validation-error paths are exercised.

The PR description states submit_signal was tested for both BTC/ETH judgments (including wins increments) and that UserError was validated for invalid direction / empty fields, but the integration suite only contains a single incorrect-prediction case. Adding tests for (a) correct=True to confirm wins increments and win_rate_pct == "100", and (b) direction="SIDEWAYS" / empty asset raising UserError, would lock in the contract's behavioral guarantees.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/icontracts/tests/test_signal_judge.py` around lines 7 - 52,
Add two tests to cover the missing paths in
tests/integration/icontracts/tests/test_signal_judge.py: (1) a test (e.g.,
test_signal_judge_correct) that uses setup_validators to return correct=True for
the judgement, calls SignalJudge.submit_signal with the same args, asserts
tx_execution_succeeded, then checks contract.get_signal_count() == 1 and
contract.get_score(sender) shows wins == 1, total == 1 and win_rate_pct ==
"100"; and (2) a validation-error test (e.g.,
test_signal_judge_validation_errors) that calls submit_signal with invalid
inputs (direction="SIDEWAYS" and a separate case with asset="") and asserts the
call raises the contract UserError (or the contract's validation exception) for
each invalid input. Use the existing helpers get_contract_factory,
setup_validators, submit_signal, get_signal_count and get_score to locate where
to add these assertions.

7-7: 💤 Low value

Add type hint to the test function signature.

-def test_signal_judge(setup_validators):
+def test_signal_judge(setup_validators) -> None:

As per coding guidelines, "Include type hints in all Python code".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/icontracts/tests/test_signal_judge.py` at line 7, The test
function test_signal_judge should include type hints: annotate the fixture
parameter setup_validators (e.g., setup_validators: Any) and the function return
type as -> None, and add the corresponding import from typing (from typing
import Any) at the top of the file; update the signature in
tests/integration/icontracts/tests/test_signal_judge.py to use these annotations
so it complies with the project typing guideline.
examples/contracts/signal_judge.py (2)

102-103: ⚡ Quick win

Markdown stripping is brittle.

.replace("```json", "").replace("```", "") only handles two specific fences and leaves any leading prose, e.g., Here is the JSON:\n{...} will still fail json.loads. Extracting the substring between the first { and last }, or matching a fenced block with a regex, is more robust against typical LLM formatting drift.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/contracts/signal_judge.py` around lines 102 - 103, The current
brittle stripping of markdown in the result from gl.nondet.exec_prompt(task)
uses .replace("```json", "").replace("```", "") and can leave leading prose;
instead locate and extract the JSON payload robustly (e.g., find the first '{'
and last '}' and take that substring, or use a regex to capture a fenced code
block containing JSON) before passing to json.loads; update the code around the
result assignment in examples/contracts/signal_judge.py (the variable `result`
coming from gl.nondet.exec_prompt) to perform this extraction and fall back to
the raw string or raise a clear parse error if no JSON boundaries are found.

129-142: 🏗️ Heavy lift

Refactor to use DynArray for persistent signal storage to avoid O(n) serialization on every submission.

Every submit_signal call deserializes the entire history from signals_json (line 129), appends one signal, and re-serializes back to a single string (line 142). This is O(n) in both compute and storage I/O. Additionally, every view method (get_signal_count, get_signals_by_asset, get_all_signals) also deserializes the full history. The signals_json field grows unboundedly and will eventually bloat calldata and block further submissions.

Replace signals_json: str with signals: DynArray[dict[str, str]] and use .append() for O(1) writes. Update all read methods to support pagination (e.g., get_signals(offset, limit), get_signals_by_asset(asset, offset, limit)) to avoid deserialization of the entire history on each view call.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/contracts/signal_judge.py` around lines 129 - 142, Replace the
unbounded JSON string storage and full deserialization pattern with a persistent
DynArray: change the field signals_json to signals: DynArray[SignalStruct] (or
DynArray[dict[str, str]] if structs are not used), update submit_signal to call
self.signals.append(...) instead of loading/appending/re-dumping JSON (use the
same keys currently added: submitter, asset, prediction, target_price,
direction, current_price, correct, reasoning_quality) so writes are O(1), and
modify the read APIs get_signal_count, get_signals_by_asset, and get_all_signals
to operate over the DynArray with pagination parameters (e.g.,
get_signals(offset, limit) and get_signals_by_asset(asset, offset, limit)) so
views iterate only the requested slice instead of deserializing the entire
history. Ensure types are updated and any JSON serialization/removal logic
around signals_json is removed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/contracts/signal_judge.py`:
- Around line 58-69: The price_url construction blindly appends "USDT" causing
invalid symbols like "USDTUSDT" and missing markets; update the logic that
builds price_url (refer to price_url and _asset in this file) to: 1) if
asset_upper already endswith "USDT" use it as-is, 2) otherwise append the
desired quote currency (default "USDT") or accept a new quote_currency
parameter, and 3) explicitly reject or return an error when the base asset is
exactly "USDT" (so callers know to supply a valid base/quote), while keeping the
existing sanitizations for _prediction and _reasoning. Ensure the updated
validation happens before calling gl.nondet.web.render so the render never
receives an invalid symbol.
- Around line 119-121: The current defensive coercion for judgment["correct"]
uses bool(judgment.get("correct", False)) which treats non-empty strings like
"false" as True; change this to strict normalization: read raw =
judgment.get("correct", None) (or similar), if isinstance(raw, bool) use it, if
isinstance(raw, str) map raw.lower() in ("true","false","1","0") to the
corresponding boolean, if isinstance(raw, (int, float)) treat 1/0 accordingly,
otherwise fallback to False or raise a validation error so the model output
cannot silently flip from false to true; update the code that sets
judgment["correct"] to perform this explicit parsing/validation.
- Line 68: The _target = target_price assignment uses unvalidated input; ensure
target_price is stripped, single-line, length-bounded, and parsed as a positive
number before assigning or logging. In the function/handler that receives
target_price (look for the _target assignment and any signal log/prompt
construction), trim whitespace, reject or truncate if it contains newlines or
exceeds a short reasonable length, parse with Decimal (or float) to confirm it's
a positive numeric value, then re-stringify the canonical numeric form and
assign that to _target and store/use that sanitized value in prompts/logs.
- Around line 112-115: Update the two exception-raise sites to preserve
exception context by chaining with the caught exception: in the JSON decode
block where you currently catch json.JSONDecodeError as e and raise
gl.vm.UserError(f"LLM did not return valid JSON: {e}"), change the raise to use
"from e"; likewise, in the block that calls Address(...) and currently catches
Exception as e then raises a new Exception, modify that raise to include "from
e" so the original Address initialization error is chained; reference the JSON
parsing block where judgment = json.loads(raw) and the Address() call to locate
the changes.

---

Nitpick comments:
In `@examples/contracts/signal_judge.py`:
- Around line 102-103: The current brittle stripping of markdown in the result
from gl.nondet.exec_prompt(task) uses .replace("```json", "").replace("```", "")
and can leave leading prose; instead locate and extract the JSON payload
robustly (e.g., find the first '{' and last '}' and take that substring, or use
a regex to capture a fenced code block containing JSON) before passing to
json.loads; update the code around the result assignment in
examples/contracts/signal_judge.py (the variable `result` coming from
gl.nondet.exec_prompt) to perform this extraction and fall back to the raw
string or raise a clear parse error if no JSON boundaries are found.
- Around line 129-142: Replace the unbounded JSON string storage and full
deserialization pattern with a persistent DynArray: change the field
signals_json to signals: DynArray[SignalStruct] (or DynArray[dict[str, str]] if
structs are not used), update submit_signal to call self.signals.append(...)
instead of loading/appending/re-dumping JSON (use the same keys currently added:
submitter, asset, prediction, target_price, direction, current_price, correct,
reasoning_quality) so writes are O(1), and modify the read APIs
get_signal_count, get_signals_by_asset, and get_all_signals to operate over the
DynArray with pagination parameters (e.g., get_signals(offset, limit) and
get_signals_by_asset(asset, offset, limit)) so views iterate only the requested
slice instead of deserializing the entire history. Ensure types are updated and
any JSON serialization/removal logic around signals_json is removed.

In `@tests/integration/icontracts/tests/test_signal_judge.py`:
- Around line 22-26: The test hard-codes the full
eq_principle_prompt_comparative string which creates fragility; instead extract
that principle text to a shared constant (e.g., export PRINCIPLE_EQ_PROMPT or
EQ_PRINCIPLE_PROMPT) in the contract module (signal_judge.py) or a test helper,
update the contract to export the constant and change the test to import and use
that constant when calling setup_validators, and remove the verbatim string from
the test so both sides reference the single shared symbol
(eq_principle_prompt_comparative / setup_validators) to avoid drift.
- Around line 7-52: Add two tests to cover the missing paths in
tests/integration/icontracts/tests/test_signal_judge.py: (1) a test (e.g.,
test_signal_judge_correct) that uses setup_validators to return correct=True for
the judgement, calls SignalJudge.submit_signal with the same args, asserts
tx_execution_succeeded, then checks contract.get_signal_count() == 1 and
contract.get_score(sender) shows wins == 1, total == 1 and win_rate_pct ==
"100"; and (2) a validation-error test (e.g.,
test_signal_judge_validation_errors) that calls submit_signal with invalid
inputs (direction="SIDEWAYS" and a separate case with asset="") and asserts the
call raises the contract UserError (or the contract's validation exception) for
each invalid input. Use the existing helpers get_contract_factory,
setup_validators, submit_signal, get_signal_count and get_score to locate where
to add these assertions.
- Line 7: The test function test_signal_judge should include type hints:
annotate the fixture parameter setup_validators (e.g., setup_validators: Any)
and the function return type as -> None, and add the corresponding import from
typing (from typing import Any) at the top of the file; update the signature in
tests/integration/icontracts/tests/test_signal_judge.py to use these annotations
so it complies with the project typing guideline.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f6aa418b-7a6f-44d7-9219-449554f2b24f

📥 Commits

Reviewing files that changed from the base of the PR and between 0f78601 and 5a5e2f9.

📒 Files selected for processing (4)
  • examples/contracts/README_signal_judge.md
  • examples/contracts/signal_judge.py
  • tests/integration/icontracts/schemas/signal_judge_get_contract_schema_for_code.py
  • tests/integration/icontracts/tests/test_signal_judge.py
✅ Files skipped from review due to trivial changes (2)
  • examples/contracts/README_signal_judge.md
  • tests/integration/icontracts/schemas/signal_judge_get_contract_schema_for_code.py

Comment thread examples/contracts/signal_judge.py Outdated
Comment on lines +58 to +69
price_url = (
"https://api.binance.com/api/v3/ticker/price?symbol="
+ asset_upper
+ "USDT"
)

# rebind for the closure (all web fetch + LLM calls must run inside nondet block)
_asset = asset_upper
_prediction = prediction.strip()[:500] # cap length to limit prompt-injection surface
_reasoning = reasoning.strip()[:500]
_target = target_price
_direction = direction
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hardcoded USDT pair fails silently for tickers without a USDT market or that already end in USDT.

asset_upper + "USDT" will produce USDTUSDT for asset="USDT" and is invalid for any token Binance only quotes against BTC/ETH/etc. The gl.nondet.web.render call will then return an error payload (or HTTP 400 body) which the LLM may try to "interpret", yielding a non-deterministic / nonsense judgment that still updates total. Consider rejecting USDT explicitly and either documenting the USDT-pair-only constraint clearly or accepting a quote currency parameter.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/contracts/signal_judge.py` around lines 58 - 69, The price_url
construction blindly appends "USDT" causing invalid symbols like "USDTUSDT" and
missing markets; update the logic that builds price_url (refer to price_url and
_asset in this file) to: 1) if asset_upper already endswith "USDT" use it as-is,
2) otherwise append the desired quote currency (default "USDT") or accept a new
quote_currency parameter, and 3) explicitly reject or return an error when the
base asset is exactly "USDT" (so callers know to supply a valid base/quote),
while keeping the existing sanitizations for _prediction and _reasoning. Ensure
the updated validation happens before calling gl.nondet.web.render so the render
never receives an invalid symbol.

Comment thread examples/contracts/signal_judge.py Outdated
_asset = asset_upper
_prediction = prediction.strip()[:500] # cap length to limit prompt-injection surface
_reasoning = reasoning.strip()[:500]
_target = target_price
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

target_price is unvalidated and interpolated raw into the prompt.

Unlike asset and prediction/reasoning, target_price is neither stripped, length-bounded, nor checked to be a numeric string before being injected into the prompt and stored in the signal log. A trader can pass arbitrary multi-line text to manipulate the LLM judgment or inflate storage. Validate it as a positive numeric string (e.g., parse with Decimal/float and re-stringify) before use.

🛡️ Proposed fix
-        sender = gl.message.sender_address
+        target_stripped = target_price.strip()
+        try:
+            if float(target_stripped) <= 0:
+                raise ValueError
+        except ValueError:
+            raise gl.vm.UserError("target_price must be a positive numeric string")
+
+        sender = gl.message.sender_address
@@
-        _target = target_price
+        _target = target_stripped
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/contracts/signal_judge.py` at line 68, The _target = target_price
assignment uses unvalidated input; ensure target_price is stripped, single-line,
length-bounded, and parsed as a positive number before assigning or logging. In
the function/handler that receives target_price (look for the _target assignment
and any signal log/prompt construction), trim whitespace, reject or truncate if
it contains newlines or exceeds a short reasonable length, parse with Decimal
(or float) to confirm it's a positive numeric value, then re-stringify the
canonical numeric form and assign that to _target and store/use that sanitized
value in prompts/logs.

Comment thread examples/contracts/signal_judge.py Outdated
Comment on lines +112 to +115
try:
judgment = json.loads(raw)
except json.JSONDecodeError as e:
raise gl.vm.UserError(f"LLM did not return valid JSON: {e}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm the exception types raised by Address() in py-genlayer so the except clause is correctly narrowed.
fd -t f -e py | xargs rg -nP -C3 '\bclass\s+Address\b|\bdef\s+__init__\b.*Address|raise\s+\w+.*address' 2>/dev/null | head -100

Repository: genlayerlabs/genlayer-studio

Length of output: 6272


Add exception chaining via raise ... from e at lines 115 and 173.

The raise ... from e pattern should be applied to preserve exception context for debugging:

  • Line 115: JSONDecodeError → chain with from e
  • Line 173: Current bare Exception from Address() → add from e

Note: The suggestion to narrow the Address() catch clause from Exception to (ValueError, TypeError) is not applicable without modifying the Address class itself. Currently, Address.__init__ in backend/node/types.py raises a bare Exception (line 50: raise Exception(f"invalid address {val}")), not the suggested specific types. Either keep catching Exception with from e, or address the root cause by improving Address to raise more specific exception types.

🧰 Tools
🪛 Ruff (0.15.12)

[warning] 115-115: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/contracts/signal_judge.py` around lines 112 - 115, Update the two
exception-raise sites to preserve exception context by chaining with the caught
exception: in the JSON decode block where you currently catch
json.JSONDecodeError as e and raise gl.vm.UserError(f"LLM did not return valid
JSON: {e}"), change the raise to use "from e"; likewise, in the block that calls
Address(...) and currently catches Exception as e then raises a new Exception,
modify that raise to include "from e" so the original Address initialization
error is chained; reference the JSON parsing block where judgment =
json.loads(raw) and the Address() call to locate the changes.

Comment thread examples/contracts/signal_judge.py Outdated
Comment on lines +119 to +121
judgment["current_price"] = str(judgment.get("current_price", ""))
judgment["correct"] = bool(judgment.get("correct", False))
judgment["reasoning_quality"] = int(judgment.get("reasoning_quality", 0))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Defensive coercion can silently flip correct to True.

bool(judgment.get("correct", False)) is unsafe defensive coding: if a model ever returns "correct": "false" (a JSON string instead of a JSON bool), json.loads yields the Python string "false", and bool("false") is True — a losing signal would be recorded as a win and reach consensus across validators. Since the prompt explicitly demands a JSON bool, parse strictly or normalize string forms.

🛡️ Proposed fix
-        # defensive type coercion — some models return float for current_price
-        # which breaks calldata encoding when stored
-        judgment["current_price"] = str(judgment.get("current_price", ""))
-        judgment["correct"] = bool(judgment.get("correct", False))
-        judgment["reasoning_quality"] = int(judgment.get("reasoning_quality", 0))
+        # defensive type coercion — some models return float for current_price
+        # which breaks calldata encoding when stored
+        judgment["current_price"] = str(judgment.get("current_price", ""))
+        raw_correct = judgment.get("correct", False)
+        if isinstance(raw_correct, str):
+            raw_correct = raw_correct.strip().lower() == "true"
+        judgment["correct"] = bool(raw_correct)
+        judgment["reasoning_quality"] = int(judgment.get("reasoning_quality", 0))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/contracts/signal_judge.py` around lines 119 - 121, The current
defensive coercion for judgment["correct"] uses bool(judgment.get("correct",
False)) which treats non-empty strings like "false" as True; change this to
strict normalization: read raw = judgment.get("correct", None) (or similar), if
isinstance(raw, bool) use it, if isinstance(raw, str) map raw.lower() in
("true","false","1","0") to the corresponding boolean, if isinstance(raw, (int,
float)) treat 1/0 accordingly, otherwise fallback to False or raise a validation
error so the model output cannot silently flip from false to true; update the
code that sets judgment["correct"] to perform this explicit parsing/validation.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
examples/contracts/signal_judge.py (3)

196-203: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Parse correct strictly instead of bool(...).

If the model returns "correct": "false", json.loads() produces the Python string "false" and bool("false") becomes True. That flips a losing signal into a recorded win.

Suggested fix
-        correct = bool(judgment.get("correct", False))
+        raw_correct = judgment.get("correct", False)
+        if isinstance(raw_correct, bool):
+            correct = raw_correct
+        elif isinstance(raw_correct, str):
+            normalized = raw_correct.strip().lower()
+            if normalized == "true":
+                correct = True
+            elif normalized == "false":
+                correct = False
+            else:
+                raise gl.vm.UserError("LLM returned an invalid 'correct' value")
+        elif raw_correct in (0, 1):
+            correct = bool(raw_correct)
+        else:
+            raise gl.vm.UserError("LLM returned an invalid 'correct' value")
         current_price = str(judgment.get("current_price", ""))
         rq = int(judgment.get("reasoning_quality", 0))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/contracts/signal_judge.py` around lines 196 - 203, The code
currently uses bool(judgment.get("correct", False)) which treats non-empty
strings like "false" as True; replace this with a strict parsing of the
"correct" field from the judgment dict: read val = judgment.get("correct",
False) and then set correct so that if isinstance(val, bool) you use it
directly, elif isinstance(val, str) you set correct = val.strip().lower() in
("true", "1", "yes"), else default to False; update the assignment for correct
(and leave current_price and rq assignments unchanged) so signal_judge.py
handles string "false" correctly.

140-156: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail closed on invalid symbols and non-price Binance payloads.

This path still blindly appends USDT, and it never verifies that Binance actually returned a price JSON before handing the body to the model. Invalid symbols like USDTUSDT, missing markets, or upstream error pages can still flow into the LLM and mutate total with a nonsense judgment.

Suggested fix
-        price_url = (
-            "https://api.binance.com/api/v3/ticker/price?symbol="
-            + sig["asset"]
-            + "USDT"
-        )
+        symbol = sig["asset"]
+        if symbol == "USDT":
+            raise gl.vm.UserError("asset must be a base ticker, not USDT")
+        if not symbol.endswith("USDT"):
+            symbol = f"{symbol}USDT"
+        price_url = f"https://api.binance.com/api/v3/ticker/price?symbol={symbol}"
@@
         def get_judgment() -> str:
             web_data = gl.nondet.web.render(price_url, mode="text")
+            payload = json.loads(web_data)
+            price = payload.get("price")
+            if not isinstance(price, str) or not price:
+                raise gl.vm.UserError("Binance did not return a price")
 
             task = f"""
@@
-Live market data from Binance (current price at resolution time):
-{web_data}
+Live market data from Binance (current price at resolution time):
+{{"price": "{price}"}}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/contracts/signal_judge.py` around lines 140 - 156, The code blindly
builds price_url by appending "USDT" and passes the web response into the model
without validation; update the construction and get_judgment to (1)
normalize/validate sig["asset"] before appending (e.g., reject or strip if it
already contains "USDT" and ensure asset matches a safe regex of tickers) and
build price_url from the validated ticker, and (2) after calling
gl.nondet.web.render(price_url, mode="text") parse and validate the response
body is valid Binance JSON with a numeric "price" field (and check HTTP
success/error cases) — if validation fails, return a safe closed/invalid
judgment from get_judgment instead of feeding the raw payload to the LLM; use
the symbols price_url, get_judgment, and gl.nondet.web.render to locate where to
implement these checks.

57-65: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate target_price before persisting it.

target_price is stored raw and later interpolated directly into the LLM prompt. A multiline or non-numeric value can steer the judge prompt and create signals that cannot be resolved meaningfully.

Trim it, reject CR/LF, parse it as a positive numeric value, and store the normalized string instead of the original input.

Also applies to: 91-99

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/contracts/signal_judge.py` around lines 57 - 65, In submit_signal,
validate target_price before saving: trim whitespace, reject any input
containing CR/LF or non-printable/newline characters, attempt to parse it as a
positive numeric value (float), and if valid store a normalized string (e.g.,
canonical decimal with no extra spaces) instead of the raw input; return a
validation error for multiline, non-numeric, or non-positive values. Apply the
same validation and normalization to the other signal submission helper in this
file that mirrors submit_signal so both code paths persist only trimmed,
single-line, positive numeric target_price strings.
🧹 Nitpick comments (1)
tests/integration/icontracts/tests/test_signal_judge.py (1)

7-7: ⚡ Quick win

Add annotations to the new test functions.

These tests are untyped today. Adding -> None and annotating the fixture parameter keeps the file aligned with the repo's Python typing rule.

As per coding guidelines, "Include type hints in all Python code".

Also applies to: 41-41

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/icontracts/tests/test_signal_judge.py` at line 7, Add
function and parameter type hints to test_signal_judge_submit: add a return
annotation "-> None" and annotate the fixture parameter "setup_validators" with
its fixture type (e.g., the actual fixture class/type used for validators or, if
unknown, use typing.Any). Update the function signature of
test_signal_judge_submit (and the other noted test at line 41) from "def
test_signal_judge_submit(setup_validators):" to "def
test_signal_judge_submit(setup_validators: <FixtureType>) -> None:" so the test
follows the repository's typing rules.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/contracts/signal_judge.py`:
- Around line 9-17: TIMEFRAMES currently includes a zero-second "test" entry
that is accepted by the on-chain entrypoint submit_signal, enabling trivial
leaderboard farming; remove "test" from the public TIMEFRAMES mapping and add
validation in submit_signal (and any helper that reads TIMEFRAMES) to reject
unknown/forbidden timeframes, or gate acceptance of "test" behind a
compile-time/test-only flag so it cannot be passed on-chain in production;
update references to TIMEFRAMES and the submit_signal parameter validation (and
the other usage noted around the same area) to enforce this rule.

In `@tests/integration/icontracts/tests/test_signal_judge.py`:
- Around line 54-59: Update the mock dict key named
eq_principle_prompt_comparative so its prompt string exactly matches the real
contract prompt in signal_judge (replace the current text with: "The boolean
field 'correct' must have the same value across all answers. Ignore any
differences in 'current_price' (varies by cents because validators fetch at
slightly different timestamps) and ignore 'reasoning_quality' (subjective
rating)."); ensure the resulting string concatenation and whitespace match the
contract prompt so substring matching in the test will succeed.

---

Duplicate comments:
In `@examples/contracts/signal_judge.py`:
- Around line 196-203: The code currently uses bool(judgment.get("correct",
False)) which treats non-empty strings like "false" as True; replace this with a
strict parsing of the "correct" field from the judgment dict: read val =
judgment.get("correct", False) and then set correct so that if isinstance(val,
bool) you use it directly, elif isinstance(val, str) you set correct =
val.strip().lower() in ("true", "1", "yes"), else default to False; update the
assignment for correct (and leave current_price and rq assignments unchanged) so
signal_judge.py handles string "false" correctly.
- Around line 140-156: The code blindly builds price_url by appending "USDT" and
passes the web response into the model without validation; update the
construction and get_judgment to (1) normalize/validate sig["asset"] before
appending (e.g., reject or strip if it already contains "USDT" and ensure asset
matches a safe regex of tickers) and build price_url from the validated ticker,
and (2) after calling gl.nondet.web.render(price_url, mode="text") parse and
validate the response body is valid Binance JSON with a numeric "price" field
(and check HTTP success/error cases) — if validation fails, return a safe
closed/invalid judgment from get_judgment instead of feeding the raw payload to
the LLM; use the symbols price_url, get_judgment, and gl.nondet.web.render to
locate where to implement these checks.
- Around line 57-65: In submit_signal, validate target_price before saving: trim
whitespace, reject any input containing CR/LF or non-printable/newline
characters, attempt to parse it as a positive numeric value (float), and if
valid store a normalized string (e.g., canonical decimal with no extra spaces)
instead of the raw input; return a validation error for multiline, non-numeric,
or non-positive values. Apply the same validation and normalization to the other
signal submission helper in this file that mirrors submit_signal so both code
paths persist only trimmed, single-line, positive numeric target_price strings.

---

Nitpick comments:
In `@tests/integration/icontracts/tests/test_signal_judge.py`:
- Line 7: Add function and parameter type hints to test_signal_judge_submit: add
a return annotation "-> None" and annotate the fixture parameter
"setup_validators" with its fixture type (e.g., the actual fixture class/type
used for validators or, if unknown, use typing.Any). Update the function
signature of test_signal_judge_submit (and the other noted test at line 41) from
"def test_signal_judge_submit(setup_validators):" to "def
test_signal_judge_submit(setup_validators: <FixtureType>) -> None:" so the test
follows the repository's typing rules.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b45d2bef-117d-43b8-aba8-4963ca12ede0

📥 Commits

Reviewing files that changed from the base of the PR and between 5a5e2f9 and 100a621.

📒 Files selected for processing (3)
  • examples/contracts/signal_judge.py
  • tests/integration/icontracts/schemas/signal_judge_get_contract_schema_for_code.py
  • tests/integration/icontracts/tests/test_signal_judge.py
✅ Files skipped from review due to trivial changes (1)
  • tests/integration/icontracts/schemas/signal_judge_get_contract_schema_for_code.py

Comment on lines +9 to +17
TIMEFRAMES = {
"5min": 300,
"15min": 900,
"30min": 1800,
"1h": 3600,
"4h": 14400,
"1d": 86400,
"test": 0, # zero-second deadline for integration tests only
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep the zero-second test timeframe out of the public ABI.

"test" is marked as integration-only, but submit_signal() accepts it on-chain. That lets anyone submit and immediately resolve against essentially the current price, which makes the leaderboard trivially farmable.

Suggested fix
 TIMEFRAMES = {
     "5min": 300,
     "15min": 900,
     "30min": 1800,
     "1h": 3600,
     "4h": 14400,
     "1d": 86400,
-    "test": 0,  # zero-second deadline for integration tests only
 }
@@
-        if timeframe not in ("5min", "15min", "30min", "1h", "4h", "1d", "test"):
+        if timeframe not in ("5min", "15min", "30min", "1h", "4h", "1d"):
             raise gl.vm.UserError("timeframe must be one of: 5min, 15min, 30min, 1h, 4h, 1d")

Also applies to: 84-85

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/contracts/signal_judge.py` around lines 9 - 17, TIMEFRAMES currently
includes a zero-second "test" entry that is accepted by the on-chain entrypoint
submit_signal, enabling trivial leaderboard farming; remove "test" from the
public TIMEFRAMES mapping and add validation in submit_signal (and any helper
that reads TIMEFRAMES) to reject unknown/forbidden timeframes, or gate
acceptance of "test" behind a compile-time/test-only flag so it cannot be passed
on-chain in production; update references to TIMEFRAMES and the submit_signal
parameter validation (and the other usage noted around the same area) to enforce
this rule.

Comment on lines +54 to +59
"eq_principle_prompt_comparative": {
"The boolean field 'correct' must have the same value across all answers. "
"Ignore any differences in 'current_price' (which varies by cents because it "
"is fetched at different timestamps) and ignore differences in "
"'reasoning_quality' (which is a subjective rating).": True
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n -C3 'def setup_validators|eq_principle_prompt_comparative|prompt_comparative' --type=py

Repository: genlayerlabs/genlayer-studio

Length of output: 5932


🏁 Script executed:

sed -n '188,195p' examples/contracts/signal_judge.py

Repository: genlayerlabs/genlayer-studio

Length of output: 462


🏁 Script executed:

sed -n '45,100p' tests/integration/icontracts/conftest.py

Repository: genlayerlabs/genlayer-studio

Length of output: 2428


🏁 Script executed:

find . -type f -name "*.py" | xargs grep -l "mock_response" | head -10

Repository: genlayerlabs/genlayer-studio

Length of output: 533


🏁 Script executed:

rg -n "mock_response" ./backend/validators/__init__.py -A 5 -B 5

Repository: genlayerlabs/genlayer-studio

Length of output: 780


🏁 Script executed:

rg -n "mock_response\[" . --type=py -B 2 -A 2

Repository: genlayerlabs/genlayer-studio

Length of output: 54


🏁 Script executed:

rg -n "mock_response" . --type=py -A 3 | grep -A 3 -B 3 "get\|key\|\["

Repository: genlayerlabs/genlayer-studio

Length of output: 5304


🏁 Script executed:

sed -n '1,65p' tests/integration/icontracts/tests/test_signal_judge.py

Repository: genlayerlabs/genlayer-studio

Length of output: 2133


🏁 Script executed:

sed -n '10,20p' tests/integration/icontracts/tests/test_llm_erc20.py

Repository: genlayerlabs/genlayer-studio

Length of output: 713


🏁 Script executed:

sed -n '188,195p' examples/contracts/signal_judge.py && echo "---" && sed -n '54,60p' tests/integration/icontracts/tests/test_signal_judge.py

Repository: genlayerlabs/genlayer-studio

Length of output: 856


Update the eq_principle_prompt_comparative mock key to match the actual contract prompt.

The mock prompt string (lines 54-59) differs from the actual prompt in examples/contracts/signal_judge.py (lines 190-193). The mock system uses substring matching, so this mismatch will cause the mock to fail and the test to fall through to the real LLM or return incorrect results.

Change the mock key from:

"The boolean field 'correct' must have the same value across all answers. "
"Ignore any differences in 'current_price' (which varies by cents because it "
"is fetched at different timestamps) and ignore differences in "
"'reasoning_quality' (which is a subjective rating)."

To:

"The boolean field 'correct' must have the same value across all answers. "
"Ignore any differences in 'current_price' (varies by cents because validators "
"fetch at slightly different timestamps) and ignore 'reasoning_quality' "
"(subjective rating)."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/icontracts/tests/test_signal_judge.py` around lines 54 -
59, Update the mock dict key named eq_principle_prompt_comparative so its prompt
string exactly matches the real contract prompt in signal_judge (replace the
current text with: "The boolean field 'correct' must have the same value across
all answers. Ignore any differences in 'current_price' (varies by cents because
validators fetch at slightly different timestamps) and ignore
'reasoning_quality' (subjective rating)."); ensure the resulting string
concatenation and whitespace match the contract prompt so substring matching in
the test will succeed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant