By default, TLM uses the "medium" `quality_preset`, "gpt-4.1-mini" base
`model`, and `max_tokens` is set to 512. You can set custom values for these

```diff
@@ -550,12 +550,11 @@ def validate(
             strange prompts or prompts that are too vague/open-ended to receive a clearly defined 'good' response.
             TLM measures consistency via the degree of contradiction between sampled responses that the model considers plausible.

-            use_self_reflection (bool, default = `True`): whether the LLM is asked to reflect on the given response and directly evaluate correctness/confidence.
-            Setting this False disables reflection and will reduce runtimes/costs, but potentially also the reliability of trustworthiness scores.
-            Reflection helps quantify aleatoric uncertainty associated with challenging prompts
-            and catches responses that are noticeably incorrect/bad upon further analysis.
+            num_self_reflections (int, default = 3): the number of self-reflections to perform, where the LLM is asked to reflect on the given response and directly evaluate correctness/confidence.
+            The maximum number of self-reflections currently supported is 3. Lower values will reduce runtimes/costs, but potentially also the reliability of trustworthiness scores.
+            Reflection helps quantify aleatoric uncertainty associated with challenging prompts and catches responses that are noticeably incorrect/bad upon further analysis.

-            similarity_measure ({"semantic", "string", "embedding", "embedding_large", "code", "discrepancy"}, default = "semantic"): how the
+            similarity_measure ({"semantic", "string", "embedding", "embedding_large", "code", "discrepancy"}, default = "discrepancy"): how the
             trustworthiness scoring's consistency algorithm measures similarity between alternative responses considered plausible by the model.
             Supported similarity measures include - "semantic" (based on natural language inference),
             "embedding" (based on vector embedding similarity), "embedding_large" (based on a larger embedding model),
```
```diff
@@ -574,6 +573,8 @@ def validate(
                 - name: Name of the evaluation criteria.
                 - criteria: Instructions specifying the evaluation criteria.

+            use_self_reflection (bool, default = `True`): deprecated. Use `num_self_reflections` instead.
+
             prompt: The prompt to use for the TLM call. If not provided, the prompt will be
                 generated from the messages.
```
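Since the deprecated `use_self_reflection` flag remains accepted alongside the new parameter, one plausible back-compat resolution looks like the sketch below. This is a hypothetical shim, not the library's actual implementation; only the two parameter names and the default of 3 come from the diff.

```python
import warnings


def resolve_num_self_reflections(num_self_reflections=None, use_self_reflection=None):
    """Map the deprecated boolean flag onto the new integer parameter."""
    if use_self_reflection is not None:
        warnings.warn(
            "use_self_reflection is deprecated; use num_self_reflections instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if num_self_reflections is None:
            # True -> the default of 3 reflections; False -> disable reflection.
            return 3 if use_self_reflection else 0
    # The new parameter (or the documented default) wins when both are given.
    return 3 if num_self_reflections is None else num_self_reflections
```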
```diff
@@ -582,6 +583,9 @@ def validate(
             rewritten_question: The re-written query if it was provided by the client to Codex from a user to be
                 used instead of the original query.

+            tools: Tools to use for the LLM call. If not provided, it is assumed no tools were
+                provided to the LLM.
+
             extra_headers: Send extra headers

             extra_query: Add additional query parameters to the request
```
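The new `tools` argument presumably carries the same tool definitions the client passed to its LLM. Below is an illustrative sketch assuming OpenAI-style function-tool schemas; only the parameter name and its "omit when absent" semantics come from this diff, everything else is hypothetical.

```python
# A hypothetical OpenAI-style function-tool schema, for illustration only.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_validate_kwargs(messages, tools=None):
    """Assemble keyword arguments; omit `tools` entirely when none were given,
    matching the docstring's 'assumed no tools were provided' behavior."""
    kwargs = {"messages": messages}
    if tools is not None:
        kwargs["tools"] = tools
    return kwargs
```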
The async `validate` receives the same docstring changes. By default, TLM uses the "medium" `quality_preset`, "gpt-4.1-mini" base
`model`, and `max_tokens` is set to 512. You can set custom values for these

```diff
@@ -1118,12 +1123,11 @@ async def validate(
             strange prompts or prompts that are too vague/open-ended to receive a clearly defined 'good' response.
             TLM measures consistency via the degree of contradiction between sampled responses that the model considers plausible.

-            use_self_reflection (bool, default = `True`): whether the LLM is asked to reflect on the given response and directly evaluate correctness/confidence.
-            Setting this False disables reflection and will reduce runtimes/costs, but potentially also the reliability of trustworthiness scores.
-            Reflection helps quantify aleatoric uncertainty associated with challenging prompts
-            and catches responses that are noticeably incorrect/bad upon further analysis.
+            num_self_reflections (int, default = 3): the number of self-reflections to perform, where the LLM is asked to reflect on the given response and directly evaluate correctness/confidence.
+            The maximum number of self-reflections currently supported is 3. Lower values will reduce runtimes/costs, but potentially also the reliability of trustworthiness scores.
+            Reflection helps quantify aleatoric uncertainty associated with challenging prompts and catches responses that are noticeably incorrect/bad upon further analysis.

-            similarity_measure ({"semantic", "string", "embedding", "embedding_large", "code", "discrepancy"}, default = "semantic"): how the
+            similarity_measure ({"semantic", "string", "embedding", "embedding_large", "code", "discrepancy"}, default = "discrepancy"): how the
             trustworthiness scoring's consistency algorithm measures similarity between alternative responses considered plausible by the model.
             Supported similarity measures include - "semantic" (based on natural language inference),
             "embedding" (based on vector embedding similarity), "embedding_large" (based on a larger embedding model),
```
```diff
@@ -1142,6 +1146,8 @@ async def validate(
                 - name: Name of the evaluation criteria.
                 - criteria: Instructions specifying the evaluation criteria.

+            use_self_reflection (bool, default = `True`): deprecated. Use `num_self_reflections` instead.
+
             prompt: The prompt to use for the TLM call. If not provided, the prompt will be
                 generated from the messages.
```
```diff
@@ -1150,6 +1156,9 @@ async def validate(
             rewritten_question: The re-written query if it was provided by the client to Codex from a user to be
                 used instead of the original query.

+            tools: Tools to use for the LLM call. If not provided, it is assumed no tools were
+                provided to the LLM.
+
             extra_headers: Send extra headers

             extra_query: Add additional query parameters to the request
```