Commit ee5c4eb

Validate run input types
1 parent 4e0b77c commit ee5c4eb

5 files changed

Lines changed: 117 additions & 8 deletions

README.md

Lines changed: 3 additions & 3 deletions
@@ -14,7 +14,7 @@ Chinese documentation: [README_CN.md](./README_CN.md).
 - `Result/PushData`
 - `Log/Debug`, `Log/Info`, `Log/Warn`, `Log/Error`
 - Runtime input injection from `input_schema.json` defaults, `--input`, or `--json`
-- Run input validation for required `input_schema.json` fields before the worker starts
+- Run input validation for required fields and declared value types in `input_schema.json` before the worker starts
 - Platform environment variables:
   - `ChromeWs`
   - `CDP_ENDPOINT` / `BROWSER_WS_ENDPOINT`
@@ -93,7 +93,7 @@ Validation checks:

 CoreClaw installs dependencies from `requirements.txt`, `package.json`, or `go.mod` after upload. The CLI therefore rejects workers that rely on locally installed SDK packages but do not declare those packages for the cloud installer.

-At run time, the CLI also validates the actual input assembled from defaults, `--input`, or `--json`. If a field marked `"required": true` is missing or empty, the command fails before creating run artifacts or starting the worker, matching CoreClaw's form-level launch behavior.
+At run time, the CLI also validates the actual input assembled from defaults, `--input`, or `--json`. If a field marked `"required": true` is missing or empty, or if a declared input field has the wrong JSON type, the command fails before creating run artifacts or starting the worker, matching CoreClaw's form-level launch behavior.

 CoreClaw's docs describe `output_schema.json` for upload-ready projects, but the current platform still accepts older workers without it. The CLI treats a missing `output_schema.json` as a warning, not a blocker. Local `export.ndjson` keeps the full raw result rows when no output schema exists.

@@ -128,7 +128,7 @@ The run starts a local CoreClaw SDK gRPC server on `127.0.0.1:20086`, then execu

 Use `--timeout-ms` to cap the whole worker process and `--idle-timeout-ms` to stop a worker that has stopped producing output but still has open Node/Python/Go handles. Durations accept milliseconds, `s`, or `m`.

-If the input schema marks a field as required, local runs require a non-empty value for that field. Use `--input input.json` or `--json '{"field":"value"}'` when the schema does not provide a default.
+If the input schema marks a field as required, local runs require a non-empty value for that field. Declared fields must also match their schema type, for example `integer` must be an integer, `boolean` must be a boolean, and `array` must be a JSON array. Use `--input input.json` or `--json '{"field":"value"}'` when the schema does not provide a default.

 Use `--min-results` for real worker smoke tests. Some existing workers can exit with code `0` after logging an upstream or browser error, so result count is the reliable success gate.

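Since `--json` and `--input` carry JSON, the type rule compares JSON types, not how a value looks: `"3"` and `"true"` are both strings even though they read like an integer and a boolean. The following is a standalone approximation of the rules listed in the README, not the CLI's own code. It assumes run input is parsed with `JSON.parse`, and the helpers `matchesDeclaredType` and `typeIssues` and the field names `limit`, `enabled`, and `items` are hypothetical.

```javascript
// Standalone approximation of the declared-type rules described in the README.
// Assumptions: run input arrives as JSON text and is parsed with JSON.parse;
// matchesDeclaredType, typeIssues, and the field names are hypothetical.
function matchesDeclaredType(value, type) {
  switch (type) {
    case 'integer':
      return Number.isInteger(value);
    case 'boolean':
      return typeof value === 'boolean';
    case 'array':
      return Array.isArray(value);
    default:
      return true; // other types are not checked in this sketch
  }
}

// Returns one message per declared field that is present but has the wrong JSON type.
// Absent fields are skipped here; required-field checks are a separate rule.
function typeIssues(jsonText, declaredTypes) {
  const input = JSON.parse(jsonText);
  return Object.entries(declaredTypes)
    .filter(([name]) => Object.prototype.hasOwnProperty.call(input, name))
    .filter(([name, type]) => !matchesDeclaredType(input[name], type))
    .map(([name, type]) => `${name}: expected ${type}`);
}

const declared = { limit: 'integer', enabled: 'boolean', items: 'array' };

// Quoted values are JSON strings, so every field fails its declared type.
console.log(typeIssues('{"limit":"3","enabled":"true","items":"a,b"}', declared));

// Unquoted JSON values of the right kind pass.
console.log(typeIssues('{"limit":3,"enabled":true,"items":["a","b"]}', declared));
```

The first call reports all three fields and the second reports none. With a real worker, the equivalent difference would be passing `--json '{"limit":"3"}'` versus `--json '{"limit":3}'` for a schema that declares `limit` as `integer`.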
README_CN.md

Lines changed: 3 additions & 3 deletions
@@ -12,7 +12,7 @@ CoreClaw 官方开发者文档描述了上传就绪的 worker 项目结构、平
 - `Result/PushData`
 - `Log/Debug`、`Log/Info`、`Log/Warn`、`Log/Error`
 - `input_schema.json` 默认值、`--input` 或 `--json` 注入运行输入。
-- worker 启动前校验实际输入是否满足 `input_schema.json` 的 required 字段
+- worker 启动前校验实际输入是否满足 `input_schema.json` 的 required 字段和声明类型
 - 平台环境变量:
   - `ChromeWs`
   - `CDP_ENDPOINT` / `BROWSER_WS_ENDPOINT`
@@ -92,7 +92,7 @@ node ./bin/coreclaw.js validate ./examples/node-hello

 CoreClaw 上传后会从 `requirements.txt`、`package.json` 或 `go.mod` 安装依赖。因此 CLI 会拒绝那些本地机器因为已安装 SDK 包而能运行、但云端安装文件没有声明这些包的 worker。

-运行时,CLI 还会校验由默认值、`--input` 或 `--json` 拼出的实际输入。如果某个字段标记了 `"required": true`但本次输入缺失或为空,命令会在创建 run 产物和启动 worker 前失败,贴近 CoreClaw 表单层的启动行为。
+运行时,CLI 还会校验由默认值、`--input` 或 `--json` 拼出的实际输入。如果某个字段标记了 `"required": true` 但本次输入缺失或为空,或者声明字段的 JSON 类型不匹配,命令会在创建 run 产物和启动 worker 前失败,贴近 CoreClaw 表单层的启动行为。

 官方文档把 `output_schema.json` 描述为上传就绪项目文件,但当前平台仍兼容没有 `output_schema.json` 的老 worker。因此 CLI 把缺失 `output_schema.json` 作为 warning,而不是阻塞错误。没有 output schema 时,本地 `export.ndjson` 会保留完整原始结果行。

@@ -127,7 +127,7 @@ node ./bin/coreclaw.js run ./browser-worker --captcha-solver --require-captcha-s

 `--timeout-ms` 用于限制整个 worker 进程运行时间;`--idle-timeout-ms` 用于停止已经不再输出但仍有 Node/Python/Go handle 未退出的 worker。时长支持毫秒、`s` 或 `m`

-如果 input schema 把某个字段标记为 required,本地 run 也要求该字段有非空值。schema 没有提供 default 时,使用 `--input input.json` 或 `--json '{"field":"value"}'` 传入。
+如果 input schema 把某个字段标记为 required,本地 run 也要求该字段有非空值。声明字段还必须匹配 schema 类型,例如 `integer` 必须是整数,`boolean` 必须是布尔值,`array` 必须是 JSON 数组。schema 没有提供 default 时,使用 `--input input.json` 或 `--json '{"field":"value"}'` 传入。

 真实 worker 冒烟测试应使用 `--min-results`。有些 worker 会在上游或浏览器失败后仍以 exit code `0` 退出,因此结果行数才是更可靠的成功门槛。

src/runtime/input.js

Lines changed: 49 additions & 2 deletions
@@ -63,11 +63,23 @@ export function inputSchemaInputIssues(input, schema) {

   const issues = [];
   for (const property of schema.properties) {
-    if (!property || property.required !== true || typeof property.name !== 'string') {
+    if (!property || typeof property.name !== 'string') {
       continue;
     }
-    if (!Object.prototype.hasOwnProperty.call(input, property.name) || isEmptyInputValue(input[property.name])) {
+    if (!Object.prototype.hasOwnProperty.call(input, property.name)) {
+      if (property.required === true) {
+        issues.push(`required field "${property.name}" is missing or empty`);
+      }
+      continue;
+    }
+
+    if (property.required === true && isEmptyInputValue(input[property.name])) {
       issues.push(`required field "${property.name}" is missing or empty`);
+      continue;
+    }
+
+    if (!inputValueMatchesType(input[property.name], property.type)) {
+      issues.push(`field "${property.name}" must be ${inputTypeLabel(property.type)}`);
     }
   }
   return issues;
@@ -120,6 +132,41 @@ function isEmptyInputValue(value) {
   return false;
 }

+function inputValueMatchesType(value, type) {
+  switch (normalizeInputType(type)) {
+    case 'string':
+      return typeof value === 'string';
+    case 'integer':
+      return Number.isInteger(value);
+    case 'boolean':
+      return typeof value === 'boolean';
+    case 'array':
+      return Array.isArray(value);
+    case 'object':
+      return value !== null && typeof value === 'object' && !Array.isArray(value);
+    default:
+      return true;
+  }
+}
+
+function inputTypeLabel(type) {
+  const normalized = normalizeInputType(type);
+  if (normalized === 'integer') {
+    return 'an integer';
+  }
+  if (normalized === 'array' || normalized === 'object') {
+    return `an ${normalized}`;
+  }
+  return `a ${normalized}`;
+}
+
+function normalizeInputType(type) {
+  if (type === 'number') {
+    return 'integer';
+  }
+  return type;
+}
+
 function singularize(key) {
   if (key.endsWith('ies')) {
     return `${key.slice(0, -3)}y`;

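The reordered checks have consequences that are easy to miss in the diff: an absent optional field is skipped; any present field that is not a required-but-empty field is type-checked, so an explicit `null` for an optional `integer` field is reported as a type error; the legacy `number` type is normalized to `integer`, so `1.5` is rejected; and a required field that is present but empty produces only the "missing or empty" issue because of the `continue`. The script below copies `normalizeInputType` and `inputValueMatchesType` from the diff and re-creates the check order in a hypothetical `issuesFor` helper with simplified messages. `isEmptyInputValue` is not shown in this commit, so the stand-in here, which treats `null`, `undefined`, `''`, and empty arrays as empty, is an assumption.

```javascript
// Copied from src/runtime/input.js in this commit.
function normalizeInputType(type) {
  if (type === 'number') {
    return 'integer';
  }
  return type;
}

// Copied from src/runtime/input.js in this commit.
function inputValueMatchesType(value, type) {
  switch (normalizeInputType(type)) {
    case 'string':
      return typeof value === 'string';
    case 'integer':
      return Number.isInteger(value);
    case 'boolean':
      return typeof value === 'boolean';
    case 'array':
      return Array.isArray(value);
    case 'object':
      return value !== null && typeof value === 'object' && !Array.isArray(value);
    default:
      return true;
  }
}

// Assumption: the real isEmptyInputValue is not shown in the diff; this
// stand-in treats null, undefined, '', and empty arrays as empty.
function isEmptyInputValue(value) {
  if (value === null || value === undefined || value === '') return true;
  if (Array.isArray(value)) return value.length === 0;
  return false;
}

// Hypothetical helper: re-creates the per-property check order of
// inputSchemaInputIssues after this commit, with simplified messages.
function issuesFor(property, input) {
  if (!Object.prototype.hasOwnProperty.call(input, property.name)) {
    return property.required === true ? ['missing or empty'] : [];
  }
  const value = input[property.name];
  if (property.required === true && isEmptyInputValue(value)) {
    return ['missing or empty']; // the type check is skipped for this field
  }
  return inputValueMatchesType(value, property.type) ? [] : ['wrong type'];
}

const cases = [
  // Absent optional field: skipped.
  [{ name: 'limit', type: 'integer' }, {}],
  // Optional field explicitly set to null: reported as a type error.
  [{ name: 'limit', type: 'integer' }, { limit: null }],
  // Required field set to null: reported as missing or empty, not as a type error.
  [{ name: 'limit', type: 'integer', required: true }, { limit: null }],
  // Legacy "number" is normalized to "integer", so 1.5 is rejected.
  [{ name: 'ratio', type: 'number' }, { ratio: 1.5 }],
];

for (const [property, input] of cases) {
  console.log(property.name, JSON.stringify(input), issuesFor(property, input));
}
```

Whether an explicit `null` should be accepted for optional fields is a design question the commit does not address; the sketch only makes the current behavior visible.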
test/run.test.js

Lines changed: 24 additions & 0 deletions
@@ -185,6 +185,30 @@ main().catch((error) => {
   assert.equal(fs.existsSync(path.join(dir, '.coreclaw')), false);
 });

+test('runCommand fails before creating run artifacts when input type is invalid', async () => {
+  const dir = createNodeFixture(`
+const coresdk = require('./sdk')
+async function main() {
+  await coresdk.result.pushData({ ok: true })
+}
+main().catch((error) => {
+  console.error(error)
+  process.exit(1)
+})
+`);
+
+  await assert.rejects(
+    () => runCommand(dir, {
+      node: process.execPath,
+      json: '{"items":"not a list"}',
+      tmpHook: false,
+    }),
+    (error) => error instanceof CliError && /field "items" must be an array/.test(error.message),
+  );
+
+  assert.equal(fs.existsSync(path.join(dir, '.coreclaw')), false);
+});
+
 test('runCommand fails upload parity when proxy usage is required but worker does not use it', async () => {
   const dir = createNodeFixture(`
 const coresdk = require('./sdk')

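The new test pins down an ordering guarantee: `runCommand` must reject invalid input before anything is written under `.coreclaw`. A minimal sketch of that fail-fast ordering follows. The names `prepareRun`, `validateInput`, `InputError`, and the `createRunArtifacts` callback are hypothetical, since this commit does not show `runCommand`'s internals, and artifact creation is recorded in memory rather than on disk so the example has no side effects.

```javascript
// Hypothetical sketch of validate-then-create ordering; none of these names
// come from the CoreClaw CLI.
class InputError extends Error {}

// Minimal stand-in validator: only checks that "items", if present, is an array.
function validateInput(input) {
  const issues = [];
  if (Object.prototype.hasOwnProperty.call(input, 'items') && !Array.isArray(input.items)) {
    issues.push('field "items" must be an array');
  }
  return issues;
}

function prepareRun(jsonText, createRunArtifacts) {
  const input = JSON.parse(jsonText);
  const issues = validateInput(input);
  if (issues.length > 0) {
    // Throw before any side effect, so a rejected run leaves nothing behind.
    throw new InputError(`invalid run input: ${issues.join('; ')}`);
  }
  return createRunArtifacts(input);
}

// Record artifact creation in memory instead of writing a .coreclaw directory.
const created = [];
const recordArtifacts = (input) => {
  created.push(input);
  return input;
};

try {
  prepareRun('{"items":"not a list"}', recordArtifacts);
} catch (error) {
  console.log(error instanceof InputError, error.message);
}
console.log(created.length); // 0: nothing was created for the invalid input

prepareRun('{"items":["a"]}', recordArtifacts);
console.log(created.length); // 1
```

Keeping validation as a pure function of the assembled input, separate from the code that touches the filesystem, is what makes an assertion like `fs.existsSync(path.join(dir, '.coreclaw')) === false` meaningful in the test.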
test/schema.test.js

Lines changed: 38 additions & 0 deletions
@@ -75,6 +75,44 @@ test('validates actual run input against required schema fields', () => {
   );
 });

+test('validates actual run input types against input schema fields', () => {
+  const schema = {
+    properties: [
+      { name: 'keyword', type: 'string' },
+      { name: 'limit', type: 'integer' },
+      { name: 'legacyLimit', type: 'number' },
+      { name: 'enabled', type: 'boolean' },
+      { name: 'items', type: 'array' },
+      { name: 'options', type: 'object' },
+    ],
+  };
+
+  assert.deepEqual(inputSchemaInputIssues({
+    keyword: 'coreclaw',
+    limit: 3,
+    legacyLimit: 4,
+    enabled: false,
+    items: [],
+    options: {},
+    extra: 'allowed',
+  }, schema), []);
+  assert.deepEqual(inputSchemaInputIssues({
+    keyword: 123,
+    limit: 1.5,
+    legacyLimit: '4',
+    enabled: 'false',
+    items: {},
+    options: [],
+  }, schema), [
+    'field "keyword" must be a string',
+    'field "limit" must be an integer',
+    'field "legacyLimit" must be an integer',
+    'field "enabled" must be a boolean',
+    'field "items" must be an array',
+    'field "options" must be an object',
+  ]);
+});
+
 test('validates output schema', () => {
   const issues = validateOutputSchema([
     { name: 'url', type: 'string', description: 'URL' },
