Commit fc8c944

Extract CDDL definitions (#1723)
* Extract CDDL definitions

Needed for w3c/webref#1353.

With this update, Reffy now looks for and extracts CDDL content defined in `<pre class="cddl">` blocks. The logic is very similar to the logic used for IDL. Shared code was factored out accordingly.

Something specific about CDDL: on top of generating text extracts, the goal is also to create one extract per CDDL module that the spec defines. To associate a `<pre>` block with one or more CDDL modules, the code looks for a possible `data-cddl-module` attribute, or for module names in the `class` attribute (prefixed by `cddl-` or suffixed by `-cddl`). The former isn't used by any spec but is the envisioned mechanism in Bikeshed to define the association; the latter is the convention currently used in the WebDriver BiDi specification.

When a spec defines modules, CDDL defined in a `<pre>` block with no explicit module annotation is considered to be defined for all modules (not doing so would essentially mean the CDDL would not be defined for any module, which seems weird).

When there is CDDL, the extraction produces:
1. an extract that contains all CDDL definitions: `cddl/[shortname].cddl`
2. one extract per CDDL module: `cddl/[shortname]-[modulename].cddl`

(I'm going to assume that no one is ever going to define a module name that would make `[shortname]-[modulename]` collide with the shortname of another spec).

Note: some specs that define CDDL do not flag the `<pre>` blocks in any way (Open Screen Protocol, WebAuthn). Extraction won't work for them for now.

Also, there are a couple of places in the WebDriver BiDi spec that use a `<pre class="cddl">` block to *reference* a CDDL construct defined elsewhere. Extraction will happily include these references as well, leading to CDDL extracts that contain invalid CDDL. These need fixing in the specs.
* Change name of "all" extract, allow CDDL defs for it

When a spec defines CDDL modules, the union of all CDDL is now written to a file named `[shortname]-all.cddl` instead of simply `[shortname].cddl`. This is meant to make it slightly clearer that the union of all CDDL is not necessarily a panacea. For example, it may not contain a useful first rule against which a CBOR data item that would match any of the modules may be validated.

In other words, when the crawler produces a `[shortname].cddl` file, that means there's no module. If it doesn't, it is best to check the per-module extracts, with "all" being a reserved module name in the spec that gets interpreted to mean "any module".

When a spec defines CDDL modules, it may also define CDDL rules that only appear in the "all" file by specifying `data-cddl-module="all"`. This is useful to define a useful first type in the "all" extract.

* Integrate review feedback
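
The file naming scheme described in the commit message can be sketched as follows. This is only an illustration: `cddlExtractFiles` is a hypothetical helper that does not exist in Reffy, and the spec shortnames and module names passed to it are made up for the example.

```javascript
// Hypothetical helper (not part of Reffy): lists the extract files that
// would be produced for a spec, given the CDDL module names it defines.
function cddlExtractFiles(shortname, moduleNames) {
  // No module defined: a single extract named after the spec
  if (moduleNames.length === 0) {
    return [`cddl/${shortname}.cddl`];
  }
  // Modules defined: the "all" union comes first, then one file per module.
  // "all" is reserved, so it never creates a module of its own.
  return ['all', ...moduleNames.filter(name => name !== 'all')]
    .map(name => `cddl/${shortname}-${name}.cddl`);
}

// Illustrative inputs only
console.log(cddlExtractFiles('webdriver-bidi', ['local', 'remote']));
console.log(cddlExtractFiles('somespec', []));
```

A `[shortname].cddl` file therefore signals that the spec defines no module, while a `[shortname]-all.cddl` file signals that per-module extracts exist alongside it.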
1 parent ed90a50 commit fc8c944

9 files changed: +433 -51 lines changed

src/browserlib/extract-cddl.mjs

+125
@@ -0,0 +1,125 @@
1+
import getCodeElements from './get-code-elements.mjs';
2+
import trimSpaces from './trim-spaces.mjs';
3+
4+
/**
5+
* Extract the list of CDDL definitions in the current spec.
6+
*
7+
* A spec may define more than one CDDL module. For example, the WebDriver BiDi
8+
* spec has CDDL definitions that apply to either or both of the local end and
9+
* the remote end. The function returns an array that lists all CDDL modules.
10+
*
11+
* Each CDDL module is represented as an object with the following keys whose
12+
* values are strings:
13+
* - name: the CDDL module name. The name is "" if the spec does not define
14+
* any module, and "all" for the dump of all CDDL definitions.
15+
* - label: A full name for the CDDL module, when defined.
16+
* - cddl: A dump of the CDDL definitions.
17+
*
18+
* If the spec defines more than one module, the first item in the array is the
19+
* "all" module that contains a dump of all CDDL definitions, regardless of the
20+
* module they are actually defined for (the assumption is that looking at the
21+
* union of all CDDL modules defined in a spec will always make sense, and that
22+
* a spec will never reuse the same rule name with a different definition for
23+
* different CDDL modules).
24+
*
25+
* @function
26+
* @public
27+
* @return {Array} A dump of the CDDL definitions per CDDL module, or an empty
28+
* array if the spec does not contain any CDDL.
29+
*/
30+
export default function () {
31+
// Specs with CDDL are either recent enough that they all use the same
32+
// `<pre class="cddl">` convention, or they don't flag CDDL blocks in any
33+
// way, making it impossible to extract them.
34+
const cddlSelectors = ['pre.cddl:not(.exclude):not(.extract)'];
35+
const excludeSelectors = ['#cddl-index'];
36+
37+
// Retrieve all elements that contain CDDL content
38+
const cddlEls = getCodeElements(cddlSelectors, { excludeSelectors });
39+
40+
// Start by assembling the list of modules
41+
const modules = {};
42+
for (const el of cddlEls) {
43+
const elModules = getModules(el);
44+
for (const name of elModules) {
45+
// "all" does not create a module on its own, that's the name of
46+
// the CDDL module that contains all CDDL definitions.
47+
if (name !== 'all') {
48+
modules[name] = [];
49+
}
50+
}
51+
}
52+
53+
// Assemble the CDDL per module
54+
const mergedCddl = [];
55+
for (const el of cddlEls) {
56+
const cddl = trimSpaces(el.textContent);
57+
if (!cddl) {
58+
continue;
59+
}
60+
// All CDDL appears in the "all" module.
61+
mergedCddl.push(cddl);
62+
let elModules = getModules(el);
63+
if (elModules.length === 0) {
64+
// No module means the CDDL is defined for all modules
65+
elModules = Object.keys(modules);
66+
}
67+
for (const name of elModules) {
68+
// CDDL defined for the "all" module is only defined for it
69+
if (name !== 'all') {
70+
if (!modules[name]) {
71+
modules[name] = [];
72+
}
73+
modules[name].push(cddl);
74+
}
75+
}
76+
}
77+
78+
if (mergedCddl.length === 0) {
79+
return [];
80+
}
81+
82+
const res = [{
83+
name: Object.keys(modules).length > 0 ? 'all' : '',
84+
cddl: mergedCddl.join('\n\n')
85+
}];
86+
for (const [name, cddl] of Object.entries(modules)) {
87+
res.push({ name, cddl: cddl.join('\n\n') });
88+
}
89+
// Remove trailing spaces and use spaces throughout
90+
for (const cddlModule of res) {
91+
cddlModule.cddl = cddlModule.cddl
92+
.replace(/\s+$/gm, '\n')
93+
.replace(/\t/g, ' ')
94+
.trim();
95+
}
96+
return res;
97+
}
98+
99+
100+
/**
101+
* Retrieve the list of CDDL module shortnames that the element references.
102+
*
103+
* This list of modules is either specified in a `data-cddl-module` attribute
104+
* or directly within the class attribute prefixed by `cddl-` or suffixed by
105+
* `-cddl`.
106+
*/
107+
function getModules(el) {
108+
const moduleAttr = el.getAttribute('data-cddl-module');
109+
if (moduleAttr) {
110+
return moduleAttr.split(',').map(str => str.trim());
111+
}
112+
113+
const list = [];
114+
const classes = el.classList.values()
115+
for (const name of classes) {
116+
const match = name.match(/^(.*)-cddl$|^cddl-(.*)$/);
117+
if (match) {
118+
const shortname = match[1] ?? match[2];
119+
if (!list.includes(shortname)) {
120+
list.push(shortname);
121+
}
122+
}
123+
}
124+
return list;
125+
}
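
The module resolution and per-module assembly above can be exercised without a DOM. The sketch below is an assumption-laden restatement, not the Reffy code itself: `modulesFromClasses` and `assembleModules` are hypothetical names, blocks are modeled as plain `{ modules, cddl }` objects, and the class names and CDDL snippets are invented for the example. The regular expression is the one used in `getModules`.

```javascript
// Hypothetical stand-in for the class-based branch of getModules,
// working on an array of class names instead of an element.
function modulesFromClasses(classNames) {
  const list = [];
  for (const name of classNames) {
    const match = name.match(/^(.*)-cddl$|^cddl-(.*)$/);
    if (match) {
      const shortname = match[1] ?? match[2];
      if (!list.includes(shortname)) {
        list.push(shortname);
      }
    }
  }
  return list;
}

// Hypothetical restatement of the two-pass assembly on plain objects.
function assembleModules(blocks) {
  // First pass: collect module names ("all" never creates a module)
  const modules = {};
  for (const block of blocks) {
    for (const name of block.modules) {
      if (name !== 'all') {
        modules[name] = [];
      }
    }
  }
  // Second pass: every block goes to "all"; unannotated blocks go to
  // every module; blocks flagged "all" go nowhere else.
  const all = [];
  for (const block of blocks) {
    all.push(block.cddl);
    const names = block.modules.length > 0 ?
      block.modules : Object.keys(modules);
    for (const name of names) {
      if (name !== 'all') {
        modules[name].push(block.cddl);
      }
    }
  }
  return { all, ...modules };
}

// Invented sample data
const blocks = [
  { modules: modulesFromClasses(['cddl', 'remote-cddl']), cddl: 'Command = {}' },
  { modules: modulesFromClasses(['cddl', 'cddl-local']), cddl: 'Event = {}' },
  { modules: modulesFromClasses(['cddl']), cddl: 'Extensible = (*text => any)' },
  { modules: ['all'], cddl: 'Root = Command / Event' }
];
console.log(assembleModules(blocks));
```

Note that the plain `cddl` class matches neither alternative of the pattern, so a block flagged only with `class="cddl"` is treated as having no module annotation.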

src/browserlib/extract-webidl.mjs

+15-50
@@ -1,14 +1,14 @@
11
import getGenerator from './get-generator.mjs';
2-
import informativeSelector from './informative-selector.mjs';
3-
import cloneAndClean from './clone-and-clean.mjs';
2+
import getCodeElements from './get-code-elements.mjs';
3+
import trimSpaces from './trim-spaces.mjs';
44

55
/**
66
* Extract the list of WebIDL definitions in the current spec
77
*
88
* @function
99
* @public
10-
* @return {Promise} The promise to get a dump of the IDL definitions, or
11-
* an empty string if the spec does not contain any IDL.
10+
* @return {String} A dump of the IDL definitions, or an empty string if the
11+
* spec does not contain any IDL.
1212
*/
1313
export default function () {
1414
const generator = getGenerator();
@@ -70,56 +70,21 @@ function extractBikeshedIdl() {
7070
* sure that it only extracts elements once.
7171
*/
7272
function extractRespecIdl() {
73-
// Helper function that trims individual lines in an IDL block,
74-
// removing as much space as possible from the beginning of the page
75-
// while preserving indentation. Rules followed:
76-
// - Always trim the first line
77-
// - Remove whitespaces from the end of each line
78-
// - Replace lines that contain spaces with empty lines
79-
// - Drop same number of leading whitespaces from all other lines
80-
const trimIdlSpaces = idl => {
81-
const lines = idl.trim().split('\n');
82-
const toRemove = lines
83-
.slice(1)
84-
.filter(line => line.search(/\S/) > -1)
85-
.reduce(
86-
(min, line) => Math.min(min, line.search(/\S/)),
87-
Number.MAX_VALUE);
88-
return lines
89-
.map(line => {
90-
let firstRealChat = line.search(/\S/);
91-
if (firstRealChat === -1) {
92-
return '';
93-
}
94-
else if (firstRealChat === 0) {
95-
return line.replace(/\s+$/, '');
96-
}
97-
else {
98-
return line.substring(toRemove).replace(/\s+$/, '');
99-
}
100-
})
101-
.join('\n');
102-
};
103-
104-
// Detect the IDL index appendix if there's one (to exclude it)
105-
const idlEl = document.querySelector('#idl-index pre') ||
106-
document.querySelector('.chapter-idl pre'); // SVG 2 draft
107-
108-
let idl = [
73+
const idlSelectors = [
10974
'pre.idl:not(.exclude):not(.extract):not(#actual-idl-index)',
11075
'pre:not(.exclude):not(.extract) > code.idl-code:not(.exclude):not(.extract)',
11176
'pre:not(.exclude):not(.extract) > code.idl:not(.exclude):not(.extract)',
11277
'div.idl-code:not(.exclude):not(.extract) > pre:not(.exclude):not(.extract)',
11378
'pre.widl:not(.exclude):not(.extract)'
114-
]
115-
.map(sel => [...document.querySelectorAll(sel)])
116-
.reduce((res, elements) => res.concat(elements), [])
117-
.filter(el => el !== idlEl)
118-
.filter((el, idx, self) => self.indexOf(el) === idx)
119-
.filter(el => !el.closest(informativeSelector))
120-
.map(cloneAndClean)
121-
.map(el => trimIdlSpaces(el.textContent))
122-
.join('\n\n');
79+
];
12380

124-
return idl;
81+
const excludeSelectors = [
82+
'#idl-index',
83+
'.chapter-idl'
84+
];
85+
86+
const idlElements = getCodeElements(idlSelectors, { excludeSelectors });
87+
return idlElements
88+
.map(el => trimSpaces(el.textContent))
89+
.join('\n\n');
12590
}

src/browserlib/get-code-elements.mjs

+21
@@ -0,0 +1,21 @@
1+
import informativeSelector from './informative-selector.mjs';
2+
import cloneAndClean from './clone-and-clean.mjs';
3+
4+
/**
5+
* Helper function that returns a set of code elements in document order based
6+
* on a given set of selectors, excluding elements that are within an index.
7+
*
8+
* The function excludes elements defined in informative sections.
9+
*
10+
* The code elements are cloned and cleaned before they are returned to strip
11+
* annotations and other asides.
12+
*/
13+
export default function getCodeElements(codeSelectors, { excludeSelectors = [] }) {
14+
return [...document.querySelectorAll(codeSelectors.join(', '))]
15+
// Skip excluded elements and those in informative content
16+
.filter(el => excludeSelectors.length === 0 || !el.closest(excludeSelectors.join(', ')))
17+
.filter(el => !el.closest(informativeSelector))
18+
19+
// Clone and clean the elements
20+
.map(cloneAndClean);
21+
}

src/browserlib/reffy.json

+4
@@ -62,5 +62,9 @@
6262
"href": "./extract-ids.mjs",
6363
"property": "ids",
6464
"needsIdToHeadingMap": true
65+
},
66+
{
67+
"href": "./extract-cddl.mjs",
68+
"property": "cddl"
6569
}
6670
]

src/browserlib/trim-spaces.mjs

+36
@@ -0,0 +1,36 @@
1+
/**
2+
* Helper function that trims individual lines in a code block, removing as
3+
* much space as possible from the beginning of each line while preserving
4+
* indentation.
5+
*
6+
* Typically useful for CDDL and IDL extracts.
7+
*
8+
* Rules followed:
9+
* - Always trim the first line
10+
* - Remove whitespace from the end of each line
11+
* - Replace lines that contain only spaces with empty lines
12+
* - Drop the same number of leading whitespace characters from all other lines
13+
*/
14+
export default function trimSpaces(code) {
15+
const lines = code.trim().split('\n');
16+
const toRemove = lines
17+
.slice(1)
18+
.filter(line => line.search(/\S/) > -1)
19+
.reduce(
20+
(min, line) => Math.min(min, line.search(/\S/)),
21+
Number.MAX_VALUE);
22+
return lines
23+
.map(line => {
24+
let firstRealChar = line.search(/\S/);
25+
if (firstRealChar === -1) {
26+
return '';
27+
}
28+
else if (firstRealChar === 0) {
29+
return line.replace(/\s+$/, '');
30+
}
31+
else {
32+
return line.substring(toRemove).replace(/\s+$/, '');
33+
}
34+
})
35+
.join('\n');
36+
}
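
As a quick illustration of the rules listed in the comment, the snippet below copies `trimSpaces` as it appears in the diff and runs it on a made-up CDDL block. The input string is an assumption chosen to mimic the `textContent` of an indented `<pre>` element.

```javascript
// Copy of trimSpaces from the diff above, so that the example is standalone
function trimSpaces(code) {
  const lines = code.trim().split('\n');
  // Smallest indentation among non-blank lines after the first one
  const toRemove = lines
    .slice(1)
    .filter(line => line.search(/\S/) > -1)
    .reduce(
      (min, line) => Math.min(min, line.search(/\S/)),
      Number.MAX_VALUE);
  return lines
    .map(line => {
      const firstRealChar = line.search(/\S/);
      if (firstRealChar === -1) {
        return '';
      }
      else if (firstRealChar === 0) {
        return line.replace(/\s+$/, '');
      }
      else {
        return line.substring(toRemove).replace(/\s+$/, '');
      }
    })
    .join('\n');
}

// Made-up input: 4 spaces of markup indentation, trailing spaces after
// "uint,", and a whitespace-only last line
const raw = '\n    Command = {\n      id: uint,   \n\n      method: text\n    }\n  ';
console.log(trimSpaces(raw));
// Command = {
//   id: uint,
//
//   method: text
// }
```

The common 4-space indentation is dropped while the relative 2-space indentation of the two fields is preserved.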

src/lib/specs-crawler.js

+29-1
@@ -251,6 +251,29 @@ async function saveSpecResults(spec, settings) {
251251
return `css/${spec.shortname}.json`;
252252
};
253253

254+
async function saveCddl(spec) {
255+
let cddlHeader = `
256+
; GENERATED CONTENT - DO NOT EDIT
257+
; Content was automatically extracted by Reffy into webref
258+
; (https://github.com/w3c/webref)
259+
; Source: ${spec.title} (${spec.crawled})`;
260+
cddlHeader = cddlHeader.replace(/^\s+/gm, '').trim() + '\n\n';
261+
const res = [];
262+
for (const cddlModule of spec.cddl) {
263+
const cddl = cddlHeader + cddlModule.cddl + '\n';
264+
const filename = spec.shortname +
265+
(cddlModule.name ? `-${cddlModule.name}` : '') +
266+
'.cddl';
267+
await fs.promises.writeFile(
268+
path.join(folders.cddl, filename), cddl);
269+
res.push({
270+
name: cddlModule.name,
271+
file: `cddl/${filename}`
272+
});
273+
}
274+
return res;
275+
};
276+
254277
// Save IDL dumps
255278
if (spec.idl) {
256279
spec.idl = await saveIdl(spec);
@@ -283,9 +306,14 @@ async function saveSpecResults(spec, settings) {
283306
(typeof thing == 'object') && (Object.keys(thing).length === 0);
284307
}
285308

309+
// Save CDDL extracts (text files, multiple modules possible)
310+
if (!isEmpty(spec.cddl)) {
311+
spec.cddl = await saveCddl(spec);
312+
}
313+
286314
// Save all other extracts from crawling modules
287315
const remainingModules = modules.filter(mod =>
288-
!mod.metadata && mod.property !== 'css' && mod.property !== 'idl');
316+
!mod.metadata && !['cddl', 'css', 'idl'].includes(mod.property));
289317
for (const mod of remainingModules) {
290318
await saveExtract(spec, mod.property, spec => !isEmpty(spec[mod.property]));
291319
}

src/lib/util.js

+30
@@ -796,6 +796,36 @@ async function expandSpecResult(spec, baseFolder, properties) {
796796
return;
797797
}
798798

799+
// Treat CDDL extracts separately, one spec may have multiple CDDL
800+
// extracts (actual treatment is similar to IDL extracts otherwise)
801+
if (property === 'cddl') {
802+
if (!spec[property]) {
803+
return;
804+
}
805+
for (const cddlModule of spec[property]) {
806+
if (!cddlModule.file) {
807+
continue;
808+
}
809+
if (baseFolder.startsWith('https:')) {
810+
const url = (new URL(cddlModule.file, baseFolder)).toString();
811+
const response = await fetch(url, { nolog: true });
812+
contents = await response.text();
813+
}
814+
else {
815+
const filename = path.join(baseFolder, cddlModule.file);
816+
contents = await fs.readFile(filename, 'utf8');
817+
}
818+
if (contents.startsWith('; GENERATED CONTENT - DO NOT EDIT')) {
819+
// Normalize newlines to avoid off-by-one slices when we remove
820+
// the trailing newline that was added by saveCddl
821+
contents = contents.replace(/\r/g, '');
822+
const endOfHeader = contents.indexOf('\n\n');
823+
contents = contents.substring(endOfHeader + 2).slice(0, -1);
824+
}
825+
cddlModule.cddl = contents;
826+
}
827+
}
828+
799829
// Only consider properties that link to an extract, i.e. an IDL
800830
// or JSON file in subfolder.
801831
if (!spec[property] ||
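
The header written by `saveCddl` and the stripping done in `expandSpecResult` are meant to round-trip. The sketch below restates both steps as two hypothetical standalone functions, `addHeader` and `stripHeader` (names not in Reffy), using an invented title and URL, to show why the code removes `\r` characters before slicing.

```javascript
// Hypothetical restatement of the header logic in saveCddl
function addHeader(title, crawled, cddl) {
  const header = [
    '; GENERATED CONTENT - DO NOT EDIT',
    '; Content was automatically extracted by Reffy into webref',
    '; (https://github.com/w3c/webref)',
    `; Source: ${title} (${crawled})`
  ].join('\n') + '\n\n';
  // A trailing newline is appended after the CDDL itself
  return header + cddl + '\n';
}

// Hypothetical restatement of the stripping logic in expandSpecResult
function stripHeader(contents) {
  if (!contents.startsWith('; GENERATED CONTENT - DO NOT EDIT')) {
    return contents;
  }
  // Normalize newlines first: with "\r\n" line endings, searching for
  // "\n\n" and dropping exactly one trailing character would be off
  contents = contents.replace(/\r/g, '');
  const endOfHeader = contents.indexOf('\n\n');
  return contents.substring(endOfHeader + 2).slice(0, -1);
}

// Invented sample data
const cddl = 'Command = {\n  id: uint\n}\n\nEvent = {}';
const saved = addHeader('Example Spec', 'https://example.org/spec/', cddl);
console.log(stripHeader(saved) === cddl);
console.log(stripHeader(saved.replace(/\n/g, '\r\n')) === cddl);
```

Because the header contains no blank line, the first `\n\n` in the file always marks the end of the header, even when the CDDL itself contains blank lines.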

tests/crawl-test.json

+3
@@ -24,6 +24,7 @@
2424
},
2525
"title": "WOFF2",
2626
"algorithms": [],
27+
"cddl": [],
2728
"css": {
2829
"atrules": [],
2930
"properties": [],
@@ -99,6 +100,7 @@
99100
"title": "No Title",
100101
"generator": "respec",
101102
"algorithms": [],
103+
"cddl": [],
102104
"css": {
103105
"atrules": [],
104106
"properties": [],
@@ -224,6 +226,7 @@
224226
},
225227
"title": "[No title found for https://w3c.github.io/accelerometer/]",
226228
"algorithms": [],
229+
"cddl": [],
227230
"css": {
228231
"atrules": [],
229232
"properties": [],
