Create an interleave element #2154
@sydb and I believe that:
Advantages:
Disadvantages:
|
Note that we chose the name Thus: <alternate minOccurs="1" maxOccurs="1">
<bag minOccurs="1" maxOccurs="1">
<elementRef key="sic" minOccurs="1" maxOccurs="1"/>
<elementRef key="corr" minOccurs="1" maxOccurs="1"/>
</bag>
<bag minOccurs="1" maxOccurs="1">
<elementRef key="abbr" minOccurs="1" maxOccurs="1"/>
<elementRef key="expan" minOccurs="1" maxOccurs="1"/>
</bag>
<bag minOccurs="1" maxOccurs="1">
<elementRef key="orig" minOccurs="1" maxOccurs="1"/>
<elementRef key="reg" minOccurs="1" maxOccurs="1"/>
</bag>
</alternate> would be a perfectly reasonable customized content model for <bag>
<elementRef key="one"/>
<elementRef key="two"/>
<elementRef key="five"/>
<elementRef key="three"/>
<elementRef key="three"/>
</bag> and thus is precisely equivalent to <bag>
<elementRef key="one" minOccurs="1" maxOccurs="1"/>
<elementRef key="two" minOccurs="1" maxOccurs="1"/>
<elementRef key="five" minOccurs="1" maxOccurs="1"/>
<elementRef key="three" minOccurs="2" maxOccurs="2"/>
</bag> What remains to be seen is how well this maps to the RELAX NG |
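The equivalence claimed above (repeating a child stands for a matching occurrence count) can be sketched in a few lines. This is only an illustration, assuming every child in the shorthand form carries the default `minOccurs="1" maxOccurs="1"`; the function name `normalize_bag` is hypothetical and is not part of any TEI tool.

```python
from collections import Counter


def normalize_bag(keys):
    """Collapse repeated element keys into (key, minOccurs, maxOccurs) triples.

    Assumption: each listed child has the default minOccurs="1" maxOccurs="1",
    so n repetitions of a key mean "exactly n occurrences".
    """
    counts = Counter(keys)  # Counter keeps first-seen order of keys
    return [(key, n, n) for key, n in counts.items()]


# The shorthand content model from the comment above.
shorthand = ["one", "two", "five", "three", "three"]
print(normalize_bag(shorthand))
```

The two `three` references collapse into a single entry with both bounds set to 2, which is the explicit form given in the comment.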
Hmmm. My Relax NG antennae are a little skittish here. Can’t |
@ebeshero |
Right. Which is to say that So the semantics are different enough that a fully fledged schema language should have both capabilities. And we have tried to provide them with We do not actually use the “one of each of my children” semantics in the Guidelines at all. (In large part because XML DTDs don’t have this concept. (SGML DTDs did.)) But we should provide the capability to ODD writers. (We thought it was necessary when we added |
Thanks for explaining the distinction! I was thinking of I like the flexibility and constraint of this |
I am cautiously positive about the proposal to introduce
satisfied by both of the following, or by only the first?
|
@lb42 I would say both are OK. My convoluted logic is that if you specify minOccurs = "0" on something, then you can have the absence of that element anywhere in the container. :-) So if you have two of something, either can be anywhere. @sydb, @joeytakeda? |
Arguing from the case where absence is permissible anywhere doesn't really persuade me. Absence is not the same as presence: for one thing, two absences are not distinguishable from one! |
Thank you for that research, @lb42. We don’t have to accept that definition, of course. We could decide that a new “one of each” element that has a child with However, the obvious representation of our new toy in RELAX NG is <bag>
<elementRef key="one" minOccurs="1" maxOccurs="1"/>
<elementRef key="two" minOccurs="1" maxOccurs="2"/>
</bag> would be |
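To make the disputed question concrete, here is a minimal sketch of one possible reading, in which only the number of occurrences of each child matters and repeats need not be adjacent. That reading is an assumption taken from one side of the discussion above, not a settled decision; `satisfies_bag` is a hypothetical name.

```python
from collections import Counter


def satisfies_bag(children, bounds):
    """Return True if the children meet the per-key occurrence bounds.

    bounds maps an element key to (minOccurs, maxOccurs).
    Assumption: position is ignored entirely, so repeated elements
    may be separated by other children.
    """
    counts = Counter(children)
    # Any child not named in the content model is rejected.
    if any(name not in bounds for name in counts):
        return False
    return all(lo <= counts[key] <= hi for key, (lo, hi) in bounds.items())


# Bounds from the example above: exactly one <one>, one or two <two>.
bounds = {"one": (1, 1), "two": (1, 2)}
print(satisfies_bag(["two", "one", "two"], bounds))  # repeats separated
print(satisfies_bag(["one", "two", "two"], bounds))  # repeats adjacent
print(satisfies_bag(["one"], bounds))                # <two> missing
```

Under the stricter reading, the first call would have to be rejected, which would take an extra adjacency check that this sketch deliberately omits.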
This reminds me that another possible name for this new element might be |
Commenting to bump this back to people's attention, and also because: I've been looking into this a bit, and I'm not sure "bag" is a safe name for what's going on. RelaxNG "Bag" (to my mind) implies "one or more of each child pattern in any order," which is not what RNG interleave does. It means (I think) "one of each child pattern, in any order, and patterns may not overlap". "Set" might be ok, though perhaps it risks causing confusion. Or, as @lb42 suggests, just use "interleave". One other wrinkle that occurs to me: checking the validity of the content of this thing, whatever we call it, will entail expanding references, since you have to ensure there's no overlap. |
After some discussion on the TEI list (introduced by Daniel Schopper here: https://listserv.brown.edu/cgi-bin/wa?A2=ind2303&L=TEI-L&P=741 and continuing) we're liking the idea of introducing |
Since this isn't likely to be completed in time for the next release, as a band aid I'm going to fix the description on
|
Council decides on 2023-05-08 F2f that we should:
|
sequence/@preserveOrder is misleading
Created stylesheet issue for converting tei:interleave to rng:interleave: https://github.com/TEIC/Stylesheets/issues/609 |
* New elementSpec and updated TD * Still (possibly) need to address hcayless' point about validity of the construct (i.e. no overlap)
To summarize work so far:
I haven't yet addressed @hcayless's point about validation though. My inclination is that this is a job for the ODD processor (though we could add some Schematron now to catch simple cases), but we should probably take a look at the RelaxNG spec for some guidance as to what we can catch early on. |
@joeytakeda @sydb @martindholmes and all: I'm consulting with @djbpitt who is currently working on a Balisage paper on interleave complexities. From our conversation and our modeling of interleave in simple RelaxNG just now, we are concerned whether "interleave" is really and truly the name we want for this element, if our simple goal is just to permit unordered content. What if one or more of the components of interleave requires a strict order of its contents? Interleave permits a partial ordering, and that can be useful: Say we wanted to interleave ptr and milestone elements, keeping them in a strict order, around opener, p+, closer elements that we expect in an order. That would be a partially ordered context for interleave. Here's an example that's just inspired by the TEI, and I'm just writing it in RelaxNG compact syntax for simplicity:
With this, you can see how interleave is handling contents that are, themselves ordered. I wasn't fully aware of this before we sat down and looked at it, but the result of interleaving these ordered things allows them to mesh, so long as we preserve the ordered sequence of each component of interleave. So with interleave we'd allow something like: <div>
<opener> ..... </opener>
<ptr target="#landingPoint"/>
<p>......</p>
<milestone xml:id="landingPoint"/>
<p>....</p>
<closer>....</closer>
</div> AND <div>
<opener> ..... </opener>
<p>......</p>
<ptr target="#landingPoint"/>
<p>....</p>
<p>....</p>
<closer>...</closer>
<milestone xml:id="landingPoint"/>
</div> (And lots of other variations of that, so long as the So my question for us: Is this what we want to allow for in the TEI with our new Are we prepared for handling the possibility of ordered sequences within interleave? |
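The partial ordering described above can be checked mechanically. The sketch below assumes two interleave operands reconstructed from the prose of this comment (`opener, p+, closer` and `ptr, milestone`); it is not the original RelaxNG example, and `is_interleaving` is a hypothetical helper. It relies on the operands using disjoint element names, in which case a child sequence is a valid interleaving exactly when its projection onto each operand's names matches that operand's own ordered pattern.

```python
import re


def is_interleaving(children, operands):
    """Check children against an interleave of ordered operands.

    operands is a list of (names, regex) pairs; each regex is matched
    against the space-joined children that belong to that operand.
    Assumption: operand name sets are disjoint.
    """
    name_sets = [names for names, _ in operands]
    for i, first in enumerate(name_sets):
        for second in name_sets[i + 1:]:
            if first & second:
                raise ValueError("interleave operands share element names")
    allowed = set().union(*name_sets)
    if any(name not in allowed for name in children):
        return False
    for names, pattern in operands:
        # Keep only this operand's children, in document order.
        projection = " ".join(n for n in children if n in names)
        if re.fullmatch(pattern, projection) is None:
            return False
    return True


OPERANDS = [
    ({"opener", "p", "closer"}, r"opener( p)+ closer"),
    ({"ptr", "milestone"}, r"ptr milestone"),
]

first = ["opener", "ptr", "p", "milestone", "p", "closer"]
second = ["opener", "p", "ptr", "p", "p", "closer", "milestone"]
swapped = ["opener", "milestone", "p", "ptr", "closer"]
print(is_interleaving(first, OPERANDS))
print(is_interleaving(second, OPERANDS))
print(is_interleaving(swapped, OPERANDS))
```

The first two lists correspond to the two `<div>` examples above and are accepted; the third puts `milestone` before `ptr`, so the second operand's internal order is broken and the check fails.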
Here's my interleave Relax NG and XML file experiment in a zipped folder in case useful. |
As coincidence would have it, at Balisage 2025, Ronald Haentjens Dekker and I will be presenting, together with two other colleagues, on, among other things, Interleave. We are all probably familiar with the ability of the Relax NG Interleave operator ( If the TEI wants to support partially ordered content, Interleave is a useful term because it is already established (even if poorly understood) in that meaning within Relax NG. If the TEI wants to support unordered but not truly interleaved content, it might be clearer to call it unordered. For what it's worth, Elisa mentioned that the term "bag" had been discussed, and that's well established in computational linguistics for unordered collections that may include repetitions. The (non-TEI) example in the current version of my Balisage paper, which was the starting point for Elisa's example above, is: <root>
<section>
<header>Hi, I'm a header!</header>
<annotation-start id="0001"/>
<p>Hi</p>
<ab>Bye</ab>
<ab>Hi again</ab>
<annotation-end id="0001"/>
<p>Bye again</p>
</section>
</root>
This has some limitations, but the essential detail is that the header must come first in the first argument to the interleave operator and the start and end annotation tags must occur in ordered pairs. This, then, is partially ordered, rather than unordered, content. I hope this example is helpful. |
+1 for true interleave from me. |
@ebeshero This suggests to me that our original idea for So on balance, I would prefer to retreat to a proposal for an element |
For what it's worth, one constraint of Relax NG Interleave is that the components of two Interleave operands must be distinct. That is:
is not permitted. |
@djbpitt exactly; it's easy to imagine that a change in P5 that added an element to a macro would have unpredictable effects for ODD-writers who had used those macros in interleave contexts. |
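The concern about macros can be illustrated with a small overlap check that expands references before comparing operands. Everything named here is an assumption for illustration: `MACROS` is a made-up table (its contents are not taken from P5), and `expand` and `overlapping_names` are hypothetical helpers rather than Stylesheets functions.

```python
from itertools import combinations

# Hypothetical macro table: macro name -> element names it expands to.
MACROS = {"macro.example": {"hi", "ptr"}}


def expand(operand):
    """Replace macro references in an operand with the names they stand for."""
    names = set()
    for ref in operand:
        names |= MACROS.get(ref, {ref})
    return names


def overlapping_names(operands):
    """Return element names that occur in more than one interleave operand."""
    expanded = [expand(op) for op in operands]
    clashes = set()
    for first, second in combinations(expanded, 2):
        clashes |= first & second
    return clashes


print(overlapping_names([["opener", "p"], ["ptr", "milestone"]]))
print(overlapping_names([["macro.example"], ["p", "ptr"]]))
```

The second call reports `ptr` only after the macro is expanded, which is why adding an element to a macro could silently invalidate an interleave written against the earlier definition.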
I will reiterate my objection that if this thing will get turned into an |
So, let me summarize where we appear to be:
I think that if our |
[Using the nomenclature
one
IIRC, the construct we are replacing (
two
Something I think I failed to mention above:¹ Since RELAX NG does not have a mechanism for “each of my children (in any order without interleaving)”, we do not have a mechanism to produce a schema that would enforce a
That was easy, only 2 alternatives. Three tokens is not particularly hard:
That is not too bad either, you can verify by eye that it is correct. But an expression with five tokens is somewhat more cumbersome:
That requires 120 alternatives, which gets quite tedious to proofread. But that is not the problem. The problem is the number of alternatives goes up factorially. (As evidence, note the number of alternatives in each of the above: 2! = 2, 3! = 6, 5! = 120.) Thus if you had 10 tokens there would be 3,628,800 alternatives. Nobody could check that manually, but one can imagine it would still work. But an expression of 15 tokens would require 1,307,674,368,000 alternatives, i.e. 1.3 trillion. So you cannot even fit the expression in memory, even if you have a few hundred terabytes (or tebibytes) of RAM. Of course, there are other ways we could conceive of expressing unordered content. E.g. <tei:bag>
<tei:elementRef key="a"/>
<tei:elementRef key="b"/>
<tei:elementRef key="c"/>
<tei:elementRef key="d"/>
<tei:elementRef key="e"/>
</tei:bag> might resolve down to
plus <sch:assert test="count( tei:a ) eq 1
and
count( tei:b ) eq 1
and
count( tei:c ) eq 1
and
count( tei:d ) eq 1
and
count( tei:e ) eq 1
">A <sch:value-of select="name(.)"/> element must have only 1 each
of <a>, <b>, <c>, <d>, and <e> children (in any order).
</sch:assert> Or it could all be done in Schematron (by adding
conclusion
I think the arguments in favor of using
notes
¹ Or at least, failed to mention here on this ticket; I have mentioned this on other tickets, and I may have said it aloud at a meeting, I do not recall; it has been years. |
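The factorial growth described in this comment is easy to reproduce. The sketch below enumerates every ordering for a small token list and uses `math.factorial` for the larger counts; `orderings` is a hypothetical helper, and the space-joined strings merely stand in for the alternatives a generated schema would have to list.

```python
from itertools import permutations
from math import factorial


def orderings(tokens):
    """Return every ordering of the tokens as a space-joined string."""
    return [" ".join(p) for p in permutations(tokens)]


# Small cases can be listed outright.
print(orderings(["a", "b"]))
print(len(orderings(["a", "b", "c"])))

# Larger cases are only counted, never materialized.
for n in (2, 3, 5, 10, 15):
    print(n, factorial(n))
```

Enumerating is only practical for a handful of tokens, which is the point being made above: an expanded "every ordering as an alternative" schema is not a workable way to express unordered content.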
Your summary does not match my recollections, partially supported by re-reading the threads on this issue.
I agree completely.
I disagree completely. (Although I do think that for those PureODD constructs that cannot be enforced by an XSD, whatever does happen should be thoroughly and loudly documented. And yes, this may have to be vague in the Guidelines but should be very precise in the documentation of the Stylesheets.) |
I think I persuaded myself that RNG interleave is helpful for TEI use cases to allow for certain kinds of encoding that can be meshed while still preserving required sequences—because it allows them to be distributed across content models. See the files attached earlier in this thread for the kinds of uses that might become realistic examples. But RNG interleave wasn't quite what I expected, and if people try interleaving patterns rather than just elements, they may find themselves surprised unless we provide them good guidance and examples of what happens with validation. |
Currently @preserveOrder on <sequence> is an optional attribute and the spec states: "if true, indicates that the order in which component elements of a sequence appear in a document must correspond to the order in which they are given in the content model." However, the default behaviour for <sequence> is that order is preserved unless @preserveOrder is false, but the current description makes it seem that the opposite is true (i.e. that order doesn't matter by default); plus, setting it to false hasn't worked since at least 2017 per TEIC/Stylesheets#241.
Since @preserveOrder is implicitly true, I think that it should gain a default value of true so that it states explicitly what's already happening; I would also suggest that the description be changed to reflect that "true" is default and also outline what happens when its value is "false".