Create an interleave element #2154

Open
joeytakeda opened this issue May 22, 2021 · 29 comments · May be fixed by #2538
Assignees
Labels
CouncilResponsibility Status: Pending pending action described in a comment, to return to discussion before further action will be taken TEI: ODD

Comments

@joeytakeda
Contributor

joeytakeda commented May 22, 2021

Currently @preserveOrder on <sequence> is an optional attribute and the spec states: "if true, indicates that the order in which component elements of a sequence appear in a document must correspond to the order in which they are given in the content model." However, the default behaviour for <sequence> is that order is preserved unless @preserveOrder is false, while the current description makes it seem that the opposite is true (i.e. that order doesn't matter by default). In addition, setting it to false hasn't worked since at least 2017, per TEIC/Stylesheets#241.

Since @preserveOrder is implicitly true, I think that it should gain a default value of true so that it states explicitly what's already happening; I would also suggest that the description be changed to reflect that "true" is default and also outline what happens when its value is "false".

@martindholmes
Contributor

martindholmes commented Jun 18, 2021

@sydb and I believe that:

  • The word sequence MEANS ordered, so we should dispense with @preserveOrder completely;
  • Instead, we should introduce a new <bag> element which would do the same job: in other words, specify that all child items must be present in the number(s) assigned through their @minOccurs and @maxOccurs attributes, but the order of them is not significant.

Advantages:

  • The current processing of <sequence> can be left alone, and its semantics will match its name.
  • Processing of <bag> can be implemented separately and will be easier to maintain.

Disadvantages:

  • There will be some overlap between <bag> and <alternate>, depending on how @minOccurs and @maxOccurs are configured.

@sydb
Member

sydb commented Jun 18, 2021

Note that we chose the name <bag> based on the definitions used in the 2nd paragraph of 18.7 Collections as Complex Feature Values. (Yes, value collections are quite different from content models, but the underlying idea that a bag is an unordered collection with duplicates seems to match what we are doing, here.)

Thus:

    <alternate minOccurs="1" maxOccurs="1">
      <bag minOccurs="1" maxOccurs="1">
	<elementRef key="sic" minOccurs="1" maxOccurs="1"/>
	<elementRef key="corr" minOccurs="1" maxOccurs="1"/>
      </bag>
      <bag minOccurs="1" maxOccurs="1">
	<elementRef key="abbr" minOccurs="1" maxOccurs="1"/>
	<elementRef key="expan" minOccurs="1" maxOccurs="1"/>
      </bag>
      <bag minOccurs="1" maxOccurs="1">
	<elementRef key="orig" minOccurs="1" maxOccurs="1"/>
	<elementRef key="reg" minOccurs="1" maxOccurs="1"/>
      </bag>
    </alternate>

would be a perfectly reasonable customized content model for <choice>.
The following content model requires that there be 2 <three> elements.

    <bag>
      <elementRef key="one"/>
      <elementRef key="two"/>
      <elementRef key="five"/>
      <elementRef key="three"/>
      <elementRef key="three"/>
    </bag>

and thus is precisely equivalent to

    <bag>
      <elementRef key="one" minOccurs="1" maxOccurs="1"/>
      <elementRef key="two" minOccurs="1" maxOccurs="1"/>
      <elementRef key="five" minOccurs="1" maxOccurs="1"/>
      <elementRef key="three" minOccurs="2" maxOccurs="2"/>
    </bag>

What remains to be seen is how well this maps to the RELAX NG <interleave> (aka, the & connector). My instinct is that it maps perfectly well when all the children of <bag> are <elementRef>s, but may become problematic when there are child <classRef>s or <macroRef>s.
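
To make the proposed <bag> semantics concrete, here is a minimal Python sketch (not part of any TEI tooling) that checks a list of child element names against occurrence bounds while ignoring order. The function name `satisfies_bag` and the dictionary encoding of the bag are assumptions made purely for illustration.

```python
from collections import Counter

def satisfies_bag(children, spec):
    """Return True if `children` (a list of element names) satisfies `spec`.

    `spec` maps element name -> (min_occurs, max_occurs); this encoding is a
    hypothetical stand-in for <elementRef> children of a <bag>.
    Order of the children is deliberately ignored.
    """
    counts = Counter(children)
    # A child that is not a member of the bag makes the content invalid.
    if any(name not in spec for name in counts):
        return False
    # Every member must occur within its bounds (absent members count as 0).
    return all(lo <= counts[name] <= hi for name, (lo, hi) in spec.items())

# The second <bag> above: one, two, five once each; three exactly twice.
spec = {"one": (1, 1), "two": (1, 1), "five": (1, 1), "three": (2, 2)}

print(satisfies_bag(["three", "one", "five", "three", "two"], spec))
print(satisfies_bag(["one", "two", "five", "three"], spec))
```

The first call succeeds because only the counts matter; the second fails because <three> occurs once rather than twice.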

@ebeshero
Member

ebeshero commented Jun 18, 2021

Hmmm. My Relax NG antennae are a little skittish here. Can’t <alternate> handle this successfully if we just get rid of @preserveOrder? I’m wondering if a new element is really necessary. @sydb @martindholmes

@martindholmes
Contributor

@ebeshero <alternate> has a really confusing name, because it's both a noun and a verb; the noun better describes its function (either/or). I would love to replace <alternate> with <choice>, or <either>.

@sydb
Member

sydb commented Jun 18, 2021

Right. Which is to say that <alternate> means “only one of my children” and <bag> means “one of each of my children”. So a <bag> with only one child would mean exactly the same as an <alternate> with the same child. (But at least in the case of <alternate> that would be invalid: it must have 2 or more children.)

So the semantics are different enough that a fully fledged schema language should have both capabilities. And we have tried to provide them with @preserveOrder, but as @joeytakeda has pointed out, we didn’t quite get it right, and as @martindholmes and I point out, the naming is screwy, anyway.

We do not actually use the “one of each of my children” semantics in the Guidelines at all. (In large part because XML DTDs don’t have this concept. (SGML DTDs did.)) But we should provide the capability to ODD writers. (We thought it was necessary when we added @preserveOrder.)

@ebeshero
Member

Thanks for explaining the distinction! I was thinking of <bag> as allowing zero or more of a set of child elements in any order, and imagining that could be done with <alternate>.

I like the flexibility and constraint of this <bag> idea, if the default is, as you say, one of each of these. I think an ODD customization might also want to make some of the members of the bag optional by setting @minOccurs to zero.

@lb42
Member

lb42 commented Jun 20, 2021

I am cautiously positive about the proposal to introduce <bag> and remove @preserveOrder. A wrinkle that needs to be resolved however is whether or not the bagginess goes all the way down, i.e. is

<bag>
<elementRef key="one" maxOccurs="1"/>
<elementRef key="two" maxOccurs="2"/>
</bag>

satisfied by both of the following, or by only the first?

<two/><two/><one/>
<two/><one/><two/>

@martindholmes
Contributor

@lb42 I would say both are OK. My convoluted logic is that if you specify minOccurs = "0" on something, then you can have the absence of that element anywhere in the container. :-) So if you have two of something, either can be anywhere. @sydb, @joeytakeda?

@lb42
Member

lb42 commented Jun 20, 2021

Arguing from the case where absence is permissible anywhere doesn't really persuade me. Absence is not the same as presence: for one thing, two absences are not distinguishable from one!
However, looking up the definitions of bags and sets at https://cs.appstate.edu/~dap/classes/1100/sect2_2.html I learn that "Two bags A and B are equal if the number of occurrences of each element in A is the same as in B."
[a, b, c, c] = [b, c, a, c] = [c, c, b, a]
So that supports your interpretation. I don't like it, but I'll have to accept it!

@sydb
Member

sydb commented Jun 22, 2021

Thank you for that research, @lb42. We don’t have to accept that definition, of course. We could decide that a new “one of each” element that has a child with @maxOccurs > 1 requires they (the elements that satisfy those occurrences) be adjacent. But we shouldn’t call it (the new “one of each” element) a <bag>, then.

However, the obvious representation of our new toy in RELAX NG is <interleave>. Thus I imagine the RNC that corresponds to

<bag>
  <elementRef key="one" minOccurs="1" maxOccurs="1"/>
  <elementRef key="two" minOccurs="1" maxOccurs="2"/>
</bag>

would be ( one & ( two, two? ) ), which permits both <two/><two/><one/> and <two/><one/><two/>.
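
The claim that both orderings are permitted can be checked against the definition of interleave itself: a sequence matches `p1 & p2` if its items can be split, preserving document order, into one subsequence matching p1 and another matching p2. The following is a brute-force Python sketch of that definition for small, finite operands; `is_interleaving` and the set-of-tuples encoding are illustrative assumptions, not how any RELAX NG validator works.

```python
from itertools import product

def is_interleaving(children, operands):
    """Definition-based interleave check for finite operand languages.

    `operands` is a list of sets of tuples; each set enumerates the sequences
    one operand accepts. Every way of assigning each child to an operand is
    tried, so this is only practical for very short inputs.
    """
    n = len(operands)
    for assignment in product(range(n), repeat=len(children)):
        # Project the children onto each operand, keeping document order.
        parts = [
            tuple(c for c, k in zip(children, assignment) if k == i)
            for i in range(n)
        ]
        if all(part in lang for part, lang in zip(parts, operands)):
            return True
    return False

one = {("one",)}                   # one
two = {("two",), ("two", "two")}   # two, two?

print(is_interleaving(["two", "two", "one"], [one, two]))
print(is_interleaving(["two", "one", "two"], [one, two]))
print(is_interleaving(["two", "two", "two", "one"], [one, two]))
```

The first two calls succeed and the third fails, because `( two, two? )` accepts at most two <two> elements.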

@lb42
Member

lb42 commented Jun 22, 2021

This reminds me that another possible name for this new element might be <interleave> of course. Which is what was originally proposed back in 2015 or thereabouts. (a search through the Council list archives for the word "interleave" is quite instructive)

@hcayless
Member

Commenting to bump this back to people's attention, and also because:

I've been looking into this a bit, and I'm not sure "bag" is a safe name for what's going on. RelaxNG <interleave> seems more like a set (no order, no duplicate children). You can't have duplicate element defs/refs in an RNG interleave (though you can use, e.g., <zeroOrMore> patterns, so that a content model using it would permit multiple children with the same element name).

"Bag" (to my mind) implies "one or more of each child pattern in any order," which is not what RNG interleave does. It means (I think) "one of each child pattern, in any order, and patterns may not overlap". "Set" might be ok, though perhaps it risks causing confusion. Or, as @lb42 suggests, just use "interleave".

One other wrinkle that occurs to me: checking the validity of the content of this thing, whatever we call it, will entail expanding references, since you have to ensure there's no overlap.

@ebeshero
Member

ebeshero commented Mar 5, 2023

After some discussion on the TEI list (introduced by Daniel Schopper here: https://listserv.brown.edu/cgi-bin/wa?A2=ind2303&L=TEI-L&P=741 and continuing) we're liking the idea of introducing <interleave>.

@hcayless
Member

Since this isn't likely to be completed in time for the next release, as a band aid I'm going to fix the description on @preserveOrder to read:

if false, indicates that component elements of a sequence may occur in any order.

hcayless added a commit that referenced this issue Mar 21, 2023
@ebeshero
Member

ebeshero commented May 8, 2023

Council decides on 2023-05-08 F2f that we should:

  • Rename this issue to say we should create interleave element (as per Create an interleave element #2154)
  • Add a Stylesheets issue for converting tei:interleave to rng:interleave
  • Deprecate <tei:sequence preserveOrder="false">
    Allow any of msContents/physDesc/history/additional in any order, any number of times, but then create a Schematron warning suggesting that each of these be used only once.

@ebeshero ebeshero changed the title Description for sequence/@preserveOrder is misleading Create an interleave element: Description for sequence/@preserveOrder is misleading May 8, 2023
@ebeshero ebeshero changed the title Create an interleave element: Description for sequence/@preserveOrder is misleading Create an interleave element May 8, 2023
@trishaoconnor
Contributor

Created stylesheet issue for converting tei:interleave to rng:interleave: https://github.com/TEIC/Stylesheets/issues/609

joeytakeda added a commit that referenced this issue Mar 17, 2024
* New elementSpec and updated TD
* Still (possibly) need to address hcayless' point about validity of the construct (i.e. no overlap)
@joeytakeda
Contributor Author

To summarize work so far:

I haven't yet addressed @hcayless 's point about validation though. My inclination is that this is a job for the ODD processor (though we could add some schematron now to catch simple cases), but we should probably take a look at the RelaxNG spec for some guidance as to what we can catch early on.

@ebeshero ebeshero added this to the Guidelines 4.9.0 milestone Jul 6, 2024
@raffazizzi raffazizzi added Status: Pending pending action described in a comment, to return to discussion before further action will be taken and removed Status: Needs Discussion Status: Go labels Sep 24, 2024
@ebeshero
Member

ebeshero commented May 21, 2025

@joeytakeda @sydb @martindholmes and all: I'm consulting with @djbpitt who is currently working on a Balisage paper on interleave complexities. From our conversation and our modeling of interleave in simple RelaxNG just now, we are concerned whether "interleave" is really and truly the name we want for this element, if our simple goal is just to permit unordered content.

What if one or more of the components of interleave requires a strict order of its contents? Interleave permits a partial ordering, and that can be useful:

Say we wanted to interleave ptr and milestone elements, keeping them in a strict order, around opener, p+, closer elements that we expect in an order. That would be a partially ordered context for interleave. Here's an example that's just inspired by the TEI, and I'm just writing it in RelaxNG compact syntax for simplicity:

start = \div
\div = element div {letterContent & annotation}
letterContent = opener, p+, closer
opener = element opener {text}
p = element p {text}
closer = element closer {text}

annotation = ptr, milestone
ptr = element ptr {target, empty}
milestone = element milestone {xmlid, empty}
target = attribute target {xsd:anyURI}
xmlid = attribute xml:id {xsd:ID}

With this, you can see how interleave handles contents that are themselves ordered. I wasn't fully aware of this before we sat down and looked at it, but the result of interleaving these ordered things allows them to mesh, so long as we preserve the ordered sequence of each component of interleave. So with interleave we'd allow something like:

<div>
      <opener> ..... </opener>
      <ptr target="#landingPoint"/>
     <p>......</p>
      <milestone xml:id="landingPoint"/>
     <p>....</p>
      <closer>....</closer>
</div>

AND

<div>
      <opener> ..... </opener>
     <p>......</p>
      <ptr target="#landingPoint"/>
     <p>....</p>
     <p>....</p>
      <closer>...</closer>
  <milestone xml:id="landingPoint"/>
</div>

(And lots of other variations of that, so long as the <ptr> element precedes the <milestone>, and as long as <opener> precedes <p> elements and <closer>. The patterns mesh or interleave so long as they don't disturb the mandated sequence of the two component structures.)
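
The partial ordering described here can be sketched in Python. Because RELAX NG requires that the operands of an interleave share no element names, each child belongs to exactly one operand, so it is enough to project the children onto each operand's names and check every projection against that operand's own ordered pattern. The regular expressions below are a hand-written stand-in for the two RNC patterns above; `OPERANDS` and `matches_interleave` are illustrative names, not an existing API.

```python
import re

# Each operand: (element names it owns, regex over space-terminated names).
# These mirror letterContent = opener, p+, closer and annotation = ptr, milestone.
OPERANDS = [
    ({"opener", "p", "closer"}, re.compile(r"opener (p )+closer ")),
    ({"ptr", "milestone"}, re.compile(r"ptr milestone ")),
]

def matches_interleave(children, operands=OPERANDS):
    """Check child element names against an interleave of ordered operands.

    Assumes the operands' name sets are disjoint, so projecting the children
    onto each operand is unambiguous.
    """
    owned = set().union(*(names for names, _ in operands))
    if any(child not in owned for child in children):
        return False
    for names, pattern in operands:
        # Keep only this operand's children, in document order.
        projection = "".join(child + " " for child in children if child in names)
        if not pattern.fullmatch(projection):
            return False
    return True

first = ["opener", "ptr", "p", "milestone", "p", "closer"]
second = ["opener", "p", "ptr", "p", "p", "closer", "milestone"]
swapped = ["opener", "milestone", "p", "ptr", "closer"]

print(matches_interleave(first), matches_interleave(second), matches_interleave(swapped))
```

The two sample <div>s above both pass, while the `swapped` list fails because <milestone> precedes <ptr>.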

So my question for us: Is this what we want to allow for in the TEI with our new <interleave> element? If the answer is, absolutely yes, then huzzah! (As I'm talking to @djbpitt, I do think that expressive capacity is likely useful, so we really do want <interleave>.) But it seems prudent to make sure we're prepared for unintended consequences / complexities.

Are we prepared for handling the possibility of ordered sequences within interleave?
Or is the only thing we want from this just the ability to express an unordered content model (with no intermingling of internal sequenced stuff)?

@ebeshero
Member

Here's my interleave Relax NG and XML file experiment in a zipped folder in case useful.

interleaveExperiment.zip

@djbpitt

djbpitt commented May 21, 2025

As coincidence would have it, at Balisage 2025, Ronald Haentjens Dekker and I will be presenting, together with two other colleagues, on, among other things, Interleave. We are all probably familiar with the ability of the Relax NG Interleave operator (&) to represent unordered content, but what is less widely known is that it actually represents partially-ordered content, of which unordered content is sort of a simplified subcategory.

If the TEI wants to support partially ordered content, Interleave is a useful term because it is already established (even if poorly understood) in that meaning within Relax NG. If the TEI wants to support unordered but not truly interleaved content, it might be clearer to call it unordered. For what it's worth, Elisa mentioned that the term "bag" had been discussed, and that's well established in computational linguistics for unordered collections that may include repetitions.

The (non-TEI) example in the current version of my Balisage paper, which was the starting point for Elisa's example above, is:

<root>
  <section>
    <header>Hi, I'm a header!</header>
    <annotation-start id="0001"/>
    <p>Hi</p>
    <ab>Bye</ab>
    <ab>Hi again</ab>
    <annotation-end id="0001"/>
    <p>Bye again</p>
  </section>
</root>

start = root
root = element root { section+ }
section =
  element section {
    (header, (p | ab)+)
    & annotation-pair*
  }
header = element header { text }
ab = element ab { text }
p = element p { text }
annotation-pair = annotation-start, annotation-end
annotation-start = element annotation-start { id, empty }
annotation-end = element annotation-end { id, empty }
id = attribute id { text }

This has some limitations, but the essential detail is that the header must come first in the first argument to the interleave operator and the start and end annotation tags must occur in ordered pairs. This, then, is partially ordered, rather than unordered, content. I hope this example is helpful.

@lb42
Member

lb42 commented May 21, 2025

+1 for true interleave from me.

@martindholmes
Contributor

@ebeshero This suggests to me that our original idea for <bag> was actually more straightforward. I believe that what the community actually wants is an element allowing for any of the child elements in any order; if larger patterns which have internal order are nested within this structure, then all sorts of complexities arise, as @hcayless and others have pointed out. For one thing, the nested patterns may themselves include elements that are also siblings to the pattern, and that would surely give rise to indeterminate content models.

So on balance, I would prefer to retreat to a proposal for an element <bag> whose only permitted children are <elementRef>s, allowing us to have unordered content models with occurrence constraints but without the pain of trying to figure out what nested patterns could mean or how they could be processed. Using <bag> instead of <interleave> would avoid causing confusion with the RNG <interleave> element (even if we would actually be rendering TEI <bag>s to RNG <interleave>s).

@djbpitt

djbpitt commented May 21, 2025

For what it's worth, one constraint of Relax NG Interleave is that the components of two Interleave operands must be distinct. That is:

(x, y, z) & (a, b, z)

is not permitted.
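
A minimal Python sketch of that distinctness check, assuming each operand has already been reduced to a flat collection of element names (in a real ODD processor, class and macro references would first have to be expanded, as noted earlier in this thread). The function name `overlapping_names` is hypothetical.

```python
from itertools import combinations

def overlapping_names(operands):
    """Return {(i, j): shared names} for every pair of operands that overlap.

    An empty result means the operands are pairwise distinct and could be
    combined with interleave.
    """
    clashes = {}
    for (i, a), (j, b) in combinations(enumerate(operands), 2):
        shared = set(a) & set(b)
        if shared:
            clashes[(i, j)] = shared
    return clashes

print(overlapping_names([("x", "y", "z"), ("a", "b", "z")]))
print(overlapping_names([("x", "y", "z"), ("a", "b", "c")]))
```

For the example above, the first call reports that operands 0 and 1 share `z`; the second returns an empty dictionary.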

@martindholmes
Contributor

@djbpitt exactly; it's easy to imagine that a change in P5 that added an element to a macro would have unpredictable effects for ODD-writers who had used those macros in interleave contexts.

@hcayless
Member

I will reiterate my objection that if this thing will get turned into an <interleave>, then it is not a bag.

@martindholmes
Contributor

So, let me summarize where we appear to be:

  1. The original request was for some mechanism (other than the less-than-ideal @preserveOrder) to easily create a content model, or a component of one, which includes elements in any order, but with occurrence constraints where required.
  2. We originally specified this as <bag>, but later switched to the idea of <interleave>, even though our intention for this element may be distinct from what RELAX NG means by <interleave>.
  3. @lb42 would like us to have true interleave as in RELAX NG, but I think this may be quite complicated to check and process, because as @hcayless points out, all patterns have to be expanded in order to know whether they overlap or not.
  4. There are (or at least there used to be) problems converting RELAX NG schemas using <interleave> to XSD.

I think that if our <interleave> is not parallel to the RNG <interleave>, then we shouldn't use that name; and I also think that we should try to avoid generating RNG patterns which can't be converted to XSD.

@sydb
Member

sydb commented May 21, 2025

[Using the nomenclature <interleave> for an element that supports partial ordering as per RELAX NG and <bag> for an element that supports unordered content as per the SGML ‘&’ connector.]

one

IIRC, the construct we are replacing (preserveOrder="false") used <interleave>. So doing something else is a bit problematic.

two

Something I think I failed to mention above:¹ Since RELAX NG does not have a mechanism for “each of my children (in any order without interleaving)”, we do not have a mechanism to produce a schema that would enforce a <bag>. It is easy enough to do for a small number of tokens (in either DTD or RELAX NG, and probably XSD):

( a & b ) = ( ( a, b ) | ( b, a ) )

That was easy, only 2 alternatives. Three tokens is not particularly hard:

( a & b & c ) = ( 
                  ( a, b, c )
                | ( a, c, b )
                | ( b, a, c )
                | ( b, c, a )
                | ( c, a, b )
                | ( c, b, a )
                )

That is not too bad either, you can verify by eye that it is correct. But an expression with five tokens is somewhat more cumbersome:

( a & b & c & d & e ) = ( 
                          ( a, b, c, d, e )
                        | ( a, b, c, e, d )
                        | ( a, b, d, c, e )
                        | ( a, b, d, e, c )
                        | ( a, b, e, c, d )
                        | ( a, b, e, d, c )

                        | ( a, c, b, d, e )
                        | ( a, c, b, e, d )
                        | ( a, c, d, b, e )
                        | ( a, c, d, e, b )
                        | ( a, c, e, b, d )
                        | ( a, c, e, d, b )

                        | ( a, d, c, b, e )
                        | ( a, d, c, e, b )
                        | ( a, d, b, c, e )
                        | ( a, d, b, e, c )
                        | ( a, d, e, c, b )
                        | ( a, d, e, b, c )

                        | ( a, e, c, d, b )
                        | ( a, e, c, b, d )
                        | ( a, e, d, c, b )
                        | ( a, e, d, b, c )
                        | ( a, e, b, c, d )
                        | ( a, e, b, d, c )

                        | ( b, a, c, d, e )
                        | ( b, a, c, e, d )
                        | ( b, a, d, c, e )
                        | ( b, a, d, e, c )
                        | ( b, a, e, c, d )
                        | ( b, a, e, d, c )

                        | ( b, c, a, d, e )
                        | ( b, c, a, e, d )
                        | ( b, c, d, a, e )
                        | ( b, c, d, e, a )
                        | ( b, c, e, a, d )
                        | ( b, c, e, d, a )

                        | ( b, d, c, a, e )
                        | ( b, d, c, e, a )
                        | ( b, d, a, c, e )
                        | ( b, d, a, e, c )
                        | ( b, d, e, c, a )
                        | ( b, d, e, a, c )

                        | ( b, e, c, d, a )
                        | ( b, e, c, a, d )
                        | ( b, e, d, c, a )
                        | ( b, e, d, a, c )
                        | ( b, e, a, c, d )
                        | ( b, e, a, d, c )

                        | ( c, b, a, d, e )
                        | ( c, b, a, e, d )
                        | ( c, b, d, a, e )
                        | ( c, b, d, e, a )
                        | ( c, b, e, a, d )
                        | ( c, b, e, d, a )

                        | ( c, a, b, d, e )
                        | ( c, a, b, e, d )
                        | ( c, a, d, b, e )
                        | ( c, a, d, e, b )
                        | ( c, a, e, b, d )
                        | ( c, a, e, d, b )

                        | ( c, d, a, b, e )
                        | ( c, d, a, e, b )
                        | ( c, d, b, a, e )
                        | ( c, d, b, e, a )
                        | ( c, d, e, a, b )
                        | ( c, d, e, b, a )

                        | ( c, e, a, d, b )
                        | ( c, e, a, b, d )
                        | ( c, e, d, a, b )
                        | ( c, e, d, b, a )
                        | ( c, e, b, a, d )
                        | ( c, e, b, d, a )

                        | ( d, b, c, a, e )
                        | ( d, b, c, e, a )
                        | ( d, b, a, c, e )
                        | ( d, b, a, e, c )
                        | ( d, b, e, c, a )
                        | ( d, b, e, a, c )

                        | ( d, c, b, a, e )
                        | ( d, c, b, e, a )
                        | ( d, c, a, b, e )
                        | ( d, c, a, e, b )
                        | ( d, c, e, b, a )
                        | ( d, c, e, a, b )

                        | ( d, a, c, b, e )
                        | ( d, a, c, e, b )
                        | ( d, a, b, c, e )
                        | ( d, a, b, e, c )
                        | ( d, a, e, c, b )
                        | ( d, a, e, b, c )

                        | ( d, e, c, a, b )
                        | ( d, e, c, b, a )
                        | ( d, e, a, c, b )
                        | ( d, e, a, b, c )
                        | ( d, e, b, c, a )
                        | ( d, e, b, a, c )

                        | ( e, b, c, d, a )
                        | ( e, b, c, a, d )
                        | ( e, b, d, c, a )
                        | ( e, b, d, a, c )
                        | ( e, b, a, c, d )
                        | ( e, b, a, d, c )

                        | ( e, c, b, d, a )
                        | ( e, c, b, a, d )
                        | ( e, c, d, b, a )
                        | ( e, c, d, a, b )
                        | ( e, c, a, b, d )
                        | ( e, c, a, d, b )

                        | ( e, d, c, b, a )
                        | ( e, d, c, a, b )
                        | ( e, d, b, c, a )
                        | ( e, d, b, a, c )
                        | ( e, d, a, c, b )
                        | ( e, d, a, b, c )

                        | ( e, a, c, d, b )
                        | ( e, a, c, b, d )
                        | ( e, a, d, c, b )
                        | ( e, a, d, b, c )
                        | ( e, a, b, c, d )
                        | ( e, a, b, d, c )

                        )

That requires 120 alternatives, which gets quite tedious to proofread. But that is not the problem. The problem is the number of alternatives goes up factorially. (As evidence, note the number of alternatives in each of the above: 2! = 2, 3! = 6, 5! = 120.) Thus if you had 10 tokens there would be 3,628,800 alternatives. No one can check that manually, but one can imagine it would still work. But an expression of 15 tokens would require 1,307,674,368,000 alternatives, i.e. 1.3 trillion. So you cannot even fit the expression in memory when you have a few hundred terabytes (or tebibytes) of RAM.
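
The growth is easy to reproduce. This short Python sketch generates the choice-of-sequences expansion for a handful of tokens and prints the number of alternatives for larger token counts; `expand_unordered` is a hypothetical helper that only emits RNC-like text for illustration.

```python
from itertools import permutations
from math import factorial

def expand_unordered(tokens):
    """Render `tokens` as an RNC-style choice of every possible ordering."""
    groups = ["( " + ", ".join(p) + " )" for p in permutations(tokens)]
    return "( " + " | ".join(groups) + " )"

print(expand_unordered("ab"))                        # two alternatives
print(len(list(permutations("abcde"))))              # alternatives for five tokens
for n in (2, 3, 5, 10, 15):
    print(n, "tokens:", factorial(n), "alternatives")
```

Only the counts are computed for 10 and 15 tokens; actually generating those expansions is exactly what becomes impractical.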

Of course, there are other ways we could conceive of expressing unordered content. E.g.

<tei:bag>
  <tei:elementRef key="a"/>
  <tei:elementRef key="b"/>
  <tei:elementRef key="c"/>
  <tei:elementRef key="d"/>
  <tei:elementRef key="e"/>
</tei:bag>

might resolve down to

( ( a | b | c | d | e ),  ( a | b | c | d | e ),  ( a | b | c | d | e ),  ( a | b | c | d | e ),  ( a | b | c | d | e ) )

plus

<sch:assert test="count( tei:a ) eq 1
                  and
                  count( tei:b ) eq 1
                  and
                  count( tei:c ) eq 1
                  and
                  count( tei:d ) eq 1
                  and
                  count( tei:e ) eq 1
                  ">A <sch:value-of select="name(.)"/> element must have only 1 each
                   of &lt;a>, &lt;b>, &lt;c>, &lt;d>, and &lt;e> children (in any order).
</sch:assert>

Or it could all be done in Schematron (by adding count( tei:* ) eq 5).
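
The same two-stage idea (a loose grammar plus count assertions) can be sketched outside a schema language. The Python below uses only the standard library; the element names `a` to `e` come from the example above and are not real TEI elements, and `check_bag` is a hypothetical function, not part of the TEI Stylesheets.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "http://www.tei-c.org/ns/1.0"
MEMBERS = ["a", "b", "c", "d", "e"]  # placeholder names from the example

def check_bag(parent_xml):
    """Two-stage check mirroring the grammar-plus-Schematron approach."""
    parent = ET.fromstring(parent_xml)
    tags = [child.tag for child in parent]
    allowed = {f"{{{TEI}}}{name}" for name in MEMBERS}
    # Stage 1 (grammar): exactly five children, each one of the members.
    if len(tags) != len(MEMBERS) or any(tag not in allowed for tag in tags):
        return False
    # Stage 2 (count assertions): each member occurs exactly once.
    counts = Counter(tags)
    return all(counts[tag] == 1 for tag in allowed)

good = f'<x xmlns="{TEI}"><c/><a/><e/><b/><d/></x>'
bad = f'<x xmlns="{TEI}"><a/><a/><b/><c/><d/></x>'
print(check_bag(good), check_bag(bad))
```

The `bad` document passes the grammar stage (five children, all members) and is rejected only by the counts, which is the part the Schematron assertion would carry.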

conclusion

I think the arguments in favor of using <tei:interleave>² to boil down to <rng:interleave> are:

  1. What we are doing is deciding on a new way to express @preserveOrder=false, which (before we started messing with it) boiled down to <rng:interleave>.
  2. There is no way to express unordered required content in DTD or RELAX NG (and I don’t think XSD, but not sure). We would have to use Schematron or write our own validator.
  3. To my knowledge, no user has ever asked for it. Yes, we can imagine cases where it might be useful, or, more precisely, where the partial ordering of <interleave> is not precisely what is desired. But I have never actually tried to write a real (rather than test) schema that cared. As (admittedly somewhat outdated) evidence, consider this quote from Eric van der Vlist’s book, section 6.10: “In theory, it's possible to define a pattern with a meaning of "unordered group" that doesn't interleave child nodes and keeps groups unaltered. ¶ This pattern doesn't exist in RELAX NG for two reasons. First, it helps keep the language as simple as possible. Also, although it is built on top of an abstract mathematical model, RELAX NG is also built on top of the experience of its authors who have wanted to focus on general usages and best practices amongst the XML community. The lack of a "unordered group with no interleaving" hasn't been reported as a real-world limitation so far.”

notes

¹ Or at least, failed to mention here on this ticket; I have mentioned this on other tickets, and I may have said it aloud at a meeting, I do not recall — it has been years.
² We could consider some other name, but that would be re-opening a debate we have already had.

@sydb
Member

sydb commented May 22, 2025

> So, let me summarize where we appear to be:

Your summary does not match my recollections, partially supported by re-reading the threads on this issue.

  1. [Original request] Right, I think.
  2. [later switched to the idea of <interleave>, even though our intention …] No, I do not think we ever had any intention that a <tei:interleave> would have semantics different from an <rng:interleave>.
  3. [<rng:interleave> will be hard to check] Yes, it will be hard, if not impossible to check. I am not sure that should slow us down at all. (Are there not lots of ways a user can generate an invalid RNG schema from PureODD?)
  4. Not sure about <rng:interleave> → XSD, but there is certainly no <rng:interleave> → DTD.

> I think that if our <interleave> is not parallel to the RNG <interleave>, then we shouldn't use that name;

I agree completely.

> we should try to avoid generating RNG patterns which can't be converted to XSD.

I disagree completely. (Although I do think that for those PureODD constructs that cannot be enforced by an XSD, whatever does happen should be thoroughly and loudly documented. And yes, this may have to be vague in the Guidelines but should be very precise in the documentation of the Stylesheets.)

@ebeshero
Member

ebeshero commented May 22, 2025

I think I persuaded myself that RNG interleave is helpful for TEI use cases to allow for certain kinds of encoding that can be meshed while still preserving required sequences—because it allows them to be distributed across content models. See the files attached earlier in this thread for the kinds of uses that might become realistic examples.

But RNG interleave wasn't quite what I expected, and if people try interleaving patterns rather than just elements, they may find themselves surprised unless we provide them good guidance and examples of what happens with validation.

9 participants