status of polyglot html/xml #17

@ThomasWaldmann

Current Status of HTML5 Polyglot Markup (from Gemini AI)

In 2026, Polyglot Markup, meaning documents that conform to both the HTML5 and the XML (XHTML) syntax at the same time, is considered a retired technical curiosity. Writing polyglot code remains technically possible, but it is no longer an active standard or a recommended practice for modern web development.


1. Official Standard Status

The W3C document Polyglot Markup: A robust profile of the HTML5 vocabulary, published as a Working Group Note, was officially retired on March 27, 2018.

  • No Active Development: Neither the W3C nor the WHATWG maintains guidelines for polyglot documents.
  • The Living Standard: The industry now follows the WHATWG HTML Living Standard, which treats the HTML syntax and the XML (XHTML) syntax as two distinct serializations. The goal of a "unified" syntax has been abandoned in favor of choosing the serialization that fits your specific toolchain.

2. Comparison of Syntax Constraints

Maintaining a polyglot document requires adhering to the "lowest common denominator" of both specs. This creates a very restrictive environment:

| Feature | Polyglot Requirement (Strict XML) | Modern HTML5 (Living Standard) |
| --- | --- | --- |
| Self-closing tags | Must use <br /> or <hr />. | <br> is standard; the / is ignored. |
| Attribute Minification | Forbidden. Must use checked="checked". | checked is preferred and valid. |
| Entity References | Only 5 allowed (&amp;, &lt;, &gt;, &quot;, &apos;). | Thousands of named entities (e.g., &copy;). |
| Namespaces | Must explicitly declare xmlns="http://www.w3.org/1999/xhtml". | Implicitly handled by the parser. |
| Case Sensitivity | Tags and attributes must be lowercase. | Case-insensitive (though lowercase is common). |
| Void Elements | Must use the self-closing form (e.g., <img />); an explicit end tag such as </br> is mis-parsed by HTML parsers. | Never take an end tag (<img> alone is valid). |
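
The XML column of this comparison can be checked mechanically. The following is a minimal sketch using only Python's standard library; the helper name `is_well_formed_xhtml` is hypothetical, and passing it is a necessary condition for polyglot markup, not a full conformance check.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_well_formed_xhtml(markup: str) -> bool:
    """Return True if `markup` parses as XML and its root is <html> in the XHTML namespace.

    Hypothetical helper: it only tests XML well-formedness and the namespace
    declaration, which is a subset of what polyglot markup requires.
    """
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return False
    return root.tag == f"{{{XHTML_NS}}}html"


def wrap(body: str) -> str:
    """Wrap a body fragment in a namespaced XHTML skeleton."""
    return (
        f'<html xmlns="{XHTML_NS}"><head><title>t</title></head>'
        f"<body>{body}</body></html>"
    )


# Satisfies the left-hand column: self-closed void elements, full attribute
# syntax, and only predefined XML entities.
ok = wrap('<p>a &amp; b<br /><input type="checkbox" checked="checked" /></p>')

# Each of these is fine for an HTML parser but not well-formed XML.
bare_br = wrap("<p>a<br>b</p>")                           # void element left open
minimized = wrap('<p><input type="checkbox" checked /></p>')  # attribute without a value
html_entity = wrap("<p>&copy; 2026</p>")                  # entity undefined in plain XML
no_namespace = "<html><head><title>t</title></head><body></body></html>"

for name, doc in [("ok", ok), ("bare_br", bare_br), ("minimized", minimized),
                  ("html_entity", html_entity), ("no_namespace", no_namespace)]:
    print(name, is_well_formed_xhtml(doc))
```

Only the first document passes. The last one is well-formed XML but is rejected because the root element is not in the XHTML namespace.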

3. Why It Is No Longer Popular

  • Fragility: If you serve a polyglot document as application/xhtml+xml, the browser uses its XML parser, and any well-formedness error (such as an unclosed tag) makes it stop rendering and display an error. The HTML parser used for text/html is "error-tolerant" and recovers from the same mistake.
  • Modern Tooling: In the past, developers used polyglot markup so they could use XML tools (like XSLT) to process web pages. Today, Python libraries like lxml (with html5lib) or BeautifulSoup parse standard HTML well enough that XML well-formedness is rarely needed for that purpose.
  • JavaScript Compatibility: Modern DOM manipulation and frameworks often generate code that violates strict XML rules, making it difficult to maintain polyglot integrity in dynamic apps.
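
The difference between strict and error-tolerant parsing can be illustrated with the standard library alone. This is a rough sketch, not a model of browser behavior: `html.parser` is a lenient tokenizer rather than a full HTML5 tree builder, and the names `TagCollector` and `xml_accepts` are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def xml_accepts(markup: str) -> bool:
    """Return True if a strict XML parser accepts `markup`."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# <p> is never closed and <br> is not self-closed.
broken = "<html><body><p>unclosed paragraph<br>next line</body></html>"

collector = TagCollector()
collector.feed(broken)
collector.close()

print(collector.start_tags)  # the HTML tokenizer still reports all four tags
print(xml_accepts(broken))   # the XML parser rejects the document outright
```

The strict parser fails on the whole document, while the lenient one keeps going. That is the trade-off the fragility point describes.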

4. Remaining Use Cases

You might still encounter "polyglot-style" coding in these specific areas:

  • EPUB/E-books: The EPUB 3 standard is based on XHTML, requiring strict XML compliance.
  • Legacy Pipelines: Systems that rely on XSLT for document generation.
  • DevOps/System Scripts: When generating quick reports from a script where you want the output to be easily greppable or parsable by simple XML tools.
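
For the script-generated report case, one possible approach is to build the page with an XML library so the output is well-formed by construction. The sketch below assumes Python's `xml.etree.ElementTree`; the report layout, the sample rows, and the helpers `q` and `build_report` are all hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements with a default namespace instead of an ns0: prefix.
# Note that this registration is process-wide.
ET.register_namespace("", XHTML_NS)


def q(name: str) -> str:
    """Qualify a tag name with the XHTML namespace."""
    return f"{{{XHTML_NS}}}{name}"


def build_report(rows):
    """Build a small XHTML table from (host, percent_used) pairs."""
    html = ET.Element(q("html"))
    head = ET.SubElement(html, q("head"))
    ET.SubElement(head, q("title")).text = "Disk report"
    body = ET.SubElement(html, q("body"))
    table = ET.SubElement(body, q("table"))
    for host, used in rows:
        tr = ET.SubElement(table, q("tr"))
        ET.SubElement(tr, q("td")).text = host
        ET.SubElement(tr, q("td")).text = f"{used}%"
    # Text content is escaped automatically on serialization.
    return ET.tostring(html, encoding="unicode")


report = build_report([("alpha", 71), ("beta", 93)])

# The same string can be read back with a plain XML parser.
parsed = ET.fromstring(report)
hosts = [tr[0].text for tr in parsed.iter(q("tr"))]
print(hosts)
```

One caveat: ElementTree writes any element with no text and no children in the short form (for example `<td />`). Polyglot markup reserves that form for void elements, so this sketch gives every cell some text.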
