Mathematical typesetting: MathML in, accessible Formula tags out
The boxesandglue stack can now typeset mathematics. An OpenType-MATH
engine renders presentation MathML to TeX-quality output, and, when
the document output is set to PDF/UA, each formula is tagged as an accessible
Formula structure element.
Write MathML, get a formula
Embed MathML directly in your HTML; htmlbag picks
up <math> as foreign content and typesets it through the engine. No
external converter is needed:
<style>
@font-face { font-family: "LM Math"; src: url("latinmodern-math.otf"); }
math { font-family: "LM Math"; font-size: 12pt; }
</style>
<p>The Pythagorean relation
<math>
<msup><mi>a</mi><mn>2</mn></msup><mo>+</mo>
<msup><mi>b</mi><mn>2</mn></msup><mo>=</mo>
<msup><mi>c</mi><mn>2</mn></msup>
</math>
holds for every right triangle.</p>
The engine implements the OpenType MATH table: real per-glyph
extents, style-aware fraction and script shifts, stretchy delimiters
via MathVariants/GlyphAssembly, and big-operator limits in display
mode. Identifier styling follows the MathML convention: a
single-character <mi> defaults to math-italic, a multi-character one
(a function name like sin) stays upright.
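The identifier rule can be seen in a fragment as small as the one below. It is an illustrative sketch rather than one of the showcase files, and it uses only <mi> inside <math>.

```html
<math>
  <!-- multi-character identifier: stays upright -->
  <mi>sin</mi>
  <!-- single-character identifier: math-italic by default -->
  <mi>x</mi>
</math>
```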
Supported presentation elements: <mrow>, <mi>, <mn>, <mo>,
<mfrac> (and binomials via linethickness="0"), <msqrt>,
<mroot>, <msup>, <msub>, <msubsup>, <munder>, <mover>,
<munderover>, plus the transparent containers. See
Mathematics (MathML) for the full list and the
current scope boundary (matrices, <mtext> and extended mathvariant
values are future work).
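As a sketch of how the supported elements combine, the fragment below writes the quadratic formula with <mfrac>, <msqrt> and <msup>, followed by a binomial coefficient built from <mfrac linethickness="0">. The markup is illustrative and not copied from the showcase. The parentheses around the binomial are plain <mo> tokens, on the assumption that the engine treats them as ordinary stretchy delimiters.

```html
<p>Quadratic formula:
<math>
  <mi>x</mi><mo>=</mo>
  <mfrac>
    <mrow>
      <!-- &#x2212; is the minus sign, &#xB1; is plus-minus -->
      <mo>&#x2212;</mo><mi>b</mi><mo>&#xB1;</mo>
      <msqrt>
        <msup><mi>b</mi><mn>2</mn></msup>
        <mo>&#x2212;</mo>
        <mn>4</mn><mi>a</mi><mi>c</mi>
      </msqrt>
    </mrow>
    <mrow><mn>2</mn><mi>a</mi></mrow>
  </mfrac>
</math>
</p>
<p>Binomial coefficient:
<math>
  <mrow>
    <mo>(</mo>
    <!-- linethickness="0" suppresses the fraction bar -->
    <mfrac linethickness="0"><mi>n</mi><mi>k</mi></mfrac>
    <mo>)</mo>
  </mrow>
</math>
</p>
```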
Accessible by default under PDF/UA
Set the conformance level and every formula is tagged automatically:
| Workflow | Enable with |
|---|---|
| HTML | <meta name="pdf-format" content="PDF/UA-2"> in the head |
| Markdown frontmatter | format: PDF/UA-2 |
| glu CLI | --format PDF/UA-2 |
Each <math> becomes a Formula structure element carrying:
- /Alt: a plain-text fallback (from the MathML alttext attribute, or the token content concatenated in order).
- /AF (under PDF/UA-2): the MathML source embedded as an associated file (application/mathml+xml, /AFRelationship /Supplement), so a MathML-aware reader can speak the math semantically.
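A fallback built from the token content alone carries the symbols but not the structure: the 2 is no longer marked as an exponent. Where that matters, an explicit alttext gives /Alt a readable sentence instead. The wording of the attribute value below is only an example.

```html
<!-- alttext supplies the /Alt text; without it the tokens are concatenated -->
<math alttext="a squared plus b squared equals c squared">
  <msup><mi>a</mi><mn>2</mn></msup><mo>+</mo>
  <msup><mi>b</mi><mn>2</mn></msup><mo>=</mo>
  <msup><mi>c</mi><mn>2</mn></msup>
</math>
```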
Because a formula sits inside a sentence, the renderer splits the
paragraph’s marked content so the reading order is text · formula ·
text, not text · text · formula. This rests on a new reading-order
/K serialization in the structure tree that interleaves marked
content, object references, and child elements by document position.
Showcase
boxesandglue-examples/glu/html/mathml
renders five formulas: Pythagoras, the quadratic formula, a
Gauss-summation in display mode, an indexed n-th root, and a
derivative (all as PDF/UA-2). pdfa11y reports Verdict: PASS, including
the dedicated MathML checks (the Formula has a math representation, and
the associated file is well-formed MathML with the right subtype and
relationship).
glu boxesandglue-examples/glu/html/mathml/mathml.html
pdfa11y mathml.pdf # Verdict: PASS
New HTML metadata: <meta name="pdf-format">
HTML documents have no frontmatter, so requesting a conformance level
previously meant passing --format on every run. HTML documents can
now declare it inline with <meta name="pdf-format" content="PDF/UA-2">
(mirroring the Markdown format: key), and HTML mode now also derives
the document title from the <title> element. The glu HTML mode page
has the details.
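Put together, a complete input file might look like the sketch below. The meta element and the use of <title> come from the description above. The title text, the lang attribute, the sample formula, and the font file location are placeholders chosen for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- HTML mode derives the document title from this element -->
  <title>Mass-energy equivalence</title>
  <!-- requests the conformance level, like format: PDF/UA-2 in Markdown -->
  <meta name="pdf-format" content="PDF/UA-2">
  <style>
    @font-face { font-family: "LM Math"; src: url("latinmodern-math.otf"); }
    math { font-family: "LM Math"; font-size: 12pt; }
  </style>
</head>
<body>
  <p>Einstein's relation
    <math>
      <mi>E</mi><mo>=</mo><mi>m</mi>
      <msup><mi>c</mi><mn>2</mn></msup>
    </math>
    links mass and energy.</p>
</body>
</html>
```

With the meta element in place, running glu on the file should not need the --format flag.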
Reading more
- Mathematics (MathML): supported elements, mathvariant rules, scope.
- PDF/UA tagging, Mathematics (Formula): how the Formula element, /Alt, and the MathML associated file are emitted.
- glu HTML mode: conformance metadata and a MathML example.
What’s next
Still missing is a decent interface for Markdown. You might want to write your formulas in TeX/LaTeX style, for example
\[ \sum_{k=0}^{n} k^2 \]
This will take a bit of work, but I am working on it.