PDF/UA tagging

htmlbag emits PDF logical-structure elements when the underlying document's format is set to document.FormatPDFUA. Tagging is automatic: the structure tree is built as content is laid out, and each VList carries a back-reference to the structure node it should attach to.
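
The back-reference idea can be illustrated with a small, self-contained sketch. StructElem, VList, AddChild and Path below are hypothetical stand-ins invented for this example; they are not htmlbag's or the underlying PDF library's real types.

```go
package main

import "fmt"

// StructElem is a hypothetical stand-in for a PDF structure element.
type StructElem struct {
	Role     string
	Parent   *StructElem
	Children []*StructElem
}

// AddChild creates a child element with the given role and links it to e.
func (e *StructElem) AddChild(role string) *StructElem {
	c := &StructElem{Role: role, Parent: e}
	e.Children = append(e.Children, c)
	return c
}

// Path returns the chain of roles from the root down to e.
func (e *StructElem) Path() string {
	if e.Parent == nil {
		return e.Role
	}
	return e.Parent.Path() + "/" + e.Role
}

// VList is a hypothetical stand-in for a laid-out vertical list.
type VList struct {
	Text string
	Tag  *StructElem // back-reference to the owning structure node
}

func main() {
	root := &StructElem{Role: "Document"}
	h1 := root.AddChild("H1")
	p := root.AddChild("P")

	// Each laid-out box remembers which structure node it belongs to.
	boxes := []VList{
		{Text: "Title", Tag: h1},
		{Text: "Body text", Tag: p},
	}
	for _, vl := range boxes {
		fmt.Printf("%q -> %s\n", vl.Text, vl.Tag.Path())
	}
}
```

Because the box points at its node rather than the other way round, the box can be moved to any page during layout and still be attached to the right place in the tree.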

Activation

Set the format on the frontend.Document before constructing the CSSBuilder:

fd.Doc.Format = document.FormatPDFUA
cb, _ := htmlbag.New(fd, css)

After New() returns, cb.enableTagging is true, cb.structureRoot points to the root structure element (created on fd.Doc if not already present), and cb.structureCurrent tracks the current parent during the walk.
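
How a "current parent" pointer behaves during a tree walk can be sketched as follows. The walker type borrows the field names mentioned above for readability, but everything here (node, structElem, newWalker, walk, dump) is a hypothetical model written for this example, not htmlbag's implementation; the tag name is stored as the role to keep the sketch short.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, minimal input tree node.
type node struct {
	tag      string
	children []*node
}

// structElem is a hypothetical structure element.
type structElem struct {
	role     string
	children []*structElem
}

// walker models the three pieces of state described in the text.
type walker struct {
	enableTagging    bool
	structureRoot    *structElem
	structureCurrent *structElem
}

// newWalker enables tagging and creates the root only in PDF/UA mode.
func newWalker(pdfUA bool) *walker {
	w := &walker{enableTagging: pdfUA}
	if pdfUA {
		w.structureRoot = &structElem{role: "Document"}
		w.structureCurrent = w.structureRoot
	}
	return w
}

// walk visits n and its children; with tagging enabled it appends a
// structure element under the current parent and restores the parent
// once the subtree is done.
func (w *walker) walk(n *node) {
	if !w.enableTagging {
		for _, c := range n.children {
			w.walk(c)
		}
		return
	}
	parent := w.structureCurrent
	se := &structElem{role: n.tag}
	parent.children = append(parent.children, se)
	w.structureCurrent = se
	for _, c := range n.children {
		w.walk(c)
	}
	w.structureCurrent = parent
}

// dump prints the structure tree with indentation.
func dump(se *structElem, depth int) {
	fmt.Println(strings.Repeat("  ", depth) + se.role)
	for _, c := range se.children {
		dump(c, depth+1)
	}
}

func main() {
	w := newWalker(true)
	w.walk(&node{tag: "ul", children: []*node{{tag: "li"}, {tag: "li"}}})
	dump(w.structureRoot, 0)
}
```

Saving and restoring the parent around each subtree is what keeps sibling elements attached at the same level.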

Element-to-role mapping

The internal pdfRoleForTag helper maps HTML tags to PDF/UA roles: <h1> -> H1, <p> -> P, <ul>/<ol> -> L, <li> -> LI, <table> -> Table, <tr> -> TR, <td> -> TD, etc. Footnotes attach as Note structure elements.
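
A lookup of this kind might look like the sketch below. It covers only the tags listed above; roleForTag and the roles table are hypothetical names for this example, and the real pdfRoleForTag may have a different signature and more entries.

```go
package main

import (
	"fmt"
	"strings"
)

// roles holds only the mappings named in the text; the real helper
// may know more tags.
var roles = map[string]string{
	"h1":    "H1",
	"p":     "P",
	"ul":    "L",
	"ol":    "L",
	"li":    "LI",
	"table": "Table",
	"tr":    "TR",
	"td":    "TD",
}

// roleForTag (hypothetical) returns the structure role for an HTML tag
// name and reports whether the tag is mapped at all.
func roleForTag(tag string) (string, bool) {
	role, ok := roles[strings.ToLower(tag)]
	return role, ok
}

func main() {
	for _, tag := range []string{"h1", "OL", "td", "span"} {
		if role, ok := roleForTag(tag); ok {
			fmt.Printf("<%s> maps to %s\n", tag, role)
		} else {
			fmt.Printf("<%s> has no role in this sketch\n", tag)
		}
	}
}
```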

Repeated table headers

When a table breaks across pages, the repeated header rows on continuation pages are tagged as artifacts (not as TH again) to avoid duplication in the structure tree.
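
The rule can be sketched with a toy paginator. This is an illustration under simplifying assumptions, not htmlbag's layout code: placedRow and layout are hypothetical, each row gets a single role (the row/cell distinction is collapsed), and page capacity is counted in body rows.

```go
package main

import "fmt"

// placedRow is a hypothetical record of one row placed on a page.
type placedRow struct {
	Page int
	Text string
	Role string // "TH", "TD" or "Artifact" in this simplified model
}

// layout places body rows on pages of bodyRowsPerPage rows each and
// repeats the header at the top of every page. Only the first header
// is tagged TH; repeats are marked as artifacts so they do not show up
// twice in the structure tree. It assumes at least one body row.
func layout(header string, body []string, bodyRowsPerPage int) []placedRow {
	if bodyRowsPerPage < 1 {
		bodyRowsPerPage = 1
	}
	var out []placedRow
	for i, text := range body {
		page := i/bodyRowsPerPage + 1
		if i%bodyRowsPerPage == 0 {
			role := "TH"
			if page > 1 {
				role = "Artifact"
			}
			out = append(out, placedRow{Page: page, Text: header, Role: role})
		}
		out = append(out, placedRow{Page: page, Text: text, Role: "TD"})
	}
	return out
}

func main() {
	for _, r := range layout("Name", []string{"a", "b", "c"}, 2) {
		fmt.Printf("page %d: %-4s %s\n", r.Page, r.Text, r.Role)
	}
}
```

Assistive technology reading the tagged output then encounters the header once, while the visual repetition on later pages is still present for sighted readers.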

Custom roles

The mapping is currently fixed. To add roles, extend pdfRoleForTag (in htmlbag/tagging.go); a configurable mapping is on the to-do list.