htmlbag — HTML/CSS to PDF

htmlbag is the library that turns HTML and CSS into typeset PDF pages. It sits on top of the boxes and glue typesetting engine and is used by both bagme and glu for their HTML rendering modes.

If you only want to render HTML files to PDF from the command line, use glu’s HTML mode. If you generate HTML programmatically and want the simplest API, use bagme. The pages here document which HTML and CSS htmlbag accepts as input and what its Go API looks like.

Two audiences, two doc threads

This section serves two different readers:

If you write…                                                   Read…
HTML/CSS that gets rendered through bagme or glu                the content pages below
Go code that imports github.com/boxesandglue/htmlbag directly   Library reference

The content pages describe the HTML/CSS subset htmlbag understands — which elements work, which CSS properties propagate, how floats and footnotes behave on a page. The library pages describe the Go-level API — the CSSBuilder type, page-output methods, configuration fields, callbacks, and PDF/UA tagging hooks.

Content overview

  • HTML elements — which tags htmlbag recognises and how they map to typeset content.
  • CSS support — recognised properties, @page rules, selector scope.
  • Inserts: floats and footnotes — page-layer items that get lifted out of the body flow and stacked at the page edges.
  • Tables — including multi-page tables with repeated header rows.
  • Limitations — what htmlbag deliberately does not implement, and what would require future work.

Library reference

Architectural sketch

htmlbag’s pipeline runs in four stages:

  1. HTML parsing (golang.org/x/net/html) — produces a DOM tree.
  2. CSS resolution (csshtml) — matches stylesheet rules against the DOM and writes the resolved properties as !-prefixed attributes.
  3. Tree-to-Text (htmlbag/inheritablestyles.go) — walks the styled DOM and produces a frontend.Text tree, classifying inline vs. block content and lifting markers (footnotes, floats) into a sentinel system.
  4. VList building and page output (htmlbag/vlistbuilder.go and cssbuilder.go) — formats paragraphs, builds boxes, runs a two-pass page assembler that places top-floats, body content, bottom-floats, and footnotes in the right page layers.
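
To make stage 2 more concrete, here is a minimal sketch of the "!-prefixed attribute" convention. It is not csshtml's actual code: the Node type and the annotate and styleProps helpers are hypothetical stand-ins, and only the idea of storing resolved properties under attribute names that start with "!" comes from the description above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Node is a hypothetical, simplified stand-in for a DOM element.
type Node struct {
	Tag   string
	Attrs map[string]string
}

// annotate stores each resolved CSS property on the node under an attribute
// name that starts with "!", next to the element's ordinary HTML attributes.
func annotate(n *Node, resolved map[string]string) {
	if n.Attrs == nil {
		n.Attrs = make(map[string]string)
	}
	for prop, val := range resolved {
		n.Attrs["!"+prop] = val
	}
}

// styleProps returns the sorted names of all CSS-derived attributes,
// with the "!" prefix removed.
func styleProps(n *Node) []string {
	var props []string
	for name := range n.Attrs {
		if strings.HasPrefix(name, "!") {
			props = append(props, strings.TrimPrefix(name, "!"))
		}
	}
	sort.Strings(props)
	return props
}

func main() {
	p := &Node{Tag: "p", Attrs: map[string]string{"class": "intro"}}
	annotate(p, map[string]string{"color": "red", "font-size": "12pt"})
	for _, prop := range styleProps(p) {
		fmt.Printf("%s = %s\n", prop, p.Attrs["!"+prop])
	}
}
```

In this sketch a later stage can tell styling apart from markup just by looking at the first character of the attribute name, which is one plausible reason for such a prefix.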

The two-pass page assembler is what lets multiple floats coexist with their source paragraphs on a single page; see "Inserts: floats and footnotes" for the details and the painting order.
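
The following sketch shows one way a two-pass assembler for a single page could work. It is an illustration under assumptions, not htmlbag's implementation: the Item, Placed, and assemble names are hypothetical, the split into a "collect and measure" pass and a "place" pass is a guess at what the two passes do, and the stacking of bottom-floats above footnotes simply follows the order in which the layers are listed in stage 4.

```go
package main

import "fmt"

// Layer names the four page layers mentioned in the pipeline description.
type Layer int

const (
	TopFloat Layer = iota
	Body
	BottomFloat
	Footnote
	numLayers
)

// Item is a hypothetical box with a fixed height, destined for one layer.
type Item struct {
	Name   string
	Layer  Layer
	Height float64
}

// Placed is an item together with its distance from the top of the page.
type Placed struct {
	Item
	Y float64
}

// assemble lays out one page in two passes and reports whether all items fit.
func assemble(items []Item, pageHeight float64) ([]Placed, bool) {
	// Pass 1: sort the items into layers and measure each layer.
	var layers [numLayers][]Item
	var heights [numLayers]float64
	for _, it := range items {
		layers[it.Layer] = append(layers[it.Layer], it)
		heights[it.Layer] += it.Height
	}
	total := 0.0
	for _, h := range heights {
		total += h
	}
	if total > pageHeight {
		return nil, false // a real assembler would defer items to the next page
	}

	// Pass 2: position every item now that all layer heights are known.
	var placed []Placed
	stack := func(layer Layer, y float64) float64 {
		for _, it := range layers[layer] {
			placed = append(placed, Placed{Item: it, Y: y})
			y += it.Height
		}
		return y
	}
	y := stack(TopFloat, 0) // top-floats start at the top edge
	stack(Body, y)          // body content follows directly below them
	// Bottom-floats and footnotes are anchored to the bottom edge.
	y = stack(BottomFloat, pageHeight-heights[BottomFloat]-heights[Footnote])
	stack(Footnote, y)
	return placed, true
}

func main() {
	items := []Item{
		{Name: "paragraph 1", Layer: Body, Height: 300},
		{Name: "figure", Layer: TopFloat, Height: 120},
		{Name: "footnote 1", Layer: Footnote, Height: 30},
		{Name: "paragraph 2", Layer: Body, Height: 200},
		{Name: "sidebar", Layer: BottomFloat, Height: 80},
	}
	placed, ok := assemble(items, 800)
	if !ok {
		fmt.Println("page overflow")
		return
	}
	for _, p := range placed {
		fmt.Printf("%-12s y=%.0f\n", p.Name, p.Y)
	}
}
```

The point of the first pass in this sketch is that the body cannot be positioned until the height of the top-floats is known, and the bottom-anchored layers cannot be positioned until their combined height is known. Page breaking and deferring floats to later pages are left out.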