htmlbag — HTML/CSS to PDF
htmlbag is the library that turns HTML and CSS into typeset PDF pages.
It sits on top of the boxes and glue typesetting engine and is
used by both bagme and glu for their HTML rendering modes.
If you only want to render HTML files to PDF from the command line, use glu’s HTML mode. If you write HTML programmatically and want the simplest API, use bagme. The pages here document what htmlbag accepts in its inputs and what its Go API looks like.
Two audiences, two doc threads
This section serves two different readers:
| If you write… | Read… |
|---|---|
| HTML/CSS that gets rendered through bagme or glu | the content pages below |
| Go code that imports github.com/boxesandglue/htmlbag directly | Library reference |
The content pages describe the HTML/CSS subset htmlbag understands — which
elements work, which CSS properties propagate, how floats and footnotes
behave on a page. The library pages describe the Go-level API — the
CSSBuilder type, page-output methods, configuration fields, callbacks,
and PDF/UA tagging hooks.
Content overview
- HTML elements: which tags htmlbag recognises and how they map to typeset content.
- CSS support: recognised properties, @page rules, selector scope.
- Inserts (floats and footnotes): page-layer items that get lifted out of the body flow and stacked at the page edges.
- Tables: including multi-page tables with repeated header rows.
- Limitations: what htmlbag deliberately does not implement, and what would require future work.
Library reference
- Library overview and stability: CSSBuilder.New, OutputPages, OutputPagesFromText, and the API-stability disclaimer.
- Configuration: footnote and float layout knobs, counters, page dimensions.
- Callbacks: ElementCallback, PageInitCallback.
- PDF/UA tagging: accessibility output.
Architectural sketch
htmlbag’s pipeline runs in four stages:
- HTML parsing (golang.org/x/net/html): produces a DOM tree.
- CSS resolution (csshtml): matches stylesheet rules against the DOM and writes the resolved properties as !-prefixed attributes.
- Tree-to-Text (htmlbag/inheritablestyles.go): walks the styled DOM and produces a frontend.Text tree, classifying inline vs. block content and lifting markers (footnotes, floats) into a sentinel system.
- VList building and page output (htmlbag/vlistbuilder.go and cssbuilder.go): formats paragraphs, builds boxes, and runs a two-pass page assembler that places top-floats, body content, bottom-floats, and footnotes in the right page layers.
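The hand-off between CSS resolution and the later stages can be pictured with a toy model. The sketch below is not htmlbag or csshtml code: `Node`, `Rule`, `Resolve`, and `ResolvedProps` are hypothetical names, the selector matching is reduced to tag names, and the real pipeline works on golang.org/x/net/html nodes. It only illustrates the idea of storing resolved properties as `!`-prefixed attributes so that a later tree walk can read them without consulting the stylesheet again.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical stand-in for a DOM element.
// htmlbag itself operates on golang.org/x/net/html nodes.
type Node struct {
	Tag      string
	Attrs    map[string]string
	Children []*Node
}

// Rule is a deliberately tiny stylesheet rule: a tag selector plus declarations.
type Rule struct {
	Tag   string
	Props map[string]string
}

// Resolve writes every matching declaration onto the node as a "!"-prefixed
// attribute and then recurses into the children. Later rules override
// earlier ones; ordinary attributes such as class or id are left untouched.
func Resolve(n *Node, rules []Rule) {
	if n.Attrs == nil {
		n.Attrs = map[string]string{}
	}
	for _, r := range rules {
		if r.Tag != n.Tag {
			continue
		}
		for name, value := range r.Props {
			n.Attrs["!"+name] = value
		}
	}
	for _, c := range n.Children {
		Resolve(c, rules)
	}
}

// ResolvedProps returns the sorted names of the resolved properties on a
// node, with the "!" prefix stripped.
func ResolvedProps(n *Node) []string {
	var out []string
	for name := range n.Attrs {
		if len(name) > 1 && name[0] == '!' {
			out = append(out, name[1:])
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	doc := &Node{Tag: "body", Children: []*Node{
		{Tag: "p", Attrs: map[string]string{"class": "intro"}},
	}}
	rules := []Rule{
		{Tag: "p", Props: map[string]string{"color": "black", "font-size": "10pt"}},
		{Tag: "p", Props: map[string]string{"color": "red"}},
	}
	Resolve(doc, rules)

	p := doc.Children[0]
	fmt.Println(ResolvedProps(p), p.Attrs["!color"], p.Attrs["class"])
}
```

Because the prefix cannot collide with a regular HTML attribute name, a walker in this model can tell author-supplied attributes and resolved styles apart by looking at the first character alone.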
The two-pass page assembler is what lets multiple floats coexist with their source paragraphs on a single page; see inserts for the details and the painting order.
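As a rough mental model only, a two-pass assembler can be sketched as follows. This is not htmlbag's algorithm: the types `Item` and `Page`, the function `Assemble`, the integer heights, and the deferral policy are all assumptions made for illustration. The first pass reserves room for the inserts (floats and footnotes); the second pass fills whatever height is left with body content in source order.

```go
package main

import "fmt"

// Kind identifies the page layer an item belongs to.
type Kind int

const (
	TopFloat Kind = iota
	Body
	BottomFloat
	Footnote
)

var kindNames = [...]string{"top-floats", "body", "bottom-floats", "footnotes"}

// Item is a hypothetical piece of typeset material with a fixed height.
type Item struct {
	Kind   Kind
	Name   string
	Height int
}

// Page holds one slice per layer, indexed by Kind, plus everything that
// did not fit and has to move to a following page.
type Page struct {
	Layers   [4][]Item
	Deferred []Item
}

// Assemble places items on a single page of the given height in two passes.
func Assemble(items []Item, pageHeight int) Page {
	var page Page

	// Pass 1: reserve space for inserts (everything that is not body content).
	reserved := 0
	for _, it := range items {
		if it.Kind == Body {
			continue
		}
		if reserved+it.Height > pageHeight {
			page.Deferred = append(page.Deferred, it)
			continue
		}
		reserved += it.Height
		page.Layers[it.Kind] = append(page.Layers[it.Kind], it)
	}

	// Pass 2: fill the remaining height with body content in source order.
	// Once one body item overflows, all later body items are deferred too,
	// so the reading order is never rearranged.
	remaining := pageHeight - reserved
	full := false
	for _, it := range items {
		if it.Kind != Body {
			continue
		}
		if full || it.Height > remaining {
			full = true
			page.Deferred = append(page.Deferred, it)
			continue
		}
		remaining -= it.Height
		page.Layers[Body] = append(page.Layers[Body], it)
	}
	return page
}

func main() {
	items := []Item{
		{Kind: TopFloat, Name: "figure", Height: 100},
		{Kind: Body, Name: "p1", Height: 300},
		{Kind: Body, Name: "p2", Height: 300},
		{Kind: Footnote, Name: "fn1", Height: 50},
		{Kind: BottomFloat, Name: "table", Height: 80},
		{Kind: Body, Name: "p3", Height: 200},
	}
	page := Assemble(items, 800)
	for k, layer := range page.Layers {
		fmt.Printf("%-13s %v\n", kindNames[k], layer)
	}
	fmt.Println("deferred:", page.Deferred)
}
```

The point of separating the passes in this model is that the body height is only known once every insert on the page has been measured, which is why several floats and a footnote can share a page with the paragraphs around them.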