textshape: Text Shaping Library

textshape is a pure Go text shaping engine that implements the OpenType shaping algorithms. It is a port of the HarfBuzz shaping logic and can be used as the text shaping backend for boxes and glue.

The library lives in the github.com/boxesandglue/textshape module; its ot package provides the main shaping API.

What is Text Shaping?

A font file contains glyphs, the visual representations of characters. But the relationship between characters and glyphs is not one-to-one. When you type the letters “f” and “i” next to each other, a well-designed font may replace them with a single “fi” ligature glyph. When you type “AV”, the font may move the “V” slightly to the left so the two letters don’t appear too far apart (kerning). In Arabic, most letters take a different form depending on whether they stand alone or appear at the beginning, middle, or end of a word.

Text shaping is the process that turns a sequence of Unicode characters into a sequence of correctly chosen, correctly positioned glyphs. It is the step between “I have a string” and “I can draw text on screen or in a PDF.”

A shaper is the engine that performs this process. It reads the font’s OpenType tables (GSUB for glyph substitution, GPOS for glyph positioning) and applies them according to the rules of the script and language being used. The shaper needs to know:

  • The font — which glyphs and rules are available
  • The text — a sequence of Unicode codepoints
  • The direction — left-to-right, right-to-left, or vertical
  • The script — Latin, Arabic, Devanagari, etc. (determines which shaping rules apply)
  • The language — optional, for language-specific glyph forms (e.g., Serbian vs. Russian Cyrillic)
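As a rough illustration of the inputs listed above, they can be gathered into a single value before shaping. This is a minimal sketch: Request, Direction, and NewRequest are hypothetical names invented for this example and are not part of the textshape API. The script and language fields are assumed to hold ISO 15924 and BCP 47 tags.

```go
package main

import "fmt"

// Direction is a hypothetical enum for the text direction.
type Direction int

const (
	LeftToRight Direction = iota
	RightToLeft
	TopToBottom
)

// Request is a hypothetical bundle of shaper inputs.
// It is NOT a type from the textshape module.
type Request struct {
	FontPath   string    // the font: which glyphs and rules are available
	Codepoints []rune    // the text as Unicode codepoints
	Direction  Direction // left-to-right, right-to-left, or vertical
	Script     string    // assumed ISO 15924 tag, e.g. "Latn" or "Arab"
	Language   string    // optional, assumed BCP 47 tag, e.g. "sr"
}

// NewRequest decodes text into codepoints and stores the other inputs.
func NewRequest(fontPath, text string, dir Direction, script, language string) Request {
	return Request{
		FontPath:   fontPath,
		Codepoints: []rune(text), // one entry per codepoint, not per byte
		Direction:  dir,
		Script:     script,
		Language:   language,
	}
}

func main() {
	text := "héllo"
	req := NewRequest("font.otf", text, LeftToRight, "Latn", "")
	// The shaper works on codepoints, so the UTF-8 byte length is irrelevant.
	fmt.Println(len(req.Codepoints), len(text))
}
```

The conversion to []rune matters because a Go string is a byte sequence: "héllo" occupies six bytes but contains five codepoints.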

The output is a list of glyph IDs (which glyph to draw) and positions (where to draw it, relative to the current cursor). The positions consist of an advance (how far to move the cursor after drawing) and an offset (a shift from the cursor position, used for marks and kerning adjustments).
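To make the roles of advance and offset concrete, the sketch below walks a cursor over a short run of horizontal text. The Glyph and Point types and the Place function are hypothetical and only mirror the description above; they are not the textshape API, and the numbers are made up.

```go
package main

import "fmt"

// Glyph is a hypothetical shaped glyph. Field names are illustrative only.
type Glyph struct {
	ID       uint16 // which glyph to draw
	XAdvance int    // how far the cursor moves after drawing
	XOffset  int    // horizontal shift from the cursor position
	YOffset  int    // vertical shift from the baseline
}

// Point is a drawing position in font units.
type Point struct{ X, Y int }

// Place returns the drawing position of every glyph and the final
// cursor position (the total width of the run) for horizontal text.
func Place(glyphs []Glyph) ([]Point, int) {
	pts := make([]Point, 0, len(glyphs))
	cursor := 0
	for _, g := range glyphs {
		// The offset shifts this glyph only; it does not move the cursor.
		pts = append(pts, Point{X: cursor + g.XOffset, Y: g.YOffset})
		// The advance moves the cursor for the next glyph.
		cursor += g.XAdvance
	}
	return pts, cursor
}

func main() {
	run := []Glyph{
		{ID: 1, XAdvance: 500},              // base letter
		{ID: 2, XOffset: -300, YOffset: 120}, // mark: zero advance, shifted back over the base
		{ID: 3, XAdvance: 450},              // next letter
	}
	pts, width := Place(run)
	for i, p := range pts {
		fmt.Printf("glyph %d at (%d, %d)\n", run[i].ID, p.X, p.Y)
	}
	fmt.Println("run width:", width)
}
```

The mark has no advance of its own, so the letter after it starts exactly where it would have started without the mark.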

All of this happens in font units — an abstract coordinate system defined by the font’s units-per-em (upem) value. To get real-world measurements, you scale by fontSize / upem.
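The scaling step is a single multiplication. The helper below is a minimal sketch of it; ToPoints is a hypothetical name for this example and not a function from the library.

```go
package main

import "fmt"

// ToPoints converts a value in font units to the unit of fontSize
// (typically points), using the font's units-per-em value.
func ToPoints(fontUnits int, fontSize float64, upem int) float64 {
	return float64(fontUnits) * fontSize / float64(upem)
}

func main() {
	// An advance of 500 font units in a 1000-upem font set at 12 pt.
	fmt.Println(ToPoints(500, 12, 1000)) // 6
	// The same relative advance in a 2048-upem font set at 10 pt.
	fmt.Println(ToPoints(1024, 10, 2048)) // 5
}
```

Keeping the shaper output in integer font units and scaling only at drawing time avoids accumulating rounding errors along a line.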

Figure: Shaping overview. Four input characters become three output glyphs: the fi ligature replaces two characters with one glyph. Each glyph carries an advance value (in font units) that moves the cursor to the next position.
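The four-characters-to-three-glyphs count can be reproduced with a toy substitution pass. This is only a sketch of the idea: a real shaper applies the font's GSUB lookups, whereas the table and the Shape function here are hypothetical, glyph names stand in for glyph IDs, and the word "fish" is an arbitrary example.

```go
package main

import "fmt"

// ligatures maps a pair of characters to one ligature glyph name.
// It loosely imitates a GSUB ligature lookup; the entries are made up.
var ligatures = map[[2]rune]string{
	{'f', 'i'}: "f_i",
	{'f', 'l'}: "f_l",
}

// Shape turns characters into glyph names, merging known ligature pairs.
func Shape(text string) []string {
	runes := []rune(text)
	var glyphs []string
	for i := 0; i < len(runes); i++ {
		if i+1 < len(runes) {
			if lig, ok := ligatures[[2]rune{runes[i], runes[i+1]}]; ok {
				glyphs = append(glyphs, lig)
				i++ // the ligature consumed two characters
				continue
			}
		}
		// No ligature applies: one character maps to one glyph.
		glyphs = append(glyphs, string(runes[i]))
	}
	return glyphs
}

func main() {
	fmt.Println(Shape("fish")) // [f_i s h]
}
```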

Features

  • OpenType text shaping (GSUB, GPOS)
  • Variable font support (fvar, gvar, HVAR)
  • Vertical text layout (vmtx, VORG)
  • Script-specific shaping (Arabic, Indic, Khmer, Myanmar, Hebrew, Thai, …)
  • Synthetic bold and slant
  • Kern table fallback
  • Glyph outline extraction (TrueType and CFF)
  • Font subsetting with variable font instancing

Documentation