The Buffer

The Buffer is the central data structure for text shaping. It holds both the input (Unicode text) and the output (positioned glyphs). You fill a buffer with text, set its properties, pass it to Shape(), and then read the results from the same buffer.

Creating a Buffer

buf := ot.NewBuffer()

A new buffer is empty and has sensible defaults:

  • Direction is unset (auto-detected by GuessSegmentProperties)
  • Cluster level is 0 (monotone graphemes)
  • No flags set

Adding Text

There are two ways to add text:

AddString

Adds a UTF-8 string. Each rune gets its own cluster index (starting from 0).

buf.AddString("Hello, World!")

AddCodepoints

Adds Unicode codepoints directly:

codepoints := []ot.Codepoint{0x0048, 0x0065, 0x006C, 0x006C, 0x006F}
buf.AddCodepoints(codepoints)

Both methods assign sequential cluster values. Marks (Unicode general category M) receive their own cluster values at this stage, just like base characters; any cluster merging happens later, during shaping.
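
To make the cluster numbering concrete, here is a minimal sketch that does not use the library. The `glyphInfo` type and `addString` function are hypothetical stand-ins that only mimic the behavior described above: one entry per rune, clusters counting up from 0, marks included.

```go
package main

import "fmt"

// glyphInfo is a hypothetical stand-in for one buffer entry; it is not the
// library's own type.
type glyphInfo struct {
	Codepoint rune
	Cluster   int
}

// addString mimics the documented behavior: one entry per rune, with
// cluster values counting up from 0.
func addString(s string) []glyphInfo {
	var infos []glyphInfo
	for _, r := range s {
		infos = append(infos, glyphInfo{Codepoint: r, Cluster: len(infos)})
	}
	return infos
}

func main() {
	// "e" + combining acute accent + "x": the mark still gets its own cluster.
	for _, gi := range addString("e\u0301x") {
		fmt.Printf("U+%04X cluster=%d\n", gi.Codepoint, gi.Cluster)
	}
}
```

Note that the index counts runes, not bytes, so multi-byte UTF-8 characters still advance the cluster value by one.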

Setting Properties

Direction

The text direction determines how glyphs are laid out:

buf.Direction = ot.DirectionLTR  // Left-to-Right (Latin, Cyrillic, ...)
buf.Direction = ot.DirectionRTL  // Right-to-Left (Arabic, Hebrew, ...)
buf.Direction = ot.DirectionTTB  // Top-to-Bottom (CJK vertical)
buf.Direction = ot.DirectionBTT  // Bottom-to-Top

Alternatively, use the setter method:

buf.SetDirection(ot.DirectionRTL)

Direction constants have useful methods:

dir := ot.DirectionRTL
dir.IsHorizontal() // true
dir.IsVertical()   // false
dir.IsForward()    // false (RTL is backward)
dir.IsBackward()   // true
dir.Reverse()      // DirectionLTR
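
The predicates above can be pictured with a small local model. This is a sketch only: the `direction` type and its constants are hypothetical, and treating top-to-bottom as "forward" and bottom-to-top as "backward" is an assumption borrowed from HarfBuzz's convention rather than something this page states.

```go
package main

import "fmt"

// direction is a local stand-in for the library's direction type.
type direction int

const (
	dirLTR direction = iota
	dirRTL
	dirTTB
	dirBTT
)

func (d direction) isHorizontal() bool { return d == dirLTR || d == dirRTL }
func (d direction) isVertical() bool   { return d == dirTTB || d == dirBTT }
func (d direction) isForward() bool    { return d == dirLTR || d == dirTTB }
func (d direction) isBackward() bool   { return d == dirRTL || d == dirBTT }

// reverse flips a direction along its own axis.
func (d direction) reverse() direction {
	switch d {
	case dirLTR:
		return dirRTL
	case dirRTL:
		return dirLTR
	case dirTTB:
		return dirBTT
	default:
		return dirTTB
	}
}

func main() {
	d := dirRTL
	fmt.Println(d.isHorizontal(), d.isForward(), d.reverse() == dirLTR)
}
```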

Arabic Shaping Example

The following diagram shows how Arabic shaping works. The input is the word “كتب” (kataba) — three characters in logical order (kaf, ta, ba). The shaper applies contextual forms based on each letter’s position in the word (initial, medial, final) and reorders the glyphs into visual order (right-to-left):

Figure: Arabic shaping. Each Arabic letter takes a different form depending on its position in the word. The shaper also reverses the glyph order from logical order (as stored) to visual order (as displayed, right-to-left).

Script

Script holds the ISO 15924 script tag, which determines the script-specific shaping rules to apply (Arabic joining, Indic reordering, etc.).

buf.Script = ot.MakeTag('L', 'a', 't', 'n')  // Latin
buf.Script = ot.MakeTag('A', 'r', 'a', 'b')  // Arabic
buf.Script = ot.MakeTag('D', 'e', 'v', 'a')  // Devanagari
buf.Script = ot.MakeTag('H', 'a', 'n', 'g')  // Hangul
buf.Script = ot.MakeTag('H', 'a', 'n', 'i')  // CJK ideographs
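
A four-character tag is conventionally packed into a single 32-bit value, first character in the highest byte. The sketch below shows that packing with a local `makeTag` helper. It is an illustration of the OpenType convention, and the library's own MakeTag may be implemented differently.

```go
package main

import "fmt"

// tag is a local stand-in for a packed four-character tag.
type tag uint32

// makeTag packs four ASCII characters big-endian into 32 bits.
func makeTag(a, b, c, d byte) tag {
	return tag(a)<<24 | tag(b)<<16 | tag(c)<<8 | tag(d)
}

// String unpacks the tag back into its four characters.
func (t tag) String() string {
	return string([]byte{byte(t >> 24), byte(t >> 16), byte(t >> 8), byte(t)})
}

func main() {
	latn := makeTag('L', 'a', 't', 'n')
	fmt.Printf("%s = 0x%08X\n", latn, uint32(latn))
}
```

Because the tag is just an integer, comparing two tags is a plain `==`, which is why tags padded with spaces (such as a three-letter language tag) must include the trailing space.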

Language

Language holds the OpenType language tag, which selects language-specific rules in the font (for example, the dotted and dotless i handling that Turkish and Azerbaijani need).

buf.Language = ot.MakeTag('T', 'R', 'K', ' ')  // Turkish

GuessSegmentProperties

If you don’t set direction, script, or language manually, call GuessSegmentProperties to auto-detect them from the buffer contents:

buf.AddString("مرحبا")
buf.GuessSegmentProperties()
// Direction is now RTL, Script is Arab

GuessSegmentProperties only fills in properties that are not already set. If you set the direction manually, it will not override it:

buf.Direction = ot.DirectionLTR
buf.AddString("مرحبا")
buf.GuessSegmentProperties() // Keeps LTR, only sets script
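
The "fill in only what is unset" rule can be sketched with the standard library's script tables. Everything here is a simplified assumption: the `props` type, the `guess` function, and the four-script table are hypothetical, and the real detection logic is likely more thorough.

```go
package main

import (
	"fmt"
	"unicode"
)

// props is a hypothetical stand-in for the buffer's segment properties.
// Empty strings mean "not set".
type props struct {
	Direction string
	Script    string
}

// known lists the few scripts this sketch can recognize.
var known = []struct {
	tag   string
	table *unicode.RangeTable
	rtl   bool
}{
	{"Arab", unicode.Arabic, true},
	{"Hebr", unicode.Hebrew, true},
	{"Latn", unicode.Latin, false},
	{"Deva", unicode.Devanagari, false},
}

// guess fills in only the properties that are still unset.
func guess(text string, p props) props {
	if p.Script == "" {
	scan:
		// Take the script of the first rune that belongs to a known script.
		for _, r := range text {
			for _, k := range known {
				if unicode.Is(k.table, r) {
					p.Script = k.tag
					break scan
				}
			}
		}
	}
	if p.Direction == "" {
		// Derive the direction from the script; default to LTR.
		p.Direction = "LTR"
		for _, k := range known {
			if k.tag == p.Script && k.rtl {
				p.Direction = "RTL"
			}
		}
	}
	return p
}

func main() {
	fmt.Println(guess("مرحبا", props{}))
	fmt.Println(guess("مرحبا", props{Direction: "LTR"}))
}
```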

Buffer Flags

Flags modify shaping behavior:

buf.Flags = ot.BufferFlagBOT | ot.BufferFlagEOT

Flag                                  Description
BufferFlagDefault                     No special behavior
BufferFlagBOT                         Beginning of text; affects Arabic joining at the start
BufferFlagEOT                         End of text; affects Arabic joining at the end
BufferFlagPreserveDefaultIgnorables   Keep default ignorable characters (ZWJ, ZWNJ, …) visible
BufferFlagRemoveDefaultIgnorables     Remove default ignorable characters from output
BufferFlagDoNotInsertDottedCircle     Don’t insert dotted circle for invalid sequences

For Arabic text that is part of a larger paragraph, BOT and EOT control the joining behavior at the boundaries. Without these flags, the shaper assumes the text continues in both directions.
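
Flags combine as a bit set, so each one can be set, tested, and cleared independently. The sketch below uses local constants with illustrative values. The actual numeric values of the library's flags are not given here and may differ.

```go
package main

import "fmt"

// bufferFlags is a local stand-in for the library's flag type.
type bufferFlags uint32

const flagDefault bufferFlags = 0

// Each flag occupies its own bit (values here are illustrative).
const (
	flagBOT bufferFlags = 1 << iota
	flagEOT
	flagPreserveDefaultIgnorables
	flagRemoveDefaultIgnorables
	flagDoNotInsertDottedCircle
)

// has reports whether any bit of x is set in f.
func (f bufferFlags) has(x bufferFlags) bool { return f&x != 0 }

func main() {
	flags := flagDefault
	flags |= flagBOT | flagEOT // whole paragraph: both boundaries are real
	fmt.Println(flags.has(flagBOT), flags.has(flagEOT))

	flags &^= flagEOT // more text follows, so clear the end-of-text bit
	fmt.Println(flags.has(flagEOT))
}
```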

Context

When shaping a fragment of a larger text, you can provide surrounding context to get correct Arabic joining at the boundaries:

buf.AddString("بسم")

// Characters before and after this fragment
buf.PreContext = []ot.Codepoint{0x0627}  // alef before
buf.PostContext = []ot.Codepoint{0x0627} // alef after

The context characters are not shaped themselves — they only influence the joining forms of the first and last glyphs in the buffer.
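
The effect of context on boundary letters can be illustrated with a toy joining model. This is a sketch under simplifying assumptions: the `forms` function and the four-letter joining table are hypothetical, and real Arabic shaping handles many more joining classes and transparent characters.

```go
package main

import "fmt"

// joinType is a tiny subset of the Unicode Arabic joining types.
type joinType int

const (
	nonJoining   joinType = iota
	rightJoining          // connects only to the preceding letter (e.g. alef)
	dualJoining           // connects on both sides (e.g. beh, seen, meem)
)

// joining covers just the letters used in this example.
var joining = map[rune]joinType{
	0x0627: rightJoining, // alef
	0x0628: dualJoining,  // beh
	0x0633: dualJoining,  // seen
	0x0645: dualJoining,  // meem
}

// forms returns the positional form of each letter in text. The runes in
// pre and post are consulted at the boundaries but produce no output.
func forms(pre, text, post []rune) []string {
	all := append(append(append([]rune{}, pre...), text...), post...)
	out := make([]string, 0, len(text))
	for i := len(pre); i < len(pre)+len(text); i++ {
		cur := joining[all[i]]
		// Connects backward if the previous letter can connect forward.
		joinsPrev := i > 0 && cur != nonJoining && joining[all[i-1]] == dualJoining
		// Connects forward if the next letter can connect backward.
		joinsNext := i+1 < len(all) && cur == dualJoining && joining[all[i+1]] != nonJoining
		switch {
		case joinsPrev && joinsNext:
			out = append(out, "medial")
		case joinsPrev:
			out = append(out, "final")
		case joinsNext:
			out = append(out, "initial")
		default:
			out = append(out, "isolated")
		}
	}
	return out
}

func main() {
	text := []rune("بسم")
	fmt.Println(forms(nil, text, nil))
	fmt.Println(forms([]rune{0x0627}, text, []rune{0x0627}))
}
```

In this model, a preceding alef leaves the first letter in its initial form because alef does not connect forward, while a following alef turns the last letter from final into medial.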

Cluster Level

The cluster level controls how clusters (groups of characters that form a single unit) are managed during shaping:

buf.ClusterLevel = 0  // MONOTONE_GRAPHEMES (default): merge marks, monotone order
buf.ClusterLevel = 1  // MONOTONE_CHARACTERS: separate marks, monotone order
buf.ClusterLevel = 2  // CHARACTERS: separate marks, no monotone enforcement

Most applications should use the default (0). Level 1 is useful when you need to track individual characters through shaping (e.g., for cursor positioning).
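
The difference between merged and separate marks can be sketched on plain text. The `clusters` function below is a hypothetical model: it only shows how a combining mark either shares its base character's cluster (roughly level 0) or keeps its own (roughly levels 1 and 2). It does not model the other cluster changes that shaping can cause, such as ligatures.

```go
package main

import (
	"fmt"
	"unicode"
)

// clusters returns one cluster value per rune. With mergeMarks set, a
// combining mark takes the cluster of the entry before it; otherwise every
// rune keeps its own index.
func clusters(s string, mergeMarks bool) []int {
	var out []int
	for _, r := range s {
		c := len(out)
		if mergeMarks && c > 0 && unicode.IsMark(r) {
			c = out[c-1]
		}
		out = append(out, c)
	}
	return out
}

func main() {
	s := "e\u0301x" // "e", combining acute accent, "x"
	fmt.Println(clusters(s, true))
	fmt.Println(clusters(s, false))
}
```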

Clearing and Resetting

buf.Clear() // Remove all glyphs, keep properties (direction, script, ...)
buf.Reset() // Remove all glyphs AND reset all properties to defaults
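
The distinction matters when one buffer is reused for many runs of text. Here is a minimal sketch with a hypothetical miniature `buffer` type, not the library's implementation:

```go
package main

import "fmt"

// buffer is a hypothetical miniature of the real buffer.
type buffer struct {
	Glyphs    []rune
	Direction string
	Script    string
}

// clear drops the contents but keeps the properties.
func (b *buffer) clear() { b.Glyphs = b.Glyphs[:0] }

// reset drops the contents and the properties.
func (b *buffer) reset() { *b = buffer{} }

func main() {
	b := &buffer{Glyphs: []rune("abc"), Direction: "RTL", Script: "Arab"}
	b.clear()
	fmt.Println(len(b.Glyphs), b.Direction)
	b.reset()
	fmt.Println(len(b.Glyphs), b.Direction == "")
}
```

Clearing suits consecutive runs that share direction and script. Resetting is the safer choice when the next run may need different properties, since stale values would otherwise stop auto-detection from filling them in.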

Reading Results

After Shape(), the buffer contains the shaped glyphs:

// Number of output glyphs
n := buf.Len()

// Iterate over results
for i := 0; i < n; i++ {
    info := buf.Info[i]
    pos := buf.Pos[i]

    // info.GlyphID    — glyph index in the font
    // info.Cluster    — maps back to original text position
    // info.Codepoint  — original Unicode codepoint

    // pos.XAdvance    — horizontal advance (font units)
    // pos.YAdvance    — vertical advance (font units)
    // pos.XOffset     — horizontal offset from current position
    // pos.YOffset     — vertical offset from current position
}
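
A renderer typically turns advances and offsets into drawing positions by keeping a running pen position. The sketch below shows that accumulation with local `glyphPos` and `point` types. The field names follow the comments above, but the `int32` field type is an assumption.

```go
package main

import "fmt"

// glyphPos mirrors the position fields listed above (types are assumed).
type glyphPos struct {
	XAdvance, YAdvance int32
	XOffset, YOffset   int32
}

// point is a position in font units.
type point struct{ X, Y int32 }

// layout returns where each glyph is drawn and where the pen ends up.
// Offsets shift a single glyph; advances move the pen for the next one.
func layout(pos []glyphPos) (origins []point, end point) {
	var pen point
	for _, p := range pos {
		origins = append(origins, point{pen.X + p.XOffset, pen.Y + p.YOffset})
		pen.X += p.XAdvance
		pen.Y += p.YAdvance
	}
	return origins, pen
}

func main() {
	pos := []glyphPos{
		{XAdvance: 600},                             // base glyph
		{XAdvance: 0, XOffset: -300, YOffset: 50},   // mark placed over the base
		{XAdvance: 550},                             // next base glyph
	}
	origins, end := layout(pos)
	fmt.Println(origins, end)
}
```

To convert the resulting font units to pixels, multiply by the font size and divide by the font's units-per-em value.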

Extracting Glyph IDs

glyphIDs := buf.GlyphIDs() // []GlyphID — just the glyph indices

Reversing

You can reverse the glyph order, which is useful when a renderer that draws left to right needs RTL glyphs in a different order:

buf.Reverse()                // Reverse entire buffer
buf.ReverseRange(start, end) // Reverse a range
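
Reversing a range is an in-place swap from both ends toward the middle. The sketch below operates on a plain slice of glyph IDs with a local `reverseRange` helper. Treating `end` as exclusive is an assumption, since the range semantics of ReverseRange are not stated here.

```go
package main

import "fmt"

// reverseRange reverses ids[start:end] in place (end is exclusive).
func reverseRange(ids []uint16, start, end int) {
	for i, j := start, end-1; i < j; i, j = i+1, j-1 {
		ids[i], ids[j] = ids[j], ids[i]
	}
}

func main() {
	ids := []uint16{10, 20, 30, 40, 50}
	reverseRange(ids, 1, 4) // reverse the middle three entries
	fmt.Println(ids)
	reverseRange(ids, 0, len(ids)) // reverse everything
	fmt.Println(ids)
}
```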