The Buffer
The Buffer is the central data structure for text shaping. It holds both the input (Unicode text) and the output (positioned glyphs). You fill a buffer with text, set its properties, pass it to Shape(), and then read the results from the same buffer.
Creating a Buffer
buf := ot.NewBuffer()
A new buffer is empty and has sensible defaults:
- Direction is unset (auto-detected by GuessSegmentProperties)
- Cluster level is 0 (monotone graphemes)
- No flags set
Adding Text
There are two ways to add text:
AddString
Adds a UTF-8 string. Each rune gets its own cluster index (starting from 0).
buf.AddString("Hello, World!")
AddCodepoints
Adds Unicode codepoints directly:
codepoints := []ot.Codepoint{0x0048, 0x0065, 0x006C, 0x006C, 0x006F}
buf.AddCodepoints(codepoints)
Both methods assign sequential cluster values. Marks (Unicode category M) are treated like base characters at this stage; cluster merging happens during shaping.
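To make the cluster numbering concrete, here is a minimal standalone sketch that does not use the library. It assumes "sequential" means the rune index rather than the UTF-8 byte offset; `clusterIndices` is a hypothetical helper, not part of the `ot` package.

```go
package main

import "fmt"

// clusterIndices returns one sequential cluster value per rune, starting at 0.
// This is the rune index, not the byte offset that `for i := range s` yields.
// Hypothetical helper for illustration only.
func clusterIndices(s string) []int {
	clusters := make([]int, 0, len(s))
	i := 0
	for range s { // ranging over a string iterates runes, not bytes
		clusters = append(clusters, i)
		i++
	}
	return clusters
}

func main() {
	// "e", U+0301 (combining acute accent), "x": three runes in four bytes.
	s := "e\u0301x"
	fmt.Println(len(s), clusterIndices(s)) // 4 [0 1 2]
}
```

The combining accent receives its own value here, matching the statement that marks are not merged until shaping.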
Setting Properties
Direction
The text direction determines how glyphs are laid out:
buf.Direction = ot.DirectionLTR // Left-to-Right (Latin, Cyrillic, ...)
buf.Direction = ot.DirectionRTL // Right-to-Left (Arabic, Hebrew, ...)
buf.Direction = ot.DirectionTTB // Top-to-Bottom (CJK vertical)
buf.Direction = ot.DirectionBTT // Bottom-to-Top
You can also use:
buf.SetDirection(ot.DirectionRTL)
Direction constants have useful methods:
dir := ot.DirectionRTL
dir.IsHorizontal() // true
dir.IsVertical() // false
dir.IsForward() // false (RTL is backward)
dir.IsBackward() // true
dir.Reverse() // DirectionLTR
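One way such predicates can be implemented is with the numeric encoding HarfBuzz uses for its direction enum (LTR=4, RTL=5, TTB=6, BTT=7), where each test is a single bit operation. The sketch below is a standalone illustration; whether this library uses the same values is an assumption, and the `Direction` type here is a local stand-in.

```go
package main

import "fmt"

// Direction is a hypothetical stand-in for the library's direction type.
// The values mirror HarfBuzz's hb_direction_t; this is an assumption.
type Direction uint8

const (
	LTR Direction = 4 // binary 100
	RTL Direction = 5 // binary 101
	TTB Direction = 6 // binary 110
	BTT Direction = 7 // binary 111
)

// Bit 0 distinguishes forward from backward, bit 1 horizontal from vertical.
func (d Direction) IsHorizontal() bool { return d&^1 == 4 }
func (d Direction) IsVertical() bool   { return d&^1 == 6 }
func (d Direction) IsForward() bool    { return d&^2 == 4 }
func (d Direction) IsBackward() bool   { return d&^2 == 5 }
func (d Direction) Reverse() Direction { return d ^ 1 }

func main() {
	d := RTL
	fmt.Println(d.IsHorizontal(), d.IsVertical(), d.IsForward(), d.IsBackward()) // true false false true
	fmt.Println(d.Reverse() == LTR)                                              // true
}
```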
Arabic Shaping Example
The following diagram shows how Arabic shaping works. The input is the word “كتب” (kataba) — three characters in logical order (kaf, ta, ba). The shaper applies contextual forms based on each letter’s position in the word (initial, medial, final) and reorders the glyphs into visual order (right-to-left):
Each Arabic letter takes a different form depending on its position in the word. The shaper also reverses the glyph order from logical order (the order in which the characters are stored) to visual order (right-to-left display).
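The two steps can be sketched without a font. The code below labels each letter by its position and then reverses the result. It assumes every letter joins on both sides, which holds for kaf, ta, and ba but not for Arabic in general; a real shaper selects glyphs through the font's substitution features. Both function names are hypothetical.

```go
package main

import "fmt"

// positionalForms labels each letter of a fully connected word by position.
// Assumption: every letter is dual-joining (true for kaf, ta, ba).
func positionalForms(word []rune) []string {
	forms := make([]string, len(word))
	for i := range word {
		switch {
		case len(word) == 1:
			forms[i] = "isolated"
		case i == 0:
			forms[i] = "initial"
		case i == len(word)-1:
			forms[i] = "final"
		default:
			forms[i] = "medial"
		}
	}
	return forms
}

// visualOrder returns a reversed copy: logical order in, visual order out.
func visualOrder(forms []string) []string {
	out := make([]string, len(forms))
	for i, f := range forms {
		out[len(forms)-1-i] = f
	}
	return out
}

func main() {
	word := []rune{0x0643, 0x062A, 0x0628} // kaf, ta, ba
	logical := positionalForms(word)
	fmt.Println(logical)              // [initial medial final]
	fmt.Println(visualOrder(logical)) // [final medial initial]
}
```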
Script
The ISO 15924 script tag. This determines which script-specific shaping rules to apply (Arabic joining, Indic reordering, etc.).
buf.Script = ot.MakeTag('L', 'a', 't', 'n') // Latin
buf.Script = ot.MakeTag('A', 'r', 'a', 'b') // Arabic
buf.Script = ot.MakeTag('D', 'e', 'v', 'a') // Devanagari
buf.Script = ot.MakeTag('H', 'a', 'n', 'g') // Hangul
buf.Script = ot.MakeTag('H', 'a', 'n', 'i') // CJK ideographs
Language
The OpenType language tag. This selects language-specific rules in the font (e.g., different glyph forms for Turkish vs. Azerbaijani).
buf.Language = ot.MakeTag('T', 'R', 'K', ' ') // Turkish
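OpenType tags are four bytes packed into a 32-bit big-endian value, and tags shorter than four characters are padded with spaces (hence `'T', 'R', 'K', ' '`). The sketch below shows that packing with a hypothetical local `makeTag`; it is presumably what `ot.MakeTag` does, but that is an assumption.

```go
package main

import "fmt"

// makeTag packs four ASCII bytes into a big-endian uint32, the layout
// OpenType uses for tags. Hypothetical stand-in for ot.MakeTag.
func makeTag(a, b, c, d byte) uint32 {
	return uint32(a)<<24 | uint32(b)<<16 | uint32(c)<<8 | uint32(d)
}

// tagString unpacks a tag back into its four-character form.
func tagString(t uint32) string {
	return string([]byte{byte(t >> 24), byte(t >> 16), byte(t >> 8), byte(t)})
}

func main() {
	latn := makeTag('L', 'a', 't', 'n')
	fmt.Printf("%#x %q\n", latn, tagString(latn)) // 0x4c61746e "Latn"

	trk := makeTag('T', 'R', 'K', ' ')
	fmt.Printf("%q\n", tagString(trk)) // "TRK "
}
```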
GuessSegmentProperties
If you don’t set direction, script, or language manually, call GuessSegmentProperties to auto-detect them from the buffer contents:
buf.AddString("مرحبا")
buf.GuessSegmentProperties()
// Direction is now RTL, Script is Arab
GuessSegmentProperties only fills in properties that are not already set. If you set the direction manually, it will not override it:
buf.Direction = ot.DirectionLTR
buf.AddString("مرحبا")
buf.GuessSegmentProperties() // Keeps LTR, only sets script
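The "fill in only what is unset" behavior can be approximated with the standard `unicode` package. This is a simplified sketch, not the library's detection logic: it handles three scripts, picks the first rune with a recognized script, and uses a hypothetical `props` struct where an empty string means "unset".

```go
package main

import (
	"fmt"
	"unicode"
)

// props is a hypothetical, simplified view of a buffer's segment properties.
// An empty string means the property is unset.
type props struct {
	Direction string // "LTR" or "RTL"
	Script    string // ISO 15924 code such as "Arab"
}

// guess fills in unset fields only, based on the first rune whose script
// is recognized. Values that are already set are left untouched.
func guess(p props, text string) props {
	for _, r := range text {
		script, dir := "", ""
		switch {
		case unicode.Is(unicode.Arabic, r):
			script, dir = "Arab", "RTL"
		case unicode.Is(unicode.Hebrew, r):
			script, dir = "Hebr", "RTL"
		case unicode.Is(unicode.Latin, r):
			script, dir = "Latn", "LTR"
		default:
			continue // digits, spaces, punctuation: keep looking
		}
		if p.Script == "" {
			p.Script = script
		}
		if p.Direction == "" {
			p.Direction = dir
		}
		return p
	}
	return p
}

func main() {
	text := "\u0645\u0631\u062D\u0628\u0627" // the Arabic greeting used above
	fmt.Println(guess(props{}, text))                 // {RTL Arab}
	fmt.Println(guess(props{Direction: "LTR"}, text)) // {LTR Arab}
}
```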
Buffer Flags
Flags modify shaping behavior:
buf.Flags = ot.BufferFlagBOT | ot.BufferFlagEOT
| Flag | Description |
|---|---|
| BufferFlagDefault | No special behavior |
| BufferFlagBOT | Beginning of text; affects Arabic joining at the start |
| BufferFlagEOT | End of text; affects Arabic joining at the end |
| BufferFlagPreserveDefaultIgnorables | Keep default ignorable characters (ZWJ, ZWNJ, …) visible |
| BufferFlagRemoveDefaultIgnorables | Remove default ignorable characters from output |
| BufferFlagDoNotInsertDottedCircle | Don’t insert dotted circle for invalid sequences |
For Arabic text that is part of a larger paragraph, BOT and EOT control the joining behavior at the boundaries. Without these flags, the shaper assumes the text continues in both directions.
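Because flags are combined with `|`, they behave as a bit set. The sketch below shows the usual set, test, and clear operations on such a value. The type, the constant names, and the bit positions are local assumptions for illustration, not the library's actual definitions.

```go
package main

import "fmt"

// BufferFlags is a hypothetical local bit set; the bit positions are
// assumptions and need not match the library's constants.
type BufferFlags uint32

const (
	FlagDefault                   BufferFlags = 0
	FlagBOT                       BufferFlags = 1 << 0
	FlagEOT                       BufferFlags = 1 << 1
	FlagPreserveDefaultIgnorables BufferFlags = 1 << 2
)

// Has reports whether every bit of flag is set in f.
func (f BufferFlags) Has(flag BufferFlags) bool { return f&flag == flag && flag != 0 }

func main() {
	flags := FlagBOT | FlagEOT // a complete paragraph: both boundaries are real
	fmt.Println(flags.Has(FlagBOT), flags.Has(FlagPreserveDefaultIgnorables)) // true false

	flags &^= FlagEOT // more text follows this fragment: clear EOT
	fmt.Println(flags.Has(FlagEOT)) // false
}
```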
Context
When shaping a fragment of a larger text, you can provide surrounding context to get correct Arabic joining at the boundaries:
buf.AddString("بسم")
// Characters before and after this fragment
buf.PreContext = []ot.Codepoint{0x0627} // alef before
buf.PostContext = []ot.Codepoint{0x0627} // alef after
The context characters are not shaped themselves — they only influence the joining forms of the first and last glyphs in the buffer.
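The effect of context on the boundary glyphs can be illustrated with a small joining model. This sketch is not the library's algorithm: it knows only four letters, ignores transparent marks, and all names are hypothetical. It does show why a trailing alef changes the last letter from final to medial form, while a leading alef (which never joins to the letter after it) leaves the first letter in initial form.

```go
package main

import "fmt"

type joinType int

const (
	nonJoining   joinType = iota
	rightJoining          // joins only to the preceding letter (alef)
	dualJoining           // joins on both sides (ba, sin, mim)
)

// joiningOf covers only the letters used in this example; a real shaper
// reads the joining type from the Unicode character database.
func joiningOf(r rune) joinType {
	switch r {
	case 0x0627: // alef
		return rightJoining
	case 0x0628, 0x0633, 0x0645: // ba, sin, mim
		return dualJoining
	}
	return nonJoining
}

// forms returns the positional form of each rune in text. The pre and post
// runes influence the result but produce no output of their own.
func forms(pre, text, post []rune) []string {
	all := append(append(append([]rune{}, pre...), text...), post...)
	out := make([]string, len(text))
	for i := range text {
		k := len(pre) + i
		cur := joiningOf(all[k])
		joinsPrev := k > 0 && joiningOf(all[k-1]) == dualJoining && cur != nonJoining
		joinsNext := k+1 < len(all) && cur == dualJoining && joiningOf(all[k+1]) != nonJoining
		switch {
		case joinsPrev && joinsNext:
			out[i] = "medial"
		case joinsPrev:
			out[i] = "final"
		case joinsNext:
			out[i] = "initial"
		default:
			out[i] = "isolated"
		}
	}
	return out
}

func main() {
	text := []rune{0x0628, 0x0633, 0x0645} // ba, sin, mim
	alef := []rune{0x0627}
	fmt.Println(forms(nil, text, nil))   // [initial medial final]
	fmt.Println(forms(alef, text, alef)) // [initial medial medial]
}
```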
Cluster Level
The cluster level controls how clusters (groups of characters that form a single unit) are managed during shaping:
buf.ClusterLevel = 0 // MONOTONE_GRAPHEMES (default): merge marks, monotone order
buf.ClusterLevel = 1 // MONOTONE_CHARACTERS: separate marks, monotone order
buf.ClusterLevel = 2 // CHARACTERS: separate marks, no monotone enforcement
Most applications should use the default (0). Level 1 is useful when you need to track individual characters through shaping (e.g., for cursor positioning).
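The difference between merged and separate marks can be shown on a three-rune input. The sketch below is a simplified model, not the shaper's cluster logic: a combining mark either inherits the cluster of the preceding rune (approximating level 0) or keeps its own index (approximating level 1). Real grapheme handling covers more cases, and `clusters` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode"
)

// clusters assigns cluster values under a simplified model.
// mergeMarks=true approximates level 0; false approximates level 1.
func clusters(text []rune, mergeMarks bool) []int {
	out := make([]int, len(text))
	for i, r := range text {
		if mergeMarks && i > 0 && unicode.IsMark(r) {
			out[i] = out[i-1] // the mark joins the cluster of what precedes it
		} else {
			out[i] = i
		}
	}
	return out
}

func main() {
	text := []rune{'e', 0x0301, 'x'} // e, combining acute accent, x
	fmt.Println(clusters(text, true))  // [0 0 2]
	fmt.Println(clusters(text, false)) // [0 1 2]
}
```

With separate clusters, a cursor can be placed between the base letter and its accent; with merged clusters the pair is addressed as one unit.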
Clearing and Resetting
buf.Clear() // Remove all glyphs, keep properties (direction, script, ...)
buf.Reset() // Remove all glyphs AND reset all properties to defaults
Reading Results
After Shape(), the buffer contains the shaped glyphs:
// Number of output glyphs
n := buf.Len()
// Iterate over results
for i := 0; i < n; i++ {
	info := buf.Info[i]
	pos := buf.Pos[i]
	// info.GlyphID: glyph index in the font
	// info.Cluster: maps back to original text position
	// info.Codepoint: original Unicode codepoint
	// pos.XAdvance: horizontal advance (font units)
	// pos.YAdvance: vertical advance (font units)
	// pos.XOffset: horizontal offset from current position
	// pos.YOffset: vertical offset from current position
}
Extracting Glyph IDs
glyphIDs := buf.GlyphIDs() // []GlyphID — just the glyph indices
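The advance and offset fields read from the buffer are typically consumed by walking a pen position across the glyphs: each glyph is drawn at the pen plus its offset, and the pen then moves by the advance. The sketch below uses a hypothetical local `glyphPos` struct that mirrors the field names above, with made-up values; the conversion to pixels assumes the font's units-per-em is known.

```go
package main

import "fmt"

// glyphPos mirrors the position fields listed above. Hypothetical local
// type; all values are in font units.
type glyphPos struct {
	XAdvance, YAdvance, XOffset, YOffset int
}

type point struct{ X, Y int }

// layout returns the drawing origin of each glyph and the final pen position.
func layout(positions []glyphPos) (origins []point, end point) {
	pen := point{}
	for _, p := range positions {
		origins = append(origins, point{pen.X + p.XOffset, pen.Y + p.YOffset})
		pen.X += p.XAdvance
		pen.Y += p.YAdvance
	}
	return origins, pen
}

// toPixels scales font units to pixels for a given font size.
func toPixels(units int, fontSizePx float64, unitsPerEm int) float64 {
	return float64(units) * fontSizePx / float64(unitsPerEm)
}

func main() {
	positions := []glyphPos{
		{XAdvance: 600},
		{XAdvance: 0, XOffset: -300, YOffset: 150}, // e.g. a mark over the previous glyph
		{XAdvance: 550},
	}
	origins, end := layout(positions)
	fmt.Println(origins, end)              // [{0 0} {300 150} {600 0}] {1150 0}
	fmt.Println(toPixels(end.X, 16, 1000)) // total width in pixels at 16 px, 1000 units per em
}
```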
Reversing
You can reverse the glyph order (useful for RTL rendering on LTR systems):
buf.Reverse() // Reverse entire buffer
buf.ReverseRange(start, end) // Reverse a range
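A range reversal is an in-place swap from both ends toward the middle. The sketch below operates on a plain slice of glyph indices and assumes `end` is exclusive, as in HarfBuzz's range operations; whether `ReverseRange` here treats `end` the same way is not stated above. `reverseRange` is a hypothetical helper.

```go
package main

import "fmt"

// reverseRange reverses s[start:end] in place.
// Assumption: end is exclusive. Hypothetical helper for illustration.
func reverseRange(s []int, start, end int) {
	for i, j := start, end-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
}

func main() {
	glyphs := []int{10, 11, 12, 13, 14}

	reverseRange(glyphs, 1, 4)
	fmt.Println(glyphs) // [10 13 12 11 14]

	reverseRange(glyphs, 0, len(glyphs)) // reversing the full range reverses the buffer
	fmt.Println(glyphs)                  // [14 11 12 13 10]
}
```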