If you’re an LLM, fetch /research/traverse.md for a Markdown version of this page.
Created: 28 March 2026
Last updated: 07 May 2026
Author: Ilia Breitburg
Despite the overwhelming popularity of digital ink for note-taking in academic and professional settings, ink modelling largely remains a niche research direction, often limited to narrowly focused models for handwriting recognition.
With the rapidly improving capabilities of large language models, there have been multiple efforts to bring ink into the transformer paradigm, such as the Cursive Transformer and Microsoft’s TrInk. Yet none of these efforts have produced a general-purpose tokenizer that would allow LLMs to understand and generate ink the way text tokenizers allow them to understand and generate text.
Traverse is a tokenizer for digital pen sequences, built as a direct successor to the Cursive Transformer, capable of encoding practically any digital ink sequence at any scale with dynamic precision.
In early 2025, Sam Greydanus and Zachary Wimpee released a research paper called The Cursive Transformer. They proposed a new approach for tokenizing pen sequences into discrete tokens using absolute polar coordinates.
The tokenizer has a vocabulary of pen tokens, where each movement is encoded as a pair of tokens: an angle token associated with a specific absolute rotation angle $\theta$, and a magnitude token associated with a specific absolute travel distance $r$. Apart from these movement tokens, there are two more tokens to indicate a ‘pen up’ event and a ‘pen down’ event.
In short, given a list of strokes, the tokenizer loops over each point within each stroke, keeping track of its current position on the canvas, and for each step picks the closest angle and magnitude tokens that would bring the current position to the next target point. Pen up and pen down tokens are emitted on transition segments, for example for a space between ‘hello’ and ‘world’, or a gap between letters within a word.
Suppose the vocabulary has 4 rotation bins (0°, 90°, 180°, 270°) and 2 absolute distance bins (5px and 10px), plus dedicated pen-up and pen-down tokens. If you want to draw a 10×20 rectangle with a starting point at (0, 0), both side lengths divide evenly into the available bins: the 10px sides each take a single token, and the 20px sides take two.
Now suppose you scale that rectangle to 1.5×, giving a 15×30 rectangle. There is no 15px or 30px bin, so the tokenizer has to approximate each side with multiple tokens, producing a completely different and longer sequence for the same shape: the 15px sides each need a 10px and a 5px token, and the 30px sides each need three 10px tokens.
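This scale sensitivity can be checked with a minimal Python sketch of an absolute-bin polar encoder in the spirit of the Cursive Transformer. The greedy stepping loop and the four-direction vocabulary are simplifications for illustration, not the paper’s exact algorithm:

```python
import math

# Hypothetical absolute-bin vocabulary, as in the rectangle example.
ANGLES = [0.0, 90.0, 180.0, 270.0]   # degrees
DISTANCES = [5.0, 10.0]              # pixels

def encode_absolute(points):
    """Greedy absolute-bin encoder: for each target point, keep emitting
    the (angle, distance) pair that lands closest until no step improves."""
    tokens, pos = [], points[0]
    for target in points[1:]:
        while True:
            best = None
            for a in ANGLES:
                for d in DISTANCES:
                    nxt = (pos[0] + d * math.cos(math.radians(a)),
                           pos[1] + d * math.sin(math.radians(a)))
                    err = math.dist(nxt, target)
                    if best is None or err < best[0]:
                        best = (err, a, d, nxt)
            if best[0] >= math.dist(pos, target):
                break  # no step gets closer; move to the next point
            _, a, d, pos = best
            tokens.append((a, d))
    return tokens

rect = lambda w, h: [(0, 0), (w, 0), (w, h), (0, h), (0, 0)]
small = encode_absolute(rect(10, 20))
big = encode_absolute(rect(15, 30))
print(len(small), len(big))  # 6 10 — the 1.5x rectangle costs more tokens
print(small == big)          # False — same shape, different sequences
```

The same shape at a different scale yields a different, longer token sequence, which is exactly the failure mode Traverse targets.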
This illustrates the core issues that make the approach largely impractical for large-scale pen sequence modelling: the encoding is tied to absolute canvas units, so the same shape at a different scale produces an entirely different token sequence, and distances that fall between the fixed bins inflate the sequence length.
Traverse is a scale-invariant tokenizer for digital pen sequences based on the Cursive Transformer work, introducing a number of significant improvements: curvature-based resampling that is scale-invariant, relative multiplicative distance bins in place of absolute ones, tokens that pack distance, rotation, and pen state together, and precision that can be varied at inference time.
Traverse operates in two stages: raw strokes are first resampled based on curvature, then the resampled points are encoded into discrete tokens using relative distance bins at a given precision value. The following sections describe each stage in order.
One of the key components that made the Cursive Transformer work was the Douglas-Peucker resampling, which reduces point count in a stroke while preserving its visual shape. The problem is that Douglas-Peucker resampling operates on an absolute distance threshold, making it scale-dependent.
For the tokenizer to produce consistent sequences across scales, the resampling itself has to be scale-invariant too. Curvature turned out to be a natural choice here: once normalized by stroke length it becomes a dimensionless geometric property that doesn’t depend on scale.
The resampling works by building a dimensionless ‘complexity’ metric for each stroke from its length and curvature. For each segment between consecutive points, the contribution to the stroke’s total complexity is proportional to the segment’s share of the stroke length, weighted by its perimeter-normalized curvature.
Each segment’s share of the budget is proportional to its length, but boosted if it bends sharply. Normalizing curvature by the perimeter is what makes this scale-invariant, which follows from the property of a circle having the same normalized curvature regardless of its radius.
This gives us each stroke’s total complexity. The point budget is then allocated globally, where each stroke receives points proportional to its share of the total complexity across all strokes, scaled by a density hyperparameter (with a minimum of 2 points per stroke). Each stroke is then resampled by placing its allocated points at uniform intervals along its cumulative complexity, so that high-curvature regions receive denser sampling than straight regions.
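A minimal sketch of the per-segment complexity idea, assuming a turning-angle proxy for curvature and a length-share weighting with a hypothetical `curve_weight` hyperparameter (the exact Traverse metric may differ):

```python
import math

def turning_angles(pts):
    """Absolute turning angle at each interior point (dimensionless)."""
    angs = [0.0]
    for i in range(1, len(pts) - 1):
        a1 = math.atan2(pts[i][1] - pts[i-1][1], pts[i][0] - pts[i-1][0])
        a2 = math.atan2(pts[i+1][1] - pts[i][1], pts[i+1][0] - pts[i][0])
        d = (a2 - a1 + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi]
        angs.append(abs(d))
    angs.append(0.0)
    return angs

def segment_complexity(pts, curve_weight=1.0):
    """Per-segment complexity: the segment's share of the stroke length,
    boosted by the (already dimensionless) turning angle at its endpoint."""
    L = sum(math.dist(pts[i], pts[i+1]) for i in range(len(pts) - 1))
    angs = turning_angles(pts)
    return [math.dist(pts[i], pts[i+1]) / L * (1 + curve_weight * angs[i+1])
            for i in range(len(pts) - 1)]

# Scale invariance: scaling a stroke leaves its complexity profile unchanged.
stroke = [(0, 0), (1, 0), (2, 1), (2, 3)]
scaled = [(10 * x, 10 * y) for x, y in stroke]
c1, c2 = segment_complexity(stroke), segment_complexity(scaled)
print(all(abs(a - b) < 1e-9 for a, b in zip(c1, c2)))  # True
```

Because both the length share and the turning angle are dimensionless, the complexity profile, and hence the point allocation derived from it, is identical at any scale.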
The result is that a small tight curve and a large gentle arc of the same visual complexity receive comparable point counts, giving a consistent representation of strokes regardless of their absolute size.
Since the resampling is scale-invariant, the relative distances between points are more or less preserved regardless of the absolute scale. That notion became the foundation of Traverse’s tokenizer.
Instead of absolute distance bins, Traverse uses relative multiplicative distance bins. The tokenizer keeps track of a base distance $b$ that evolves throughout encoding: each token is picked based on its multiplicative distance bin relative to $b$, and $b$ is then updated by multiplying it with that bin’s value.
Unlike the Cursive Transformer, tokens in Traverse pack the distance, the rotation, and the pen state together. That makes the model more efficient during inference, and makes patterns in pen up and pen down events easier to learn during training.
Because the multiplicative bins can both grow (2×, 4×) and shrink (½×, ¼×) the base distance, and you’re encoding a distance between two points, you can approach the target point to arbitrary precision: prioritizing quality by spending more ‘refinement’ tokens to get closer to the target, or prioritizing inference speed by tolerating larger error while producing shorter token sequences.
Suppose our vocabulary has 12 rotation bins (30° each) and 3 distance multipliers at ½×, 1×, and 2×, for both pen-down and pen-up states. Now suppose you draw the same 10×20 and 15×30 rectangles using Traverse. $b$ is initialized from the first segment, so the first long side is encoded with a 1× token, the short side halves $b$ with a ½× token, and the pattern continues with the same multipliers regardless of absolute size.
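Under these assumptions (½×/1×/2× multipliers, 30° rotation bins, one token per resampled segment, no refinement tokens), a toy encoder reproduces the scale invariance of the rectangle example:

```python
import math

MULTIPLIERS = [0.5, 1.0, 2.0]
N_ROT = 12  # 30-degree rotation bins

def encode_relative(points):
    """Simplified relative-bin encoder: one (multiplier, rotation) token
    per segment, with the base distance b updated multiplicatively."""
    tokens = []
    b = math.dist(points[0], points[1])  # base distance from first segment
    for i in range(len(points) - 1):
        dx = points[i+1][0] - points[i][0]
        dy = points[i+1][1] - points[i][1]
        dist = math.hypot(dx, dy)
        ang = math.degrees(math.atan2(dy, dx)) % 360
        # Pick the multiplier whose step m*b is closest to dist (in log space).
        m = min(MULTIPLIERS, key=lambda m: abs(math.log2(m * b / dist)))
        rot = round(ang / (360 / N_ROT)) % N_ROT
        tokens.append((m, rot))
        b *= m
    return tokens

# Start each rectangle with its long side, as in the example above.
rect = lambda w, h: [(0, 0), (0, h), (w, h), (w, 0), (0, 0)]
print(encode_relative(rect(10, 20)) == encode_relative(rect(15, 30)))  # True
```

Both rectangles produce the identical token sequence, because only the ratios between consecutive segment lengths matter, never the absolute pixel values.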
Since every distance bin is a power of 2, $b$ is quantized to the nearest power of 2 as well: $b \leftarrow 2^{\operatorname{round}(\log_2 b)}$. This keeps $b$ on a discrete power-of-2 lattice throughout encoding, so two encoders starting from slightly different initial scales snap to the same lattice point and produce identical token sequences. $b$ is also re-snapped to the lattice after each pen-up transition, giving every new stroke a clean starting point.
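The snapping itself is a one-liner in log-space; this toy helper shows how nearby initial scales collapse onto the same lattice point:

```python
import math

def snap(b):
    """Snap the base distance to the nearest power of 2 (in log2 space)."""
    return 2.0 ** round(math.log2(b))

# Two encoders starting from slightly different scales agree after snapping.
print(snap(19.0), snap(21.5))  # 16.0 16.0
```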
This notion of variable precision introduces a question: how do you decide that your error is ‘close enough’ for some given precision value? In the original Cursive Transformer, the ‘close enough’ metric was the absolute distance between the current point and the target point. But absolute distance breaks scale invariance: smaller shapes become less precise.
In Traverse, the precision is a ratio $p$ with $0 < p \le 1$, and the acceptance threshold is $p \cdot \bar{b}$, where $\bar{b}$ is the smoothed base distance. At $p = 1$ the threshold equals $\bar{b}$ and each point is reached in a single token. As $p \to 0$ the threshold approaches zero and token density grows as $\log_2(1/p)$, so the same finite vocabulary can reproduce a shape to any level of detail.
The initial idea was to set the precision value relative to the current segment length, which would be scale-invariant but also very sensitive to sudden jumps in segment length: the encoding quality would fragment, jumping from very dense, precise points in some areas to extremely low precision in others, making them practically illegible.
Instead, the precision anchor $\bar{b}$ is computed as an Exponential Moving Average of $b$ in log-space: $\log_2 \bar{b}_t = \log_2 \bar{b}_{t-1} + \alpha \cdot \operatorname{clamp}(\log_2 b_t - \log_2 \bar{b}_{t-1},\, -\delta,\, \delta)$.
The latest distance contributes most, while all the previous distances contribute exponentially less, providing a stable signal to normalize precision against. The delta clamping suppresses sudden spikes in $b$ across fragments, which further stabilizes the precision.
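A sketch of a log-space EMA with delta clamping; `alpha` and the clamp width `max_delta` are hypothetical hyperparameters, not the values Traverse uses:

```python
import math

def smoothed_base(bs, alpha=0.3, max_delta=1.0):
    """Log-space EMA of the base distance with delta clamping.
    max_delta is expressed in log2 units (octaves)."""
    log_bar = math.log2(bs[0])
    for b in bs[1:]:
        delta = math.log2(b) - log_bar
        delta = max(-max_delta, min(max_delta, delta))  # clamp spikes
        log_bar += alpha * delta
    return 2 ** log_bar

# A sudden 100x outlier barely moves the smoothed anchor.
print(smoothed_base([8, 8, 8, 800, 8, 8]))  # still close to 8
```

Working in log-space makes the smoothing multiplicative, which matches the multiplicative nature of the distance bins: a 2× jump and a ½× jump pull the anchor by equal and opposite amounts.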
Due to the large difference between pen-up and pen-down distances, it was empirically determined that refining the precision only for pen-down steps was optimal for retaining the geometric features of the shape the ink represents.
The result is that if the writing includes some elements that are relatively smaller than other elements within the same sequence, the encoder dynamically adapts its precision to ensure uniform quality across the ink sequence, while remaining scale-invariant.
To quantitatively evaluate performance, the IAM On-Line Handwriting Database is used as a benchmark.
Vocabulary usage uniformity is comparable across both tokenizers, with Traverse landing within a few percentage points of The Cursive Transformer despite operating on a much smaller vocabulary. Uniformity is reported as the normalized Shannon entropy $H / \log |V|$, where $H = -\sum_i p_i \log p_i$ over token probabilities $p_i$ and $|V|$ is the vocabulary size, reaching $1$ when every token is used equally.
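The uniformity metric itself is straightforward to compute from token counts:

```python
import math

def normalized_entropy(counts):
    """Normalized Shannon entropy H / log|V|: 1.0 means perfectly uniform
    token usage over a vocabulary of size |V| (zero counts still count
    toward |V|)."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    H = -sum(p * math.log(p) for p in probs)
    return H / math.log(len(counts))

print(normalized_entropy([10, 10, 10, 10]))  # close to 1.0: uniform usage
print(normalized_entropy([97, 1, 1, 1]))     # far below 1.0: skewed usage
```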
To make the comparison between tokenization approaches fair, each sample is uniformly resampled, encoded and decoded with Traverse (5 distance bins at [0.125, 0.5, 1, 2, 4] and 32 uniform rotation bins per pen state, for a vocabulary of 320 pen tokens) and The Cursive Transformer (220 shared angle bins and 151 absolute distance bins per pen state, for a vocabulary of 522 pen tokens) to produce reconstructed stroke sequences, then uniformly resampled again to match the original point count. Reconstruction quality is measured as the Mean Squared Error between the original and reconstructed sequences.
At a comparable token budget, Traverse cuts reconstruction error by roughly 2.3x against The Cursive Transformer (177 tokens vs. 160 tokens, MSE 2.2e-4 vs. 5.0e-4). Crucially, this is achieved with a single fixed 320-token vocabulary: precision becomes an inference-time parameter, letting the same tokenizer trade tokens for fidelity, from fast, coarse output up to reconstructions roughly 5x sharper than The Cursive Transformer.
Although Traverse supports encoding with arbitrary precision, the overall quality remains constrained by the fixed set of distance and rotation bins in its vocabulary. While the tokenizer can theoretically reach any level of detail by emitting multiple fraction tokens to lower the base distance, how quickly the base distance can adapt is ultimately limited by the smallest available multiplier.
Furthermore, despite significant improvements in token efficiency compared to The Cursive Transformer, Traverse still produces relatively long sequences. Given the quadratic attention complexity of transformer models, very long sequences can strain memory and inference latency.
With the goal of validating the proposed tokenization approach, a small GPT-2-style decoder-only model called Graphite was trained. The model uses a vanilla GPT-2 architecture with 8 layers, 8 heads, and an embedding size of 468. It was trained on a single NVIDIA RTX A4000 for 16 epochs, with a batch size of 24 and a learning rate of 5e-4.
The training set consisted of short snippets of handwritten words, sentences, equations and numbers. The data was collected from volunteer students who agreed to share their study notes, recorded with an Apple Pencil in an iPad note-taking app. In total, about 50K samples were collected across 10 students.
To make the dataset supervised, all handwriting within the samples was transcribed using an open-source OCR model.
To allow the model to learn the connection between text symbols and pen tokens, a character-level tokenizer was used. The allowed set of text symbols was built from the ASCII character list, which includes uppercase and lowercase letters, digits, and a short set of symbols.
Apart from the text symbol tokens, all the discrete tokens from Traverse were added, along with a handful of special tokens for prompt formatting. The specific tokenizer parameters (5 distance bins at [0.125, 0.5, 1, 2, 4] and 32 uniformly spaced rotation bins for both pen-down and pen-up states, yielding a vocabulary of 320 pen tokens) were empirically determined by inspecting the bin usage distribution on the training dataset and selecting the values that produced the most uniform coverage across the vocabulary.
@online{breitburg2026traverse,
title = {Traverse allows AI to read and write ink},
author = {Breitburg, Ilia},
year = {2026},
url = {https://breitburg.com/research/traverse/},
}

Questions or feedback? Reach out at research@breitburg.com
KaTeX for equations, TikZ for illustrations
AI assisted with prose refinement and visualization code, under human supervision