auto-jrubby is a Typst package that provides automatic Japanese morphological analysis and furigana (ruby) insertion.
It leverages a Rust-based WASM plugin to tokenize text using Lindera (a morphological analysis library) and uses the rubby package to render the furigana.
Features
- Automatic Furigana Generation: Automatically determines readings for Kanji based on context and renders them as ruby text.
- Smart Okurigana Alignment: Intelligent handling of mixed Kanji/Hiragana words (e.g., 食べる is rendered with ruby た over 食, leaving べる untouched).
- Morphological Analysis Table: Visualize the text structure (Part of Speech, Detailed POS, Readings, Base forms) via a formatted data table.
- Customizable Styling: Supports custom ruby sizing and positioning via the rubby package backend.
- High Performance: Powered by a Rust WASM plugin using Lindera (IPADIC) for fast and accurate tokenization.
Usage
Basic Furigana
To automatically add readings to Japanese text:
#import "@preview/auto-jrubby:0.1.0": *
#set text(font: "Hiragino Sans", lang: "ja")
#let sample = "ルビ(英語: ruby)は、文章内の任意の文字に対しふりがなや説明、異なる読み方といった役割の本文の横に付属される文字。"
#show-ruby(sample)

Morphological Analysis Table
To debug or display the linguistic structure of the text:
#import "@preview/auto-jrubby:0.1.0": *
#set text(font: "Hiragino Sans", lang: "ja")
#show-analysis-table("すももも桃も桃のうち")

API Reference
show-ruby
Renders the input text with automatic furigana.
#let show-ruby(input-text, size: 0.5em, ruby-func: auto)
Parameters:
- input-text (string): The Japanese text to analyze and render.
- size (length): The font size of the ruby text. Defaults to 0.5em.
- ruby-func (function | auto): A custom ruby function from the rubby package.
  - If auto, it uses the default configuration (get-ruby(size: size)).
  - If provided, it allows advanced customization of ruby positioning (e.g., specific pos or alignment).
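As a rough illustration of the two customization paths, the sketch below first changes only the ruby size, then passes a function built with rubby's get-ruby. The rubby version in the import (0.10.1) and the sample sentence are assumptions made for this example; use whichever rubby version your project already depends on.

```typst
#import "@preview/auto-jrubby:0.1.0": show-ruby
// Assumption: rubby 0.10.1; adjust to the version you actually use.
#import "@preview/rubby:0.10.1": get-ruby

#set text(lang: "ja")

// Slightly larger readings, default positioning.
#show-ruby("東京は日本の首都です。", size: 0.6em)

// A custom ruby function: `pos: bottom` places the readings below the base text.
#let ruby-below = get-ruby(size: 0.5em, pos: bottom)
#show-ruby("東京は日本の首都です。", ruby-func: ruby-below)
```

When ruby-func is supplied, the size used is presumably the one configured in get-ruby, so it is safest to set the size there rather than rely on the size parameter of show-ruby.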
show-analysis-table
Renders a table displaying the morphological breakdown of the text.
#let show-analysis-table(input-text)
Parameters:
- input-text (string): The text to analyze.
Table Columns:
- Surface Form (表層形): The word as it appears in the text.
- Part of Speech (品詞): Grammatical category (Noun, Verb, etc.).
- Details (詳細): Sub-category (e.g., Proper Noun, Suffix).
- Reading (読み): Katakana reading.
- Base Form (基本形): The dictionary form of the word.
tokenize
Low-level function that returns the raw JSON data from the WASM plugin. Useful if you want to process the analysis data manually.
#let tokenize(input-text)
Returns: An array of dictionaries containing surface, pos, sub_pos, reading, base, and ruby_segments.
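A minimal sketch of manual processing might look like the following. It relies only on the field names listed above; the fallback for a missing reading is a defensive assumption, since the README does not say whether every token carries one.

```typst
#import "@preview/auto-jrubby:0.1.0": tokenize

#set text(lang: "ja")

#let tokens = tokenize("すももも桃も桃のうち")

// One bullet per token: surface form, reading, and part of speech.
#for token in tokens {
  let surface = token.at("surface")
  // Assumption: fall back to none in case a token has no reading.
  let reading = token.at("reading", default: none)
  let pos = token.at("pos")
  [- #surface: #reading (#pos)]
}

// Total number of tokens found in the input.
Token count: #tokens.len()
```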
Under the Hood
This package uses Lindera (a Rust port of Kuromoji) with the IPADIC dictionary embedded in the WASM binary.
- The text is passed from Typst to the Rust WASM plugin.
- Lindera tokenizes the text and retrieves readings.
- A custom algorithm aligns the readings with the surface form to separate Okurigana (kana endings of verbs/adjectives) from the Kanji stems.
- Typst receives the structured data and renders it using rubby.
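The alignment step can be pictured with a simplified sketch. The actual algorithm runs inside the Rust plugin and is not reproduced here; the version below is an illustrative stand-in written in Typst. It handles only trailing kana, and the helper names to-hiragana and split-okurigana are hypothetical. It converts the katakana reading to hiragana, then peels off the characters that the end of the surface form and the end of the reading have in common.

```typst
// Hypothetical helper: convert katakana (U+30A1..U+30F6) to hiragana
// by shifting each code point down by 0x60.
#let to-hiragana(s) = s.codepoints().map(c => {
  let cp = str.to-unicode(c)
  if cp >= 0x30A1 and cp <= 0x30F6 { str.from-unicode(cp - 0x60) } else { c }
}).sum(default: "")

// Hypothetical helper: split a word into the part that receives ruby,
// the ruby text itself, and the trailing okurigana.
#let split-okurigana(surface, reading) = {
  let s = surface.codepoints()
  let r = to-hiragana(reading).codepoints()
  let n = 0
  // Count trailing characters shared by the surface form and the reading,
  // always leaving at least one character on each side for the stem.
  while n < s.len() - 1 and n < r.len() - 1 and s.at(-1 - n) == r.at(-1 - n) {
    n += 1
  }
  (
    base: s.slice(0, s.len() - n).sum(default: ""),
    ruby: r.slice(0, r.len() - n).sum(default: ""),
    okurigana: s.slice(s.len() - n).sum(default: ""),
  )
}

// For 食べる with reading タベル: base 食, ruby た, okurigana べる.
#let parts = split-okurigana("食べる", "タベル")
#parts.base (#parts.ruby) #parts.okurigana
```

A real implementation also has to handle kana in the middle of a word and words with no Kanji at all, which this sketch does not attempt.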
License
This project is distributed under the AGPL-3.0-or-later License. See LICENSE for details.