glotter is a Typst package for detecting the language of text fragments and applying language-aware settings to content.
The plugin is compiled to WebAssembly and built on fasttext-pure-rs, a pure Rust implementation of fastText.
Usage
#import "@preview/glotter:0.1.0": *
#let samples = (
  ja: "これは日本語の文章です。",
  en: "This is an English sentence.",
)
#let print-lang(text) = [
  #text $arrow$ #lang(text) \
]
#print-lang(samples.ja)
#print-lang(samples.en)

To retrieve detailed prediction metadata, use detect-info:
#let info = detect-info(
  samples.en,
  k: 3,
  min-margin: 0.12,
)
#let print-info(info) = [
  Language: #info.at("lang") \
  Probability: #info.at("probability") \
  Ambiguous: #info.at("ambiguous")
]
#print-info(info)
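
The ambiguity flag can drive a fallback decision. The following is a minimal sketch, not part of glotter: lang-or-default is a hypothetical helper, and it assumes that the "ambiguous" entry holds a boolean.

```typst
#import "@preview/glotter:0.1.0": detect-info

// Hypothetical helper (not provided by glotter): return the detected
// language, or `default` when the prediction is flagged as ambiguous.
#let lang-or-default(input, default: "en") = {
  let info = detect-info(input, k: 3, min-margin: 0.12)
  // Assumption: info.at("ambiguous") is a boolean.
  if info.at("ambiguous") { default } else { info.at("lang") }
}

#lang-or-default("This is an English sentence.")
```

Keeping the decision in one function means a template can change its ambiguity policy without touching every call site.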

To apply the detected language to content automatically, use auto-text.
Typst applies language-specific text settings when the language is known, but the language normally has to be specified manually. auto-text detects the language of a fragment and applies the corresponding text(lang: ...) setting for you.
#let sample = "שלום עולם"
*Plain text:*
#sample
*Manual Typst:*
#text(lang: "he")[#sample]
*With glotter:*
#auto-text[#sample]

For document-wide use, auto-par can be installed as a paragraph show rule.
Each paragraph is detected independently, so mixed-language documents can be written without wrapping every paragraph in text(lang: ...).
#show par: it => auto-par(it, fallback: "en")
Hello world.

Hallo Welt.

שלום עולם.

مرحبا بالعالم.
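
Because Typst show rules only apply within the block where they are declared, detection can be limited to part of a document. The sketch below relies on that standard scoping behaviour and uses only the auto-par call shown above; the sample paragraphs are illustrative.

```typst
#import "@preview/glotter:0.1.0": auto-par

#[
  // The show rule is active only inside this content block.
  #show par: it => auto-par(it, fallback: "en")

  Hello world.

  Hallo Welt.
]

// Outside the block, paragraphs keep the document's default language.
This paragraph is not passed through auto-par.
```

Scoping the rule this way may be useful when only one section, such as a multilingual appendix, needs per-paragraph detection.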

Use Cases
glotter can be used directly, but it is also designed as a building block for downstream Typst packages, templates, and document-processing workflows.
Typst already supports language-aware text through text(lang: ...).
glotter provides the missing step: detecting which language should be applied to a given text fragment.
Multilingual templates
Templates can use glotter to accept multilingual content without requiring users to annotate every fragment manually.
#let localized-abstract(body) = {
  let info = detect-info(body, fallback: "en")
  let l = info.at("lang")
  let rtl-lang = ("ar", "fa", "he", "ur").contains(l)
  let title = if l == "ja" {
    "概要"
  } else if l == "ar" {
    "ملخص"
  } else if l == "de" {
    "Zusammenfassung"
  } else if l == "fr" {
    "Résumé"
  } else {
    "Abstract"
  }
  block(
    width: 10cm,
    inset: 8pt,
    stroke: 0.5pt + luma(180),
    radius: 4pt,
  )[
    #set text(lang: l, dir: if rtl-lang { rtl } else { ltr })
    #set align(if rtl-lang { right } else { left })
    #strong[#title] \
    #auto-text(fallback: "en")[#body]
  ]
}
#localized-abstract[This package detects the language of text fragments.]
#localized-abstract[تقوم هذه الحزمة بتحديد لغة أجزاء النص.]

Language-dependent rendering
Downstream packages can use detect-info to branch on the detected language.
#import "@preview/note-me:0.6.0": *
#let localized-note(body) = {
  let info = detect-info(body, fallback: "en")
  let l = info.at("lang")
  let title = if l == "ja" {
    "注意"
  } else if l == "de" {
    "Hinweis"
  } else if l == "fr" {
    "Remarque"
  } else {
    "Note"
  }
  note(title: title)[
    #auto-text(fallback: "en")[#body]
  ]
}
#localized-note[This is an English note.]
#localized-note[これは日本語の注記です。]
#localized-note[Dies ist ein deutscher Hinweis.]
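
When the number of localized titles grows, the if/else chain can be replaced by a dictionary lookup. This is one possible sketch: note-titles and note-title are hypothetical names, and it uses Typst's built-in dictionary at method with a default value.

```typst
#import "@preview/glotter:0.1.0": detect-info

// Hypothetical lookup table: language code -> localized title.
#let note-titles = (
  ja: "注意",
  de: "Hinweis",
  fr: "Remarque",
)

// Hypothetical helper: choose a title from the detected language,
// using "Note" for any language missing from the table.
#let note-title(body) = {
  let l = detect-info(body, fallback: "en").at("lang")
  note-titles.at(l, default: "Note")
}

#note-title("Dies ist ein deutscher Hinweis.")
```

Adding a language then only requires a new dictionary entry rather than another branch.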

API
- detect(input, k: 1, threshold: 0.0): returns raw predictions as dictionaries with lang, label, and probability.
- detect-info(input, k: 3, threshold: 0.0, min-margin: 0.12, fallback: "und"): returns a dictionary containing the final language, top prediction, all predictions, probability margin, ambiguity flag, fallback value, and normalized input.
- lang(input, ...): returns a language code.
- is-lang(input, expected, ...): checks whether the detected language matches a given string or any string in an array.
- is-cjk(input, ...), is-rtl(input, ...), is-latin(input, ...): convenience script-family checks.
- auto-text(body, fallback: "en", debug: false, ...): detects the language of body and wraps it in text(lang: detected-language); any additional named arguments are forwarded to text(...).
- auto-par(it, fallback: "en"): detects a paragraph and applies the detected language; intended for use with #show par.
- debug-info(info): renders a compact diagnostic label for a detect-info result.
- supported-languages: array of the 176 language codes supported by the embedded fastText language identification model.
- supported-language-count: number of language codes supported by the embedded fastText language identification model.
- is-supported-language(language): checks whether a language code or fastText label is supported.
The embedded model expects UTF-8 text.
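
The helpers listed above could be combined roughly as follows. This is a sketch under stated assumptions: that detect returns an array with one dictionary per prediction, and that the is-* functions return booleans. The signatures in the API list do not state either explicitly.

```typst
#import "@preview/glotter:0.1.0": (
  detect, is-lang, is-rtl, is-supported-language, supported-language-count,
)

#let sample = "مرحبا بالعالم"

// Assumption: detect returns an array of prediction dictionaries.
#for p in detect(sample, k: 3) [
  #p.at("lang"): #p.at("probability") \
]

// Assumption: the checks below return booleans.
#if is-rtl(sample) [Right-to-left script detected. \ ]
#if is-lang(sample, ("ar", "fa")) [Detected as Arabic or Persian. \ ]
#if is-supported-language("he") [
  Hebrew is among the #supported-language-count supported codes.
]
```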
fastText Model Provenance
glotter.wasm embeds data from the fastText language identification model lid.176.ftz, the compressed variant of fastText’s 176-language identification model.
Official source: https://fasttext.cc/docs/en/language-identification.html
The official fastText documentation states that the language identification models were trained on data from Wikipedia, Tatoeba, and SETimes, and are distributed under CC-BY-SA-3.0.
License
glotter is distributed under MIT AND CC-BY-SA-3.0.
This package does not include the standalone fastText model file lid.176.ftz.
However, this package includes glotter.wasm, which embeds model data from lid.176.ftz. For that reason, the package contains material under both:
- MIT License for the Typst package and plugin code.
- Creative Commons Attribution-ShareAlike 3.0 Unported for the embedded fastText model data.
See NOTICE.md for attribution, checksums, modification status, and redistribution notes. License texts are provided in LICENSE-MIT and LICENSE-CC-BY-SA-3.0.