An intelligent, context-aware plugin for Neovim that automatically detects the natural language of the current buffer and adjusts the `spelllang` option accordingly.
Unlike naive language detectors, it uses Tree-sitter to extract only the relevant text (comments, strings, and prose), ignoring code syntax to prevent false positives. It relies on trigram-based text categorization for high accuracy, even on short texts.
This is a simple plugin for a simple need, but it gets the job done.
My main use case is hopping between files written in different languages, something non-English-speaking programmers do a lot as they switch between English and their native language. To avoid running `:set spelllang` every time (or facing lots and lots of even more annoying spelling errors), autolang.nvim offers decent relief for a somewhat scratchable itch.
- Uses Tree-sitter to analyze only comments, docstrings, and prose, ignoring keywords like `function`, `import`, or `var` that confuse standard detectors.
- Uses trigram (N-gram) profiles, based on the Cavnar & Trenkle (1994) paper, to distinguish between closely related languages (e.g., pt_BR vs. pt_PT).
- Fast startup with “Fail-Fast” Unicode script detection (e.g., immediately identifies Chinese/CJK or Russian/Cyrillic without running complex analysis).
- Bundled with profiles for over 60 languages and dialects.
- Written in pure Lua: no Python, Node, or CLI tools required.
- Supports interactive mode (ask before changing language).
With lazy.nvim:

```lua
{
  "suderio/autolang.nvim", -- Replace with local path or git repo
  event = { "BufReadPost", "BufWritePost" },
  config = function()
    require("autolang").setup({
      -- Your custom config here (optional)
    })
  end,
}
```

With packer.nvim:

```lua
use {
  "suderio/autolang.nvim",
  config = function()
    require("autolang").setup()
  end,
}
```

With vim-plug:

```vim
Plug 'suderio/autolang.nvim'
" Then, after plug#end(), add to your init.vim
" (in init.lua, call require('autolang').setup() directly):
" lua require('autolang').setup()
```

Configuration is optional. The `lang_mapping` option maps each detected language profile (keys) to your Neovim `spelllang` setting (values). The other available options:
```lua
require("autolang").setup({
  -- Enable auto-detection
  auto_detect = true,

  -- Interactive mode:
  --   false: changes spelllang silently (default)
  --   true:  opens a prompt asking whether to change the language
  interactive = false,

  -- How many lines of "human text" to analyze.
  -- Since Tree-sitter strips the code, 50 lines is usually enough;
  -- sometimes 100 works better.
  lines_to_check = 50,

  -- Limit detection to specific languages.
  -- OPTIONAL: if nil, checks against ALL languages (60+) defined in 'lang_mapping'.
  -- This is likely the best way to improve performance.
  limit_languages = { "en", "pt_BR" },
})
```
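The options above do not show `lang_mapping` itself. A sketch of how it might be passed, assuming the shape described earlier (profile codes from the language table as keys, `spelllang` values as values); whether a custom table merges with or replaces the built-in mapping is not stated here, so treat this as an assumption and check the plugin's help before relying on it:

```lua
require("autolang").setup({
  -- Assumed shape: detected profile code -> value assigned to 'spelllang'.
  lang_mapping = {
    en = "en_us",
    pt_BR = "pt_br",
    pt_PT = "pt_pt",
  },
  limit_languages = { "en", "pt_BR" },
})
```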
The plugin understands the structure of the following filetypes. It extracts text only from specific Tree-sitter nodes (e.g., `comment`, `string`, `paragraph`) and ignores the code.
| Category | Filetypes |
| --- | --- |
| Markup & Prose | markdown, typst, org, html, xml, latex, gitcommit |
| Web Dev | javascript, typescript, tsx, javascriptreact, css, scss, json, yaml, toml |
| Backend/Sys | lua, python, rust, go, c, cpp, java, bash |
The complete list of supported filetypes can be found at https://github.com/suderio/autolang.nvim/tree/queries/.
You can add or change filetype support yourself; see `:h autolang-queries` in Neovim.
Note: If a filetype is not listed or Tree-sitter is missing, the plugin falls back to analyzing the raw lines of the file.
The plugin includes pre-calculated statistical profiles for 64 languages and dialects. Use the Code column as the key in your `lang_mapping` configuration.
| Code | Language | Code | Language | Code | Language |
| --- | --- | --- | --- | --- | --- |
| af | Afrikaans | ha | Hausa | pt | Portuguese (General) |
| ar | Arabic | haw | Hawaiian | pt_BR | Portuguese (Brazil) |
| az | Azerbaijani | hi | Hindi | pt_PT | Portuguese (Portugal) |
| bg | Bulgarian | hr | Croatian | ro | Romanian |
| ca | Catalan | hu | Hungarian | ru | Russian |
| ceb | Cebuano | id | Indonesian | sk | Slovak |
| cs | Czech | is | Icelandic | sl | Slovenian |
| cy | Welsh | it | Italian | so | Somali |
| da | Danish | kk | Kazakh | sq | Albanian |
| de | German | ky | Kyrgyz | sr | Serbian |
| en | English | la | Latin | ss | Swati |
| es | Spanish | lt | Lithuanian | st | Southern Sotho |
| et | Estonian | lv | Latvian | sv | Swedish |
| eu | Basque | mk | Macedonian | sw | Swahili |
| fa | Persian | mn | Mongolian | tl | Tagalog |
| fi | Finnish | nb | Norwegian (Bokmål) | tlh | Klingon |
| fr | French | ne | Nepali | tn | Tswana |
| nl | Dutch | tr | Turkish | ts | Tsonga |
| nr | Southern Ndebele | uk | Ukrainian | ur | Urdu |
| nso | Northern Sotho | uz | Uzbek | ve | Venda |
| pl | Polish | xh | Xhosa | zu | Zulu |
| ps | Pashto | | | | |
- When you open a buffer, `autolang` asks Tree-sitter for “content nodes”. In a Python file, it gets docstrings; in Markdown, paragraphs.
- It scans the text for exclusive scripts. If it finds CJK characters, it detects Chinese/Japanese immediately. If Cyrillic, it detects Russian/Ukrainian.
- If the text is Latin-based, it calculates the frequency of 3-letter sequences (trigrams) and compares the “distance” against the loaded language profiles. The profile with the lowest distance wins.
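The second and third steps can be illustrated with a self-contained Lua sketch (the Tree-sitter extraction needs a running Neovim, so it is left out). This is not autolang.nvim's code: the function names (`dominant_script`, `trigram_profile`, `distance`, `detect`), the Unicode ranges, the "majority of letters" rule for exclusive scripts, and the 300-trigram profile size are assumptions chosen for illustration. The hypothetical `MIN_CHARS` guard mirrors the 30-character limit mentioned under troubleshooting, measured here in bytes. The distance is the "out-of-place" measure from Cavnar & Trenkle, applied to trigrams only.

```lua
-- Sketch only: fail-fast script check + trigram ranking in the style of
-- Cavnar & Trenkle (1994). Names, ranges, and thresholds are assumptions,
-- not autolang.nvim's actual implementation.

-- One UTF-8 character: a lead byte followed by continuation bytes.
-- Byte-class patterns work on Lua 5.1/LuaJIT (Neovim) and Lua 5.3+.
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"
local PROFILE_SIZE = 300 -- trigrams kept per profile (also the max penalty)
local MIN_CHARS = 30 -- hypothetical minimum input length, in bytes

-- Decode one UTF-8 character to its code point with plain arithmetic.
local function codepoint(ch)
  local b, n = ch:byte(1), #ch
  if n == 1 then return b end
  local cp = b - (n == 2 and 0xC0 or n == 3 and 0xE0 or 0xF0)
  for i = 2, n do cp = cp * 64 + (ch:byte(i) - 0x80) end
  return cp
end

-- "Exclusive" scripts: Cyrillic, or kana and CJK Unified Ideographs.
local function script_of(cp)
  if cp >= 0x0400 and cp <= 0x04FF then return "cyrillic" end
  if (cp >= 0x3040 and cp <= 0x30FF) or (cp >= 0x4E00 and cp <= 0x9FFF) then
    return "cjk"
  end
end

-- Returns "cjk" or "cyrillic" when that script covers most letters,
-- otherwise "latin", meaning "fall through to trigram analysis".
local function dominant_script(text)
  local counts, letters = { cjk = 0, cyrillic = 0 }, 0
  for ch in text:gmatch(UTF8_CHAR) do
    local s = script_of(codepoint(ch))
    if s then counts[s] = counts[s] + 1 end
    if s or ch:match("^%a$") then letters = letters + 1 end
  end
  for _, s in ipairs({ "cjk", "cyrillic" }) do
    if counts[s] * 2 > letters then return s end
  end
  return "latin"
end

-- Lowercase ASCII, keep multibyte characters as letters (a simplification),
-- and collapse everything else into single spaces, padding both ends.
local function normalize(text)
  local chars, prev_space = { " " }, true
  for ch in text:gmatch(UTF8_CHAR) do
    if #ch == 1 and not ch:match("%a") then
      if not prev_space then chars[#chars + 1], prev_space = " ", true end
    else
      chars[#chars + 1], prev_space = (#ch == 1 and ch:lower() or ch), false
    end
  end
  if not prev_space then chars[#chars + 1] = " " end
  return chars
end

-- Rank table { [trigram] = rank } of the PROFILE_SIZE most frequent trigrams.
local function trigram_profile(text)
  local chars, counts, list, ranks = normalize(text), {}, {}, {}
  for i = 1, #chars - 2 do
    local tri = chars[i] .. chars[i + 1] .. chars[i + 2]
    counts[tri] = (counts[tri] or 0) + 1
  end
  for tri, count in pairs(counts) do list[#list + 1] = { tri, count } end
  table.sort(list, function(a, b)
    if a[2] ~= b[2] then return a[2] > b[2] end
    return a[1] < b[1] -- deterministic tie-break
  end)
  for i = 1, math.min(PROFILE_SIZE, #list) do ranks[list[i][1]] = i end
  return ranks
end

-- "Out-of-place" distance: sum of rank differences, with the maximum
-- penalty for trigrams the language profile does not contain.
local function distance(doc, lang)
  local d = 0
  for tri, rank in pairs(doc) do
    local lang_rank = lang[tri]
    d = d + (lang_rank and math.abs(rank - lang_rank) or PROFILE_SIZE)
  end
  return d
end

-- Returns (result, method) with method "too_short", "script", or "trigram".
local function detect(text, profiles)
  if #text < MIN_CHARS then return nil, "too_short" end
  local script = dominant_script(text)
  if script ~= "latin" then return script, "script" end
  local doc, best, best_d = trigram_profile(text), nil, math.huge
  for code, lang in pairs(profiles) do
    local d = distance(doc, lang)
    if d < best_d then best, best_d = code, d end
  end
  return best, "trigram"
end
```

In this sketch, `profiles` would be a table such as `{ en = trigram_profile(english_sample), pt = trigram_profile(portuguese_sample) }` built from toy sample text; the real plugin ships pre-calculated profiles instead. Charging the full profile size for each missing trigram means the score is largely driven by how many of the text's trigrams a language knows at all, with rank differences acting as a finer tie-breaker.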
- `:AutolangDetect`: Forces a manual detection on the current buffer.
- `:AutolangEnable`: Enables auto-detection globally.
- `:AutolangDisable`: Disables auto-detection globally.
- `:checkhealth autolang`: Checks whether Tree-sitter is installed and the trigram files are present.
If the detection seems inaccurate:
- Run `:checkhealth autolang` to ensure dependencies are met.
- If you are editing a code file (e.g., Python), ensure the relevant Tree-sitter parser is installed: `:TSInstall python`.
- If the file is very short (fewer than 30 characters), detection is skipped to avoid false positives.
- Org file processing is very slow; the complexity of the Org Tree-sitter grammar appears to be the culprit.
- Algorithm based on N-Gram-Based Text Categorization by William B. Cavnar and John M. Trenkle (1994).
- Trigram data adapted from the kent37/guess-language project.
