Autolang.nvim

An intelligent, context-aware plugin for Neovim that automatically detects the natural language of the current buffer and adjusts the spelllang option.

Unlike naive language detectors, it uses Tree-sitter to extract only the relevant text (comments, strings, and prose), ignoring code syntax to prevent false positives. It uses trigram-based text categorization, which stays accurate even on short texts.

Motivation

This is a simple plugin that addresses a simple need, and it gets the job done.

My main use is hopping between files written in different languages, something non-native English-speaking programmers do a lot when switching between English and their mother tongue. To avoid running set spelllang by hand every time (or putting up with lots of even more annoying spelling errors), autolang.nvim is a decent relief for a scratchable itch.

Features

Uses Tree-sitter to analyze only comments, docstrings, and prose. It ignores keywords like function, import, or var that confuse standard detectors.
Uses trigram (character 3-gram) profiles, based on the Cavnar & Trenkle 1994 paper, to distinguish between linguistically close languages (e.g., pt_BR vs. pt_PT).
Fast startup with “Fail-Fast” Unicode script detection (e.g., immediately identifies Chinese/CJK or Russian/Cyrillic without running complex analysis).
Bundled with profiles for over 60 languages and dialects.
Written in pure Lua. No Python, Node, or CLI tools required.
Supports interactive mode (ask before changing language).
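
To illustrate the fail-fast script check listed above, here is a minimal sketch in plain Lua. It is not the plugin's actual code: the function names (codepoints, detect_script), the SCRIPTS table, and the choice of Unicode blocks are assumptions made for illustration. It returns as soon as it sees one character from a non-Latin block, so no trigram analysis is needed for such text.

```lua
-- Decode a UTF-8 string into a list of code points.
-- Works on Lua 5.1/LuaJIT (no utf8 library); assumes well-formed UTF-8.
local function codepoints(text)
  local cps, i, n = {}, 1, #text
  while i <= n do
    local b = text:byte(i)
    local cp, len
    if b < 0x80 then cp, len = b, 1
    elseif b < 0xE0 then cp, len = b - 0xC0, 2
    elseif b < 0xF0 then cp, len = b - 0xE0, 3
    else cp, len = b - 0xF0, 4 end
    -- Fold in the continuation bytes, 6 bits each.
    for j = i + 1, math.min(i + len - 1, n) do
      cp = cp * 64 + (text:byte(j) - 0x80)
    end
    cps[#cps + 1] = cp
    i = i + len
  end
  return cps
end

-- Hypothetical table of "exclusive" scripts, keyed by Unicode block range.
local SCRIPTS = {
  { name = "cyrillic", first = 0x0400, last = 0x04FF },
  { name = "arabic",   first = 0x0600, last = 0x06FF },
  { name = "kana",     first = 0x3040, last = 0x30FF }, -- Hiragana + Katakana
  { name = "cjk",      first = 0x4E00, last = 0x9FFF }, -- CJK Unified Ideographs
}

-- Return the name of the first exclusive script found, or nil when the
-- text contains none (i.e. it should go on to trigram analysis).
local function detect_script(text)
  for _, cp in ipairs(codepoints(text)) do
    for _, script in ipairs(SCRIPTS) do
      if cp >= script.first and cp <= script.last then
        return script.name
      end
    end
  end
  return nil
end
```

Returning on the first match keeps the check cheap, at the cost of misfiring on Latin text that quotes a single foreign word; a real implementation might require a minimum share of such characters instead.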

Installation

Lazy.nvim (Recommended)

{
  "suderio/autolang.nvim", -- Replace with local path or git repo
  event = { "BufReadPost", "BufWritePost" },
  config = function()
    require("autolang").setup({
      -- Your custom config here (optional)
    })
  end
}

Packer.nvim

use {
  "suderio/autolang.nvim",
  config = function()
    require("autolang").setup()
  end
}

Vim-Plug

Plug 'suderio/autolang.nvim'
" Add to your init.lua or init.vim:
" lua require('autolang').setup()

Configuration

The configuration maps the detected language profile (keys) to your Neovim spelllang setting (values).

require("autolang").setup({
    -- Enable auto-detection
    auto_detect = true,

    -- Interactive Mode:
    -- false: Changes spelllang silently (default)
    -- true: Opens a prompt asking if you want to change the language
    interactive = false,

    -- How many lines of "human text" to analyze.
    -- Since we use Tree-sitter to strip code, 50 lines is usually enough.
    -- Sometimes it is useful to change it to 100.
    lines_to_check = 50,

    -- Limit the detection to specific languages.
    -- OPTIONAL: If nil, checks against ALL languages (60+) defined in 'lang_mapping'.
    -- This is likely the best way to improve performance.
    limit_languages = { "en", "pt_BR" },
})

Supported Capabilities

1. Context-Aware Filetypes (Tree-sitter)

The plugin understands the structure of these files. It extracts text from specific nodes (e.g., `comment`, `string`, `paragraph`) and ignores code.

Category	Filetypes
Markup & Prose	`markdown`, `typst`, `org`, `html`, `xml`, `latex`, `gitcommit`
Web Dev	`javascript`, `typescript`, `tsx`, `javascriptreact`, `css`, `scss`, `json`, `yaml`, `toml`
Backend/Sys	`lua`, `python`, `rust`, `go`, `c`, `cpp`, `java`, `bash`

The complete list of supported filetypes can be found at https://github.com/suderio/autolang.nvim/tree/queries/.

You can add or change support for languages yourself. See `:h autolang-queries` in Nvim.

Note: If a filetype is not listed or Tree-sitter is missing, the plugin falls back to analyzing the raw lines of the file.

2. Supported Languages (Trigram Profiles)

The plugin includes pre-calculated statistical profiles for 64 languages and dialects. Use the Code column as the key in your `lang_mapping` configuration.

Code	Language	Code	Language	Code	Language
`af`	Afrikaans	`ha`	Hausa	`pt`	Portuguese (General)
`ar`	Arabic	`haw`	Hawaiian	`pt_BR`	Portuguese (Brazil)
`az`	Azerbaijani	`hi`	Hindi	`pt_PT`	Portuguese (Portugal)
`bg`	Bulgarian	`hr`	Croatian	`ro`	Romanian
`ca`	Catalan	`hu`	Hungarian	`ru`	Russian
`ceb`	Cebuano	`id`	Indonesian	`sk`	Slovak
`cs`	Czech	`is`	Icelandic	`sl`	Slovenian
`cy`	Welsh	`it`	Italian	`so`	Somali
`da`	Danish	`kk`	Kazakh	`sq`	Albanian
`de`	German	`ky`	Kyrgyz	`sr`	Serbian
`en`	English	`la`	Latin	`ss`	Swati
`es`	Spanish	`lt`	Lithuanian	`st`	Southern Sotho
`et`	Estonian	`lv`	Latvian	`sv`	Swedish
`eu`	Basque	`mk`	Macedonian	`sw`	Swahili
`fa`	Persian	`mn`	Mongolian	`tl`	Tagalog
`fi`	Finnish	`nb`	Norwegian (Bokmål)	`tlh`	Klingon
`fr`	French	`ne`	Nepali	`tn`	Tswana
`nl`	Dutch	`tr`	Turkish	`ts`	Tsonga
`nr`	Southern Ndebele	`uk`	Ukrainian	`ur`	Urdu
`nso`	Northern Sotho	`uz`	Uzbek	`ve`	Venda
`pl`	Polish	`xh`	Xhosa	`zu`	Zulu
`ps`	Pashto
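
As an illustration of how the codes above are used, here is a sketch of a `lang_mapping` entry. The keys are profile codes from the table; the values on the right are assumed spelllang names and depend on which spell files you have installed. Whether a user-supplied table is merged with the defaults or replaces them is not stated here, so treat this as a starting point rather than a reference.

```lua
require("autolang").setup({
  lang_mapping = {
    -- profile code = spelllang value (values are assumptions, adjust to taste)
    en = "en_us",
    pt_BR = "pt_br",
    pt_PT = "pt_pt",
    de = "de",
  },
})
```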

How it Works

When you open a buffer, `autolang` asks Tree-sitter for “content nodes”. In a Python file, it gets docstrings; in Markdown, paragraphs.
It scans the text for exclusive scripts. If it finds CJK characters, it detects Chinese/Japanese immediately. If Cyrillic, it detects Russian/Ukrainian.
If the text is Latin-based, it calculates the frequency of 3-letter sequences (trigrams) and compares the “distance” against the loaded language profiles. The profile with the lowest distance wins.
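
The trigram comparison in the last step can be sketched in plain Lua using the "out-of-place" rank distance from the Cavnar & Trenkle paper. This is an illustration, not the plugin's code: the function names, the profile size of 300, and the toy profiles built from single sentences are all assumptions, and for simplicity it works on bytes and lowercases ASCII only.

```lua
-- Build a ranked trigram profile: trigram -> rank (1 = most frequent).
local function build_profile(text, max_size)
  local normalized = text:lower():gsub("%s+", " ")
  local padded = " " .. normalized .. " "
  local counts = {}
  for i = 1, #padded - 2 do
    local tri = padded:sub(i, i + 2)
    counts[tri] = (counts[tri] or 0) + 1
  end
  local ranked = {}
  for tri in pairs(counts) do ranked[#ranked + 1] = tri end
  table.sort(ranked, function(a, b)
    if counts[a] ~= counts[b] then return counts[a] > counts[b] end
    return a < b -- deterministic tie-break
  end)
  local profile = {}
  for rank = 1, math.min(#ranked, max_size) do
    profile[ranked[rank]] = rank
  end
  return profile
end

-- Out-of-place distance: sum of rank differences, with a fixed penalty
-- (max_size) for trigrams missing from the language profile.
local function out_of_place(doc, lang, max_size)
  local distance = 0
  for tri, rank in pairs(doc) do
    local lang_rank = lang[tri]
    if lang_rank then
      distance = distance + math.abs(rank - lang_rank)
    else
      distance = distance + max_size
    end
  end
  return distance
end

-- Return the code of the profile with the lowest distance, plus that distance.
local function closest_language(text, profiles, max_size)
  local doc = build_profile(text, max_size)
  local best, best_distance
  for code, profile in pairs(profiles) do
    local d = out_of_place(doc, profile, max_size)
    if not best_distance or d < best_distance then
      best, best_distance = code, d
    end
  end
  return best, best_distance
end

-- Toy profiles; real ones are precomputed from large corpora.
local MAX = 300
local profiles = {
  en = build_profile("the quick brown fox jumps over the lazy dog and then the cat", MAX),
  pt = build_profile("o rato roeu a roupa do rei de roma e a rainha ficou com raiva", MAX),
}
local best = closest_language("the dog and the fox", profiles, MAX)
print(best) --> en
```

Comparing ranks rather than raw frequencies makes the measure insensitive to text length, which is one reason it copes with short inputs.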

Commands

:AutolangDetect : Forces a manual detection on the current buffer.
:AutolangEnable : Enables auto-detection globally.
:AutolangDisable : Disables auto-detection globally.
:checkhealth autolang : Checks if Tree-sitter is installed and if trigram files are present.
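
If you run these commands often, you can bind them to keys. The snippet below is only a suggestion: the key choices are arbitrary and not defined by the plugin, while the commands are the ones listed above.

```lua
-- Hypothetical mappings; pick keys that do not clash with your own setup.
vim.keymap.set("n", "<leader>ld", "<cmd>AutolangDetect<cr>", { desc = "Autolang: detect language" })
vim.keymap.set("n", "<leader>le", "<cmd>AutolangEnable<cr>", { desc = "Autolang: enable auto-detection" })
vim.keymap.set("n", "<leader>lx", "<cmd>AutolangDisable<cr>", { desc = "Autolang: disable auto-detection" })
```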

Troubleshooting

If the detection seems inaccurate:

Run :checkhealth autolang to ensure dependencies are met.
If you are editing a code file (e.g., Python), ensure you have the relevant Tree-sitter parser installed: :TSInstall python.
If the file is very short (< 30 characters), detection is skipped to avoid false positives.

Known Problems

Org file processing is very slow. The complexity of the Org Tree-sitter grammar appears to be the culprit.

Credits

Algorithm based on N-Gram-Based Text Categorization by William B. Cavnar and John M. Trenkle (1994).
Trigram data adapted from the kent37/guess-language project.
