suderio/autolang.nvim

Autolang.nvim

An intelligent, context-aware plugin for Neovim that automatically detects the natural language of the current buffer and adjusts the spelllang option.

Unlike naive language detectors, it uses Tree-sitter to extract only the relevant text (comments, strings, and prose), ignoring code syntax to prevent false positives. It relies on trigram-based text categorization for high accuracy even on short texts.

Motivation

This is a simple plugin for a simple need, but it gets the job done.

My main use case is hopping between files written in different languages, something non-English-speaking programmers do a lot, switching between English and their mother tongue. To avoid running `:set spelllang` every time (or facing lots and lots of even more annoying spelling errors), autolang.nvim offers decent relief for a somewhat scratchable itch.

Features

  • Uses Tree-sitter to analyze only comments, docstrings, and prose. It ignores keywords like function, import, or var that confuse standard detectors.
  • Uses trigram (3-character N-gram) profiles, based on the Cavnar & Trenkle (1994) paper, to distinguish between linguistically close languages (e.g., pt_BR vs. pt_PT).
  • Fast startup with “Fail-Fast” Unicode script detection (e.g., immediately identifies Chinese/CJK or Russian/Cyrillic without running complex analysis).
  • Bundled with profiles for over 60 languages and dialects.
  • Written in pure Lua. No Python, Node, or CLI tools required.
  • Supports interactive mode (ask before changing language).
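The “fail-fast” idea can be illustrated with a small pure-Lua sketch. It is not autolang.nvim's actual code: the helper names (`codepoints`, `detect_script`), the Unicode ranges checked, and the thresholds (a 10% kana share, a 50% Han or Cyrillic share) are assumptions chosen for illustration. A `nil` result means that no exclusive script dominates, so the text falls through to the trigram comparison.

```lua
-- Sketch of "fail-fast" script detection (illustrative, not the plugin's code).
-- Pure Lua, compatible with Lua 5.1 / LuaJIT (no utf8 library needed).

-- Decode a UTF-8 string into a list of code points (minimal, no validation).
local function codepoints(s)
  local cps, i = {}, 1
  while i <= #s do
    local b = s:byte(i)
    local cp, len
    if b < 0x80 then cp, len = b, 1
    elseif b >= 0xF0 then cp, len = b - 0xF0, 4
    elseif b >= 0xE0 then cp, len = b - 0xE0, 3
    elseif b >= 0xC0 then cp, len = b - 0xC0, 2
    else cp, len = 0xFFFD, 1 -- stray continuation byte
    end
    for j = 1, len - 1 do
      -- Append the low 6 bits of each continuation byte.
      cp = cp * 64 + ((s:byte(i + j) or 0x80) - 0x80)
    end
    cps[#cps + 1] = cp
    i = i + len
  end
  return cps
end

-- Assumed ranges: Hiragana + Katakana, CJK Unified Ideographs, Cyrillic.
local RANGES = {
  { name = "kana",     lo = 0x3040, hi = 0x30FF },
  { name = "han",      lo = 0x4E00, hi = 0x9FFF },
  { name = "cyrillic", lo = 0x0400, hi = 0x04FF },
}

-- Returns "kana", "han", "cyrillic", or nil (meaning: run trigram analysis).
local function detect_script(text)
  local counts, letters = {}, 0
  for _, cp in ipairs(codepoints(text)) do
    -- Ignore ASCII whitespace, digits, and punctuation; count everything else.
    local ascii_letter = (cp >= 65 and cp <= 90) or (cp >= 97 and cp <= 122)
    if cp >= 0x80 or ascii_letter then
      letters = letters + 1
      for _, r in ipairs(RANGES) do
        if cp >= r.lo and cp <= r.hi then
          counts[r.name] = (counts[r.name] or 0) + 1
          break
        end
      end
    end
  end
  if letters == 0 then return nil end
  -- Japanese mixes kana with kanji, so a modest kana share is decisive.
  if (counts.kana or 0) / letters > 0.1 then return "kana" end
  for _, name in ipairs({ "han", "cyrillic" }) do
    if (counts[name] or 0) / letters > 0.5 then return name end
  end
  return nil
end
```

Only Latin-script (or mixed) text pays for the more expensive trigram step in this design.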

Installation

Lazy.nvim (Recommended)

{
  "suderio/autolang.nvim", -- Replace with local path or git repo
  event = { "BufReadPost", "BufWritePost" },
  config = function()
    require("autolang").setup({
      -- Your custom config here (optional)
    })
  end,
}

Packer.nvim

use {
  "suderio/autolang.nvim",
  config = function()
    require("autolang").setup()
  end
}

Vim-Plug

Plug 'suderio/autolang.nvim'
" Add to your init.lua or init.vim:
" lua require('autolang').setup()

Configuration

The configuration maps the detected language profile (keys) to your Neovim spelllang setting (values).

require("autolang").setup({
    -- Enable auto-detection
    auto_detect = true,

    -- Interactive mode:
    -- false: changes spelllang silently (default)
    -- true: opens a prompt asking whether to change the language
    interactive = false,

    -- How many lines of "human text" to analyze.
    -- Since Tree-sitter strips the code first, 50 lines is usually enough;
    -- increasing it to 100 is sometimes useful.
    lines_to_check = 50,

    -- Limit detection to specific languages.
    -- OPTIONAL: if nil, checks against ALL languages (60+) defined in 'lang_mapping'.
    -- This is likely the best way to improve performance.
    limit_languages = { "en", "pt_BR" },
})
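To illustrate what `lines_to_check` and `limit_languages` control, here is a hypothetical sketch. The helpers `take_lines` and `select_profiles`, and the shape of the `profiles` table (language code mapped to a trigram profile), are assumptions for illustration, not the plugin's internals.

```lua
-- Hypothetical sketch of what `lines_to_check` and `limit_languages` control.

-- Keep at most `n` lines of extracted "human text".
local function take_lines(lines, n)
  local out = {}
  for i = 1, math.min(n, #lines) do
    out[i] = lines[i]
  end
  return out
end

-- Restrict candidate profiles; `nil` means "use every bundled profile".
-- `profiles` maps a language code (e.g. "pt_BR") to its trigram profile.
local function select_profiles(profiles, limit_languages)
  if limit_languages == nil then
    return profiles
  end
  local selected = {}
  for _, code in ipairs(limit_languages) do
    if profiles[code] ~= nil then
      selected[code] = profiles[code]
    end
  end
  return selected
end

-- Example: with limit_languages = { "en", "pt_BR" }, only two profiles remain.
local all = { en = {}, pt_BR = {}, pt_PT = {}, de = {} }
local chosen = select_profiles(all, { "en", "pt_BR" })
```

In this sketch every detection compares the sample against each selected profile, so the work grows linearly with the number of candidates: two languages means two comparisons instead of 64.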

Supported Capabilities

1. Context-Aware Filetypes (Tree-sitter)

The plugin understands the structure of these files. It extracts text from specific nodes (e.g., `comment`, `string`, `paragraph`) and ignores code.

Category          Filetypes
Markup & Prose    markdown, typst, org, html, xml, latex, gitcommit
Web Dev           javascript, typescript, tsx, javascriptreact, css, scss, json, yaml, toml
Backend/Sys       lua, python, rust, go, c, cpp, java, bash

The complete list of supported filetypes can be found at https://github.com/suderio/autolang.nvim/tree/queries/.

You can add or change support for languages yourself. See `:h autolang-queries` in Nvim.

Note: If a filetype is not listed or Tree-sitter is missing, the plugin falls back to analyzing the raw lines of the file.

2. Supported Languages (Trigram Profiles)

The plugin includes pre-calculated statistical profiles for 64 languages and dialects. Use the Code column as the key in your `lang_mapping` configuration.

Code  Language          Code  Language            Code   Language
af    Afrikaans         ha    Hausa               pt     Portuguese (General)
ar    Arabic            haw   Hawaiian            pt_BR  Portuguese (Brazil)
az    Azerbaijani       hi    Hindi               pt_PT  Portuguese (Portugal)
bg    Bulgarian         hr    Croatian            ro     Romanian
ca    Catalan           hu    Hungarian           ru     Russian
ceb   Cebuano           id    Indonesian          sk     Slovak
cs    Czech             is    Icelandic           sl     Slovenian
cy    Welsh             it    Italian             so     Somali
da    Danish            kk    Kazakh              sq     Albanian
de    German            ky    Kyrgyz              sr     Serbian
en    English           la    Latin               ss     Swati
es    Spanish           lt    Lithuanian          st     Southern Sotho
et    Estonian          lv    Latvian             sv     Swedish
eu    Basque            mk    Macedonian          sw     Swahili
fa    Persian           mn    Mongolian           tl     Tagalog
fi    Finnish           nb    Norwegian (Bokmål)  tlh    Klingon
fr    French            ne    Nepali              tn     Tswana
nl    Dutch             tr    Turkish             ts     Tsonga
nr    Southern Ndebele  uk    Ukrainian           ur     Urdu
nso   Northern Sotho    uz    Uzbek               ve     Venda
pl    Polish            xh    Xhosa               zu     Zulu
ps    Pashto

How it Works

  1. When you open a buffer, `autolang` asks Tree-sitter for “content nodes”. In a Python file, it gets docstrings; in Markdown, paragraphs.
  2. It scans the text for exclusive scripts. If it finds CJK characters, it detects Chinese/Japanese immediately. If Cyrillic, it detects Russian/Ukrainian.
  3. If the text is Latin-based, it calculates the frequency of 3-letter sequences (trigrams) and compares the “distance” against the loaded language profiles. The profile with the lowest distance wins.
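Step 3 can be sketched with the rank-order “out-of-place” measure from Cavnar & Trenkle (1994), the paper the plugin cites. This is an assumption about the distance used, not a copy of autolang.nvim's implementation: the names `trigram_profile`, `out_of_place`, and `detect`, the 300-entry profile size, and the use of byte-level trigrams (accented characters span several bytes, and `string.lower` leaves non-ASCII letters untouched) are simplifications for illustration.

```lua
-- Sketch of trigram categorization with the Cavnar & Trenkle (1994)
-- "out-of-place" rank distance. Illustrative only; byte-level trigrams.

-- Build a ranked profile: trigram -> rank (1 = most frequent).
local function trigram_profile(text, max_size)
  -- Lowercase ASCII, collapse runs of non-letters (keeping non-ASCII bytes)
  -- into one space, and pad with spaces so word boundaries form trigrams.
  local s = " " .. text:lower():gsub("[^%a\128-\255]+", " ") .. " "
  local counts = {}
  for i = 1, #s - 2 do
    local tri = s:sub(i, i + 2)
    if tri ~= "   " then
      counts[tri] = (counts[tri] or 0) + 1
    end
  end
  local list = {}
  for tri, c in pairs(counts) do
    list[#list + 1] = { tri = tri, count = c }
  end
  table.sort(list, function(a, b)
    if a.count ~= b.count then return a.count > b.count end
    return a.tri < b.tri -- deterministic tie-break
  end)
  local ranks = {}
  for i = 1, math.min(max_size, #list) do
    ranks[list[i].tri] = i
  end
  return ranks
end

-- Sum of rank differences; trigrams absent from the language profile
-- receive the maximum penalty.
local function out_of_place(doc_ranks, lang_ranks, max_penalty)
  local d = 0
  for tri, r in pairs(doc_ranks) do
    local lr = lang_ranks[tri]
    d = d + (lr and math.abs(r - lr) or max_penalty)
  end
  return d
end

-- The profile with the lowest distance wins.
local function detect(text, profiles, max_size)
  local doc = trigram_profile(text, max_size)
  local best, best_d
  for code, ranks in pairs(profiles) do
    local d = out_of_place(doc, ranks, max_size)
    if best_d == nil or d < best_d or (d == best_d and code < best) then
      best, best_d = code, d
    end
  end
  return best, best_d
end
```

Because every trigram missing from a profile adds the maximum penalty, a language whose profile shares few trigrams with the sample falls behind quickly. Real profiles are built from much larger corpora than a single sentence.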

Commands

  • :AutolangDetect : Forces a manual detection on the current buffer.
  • :AutolangEnable : Enables auto-detection globally.
  • :AutolangDisable : Disables auto-detection globally.
  • :checkhealth autolang : Checks if Tree-sitter is installed and if trigram files are present.
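The commands can be bound to keys like any Ex command. A minimal example for `init.lua`, using Neovim's `vim.keymap.set`; the `<leader>ad` mapping is an arbitrary choice, not a default shipped by the plugin.

```lua
-- In init.lua: bind manual detection to a key (mapping chosen arbitrarily).
vim.keymap.set("n", "<leader>ad", "<cmd>AutolangDetect<CR>", {
  desc = "Autolang: detect language of current buffer",
})
```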

Troubleshooting

If the detection seems inaccurate:

  1. Run :checkhealth autolang to ensure dependencies are met.
  2. If you are editing a code file (e.g., Python), ensure you have the relevant Tree-sitter parser installed: :TSInstall python.
  3. If the file is very short (< 30 characters), detection is skipped to avoid false positives.
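A minimal sketch of such a length guard. It assumes the 30-character threshold is measured on the extracted text after collapsing whitespace, counting UTF-8 characters rather than bytes; the name `long_enough` and these counting rules are illustrative assumptions, not the plugin's code.

```lua
-- Illustrative length guard (assumed counting rules, not the plugin's code).
local MIN_CHARS = 30 -- threshold from the README

local function long_enough(text)
  -- Collapse whitespace runs and trim both ends.
  local compact = text:gsub("%s+", " "):gsub("^ ", ""):gsub(" $", "")
  -- Count UTF-8 characters: every byte that is not a continuation byte.
  local _, chars = compact:gsub("[^\128-\191]", "")
  return chars >= MIN_CHARS
end
```

Counting characters instead of bytes keeps the threshold consistent across scripts: 29 accented letters occupy 58 bytes but still fall below the limit.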

Known Problems

  • Processing Org files is very slow; the complexity of the Org Tree-sitter grammar seems to be the culprit.

Credits
