Pali API

A self-hosted REST API for Pali language analysis, powered by the Digital Pali Dictionary (DPD).

Two services in one:

/dpd/ — Dictionary lookup: words, inflections, declensions, compounds, roots, suttas, and full-text search
/analyzer/ — Morphological analyzer: tokenize sentences, resolve sandhi, detect compounds, disambiguate grammar, and produce interlinear glosses

Backed by DPD's SQLite database (88,600+ headwords, 1.28M inflected forms, 753 roots) and Node.js 24's built-in node:sqlite module. No external database services required.

Live API: https://api.paa.li

Quick Start

# Prerequisites: Node.js >= 24, pnpm

pnpm install
pnpm setup     # Downloads DPD database (~144MB download, ~1.9GB extracted)
pnpm dev       # Listens on 0.0.0.0:8080; open http://localhost:8080

Configuration

Copy .env.example to .env:

PORT=8080
HOST=0.0.0.0
DB_PATH=./data/dpd.db
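
As a rough illustration of how these three variables might be consumed, here is a minimal sketch. The `ServerConfig` shape and the `loadConfig` name are hypothetical (the real src/config.ts is not shown here), and treating the .env.example values as fallbacks is an assumption.

```typescript
// Hypothetical config shape; field names are assumptions, not the project's.
interface ServerConfig {
  port: number;
  host: string;
  dbPath: string;
}

// Read PORT, HOST and DB_PATH from an env-like object, falling back to the
// values shown in .env.example (assumed here to be sensible defaults).
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const rawPort = env["PORT"] ?? "8080";
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${rawPort}`);
  }
  return {
    port,
    host: env["HOST"] ?? "0.0.0.0",
    dbPath: env["DB_PATH"] ?? "./data/dpd.db",
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.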

Analyzer API — `/analyzer/`

The analyzer performs morphological analysis of Pali text. It tokenizes sentences, looks up inflected forms, resolves sandhi (phonological merging), detects compounds, and applies 17 disambiguation heuristics to rank the candidate grammatical readings of each word.

`POST /analyzer/sentence`

Full sentence analysis — the main endpoint. Tokenizes, analyzes each word, and applies sentence-level disambiguation.

curl -X POST https://api.paa.li/analyzer/sentence \
  -H "Content-Type: application/json" \
  -d '{"text": "Naro ca devo ca gacchanti."}'

{
  "original": "Naro ca devo ca gacchanti.",
  "tokens": [
    {
      "surface": "Naro",
      "analyses": [{
        "lemma": "nara",
        "pos": "noun",
        "inflection": "masc nom sg",
        "tag": "[N.m.sg.nom]",
        "gloss": "man; person"
      }],
      "isPunctuation": false,
      "isSandhi": false,
      "isCompound": false
    },
    {
      "surface": "ca",
      "analyses": [{
        "lemma": "ca",
        "pos": "ind",
        "tag": "[Ind.conj]",
        "gloss": "and"
      }]
    },
    {
      "surface": "devo",
      "analyses": [{
        "lemma": "deva",
        "pos": "noun",
        "inflection": "masc nom sg",
        "tag": "[N.m.sg.nom]",
        "gloss": "god; deity"
      }]
    },
    {
      "surface": "ca",
      "analyses": [{ "tag": "[Ind.conj]" }]
    },
    {
      "surface": "gacchanti",
      "analyses": [{
        "lemma": "gacchati",
        "pos": "verb",
        "inflection": "pr 3rd pl",
        "tag": "[V.pres.act.3pl]",
        "gloss": "goes; walks; moves"
      }]
    },
    { "surface": ".", "isPunctuation": true }
  ]
}
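
A client might turn this response into a simple interlinear listing. The sketch below is illustrative only: the type names are hypothetical, they cover just the fields visible in the sample above, and it assumes the first entry in `analyses` is the top-ranked reading.

```typescript
// Minimal types covering only the fields visible in the sample response.
interface Analysis {
  lemma?: string;
  pos?: string;
  inflection?: string;
  tag?: string;
  gloss?: string;
}

interface Token {
  surface: string;
  analyses?: Analysis[];
  isPunctuation?: boolean;
}

interface SentenceResponse {
  original: string;
  tokens: Token[];
}

// Build one tab-separated "surface, tag, gloss" row per word token,
// skipping punctuation and using the first analysis of each token.
function toInterlinear(res: SentenceResponse): string[] {
  return res.tokens
    .filter((t) => !t.isPunctuation)
    .map((t) => {
      const best = t.analyses?.[0];
      return [t.surface, best?.tag ?? "?", best?.gloss ?? ""].join("\t");
    });
}
```

Missing fields fall back to "?" or an empty string, since the sample shows that some tokens carry only a subset of the fields.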

`GET /analyzer/word/:word`

Single-word analysis without sentence-level disambiguation. Returns all possible readings.

curl https://api.paa.li/analyzer/word/dhammassa

Returns analyses for all matching headwords (dative singular, genitive singular, etc.).

`GET /analyzer/sandhi/:form`

Sandhi resolution only. Splits a merged form into its constituent words.

curl https://api.paa.li/analyzer/sandhi/tatrāyaṃ

{
  "query": "tatrāyaṃ",
  "original": "tatrāyaṃ",
  "parts": ["tatra", "ayaṃ"],
  "analyses": [
    [{ "lemma": "tatra", "tag": "[Ind.spat]", "gloss": "there" }],
    [{ "lemma": "ayaṃ", "tag": "[Pron.m.sg.nom]", "gloss": "this" }]
  ]
}

`GET /analyzer/compound/:word`

Compound analysis — identifies compound type and breaks it into components.

curl https://api.paa.li/analyzer/compound/dhammacakka

{
  "query": "dhammacakka",
  "compoundType": "Kammadhāraya",
  "construction": "dhamma + cakka",
  "components": [
    ["dhamma", "adj", "of such nature"],
    ["cakka", "adj", "having a wheel"]
  ]
}

`GET /analyzer/tag`

Generate a textbook-style grammar tag from parameters.

curl 'https://api.paa.li/analyzer/tag?pos=noun&inflection=masc+nom+sg&lemma=nara'

{ "tag": "[N.m.sg.nom]", "pos": "noun", "inflection": "masc nom sg", "lemma": "nara" }
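
The query string in the curl example can be produced with the standard `URLSearchParams` class, which encodes spaces as `+`. This is a small sketch; `buildTagUrl` is a hypothetical helper name, and whether all three parameters are required is not stated here.

```typescript
// Build the URL for GET /analyzer/tag from its three documented parameters.
// URLSearchParams serializes spaces as "+", matching the curl example.
function buildTagUrl(
  base: string,
  pos: string,
  inflection: string,
  lemma: string,
): string {
  const query = new URLSearchParams({ pos, inflection, lemma });
  return `${base}/analyzer/tag?${query.toString()}`;
}

// Reproduces the URL used in the curl example.
const tagUrl = buildTagUrl("https://api.paa.li", "noun", "masc nom sg", "nara");
```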

Grammar Tag Format

Tags follow textbook Pali grammar notation:

Tag	Meaning
`[N.m.sg.nom]`	Noun, masculine, singular, nominative
`[N.f.pl.acc]`	Noun, feminine, plural, accusative
`[V.pres.act.3sg]`	Verb, present tense, active voice, 3rd person singular
`[V.aor.act.3pl]`	Verb, aorist, active voice, 3rd person plural
`[PP.m.sg.nom]`	Past participle, masculine, singular, nominative
`[Part.m.sg.nom]`	Present participle, masculine, singular, nominative
`[FPP.m.sg.nom]`	Future passive participle, masculine, singular, nominative
`[Ger]`	Gerund / absolutive
`[Inf]`	Infinitive
`[Ind.conj]`	Indeclinable, conjunction
`[Ind.neg]`	Indeclinable, negative particle
`[Ind.emph]`	Indeclinable, emphatic particle
`[Kammadh.n.sg.acc]`	Kammadhāraya compound, neuter, singular, accusative
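
To show how the dot-separated notation decomposes, here is a small sketch that expands a tag back into the wording used in the table. `describeTag` is a hypothetical helper, not part of the API, and it knows only the abbreviations listed above; any other segment is passed through unchanged.

```typescript
// Part-of-speech abbreviations taken from the tag table.
const POS_LABELS = new Map<string, string>([
  ["N", "Noun"], ["V", "Verb"], ["PP", "Past participle"],
  ["Part", "Present participle"], ["FPP", "Future passive participle"],
  ["Ger", "Gerund / absolutive"], ["Inf", "Infinitive"],
  ["Ind", "Indeclinable"], ["Kammadh", "Kammadhāraya compound"],
]);

// Feature abbreviations taken from the tag table (not an exhaustive list).
const FEATURE_LABELS = new Map<string, string>([
  ["m", "masculine"], ["f", "feminine"], ["n", "neuter"],
  ["sg", "singular"], ["pl", "plural"],
  ["nom", "nominative"], ["acc", "accusative"],
  ["pres", "present tense"], ["aor", "aorist"], ["act", "active voice"],
  ["3sg", "3rd person singular"], ["3pl", "3rd person plural"],
  ["conj", "conjunction"], ["neg", "negative particle"],
  ["emph", "emphatic particle"],
]);

// Expand "[N.m.sg.nom]" into "Noun, masculine, singular, nominative".
// The first segment is the part of speech; the rest are features.
function describeTag(tag: string): string {
  const inner = tag.replace(/^\[/, "").replace(/\]$/, "");
  const parts = inner.split(".");
  const pos = parts[0] ?? "";
  const features = parts.slice(1);
  const labels = [
    POS_LABELS.get(pos) ?? pos,
    ...features.map((f) => FEATURE_LABELS.get(f) ?? f),
  ];
  return labels.join(", ");
}
```

Looking up the first segment separately keeps `N` (noun) and `n` (neuter) from colliding.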

Disambiguation Heuristics

When a word has multiple possible readings, the analyzer ranks them using heuristics that include:

Indeclinable priority — particles (ca, na, eva) are preferred over rare noun readings
POS coherence — avoids duplicate verb readings when another token is the verb
Verb-number agreement — singular verb → prefer singular nominative for subject
Case governance — verb's +acc/+dat boosts matching case readings
Transitivity — transitive verbs boost accusative objects
Passive agent — passive voice boosts instrumental readings
Gen/dat proximity — genitive near nouns, dative near verbs
Position — sentence-initial favors nominative
Frequency — higher EBT corpus frequency ranks higher
Subject-predicate agreement — PP/adj predicates agree with subject gender/number
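
One way such heuristics could combine is as additive score adjustments followed by a sort. The sketch below covers only three of the listed heuristics (indeclinable priority, position, frequency). The `Reading` shape, the function names, and all weights are illustrative assumptions, not the analyzer's actual values.

```typescript
// Hypothetical reading shape; the analyzer's real types are not shown here.
interface Reading {
  lemma: string;
  pos: string;              // e.g. "noun", "verb", "ind"
  grammaticalCase?: string; // e.g. "nom", "acc"
  frequency: number;        // corpus frequency, higher means more common
}

// Score a single reading. The weights are arbitrary illustrative guesses.
function scoreReading(r: Reading, tokenIndex: number): number {
  let score = 0;
  // Indeclinable priority: prefer particle readings.
  if (r.pos === "ind") score += 10;
  // Position: a sentence-initial token favors the nominative.
  if (tokenIndex === 0 && r.grammaticalCase === "nom") score += 5;
  // Frequency: log-scaled so it nudges rather than dominates.
  score += Math.log10(1 + r.frequency);
  return score;
}

// Return the readings sorted best-first, leaving the input array untouched.
function rankReadings(readings: Reading[], tokenIndex: number): Reading[] {
  return readings
    .slice()
    .sort((a, b) => scoreReading(b, tokenIndex) - scoreReading(a, tokenIndex));
}
```

Sentence-level heuristics such as verb-number agreement would need access to the other tokens as well, which this per-token sketch leaves out.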

Sandhi Resolution

The analyzer resolves sandhi (phonological merging) using three strategies:

Deconstructor — DPD's own sandhi split data (highest accuracy)
Particle stripping — detects common suffixed particles (-ti → iti, -pi → api, -va → eva)
Rule-based engine — reverses vowel sandhi, niggahīta assimilation, consonant gemination, and elision
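
The particle-stripping strategy can be sketched as follows. This is a simplified illustration, not the analyzer's code: `stripParticle` and `isKnownWord` are hypothetical names, the lookup callback stands in for a dictionary query, and vowel lengthening or niggahīta changes before the particle are not undone.

```typescript
// Suffix and the full particle it stands for, from the list above.
const PARTICLES: Array<[string, string]> = [
  ["ti", "iti"],
  ["pi", "api"],
  ["va", "eva"],
];

interface ParticleSplit {
  base: string;
  particle: string;
}

// Try each suffix in turn and accept a split only when the remaining base
// is a known word. `isKnownWord` is a stand-in for a dictionary lookup.
function stripParticle(
  form: string,
  isKnownWord: (word: string) => boolean,
): ParticleSplit | null {
  for (const [suffix, particle] of PARTICLES) {
    if (form.length > suffix.length && form.endsWith(suffix)) {
      const base = form.slice(0, -suffix.length);
      if (isKnownWord(base)) {
        return { base, particle };
      }
    }
  }
  return null;
}
```

Checking the base against a dictionary keeps ordinary words that merely end in one of these syllables, such as a verb ending in -ti, from being split.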

Dictionary API — `/dpd/`

Word Lookup

curl https://api.paa.li/dpd/words/dhamma       # Headword lookup
curl https://api.paa.li/dpd/words/dhammassa     # Inflected form → resolves to headwords
curl https://api.paa.li/dpd/words/id/34626      # By ID

Sub-endpoints

Endpoint	Returns
`GET /dpd/words/:word/grammar`	POS, grammatical description, verb type
`GET /dpd/words/:word/meanings`	Primary, literal, and Buddhadatta meanings
`GET /dpd/words/:word/etymology`	Root, Sanskrit cognate, derivation, construction
`GET /dpd/words/:word/examples`	Sutta citations with source and Pali text
`GET /dpd/words/:word/related`	Synonyms, antonyms, variants, word families
`GET /dpd/words/:word/construction`	Compound type, construction, derivative, suffix
`GET /dpd/words/:word/declension`	Full declension/conjugation table
`GET /dpd/words/:word/suttas`	All sutta references

Verb Conjugation

Returns a structured conjugation table for verb entries, organized by tense → person → number. Accepts both headwords and inflected forms.

curl https://api.paa.li/dpd/verb/gacchati      # Headword
curl https://api.paa.li/dpd/verb/gacchanti      # Inflected form → resolves to headword

{
  "query": "gacchati",
  "results": [
    {
      "id": 12345,
      "lemma": "gacchati",
      "pos": "pr",
      "grammar": "...",
      "meaning": "goes; walks",
      "pattern": "ati pr",
      "conjugation": {
        "present": {
          "third": { "singular": "gacch<b>ati</b>", "plural": "gacch<b>anti</b>" },
          "second": { "singular": "gacch<b>asi</b>", "plural": "gacch<b>atha</b>" },
          "first": { "singular": "gacch<b>āmi</b>", "plural": "gacch<b>āma</b>" }
        },
        "imperative": { ... },
        "optative": { ... }
      }
    }
  ]
}

Non-verb words return 404: GET /dpd/verb/dhamma → { "error": "No verb entries found" }
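
The table cells in the sample wrap part of each form in `<b>` tags, which appears to mark the inflectional ending. A client that wants the stem and ending separately could split a cell like this (a sketch; `splitForm` is a hypothetical helper, and the meaning of the markup is inferred from the sample):

```typescript
interface SplitForm {
  stem: string;
  ending: string;
  plain: string; // the form with markup removed
}

// Separate the stem from the <b>...</b>-wrapped ending in one table cell.
function splitForm(cell: string): SplitForm {
  const match = /^(.*?)<b>(.*?)<\/b>$/.exec(cell);
  if (!match) {
    // No markup found: treat the whole cell as an unsegmented form.
    return { stem: cell, ending: "", plain: cell };
  }
  const stem = match[1] ?? "";
  const ending = match[2] ?? "";
  return { stem, ending, plain: stem + ending };
}
```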

Noun Declension

Returns a structured declension table for noun/adjective entries, organized by case → number. Accepts headwords and inflected forms.

curl https://api.paa.li/dpd/noun/dhamma         # Headword
curl https://api.paa.li/dpd/noun/dhammassa       # Inflected form → resolves to headword

{
  "query": "dhamma",
  "results": [
    {
      "id": 34626,
      "lemma": "dhamma",
      "pos": "masc",
      "grammar": "...",
      "meaning": "teaching; nature; ...",
      "pattern": "a masc",
      "declension": {
        "nominative": { "singular": "dhamm<b>o</b>", "plural": ["dhamm<b>ā</b>", "dhamm<b>āse</b>"] },
        "accusative": { "singular": "dhamm<b>aṃ</b>", "plural": "dhamm<b>e</b>" },
        "instrumental": { ... },
        "dative": { ... },
        "ablative": { ... },
        "genitive": { ... },
        "locative": { ... },
        "vocative": { ... }
      }
    }
  ]
}

Non-noun words return 404: GET /dpd/noun/gacchati → { "error": "No noun entries found" }
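
In the sample, a declension cell is either a single string or an array of alternative forms. The sketch below flattens such a table into one row per form and strips the `<b>` markup. The type and function names are hypothetical, and the table shape is inferred from the sample response only.

```typescript
// A cell holds one form or a list of alternatives, as in the sample above.
type Cell = string | string[];
type DeclensionTable = Record<string, { singular?: Cell; plural?: Cell }>;

interface FlatForm {
  grammaticalCase: string;
  number: "singular" | "plural";
  form: string;
}

// Flatten the case -> number table into one entry per attested form.
function flattenDeclension(table: DeclensionTable): FlatForm[] {
  const out: FlatForm[] = [];
  for (const grammaticalCase of Object.keys(table)) {
    const row = table[grammaticalCase];
    if (!row) continue;
    for (const number of ["singular", "plural"] as const) {
      const cell = row[number];
      if (cell === undefined) continue;
      // Normalize the string | string[] union to an array.
      const forms = Array.isArray(cell) ? cell : [cell];
      for (const f of forms) {
        out.push({ grammaticalCase, number, form: f.replace(/<\/?b>/g, "") });
      }
    }
  }
  return out;
}
```

A flat list like this is convenient for building a reverse index from inflected form to case and number.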

Search

curl 'https://api.paa.li/dpd/search?q=suffering&mode=english'  # English meaning search
curl 'https://api.paa.li/dpd/search?q=bodhi&mode=prefix'       # Prefix match
curl 'https://api.paa.li/dpd/search?q=dhama&mode=fuzzy'        # Fuzzy/typo-tolerant
curl 'https://api.paa.li/dpd/search?q=√bhū&mode=root'          # By root

Mode	Description
`exact`	Exact match in lookup table (handles inflected forms)
`prefix`	Headwords starting with the term
`fuzzy`	Approximate match using edit distance
`english`	Full-text search across English meanings (FTS5)
`root`	All words derived from a Pali root
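
For readers unfamiliar with edit distance, the following sketch computes the classic Levenshtein distance between two words. Which edit-distance variant and threshold the `fuzzy` mode actually uses is not stated here, so treat this only as an illustration of the idea.

```typescript
// Levenshtein distance: the minimum number of single-character insertions,
// deletions, and substitutions needed to turn `a` into `b`.
function editDistance(a: string, b: string): number {
  // Compare by code point after NFC so "ā" counts as one character.
  const s = Array.from(a.normalize("NFC"));
  const t = Array.from(b.normalize("NFC"));

  // prev[j] holds the distance between the first i-1 chars of s
  // and the first j chars of t.
  let prev: number[] = [];
  for (let j = 0; j <= t.length; j++) prev.push(j);

  for (let i = 1; i <= s.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr.push(
        Math.min(
          prev[j] + 1,        // deletion
          curr[j - 1] + 1,    // insertion
          prev[j - 1] + cost, // substitution (or match)
        ),
      );
    }
    prev = curr;
  }
  return prev[t.length];
}
```

Under this measure the misspelling "dhama" from the curl example is one insertion away from "dhamma".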

Compounds, Roots, Suttas

curl https://api.paa.li/dpd/compounds/dhammacakka   # Compound deconstruction
curl https://api.paa.li/dpd/roots                    # All 753 roots
curl https://api.paa.li/dpd/roots/√bhū               # Root detail + derived words + categorized breakdown
curl https://api.paa.li/dpd/suttas                   # All sutta sources
curl https://api.paa.li/dpd/suttas/DN1               # Words citing a sutta

The root detail endpoint (/dpd/roots/:root) includes a categorized field that groups derived words by POS category (verbs, nouns, adjectives, participles, indeclinables, other) with counts.

Browse, Stats, Health

curl 'https://api.paa.li/dpd/browse?letter=b&limit=10'
curl https://api.paa.li/dpd/health
curl https://api.paa.li/dpd/stats

Diacritical normalization: ṁ (dot above) and ṃ (dot below) are treated as interchangeable across all endpoints.
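
A client that caches or compares results may want to apply the same folding locally. Below is a minimal sketch; `normalizeNiggahita` is a hypothetical name, and choosing the dot-below form as the canonical one is an assumption, since the form the API uses internally is not stated here.

```typescript
// Fold the dot-above niggahīta (ṁ, U+1E41) into the dot-below form
// (ṃ, U+1E43), including the uppercase pair (U+1E40 -> U+1E42).
// Using dot-below as the canonical target is an assumption.
function normalizeNiggahita(text: string): string {
  return text
    .normalize("NFC") // compose "m" + combining dot into a single code point
    .replace(/\u1E41/g, "\u1E43")
    .replace(/\u1E40/g, "\u1E42");
}
```

Running NFC first means that input typed as a plain "m" followed by a combining dot is handled the same way as the precomposed character.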

Project Structure

pali-api/
├── src/
│   ├── index.ts                    # Express server, route mounting
│   ├── config.ts                   # Environment configuration
│   ├── db/
│   │   └── connection.ts           # SQLite connection (node:sqlite)
│   ├── models/
│   │   └── types.ts                # All TypeScript interfaces
│   ├── dpd/                        # Dictionary API (/dpd/)
│   │   ├── routes/
│   │   │   ├── index.ts            # DPD route aggregator
│   │   │   ├── words.ts, search.ts, roots.ts, compounds.ts, suttas.ts, health.ts
│   │   └── services/
│   │       ├── word-service.ts     # Word lookup + inflection resolution
│   │       ├── search-service.ts   # Multi-mode search
│   │       ├── root-service.ts     # Root queries
│   │       ├── declension-service.ts # Inflection table parser
│   │       ├── compound-service.ts # Compound deconstruction (DPD)
│   │       └── sutta-service.ts    # Sutta cross-references
│   ├── analyzer/                   # Morphological Analyzer API (/analyzer/)
│   │   ├── routes/
│   │   │   └── index.ts            # /analyzer/* routes
│   │   └── services/
│   │       ├── analyzer-service.ts # Pipeline orchestrator
│   │       ├── tokenizer.ts        # Sentence → WordToken[]
│   │       ├── lookup-service.ts   # DB queries for analyzer
│   │       ├── tagger.ts           # Grammar tag generation
│   │       ├── sandhi-service.ts   # Deconstructor-based sandhi
│   │       ├── sandhi-rules.ts     # Rule-based sandhi engine
│   │       ├── compound-analyzer.ts # Compound detection
│   │       └── disambiguator.ts    # 17 disambiguation heuristics
│   ├── utils/
│   │   └── normalize.ts            # Pali diacritical normalization
│   └── middleware/
│       ├── cors.ts
│       └── error-handler.ts
├── scripts/
│   └── setup-db.ts                 # Database download + FTS index builder
├── package.json
├── tsconfig.json
└── .env.example

Tech Stack

Runtime: Node.js 24 with built-in node:sqlite
Language: TypeScript
Framework: Express.js 5
Database: SQLite (DPD's own database, read-only)
Search: SQLite FTS5 for English meaning full-text search

Scripts

Command	Description
`pnpm setup`	Download DPD database and build FTS5 search index
`pnpm dev`	Start dev server with auto-reload
`pnpm build`	Compile TypeScript to `dist/`
`pnpm start`	Run compiled production build

Data Source

Digital Pali Dictionary by the DPD project, licensed under CC BY-NC-SA 4.0.

88,613 headwords
1,277,984 lookup entries (inflected forms)
753 roots
5,026 sutta sources referenced
25,917 compound words

License

ISC (server code). Dictionary data is CC BY-NC-SA 4.0.
