This repository was archived by the owner on Jun 8, 2023. It is now read-only.
forked from Mimino666/langdetect

Port of Google's language-detection library to Python.

langdetect

This is a GI internal fork of the PyPI-linked langdetect repo, which is itself a Python port of Nakatani Shuyo's language-detection library (version from 03/03/2014). The fork exists because deploying this package on EMR required wrapping several of the auxiliary files in Python scripts.

Installation

$ pip install git+ssh://git@github.com/pelucid/langdetect.git

Supported Python versions: 2.7 and 3.4+.

Languages

langdetect supports 55 languages out of the box (ISO 639-1 codes):

af, ar, bg, bn, ca, cs, cy, da, de, el, en, es, et, fa, fi, fr, gu, he,
hi, hr, hu, id, it, ja, kn, ko, lt, lv, mk, ml, mr, ne, nl, no, pa, pl,
pt, ro, ru, sk, sl, so, sq, sv, sw, ta, te, th, tl, tr, uk, ur, vi, zh-cn, zh-tw

Basic usage

To detect the language of a text:

>>> from langdetect import detect
>>> detect("War doesn't show who's right, just who's left.")
'en'
>>> detect("Ein, zwei, drei, vier")
'de'

To find out the probabilities for the top languages:

>>> from langdetect import detect_langs
>>> detect_langs("Otec matka syn.")
[sk:0.572770823327, pl:0.292872522702, cs:0.134356653968]
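When the top candidates are close, as in the output above, it can help to accept a result only above some confidence threshold. The following is a minimal sketch, not part of langdetect: `pick_language`, `min_prob`, `default`, and the `Candidate` stand-in are hypothetical names, and the sketch assumes each item returned by detect_langs exposes `lang` and `prob` attributes.

```python
from collections import namedtuple


def pick_language(candidates, min_prob=0.5, default=None):
    """Return the most probable language code, or `default` if it is too uncertain.

    `candidates` is any sequence of objects with `.lang` and `.prob`
    attributes (assumed to match what detect_langs returns).
    """
    if not candidates:
        return default
    best = max(candidates, key=lambda c: c.prob)
    return best.lang if best.prob >= min_prob else default


# Stand-in objects so the sketch runs without langdetect installed.
# The values are copied from the detect_langs example above.
Candidate = namedtuple("Candidate", ["lang", "prob"])
sample = [
    Candidate("sk", 0.572770823327),
    Candidate("pl", 0.292872522702),
    Candidate("cs", 0.134356653968),
]

print(pick_language(sample))                             # sk
print(pick_language(sample, min_prob=0.9, default="und"))  # und
```

With the library installed, the same helper would presumably be called as `pick_language(detect_langs("Otec matka syn."))`.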

NOTE

The language detection algorithm is non-deterministic: if you run it on a text that is too short or too ambiguous, you might get a different result every time you run it.

To enforce consistent results, run the following code before the first language detection:

from langdetect import DetectorFactory
DetectorFactory.seed = 0
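Why a fixed seed helps can be shown with a toy example. This is not langdetect's implementation, only a sketch of the general idea that a randomized procedure becomes repeatable once its random number generator is seeded. The function `noisy_pick` and the scores are hypothetical.

```python
import random


def noisy_pick(scores, seed=None):
    """Pick the highest-scoring label after adding random noise to each score.

    With seed=None the generator is seeded from system entropy, so results
    for close scores can vary between calls. A fixed seed makes every call
    draw the same noise and therefore return the same label.
    """
    rng = random.Random(seed)
    noisy = {label: score + rng.gauss(0.0, 0.2) for label, score in scores.items()}
    return max(noisy, key=noisy.get)


# Hypothetical, deliberately close scores for an ambiguous text.
scores = {"sk": 0.40, "pl": 0.35, "cs": 0.25}

seeded = [noisy_pick(scores, seed=0) for _ in range(5)]
unseeded = [noisy_pick(scores) for _ in range(5)]

print(seeded)    # the same label five times
print(unseeded)  # may mix labels from run to run
```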

Anything else?

For more details, please see the original repo that this was forked from (Mimino666/langdetect).
