This repository was archived by the owner on Dec 22, 2017. It is now read-only.

R package that contains stopwords for multiple languages

koheiw/stopwords

Multi-lingual stopwords package in R

stopwords is an R package that provides easy access to stopwords for more than 50 languages from the Stopwords ISO library. It is intended to be used in conjunction with packages such as quanteda to perform text analysis in many different languages.

Supported languages

Currently supported languages are the following:

  • Afrikaans [af]
  • Arabic [ar]
  • Armenian [hy]
  • Basque [eu]
  • Bengali [bn]
  • Breton [br]
  • Bulgarian [bg]
  • Catalan; Valencian [ca]
  • Chinese [zh]
  • Croatian [hr]
  • Czech [cs]
  • Danish [da]
  • Dutch; Flemish [nl]
  • English [en]
  • Esperanto [eo]
  • Estonian [et]
  • Finnish [fi]
  • French [fr]
  • Galician [gl]
  • German [de]
  • Greek, Modern (1453-) [el]
  • Hausa [ha]
  • Hebrew [he]
  • Hindi [hi]
  • Hungarian [hu]
  • Indonesian [id]
  • Irish [ga]
  • Italian [it]
  • Japanese [ja]
  • Korean [ko]
  • Kurdish [ku]
  • Latin [la]
  • Lithuanian [lt]
  • Latvian [lv]
  • Malay [ms]
  • Marathi [mr]
  • Norwegian [no]
  • Persian [fa]
  • Polish [pl]
  • Portuguese [pt]
  • Romanian; Moldavian; Moldovan [ro]
  • Russian [ru]
  • Slovak [sk]
  • Slovenian [sl]
  • Somali [so]
  • Sotho, Southern [st]
  • Spanish; Castilian [es]
  • Swahili [sw]
  • Swedish [sv]
  • Thai [th]
  • Tagalog [tl]
  • Turkish [tr]
  • Ukrainian [uk]
  • Urdu [ur]
  • Vietnamese [vi]
  • Yoruba [yo]
  • Zulu [zu]

How to install

To install the package from GitHub, run the following command (it requires the devtools package):

devtools::install_github("koheiw/stopwords")

How to use

The interface of the stopwords package is designed to be consistent with quanteda, but the word lists themselves differ considerably:

head(quanteda::stopwords('english'), 10)
##  [1] "i"         "me"        "my"        "myself"    "we"       
##  [6] "our"       "ours"      "ourselves" "you"       "your"
head(stopwords::stopwords('en'), 10)
##  [1] "'ll"       "'tis"      "'twas"     "'ve"       "10"       
##  [6] "39"        "a"         "a's"       "able"      "ableabout"
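
As a minimal sketch of how such a list might be applied, the snippet below filters stopwords out of a tokenized sentence using base R only. The helper `remove_stopwords()` is hypothetical and not part of this package, and the two-word fallback list exists only so the sketch runs when the package is not installed. The only call taken from the package is `stopwords::stopwords("en")`, as shown above.

```r
# Hypothetical helper (not part of the stopwords package): drop every token
# that appears in a stopword list, comparing in lower case.
remove_stopwords <- function(tokens, words) {
  tokens[!(tolower(tokens) %in% tolower(words))]
}

# Split a sentence into word tokens with base R only.
text <- "The quick brown fox jumps over the lazy dog"
tokens <- strsplit(text, "[^[:alpha:]']+")[[1]]

# Use the package's English list if it is installed; otherwise fall back to a
# tiny illustrative list (an assumption for this sketch) so the code still runs.
words <- if (requireNamespace("stopwords", quietly = TRUE)) {
  stopwords::stopwords("en")
} else {
  c("the", "over")
}

filtered <- remove_stopwords(tokens, words)
print(filtered)
```

In a quanteda workflow the same character vector would normally be passed to quanteda's own token-removal functions instead of a hand-written filter; the base R version is used here only to keep the example free of extra dependencies.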
