7 changes: 5 additions & 2 deletions documents/how_to_add_new_language.md
@@ -10,6 +10,9 @@ The following steps will help you identify files that need to be added or change
NOTE: Take a look at [PR #40](https://github.com/unicode-org/inflection/pull/40) and [PR #111](https://github.com/unicode-org/inflection/pull/111) for examples of how to add initial language support based on dictionary lookup only.
In general, to bootstrap your progress, look for a grammatically similar language that's already supported; e.g., if you are adding Serbian, look at the existing Russian implementation.
This will help you find most of the files you need to add or change and will speed up implementation of the rules and lexicons.
We recommend spending about a week researching the language and its components before you modify or add any of the files below. Review the existing project files, such as the tokenizers, configuration files, grammar files, and lookup functions, to see what you need; this will save you time later. We strongly suggest avoiding hardcoded logic and relying on dictionary lookup instead. Pay particular attention to the grammemes, tokenizer logic, and multi-word phrase handling.

Before you add new language support, go to the README.md in the inflection subfolder (inflection/inflection/README.md), build the project, and make sure all the tests run on your computer.

## Mark your language as supported
* UPDATE: inflection/src/inflection/util/LocaleUtils.hpp
@@ -29,13 +32,13 @@ TODO: We need to expand what each of these does.
* ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer.hpp
* ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer.cpp
* ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer_*Xx*DisplayFunction.hpp
* ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer_*Xx*DisplayFunction.cpp
* UPDATE: inflection/src/inflection/grammar/synthesis/GrammarSynthesizerFactory.cpp
* UPDATE: inflection/src/inflection/grammar/synthesis/fwd.hpp

## Add language specific properties for lists, quantities and related topics
* ADD: inflection/src/inflection/dialog/language/*Xx*CommonConceptFactory.hpp
* ADD: inflection/src/inflection/dialog/language/*Xx*CommonConceptFactory.cpp
* UPDATE: inflection/src/inflection/dialog/language/fwd.hpp

## Define and create lexicon
@@ -8,6 +8,7 @@ dictionary_he.lst filter=lfs diff=lfs merge=lfs -text
dictionary_hi.lst filter=lfs diff=lfs merge=lfs -text
dictionary_it.lst filter=lfs diff=lfs merge=lfs -text
dictionary_ko.lst filter=lfs diff=lfs merge=lfs -text
dictionary_ml.lst filter=lfs diff=lfs merge=lfs -text
dictionary_nb.lst filter=lfs diff=lfs merge=lfs -text
dictionary_nl.lst filter=lfs diff=lfs merge=lfs -text
dictionary_pt.lst filter=lfs diff=lfs merge=lfs -text
@@ -23,6 +24,7 @@ inflectional_fr.xml filter=lfs diff=lfs merge=lfs -text
inflectional_he.xml filter=lfs diff=lfs merge=lfs -text
inflectional_hi.xml filter=lfs diff=lfs merge=lfs -text
inflectional_it.xml filter=lfs diff=lfs merge=lfs -text
inflectional_ml.xml filter=lfs diff=lfs merge=lfs -text
inflectional_nb.xml filter=lfs diff=lfs merge=lfs -text
inflectional_nl.xml filter=lfs diff=lfs merge=lfs -text
inflectional_pt.xml filter=lfs diff=lfs merge=lfs -text
Git LFS file not shown
Git LFS file not shown
91 changes: 91 additions & 0 deletions inflection/resources/org/unicode/inflection/features/grammar.xml
@@ -1624,6 +1624,97 @@
</category>
</grammar>
</language>
<language id="ml">
<grammar>
<category name="case">
<grammeme name="nominative"/> <!-- no explicit marker; subject form -->
<grammeme name="accusative"/> <!-- -യെ, -ായെ, marks direct object -->
<grammeme name="genitive"/> <!-- -ന്റെ, -യുടെ (possessive) -->
<grammeme name="dative"/> <!-- -ക്ക്, -ന് (to/for) -->
<grammeme name="instrumental"/> <!-- -ആല് (by means of) -->
<grammeme name="locative"/> <!-- -യില് (in/at) -->
<grammeme name="sociative"/> <!-- -ഓടു് (along with) -->
</category>
<category name="number">
<grammeme name="singular"/>
<grammeme name="plural"/>
</category>
<category name="animacy">
<grammeme name="animate"/>
<grammeme name="inanimate"/>
<grammeme name="human"/>
</category>
<category name="person">
<restrictions>
<restriction name="pos" value="pronoun"/>
<restriction name="pos" value="verb"/>
</restrictions>
<grammeme name="first"/>
<grammeme name="second"/>
<grammeme name="third"/>
</category>
<category name="gender">
<restrictions>
<restriction name="pos" value="pronoun"/>
<restriction name="pos" value="verb"/>
<restriction name="pos" value="noun"/>
</restrictions>
<grammeme name="masculine"/>
<grammeme name="feminine"/>
<grammeme name="neuter"/> <!-- e.g. for objects or animals -->
</category>
<category name="tense">
<restrictions>
<restriction name="pos" value="verb"/>
</restrictions>
<grammeme name="past"/>
<grammeme name="present"/>
<grammeme name="future"/>
</category>
<category name="form">
<grammeme name="infinitive"/>
<grammeme name="participle"/>
</category>
<category name="determination">
<restrictions>
<restriction name="pos" value="pronoun"/>
<restriction name="case" value="genitive"/>
</restrictions>
<grammeme name="independent"/> <!-- e.g. mine -->
<grammeme name="dependent"/> <!-- e.g. my {object} -->
</category>
<category name="mood">
<restrictions>
<restriction name="pos" value="verb"/>
</restrictions>
<grammeme name="indicative"/>
<grammeme name="imperative"/>
<grammeme name="subjunctive"/>
</category>
<category name="pronounType">
<restrictions>
<restriction name="pos" value="pronoun"/>
</restrictions>
<grammeme name="personal"/> <!-- regular pronouns like ഞാൻ, നീ -->
<grammeme name="reflexive"/> <!-- e.g. താൻ, തങ്ങൾ -->
</category>
<category name="formality">
<restrictions>
<restriction name="pos" value="verb"/>
<restriction name="pos" value="pronoun"/>
</restrictions>
<grammeme name="formal"/>
<grammeme name="informal"/>
</category>
<category name="clusivity">
<restrictions>
<restriction name="pos" value="pronoun"/>
</restrictions>
<grammeme name="inclusive"/>
<grammeme name="exclusive"/>
</category>
</grammar>
</language>
<language id="ms">
<grammar>
<category name="clusivity">
@@ -0,0 +1,75 @@
അവൻ,third,singular,nominative,masculine
അവൾ,third,singular,nominative,feminine
അത്,third,singular,nominative,neuter
അവനെ,third,singular,accusative,masculine
അവന്റെ,third,singular,genitive,masculine,determination=dependent
അവന്റെത്,third,singular,genitive,masculine,determination=independent
അവളെ,third,singular,accusative,feminine
അവളുടെ,third,singular,genitive,feminine,determination=dependent
അവളുടേതു്,third,singular,genitive,feminine,determination=independent
അതിനെ,third,singular,accusative,neuter
അതിന്റെ,third,singular,genitive,neuter,determination=dependent
അതിന്റേതു്,third,singular,genitive,neuter,determination=independent
അവനിൽ,third,singular,locative,masculine
അവനാൽ,third,singular,instrumental,masculine
അവനോടു്,third,singular,sociative,masculine
അവളിൽ,third,singular,locative,feminine
അവളാൽ,third,singular,instrumental,feminine
അവളോടു്,third,singular,sociative,feminine
അതിൽ,third,singular,locative,neuter
അതാൽ,third,singular,instrumental,neuter
അതോടു്,third,singular,sociative,neuter
അവർ,third,plural,nominative
അവരെ,third,plural,accusative
അവരുടെ,third,plural,genitive,determination=dependent
അവരുടേതു്,third,plural,genitive,determination=independent
അവരിൽ,third,plural,locative
അവരാൽ,third,plural,instrumental
അവരോടു്,third,plural,sociative
നീ,second,singular,nominative,informal
താങ്കൾ,second,singular,nominative,formal
നിനക്ക്,second,singular,dative,informal
താങ്കൾക്ക്,second,singular,dative,formal
നിനെ,second,singular,accusative,informal
താങ്കളെ,second,singular,accusative,formal
നിന്റെ,second,singular,genitive,informal,determination=dependent
നിന്റേതു്,second,singular,genitive,informal,determination=independent
താങ്കളുടെ,second,singular,genitive,formal,determination=dependent
താങ്കളുടേതു്,second,singular,genitive,formal,determination=independent
നിനിൽ,second,singular,locative,informal
നിനാൽ,second,singular,instrumental,informal
നിനോടു്,second,singular,sociative,informal
താങ്കളിൽ,second,singular,locative,formal
താങ്കളാൽ,second,singular,instrumental,formal
താങ്കളോടു്,second,singular,sociative,formal
നിങ്ങൾ,second,plural,nominative,formal
നിങ്ങളെ,second,plural,accusative,formal
നിങ്ങൾക്ക്,second,plural,dative,formal
നിങ്ങളുടെ,second,plural,genitive,formal,determination=dependent
നിങ്ങളുടേതു്,second,plural,genitive,formal,determination=independent
നിങ്ങളിൽ,second,plural,locative,formal
നിങ്ങളാൽ,second,plural,instrumental,formal
നിങ്ങളോടു്,second,plural,sociative,formal
ഞാൻ,first,singular,nominative,exclusive
എനിക്ക്,first,singular,dative
നമുക്ക്,first,plural,dative,inclusive
എന്നെ,first,singular,accusative,exclusive
നമ്മെ,first,plural,accusative,inclusive
എന്റെ,first,singular,genitive,determination=dependent,exclusive
എന്റേത്,first,singular,genitive,determination=independent,exclusive
എന്നിൽ,first,singular,locative
എന്നാൽ,first,singular,instrumental
എന്നോടു്,first,singular,sociative
ഞങ്ങൾ,first,plural,nominative,exclusive
നാം,first,plural,nominative,inclusive
ഞങ്ങളെ,first,plural,accusative,exclusive
ഞങ്ങൾക്ക്,first,plural,dative,exclusive
ഞങ്ങളുടെ,first,plural,genitive,exclusive,determination=dependent
ഞങ്ങളുടേത്,first,plural,genitive,exclusive,determination=independent
നമ്മുടെ,first,plural,genitive,inclusive,determination=dependent
നമ്മുടേതു്,first,plural,genitive,inclusive,determination=independent
ഞങ്ങളിലു്,first,plural,locative,exclusive
ഞങ്ങളാൽ,first,plural,instrumental,exclusive
ഞങ്ങളോടു്,first,plural,sociative,exclusive
താൻ,third,singular,nominative,reflexive
തങ്ങൾ,third,plural,nominative,formal,reflexive
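Reviewer note on the row layout above: each row is a surface form followed by comma-separated grammemes, with some values written as explicit `key=value` pairs (for example `determination=dependent`). A minimal sketch of reading one such row is shown below. `PronounRow` and `parsePronounRow` are hypothetical names introduced only for illustration; this is not the library's own loader, and it assumes UTF-8 input in which the comma and equals sign only ever act as separators.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical holder for one row of the pronoun table (not a library type).
struct PronounRow {
    std::string form;                          // surface form as UTF-8 bytes
    std::vector<std::string> grammemes;        // bare values such as "third"
    std::map<std::string, std::string> keyed;  // explicit key=value pairs
};

// Splits a row on commas. The first field is the surface form; each later
// field is either a bare grammeme or a key=value pair.
PronounRow parsePronounRow(const std::string& line) {
    PronounRow row;
    std::size_t start = 0;
    bool first = true;
    while (start <= line.size()) {
        std::size_t comma = line.find(',', start);
        if (comma == std::string::npos) {
            comma = line.size();
        }
        const std::string field = line.substr(start, comma - start);
        if (first) {
            row.form = field;
            first = false;
        } else if (!field.empty()) {
            const std::size_t eq = field.find('=');
            if (eq == std::string::npos) {
                row.grammemes.push_back(field);
            } else {
                row.keyed[field.substr(0, eq)] = field.substr(eq + 1);
            }
        }
        start = comma + 1;
    }
    return row;
}
```

Splitting on the ASCII comma is safe for UTF-8 data because no byte of a multi-byte sequence can equal `,`. Rows that omit a category (for example the third-person plural rows, which carry no gender) simply produce a shorter `grammemes` list.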
@@ -15,6 +15,7 @@ locale.group.it=it_IT,it_CH
locale.group.ja=ja_JP
locale.group.ko=ko_KR
locale.group.ms=ms_MY
locale.group.ml=ml_IN
locale.group.nb=nb_NO
locale.group.nl=nl_NL,nl_BE
locale.group.pt=pt_BR,pt_PT
@@ -0,0 +1,7 @@
#
# Copyright 2025 Unicode Incorporated and others. All rights reserved.
#
tokenizer.implementation.class=DefaultTokenizer
tokenizer.nonDecompound.file=/org/unicode/inflection/tokenizer/ml/nondecompound.tok
tokenizer.decompound=(ശ്രീ)(.+?)(ഗുരു|സര്‍ക്കാര്‍)|(.+?)(ഗുരു|സര്‍ക്കാര്‍|ഉണ്ട്|ആണ്|ഇല്ല|ഒടൊപ്പം|ഉടൻ|ഓടെ|ഓട്|ഒപ്പം|തന്നെ|പോലും|പോലെ|ഉം|യ്|കളുടെ|ങ്ങളുടെ|ത്തിന്റെ|ൻ്റെ|ന്റെ|യുടേ|യുടെ|യാൽ|യിൽ|ഇൽ|ല്|ൽ|ക്ക്|മാർ|ങ്ങൾ|കൾ|നെ|യെ)
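Reviewer note on the `tokenizer.decompound` pattern: it pairs a lazy stem group `(.+?)` with an alternation of suffixes, so a whole-word match splits a token into stem and suffix. The sketch below illustrates that idea with `std::regex` over UTF-8 bytes and a reduced subset of the suffixes. It is an assumption-laden stand-in: `splitCompound` is a hypothetical helper, and how the library itself compiles and applies the pattern is not shown in this diff.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the non-empty capture groups when the whole
// word matches the pattern, or the unchanged word when it does not.
std::vector<std::string> splitCompound(const std::string& word) {
    // Reduced subset of the alternation from the properties file above.
    static const std::regex pattern("(.+?)(കളുടെ|യുടെ|ന്റെ|യിൽ|കൾ|നെ|യെ)");
    std::vector<std::string> parts;
    std::smatch match;
    if (std::regex_match(word, match, pattern)) {
        // Group 0 is the whole match, so start from group 1.
        for (std::size_t i = 1; i < match.size(); ++i) {
            if (match[i].matched && match[i].length() > 0) {
                parts.push_back(match[i].str());
            }
        }
    } else {
        parts.push_back(word);
    }
    return parts;
}
```

With this subset, a word formed from the stem `കുട്ടി` followed by the suffix `കളുടെ` comes back as those two parts, while a word that ends in none of the listed suffixes comes back unchanged. Because the stem group is lazy and the match must cover the whole word, the split happens at the earliest position whose remainder is exactly one of the listed suffixes.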

@@ -0,0 +1,35 @@
അമ്മ
അച്ഛൻ
അച്ഛി
അമ്മൻ
മകൻ
മകൾ
കുട്ടി
കുട്ടികൾ
ആൺകുട്ടി
ആൺകുട്ടികൾ
പെൺകുട്ടി
പെൺകുട്ടികൾ
കഥ
ചിത്രം
ചിത്രങ്ങൾ
ഗ്രന്ഥം
ഗ്രന്ഥങ്ങൾ
മക്കൾ
ഞാൻ
നീ
നിങ്ങൾ
അവൻ
അവൾ
അവ
അവർ
ഇത്
അത്
ഇവ
അവ
ശ്രീ
നാരായണ
ഗുരു
കേരളം
സര്‍ക്കാര്‍
കേരളസര്‍ക്കാര്‍
2 changes: 1 addition & 1 deletion inflection/src/inflection/dialog/PronounConcept.cpp
@@ -228,7 +228,7 @@ PronounConcept::PronounConcept(const SemanticFeatureModel& model, std::u16string
for (int32_t idx = 0; idx < pronounData->numValues(); idx++) {
const auto& pronounEntry = pronounData->getPronounEntry(idx);
std::u16string_view displayString(pronounEntry.first);
if (!displayString.empty() && displayString.back() == u' ') {
displayString.remove_suffix(1);
}
auto status = U_ZERO_ERROR;
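Reviewer note on the guard in this hunk: calling `back()` on an empty `std::u16string_view` is undefined behaviour, which is why the check tests `empty()` first. The same logic can be isolated as below; `trimOneTrailingSpace` is a hypothetical helper written for this note and does not exist in the codebase.

```cpp
#include <string_view>

// Hypothetical helper mirroring the guarded check: drops at most one
// trailing space and leaves every other input unchanged.
std::u16string_view trimOneTrailingSpace(std::u16string_view text) {
    // back() on an empty view is undefined behaviour, so test empty() first.
    if (!text.empty() && text.back() == u' ') {
        text.remove_suffix(1);
    }
    return text;
}
```

Short-circuit evaluation of `&&` guarantees that `back()` is never reached for an empty view. Only a single space is removed, matching the behaviour of the code in the diff.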
@@ -0,0 +1,25 @@
/*
* Copyright 2025 Unicode Incorporated and others. All rights reserved.
*/

#include <inflection/dialog/language/MlCommonConceptFactory.hpp>
#include <inflection/dialog/SpeakableString.hpp>
#include <inflection/dialog/Plurality.hpp>

namespace inflection::dialog::language {

// For a count of one, the noun phrase is placed before the number;
// for every other count, the number comes first.
::inflection::dialog::SpeakableString
MlCommonConceptFactory::quantifiedJoin(const ::inflection::dialog::SpeakableString& formattedNumber,
                                       const ::inflection::dialog::SpeakableString& nounPhrase,
                                       const ::std::u16string& /*measureWord*/,
                                       Plurality::Rule countType) const
{
    ::inflection::dialog::SpeakableString space(u" ");
    if (countType == Plurality::Rule::ONE) {
        return nounPhrase + space + formattedNumber;
    }
    return formattedNumber + space + nounPhrase;
}

} // namespace inflection::dialog::language
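Reviewer note on the resulting word order: the branch in `quantifiedJoin` can be exercised in isolation with plain strings. The sketch below assumes a two-value stand-in for `Plurality::Rule` and uses `std::u16string` instead of `SpeakableString`; `CountRule` and `joinQuantity` are hypothetical names, not library API.

```cpp
#include <string>

// Hypothetical stand-in for Plurality::Rule; only the distinction that the
// diff branches on is modelled here.
enum class CountRule { ONE, OTHER };

// Mirrors the branch structure of MlCommonConceptFactory::quantifiedJoin.
std::u16string joinQuantity(const std::u16string& number,
                            const std::u16string& noun,
                            CountRule rule) {
    const std::u16string space(u" ");
    if (rule == CountRule::ONE) {
        return noun + space + number;  // count of one: noun first
    }
    return number + space + noun;      // any other count: number first
}
```

Calling `joinQuantity(u"1", u"N", CountRule::ONE)` gives `u"N 1"`, while `joinQuantity(u"3", u"N", CountRule::OTHER)` gives `u"3 N"`. The measure word is ignored, as in the diff.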
@@ -0,0 +1,28 @@
/*
* Copyright 2025 Unicode Incorporated and others. All rights reserved.
*/
#pragma once

#include <inflection/dialog/language/fwd.hpp>
#include <inflection/dialog/CommonConceptFactoryImpl.hpp>
#include <inflection/grammar/synthesis/fwd.hpp>
#include <inflection/dialog/Plurality.hpp>

namespace inflection::dialog::language {

class MlCommonConceptFactory : public CommonConceptFactoryImpl {
    using super = CommonConceptFactoryImpl;

public:
    explicit MlCommonConceptFactory(const ::inflection::util::ULocale& language);
    ~MlCommonConceptFactory() override;

protected:
    ::inflection::dialog::SpeakableString quantifiedJoin(
        const ::inflection::dialog::SpeakableString& formattedNumber,
        const ::inflection::dialog::SpeakableString& nounPhrase,
        const ::std::u16string& measureWord,
        ::inflection::dialog::Plurality::Rule countType) const override;
};

} // namespace inflection::dialog::language
2 changes: 2 additions & 0 deletions inflection/src/inflection/dialog/language/fwd.hpp
@@ -1,4 +1,5 @@
/*
* Copyright 2025 Unicode Incorporated and others. All rights reserved.
* Copyright 2017-2024 Apple Inc. All rights reserved.
*/
// Forward declarations for inflection.dialog.language
@@ -28,6 +29,7 @@ namespace inflection
class JaCommonConceptFactory;
class KoCommonConceptFactory;
class KoCommonConceptFactory_KoAndList;
class MlCommonConceptFactory;
class MsCommonConceptFactory;
class NbCommonConceptFactory;
class NlCommonConceptFactory;
@@ -1,4 +1,5 @@
/*
* Copyright 2025 Unicode Incorporated and others. All rights reserved.
* Copyright 2017-2024 Apple Inc. All rights reserved.
*/
#include <inflection/grammar/synthesis/GrammarSynthesizerFactory.hpp>
@@ -13,6 +14,7 @@
#include <inflection/grammar/synthesis/HiGrammarSynthesizer.hpp>
#include <inflection/grammar/synthesis/ItGrammarSynthesizer.hpp>
#include <inflection/grammar/synthesis/KoGrammarSynthesizer.hpp>
#include <inflection/grammar/synthesis/MlGrammarSynthesizer.hpp>
#include <inflection/grammar/synthesis/NbGrammarSynthesizer.hpp>
#include <inflection/grammar/synthesis/NlGrammarSynthesizer.hpp>
#include <inflection/grammar/synthesis/PtGrammarSynthesizer.hpp>
@@ -41,6 +43,7 @@ static const ::std::map<::inflection::util::ULocale, addSemanticFeatures>& GRAMM
{::inflection::util::LocaleUtils::HINDI(), &HiGrammarSynthesizer::addSemanticFeatures},
{::inflection::util::LocaleUtils::ITALIAN(), &ItGrammarSynthesizer::addSemanticFeatures},
{::inflection::util::LocaleUtils::KOREAN(), &KoGrammarSynthesizer::addSemanticFeatures},
{::inflection::util::LocaleUtils::MALAYALAM(), &MlGrammarSynthesizer::addSemanticFeatures},
{::inflection::util::LocaleUtils::NORWEGIAN(), &NbGrammarSynthesizer::addSemanticFeatures},
{::inflection::util::LocaleUtils::DUTCH(), &NlGrammarSynthesizer::addSemanticFeatures},
{::inflection::util::LocaleUtils::PORTUGUESE(), &PtGrammarSynthesizer::addSemanticFeatures},
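Reviewer note on the registration pattern: the map in this hunk associates each locale with a function that adds that language's semantic features, so supporting a new language means adding one entry. A simplified, self-contained sketch of the pattern follows. All names here (`FeatureModel`, `AddFeaturesFn`, `registry`, `addFeaturesFor`, and the two hook functions) are hypothetical, and plain language codes stand in for `ULocale` keys.

```cpp
#include <map>
#include <string>

// Hypothetical feature model; the real SemanticFeatureModel is not reproduced.
struct FeatureModel {
    std::string registeredBy;
};

using AddFeaturesFn = void (*)(FeatureModel&);

void addKoreanFeatures(FeatureModel& model) { model.registeredBy = "ko"; }
void addMalayalamFeatures(FeatureModel& model) { model.registeredBy = "ml"; }

// A function-local static sidesteps static-initialization-order problems.
const std::map<std::string, AddFeaturesFn>& registry() {
    static const std::map<std::string, AddFeaturesFn> table = {
        {"ko", &addKoreanFeatures},
        {"ml", &addMalayalamFeatures},
    };
    return table;
}

// Returns true when the language has a registered hook and the hook was run.
bool addFeaturesFor(const std::string& language, FeatureModel& model) {
    const auto it = registry().find(language);
    if (it == registry().end()) {
        return false;
    }
    it->second(model);
    return true;
}
```

A lookup for an unregistered language returns `false` and leaves the model untouched, which makes a missing map entry easy to detect when bringing up a new language.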