
hw1 babakov #23

Open: wants to merge 34 commits into base: master
522f6f3
Babakov Nikolay mcl2018 hw1
bbkjunior Oct 30, 2018
6ba2b4f
add explanation
bbkjunior Oct 30, 2018
f10b624
quiz correction
bbkjunior Oct 31, 2018
f9f69b3
final changes for maxmatch
bbkjunior Nov 1, 2018
8b831f2
chernovik
bbkjunior Nov 11, 2018
e40c447
chst
bbkjunior Nov 11, 2018
f28009a
response
bbkjunior Nov 12, 2018
6be3d36
files
bbkjunior Nov 12, 2018
03b702b
proposal
bbkjunior Nov 12, 2018
8427b37
response
bbkjunior Nov 13, 2018
dc918bf
Merge branch 'master' of https://github.com/bbkjunior/ftyers.github.io
bbkjunior Nov 13, 2018
044a43c
123
bbkjunior Nov 18, 2018
2340228
changed tex file to md because tex turned out to be unreadable
bbkjunior Dec 2, 2018
68080cc
unigram
bbkjunior Dec 2, 2018
2081405
report update
bbkjunior Dec 6, 2018
7b1a9b1
quiz 3
bbkjunior Dec 12, 2018
9df27e1
Merge branch 'master' of https://github.com/bbkjunior/ftyers.github.io
bbkjunior Dec 12, 2018
e75af68
quiz3
bbkjunior Dec 12, 2018
311d629
begin last hw
bbkjunior Dec 26, 2018
43f7d5c
proposal update
bbkjunior Jan 7, 2019
7b3658f
Resolved merge conflict by incorporating both suggestions.
bbkjunior Jan 7, 2019
fef5a60
data
bbkjunior Jan 7, 2019
89f0d00
comment
bbkjunior Jan 7, 2019
120aec4
getting ready
bbkjunior Jan 13, 2019
e6c8c50
ohe feat
bbkjunior Jan 21, 2019
1f61295
prediction
bbkjunior Jan 23, 2019
06dd80c
ready to use model with explanation
bbkjunior Jan 23, 2019
8430133
readme
bbkjunior Jan 28, 2019
ede5963
ud
bbkjunior Jan 28, 2019
edb779e
Update README.md
bbkjunior Jan 28, 2019
0f52390
Update README.md
bbkjunior Jan 28, 2019
3a37dc0
hw
bbkjunior Mar 25, 2019
9a67bac
hw
bbkjunior Mar 26, 2019
d67dfa5
hw
bbkjunior Mar 26, 2019
@@ -0,0 +1,268 @@
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[russian]{babel}

\title{Finite-state Morphology}
\author{Nikolay Babakov}
\date{November 2018}

\begin{document}

\maketitle Tasks overview

\section{Archiphonemes}
At this stage we add a new case to the lexc file. I made the following changes:

\begin{verbatim}
Multichar_Symbols
%<ins%> ! Instrumental case
%{A%} ! Archiphoneme а/е
\end{verbatim}

This produced the expected output:

\begin{verbatim}
$ hfst-lexc chv.lexc | hfst-fst2strings
hfst-lexc: warning: Defaulting to OpenFst tropical type
Root...1 CASES...1 PLURAL...2 N...1 Nouns...
пакча<n><ins>:пакча>п{A}
пакча<n><pl><ins>:пакча>сем>п{A}
урам<n><ins>:урам>п{A}
урам<n><pl><ins>:урам>сем>п{A}
канаш<n><ins>:канаш>п{A}
канаш<n><pl><ins>:канаш>сем>п{A}
хула<n><ins>:хула>п{A}
хула<n><pl><ins>:хула>сем>п{A}
\end{verbatim}


\section{Phonological rules}
I tried different rules and exported each one to its own file (left\_right\_rule, left\_rule, etc.). In each file I pasted the original text and marked the changed examples with ``!!!'' so they are easy to spot.

\section{Phonological rules}
I changed the lexc file according to the instructions:

\begin{verbatim}
LEXICON CASES

%<ins%>:%>п%{A%} # ;
%<gen%>:%>%{Ă%}н # ;

Multichar_Symbols
%<gen%> ! Genitive case
\end{verbatim}

This gave the expected output:

\begin{verbatim}
$ hfst-fst2strings chv.lexc.hfst | grep урам | grep gen
урам<n><gen>:урам>{Ă}н
урам<n><pl><gen>:урам>се{м}>{Ă}н
\end{verbatim}

After modifying the lexc file I also added the \verb|{м}| archiphoneme and implemented a rule that deletes it:

\begin{verbatim}
"Case for deletion {м} archiphoneme"
%{м%}:0 <=> _ %>: %{Ă%}: н ;
\end{verbatim}

And here is my implementation of back vowel harmony with exceptions:

\begin{verbatim}
"Back vowel harmony for archiphoneme {Ă}"
%{Ă%}:ӑ <=> BackVow: [ ArchiCns: | Cns: | %>: ]+ _ ;
except
%{м%}: %>: _ н ;
Vow: %>: _ н ;
\end{verbatim}

\section{Productive derivation}
I got stuck at this stage.
I updated my chv.lexc as instructed and ran the following Makefile:

\begin{verbatim}
all:
	./hfst-lexc chv.lexc.txt -o chv.lexc.hfst
	./hfst-twolc chv.twol -o chv.twol.hfst
	./hfst-compose-intersect -1 chv.lexc.hfst -2 chv.twol.hfst -o chv.gen.hfst
	./hfst-invert chv.gen.hfst -o chv.mor.hfst
\end{verbatim}
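
The last step of the Makefile turns the generator into an analyser by inverting it. As a rough illustration only, the sketch below treats a transducer as a small finite set of lexical/surface pairs and inverts that set in Python. It does not call HFST. The pair for специалист is taken from the output quoted in this report; the two forms of урам are what I would expect the rules to produce and are assumptions, as are all function and variable names.

```python
from collections import defaultdict

# Toy generator: lexical (analysis) string -> surface form.
# Only the "специалист" pair is quoted from the report; the others are assumed.
GENERATOR = {
    "урам<n><gen>": "урамӑн",
    "урам<n><pl><gen>": "урамсен",
    "специалист<n><gen>": "специалистӑн",
}


def invert(relation):
    """Swap the two sides of a relation: surface form -> list of analyses."""
    inverted = defaultdict(list)
    for lexical, surface in relation.items():
        inverted[surface].append(lexical)
    return dict(inverted)


def lookup(analyser, surface):
    """Return the analyses of a surface form, or a '+?' marker if there are none."""
    return analyser.get(surface, [surface + "+?"])


ANALYSER = invert(GENERATOR)
print(lookup(ANALYSER, "урамсен"))
print(lookup(ANALYSER, "патшалӑх"))
```

A form that the generator never produces has no entry in the inverted relation, which is the situation the ``+?'' marker in a lookup result signals.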

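The reason I was stuck was the missing nominative case: without it the lexicon offers no path for a bare noun stem. So I added the following entry to LEXICON CASES

\begin{verbatim}
%<nom%>:%> # ;
\end{verbatim}

and everything started working.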

\section{Loan words}
In this section I made the following changes:

\begin{verbatim}
Multichar_Symbols

%{ъ%}

Alphabet
%{ъ%}:0

Rules
"Non surface {ъ} if following %{Ă%}: followed by н"
%{ъ%}:0 <=> _ %>: %{Ă%}: н ;
\end{verbatim}

After that I got the expected output:

\begin{verbatim}
./hfst-fst2strings chv.gen.hfst | grep gen | grep специалист
специалист<n><gen>:специалистӑн
\end{verbatim}

\section{Modified files overview}
Here is an overview of my results.

chv.lexc

FILE CONTENT
\begin{verbatim}
Multichar_Symbols

%<n%> ! Noun
%<pl%> ! Plural
%<nom%> ! Nominative case
%<ins%> ! Instrumental case
%<gen%> ! Genitive case
%{A%} ! Archiphoneme [а] or [е]
%{Ă%} ! Archiphoneme [ӑ] or [ӗ]
%> ! Morpheme boundary
%{м%} ! Archiphoneme м (deleted before the genitive suffix)
%<der_лӑх%> ! Derivational suffix
%{ъ%}
\end{verbatim}
MY COMMENT
This is the list of multicharacter symbols. Each one is treated as a single unit. The archiphonemes among them surface as different letters, or as nothing at all, depending on the rules we state.

FILE CONTENT
\begin{verbatim}
LEXICON Root
Nouns ;
\end{verbatim}

MY COMMENT
Here we name the class of words the lexicon starts from (nouns).

FILE CONTENT
\begin{verbatim}
LEXICON CASES
%<nom%>:%> # ;
%<ins%>:%>п%{A%} # ;
%<gen%>:%>%{Ă%}н # ;
\end{verbatim}

MY COMMENT
Here we list the possible cases and define how a word changes when each case is applied.

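FILE CONTENT
\begin{verbatim}
LEXICON PLURAL
CASES ;
%<pl%>:%>се%{м%} CASES ;
\end{verbatim}
MY COMMENT
Here we define the singular and plural forms of the words in use.

FILE CONTENT
\begin{verbatim}
LEXICON SUBST
CASES ;
PLURAL ;
\end{verbatim}

MY COMMENT
Here we define the continuations available after productive derivation: a derived stem can take a case ending directly or go through the plural first.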

FILE CONTENT
\begin{verbatim}
LEXICON DER-N
%<der_лӑх%>:%>л%{Ă%}х SUBST "weight: 1.0" ;
\end{verbatim}

MY COMMENT
We state the derivation rule and assign it a weight, so that when analyses compete we know which one should be preferred.

FILE CONTENT
\begin{verbatim}
LEXICON N
%<n%>: PLURAL ;
%<n%>: SUBST ;
%<n%>: DER-N ;
\end{verbatim}

MY COMMENT
Here we define the main continuations that can follow a noun stem.

FILE CONTENT
\begin{verbatim}
LEXICON Nouns
урам:урам N ; ! "street"
пакча:пакча N ; ! "garden"
хула:хула N ; ! "city"
канаш:канаш N ; ! "council"
тӗс:тӗс N ; ! "kind"
патша:патша N ; ! "tsar"
куҫ:куҫ N ; ! "eye"
патшалӑх:патшалӑх N ; ! "state"
специалист:специалист%{ъ%} N ; ! "specialist"
\end{verbatim}

MY COMMENT
The list of words the rules are applied to.
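
The way the lexicons above chain into each other can be sketched as a small recursive walk. This is a simplified illustration under my own assumptions, not how hfst-lexc is implemented: it keeps a single noun, sends N straight to PLURAL, leaves out SUBST and DER-N, and all Python names are hypothetical.

```python
# Continuation lexicons as (lexical, intermediate, next lexicon) entries.
# "#" marks the end of a word, as in lexc.
LEXICONS = {
    "Root": [("", "", "Nouns")],
    "Nouns": [("урам", "урам", "N")],
    "N": [("<n>", "", "PLURAL")],
    "PLURAL": [("", "", "CASES"), ("<pl>", ">се{м}", "CASES")],
    "CASES": [
        ("<nom>", ">", "#"),
        ("<ins>", ">п{A}", "#"),
        ("<gen>", ">{Ă}н", "#"),
    ],
}


def expand(lexicons, name="Root", lexical="", intermediate=""):
    """Yield every lexical/intermediate pair reachable from the given lexicon."""
    if name == "#":
        yield lexical, intermediate
        return
    for lex, inter, nxt in lexicons[name]:
        yield from expand(lexicons, nxt, lexical + lex, intermediate + inter)


pairs = list(expand(LEXICONS))
for lexical, intermediate in pairs:
    print(f"{lexical}:{intermediate}")
```

Every path has to pass through CASES before it can reach the end of the word, so a form without a case tag is never generated. Removing the nominative entry from the table therefore leaves the bare stem without any analysis.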

chv.twol

FILE CONTENT
\begin{verbatim}
Alphabet
а ӑ е ё ӗ и о у ӳ ы э ю я б в г д ж з к л м н п р с ҫ т ф х ц ч ш щ й ь ъ
А Ӑ Е Ё Ӗ И О У Ӳ Ы Э Ю Я Б В Г Д Ж З К Л М Н П Р С Ҫ Т Ф Х Ц Ч Ш Щ Й Ь Ъ
%{э%}:0 %{л%}:0 %{с%}:0 %{а%}:0
%{A%}:а %{A%}:е
%{Ă%}:ӑ %{Ă%}:ӗ %{Ă%}:0
%{м%}:0 %{м%}:м
%{н%}:н %{н%}:0
%{ъ%}:0
;
\end{verbatim}

MY COMMENT
The alphabet lists the plain letters and, for each archiphoneme, the surface realisations it is allowed to take.

FILE CONTENT
\begin{verbatim}
Sets

Vow = ӑ а ы о у я ё ю ӗ э и ӳ ;

BackVow = ӑ а ы о у я ё ю %{ъ%} ;

FrontVow = ӗ э и ӳ ;

ArchiCns = %{м%} ;

Cns = б в г д ж з к л м н п р с ҫ т ф х ц ч ш щ й ь ъ ;
\end{verbatim}

MY COMMENT
To state the rules more compactly we group letters into sets.

FILE CONTENT
\begin{verbatim}
Rules

"Remove morpheme boundary"
%>:0 <=> _ ;

"Back vowel harmony for archiphoneme {A}"
%{A%}:а <=> BackVow: [ Cns: | %>: ]+ _ ;

"Back vowel harmony for archiphoneme {Ă}"
%{Ă%}:ӑ <=> BackVow: [ ArchiCns: | Cns: | %>: ]+ _ ;
except
%{м%}: %>: _ н ;
Vow: %>: _ н ;

"Non surface {Ă} in plural genitive"
%{Ă%}:0 <=> [ Vow: | %{м%}: ] %>: _ н ;

"Non surface {м} if following %{Ă%}: followed by н"
%{м%}:0 <=> _ %>: %{Ă%}: н ;

"Non surface {ъ} if following %{Ă%}: followed by н"
%{ъ%}:0 <=> _ %>: %{Ă%}: н ;
\end{verbatim}

MY COMMENT
These rules change the words in specific contexts.
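
To make the combined effect of the rules easier to follow, here is a procedural approximation in Python. It is only a sketch under stated assumptions: twolc rules are parallel constraints rather than a left-to-right procedure, the sketch always drops the loan-word marker because its only pair in the alphabet maps it to nothing, and every function and constant name is made up for illustration.

```python
import re

VOW = set("ӑаыоуяёюӗэиӳ")
BACK_VOW = set("ӑаыоуяёю") | {"{ъ}"}  # the loan-word marker counts as a back vowel
CNS = set("бвгджзклмнпрсҫтфхцчшщйьъ")


def tokenize(intermediate):
    """Split into symbols, keeping archiphonemes such as {Ă} as single units."""
    return re.findall(r"\{[^}]+\}|.", intermediate)


def back_context(symbols, i, skippable):
    """True if a back vowel precedes position i across one or more skippable symbols."""
    j = i - 1
    while j >= 0 and symbols[j] in skippable:
        j -= 1
    return 0 <= j < i - 1 and symbols[j] in BACK_VOW


def surface(intermediate):
    """Approximate the surface form of an intermediate string."""
    symbols = tokenize(intermediate)
    out = []
    for i, sym in enumerate(symbols):
        prev1 = symbols[i - 1] if i >= 1 else ""
        prev2 = symbols[i - 2] if i >= 2 else ""
        following = symbols[i + 1:i + 4]
        if sym in (">", "{ъ}"):
            continue  # the boundary and the loan-word marker never surface
        if sym == "{м}":
            # deleted only when the genitive suffix follows
            out.append("" if following == [">", "{Ă}", "н"] else "м")
        elif sym == "{Ă}":
            deleted = (
                prev1 == ">"
                and (prev2 in VOW or prev2 == "{м}")
                and following[:1] == ["н"]
            )
            if deleted:
                continue
            back = back_context(symbols, i, CNS | {">", "{м}"})
            out.append("ӑ" if back else "ӗ")
        elif sym == "{A}":
            out.append("а" if back_context(symbols, i, CNS | {">"}) else "е")
        else:
            out.append(sym)
    return "".join(out)


print(surface("специалист{ъ}>{Ă}н"))
print(surface("урам>се{м}>{Ă}н"))
```

The first call reproduces the genitive of специалист quoted earlier in the report. The second shows the plural genitive, where both archiphonemes are deleted.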
For reference, this is the failure I kept getting at the Productive derivation stage before the nominative entry was added. Every time I ran the recommended command the output was an unanalysed form, and following the instructions from the Telegram chat did not help:

\begin{verbatim}
echo патшалӑх | hfst-lookup -qp chv.mor.hfst
патшалӑх патшалӑх+? inf
\end{verbatim}

All related files in their final state can be found in the same folder as this report.

\end{document}
5 changes: 5 additions & 0 deletions 2018-komp-ling/practicals/Finite-state Morphology/Makefile
@@ -0,0 +1,5 @@
all:
	./hfst-lexc chv.lexc.txt -o chv.lexc.hfst
	./hfst-twolc chv.twol -o chv.twol.hfst
	./hfst-compose-intersect -1 chv.lexc.hfst -2 chv.twol.hfst -o chv.gen.hfst
	./hfst-invert chv.gen.hfst -o chv.mor.hfst
Binary file not shown.
Binary file not shown.
54 changes: 54 additions & 0 deletions 2018-komp-ling/practicals/Finite-state Morphology/chv.lexc.txt
@@ -0,0 +1,54 @@
Multichar_Symbols

%<n%> ! Noun
%<pl%> ! Plural
%<nom%> ! Nominative case
%<ins%> ! Instrumental case
%<gen%> ! Genitive case
%{A%} ! Archiphoneme [а] or [е]
%{Ă%} ! Archiphoneme [ӑ] or [ӗ]
%> ! Morpheme boundary
%{м%} ! Archiphoneme м (deleted before the genitive suffix)
%<der_лӑх%> ! Derivational suffix
%{ъ%}

LEXICON Root

Nouns ;

LEXICON CASES

%<nom%>:%> # ;
%<ins%>:%>п%{A%} # ;
%<gen%>:%>%{Ă%}н # ;

LEXICON PLURAL
CASES ;
%<pl%>:%>се%{м%} CASES ;


LEXICON SUBST
CASES ;
PLURAL ;

LEXICON DER-N

%<der_лӑх%>:%>л%{Ă%}х SUBST "weight: 1.0" ;

LEXICON N

%<n%>: PLURAL ;
%<n%>: SUBST ;
%<n%>: DER-N ;

LEXICON Nouns

урам:урам N ; ! "street"
пакча:пакча N ; ! "garden"
хула:хула N ; ! "city"
канаш:канаш N ; ! "council"
тӗс:тӗс N ; ! "kind"
патша:патша N ; ! "tsar"
куҫ:куҫ N ; ! "eye"
патшалӑх:патшалӑх N ; ! "state"
специалист:специалист%{ъ%} N ; ! "specialist"
Binary file not shown.
45 changes: 45 additions & 0 deletions 2018-komp-ling/practicals/Finite-state Morphology/chv.twol
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
Alphabet
а ӑ е ё ӗ и о у ӳ ы э ю я б в г д ж з к л м н п р с ҫ т ф х ц ч ш щ й ь ъ
А Ӑ Е Ё Ӗ И О У Ӳ Ы Э Ю Я Б В Г Д Ж З К Л М Н П Р С Ҫ Т Ф Х Ц Ч Ш Щ Й Ь Ъ
%{э%}:0 %{л%}:0 %{с%}:0 %{а%}:0
%{A%}:а %{A%}:е
%{Ă%}:ӑ %{Ă%}:ӗ %{Ă%}:0
%{м%}:0 %{м%}:м
%{н%}:н %{н%}:0
%{ъ%}:0
;

Sets

Vow = ӑ а ы о у я ё ю ӗ э и ӳ ;

BackVow = ӑ а ы о у я ё ю %{ъ%} ;

FrontVow = ӗ э и ӳ ;

ArchiCns = %{м%} ;

Cns = б в г д ж з к л м н п р с ҫ т ф х ц ч ш щ й ь ъ ;

Rules

"Remove morpheme boundary"
%>:0 <=> _ ;

"Back vowel harmony for archiphoneme {A}"
%{A%}:а <=> BackVow: [ Cns: | %>: ]+ _ ;

"Back vowel harmony for archiphoneme {Ă}"
%{Ă%}:ӑ <=> BackVow: [ ArchiCns: | Cns: | %>: ]+ _ ;
except
%{м%}: %>: _ н ;
Vow: %>: _ н ;

"Non surface {Ă} in plural genitive"
%{Ă%}:0 <=> [ Vow: | %{м%}: ] %>: _ н ;

"Non surface {м} if following %{Ă%}: followed by н"
%{м%}:0 <=> _ %>: %{Ă%}: н ;

"Non surface {ъ} if following %{Ă%}: followed by н"
%{ъ%}:0 <=> _ %>: %{Ă%}: н ;
Binary file not shown.