[WIP] project proposal #18
No timeline, abstract, intro, or references yet.
Work in progress.
Good so far. I'd like to use DyNet as the NN backend if possible. Would that be ok?
Yes, why not :) I also wanted to play with PyTorch, so if it fits and I have free time, I'll reimplement the DyNet part in PyTorch. BTW, why DyNet?
I've heard DyNet trains faster on CPU compared to TensorFlow/Theano/etc. In addition, it's probably a bit easier to install, and doesn't require non-free software like CUDA. :)
Oh, got it. It will be interesting to compare PyTorch and DyNet performance then! Also, CUDA seems to be closed-source but freeware, so the available use cases are not really obvious to me :\
Yes, it would be. Btw, when I say "non-free" I'm referring to free software as defined by the FSF (see here); I don't mean бесплатный (free of charge) :)
2018-komp-ling/projects/serikov.md
* Vizualization skill.

#### Sub-goals
* 1 week | Reproduce the dataset used in original paper
"The input to the network is a series of sequentially presented phonemes from a corpus of 602 Turkish words. "
This shouldn't take any time at all. I can provide you with the words.
This week, reproducing the input data took ~3 days, and some questions are still unanswered, so I think a one-week buffer for possible data-collection problems could be helpful.
2018-komp-ling/projects/serikov.md
### EP requirements

#### Sub-goals
* 1 week | Collect the data to repeat the research on different languages data
See above.
For which languages do you have phoneme sequences? Just asking; it would be interesting to know the best way to collect data like that.
Pretty much any of the Turkic languages, you can do something like:
$ cat apertium-tur//apertium-tur.tur.lexc |\
grep -v '^!' |\
grep '[^<> ]\+:[^<> ]\+ \(N[^PU]\|V\)[^ ]\+ ;' | cut -f1 -d':' |\
sort -Ru | head -1000
From apertium-tur. I'm happy to generate the data for you.
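For readers who prefer to follow the pipeline step by step, here is a rough Python sketch of the same filtering: skip comment lines, keep entries whose continuation class starts with `N` (but not `NP` or `NU`) or `V`, take the part before the first colon, and draw a random unique sample. The function name `sample_lemmas` and the sample lines (including continuation-class names such as `N-COMMON`) are made up for illustration and are not taken from apertium-tur.

```python
import random
import re

# The grep pattern, translated from POSIX basic regex syntax to Python syntax.
ENTRY = re.compile(r"[^<> ]+:[^<> ]+ (N[^PU]|V)[^ ]+ ;")


def sample_lemmas(lines, k=1000, seed=None):
    """Return up to k unique lemmas, in random order, from lexc-style lines."""
    lemmas = set()
    for line in lines:
        if line.startswith("!"):        # grep -v '^!': drop comment lines
            continue
        if ENTRY.search(line):          # keep noun/verb entries only
            lemmas.add(line.split(":", 1)[0])   # cut -f1 -d':'
    ordered = sorted(lemmas)
    random.Random(seed).shuffle(ordered)        # sort -Ru: random, unique
    return ordered[:k]                          # head -1000


# Hypothetical lexc-style lines, for illustration only.
sample = [
    "! a comment line",
    "kitap:kitap N-COMMON ; ! book",
    "gel:gel V-IV ; ! come",
    "Ankara:Ankara NP-TOP ; ! proper noun, rejected by N[^PU]",
    "ve:ve CNJCOO ; ! conjunction, neither N nor V",
]

result = sample_lemmas(sample, seed=0)
print(sorted(result))
```

On real data one would pass the lines of the `.lexc` file instead of the `sample` list.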
I plan to start working on the project around 19 November. I'll spend a couple of days setting up dependencies and reading guides, so ~21 November is a good day to start, isn't it? Following the timeline, I should finish the EP before the start of the 3rd module at HSE.
Great! Just let me know when you need some data. If it goes well, it could be an ACL short paper (deadline 4th March). :)
That command seems to extract lexemes, but don't the NNs described in the paper expect phonemes, not characters?
@oserikov She says (p.2): "However, this phenomenon of consonant harmony can clearly not be considered in this study, as the two allophones for these consonants are represented by the same phoneme in the input data." This suggests that she is using just the surface characters, not phonemes. We should do the same.
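To make "surface characters as input symbols" concrete, here is a minimal sketch that treats each written character as one input symbol and turns a word into a sequence of one-hot vectors. The one-hot encoding and the helper names `build_vocab` and `one_hot` are assumptions for illustration; the thread does not specify how the paper encodes its inputs.

```python
def build_vocab(words):
    """Map every distinct surface character to an integer index."""
    chars = sorted({ch for word in words for ch in word})
    return {ch: i for i, ch in enumerate(chars)}


def one_hot(word, vocab):
    """Encode a word as a list of one-hot vectors, one per character."""
    vectors = []
    for ch in word:
        vec = [0] * len(vocab)
        vec[vocab[ch]] = 1
        vectors.append(vec)
    return vectors


# Illustrative words only; the real corpus would be the sampled word list.
words = ["kitap", "göz"]
vocab = build_vocab(words)
encoded = one_hot("göz", vocab)
print(len(vocab), len(encoded))
```

With this representation, allophones that share a spelling also share an input symbol, which is the limitation the quoted passage describes.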
The project on Turkic phonetics and NN interpretation.