
@oserikov oserikov commented Nov 1, 2018

The project on Turkic phonetics and NN interpretation.

No timeline, abstract, intro, or references yet.
@oserikov
Author

oserikov commented Nov 1, 2018

work in progress

@ftyers
Owner

ftyers commented Nov 4, 2018

Good so far. I'd like to use DyNet as the NN backend if possible. Would that be ok?

@oserikov
Author

oserikov commented Nov 6, 2018

Yes, why not :) I also wanted to play with PyTorch, so if it fits and I have free time, I'd reimplement the DyNet part in PyTorch. BTW, why DyNet?

@ftyers
Owner

ftyers commented Nov 6, 2018

I've heard DyNet trains faster on CPU than TensorFlow/Theano/etc. In addition, it's probably a bit easier to install, and doesn't require non-free software like CUDA. :)

@oserikov
Author

oserikov commented Nov 6, 2018

Oh, got it. It will be interesting to compare PyTorch and DyNet performance then! Also, CUDA seems to be non-open-source but freeware, so the available use cases are not really obvious to me :\

@ftyers
Owner

ftyers commented Nov 7, 2018

> Oh, got it. It will be interesting to compare PyTorch and DyNet performance then! Also, CUDA seems to be non-open-source but freeware, so the available use cases are not really obvious to me :\

Yes, that would be. Btw, when I say "non-free" I'm referring to free software as defined by the FSF (see here), I don't mean бесплатный (free of charge) :)

* Visualization skill.

#### Sub-goals
* 1 week| Reproduce the dataset used in original paper
Owner

"The input to the network is a series of sequentially presented phonemes from a corpus of 602 Turkish words. "

This shouldn't take any time at all. I can provide you with the words.

Author

This week, reproducing the input data took ~3 days, and some questions are still unanswered, so I think a one-week buffer for possible data-collection problems could be helpful.

### EP requirements

#### Sub-goals
* 1 week| Collect the data to repeat the research on different languages data
Owner

See above.

Author

For which languages do you have phoneme sequences? Just asking; it would be interesting to know the best way to collect data like that.

Owner

Pretty much any of the Turkic languages, you can do something like:

$ cat apertium-tur//apertium-tur.tur.lexc  |\
 grep -v '^!' |\
 grep '[^<> ]\+:[^<> ]\+ \(N[^PU]\|V\)[^ ]\+ ;' | cut -f1 -d':' |\
 sort -Ru | head -1000

From apertium-tur. I'm happy to generate the data for you.
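
For anyone who would rather do this step from Python, here is a rough sketch of the same pipeline. It assumes the regex below is a faithful translation of the `grep` pattern, and the function name `sample_lemmas` and the demo entries are made up for illustration (they are not taken from apertium-tur):

```python
import random
import re

# Translation of the grep pattern: "lemma:stem CONT ;" where the
# continuation class CONT starts with N (but not NP or NU) or with V.
ENTRY = re.compile(r"[^<> ]+:[^<> ]+ (?:N[^PU]|V)[^ ]+ ;")


def sample_lemmas(lines, k=1000, seed=None):
    """Return up to k distinct lemmas in random order."""
    lemmas = set()
    for line in lines:
        if line.startswith("!"):      # grep -v '^!' : skip comment lines
            continue
        if ENTRY.search(line):        # grep '...'   : keep noun/verb entries
            lemmas.add(line.split(":", 1)[0])  # cut -f1 -d':'
    pool = sorted(lemmas)             # fixed order so a seed is reproducible
    random.Random(seed).shuffle(pool) # sort -Ru     : shuffle unique items
    return pool[:k]                   # head -1000


# Hypothetical lexc-style lines, only to exercise the filter.
demo = [
    "! ev:ev N-COMMON ;",
    "ev:ev N-COMMON ;",
    "gel:gel V-TV ;",
    "Ankara:Ankara NP-TOP ;",
    "ev:ev N-COMMON ;",
]
print(sample_lemmas(demo, k=10, seed=0))
```

To run it on the real lexicon, pass an open file instead of `demo`, for example `sample_lemmas(open(path, encoding="utf-8"))` with `path` pointing at the `.lexc` file.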

@oserikov
Author

I plan to start working on the project around 19 November. I'll spend a couple of days setting up dependencies and reading guides, so ~21 November is a good day to start, isn't it? Following the timeline, I should finish the EP before the start of the 3rd module at HSE.

@ftyers
Owner

ftyers commented Nov 13, 2018

Great! Just let me know when you need some data. If it goes well, it could be an ACL short paper (deadline 4th March). :)

@oserikov
Author

> Pretty much any of the Turkic languages, you can do something like:
>
> $ cat apertium-tur//apertium-tur.tur.lexc  |\
>  grep -v '^!' |\
>  grep '[^<> ]\+:[^<> ]\+ \(N[^PU]\|V\)[^ ]\+ ;' | cut -f1 -d':' |\
>  sort -Ru | head -1000
>
> From apertium-tur.

That command seems to extract lexemes, but don't the NNs described in the paper expect phonemes, not characters?

@ftyers
Owner

ftyers commented Nov 20, 2018

@oserikov She says (p.2): "However, this phenomenon of consonant harmony can clearly not be considered in this study, as the two allophones for these consonants are represented by the same phoneme in the input data." This suggests that she is using just the surface characters, not phonemes. We should do the same.
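
To make "surface characters as input" concrete, here is a minimal sketch of turning each word into a sequence of per-character vectors that could be presented to the network one at a time. The one-hot encoding and the helper names `build_alphabet` and `one_hot_sequence` are assumptions for illustration; the paper's actual input encoding is not stated in this thread.

```python
def build_alphabet(words):
    """Map every distinct surface character to an index."""
    chars = sorted({ch for word in words for ch in word})
    return {ch: i for i, ch in enumerate(chars)}


def one_hot_sequence(word, alphabet):
    """Encode a word as one one-hot vector per character, in order."""
    sequence = []
    for ch in word:
        vector = [0] * len(alphabet)
        vector[alphabet[ch]] = 1
        sequence.append(vector)
    return sequence


# Example words chosen only for illustration.
words = ["göz", "kız"]
alphabet = build_alphabet(words)
encoded = one_hot_sequence("göz", alphabet)
print(len(alphabet), len(encoded))
```

Each character maps to exactly one index, so two allophones written with the same letter get the same vector, which matches the limitation quoted above.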
