Codecov Report
@@ Coverage Diff @@
## numpy #1269 +/- ##
==========================================
- Coverage 82.53% 82.42% -0.12%
==========================================
Files 38 38
Lines 5446 5491 +45
==========================================
+ Hits 4495 4526 +31
- Misses 951 965 +14
It's unclear which parts of the PR are related to apache/mxnet#18717 and which parts are unrelated changes. I suggest focusing on making the apache/mxnet#18717 feature available in MXNet instead of making the GluonNLP code more complex to work around the missing feature. What do you think? Is there any overlap between the weights in "Store the converted weights separately for backbone and masked language model"?
As you cite apache/mxnet#18717, could you please check whether apache/mxnet#18749 addresses the feature request and whether this PR should be adapted? Thanks
@leezu There is a huge overlap between 'model.param' and 'model-mlm.params' which could be eliminated by apache/mxnet#18749 by only storing 'model-mlm.params' and load it with In addition, this PR also solves the problem that the current RoBERTa model does not handle the MLM task properly. That is, the MLM model takes the same input ( gluon-nlp/src/gluonnlp/models/bert.py Line 476 in e78a24e
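To illustrate the overlap point above, here is a minimal sketch in plain Python (deliberately not the MXNet API) of how the backbone weights could be recovered from a single MLM checkpoint instead of storing both files. The `backbone_model.` prefix, the parameter names, and the `extract_backbone` helper are all hypothetical and are not taken from GluonNLP or apache/mxnet#18749.

```python
from typing import Dict, List

# Hypothetical prefix under which the MLM model nests its backbone parameters.
BACKBONE_PREFIX = "backbone_model."


def extract_backbone(mlm_params: Dict[str, List[float]],
                     prefix: str = BACKBONE_PREFIX) -> Dict[str, List[float]]:
    """Return the backbone subset of an MLM parameter dict, with the prefix stripped."""
    return {
        name[len(prefix):]: value
        for name, value in mlm_params.items()
        if name.startswith(prefix)
    }


# Toy stand-in for the contents of 'model-mlm.params' (names are made up).
mlm_params = {
    "backbone_model.embed.weight": [0.1, 0.2],
    "backbone_model.encoder.layer0.weight": [0.3, 0.4],
    "mlm_decoder.bias": [0.0],
}

# Only the shared (backbone) entries survive; the MLM-only head is dropped.
backbone_params = extract_backbone(mlm_params)
print(sorted(backbone_params))
```

Under these assumptions only one file would need to be stored, since every backbone entry is already present in the MLM checkpoint.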
Blocked by apache/mxnet#18749
Thanks @zheyuye. Yes, I recommend adding features on the MXNet side instead of adopting a problematic workaround in GluonNLP.
Alright, I am going to combine the backbone and its MLM models together in this PR once
@leezu Why is it blocked by apache/mxnet#18749?
@sxjscience It adds a workaround for a missing feature in MXNet. We should improve the MXNet serialization format instead of adding workarounds. In any case, let's add the workaround now and remove it again later.
Description
Refactor the RoBERTa model following apache/mxnet#18717
Changes
Comments
@sxjscience @hymzoque