Poor performance when running splitter model #27

Note that this issue occurs only in #25, not on the master branch. The splitting model implemented in 3e48684e9e769d6bb352bfaafd74f89a8b4779a3 performs badly, for example:

```
(virtualenv)  $ python -m deep_reference_parser split "Upson MA (2019). This is a reference. In a journal. 16(1) 1-23" -t 
Using TensorFlow backend.
ℹ Using config file:
/home/matthew/Documents/wellcome/deep_reference_parser/deep_reference_parser/configs/2020.3.6_splitting.ini
ℹ Attempting to download model artefacts if they are not found locally
in models/splitting/2020.3.6_splitting/. This may take some time...
✔ Found models/splitting/2020.3.6_splitting/indices.pickle
✔ Found models/splitting/2020.3.6_splitting/weights.h5
✔ Found embeddings/2020.1.1-wellcome-embeddings-300.txt

=============================== Token Results ===============================

    token   label
---------   -----
    Upson   null 
       MA   i-r  
        (   i-r  
     2019   i-r  
        )   i-r  
        .   o    
     This   o    
       is   o    
        a   o    
reference   o    
        .   o    
       In   i-r  
        a   i-r  
  journal   i-r  
        .   o    
     16(1   o    
        )   o    
        1   o    
        -   o    
       23   o   
```

This model was expected to perform less well than the model implemented in [2020.3.1](https://github.com/wellcometrust/deep_reference_parser/releases/tag/2020.3.1), but it is worse than anticipated.
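To put rough numbers on the token results above, here is a minimal sketch that tallies the predicted labels and the contiguous runs of identical labels. The token/label pairs are copied by hand from the output in this issue; the helpers `label_counts` and `label_runs` are hypothetical names for illustration and are not part of `deep_reference_parser`.

```python
from collections import Counter
from itertools import groupby

# (token, label) pairs copied by hand from the Token Results table in this issue
predictions = [
    ("Upson", "null"), ("MA", "i-r"), ("(", "i-r"), ("2019", "i-r"), (")", "i-r"),
    (".", "o"), ("This", "o"), ("is", "o"), ("a", "o"), ("reference", "o"), (".", "o"),
    ("In", "i-r"), ("a", "i-r"), ("journal", "i-r"), (".", "o"),
    ("16(1", "o"), (")", "o"), ("1", "o"), ("-", "o"), ("23", "o"),
]


def label_counts(pairs):
    """Count how many tokens received each label."""
    return Counter(label for _, label in pairs)


def label_runs(pairs):
    """Collapse consecutive identical labels into (label, run_length) tuples."""
    labels = [label for _, label in pairs]
    return [(label, len(list(group))) for label, group in groupby(labels)]


counts = label_counts(predictions)
total = sum(counts.values())
for label, n in counts.most_common():
    print(f"{label:>5}  {n:>2}  {n / total:.0%}")

print(label_runs(predictions))
```

For this single input string, 12 of the 20 tokens are labelled `o`, 7 are labelled `i-r`, and one is labelled `null`, and the labels change four times within what was entered as one reference.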

The new model `2020.3.6` is required to ensure compatibility with the changes implemented in #25. Changes to the Rodrigues data format mean that this model runs in less than one hour, instead of around 16 hours.

Some experimentation with hyper-parameters is probably all that is needed to bring this model up to scratch, and in any case it is largely superseded by the multitask `split_parse` model. If a high-quality splitting model is required immediately, revert to one of the earlier pre-release versions for now; all of them perform very well.
