https://arxiv.org/abs/2209.08569
ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding (Wenjin Wang, Zhengjie Huang, Bin Luo, Qianglong Chen, Qiming Peng, Yinxu Pan, Weichong Yin, Shikun Feng, Yu Sun, Dianhai Yu, Yin Zhang)
So Baidu has built a LayoutLM-style model too. True to ERNIE form, the input is complicated.
#layout