Improving BERT Pretraining with Syntactic Supervision
Access rights
openAccess
CC BY
publishedVersion
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Other link related to publication (opens in new window)
Date
2023
Language
en
Pages
176-184
Series
Proceedings of the 2023 CLASP Conference on Learning with Small Data, CLASP Papers in Computational Linguistics; Volume 5
Abstract
Bidirectional masked Transformers have become the core theme in the current NLP landscape. Despite their impressive benchmarks, a recurring theme in recent research has been to question such models’ capacity for syntactic generalization. In this work, we seek to address this question by adding a supervised, token-level supertagging objective to standard unsupervised pretraining, enabling the explicit incorporation of syntactic biases into the network’s training dynamics. Our approach is straightforward to implement, incurs only a marginal computational overhead, and is general enough to adapt to a variety of settings. We apply our methodology to Lassy Large, an automatically annotated corpus of written Dutch. Our experiments suggest that our syntax-aware model performs on par with established baselines, despite Lassy Large being one order of magnitude smaller than commonly used corpora.
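
The abstract describes the objective only at a high level. As a rough illustration (not the authors' implementation), the joint objective can be read as a standard masked-language-modelling loss plus a token-level supertag classification loss computed over the same encoder states. The PyTorch module below is a minimal sketch under that assumption; the generic Transformer encoder stands in for BERT, and names such as num_supertags and the use of -100 to mark unlabelled positions are illustrative choices, not details taken from the paper.

import torch
import torch.nn as nn

class SyntaxAwarePretrainer(nn.Module):
    """Sketch of joint MLM + supertagging pretraining (illustrative, not the paper's code)."""

    def __init__(self, vocab_size, num_supertags, d_model=768, n_layers=12, n_heads=12):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)      # stands in for a BERT encoder
        self.mlm_head = nn.Linear(d_model, vocab_size)             # predicts masked tokens
        self.tag_head = nn.Linear(d_model, num_supertags)          # predicts one supertag per token
        self.loss = nn.CrossEntropyLoss(ignore_index=-100)         # -100 marks positions without a label

    def forward(self, input_ids, mlm_labels, supertag_labels):
        h = self.encoder(self.embed(input_ids))                    # (batch, seq, d_model)
        mlm_loss = self.loss(self.mlm_head(h).transpose(1, 2), mlm_labels)
        tag_loss = self.loss(self.tag_head(h).transpose(1, 2), supertag_labels)
        return mlm_loss + tag_loss                                  # joint objective: MLM + supertagging

The supertag labels would come from the automatic syntactic annotation of the corpus (Lassy Large in the paper), aligned to the same tokenization as the MLM inputs; how the two losses are weighted is left unspecified here.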
Citation
Tziafas, G., Kogkalidis, K., Wijnholds, G. & Moortgat, M. 2023, 'Improving BERT Pretraining with Syntactic Supervision', in Proceedings of the 2023 CLASP Conference on Learning with Small Data, CLASP Papers in Computational Linguistics, vol. 5, Association for Computational Linguistics, pp. 176-184, Learning with Small Data, Gothenburg, Sweden, 11/09/2023. <https://arxiv.org/abs/2104.10516>