Abstract:
Peer review is a critical component of scientific publishing, since its outcome directly influences the decision to publish a research work. Maintaining ethical standards in the review process is therefore crucial, and this work focuses on one important aspect: the appropriate use of citation recommendations in reviews. Using NLP methods, this study developed a classification model that identifies reviews containing unjustified citation recommendations. To train the model, reviews from ICLR 2021, a top-tier machine learning conference, were manually annotated. Among the classifiers tested, Multinomial Naive Bayes performed best, achieving an F1-score of 82% for the target class, with 70% precision and 100% recall. Data augmentation techniques and optimal regularization strategies were also explored to overcome the dataset's limited size. The classifier could serve as an assistive tool for conference organizers and reviewers, and the results provide a starting point for developing a comprehensive solution to ensure adherence to quality and ethical guidelines in peer review.
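
The three target-class figures reported above are mutually consistent, since the F1-score is the harmonic mean of precision and recall. The following is a minimal arithmetic check rather than the study's evaluation code; the helper name `f1_score` and the variable names are illustrative assumptions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Target-class figures quoted in the abstract (hypothetical variable names).
reported_precision = 0.70
reported_recall = 1.00

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")  # prints "F1 = 0.8235", i.e. about 82%
```

With 100% recall, the F1-score is limited only by precision: the classifier flags every review that contains an unjustified citation recommendation, and roughly 30% of the reviews it flags are false alarms.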