Data quality management in big data: Strategies, tools, and educational implications

Access rights

openAccess
CC BY
publishedVersion

A1 Original article in a scientific journal

Date

2025-06

Language

en

Series

Journal of Parallel and Distributed Computing, Volume 200

Abstract

This study addresses the critical need for effective Big Data Quality Management (BDQM) in education, a field where data quality has profound implications but remains underexplored. The work systematically progresses from requirement analysis and standard development to the deployment of tools for monitoring and enhancing data quality in big data workflows. The study's contributions are substantiated through five research questions that explore the impact of data quality on analytics, the establishment of evaluation standards, centralized management strategies, improvement techniques, and education-specific BDQM adaptations. By addressing these questions, the research advances both theoretical and practical frameworks, equipping stakeholders with the tools to enhance the reliability and efficiency of data-driven educational initiatives. Integrating Artificial Intelligence (AI) and distributed computing, this research introduces a novel multi-stage BDQM framework that emphasizes data quality assessment, centralized governance, and AI-enhanced improvement techniques. This work underscores the transformative potential of robust BDQM systems in supporting informed decision-making and achieving sustainable outcomes in educational projects. The survey findings highlight the potential for automated data management within big data architectures, suggesting that data quality frameworks can be significantly enhanced by leveraging AI and distributed computing. Additionally, the survey emphasizes emerging trends in big data quality management, specifically (i) automated data cleaning and cleansing and (ii) data enrichment and augmentation.

Description

Publisher Copyright: © 2025 The Authors

Keywords

Artificial intelligence, Big data, Data quality, Distributed computing, Education projects

Citation

Nguyen, T, Nguyen, H T & Nguyen-Hoang, T A 2025, 'Data quality management in big data: Strategies, tools, and educational implications', Journal of Parallel and Distributed Computing, vol. 200, 105067. https://doi.org/10.1016/j.jpdc.2025.105067