Research data integrity provides a strong foundation for high-quality research outcomes, and it is an essential part of the research data lifecycle because of its critical role in research rigor, reproducibility, replication, and data reuse (the four Rs). Understanding research data integrity is therefore imperative in collaborative interdisciplinary and cross-sector research, where different norms, procedures, and terminology regarding data exist.
Research data integrity is closely associated with data management, data quality, and data security. Producing data that are reliable, trustworthy, valid, and secure throughout the research process requires purposeful planning for research data integrity and careful consideration of research data lifecycle actions such as data acquisition, analysis, and preservation. Such planning also enables researchers to conduct rigorous research and generate outcomes that are reproducible, replicable, and reusable. To advance this conversation, we developed two tools: a concept model that visually represents the relationships among data management, data quality, and data security as components of research data integrity, and a schema for implementing these components in practice. We contend that disentangling research data integrity and its components, developing a standardized way of describing their interplay, and intentionally addressing them in the research data lifecycle reduces threats to research data integrity.
In this paper, we break down the complexity of research data integrity to make it more understandable and propose a practical process for achieving it that is useful for data producers, providers, users, and educators. We present our concept model and schema as tools for instruction and training as well as for practical implementation. We position them within the larger dialogue around research integrity and data literacy, and we illuminate the role that research data integrity and its components (data management, data quality, and data security) play in the four Rs. Using these tools, we examine the role of research data integrity in rigorous and reproducible research and offer insight into ensuring research data integrity throughout the research process.