Blocker, Alexander W., and Xiao-Li Meng. 2013.
“The Potential and Perils of Preprocessing: Building New Foundations.” Bernoulli 19 (4).
https://doi.org/10.3150/13-BEJSP16.
Gebru, Timnit, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021.
“Datasheets for Datasets.” Communications of the ACM 64 (12): 86–92.
https://doi.org/10.1145/3458723.
Kandel, Sean, Andreas Paepcke, Joseph Hellerstein, and Jeffrey Heer. 2011.
“Wrangler: Interactive Visual Specification of Data Transformation Scripts.” In
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 3363–72.
Vancouver BC Canada: ACM.
https://doi.org/10.1145/1978942.1979444.
Kołczyńska, Marta. 2022.
“Combining Multiple Survey Sources: A Reproducible Workflow and Toolbox for Survey Data Harmonization.” Methodological Innovations 15 (1): 62–72.
https://doi.org/10.1177/20597991221077923.
Liu, Yang, Alex Kale, Tim Althoff, and Jeffrey Heer. 2021.
“Boba: Authoring and Visualizing Multiverse Analyses.” IEEE Transactions on Visualization and Computer Graphics 27 (2): 1753–63.
https://doi.org/10.1109/TVCG.2020.3028985.
Lucchesi, Lydia R., Petra M. Kuhnert, Jenny L. Davis, and Lexing Xie. 2022.
“Smallset Timelines: A Visual Representation of Data Preprocessing Decisions.” In
2022 ACM Conference on Fairness, Accountability, and Transparency, 1136–53. Seoul Republic of Korea: ACM.
https://doi.org/10.1145/3531146.3533175.
Meng, Xiao-Li. 2014.
“A Trio of Inference Problems That Could Win You a Nobel Prize in Statistics (If You Help Fund It).” In
Past, Present, and Future of Statistical Science, edited by Xihong Lin, Christian Genest, David L. Banks, Geert Molenberghs, David W. Scott, and Jane-Ling Wang, 561–86.
Chapman and Hall/CRC.
https://doi.org/10.1201/b16720-52.
Peng, Roger D. 2011.
“Reproducible Research in Computational Science.” Science 334 (6060): 1226–27.
https://doi.org/10.1126/science.1213847.
Sarma, Abhraneel, Jessica Hullman, Kyle Hwang, and Matthew Kay. 2018. “Milliways: Taming Multiverses Through Principled Evaluation of Data Analysis Paths.”
Steegen, Sara, Francis Tuerlinckx, Andrew Gelman, and Wolf Vanpaemel. 2016.
“Increasing Transparency Through a Multiverse Analysis.” Perspectives on Psychological Science 11 (5): 702–12.
https://doi.org/10.1177/1745691616658637.
van der Loo, Mark P. J., and Edwin de Jonge. 2021.
“Data Validation Infrastructure for R.” Journal of Statistical Software 97 (10).
https://doi.org/10.18637/jss.v097.i10.
Wickham, Hadley. 2014.
“Tidy Data.” Journal of Statistical Software 59 (10).
https://doi.org/10.18637/jss.v059.i10.
Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016.
“The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (1): 160018.
https://doi.org/10.1038/sdata.2016.18.