Adair, G. (1984). The Hawthorne effect: A reconsideration of the methodological artifact. Journal of Applied Psychology, 69, 334–345.
Agresti, A. (1996). An introduction to categorical data analysis. Wiley.
Agresti, A. (2002). Categorical data analysis (2nd ed.). Wiley.
Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27, 17–21.
Bickel, P. J., Hammel, E. A., & O’Connell, J. W. (1975). Sex bias in graduate admissions: Data from Berkeley. Science, 187, 398–404.
Box, J. F. (1987). Guinness, Gosset, Fisher, and small samples. Statistical Science, 2, 45–52.
Brown, M. B., & Forsythe, A. B. (1974). Robust tests for equality of variances. Journal of the American Statistical Association, 69, 364–367.
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Houghton Mifflin.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum.
Cramér, H. (1946). Mathematical methods of statistics. Princeton University Press.
Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. Cambridge University Press.
Evans, J. St. B. T., Barston, J. L., & Pollard, P. (1983). On the conflict between logic and belief in syllogistic reasoning. Memory & Cognition, 11, 295–306.
Fisher, R. A. (1922a). On the interpretation of \(\chi^2\) from contingency tables, and the calculation of \(p\). Journal of the Royal Statistical Society, 85, 87–94.
Fisher, R. A. (1922b). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society A, 222, 309–368.
Fisher, R. A. (1925). Statistical methods for research workers. Oliver & Boyd.
Fox, J., & Weisberg, S. (2011). An R companion to applied regression (2nd ed.). Sage.
Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself statistically significant. The American Statistician, 60, 328–331.
Hays, W. L. (1994). Statistics (5th ed.). Harcourt Brace.
Hogg, R. V., McKean, J. W., & Craig, A. T. (2005). Introduction to mathematical statistics (6th ed.). Pearson.
Hothersall, D. (2004). History of psychology. McGraw-Hill.
Hsu, J. C. (1996). Multiple comparisons: Theory and methods. Chapman & Hall.
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), 697–701.
Jeffreys, H. (1961). The theory of probability (3rd ed.). Oxford.
Johnson, V. E. (2013). Revised standards for statistical evidence. Proceedings of the National Academy of Sciences, 110(48), 19313–19317.
Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, 237–251.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.
Keynes, J. M. (1923). A tract on monetary reform. Macmillan & Company.
Krajcsi, A. (2021). Advancing best practices in data analysis with automatic and optimized output data analysis software [Preprint]. PsyArXiv.
Kruskal, W. H., & Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47, 583–621.
Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: A diagnosis based on the correlation between effect size and sample size. PLoS ONE, 9, 1–8.
Larntz, K. (1978). Small-sample comparisons of exact levels for chi-squared goodness-of-fit statistics. Journal of the American Statistical Association, 73, 253–263.
Lehmann, E. L. (2011). Fisher, Neyman, and the creation of classical statistics. Springer.
Levene, H. (1960). Robust tests for equality of variances. In I. Olkin et al. (Eds.), Contributions to probability and statistics: Essays in honor of Harold Hotelling (pp. 278–292). Stanford University Press.
Lyon, J. D., & Tsai, C.-L. (1996). A comparison of tests for heteroscedasticity. Journal of the Royal Statistical Society: Series D (The Statistician), 45(3), 337–349.
McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: The case of \(r\) and \(d\). Psychological Methods, 11, 386–401.
McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12, 153–157.
Meehl, P. E. (1967). Theory testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103–115.
Merriam-Webster. (2022). Petrichor, cromulent, and other words the internet loves.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50, 157–175.
Pfungst, O. (1911). Clever Hans (the horse of Mr. von Osten): A contribution to experimental animal and human psychology (C. L. Rahn, Trans.). Henry Holt.
Rosenthal, R. (1966). Experimenter effects in behavioral research. Appleton.
Sahai, H., & Ageel, M. I. (2000). The analysis of variance: Fixed, random and mixed models. Birkhäuser.
Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52, 591–611.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677–680.
Stigler, S. M. (1986). The history of statistics. Harvard University Press.
Student. (1908). The probable error of a mean. Biometrika, 6, 1–25.
Welch, B. L. (1947). The generalization of “Student’s” problem when several different population variances are involved. Biometrika, 34, 28–35.
Yates, F. (1934). Contingency tables involving small numbers and the \(\chi^2\) test. Supplement to the Journal of the Royal Statistical Society, 1, 217–235.