Chapter 3 Automatic statistical analysis

Danielle Navarro; Róbert Fodor

Good data analysis can make or break a research project. In psychological research, doing it well involves, among other things: choosing a good design from many alternatives, sampling participants appropriately, observing ethical standards in data collection and documentation, controlling for confounding variables, selecting appropriate statistics and replicating analyses, analysing data with appropriate software, assessing effect sizes alongside the inferential results, and reporting findings accurately in papers. Some of these tasks can be automated to ensure that the appropriate protocol is followed, which is why we have CogStat, and why you are reading this book.

Neglecting the early steps of the protocol can lead to errors. For example, it is inappropriate to use a tool designed for normally distributed data on a highly skewed data set. In this book, we will cover why this characteristic of a data set matters. Ignoring the foundations of hypothesis testing tools harms the reliability of the research and, in turn, the reliability and prestige of psychology as a science.

There are several manual statistical programs, such as SPSS, SAS, and Stata, and some programming languages used for data analysis, such as R and Python. These tools can be time-consuming to use and require multiple steps for simple tasks like creating tables and graphs, not to mention the time spent on learning them. One benefit of automatic statistical software is that it is programmed to follow all necessary steps as part of its protocol. Good automatic statistical software applies the most appropriate tools based on current statistical consensus.
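To make the idea of a built-in protocol concrete, here is a toy sketch in Python of a "check first, then choose the tool" rule. It is not how CogStat works internally. The function names (`sample_skewness`, `describe_centre`) and the skewness cut-off of 1.0 are assumptions made purely for illustration. The sketch checks how skewed a variable is and then reports the median for strongly skewed data and the mean otherwise.

```python
import statistics


def sample_skewness(values):
    """Return the moment-based skewness of a sample (0 for symmetric data)."""
    n = len(values)
    mean = statistics.fmean(values)
    # Second and third central moments of the sample.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        # All values are identical, so there is no asymmetry to measure.
        return 0.0
    return m3 / m2 ** 1.5


def describe_centre(values, skew_threshold=1.0):
    """Pick a measure of central tendency after checking skewness.

    The threshold is an illustrative assumption, not an established rule.
    """
    skew = sample_skewness(values)
    if abs(skew) > skew_threshold:
        # Strongly skewed: the median is less affected by extreme values.
        return "median", statistics.median(values)
    return "mean", statistics.fmean(values)


symmetric = [4, 5, 5, 6, 6, 6, 7, 7, 8]
skewed = [1, 1, 2, 2, 3, 3, 4, 5, 60]

print(describe_centre(symmetric))
print(describe_centre(skewed))
```

The point of the sketch is that the person running the analysis never has to remember to do the check: it is part of the procedure, and the choice of tool follows from its result.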

In this book, you will learn what it means to run normality and heteroscedasticity checks, calculate effect sizes, and perform hypothesis tests, among other things. We focus on making sure you understand what these are, why they matter, and how to interpret them. We will indulge in presenting mathematical formulas as well, but you won’t have to do any of the calculations. To be fair, you don’t have to calculate these metrics by hand in a manual statistical program either, but deciding which tool to pick up first, or which step to take next, might be a challenge if statistics is not in your veins. Beyond the anxiety of how to even get started with manual statistical programs, applying a single tool can take dozens of steps. In comparison, producing the full analysis with multiple tools and supporting charts and graphs may take only three steps in CogStat.

Researchers are under high pressure to produce a large volume of work with statistically positive results (Krajcsi, 2021), and they often have too little time for rigorous data analysis and interpretation. At the same time, journals set somewhat arbitrary standards (e.g., specific $p$-values are demanded whether or not they avoid Type I errors, which is one of the topics in this book), so some researchers might just use those as rules of thumb without going deeper into data analysis. In recent years, the reliability of psychological science has been questioned (Open Science Collaboration, 2015) due to the overstated use of certain statistical measures (e.g., $p$-values, effect sizes) and lower-than-expected replicability overall. While automatic analysis cannot solve all of these systemic issues, it does enable researchers to shift their focus to the quality of content. Time saved on data analysis can be redirected towards understanding what the test results imply for the research question.

For more about the merits of automatic statistical analysis, see the work of Attila Krajcsi, the creator of CogStat (e.g., Krajcsi, 2021).

References

Krajcsi, A. (2021). Advancing best practices in data analysis with automatic and optimized output data analysis software [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/hnmsq

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716