Chapter 18 Summary guide

This is a non-comprehensive summary of the tools used.

Descriptive statistics

| Statistic | Meaning |
|---|---|
| Mean | Average; the “centre of gravity” of the data |
| Standard deviation | How tightly the data cluster around the mean (a smaller value means more clustered, a larger value means more spread out) |
| Skewness | The asymmetry of the data's distribution compared to a normal distribution (bell curve) |
| Kurtosis | Pointiness of the data. A larger value means more pointy, a smaller value means flatter |
| Range | The spread of the data set between the maximum and minimum values |
| Maximum | The highest value in the data set |
| Upper quartile | 25% of the data points lie at or above this value |
| Median | The value of the middle data point (or the average of the two middle points when the number of data points is even). Half of the data points lie at or below this value and half at or above it |
| Lower quartile | 25% of the data points lie at or below this value |
| Minimum | The lowest value in the data set |
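
The statistics listed above can be sketched with Python's standard library. This is only an illustrative sketch: the sample is invented, quartile conventions differ between software packages, and the moment-based skewness and excess kurtosis formulas used here are one common definition, assumed rather than taken from the chapter.

```python
import statistics

# Hypothetical sample, invented purely for illustration.
data = [2, 4, 4, 5, 7, 9, 11]
n = len(data)

mean = statistics.mean(data)
sd = statistics.stdev(data)        # sample standard deviation (divides by n - 1)
median = statistics.median(data)
minimum, maximum = min(data), max(data)
data_range = maximum - minimum

# Quartiles: several conventions exist. This uses the module's default
# ("exclusive") method, so other software may report slightly different values.
lower_q, _, upper_q = statistics.quantiles(data, n=4)

# Moment-based skewness and excess kurtosis (assumed definitions):
# third and fourth central moments scaled by the second central moment.
# A normal distribution has skewness 0 and excess kurtosis 0.
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
m4 = sum((x - mean) ** 4 for x in data) / n
skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3

print(f"mean={mean}, sd={sd:.3f}, median={median}")
print(f"min={minimum}, lower quartile={lower_q}, upper quartile={upper_q}, max={maximum}")
print(f"range={data_range}, skewness={skewness:.3f}, excess kurtosis={excess_kurtosis:.3f}")
```

With this sample the skewness is positive (a longer tail to the right) and the excess kurtosis is negative (flatter than a bell curve).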

Analysing differences with hypothesis testing

| Test | Description | Example from the chapter |
|---|---|---|
| CATEGORICAL DATA | | |
| χ² goodness-of-fit test | Tests whether the observed frequencies fit a theoretical distribution | Whether card suits are chosen at random |
| χ² test for independence | Tests whether two categorical variables are independent, i.e. whether the distribution of one is the same across the levels of the other | Probability that a member of species j gives response i to aliens |
| Fisher’s exact test | Tests the same hypothesis as the χ² test for independence, for small sample sizes and small contingency tables | Witches’ happiness when being burned in Salem |
| McNemar test of marginal homogeneity | Tests whether the row and column totals of a contingency table of paired nominal data (classically 2 by 2) have the same distribution | Whether the frequencies of suit choices are different the second time than the first time |
| NON-CATEGORICAL DATA | | |
| One group in one situation | | |
| One-sample z-test | Tests whether the mean is equal to a given hypothetical value (you know the population standard deviation and your sample size is large) | The mean of grades in Dr Zeppo’s class is above 67.5 |
| One-sample t-test or Student’s t-test | Tests whether the mean is equal to a given hypothetical value (the population standard deviation is unknown, so you estimate it from the sample) | The mean of grades in Dr Zeppo’s class is above 67.5 |
| Two groups in one situation | | |
| Independent samples t-test (Student’s) | Comparing two groups: tests whether the means of two independent samples are equal (assumes normal distributions and equal variances) | The mean of grades of students tutored by Anastasia and Bernadette in Dr Harpo’s class |
| Independent samples t-test (Welch) | Comparing two groups: tests whether the means of two independent samples are equal (assumes normal distributions but does not assume equal variances) | The mean of grades of students tutored by Anastasia and Bernadette in Dr Harpo’s class (with an alternative data set that violates the equal-variances assumption) |
| Two-sample Wilcoxon test (Mann-Whitney test) | Non-parametric alternative: tests whether two independent samples come from the same distribution (does not assume normality) | AFL game attendance for finals versus non-finals games |
| One group in two situations | | |
| Paired samples t-test | Tests whether the mean difference between two paired samples is zero (assumes the differences are normally distributed) | Changes to the mean of grades between the first test and the second test in Dr Chico’s class |
| Wilcoxon signed-rank test (paired) | Non-parametric alternative: tests whether the differences between two paired samples are centred on zero (does not assume normality) | Whether a biology class has any effect on the happiness of students |
| Three or more groups | | |
| One-way ANOVA | Tests whether the means of three or more independent samples are equal (assumes normal distribution) | Effects of 3 drugs on mood in our clinical trial |
| Kruskal-Wallis rank sum test | Non-parametric alternative: tests whether three or more independent samples come from the same distribution (does not assume normality) | Effects of 3 drugs on mood in our clinical trial (adjusted data set) |
| Factorial ANOVA | Tests whether group means are equal when there is more than one grouping variable | Effects of 3 drugs and 2 therapy methods on mood in our clinical trial |
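
To make two of the tests above concrete, here is a minimal standard-library sketch of a χ² goodness-of-fit statistic and a one-sample z-test. All counts, grades and the "known" population standard deviation are hypothetical values invented for illustration; only the hypothesised mean of 67.5 comes from the table. The sample is also far smaller than a real z-test would call for.

```python
import math
from statistics import NormalDist, mean

# --- Chi-square goodness-of-fit statistic ---
# Hypothetical counts of how often each of four card suits was chosen.
observed = [30, 50, 70, 50]
total = sum(observed)
# Null hypothesis: every suit is equally likely.
expected = [total / len(observed)] * len(observed)
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
critical_05 = 7.815  # chi-square critical value for df = 3 at alpha = 0.05
reject_uniform = chi_sq > critical_05

# --- One-sample z-test ---
# Hypothetical grades; sigma is an assumed, "known" population sd.
grades = [70, 74, 68, 75, 73, 71, 77, 72]
mu0 = 67.5      # hypothesised mean (value taken from the table above)
sigma = 8.0     # hypothetical population standard deviation
sem = sigma / math.sqrt(len(grades))      # standard error of the mean
z = (mean(grades) - mu0) / sem
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"chi-square = {chi_sq:.2f} on {df} df, reject at 5%: {reject_uniform}")
print(f"z = {z:.3f}, two-sided p = {p_two_sided:.3f}")
```

The t-tests follow the same pattern as the z-test, except that the standard deviation is estimated from the sample and the statistic is compared against a t distribution rather than the normal distribution.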

Analysing relationships with hypothesis testing

| Test | Description | Example from the chapter |
|---|---|---|
| Pearson’s correlation coefficient | Tests whether two variables are linearly correlated (continuous data) | Correlation between the number of hours slept and mood |
| Spearman’s rank-order correlation | Tests whether two variables are correlated in rank order (ordinal data) | Correlation between effort and grade |
| Linear regression | Tests whether a linear relationship exists between an outcome and one or more predictors (continuous data) | Relationship between the number of hours of sleep (parent, baby) and parent grumpiness |
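
The three measures above can be sketched by hand in a few lines. The sleep and grumpiness values below are hypothetical, the helper names `pearson` and `ranks` are made up for this example, and the simple ranking helper assumes there are no tied values. This sketch computes the coefficients only; it does not produce the p-values that a full hypothesis test would report.

```python
import math

# Hypothetical data: hours of sleep and a grumpiness score.
sleep = [5, 6, 7, 8, 9]
grumpiness = [80, 72, 60, 52, 36]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) upwards; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

# Pearson: strength of the linear relationship.
r = pearson(sleep, grumpiness)

# Spearman: Pearson correlation computed on the ranks.
rho = pearson(ranks(sleep), ranks(grumpiness))

# Simple linear regression (least squares) with one predictor.
mx = sum(sleep) / len(sleep)
my = sum(grumpiness) / len(grumpiness)
slope = (sum((a - mx) * (b - my) for a, b in zip(sleep, grumpiness))
         / sum((a - mx) ** 2 for a in sleep))
intercept = my - slope * mx

print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
print(f"grumpiness = {intercept:.1f} + ({slope:.1f}) * sleep")
```

Because grumpiness falls steadily as sleep rises in this invented data, the Spearman coefficient is exactly -1 while the Pearson coefficient is slightly weaker, since the points do not lie on a perfectly straight line.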