Chapter 18 Summary guide

Danielle Navarro; Róbert Fodor

This is a non-comprehensive summary guide to the statistical tools used in this book.

Descriptive statistics

Statistic	Meaning
Mean	Average – the “centre of gravity” of the data
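Standard deviation	How tightly the data are clustered around the mean (a smaller value means the data are more clustered, a larger value means they are more spread out)
Skewness	The asymmetry of the data compared to a symmetric distribution such as the normal distribution (bell curve). A positive value means a longer tail on the right, a negative value means a longer tail on the left
Kurtosis	Pointiness of the data compared to a normal distribution, which has a kurtosis of 0. A larger (positive) value means more pointy, a smaller (negative) value means flatter
Range	The difference between the maximum and minimum values of the data set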
Maximum	The highest value in the data set
Upper quartile	25% of the data points reside at and above this value
Median	The value of the data point in the middle (or the average of the two middle points when there is an even number of data points). Half of the data points lie at or above this value and half lie at or below it
Lower quartile	25% of the data points reside at and below this value
Minimum	The lowest value in the data set
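To make the definitions above concrete, here is a minimal sketch in standard-library Python that computes each statistic in the table for a list of numbers. The function name `describe` and the sample values are hypothetical and only for illustration. Skewness and kurtosis use the simple moment formulas (kurtosis reported as excess kurtosis, so a normal distribution scores about 0); software such as jamovi or R applies small-sample corrections, so its values may differ slightly. The quartiles use the `"inclusive"` interpolation method of `statistics.quantiles`; other programs may use a different rule.

```python
import statistics

def describe(data):
    """Compute the descriptive statistics from the table above (hypothetical helper)."""
    n = len(data)
    mean = statistics.mean(data)
    # Central moments (dividing by n), used for the uncorrected skewness and kurtosis.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    # Three cut points splitting the sorted data into quarters.
    lower_q, _, upper_q = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": mean,
        "sd": statistics.stdev(data),   # sample SD (divides by n - 1)
        "skewness": m3 / m2 ** 1.5,     # > 0: longer right tail, < 0: longer left tail
        "kurtosis": m4 / m2 ** 2 - 3,   # excess kurtosis: about 0 for a normal curve
        "range": max(data) - min(data),
        "max": max(data),
        "upper_quartile": upper_q,
        "median": statistics.median(data),
        "lower_quartile": lower_q,
        "min": min(data),
    }

# Hypothetical scores, used only to illustrate the calculations.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in describe(scores).items():
    print(f"{name:>15}: {value:.3f}")
```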

Analysing differences with hypothesis testing

Test	Description	Example from the chapter
CATEGORICAL DATA
$\chi^2$ goodness‑of‑fit test	Tests whether the observed data fits a theoretical distribution	Probability of card suits being random
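$\chi^2$ test for independence	Tests whether two categorical variables are independent, i.e. whether the distribution of one variable is the same at every level of the other	Probability that a member of species $j$ gives response $i$ to aliens
Fisher’s exact test	Tests whether two categorical variables are independent (for small sample sizes, typically in 2 × 2 contingency tables)	Whether witches’ happiness is related to being burned in Salem
McNemar test of marginal homogeneity	Tests whether the marginal frequencies of a paired contingency table are the same, i.e. the same people classified twice on a nominal variable (the classic version uses a 2 × 2 table)	Whether the frequencies of suit choices are different the second time from the first time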
NON-CATEGORICAL DATA
One group in one situation
One-sample z‑test	Tests whether the mean is equal to a given hypothetical value (you know the population standard deviation and your sample size is large)	The mean of grades in Dr Zeppo’s class is above 67.5
One-sample t-test (Student’s t-test)	Tests whether the mean is equal to a given hypothetical value (the population standard deviation is unknown and has to be estimated from the sample)	The mean of grades in Dr Zeppo’s class is above 67.5
Two groups in one situation
Independent samples t-test (Student’s)	Comparing two groups: tests whether the means of two independent samples are equal (assumes normal distributions and equal variances in the two groups)	The mean of grades of students tutored by Anastasia and Bernadette in Dr Harpo’s class
Independent samples t-test (Welch)	Comparing two groups: tests whether the means of two independent samples are equal (does not assume equal variances)	The mean of grades of students tutored by Anastasia and Bernadette in Dr Harpo’s class (with an alternative data set that violates normality assumption)
Two-sample Wilcoxon test (Mann-Whitney test)	Non-parametric alternative to the independent samples t-test: tests whether values in one group tend to be larger than values in the other (does not assume normality)	AFL game attendance depending on whether or not the game is a final
One group in two situations
Paired samples t-test	Tests whether the means of two paired measurements are equal (assumes the differences between pairs are normally distributed)	Changes in the mean grade between the first test and the second test in Dr Chico’s class
Wilcoxon signed-rank test (paired)	Non-parametric alternative to the paired samples t-test: tests whether the differences between paired measurements are centred on zero (does not assume normality)	Whether a biology class has any effect on the happiness of students
Three or more groups
One-way ANOVA	Tests whether the means of three or more independent samples are equal (assumes normal distributions and equal variances across groups)	Effects of 3 drugs on mood in our clinical trial
Kruskal-Wallis rank sum test	Non-parametric alternative to one-way ANOVA: tests, using ranks, whether three or more independent samples come from the same distribution (does not assume normality)	Effects of 3 drugs on mood in our clinical trial (adjusted data set)
Factorial ANOVA	Tests whether group means differ when there is more than one grouping variable, including interactions between the grouping variables	Effects of 3 drugs and 2 therapy methods on mood in our clinical trial
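The test statistics behind a few of the tests in the table above can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not how jamovi or R implement them: the function names and all counts and grades are hypothetical, and each function returns only the statistic and its degrees of freedom. Turning a statistic into a p-value still requires the $\chi^2$ or t distribution, which a statistics package provides (for example `scipy.stats.chisquare`, or `scipy.stats.ttest_ind` with `equal_var=False` for the Welch version).

```python
import math
import statistics

def chi_square_gof(observed, expected_probs):
    """Pearson chi-square goodness-of-fit statistic and its degrees of freedom."""
    n = sum(observed)
    expected = [n * p for p in expected_probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

def student_t(a, b):
    """Independent samples t statistic assuming equal variances (pooled variance)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2

def welch_t(a, b):
    """Welch t statistic: no equal-variance assumption, Welch-Satterthwaite df."""
    na, nb = len(a), len(b)
    sa = statistics.variance(a) / na  # squared standard error of mean(a)
    sb = statistics.variance(b) / nb  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df

# Hypothetical counts for four card suits; under H0 each suit is equally likely.
stat, df = chi_square_gof([40, 55, 60, 45], [0.25] * 4)
print(f"chi-square = {stat:.2f}, df = {df}")

# Hypothetical grades for two independent groups.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
print("Student:", student_t(group_a, group_b))
print("Welch:  ", welch_t(group_a, group_b))
```

With equal group sizes, as in this example, the Student and Welch t statistics are identical and only the degrees of freedom differ; with unequal group sizes and unequal variances the two statistics diverge.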

Analysing relationships with hypothesis testing

Test	Description	Example from the chapter
Pearson’s correlation coefficient	Measures the strength and direction of the linear relationship between two continuous variables and tests whether it differs from zero	Correlation between the number of hours of sleep and mood
Spearman’s rank-order correlation	Tests, using ranks, whether two variables are correlated (ordinal data, or relationships that are monotonic but not linear)	Correlation between effort and grade
Linear regression	Models an outcome variable as a linear function of one or more predictor variables and tests whether each predictor is related to the outcome (continuous data)	Predicting parent grumpiness from the number of hours of sleep of the parent and of the baby
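The three tools in the table above can be sketched in standard-library Python to show how they relate: Spearman’s correlation is Pearson’s correlation computed on ranks, and the regression slope comes from the same covariance term. This is a minimal sketch: the function names and data are hypothetical, no significance tests are computed, and `simple_regression` handles only one predictor, whereas the chapter’s grumpiness example uses two predictors (multiple regression).

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

def simple_regression(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x (one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    return b0, b1

# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print("Pearson: ", pearson_r(x, y))
print("Spearman:", spearman_rho(x, y))
print("Regression (b0, b1):", simple_regression(x, [3, 5, 7, 9, 11]))
```

For the curved hypothetical data, Spearman’s correlation is exactly 1 because the ranks agree perfectly, while Pearson’s correlation falls just below 1 because the relationship is not a straight line.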