# Chapter 18 Summary guide

This guide is a non-comprehensive summary of the statistical tools used.

## Descriptive statistics

Statistic | Meaning |
---|---|
Mean | The average: the “centre of gravity” of the data |
Standard deviation | How tightly the data cluster around the mean (a smaller value means more clustered, a larger value means more spread out) |
Skewness | The asymmetry of the data compared to a symmetric distribution such as the normal distribution (bell curve) |
Kurtosis | The “pointiness” of the data. A larger value means a sharper peak with heavier tails, a smaller value means a flatter shape |
Range | The spread of the data set: the difference between the maximum and minimum values |
Maximum | The highest value in the data set |
Upper quartile | 25% of the data points lie at or above this value |
Median | The value of the data point in the middle (or the average of the two middle points when the number of data points is even). Half of the data points lie at or above this value and half lie at or below it |
Lower quartile | 25% of the data points lie at or below this value |
Minimum | The lowest value in the data set |
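
As a rough illustration of the statistics in the table above, the following sketch computes each of them with Python's standard library. The data set and the names `describe` and `central_moment` are made up for this example and do not come from the chapter. Skewness and kurtosis are computed from population moments, and kurtosis is reported as excess kurtosis (0 for a normal distribution); statistical packages differ in the exact estimators and quartile conventions they use, so their figures may not match these exactly.

```python
import statistics

# Hypothetical data, invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]


def central_moment(values, k):
    """Return the k-th central moment (population version)."""
    m = statistics.mean(values)
    return sum((v - m) ** k for v in values) / len(values)


def describe(values):
    """Collect the descriptive statistics from the table in a dict."""
    # quantiles(n=4) returns the three cut points between the quarters.
    quartiles = statistics.quantiles(values, n=4)
    m2 = central_moment(values, 2)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
        "skewness": central_moment(values, 3) / m2 ** 1.5,
        # Excess kurtosis: positive means pointier than a normal curve.
        "kurtosis": central_moment(values, 4) / m2 ** 2 - 3,
        "range": max(values) - min(values),
        "maximum": max(values),
        "upper_quartile": quartiles[2],
        "median": statistics.median(values),
        "lower_quartile": quartiles[0],
        "minimum": min(values),
    }


summary = describe(data)
for name, value in summary.items():
    print(f"{name:>15}: {value:.3f}")
```

The positive skewness for this data set reflects its longer right tail (the values 7 and 9), and the slightly negative excess kurtosis indicates a shape a little flatter than a normal curve.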

## Analysing differences with hypothesis testing

Test | Description | Example from the chapter |
---|---|---|
CATEGORICAL DATA | | |
\(\chi^2\) goodness-of-fit test | Tests whether the observed frequencies fit a theoretical distribution | Whether card suits are chosen at random |
\(\chi^2\) test of independence | Tests whether two categorical variables are independent of each other | Whether the probability that a member of species \(j\) gives response \(i\) to aliens depends on the species |
Fisher’s exact test | Tests whether two categorical variables are independent (suited to small sample sizes in contingency tables) | Whether witches’ happiness is related to being burned in Salem |
McNemar test of marginal homogeneity | Tests whether the marginal proportions are equal in a 2 by 2 contingency table of paired nominal data | Whether the frequencies of suit choices differ between the first and the second time |
NON-CATEGORICAL DATA | | |
One group in one situation | | |
One-sample z-test | Tests whether the mean is equal to a given hypothetical value (you know the population standard deviation and your sample size is large) | Whether the mean grade in Dr Zeppo’s class is above 67.5 |
One-sample t-test or Student’s t-test | Tests whether the mean is equal to a given hypothetical value (you do not know the population standard deviation and must estimate it from the sample) | Whether the mean grade in Dr Zeppo’s class is above 67.5 |
Two groups in one situation | | |
Independent samples t-test (Student’s) | Comparing two groups: tests whether the means of two independent samples are equal (assumes normal distributions and equal variances) | The mean grades of students tutored by Anastasia and by Bernadette in Dr Harpo’s class |
Independent samples t-test (Welch) | Comparing two groups: tests whether the means of two independent samples are equal (assumes normal distributions but does not assume equal variances) | The mean grades of students tutored by Anastasia and by Bernadette in Dr Harpo’s class (with an alternative data set that violates the normality assumption) |
Two-sample Wilcoxon test (Mann-Whitney test) | Rank-based test of whether two independent samples come from the same distribution (does not assume normality) | Whether AFL game attendance differs between finals and non-finals games |
One group in two situations | | |
Paired samples t-test | Tests whether the mean difference between two paired sets of measurements is zero (assumes the differences are normally distributed) | Change in the mean grade between the first test and the second test in Dr Chico’s class |
Wilcoxon signed-rank test (paired) | Rank-based test of whether the differences between two paired samples are centred on zero (does not assume normality) | Whether a biology class has any effect on the happiness of students |
Three or more groups | | |
One-way ANOVA | Tests whether the means of three or more independent samples are equal (assumes normal distributions and equal variances) | Effects of 3 drugs on mood in our clinical trial |
Kruskal-Wallis rank sum test | Rank-based test of whether three or more independent samples come from the same distribution (does not assume normality) | Effects of 3 drugs on mood in our clinical trial (adjusted data set) |
Factorial ANOVA | Tests whether group means are equal when the groups are defined by more than one grouping variable | Effects of 3 drugs and 2 therapy methods on mood in our clinical trial |
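
To make the difference between some of the tests above more concrete, here is a minimal sketch using only Python's standard library. It computes a two-sided one-sample z-test and the test statistics for the Student and Welch independent samples t-tests. All data values, the assumed population standard deviation of 9.0, and the function names are hypothetical; only the reference value 67.5 is taken from the table. The standard library has no t distribution, so the sketch stops at the t statistic and degrees of freedom; in practice a statistics package would supply the p-value.

```python
import math
import statistics
from statistics import NormalDist


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test with a known population sd `sigma`."""
    n = len(sample)
    z = (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def student_t(a, b):
    """Student's t statistic and degrees of freedom (pooled variance)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2


def welch_t(a, b):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Squared standard error contributed by each group, kept separate.
    sa, sb = statistics.variance(a) / na, statistics.variance(b) / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df


# Hypothetical grades, invented for illustration only.
grades = [70, 75, 65, 80, 72, 68, 74, 72]
z, p = one_sample_z_test(grades, mu0=67.5, sigma=9.0)
print(f"z = {z:.3f}, two-sided p = {p:.3f}")

# Hypothetical grades for two tutors' students.
group_a = [74, 78, 70, 82, 76]
group_b = [68, 72, 70, 66, 74, 70]
t_student, df_student = student_t(group_a, group_b)
t_welch, df_welch = welch_t(group_a, group_b)
print(f"Student: t = {t_student:.3f}, df = {df_student}")
print(f"Welch:   t = {t_welch:.3f}, df = {df_welch:.2f}")
```

The two t-tests differ only in how the standard error is built: Student's version pools the two sample variances, which is where the equal-variance assumption enters, while Welch's version keeps them separate and compensates with a smaller, usually non-integer, number of degrees of freedom.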

## Analysing relationships with hypothesis testing

Test | Description | Example from the chapter |
---|---|---|
Pearson’s correlation coefficient | Tests whether two variables are linearly correlated (continuous data) | Correlation between the number of hours slept and mood |
Spearman’s rank-order correlation | Tests whether two variables are correlated, based on their ranks (ordinal data) | Correlation between effort and grade |
Linear regression | Tests whether a linear relationship exists between an outcome variable and one or more predictor variables (continuous data) | Relationship between the number of hours of sleep (parent, baby) and parent grumpiness |
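
The three measures above can be sketched with the standard library as follows. The data and the function names are hypothetical and chosen for illustration only, and the regression uses a single predictor for simplicity, whereas the chapter's example has two. The sketch computes the coefficients only, without the accompanying significance tests. Spearman's coefficient is obtained here by ranking both variables and applying Pearson's formula to the ranks.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is a tie.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            result[i] = average_rank
        start = end + 1
    return result


def spearman_rho(x, y):
    """Spearman's rank-order correlation: Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))


def simple_linear_regression(x, y):
    """Least-squares slope and intercept for one predictor."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical data, invented for illustration only.
hours_slept = [5, 6, 7, 8, 9]
grumpiness = [80, 70, 65, 50, 45]

r = pearson_r(hours_slept, grumpiness)
rho = spearman_rho(hours_slept, grumpiness)
slope, intercept = simple_linear_regression(hours_slept, grumpiness)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
print(f"grumpiness = {intercept:.1f} + ({slope:.1f}) * hours_slept")
```

In this made-up data set, grumpiness falls every time sleep increases, so the rank-based coefficient is exactly -1, while Pearson's coefficient is slightly weaker because the points do not lie on a perfectly straight line.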