Glossaria.net


Chi-square Statistic

The chi-square statistic measures the agreement between categorical data and a multinomial model that predicts the relative frequency of outcomes in each possible category. Suppose there are $n$ independent trials, each of which can result in one of $k$ possible outcomes. Suppose that in each trial, the probability that outcome $i$ occurs is $p_i$, for $i = 1, 2, \ldots, k$, and that these probabilities are the same in every trial. The expected number of times outcome 1 occurs in the $n$ trials is $n \times p_1$; more generally, the expected number of times outcome $i$ occurs is

$$\text{expected}_i = n \times p_i.$$
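As a small illustration of the expected counts, the sketch below computes n×p_i for every category. The function name `expected_counts` and the fair-die example (60 rolls, six equally likely faces) are assumptions made for illustration, not part of the glossary entry.

```python
def expected_counts(n, probs):
    """Return the expected count n * p_i for each category probability p_i."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    return [n * p for p in probs]


# Hypothetical example: 60 rolls of a fair six-sided die.
fair_die = [1 / 6] * 6
counts = expected_counts(60, fair_die)
print(counts)  # each entry is approximately 10
```

The expected counts need not be whole numbers; they are long-run averages, not counts that any single experiment must produce.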

If the model is correct, we would expect the $n$ trials to result in outcome $i$ about $n \times p_i$ times, give or take a bit. Let $\text{observed}_i$ denote the number of times an outcome of type $i$ occurs in the $n$ trials, for $i = 1, 2, \ldots, k$. The chi-square statistic summarizes the discrepancies between the expected number of times each outcome occurs (assuming that the model is true) and the observed number of times each outcome occurs, by summing the squares of the discrepancies, each divided by the corresponding expected number, over all the categories:

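$$\text{chi-square} = \frac{(\text{observed}_1 - \text{expected}_1)^2}{\text{expected}_1} + \frac{(\text{observed}_2 - \text{expected}_2)^2}{\text{expected}_2} + \cdots + \frac{(\text{observed}_k - \text{expected}_k)^2}{\text{expected}_k} = \sum_{i=1}^{k} \frac{(\text{observed}_i - \text{expected}_i)^2}{\text{expected}_i}.$$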
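The sum above translates directly into code. The following is a minimal sketch; the function name `chi_square_statistic` and the observed die counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def chi_square_statistic(observed, probs):
    """Sum (observed_i - expected_i)^2 / expected_i over all categories,
    where expected_i = n * p_i and n is the total number of trials."""
    if len(observed) != len(probs):
        raise ValueError("need one probability per category")
    n = sum(observed)
    total = 0.0
    for obs, p in zip(observed, probs):
        expected = n * p
        total += (obs - expected) ** 2 / expected
    return total


# Hypothetical data: 60 rolls of a die, tested against the fair-die model.
observed = [8, 12, 9, 11, 6, 14]
stat = chi_square_statistic(observed, [1 / 6] * 6)
print(round(stat, 3))
```

Here every expected count is 10, the squared discrepancies are 4, 4, 1, 1, 16 and 16, and their sum divided by 10 gives a statistic of about 4.2. Data that matched the expected counts exactly would give a statistic of 0.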

As the sample size $n$ increases, if the model is correct, the sampling distribution of the chi-square statistic is approximated increasingly well by the chi-square curve with

$$(\#\text{categories} - 1) = k - 1$$

degrees of freedom (d.f.), in the sense that the chance that the chi-square statistic is in any given range grows closer and closer to the area under the chi-square curve over the same range.
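The approximation can be checked by simulation. The sketch below is one way to do it under stated assumptions: a hypothetical three-category model (so k − 1 = 2 degrees of freedom), an arbitrary threshold of 4, and a fixed random seed. It relies on the fact that for 2 degrees of freedom the area under the chi-square curve to the right of x is exactly exp(−x/2), so no statistics library is needed. All function and variable names are illustrative.

```python
import math
import random


def chi_square_statistic(observed, probs):
    """Sum (observed_i - n*p_i)^2 / (n*p_i) over all categories."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))


def simulate_statistics(n, probs, reps, rng):
    """Draw `reps` samples of n trials from the model; return their statistics."""
    categories = list(range(len(probs)))
    stats = []
    for _ in range(reps):
        observed = [0] * len(probs)
        for outcome in rng.choices(categories, weights=probs, k=n):
            observed[outcome] += 1
        stats.append(chi_square_statistic(observed, probs))
    return stats


rng = random.Random(0)           # fixed seed so runs are repeatable
probs = [0.2, 0.3, 0.5]          # hypothetical model with k = 3 categories
threshold = 4.0                  # arbitrary cutoff for the range (4, infinity)
curve_area = math.exp(-threshold / 2)  # exact tail area for 2 d.f.

tail_fractions = {}
for n in (10, 100, 1000):
    stats = simulate_statistics(n, probs, reps=2000, rng=rng)
    tail_fractions[n] = sum(s > threshold for s in stats) / len(stats)
    print(f"n={n}: simulated chance {tail_fractions[n]:.3f}, "
          f"curve area {curve_area:.3f}")
```

The simulated fractions carry Monte Carlo noise of roughly one percentage point at 2000 repetitions, so the comparison shows the general agreement for large n, not an exact match.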

Permanent link Chi-square Statistic - Creation date 2021-08-07

