## chi-squared tests

Consider a set of 10 measurements of leaf size,
{*x*_{1}, *x*_{2}, ..., *x*_{10}},
where *x*_{1} is the size of the first leaf, and so on.
According to some expert, leaf sizes are supposed to be "normally"
distributed with mean µ and standard deviation σ.
Knowing all these numbers, you can calculate the quantity
known as chi-square:

$$\chi^2 = \sum_{i=1}^{k} \frac{(x_i - \mu)^2}{\sigma^2}$$

where in this case there are 10 *x* values,
so *k*=10. (This formula says: find how far each *x* deviates
from the mean µ, square each difference, add up all the squared differences,
and divide by the standard deviation squared.) More general versions
of this formula would allow a different mean and standard deviation
for each measurement.
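A minimal Python sketch of the formula above, assuming one µ and one σ shared by all measurements. The function name `chi_square` and the leaf sizes, µ = 5.0 and σ = 0.3 are made-up values for illustration, not real data.

```python
def chi_square(xs, mu, sigma):
    """Sum of squared deviations from mu, divided by sigma squared."""
    return sum((x - mu) ** 2 for x in xs) / sigma ** 2

# Hypothetical leaf sizes (k = 10 measurements), for illustration only.
leaves = [5.1, 4.8, 5.6, 4.9, 5.3, 5.0, 4.7, 5.4, 5.2, 4.6]

# The squared deviations sum to 0.96, and 0.96 / 0.09 is about 10.67.
print(chi_square(leaves, mu=5.0, sigma=0.3))
```

Here the result is close to *k* = 10, which is what the next paragraph says to expect when the data really follow the assumed distribution.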

Roughly speaking, we expect each measurement to deviate from the mean
by about one standard deviation, so |*x*_{i}-µ| is typically
comparable to σ. Thus in calculating chi-square
we'd be adding up 10 terms (*x*_{i}-µ)^{2}/σ^{2}, each near 1 on average. More
generally, we expect χ^{2} to approximately
equal *k*, the number of data points. If chi-square is
"a lot" bigger than *k*, something is wrong. Thus one purpose
of chi-square is to compare observed results with expected results
and judge whether the observed results are plausible.
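One way to see why chi-square should come out near *k* is to simulate it: repeatedly draw *k* values from a normal distribution with known µ and σ, compute chi-square each time, and average. This is a hypothetical Monte Carlo check using Python's standard `random` module; the function names, seed, and trial count are arbitrary choices, not part of the method itself.

```python
import random

def chi_square(xs, mu, sigma):
    """Sum of squared deviations from mu, divided by sigma squared."""
    return sum((x - mu) ** 2 for x in xs) / sigma ** 2

def mean_chi_square(k, mu, sigma, trials=20000, seed=1):
    """Average chi-square over many simulated samples of k normal values."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(k)]
        total += chi_square(xs, mu, sigma)
    return total / trials

# Should print a value close to 10 (the exact digits depend on the seed).
print(mean_chi_square(k=10, mu=5.0, sigma=0.3))
```

Each term (*x*_{i}-µ)^{2}/σ^{2} averages exactly 1 when the data are normal with the stated µ and σ, so the average of the sum settles near *k*.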

### *X*^{2}: a version of χ^{2} to test an expected distribution

In biology the most common application of chi-squared is
comparing observed counts of particular cases to the expected counts.
For example, the willow tree (*Salix*) is dioecious; that is,
like people (and unlike most plants), each willow tree has only
male or only female sex organs. One might expect that half of the willows
are male and half female. If you examine *N* willow trees
and count that *x*_{1} of them are male and
*x*_{2} of them are female, you will probably
not find exactly *x*_{1}=½*N* and
*x*_{2}=½*N*. Is the difference
significant enough to rule out the 50/50 hypothesis? We could
almost calculate chi-squared, but we don't know the standard
deviation for each count. Never fear: most counts are (approximately)
Poisson distributed, and the standard deviation of a Poisson distribution
equals the square root of its mean, here the expected count *E*_{i}.
Replacing both µ and σ^{2} with *E*_{i} in the chi-square formula, we can
calculate *X*^{2}:

$$X^2 = \sum_{i=1}^{k} \frac{(x_i - E_i)^2}{E_i}$$
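A minimal sketch of *X*^{2} for the willow example, using the 50/50 expectation *E*_{1}=*E*_{2}=½*N*. The function name `pearson_x2` and the survey counts (60 male, 40 female) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def pearson_x2(observed, expected):
    """Sum over cases of (observed - expected)^2 / expected."""
    return sum((x - e) ** 2 / e for x, e in zip(observed, expected))

# Hypothetical survey: 60 male and 40 female willows, so N = 100.
observed = [60, 40]
N = sum(observed)
expected = [N / 2, N / 2]  # 50/50 hypothesis: E_1 = E_2 = N/2

print(pearson_x2(observed, expected))  # 4.0, since 100/50 + 100/50 = 4
```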

In our simple willow example there are just two cases, so *k*=2,
and the expected results are *E*_{1}=½*N* and
*E*_{2}=½*N*. Note that the *E*_{i}
need not be whole numbers (½*N* is not when *N* is odd), even though the counts *x*_{i}
must be whole numbers. If there were more cases (say *k* cases),
we would need to know the probability *p*_{i} for each case;
then we could calculate each
*E*_{i}=*p*_{i}*N*, where *N*
is the total of the counts:

$$N = \sum_{i=1}^{k} x_i$$
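For more than two cases, the expected counts follow from *E*_{i}=*p*_{i}*N* with *N* the total of the observed counts. The sketch below assumes the *p*_{i} are supplied by the hypothesis being tested and sum to 1; the helper names and the three-case probabilities (0.25, 0.5, 0.25) with their counts are hypothetical.

```python
def expected_counts(observed, probs):
    """E_i = p_i * N, where N is the total of the observed counts."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    n = sum(observed)  # N = x_1 + x_2 + ... + x_k
    return [p * n for p in probs]

def pearson_x2(observed, expected):
    """Sum over cases of (observed - expected)^2 / expected."""
    return sum((x - e) ** 2 / e for x, e in zip(observed, expected))

# Hypothetical k = 3 example with N = 100 observations.
observed = [30, 45, 25]
expected = expected_counts(observed, [0.25, 0.5, 0.25])
print(expected)                        # [25.0, 50.0, 25.0]
print(pearson_x2(observed, expected))  # 1.5
```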

Finally, note that a Poisson distribution is close to a normal
distribution only when its mean is large, so the approximation behind
*X*^{2} causes problems for small *E*_{i}. As a rule of thumb,
avoid using *X*^{2} if any *E*_{i}
is less than 5. If *k* is large, this technical difficulty
is mitigated.
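The rule of thumb is easy to check before computing *X*^{2}. A minimal sketch, assuming the cutoff of 5 from the text; the function name `small_expected` and the example survey sizes are hypothetical.

```python
def small_expected(expected, threshold=5.0):
    """Return the indices of expected counts below the rule-of-thumb threshold."""
    return [i for i, e in enumerate(expected) if e < threshold]

# Hypothetical: only N = 8 willows surveyed, so E_1 = E_2 = 4 (too small).
print(small_expected([4.0, 4.0]))    # [0, 1]
# With N = 100 willows, E_1 = E_2 = 50, and nothing is flagged.
print(small_expected([50.0, 50.0]))  # []
```

A non-empty result signals that *X*^{2} should be interpreted with caution for that data set.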