
Understanding Chi Square

The Chi Square test tells you whether your data differ from what would be expected by random chance by more than chance alone can plausibly explain. We’ll use the following data as an example.

          Yes    No    Total
Female     45     5       50
Male       15    35       50
Total      60    40      100

At first glance, it looks like women are more likely to answer “Yes.” But a good statistician will ask, “Is it something we can trust as truly different, or is it due to random variation?” In order to trust it, we calculate the Chi Square. The Chi Square compares the “observed” (actual) data from our sample to values that we calculate: the “expected” values, or what would happen if there were no difference between the groups other than random chance.

Understanding the expected values

To determine the “expected” values, you calculate them based on the same sample size and groups. For example, if your observed data comes from a survey, then your expected data needs to match the number of people you surveyed and the number of them that fall into particular groups. In the data given, there are 50 men and 50 women, and 60 “yes” answers and 40 “no” answers. These totals do not change. So, the question is, how would the numbers of “yes” or “no” answers change if men and women answered “yes” in the same proportion?


          Yes    No    Total
Female      ?     ?       50
Male        ?     ?       50
Total      60    40      100

For this example, you might be able to figure this out in your head:

          Yes    No    Total
Female     30    20       50
Male       30    20       50
Total      60    40      100

But how can we calculate this? We might need to do so if our numbers were less friendly.

How to calculate the expected values

When you figured out the values for “yes” in your head, you probably looked at the total “yes” answers and then divided by two, to make it equal between the males and females (60/2=30). Another way to think about this is that you multiplied the total number of yeses by the total number of females and then divided by the total in the table (60*50/100=30). While this second way might seem more complicated, it allows us to analyze data where the two groups are not equal in number.

So, follow this pattern for each cell. Let’s do the Female=No cell:

(TotalNo * TotalFemale) / TotalTable

( 40 * 50 ) / 100 = 20
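
The same pattern can be applied to every cell at once. The following is a minimal Python sketch, not part of the original walkthrough; the names `observed` and `expected_counts` are made up for illustration, and the table is assumed to be a list of rows (Female, Male) with columns in the order Yes, No.

```python
# Observed counts from the example: rows are Female, Male; columns are Yes, No.
observed = [
    [45, 5],
    [15, 35],
]


def expected_counts(table):
    """Expected count per cell: (row total * column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
print(expected)  # [[30.0, 20.0], [30.0, 20.0]]
```

Because the function works from the row and column totals, it should also handle tables where the groups are not equal in number.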

Now, if we combine our values into one table, we have something like this:

          Yes             No              Total
Female    E: 30, O: 45    E: 20, O: 5     E&O: 50
Male      E: 30, O: 15    E: 20, O: 35    E&O: 50
Total     E&O: 60         E&O: 40         E&O: 100

Finding the difference squared and the Chi square

Once we have all the Expected values, we need to find, for each cell, the squared difference between the observed and expected values (squared so they’re all positive), divided by the expected value:

D = (O - E)^2 / E

          Yes                      No                        Total
Female    E: 30, O: 45, D: 7.50    E: 20, O: 5, D: 11.25     E&O: 50
Male      E: 30, O: 15, D: 7.50    E: 20, O: 35, D: 11.25    E&O: 50
Total     E&O: 60                  E&O: 40                   E&O: 100, D: 37.5

Adding all the differences, we get a calculated Chi square of 37.5. But we’re not done yet.
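
Here is a short Python sketch of that step, offered as an illustration rather than a definitive implementation. The function name `chi_square` is hypothetical, and the observed and expected tables are typed in by hand from the example above.

```python
# Rows are Female, Male; columns are Yes, No (values from the example tables).
observed = [[45, 5], [15, 35]]
expected = [[30, 20], [30, 20]]


def chi_square(observed, expected):
    """Return each cell's D = (O - E)^2 / E and the sum of all the D values."""
    differences = [
        [(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
        for o_row, e_row in zip(observed, expected)
    ]
    total = sum(sum(row) for row in differences)
    return differences, total


differences, total = chi_square(observed, expected)
print(differences)  # [[7.5, 11.25], [7.5, 11.25]]
print(total)        # 37.5
```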

Looking up the value on the Chi square table

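Many statistics rely on a concept called degrees of freedom (d.f.). Generally speaking, the d.f. are the number of values that are free to vary once the totals are fixed. In our table, as soon as you fill in one cell, the row and column totals determine the other three. For Chi Square, the d.f. are: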

df = (# rows – 1) * (# columns – 1)
= ( 2 – 1) * ( 2 – 1) = 1
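
For larger tables the same formula applies. As a small sketch (the function name `degrees_of_freedom` is made up here, and the table is assumed to hold only the data cells, without the Total row and column):

```python
def degrees_of_freedom(table):
    """d.f. = (# rows - 1) * (# columns - 1), counting data cells only."""
    n_rows = len(table)
    n_columns = len(table[0])
    return (n_rows - 1) * (n_columns - 1)


print(degrees_of_freedom([[45, 5], [15, 35]]))  # 1
```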

Now we have everything we need to find the probability of seeing a difference this large between the observed and expected data by random chance alone.

  • Observed and expected values
  • Total Chi Square = 37.5
  • Degrees of freedom = 1

To determine this, we look up the probability and critical value on a Chi square table. Our critical value (the value above which we decide that our observed data is significantly different from the expected) is determined by our alpha level. For this problem, let’s use 0.05, which is very common.

TO FIND THE CRITICAL VALUE:

Find the row with your d.f. on the left and then go to the column under .05 (the alpha level). In this case, it’s 3.841. A calculated Chi square above this value is statistically significantly different from the variation we expect to happen from random chance. Our 37.5 is well above 3.841. In other words, based on this sample, we can conclude that women are more likely to answer “Yes.”

TO FIND THE PROBABILITY:

To find the probability (p-value), go back to the Chi square table. Find the row with your d.f. on the left and then go to the column that is closest to your calculated Chi square. Notice that the highest value we can see on the table is 7.879, much lower than our calculated Chi square of 37.5. The 7.879 is in the column underneath a probability of 0.005. In other words, if there were no real difference between the groups, there would be a 0.5% likelihood of getting a score of 7.879 or higher. Since our 37.5 is larger, it is less than 0.5% likely that we would get that score by random chance.
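
If you would rather compute the probability than read it off a table, the sketch below shows one way to do it in Python. It relies on a shortcut that is not part of the walkthrough above and that holds only for 1 degree of freedom: a Chi square with 1 d.f. is the square of a standard normal value, so its upper-tail probability can be written with the standard library’s `math.erfc`. The function name `p_value_1_df` is hypothetical; for other d.f. you would need a statistics library or the table.

```python
import math

# Critical value for 1 d.f. at alpha = 0.05, taken from the Chi square table.
CRITICAL_VALUE = 3.841


def p_value_1_df(chi_square):
    """Probability of a Chi square this large or larger by chance (1 d.f. only)."""
    return math.erfc(math.sqrt(chi_square / 2))


calculated = 37.5
is_significant = calculated > CRITICAL_VALUE
p_value = p_value_1_df(calculated)

print(is_significant)   # True
print(p_value < 0.005)  # True: far less than 0.5% likely by random chance
```

As a sanity check, passing 3.841 to the same function should give roughly 0.05, and 7.879 roughly 0.005, matching the columns of the table.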

 
