BIOSTATISTICS SERIES
https://doi.org/10.5005/jp-journals-10028-1618
Statistics Corner: Chi-squared Test
Kamal Kishore1, Vidushi Jaswal2
1Department of Biostatistics, Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India
2Department of Psychology, Mehr Chand Mahajan DAV College for Women, Chandigarh, India
Corresponding Author: Kamal Kishore, Department of Biostatistics, Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India, Phone: +91 9591349768, e-mail: kkishore.pgi@gmail.com
Received on: 01 February 2023; Accepted on: 25 February 2023; Published on: 10 April 2023
ABSTRACT
It is desirable to collect and analyze quantitative data with parametric tests. Researchers, however, also gather categorical variables such as cured vs non-cured and diseased vs non-diseased. They also often convert continuous variables into categories, for example, body mass index (BMI) into high, regular, and low BMI, or quality of life into good, average, and poor. When both the independent and dependent variables are categorical, Chi-square is a standard test. Consider a researcher who designs a study and collects data with many categorical outcome variables. A literature search recommends applying the Chi-squared test. The researcher, however, has a few vital questions related to the Chi-squared test:
What is Yates’ correction?
When to apply Fisher’s exact test?
What is the post hoc Chi-squared test?
How to assess the strength of an association?
How to cite this article: Kishore K, Jaswal V. Statistics Corner: Chi-squared Test. J Postgrad Med Edu Res 2023;57(1):40-44.
Source of support: Nil
Conflict of interest: None
Keywords: Categorical data, Chi-square, Fisher’s exact test, Non-parametric, Test of association.
INTRODUCTION
Health researchers frequently collect nominal variables such as recovered (yes vs no) or diseased (yes vs no) in routine investigations. The t-test and Wilcoxon-Mann-Whitney test discussed in previous articles are suited to continuous or ordinal outcome variables; they are not suited to nominal data.1,2 Karl Pearson proposed the Chi-squared test in 1900 to analyze nominal data.3 The test has become one of the most popular non-parametric tests because it is easy to understand and calculate. Its results, however, are not valid when the sample size is small, and various researchers have proposed extensions and alternatives for that situation.
This manuscript will extend the discussion to analyze, report, and interpret study findings from two independent groups discussed in the previous articles.1,2 The current article will discuss—(1) the problem statement, (2) the Chi-squared test, (3) the types of Chi-squared tests, (4) post hoc Chi-squared tests, (5) the strength of association, and (6) the interpretation and reporting of study findings. We will begin by framing a research question. All data analysis was conducted using R Commander (Rcmdr)—a graphical user interface for free, open-source, and command-driven R software.
PROBLEM STATEMENT
The Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India is a tertiary care institute whose mandate is to undertake intensive research in patient care. Statistical analysis training is an integral part of students’ academic learning. The literature has shown increasing statistical anxiety among medical students. The researcher selected a reliable and valid instrument, the Survey of Attitudes Toward Statistics (SATS-36) questionnaire, to capture statistical anxiety. We segregated SATS anxiety scores into mild, moderate, and high anxiety categories. The survey’s objective is to assess whether there is any significant association between the levels of designation and statistical anxiety. To clarify further, “faculty, research staff, and students” are the three levels of designation (independent variable), and mild, moderate, and high statistical anxiety are the levels of the outcome variable in the study.
Disclaimer
For demonstration, Excel® was used to generate the data for the analysis. However, the SATS-36 questionnaire genuinely assesses statistics anxiety.
Chi-squared Test (χ2)
Chi-square is a test of significance for a nominal dependent variable; it does not tell the strength of the association. Further, the order of categories does not affect Chi-square; only differences between groups affect it. The Chi-squared test can be extended to interval or ratio data that researchers collapse into ordinal categories. The Chi-square distribution is continuous, whereas the test applies to discrete counts. Because a continuous distribution approximates the discrete probability of the observed binomial (yes vs no) frequencies, the approximation overestimates the Chi-square value, which yields a smaller p-value than expected and inflates the type-I error. Yates’ continuity correction is applied to reduce this approximation error. The correction makes little difference for large samples; in a small sample, however, a small expected value in a few cells can skew the Chi-square statistic and hence the p-value. For a 2 × 2 table, the test assumes that every cell has an expected (not observed) value of five or more. For tables with more than two rows or columns, the test assumes that no more than 20% of cells have an expected value less than five and that no cell has an expected value less than one. Researchers do not have to apply Yates’ correction for tables with more than two rows or columns or when the test result is not statistically significant. Unlike bidirectional tests such as the Z and t-tests, Chi-square is a one-tailed (right side) test: only large positive values can reject the null hypothesis.4
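To make the effect of the continuity correction concrete, the following minimal Python sketch computes the Pearson Chi-square for a 2 × 2 table with and without Yates’ correction. It is an illustration only, not the Rcmdr procedure used in this article; the function name `chi2_2x2` is hypothetical, and the counts are the research staff and student rows of the Clinical and Surgical columns from the partitioning example later in the text.

```python
import math

def chi2_2x2(a, b, c, d, yates=False):
    """Pearson Chi-square for the 2 x 2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With yates=True, 0.5 is subtracted from
    each |observed - expected| before squaring (never going below zero).
    """
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    # Upper-tail probability of a Chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

plain = chi2_2x2(12, 14, 7, 18)
corrected = chi2_2x2(12, 14, 7, 18, yates=True)
print("without correction: chi2 = %.2f, p = %.2f" % plain)
print("with Yates' correction: chi2 = %.2f, p = %.2f" % corrected)
```

The corrected statistic is always the smaller of the two, so the corrected p-value is the larger one; this is why the correction can only make a non-significant result less significant.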
In general, there are three types of Chi-squared tests:
Test of Association: It Assesses the association between Two Categorical Variables
- Research question: Is there any association between designation (faculty, research staff, and students) and statistical anxiety (mild, moderate, and high)?
- Null hypothesis (H0): No statistically significant association exists between designation and statistical anxiety.
- Alternative hypothesis (H1): There is a statistically significant association between designation and statistical anxiety.
Test of Homogeneity: It Assesses whether the Proportion of Statistical Anxiety Outcomes (Mild, Moderate, and High) is Similar between the Designations
- Research question: Is the proportion of statistical anxiety outcomes (mild, moderate, and high) similar between designations?
- Null hypothesis (H0): The proportion of statistical anxiety outcomes is not statistically significantly different between designations.
- Alternative hypothesis (H1): The proportions of statistical anxiety outcomes are statistically significantly different between designations.
Test of Goodness of Fit: It Assesses whether Proportions of Outcomes, such as the Proportion of Diseased in Intervention and Control Groups, are Distributed According to a Prespecified Set of Population Proportions
- Research question: Are the proportions of mild, moderate, and high statistical anxiety distributed 15, 5, and 10% in each faculty, research staff, and student category, respectively? Please note that the specified proportions follow a certain distribution, such as binomial or Poisson, or previous years’ population standards, such as 30% of students opting for medical, 20% for engineering, and 30% for others.
- Null hypothesis (H0): The proportion of statistical anxiety is not statistically significantly different from the specified proportions (15: 5: 10%) among designations.
- Alternative hypothesis (H1): The proportion of statistical anxiety is statistically significantly different from the specified proportions among designations.
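Of the three tests above, goodness of fit is the only one that compares observed counts with externally specified proportions. The Python sketch below shows the arithmetic under stated assumptions: the counts and the hypothesized proportions are invented purely for illustration (they are not study data), the function names are hypothetical, and the p-value helper uses a closed form that holds only for an even number of degrees of freedom. The hypothesized proportions must sum to one.

```python
import math

def goodness_of_fit(observed, proportions):
    """Chi-square goodness-of-fit statistic and df for counts vs hypothesized proportions."""
    if len(observed) != len(proportions):
        raise ValueError("need one hypothesized proportion per category")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)
    stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, proportions))
    return stat, len(observed) - 1

def chi2_sf_even_df(stat, df):
    """Upper-tail p-value of a Chi-square variable; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = stat / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Hypothetical counts of mild, moderate, and high anxiety among 100 respondents
observed = [45, 35, 20]
# Hypothetical reference proportions (e.g., from a previous cohort)
hypothesized = [0.5, 0.3, 0.2]

stat, df = goodness_of_fit(observed, hypothesized)
p_value = chi2_sf_even_df(stat, df)
print("chi2 = %.2f, df = %d, p = %.2f" % (stat, df, p_value))
```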
Assumptions
- The patients are randomly selected.
- Patients in the sample are independent of each other.
- The data in the cells are frequencies or counts, not percentages or proportions.
- The variable levels are mutually exclusive: a single subject contributes data to one and only one group.
- Independent and dependent variables should be categorical (nominal or ordinal).
Limitations
- With a small sample size, the test is overly sensitive and gives a smaller p-value than it should. Apply Yates’ correction or Fisher’s exact test.
- Interpretation is challenging for a large number of categories (20 or more).
- A statistically significant result may still correspond to a weak association.
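The expected-count conditions described earlier, together with the small-sample limitation above, can be checked before running the test. The sketch below is one possible way to do so in Python; the function names and the returned dictionary keys are hypothetical, and the counts are those of the 3 × 3 table used in the partitioning example (Table 1).

```python
def expected_counts(table):
    """Expected cell counts under independence for a two-way table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]

def check_expected(table):
    """Apply the expected-count rules: 2 x 2 needs all expected >= 5;
    larger tables need at most 20% of cells below 5 and none below 1."""
    cells = [e for row in expected_counts(table) for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    if len(table) == 2 and len(table[0]) == 2:
        ok = min(cells) >= 5
    else:
        ok = share_below_5 <= 0.20 and min(cells) >= 1
    return {
        "min_expected": min(cells),
        "share_below_5": share_below_5,
        "chi_square_ok": ok,
    }

table1 = [[12, 14, 27], [7, 18, 27], [6, 8, 15]]
result = check_expected(table1)
print(result)
```

When `chi_square_ok` is false for a 2 × 2 table, the text above suggests Yates’ correction or Fisher’s exact test instead of the uncorrected Chi-square.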
Fisher’s Exact Test
When the sample size is small, or some cells have a frequency of zero or an expected value <5, apply Fisher’s exact test. The test is more precise than the Chi-square. In its classical form, however, it applies to 2 × 2 contingency tables; tables with more than two categories of the independent or dependent variable require an extension of the test (Freeman-Halton), which is more demanding to compute. Fisher’s exact test avoids the flaws of Chi-square by not approximating the p-value from a continuous distribution; it calculates the p-value directly from the data.
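The phrase “calculates the p-value directly from the data” can be illustrated with a short Python sketch of the two-sided Fisher’s exact test for a 2 × 2 table. It enumerates every table with the same margins and sums the hypergeometric probabilities of those no more likely than the observed one, which is one common definition of the two-sided p-value. The function name is hypothetical, and `math.comb` assumes Python 3.8 or later.

```python
import math

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    denom = math.comb(n, c1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column totals
        return math.comb(r1, x) * math.comb(r2, c1 - x) / denom

    lowest, highest = max(0, c1 - r2), min(r1, c1)
    p_observed = prob(a)
    # Sum over all tables that are no more likely than the observed table;
    # the small tolerance guards against floating-point ties
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(total, 1.0)

p_value = fisher_exact_2x2(12, 14, 7, 18)
print("Fisher's exact test, two-sided p = %.3f" % p_value)
```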
Post hoc Test
The Chi-squared test for three or more groups is a global test; it does not tell which rows and columns are significantly associated. The practitioner is often interested in knowing the association between specific groups. Many researchers inspect the data subjectively by eyeballing them and then decide whether specific cells differ, which is not a reliable and valid method. Sharpe discussed four variations of post hoc tests after obtaining statistically significant omnibus Chi-squared test results: (1) residual analysis, (2) comparing cells, (3) ransacking, and (4) partitioning.5 We will discuss only “partitioning” for its ease of understanding and application; interested readers can consult Sharpe for more detail about other methods.5 In the partition test, the researcher systematically partitions the original r × c contingency table into an orthogonal set of 2 × 2 tables. Table 1 displays the partitioning of a 3 × 3 table, and Table 2 that of a 3 × 2 table. The number of partitions depends on the degrees of freedom (df) of the original contingency table; a 3 × 2 contingency table with two df allows two partitions, and a 3 × 3 table with four df allows four partitions. The sum of the likelihood ratio Chi-square values of the partitioned tables equals the likelihood ratio Chi-square of the original contingency table. A vital rule is that each marginal total of the original table must be a marginal total for one and only one sub-table.
Table 1: Partitioning of a 3 × 3 contingency table (rows: designation; columns: department)

Original table: χ2 = 1.93, p = 0.75

Designation | Clinical | Surgical | Basic |
---|---|---|---|
Research staff | 12 | 14 | 27 |
Students | 7 | 18 | 27 |
Faculty | 6 | 8 | 15 |

Partition 1 (2 × 2 table): χ2 = 1.80, p = 0.18

Designation | Clinical | Surgical |
---|---|---|
Research staff | 12 | 14 |
Students | 7 | 18 |

Partition 2: χ2 = 0.01, p = 0.92

Designation | Clinical + Surgical | Basic |
---|---|---|
Research staff | 12 + 14 | 27 |
Students | 7 + 18 | 27 |

Partition 3: χ2 = 0.15, p = 0.70

Designation | Clinical | Surgical |
---|---|---|
Research staff + Students | 12 + 7 | 14 + 18 |
Faculty | 6 | 8 |

Partition 4: χ2 = 0.00, p = 0.98

Designation | Clinical + Surgical | Basic |
---|---|---|
Research staff + Students | 12 + 14 + 7 + 18 | 27 + 27 |
Faculty | 6 + 8 | 15 |

Entries joined by “+” are added into a single cell before calculating Chi-square.
Table 2: Partitioning of a 3 × 2 contingency table (rows: designation; columns: department)

Original table: χ2 = 1.92, p = 0.38

Designation | Clinical | Surgical |
---|---|---|
Research staff | 12 | 14 |
Students | 7 | 18 |
Faculty | 6 | 8 |

Partition 1: χ2 = 1.80, p = 0.18

Designation | Clinical | Surgical |
---|---|---|
Research staff | 12 | 14 |
Students | 7 | 18 |

Partition 2: χ2 = 0.15, p = 0.70

Designation | Clinical | Surgical |
---|---|---|
Research staff + Students | 12 + 7 | 14 + 18 |
Faculty | 6 | 8 |

Entries joined by “+” are added into a single cell before calculating Chi-square.
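The partitioning scheme of Table 1 can be reproduced with a few lines of Python. The sketch below builds the four 2 × 2 sub-tables from the 3 × 3 counts, computes the Pearson Chi-square of each, and compares the sum of the likelihood ratio Chi-square values of the sub-tables with that of the original table. The function names are hypothetical, and the code is an illustration of the arithmetic rather than the software route used in this article.

```python
import math

def margins(table):
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols, sum(rows)

def pearson_chi2(table):
    """Pearson Chi-square (no continuity correction) for a two-way table."""
    rows, cols, n = margins(table)
    return sum(
        (obs - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i, r in enumerate(table) for j, obs in enumerate(r)
    )

def likelihood_ratio_chi2(table):
    """Likelihood ratio Chi-square for a two-way table."""
    rows, cols, n = margins(table)
    total = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            if obs > 0:  # a zero cell contributes nothing to the sum
                total += obs * math.log(obs * n / (rows[i] * cols[j]))
    return 2.0 * total

def partition_3x3(t):
    """Four 2 x 2 sub-tables following the scheme shown in Table 1."""
    (a, b, c), (d, e, f), (g, h, i) = t
    return [
        [[a, b], [d, e]],                      # first two rows, first two columns
        [[a + b, c], [d + e, f]],              # first two columns pooled vs third
        [[a + d, b + e], [g, h]],              # first two rows pooled vs third
        [[a + b + d + e, c + f], [g + h, i]],  # both poolings together
    ]

table1 = [[12, 14, 27], [7, 18, 27], [6, 8, 15]]
parts = partition_3x3(table1)
for k, part in enumerate(parts, start=1):
    print("partition %d: Pearson chi2 = %.2f" % (k, pearson_chi2(part)))
lr_sum = sum(likelihood_ratio_chi2(p) for p in parts)
lr_total = likelihood_ratio_chi2(table1)
print("sum of partition LR chi2 = %.4f, original LR chi2 = %.4f" % (lr_sum, lr_total))
```

The additivity holds exactly for the likelihood ratio statistic; the Pearson values of the sub-tables add up only approximately to the Pearson value of the original table.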
Strength of Association
The Chi-squared test, Yates’ correction, and Fisher’s exact test only give statistical significance (p-value); they do not tell the researcher about the strength of the relationship between variables. Φ and Cramer’s V are the counterparts, for two nominal variables, of the correlation coefficient used with continuous data. The Φ coefficient for 2 × 2 tables and Cramer’s V for tables with more than two rows or columns measure how strongly two categorical variables are associated. In absolute value, both range from 0 to 1, where 0 indicates no association and 1 indicates a perfect association between the two variables. The heuristic to interpret the association is: (1) <0.20, weak association, (2) 0.2–0.6, moderate association, and (3) >0.6, strong association.6 Cramer’s V does not show direction. On a 2 × 2 table, Φ shows direction with a positive or negative sign, but directionality does not make much sense in a larger table of nominal categories. Rcmdr does not calculate Φ and Cramer’s V; the interested reader can calculate them with the VassarStats online calculator (VassarStats: Website for Statistical Computation).
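Both coefficients can also be computed by hand from the table counts. The Python sketch below uses the standard definitions, Cramer’s V as the square root of Chi-square divided by n times the smaller of (rows − 1) and (columns − 1), and the signed Φ for a 2 × 2 table, and then applies the heuristic cut-offs quoted above. The function names are hypothetical, and the counts are taken from the partitioning tables for illustration.

```python
import math

def pearson_chi2(table):
    """Pearson Chi-square (no continuity correction) for a two-way table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum(
        (obs - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i, r in enumerate(table) for j, obs in enumerate(r)
    )

def cramers_v(table):
    """Cramer's V: sqrt(chi2 / (n * min(rows - 1, columns - 1)))."""
    n = sum(sum(r) for r in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(pearson_chi2(table) / (n * k))

def phi_2x2(a, b, c, d):
    """Signed phi coefficient for the 2 x 2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def strength_label(value):
    """Heuristic labels quoted in the text: <0.2 weak, 0.2-0.6 moderate, >0.6 strong."""
    size = abs(value)
    if size < 0.2:
        return "weak"
    if size <= 0.6:
        return "moderate"
    return "strong"

v = cramers_v([[12, 14, 27], [7, 18, 27], [6, 8, 15]])
phi = phi_2x2(12, 14, 7, 18)
print("Cramer's V = %.3f (%s)" % (v, strength_label(v)))
print("phi = %.3f (%s)" % (phi, strength_label(phi)))
```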
Chi-squared Test in Rcmdr
After opening Rcmdr, researchers can access the Chi-squared test through the menu path “statistical analysis > discrete variables,” which offers two options: “enter and analyze two-way table” (option 1) and “create a two-way table and compare two proportions (Fisher’s exact test)” (option 2). Option 1, displayed in Fig. 1, needs summarized data to run a Chi-square comparison, and option 2, depicted in Fig. 2, works with raw data. By default, option 1 displays two rows and columns; the user can change the table size by dragging the square grid, as demonstrated in Fig. 1. Bonferroni and Holm’s corrections for pairwise (2 × 2) comparisons are available only with raw data (option 2). Further, option 2 can run Chi-square with or without continuity correction, whereas option 1 always applies the continuity correction.
Fig. 1: R software window to enter and analyze summarized data
Fig. 2: R software window to enter and analyze raw data
Reporting and Interpretation
We intended to find an association between designation (faculty, students, and research staff) and statistical anxiety (mild, moderate, and high). For parsimony, we report only the Chi-square for the 3 × 3 table. On inspection, all expected cell frequencies were more than five, but the sample size was relatively small; therefore, we applied the Chi-squared test with continuity correction. The result shows no statistically significant association between designation and statistical anxiety (p = 0.75). The strength of association was weak (Cramer’s V = 0.09); it is reported for demonstration only, and there is no need to report it when the researcher does not find a statistically significant association.
CONCLUSION
The Chi-squared test is one of the most popular non-parametric statistical tests. The conventional wisdom that Chi-squared tests make no assumptions is wrong; the test makes crucial assumptions, although not about homogeneity and normality of data. Further, many researchers do not use Yates’ correction for small sample sizes to obtain the correct p-value. Most researchers conclude by stating a statistically significant association between variables; it is advisable to also report the strength of association with Φ or Cramer’s V. There are at least four post hoc Chi-squared tests available. Most researchers, however, do not apply post hoc Chi-square to identify specific sources of association for three or more rows and columns. We hope the current manuscript will motivate researchers to adopt and report correct practices.
ACKNOWLEDGMENT
We acknowledge Mr Tejinder Singh from the Department of Biostatistics, Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India, for his valuable time and input to improve the quality of the article.
ORCID
Kamal Kishore https://orcid.org/0000-0001-8936-0843
Vidushi Jaswal https://orcid.org/0000-0002-8914-4283
REFERENCES
1. Kishore K, Jaswal V. Statistics corner: comparing two unpaired groups. J Postgrad Med Educ Res 2022;56(3):145–148. DOI: 10.5005/jp-journals-10028-1594
2. Kishore K, Jaswal V. Statistics Corner: Wilcoxon–Mann–Whitney Test. J Postgrad Med Educ Res 2022;56(4):199–201. DOI: 10.5005/jp-journals-10028-1613
3. Crack TF. A note on Karl Pearson’s 1900 Chi-squared test: two derivations of the asymptotic distribution, and uses in goodness of fit and contingency tests of independence, and a comparison with the exact sample variance Chi-square result. Res Methods Methodol Account Ejournal 2018:1–29. DOI: 10.2139/ssrn.3284255
4. Driscoll P, Lecky F. Article 8. An introduction to hypothesis testing. Non-parametric comparison of two groups—1. Emerg Med J 2001;18(4):276–282. DOI: 10.1136/emj.18.4.276
5. Sharpe D. Chi-square test is statistically significant: now what? Pract Assess Res Evaluation 2015;20(8):1–10. DOI: 10.7275/tbfa-x148
6. Kotrlik J, Williams H, Jabor K. Reporting and interpreting effect size in quantitative agricultural education research. J Agric Educ 2011;52(1):132–142. DOI: 10.5032/jae.2011.01132
________________________
© The Author(s). 2023 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted use, distribution, and non-commercial reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.