* These steps need not be in the sequence in this diagram. The sequence may be adjusted according to the needs of the research teams.
** These elements are optional and may be omitted if not relevant for research teams.

Module 29: DETERMINING DIFFERENCES BETWEEN GROUPS
$$SE_{\text{difference}} = \sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}$$

Where:
SD1 is the standard deviation of the first sample
SD2 is the standard deviation of the second sample
n1 is the sample size of the first sample
n2 is the sample size of the second sample
For our data, if we take the women with normal deliveries as sample 1 and those with Cesarean sections as sample 2, the standard error of the difference is 0.56.

(4) Finally, divide the difference between the means by the standard error of the difference. The value now obtained is called the t-value.
In the above example: $t = \frac{2}{0.56} = 3.6$
Expressed in one single formula:

$$t = \frac{X_1 - X_2}{\sqrt{\dfrac{SD_1^2}{n_1} + \dfrac{SD_2^2}{n_2}}}$$

(where X1 is the mean value of the first sample, and X2 is the mean value of the second sample)
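The calculation of the t-value can be sketched in Python as follows. This is a minimal illustration rather than part of the module itself: the function names are assumptions made for this sketch, and the only figures used are those quoted in the text (a difference of 2 cm between the means and a standard error of 0.56).

```python
import math


def standard_error_of_difference(sd1, n1, sd2, n2):
    """Standard error of the difference between two sample means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


def t_from_difference(difference, se_difference):
    """Step (4): divide the difference between the means by its standard error."""
    return difference / se_difference


def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """The single formula: (X1 - X2) divided by the standard error of the difference."""
    se = standard_error_of_difference(sd1, n1, sd2, n2)
    return t_from_difference(mean1 - mean2, se)


# Figures quoted in the text: difference of 2 cm, standard error of 0.56.
print(round(t_from_difference(2, 0.56), 1))
```

If the means, standard deviations and sample sizes of both samples are available, `t_from_summary` gives the same result in one call.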
Once the t-value has been calculated, you will have to refer to a t-table, from which you can determine whether the null hypothesis is rejected or not. Annex 29.1 contains a t-table.
(1) First, decide which significance level (α-value or alpha value) you want to use (see Module 28). Remember that the chosen significance level (α-value) is an expression of the likelihood of finding a difference by chance when there is no real difference. Usually we choose a significance level of 0.05.
(2) Second, determine the number of degrees of freedom for the test being performed. Degrees of freedom is a measure derived from the sample size, which has to be taken into account when performing a t-test. The bigger the sample size (and the degrees of freedom) the smaller the difference needed in order to reject the null hypothesis.
(3) Third, locate in the table the t-value that corresponds to the chosen significance level (α-value) and the degrees of freedom.
If the calculated t-value is equal to or larger than the value derived from the table, the p-value is smaller than the chosen α-value (indicated at the top of the column). We then reject the null hypothesis and conclude that there is a statistically significant difference between the two means.
If the calculated t-value is smaller than the value derived from the table, the p-value is larger than the α-value we chose. We then do not reject the null hypothesis and conclude that the observed difference is not statistically significant.
The way the number of degrees of freedom is calculated differs from one statistical test to another. For Student's t-test the number of degrees of freedom is calculated as the sum of the two sample sizes minus 2.
Thus, for Example 1, comparing the heights of women with and without Cesarean sections, the number of degrees of freedom is:
d.f. = 60 + 52 - 2 = 110.
Note:
This is an approximate way of determining degrees of freedom. For the exact method, refer to a statistics textbook.
In our example we look up the t-value belonging to α = 0.05 and d.f. = 120 (the tabulated value nearest to our 110 degrees of freedom) and we find it is 1.98.
We now compare the absolute value of the t-value calculated in Step 1 (i.e., the t-value, ignoring the sign) with the t-value derived from the table in Step 2.
In our example the t-value calculated in Step 1 is 3.6, which is larger than the t-value derived from the table in Step 2 (1.98). Thus the p-value is smaller than 0.05. We therefore reject the null hypothesis and conclude that the observed difference of 2 cm between the mean heights of women with normal deliveries and women with Cesarean sections is statistically significant.
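The comparison with the t-table can be sketched as below. This is only an illustration under stated assumptions: the function names are invented for this sketch, and the table value 1.98 is copied from the text rather than computed.

```python
def degrees_of_freedom(n1, n2):
    """Approximate degrees of freedom for Student's t-test: n1 + n2 - 2."""
    return n1 + n2 - 2


def is_significant(t_calculated, t_table):
    """Reject the null hypothesis when |t| is equal to or larger than the table value."""
    return abs(t_calculated) >= t_table


# Example 1: 60 women with normal deliveries, 52 with Cesarean sections.
df = degrees_of_freedom(60, 52)

# Value read from the t-table for a significance level of 0.05 (taken from the text).
T_TABLE = 1.98

print(df, is_significant(3.6, T_TABLE))
```

Taking the absolute value means that the sign of the calculated t-value, which depends only on which group is called sample 1, does not affect the decision.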
We can express this conclusion in different ways:
If you want to compare the mean values of more than two groups (e.g., heights of urban, semi-urban and rural women) you cannot use Student's t-test. In this case you must use the F-test, which is not described here.
If you have categorical data the chi-square test is used to find out whether observed differences between proportions of events in two or more groups may be considered statistically significant.
Example 2:
Suppose that in a cross-sectional study of the factors affecting the utilisation of antenatal clinics you found that 64% of the women who lived within 10 kilometres of the clinic came for antenatal care, compared to only 47% of those who lived more than 10 kilometres away. This suggests that antenatal care (ANC) is used more often by women who live close to the clinics. The complete results are presented in Table 29.2:
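Table 29.2: Utilisation of antenatal clinics by women living far from and near the clinic

| Distance from clinic | Used ANC | Did not use ANC | Total |
|---|---|---|---|
| Within 10 km | 51 (64%) | 29 (36%) | 80 (100%) |
| More than 10 km | 35 (47%) | 40 (53%) | 75 (100%) |
| Total | 86 | 69 | 155 |
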
From the table we conclude that there seems to be a difference in the use of antenatal care between those who live close to and those who live far from the clinic (64% versus 47%). We now want to know if this observed difference is statistically significant or not.
The chi-square test can be used to give us the answer. This test is based on measuring the difference between the observed frequencies and the expected frequencies if the null hypothesis (i.e., the hypothesis of no difference) were true.
To perform a χ² test you need to complete the following 3 steps: calculate the χ² value, use the χ² table, and interpret the result.

The χ² value is calculated as follows:

(a) Calculate the expected frequency (E) for each cell. To find the expected frequency E of a cell, multiply the row total by the column total and divide by the grand (overall) total:

$$E = \frac{\text{row total} \times \text{column total}}{\text{grand (overall) total}}$$

(b) For each cell, subtract the expected frequency from the observed frequency (O – E).
(c) For each cell, square the result of (O – E) and divide by the expected frequency E.
(d) Add the results calculated in step (c) for all the cells.

The formula for calculating a chi-square value (steps (b) to (d)) is as follows:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where:
O is the observed frequency (indicated in the table),
E is the expected frequency (to be calculated), and
Σ (the sum of) directs you to add together the values of (O – E)²/E for all the cells of the table.
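For a two-by-two table (which contains 4 cells) the formula is:

$$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \frac{(O_4 - E_4)^2}{E_4}$$
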
As for the t-test, the calculated χ² value has to be compared with a theoretical χ² value in order to determine whether the null hypothesis is rejected or not. Annex 29.2 contains a table of theoretical χ² values.
(1) First you must decide what significance level you want to use (alpha or α-value). We usually take 0.05.
(2) Then the degrees of freedom have to be calculated. With the χ² test the number of degrees of freedom is related to the number of cells, i.e. the number of groups you are comparing. The number of degrees of freedom is found by multiplying the number of rows (r) minus 1 by the number of columns (c) minus 1:
d.f. = (r – 1) × (c – 1)
For a simple two-by-two table the number of degrees of freedom is 1: d.f. = (2 – 1) × (2 – 1) = 1
(3) Then locate in the table the χ² value that corresponds to the chosen α-value and the number of degrees of freedom.
If the calculated χ² value is equal to or larger than the χ² value from the table, then the p-value is smaller than the chosen significance level (α-value). In this case, we reject the null hypothesis and conclude that there is a statistically significant difference between the groups.
If the calculated χ² value is smaller than the χ² value from the table, then the p-value is larger than the chosen significance level of 0.05. In this case, we do not reject the null hypothesis and conclude that the observed difference is not statistically significant.
As for the t-test, the null hypothesis is rejected if p < 0.05, which is the case if the calculated χ² value is larger than the theoretical χ² value in the table.
Let us now apply the χ² test to the data given in Example 2 (utilisation of antenatal care). This gives the following result:
Step 1: Calculating the χ² value
First the expected frequencies for each cell are calculated as follows:
E1 = 86 × 80/155 = 44.4     E2 = 69 × 80/155 = 35.6
E3 = 86 × 75/155 = 41.6     E4 = 69 × 75/155 = 33.4
For convenience, the observed and expected frequencies are shown in the following table:
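Table 29.3: Utilisation of antenatal clinics: observed and expected frequencies

| Distance from clinic | Used ANC | Did not use ANC | Total |
|---|---|---|---|
| Within 10 km | 51 (44.4) | 29 (35.6) | 80 |
| More than 10 km | 35 (41.6) | 40 (33.4) | 75 |
| Total | 86 | 69 | 155 |

Expected frequencies are shown in parentheses.
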
Note that the expected frequencies refer to the values we would have expected, given the total numbers of 80 and 75 women in the two groups, if the null hypothesis, stating that there is no difference between the two groups, were true.
Now the χ² value can be calculated:

$$\chi^2 = \frac{(51 - 44.4)^2}{44.4} + \frac{(29 - 35.6)^2}{35.6} + \frac{(35 - 41.6)^2}{41.6} + \frac{(40 - 33.4)^2}{33.4}$$

$$= 0.98 + 1.22 + 1.05 + 1.30 = 4.55$$
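The calculation in Step 1 can be sketched in Python for a table of any size. This is an illustrative sketch, not the module's own procedure: the function names are assumptions, and the observed counts are those of Example 2. Because the code does not round the expected frequencies, it gives a value of about 4.57 instead of the hand-calculated 4.55; the small difference comes only from rounding in the intermediate steps.

```python
def expected_frequencies(observed):
    """Expected frequency of each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (O - E)^2 / E over all cells of the table."""
    expected = expected_frequencies(observed)
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total


# Example 2: rows are the two distance groups, columns are used / did not use ANC.
observed = [[51, 29], [35, 40]]

print([[round(e, 1) for e in row] for row in expected_frequencies(observed)])
print(round(chi_square(observed), 2))
```

The same two functions work unchanged for larger tables, such as the two-by-three table that results from splitting distance into three categories.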
Step 2: Using the χ² table
As we have a simple two-by-two table, the number of degrees of freedom (d.f.) is 1.
Use the table of chi-square values in Annex 29.2. We have decided beforehand on a level of significance of 5% (α-value = 0.05).
As the number of d.f. is 1, we look along that row to the column where p = 0.05. This gives us the value of 3.84. Our value of 4.55 is larger than 3.84, which means that the p-value is smaller than 0.05.
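Step 2 can be sketched as a small lookup. This is a hedged illustration: the function and constant names are assumptions, and the dictionary holds only the first four rows of the standard χ² table for a significance level of 0.05 (3.841 is the value the text rounds to 3.84).

```python
# Theoretical chi-square values for a significance level of 0.05,
# indexed by degrees of freedom (first four rows of a standard table).
CHI2_CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi2_degrees_of_freedom(n_rows, n_cols):
    """d.f. = (r - 1) x (c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def reject_null(chi2_calculated, df, critical_values=CHI2_CRITICAL_005):
    """Reject the null hypothesis when the calculated value reaches the table value."""
    return chi2_calculated >= critical_values[df]


df = chi2_degrees_of_freedom(2, 2)   # two-by-two table of Example 2
print(df, reject_null(4.55, df))
```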
Step 3: Interpreting the result
We can now conclude that the women living within a distance of 10 km from the clinic utilise antenatal care significantly more often than the women living more than 10 km away.
It is important to present your data clearly and to carefully formulate any conclusions based on statistical tests in the final report of your study.
For the above example, you could present Table 29.2 in the report and state your conclusions in the following way:
‘Table 2 indicates that 64% of the women living within a distance of 10 km from the clinic used ante-natal care during pregnancy, compared to only 47% of women living 10 km or further away from the nearest clinic. This difference is statistically significant (χ² = 4.55; p < 0.05).’
Note:
In the above example one could decide to distinguish between three different distances: less than 5 km, 5 to 10 km and more than 10 km. The data would then be put in a two-by-three table. The number of degrees of freedom would be (3–1) x (2–1) = 2.
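For two-by-two tables there is a quick method for calculating the chi-square value, which can replace Step 1 described above.
If the various numbers in the cross-table are represented by the following letters:

| | | | Total |
|---|---|---|---|
| | a | b | E |
| | c | d | F |
| Total | G | H | N |

where E = a + b; F = c + d; G = a + c; H = b + d, and N is the grand total (note that here E stands for a row total, not for an expected frequency),
the quick formula for calculating the chi-square value for a two-by-two table is:

$$\chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} = \frac{N(ad - bc)^2}{EFGH}$$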
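The quick formula translates directly into a short function. This is a minimal sketch with an assumed function name, applied to the counts of Example 2 (a = 51, b = 29, c = 35, d = 40). It gives about 4.57, slightly above the hand-calculated 4.55, because no intermediate values are rounded.

```python
def chi_square_2x2(a, b, c, d):
    """Quick chi-square formula for a two-by-two table with cells a, b (first row) and c, d (second row)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


print(round(chi_square_2x2(51, 29, 35, 40), 2))
```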
Note:
Computers are helpful when dealing with large data sets. A variety of software programmes provide statistical tests, including p-values. A statistical calculator can also calculate chi-square values for you.
GROUP WORK
If your data was collected by unpaired observations, identify the appropriate significance test and perform the necessary analysis.
The first column lists the number of degrees of freedom. The headings of the other columns give the α-values for t to exceed the entry value. Use symmetry for negative values.

If the calculated t-value (ignoring the sign) is larger than the value indicated in the table, the p-value in your calculation is smaller than the chosen significance level (α-value) indicated at the top of the column.
In that case, the null hypothesis, stating that there is no difference, is rejected, and it can be concluded that there is a significant difference in the result of your study.

If the calculated χ² value is larger than the value indicated in the table, the p-value is smaller than the chosen level of significance (indicated at the top of the column).
In that case, the null hypothesis, stating that there is no difference, is rejected, and it can be concluded that the difference between the two groups in your study is statistically significant.
In Table 29.4 the results of a schistosomiasis survey among the inhabitants of two villages are presented.
Table 29.4: Prevalence of schistosomiasis in two villages, A and B

It seems that the prevalence of schistosomiasis is the same in both villages (32%).
However, the researchers suspect that age is a confounding variable. Therefore, Table 29.4 is split up into two tables (29.5 and 29.6). Note that adding the numbers in Tables 29.5 and 29.6 will give us Table 29.4.
Table 29.5: Prevalence of schistosomiasis in children aged 5-19 in villages A and B

χ² = 9.08; 1 degree of freedom; p < 0.01.
Table 29.6: Prevalence of schistosomiasis in those aged 20 years and above in villages A and B

χ² = 2.78; 1 degree of freedom; p > 0.05.
From Tables 29.5 and 29.6 it becomes clear that:
Age is said to be a confounding variable because it is related to the variable of interest (prevalence of schistosomiasis) and to the groups being compared (residence in Village A or B).
This example illustrates an important point in analysing data. It may be very misleading to pool dissimilar data. In this particular example, pooling the age groups masked an important real difference. In other situations pooling the data may suggest a difference or association that does not exist or even a difference opposite to that which exists.
It is, therefore, important to analyse the above data for the different age groups separately. The appropriate χ² values (with continuity correction) for comparing the prevalences in Villages A and B are shown in Tables 29.5 and 29.6. The difference in prevalence is significant for children but not for adults.
It is often useful to have a summary test that pools the evidence from the individual tables, but takes into account the confounding factor (age in our example). The Mantel-Haenszel χ² test for doing this will be described.
For each of the two-by-two tables we will use the notation:

Step 1. For each of the two-by-two tables,
Step 2. The Mantel-Haenszel chi-square (χ²MH) value is

$$\chi^2_{MH} = \frac{(|O - E| - 0.5)^2}{V}$$

with degrees of freedom = 1

Where:
O = the sum of the observed frequencies (Oa)
E = the sum of the expected frequencies (Ea)
V = the sum of the variances (Va)
0.5 is the continuity correction factor
To check for statistical significance, we use the χ² tables as discussed earlier.
Application:
In the example of the prevalence of schistosomiasis in the two villages, there are two two-by-two tables: one for those aged less than 20 years (children) and one for those aged 20 years and above (adults) (see Tables 29.5 and 29.6). From the two tables, the observed and expected frequencies are given below.
In the example, the calculations are:

O = 80, E = 64.4, V = 18.6

$$\chi^2_{MH} = \frac{(80 - 64.4 - 0.5)^2}{18.6} = \frac{15.1^2}{18.6} = 12.25 \quad (p < 0.001)$$
It can therefore be concluded that the prevalence of schistosomiasis is significantly different in Villages A and B. (Remember that this seemed not to be the case when we looked at Table 29.4, in which the data for both adults and children were pooled.)
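The Mantel-Haenszel calculation can be sketched in Python as below. Several things here are assumptions rather than content of the module: the function names, the cell layout (a, b in the first row and c, d in the second, with a as the observed frequency of interest), and the per-table formulas for the expected value and variance, which are the standard textbook ones. The two small tables used to exercise the code are hypothetical and are not the schistosomiasis data. With the rounded sums quoted in the text (O = 80, E = 64.4, V = 18.6) the function returns about 12.26, which agrees with the reported 12.25 up to rounding.

```python
def mh_components(a, b, c, d):
    """Observed, expected and variance terms for one two-by-two table.

    Assumed layout: a, b are the cells of the first row and c, d of the second;
    the standard textbook formulas are used for the expected value and variance.
    """
    n = a + b + c + d
    observed = a
    expected = (a + b) * (a + c) / n
    variance = (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    return observed, expected, variance


def mh_chi_square_from_sums(o_sum, e_sum, v_sum):
    """(|O - E| - 0.5)^2 / V, with 0.5 as the continuity correction."""
    return (abs(o_sum - e_sum) - 0.5) ** 2 / v_sum


def mh_chi_square(tables):
    """Pool the evidence from several two-by-two tables (one per stratum)."""
    parts = [mh_components(*table) for table in tables]
    o_sum = sum(p[0] for p in parts)
    e_sum = sum(p[1] for p in parts)
    v_sum = sum(p[2] for p in parts)
    return mh_chi_square_from_sums(o_sum, e_sum, v_sum)


# Sums quoted in the text for the two age groups.
print(round(mh_chi_square_from_sums(80, 64.4, 18.6), 2))

# Hypothetical strata, for illustration only (NOT the data of Tables 29.5 and 29.6).
hypothetical_tables = [(10, 10, 5, 15), (8, 12, 4, 16)]
print(round(mh_chi_square(hypothetical_tables), 2))
```

The resulting value is compared with the χ² table at 1 degree of freedom, whatever the number of strata.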
The Mantel-Haenszel χ² test is an approximate test. The rule for assessing its adequacy is more complicated than that for the ordinary χ² test. Two additional values are calculated for each table and summed over the tables. These are:
Both these sums should differ from the total of the expected values, Ea, by at least 5. The details of the calculation for the above example are:

These sums are 110 and 0, both of which differ from 64.4 (Ea) by more than 5. The use of the Mantel-Haenszel test is therefore valid.
Trainer’s Notes
Timing and teaching methods
| 1 hour | Introduction and discussion |
| 3 hours + | Group work |
When performing statistical tests make sure that each member of the group does at least one test on his or her own.