Many researchers feel triumphant when they discover a “statistically significant” finding, without really understanding what it means. When a statistic is significant, it simply means that we can be very confident the result is reliable, that is, unlikely to have arisen by chance. It does not mean that the finding is important or that it has any decision-making utility.
For example, suppose we give 1,000 people an IQ test and ask whether there is a significant difference between male and female scores. The mean score for males is 98 and the mean score for females is 100. We use an independent-groups t-test and find that the difference is significant at the .001 level. The big question is, “What does this imply?” The difference between 98 and 100 on an IQ test is a very small difference, so small, in fact, that it is not even important.
Then why did the t-test come out significant? Because the sample size was large. When the sample size is large, very small differences will be detected as significant. This means that we are very sure the difference is real (i.e., it didn’t happen by accident); it does not mean that the difference is large or important. Indeed, if we had taken a sample of only 25 people instead of 1,000, the two-point difference between males and females would not have been significant.
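The effect of sample size can be checked directly from summary statistics. The sketch below uses SciPy’s `ttest_ind_from_stats` with the means above, assuming a standard deviation of 15 for both groups (a typical value for IQ scales, not stated in the example):

```python
from scipy.stats import ttest_ind_from_stats

# Same two-point gap (means of 98 vs. 100), assumed SD of 15 for both groups.
# Large samples: 500 males and 500 females (1,000 people total).
t_large, p_large = ttest_ind_from_stats(mean1=98, std1=15, nobs1=500,
                                        mean2=100, std2=15, nobs2=500)

# Small samples: only 25 people per group.
t_small, p_small = ttest_ind_from_stats(mean1=98, std1=15, nobs1=25,
                                        mean2=100, std2=15, nobs2=25)

print(f"n=500 per group: t = {t_large:.2f}, p = {p_large:.4f}")  # below .05
print(f"n=25  per group: t = {t_small:.2f}, p = {p_small:.4f}")  # above .05
```

With the assumed SD of 15, the large-sample p-value falls below .05 while the small-sample p-value does not, even though the two-point difference is identical in both cases. (Whether the result also clears the .001 level quoted above depends on the actual spread of the scores.)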
Significance is a statistical term that tells how sure we are that a difference or relationship exists. To say only that a significant difference or relationship exists is not enough. We might be very sure that a relationship exists, but is it strong, moderate, or weak? After finding a significant relationship, it is important to evaluate its strength and magnitude. Significant relationships can be strong or weak, and significant differences can be large or small: whether a given difference or relationship reaches significance depends heavily on the sample size.
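The same point can be illustrated for relationships. For a Pearson correlation, the p-value follows directly from r and n, so a weak correlation that would be dismissed in a small sample becomes “significant” once the sample is large enough. A minimal sketch (the r value and sample sizes are illustrative, not from the text):

```python
from math import sqrt
from scipy.stats import t

def corr_p_value(r, n):
    """Two-tailed p-value for a Pearson correlation r based on n observations."""
    t_stat = r * sqrt((n - 2) / (1 - r ** 2))
    return 2 * t.sf(abs(t_stat), df=n - 2)

weak_r = 0.10  # a weak relationship by any standard
print(corr_p_value(weak_r, n=1000))  # below .05: "significant" yet still weak
print(corr_p_value(weak_r, n=25))    # above .05: not significant
```

A correlation of 0.10 explains only 1% of the variance either way; the large sample makes us sure the relationship exists, not that it matters.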
Many researchers use the word “significant” to describe a finding that may have decision-making utility to a client. From a statistician’s perspective, this is an incorrect use of the word. However, the word “significant” has a virtually universal meaning to the public. Thus, many researchers use it to describe a difference or relationship that may be strategically important to a client, regardless of any statistical tests. In these situations, the word “significant” is used to advise a client to take note of a particular difference or relationship because it may be relevant to the company’s strategic plan. The word “significant” is not the exclusive domain of statisticians, and either use is correct in the business world. For the statistician, then, it may be wise to adopt a policy of always referring to “statistical significance” rather than simply “significance” when communicating with the public.
One-Tailed and Two-Tailed Significance Tests
One important concept in significance testing is whether we use a one-tailed or two-tailed test of significance. The answer is that it depends on the hypothesis. When the research hypothesis states the direction of the difference or relationship, we use a one-tailed probability. For example, a one-tailed test would be used to test null hypotheses such as:
- Females will not score significantly higher than males on an IQ test.
- Blue collar workers will not buy significantly more product than white collar workers.
- Batman is not significantly stronger than the average person.
In each case, the null hypothesis (indirectly) predicts the direction of the difference. A two-tailed test would be used to test null hypotheses such as:
- There will be no significant difference in IQ scores between males and females.
- There will be no significant difference in the amount of product purchased between blue collar and white collar workers.
- There is no significant difference in strength between Batman and the average person.
For a symmetric test statistic, the one-tailed probability is exactly half the value of the two-tailed probability.
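The halving relationship is easy to verify for a symmetric statistic such as t. A small sketch (the t-value and degrees of freedom are arbitrary illustrative numbers):

```python
from scipy.stats import t

t_stat, df = 2.0, 48  # arbitrary illustrative values

# Two-tailed: probability of a result at least this extreme in either direction.
p_two = 2 * t.sf(abs(t_stat), df)

# One-tailed: probability of a result at least this extreme in the predicted direction.
p_one = t.sf(abs(t_stat), df)

print(f"two-tailed p = {p_two:.4f}")
print(f"one-tailed p = {p_one:.4f}")  # exactly half the two-tailed value
```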
There has been controversy (for about the last hundred years) over whether it is ever appropriate to use a one-tailed test. The argument against it is that if you already know the direction of the difference, why bother doing any statistical tests? While it is generally safe to use a two-tailed test, there are situations where a one-tailed test seems more appropriate. The bottom line is that it is the researcher’s choice whether to pose one-tailed or two-tailed research questions.
Procedure Used to Test for Significance
Whenever we perform a significance test, it involves comparing a test value that we have calculated to some critical value for the statistic. It doesn’t matter what type of statistic we are calculating (e.g., a t-statistic, a chi-square statistic, an F-statistic, etc.), the procedure to test for significance is the same.
- Decide on the critical alpha level you will use (i.e., the error rate you are willing to accept).
- Conduct the research.
- Calculate the statistic.
- Compare the statistic to a critical value obtained from a table.
If the statistic is higher than the critical value from the table:
- Finding is significant.
- Reject the null hypothesis.
- The probability is small that the difference or relationship happened by chance, and p is less than the critical alpha level (p < alpha).
If the statistic is lower than the critical value from the table:
- Finding is not significant.
- Fail to reject the null hypothesis.
- The probability is high that the difference or relationship happened by chance, and p is greater than the critical alpha level (p > alpha).
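The table-based decision above can be sketched in a few lines, using SciPy to supply the critical value that a printed t-table would give (the alpha level, degrees of freedom, and calculated statistic are illustrative):

```python
from scipy.stats import t

alpha, df = 0.05, 28   # chosen error rate and illustrative degrees of freedom
t_calculated = 2.31    # statistic computed from the (hypothetical) data

# Critical value for a two-tailed test: cut off alpha/2 in each tail.
t_critical = t.ppf(1 - alpha / 2, df)

if abs(t_calculated) > t_critical:
    print(f"|t| = {abs(t_calculated):.2f} > {t_critical:.2f}: "
          "significant, reject the null hypothesis")
else:
    print(f"|t| = {abs(t_calculated):.2f} <= {t_critical:.2f}: "
          "not significant, fail to reject the null hypothesis")
```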
Modern computer software can calculate exact probabilities for most test statistics. If we have an exact probability from computer software, we can simply compare it to our critical alpha level. If the exact probability is less than the critical alpha level, our finding is significant, and if the exact probability is greater than the critical alpha level, our finding is not significant. Using a table is not necessary when we have the exact probability for a statistic.
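With an exact probability in hand, the decision reduces to a single comparison against alpha. A minimal sketch (the statistic and degrees of freedom are illustrative):

```python
from scipy.stats import t

alpha = 0.05
t_calculated, df = 2.31, 28  # illustrative statistic and degrees of freedom

# Exact two-tailed probability from the t distribution -- no table needed.
p_exact = 2 * t.sf(abs(t_calculated), df)

if p_exact < alpha:
    print(f"p = {p_exact:.4f} < {alpha}: significant")
else:
    print(f"p = {p_exact:.4f} >= {alpha}: not significant")
```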