Last post reminded me of how easy it is to get bogged down in definitions. As one more comment on it, I had a short conversation with someone in which we found that while scientific (or artistic) progress could be seen as progress in the dictionary sense, this is because these fields a) involve a body of knowledge or repertoire of skills and concepts, and b) have limited, self-defined values. Social progress is trickier, both because of the term’s use in the past to justify the value system of those in power and because society is such a broad concept that there are no initial, self-defined values equivalent to science’s value of expanded and corrected knowledge.
Anyways. On to the body of this post, which is a synthesis of things from my stats class and from this post from www.sciencebasedmedicine.org.
There are basically two different types of scientific studies, correlational and experimental.
Correlational studies are used to examine the relationship between two or more phenomena. These studies typically only measure phenomena, and make no attempt to manipulate them. Correlational studies are often considered useful mainly to guide future experimental studies, but they have their own value. They see a large amount of use in psychology, where it can be difficult to manipulate certain phenomena without affecting any others, and where some useful manipulations are ethically prohibited.
A good example, given in the lecture, would be a study looking at socioeconomic status (SES) and severity of depressive symptoms. Proceeding on the assumption that you have already sorted out how you are defining and measuring SES and depression, you can gather some participants and figure out their SES and their symptoms, typically through self-report questionnaires, though you could use other sources as well. Most diagnoses of depression, for example, use reports from the patient, family, friends, their doctor, etc, instead of relying entirely on self-report.
Now that you have these two measures for all of your participants, you can plot SES vs. symptom severity on a graph, as below, and calculate the correlation coefficient. When looking at social factors like SES and depression, a correlation above 0.5 (in absolute value) is usually considered moderate-to-strong. The coefficient describes how tightly the points cluster around the line from the graph; it equals the slope of that line only when both measures are converted to standard scores. This is a very simple sort of study, and most correlational studies are much more involved, as you will see when I eventually post on my own research.
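For anyone who wants to try the calculation, here is a minimal sketch in Python. The SES scores and symptom scores are made up purely for illustration, and the helper names `pearson_r` and `slope` are my own. It computes the correlation coefficient and the slope of the best-fit line separately, then rescales the symptom scores to show that the slope changes with the units while the correlation does not.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def slope(xs, ys):
    """Least-squares slope of the line predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x

# Hypothetical data: SES on an 8-point scale, symptom severity on a questionnaire.
ses = [1, 2, 3, 4, 5, 6, 7, 8]
severity = [20, 18, 19, 15, 14, 12, 13, 9]

r = pearson_r(ses, severity)
b = slope(ses, severity)

# Same data with severity expressed in units ten times smaller.
severity_x10 = [10 * s for s in severity]
r_rescaled = pearson_r(ses, severity_x10)
b_rescaled = slope(ses, severity_x10)

print(f"r = {r:.3f}, slope = {b:.3f}")
print(f"after rescaling: r = {r_rescaled:.3f}, slope = {b_rescaled:.3f}")
```

The correlation comes out negative here because, in this invented data, higher SES goes with lower severity. What matters for judging strength is the size of the number, not its sign.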

Now, there are three major problems with looking at causality from correlational studies. Firstly, they give no direction of cause and effect. Does someone have a lower SES because they are more depressed, and thus have a harder time at work? Or are they depressed because of all the stresses of low SES? It is tempting to say ‘both’, but this is not necessarily the case. The second problem is that both variables might in fact be correlated with a third variable. Perhaps SES correlates with years in school, and people who spend fewer years in school are more prone to depression. Lastly, correlations can be entirely coincidental. Thus, all correlations need to be looked at in light of prior plausibility. The price of tea in China may be strongly correlated with the number of newborns each month, but there is unlikely to be a direct causal relationship.
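The third-variable problem is easy to see in a simulation. The sketch below is a toy model, not real data: I assume that years in school pushes SES up and depression scores down, and that SES and depression have no direct link at all. The effect sizes, the noise and the helper names are all invented for illustration. The partial correlation at the end is the standard formula for the correlation between two variables once a third has been held constant.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with the third variable z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

random.seed(0)  # fixed seed so the run is repeatable
n = 5000

# Assumed toy model: schooling drives both measures; they never touch each other.
schooling = [random.gauss(0, 1) for _ in range(n)]
ses = [z + random.gauss(0, 1) for z in schooling]
depression = [-z + random.gauss(0, 1) for z in schooling]

r_ses_dep = pearson_r(ses, depression)
r_ses_school = pearson_r(ses, schooling)
r_dep_school = pearson_r(depression, schooling)
r_partial = partial_r(r_ses_dep, r_ses_school, r_dep_school)

print(f"SES vs depression:              {r_ses_dep:.2f}")
print(f"same, with schooling held fixed: {r_partial:.2f}")
```

In this toy world the raw correlation between SES and depression is substantial, yet it all but disappears once schooling is accounted for. Of course, this only works if you thought to measure the third variable in the first place.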
So how can we determine causality in correlational studies?
We can’t, not without experimental evidence, but we can infer it using some guidelines. Two major ideas come from John Stuart Mill. The method of concomitant variation states that if the phenomena under study vary together in a particular way (such as rising together, or one rising whenever the other falls), then there is a causal link, though we cannot determine which is cause and which effect. This is a useful starting point, but it is susceptible to the second and third problems; some third factor may be at work, or it may be happenstance (though this is less likely under this method). The method of difference states that if a phenomenon occurs in one instance but not another, and the circumstances are the same in each instance except for one factor, that factor is part of the cause of the phenomenon. This is the origin of the control group in studies, something I will return to in a later post.
A more recent set of criteria for causality was created by Austin Bradford Hill in 1965. He gives a set of 9 criteria to be considered before causality can be inferred from medical studies. The criteria, though, have a good broad applicability to any field of study.
1) Strength. How strong is the association between SES and symptom severity? If it is low, around 0.3, then a causal relationship is less likely than if it is closer to 0.8.
2) Consistency. How many studies have found the same correlation? If several other studies have found a low or no correlation between SES and symptom severity, one should be wary of assigning a causal relationship.
3) Specificity. This criterion is more applicable to medicine, but can be used here. If your SES measure includes a lot of things, such as schooling, income, employment, etc, then your causal relationship with symptom severity is much more confused than if SES is only family income.
4) Temporality. Another one which is more difficult to apply to our example, this criterion states that cause should precede effect. A well-designed longitudinal study could tease it out, though. If you measured the same participants at several different times, you could see if a drop in depression heralded a rise in SES, or vice-versa.
5) Biological gradient. Better termed ‘Effect gradient’ for our purposes, this means a causal relationship should reflect the size of what is occurring: more of the cause should go with more of the effect. It is closely related to the first criterion. In a different setting, this could be an increase in the severity of cancer with an increase in radiation dosage, which creates a stronger causal link than just radiation being associated with incidence of cancer.
6) Plausibility. Can you explain your findings? Do they make sense? There are good reasons to suspect lower SES is associated with depression, but if our question was SES with hair colour, we should be cautious in inferring a causal relationship even in light of a fairly strong correlation. This is an extremely important criterion in science, one which is ignored too often, and prior plausibility is a topic that will come up frequently on this blog.
7) Coherence. Do the findings fit into our existing understanding? If there are studies showing people with low SES spend most days singing and dancing, could our findings be mistaken? Coherence is trickier, because it means evaluating existing evidence in light of your own. Perhaps your findings are correct and the previous studies have been biased or poorly framed. The stronger the evidence for the framework into which your results must fit, the more important this criterion becomes. This is often forgotten by those wishing to paint themselves as a modern-day Galileo.
8) Experimentation. This is still ultimately required, to move beyond inference to demonstrating an actual causal relationship and describing the direction of that relationship.
9) Analogy. Are there other studies which have found a variation in severity of depressive symptoms in other populations? Are people with lower education more depressed than those with more? Similar findings by analogy can support your own, though as with coherence one needs to be cautious.
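As a rough illustration of criterion 4, here is one way to look for temporal order when you have repeated measurements. This is a toy simulation under assumptions I have made up: each value of y is built from the previous value of x plus noise, and nothing flows the other way. The sketch compares the correlation of x with later y against the correlation of y with later x.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)

random.seed(1)  # fixed seed so the run is repeatable
n = 5000

# Assumed toy model: x at time t-1 drives y at time t; y never affects x.
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.0] + [0.8 * x[t - 1] + random.gauss(0, 0.5) for t in range(1, n)]

# Pair each measurement with the other variable one step later.
x_then_y = pearson_r(x[:-1], y[1:])  # does x predict later y?
y_then_x = pearson_r(y[:-1], x[1:])  # does y predict later x?

print(f"x now vs y one step later: {x_then_y:.2f}")
print(f"y now vs x one step later: {y_then_x:.2f}")
```

The asymmetry is suggestive rather than conclusive. A lagged correlation like this can still be produced by a third variable, so it supports the temporality criterion without replacing #8.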
I’ll just mention now – one cannot just apply these as a rubric for saying ‘yes, our findings suggest a causal relationship’. One needs to be careful to think each item through beforehand, and without #8, any definite conclusions are tricky. As well, just because you have a correlation does not mean you have found anything useful. Statistics do nothing to help bad science.
So what use is a correlation?
Well, say after everything you do manage to reasonably conclude there is a relationship between SES and severity of depressive symptoms. Even without establishing a direct causal link, you still have useful information. You know that people with low SES are more likely to have more severe depression – thus, those patients can be treated with priority.
Well, ideally. I’m sure that will come up later.
In this post alone I’ve promised another three or four posts, so give me some time. I hope this has been interesting, and ideally a bit more focused than the last one. If you have any questions, or want to correct me on anything, please feel free to comment.