
Posts Tagged ‘causality’

Neuropsychology and racism

13/04/2010

This could start an interesting discussion about the huge debate among scientists over questions of IQ, environment, race (in the sense of the social construct rather than any sort of precise ethnic identity), and idiocies like The Bell Curve. There are some very good criticisms to be made of IQ as a construct, as well as of the idea of an underlying g factor for general cognitive performance. If pressed, I’ll admit a great deal of sympathy for Neurodiversity, though I do take issue with parts of it.

With that in mind, I’d like to describe a study in the most recent issue of Current Biology, 20(7). In it, Dr. Santos and her colleagues compared attitudes towards race and gender in normal children and in children with Williams Syndrome (WS). ‘Normal’ here, as usual in the scientific literature, means free of any confounding clinical condition; it is not a value judgment.

Anyways. They had 20 normal children matched to 20 WS children, each group having 10 males and 10 females. They found that WS children, who because of their genetic condition lack the usual social fear of strangers, showed similar rates of gender stereotyping to the normal group, but lacked the negative racial stereotyping found in it. Each trial showed a picture accompanied by a short story that used one of six positive or negative qualifiers. The images were of two figures: a white girl and a black girl, a white boy and a black boy, a black man and a black woman, or a white man and a white woman. The children were asked to point to the figure in the pair to which the qualifier applied. Responses were compared against 50%, the rate expected if children assigned the qualifier to either figure at chance.
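To make ‘compared against 50%’ a bit more concrete, here is a minimal sketch of one standard way to test whether a count of choices differs from chance: an exact two-sided binomial test. The function name and the counts (10 out-group choices in 12 trials) are hypothetical, and the paper’s own analysis may well have used a different test.

```python
from math import comb

def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided binomial test against chance (p = 0.5).

    Because the chance distribution is symmetric at p = 0.5, the
    two-sided p-value is twice the smaller tail, capped at 1.
    """
    # Probability of each possible count k if choices were pure coin flips
    probs = [comb(trials, k) * 0.5 ** trials for k in range(trials + 1)]
    lower_tail = sum(probs[: successes + 1])  # P(X <= successes)
    upper_tail = sum(probs[successes:])       # P(X >= successes)
    return min(1.0, 2 * min(lower_tail, upper_tail))

# Hypothetical child: negative qualifier given to the out-group figure on 10 of 12 trials
print(round(binomial_two_sided_p(10, 12), 3))
```

With these made-up numbers the p-value comes out at about 0.039, below the usual 0.05 cut-off, whereas 6 of 12 (exactly chance) gives p = 1.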

This suggests that social fear of strangers, which some of the genes missing in WS normally help engender, plays a strong role in developing negative racial attitudes, but that negative gender attitudes develop through some other mechanism. Based on their previous studies, the authors suggest social imitative learning and overgeneralization as potential roots.

There is a plausible neural basis for all of this. In humans the amygdala is known to be involved in fear/emotional responses and learning. People with WS have abnormal regulation of the amygdala by the prefrontal cortex (involved in executive functions like behavioural initiation and inhibition) and decreased interaction between the amygdala and the fusiform face area (FFA). This suggests two factors at work: people with WS do not recognize out-group faces as out-group, and no fear response is evoked in the amygdala.

Now, we shouldn’t immediately go out and delete a chunk of our chromosome 7. People with WS have a host of problems, as with most chromosomal disorders. Still, studies like this are useful for teasing out some of the biological bases of social attitudes, as well as making monistic naturalists like me pretty happy. They also show how people with natural differences can help with research not strictly connected to their difference. Rather than lesioning a bunch of people along the amygdala and FFA, we can find a group of people with very well-defined differences in neural structure and development who share a particular phenotype. Comparing them to ‘normal’, otherwise matched individuals can lead to very fruitful results we could not otherwise discover. Obviously, such research needs to be carried out in partnership with the individuals involved, rather than studying them as freaks or weird abnormalities. But if you can do that, you can find out some pretty nifty things.

Santos A, Meyer-Lindenberg A, Deruelle C. Absence of racial, but not gender, stereotyping in Williams syndrome children. Current Biology 20(7):R307-R308.

Multiple comparisons and coincidence

Statistics refers to various methods of looking at subsets of whole populations, called samples. Because it is usually impossible to measure the entire population of whatever you are looking at (i.e., when polling you can’t call /every/ American), we use samples of those populations and then extrapolate our results from the sample.

In science as in polling, you want a representative sample, or to be comparing similar groups, so that your sample resembles the population. Single traits in the population, like height, tend to be distributed along a bell curve: there are a lot of people of middle height, with some very short and some very tall people.

That’s a ‘normal’ curve, where the mean (average), the median (middle score) and the mode (most frequent score) are all equal. There’s some fancier stuff about how tall or spread out the curve is (the standard deviation, for example, measures the spread), but that’s essentially it.

Now, most common statistical tests in science assume that whatever you’re measuring is normally distributed. They then compare the means of the different groups in the experiment. See, there’s no way of knowing just from looking at two groups whether they’re really different, or whether the differences are due only to chance. If you’ve measured, say, the reaction times of a medicated vs. a non-medicated group, you can’t just say ‘The medicated group had a mean reaction time of 500ms, and the non-medicated group had a mean reaction time of 450ms, therefore the medication slows reaction time.’ This might not be true: because of all the potential sources of error, the difference in reaction time can’t automatically be attributed to the medication. What you need to know is /how/ different the reaction times are, and whether you can say that difference is /significant/.

In the most general terms, what you would do here is find the difference between the two group averages, and then divide it by a measure of the combined error of both groups (the standard error of the difference). That’s a bit tricky to get into, so for now please take my word for it that you can get a mathematical representation of the error in the measurement (it’s like the +/- 0.05mL from high school science).
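Here is a minimal sketch of that ‘difference divided by error’ score, using the reaction-time example. The reaction times are made up so that the group means come out at 500ms and 450ms, and I’ve used Welch’s version of the t statistic, which doesn’t assume the two groups have equal variances; other versions pool the variances instead.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(group_a: list[float], group_b: list[float]) -> float:
    """Difference between group means divided by the standard error of that difference (Welch's form)."""
    # Each group's sample variance divided by its size, summed, gives the squared standard error
    se_diff = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se_diff

# Hypothetical reaction times in ms, not real data
medicated = [510, 495, 520, 480, 505, 490]      # mean 500
unmedicated = [455, 440, 460, 445, 450, 450]    # mean 450

print(round(t_statistic(medicated, unmedicated), 2))
```

The same 50ms difference would give a much smaller t if the scores within each group were more spread out, which is the whole point: the difference only means something relative to the error.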

If you’re doing this by hand, you then compare your score to a table of critical values, which depend on your sample size. Normally, the significance cut-off we use is a p-value of < 0.05. For a two-tailed test, this means our score falls into the top or bottom 2.5% of the bell curve of scores we’d expect if there were no real difference, so a difference as large as the one we found would be pretty unlikely to occur by chance alone. Polling data reported as accurate '19 times out of 20' uses the same 5% threshold: the margin of error should capture the true value in 95% of samples, leaving a 5% chance that a given result is off by more than that.
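As a rough stand-in for the lookup table, here is a sketch that gets a two-tailed p-value straight from the normal curve. It assumes the score follows a standard normal distribution; for small samples a real t-test uses the t distribution, whose cut-offs are somewhat larger than the normal curve's 1.96.

```python
from math import erfc, sqrt

def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a score on the standard normal curve.

    erfc(|z| / sqrt(2)) is the probability of landing at least |z|
    standard deviations from the mean, in either direction.
    """
    return erfc(abs(z) / sqrt(2))

ALPHA = 0.05
for z in (1.0, 2.0, 2.5):
    p = two_tailed_p(z)
    print(f"z = {z:.1f}: p = {p:.3f}, significant: {p < ALPHA}")
```

A score of about 1.96 sits right at the edge of the top 2.5%, which is where p crosses 0.05.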

Now, this is fine as it goes. However, if you want to compare more than two groups, things get tricky, because you'll often need to compare each group to every other one. Instead of a single comparison you'll have 3 (3 groups), 6 (4 groups), 10 (5 groups), and so on; in general, k groups give k(k − 1)/2 comparisons.
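A quick sketch to check those numbers: it lists every pair of groups explicitly and confirms the count matches k(k − 1)/2. The function name is my own.

```python
from itertools import combinations

def pairwise_comparisons(n_groups: int) -> int:
    """Number of distinct pairs among n groups: n(n - 1) / 2."""
    return n_groups * (n_groups - 1) // 2

for k in range(2, 6):
    pairs = list(combinations(range(1, k + 1), 2))  # every (group i, group j) with i < j
    assert len(pairs) == pairwise_comparisons(k)    # explicit listing agrees with the formula
    print(k, "groups:", len(pairs), "comparisons")
```

The count grows roughly with the square of the number of groups, which is why the problem below gets out of hand quickly.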

Now, if I ask you to pull one gold coin out of a jar that also holds 19 brass ones, you have a 5% chance of getting the gold one. The next time I ask you to do it, you still have a 5% chance, assuming you put the coin you pulled out first back inside. But across both draws you have a 9.75% /cumulative/ chance of drawing the gold coin at least once. After three draws, you have a 14.3% chance. And so on. So how do we keep the overall chance at 5%?
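The cumulative numbers come from a simple rule: the chance of missing the gold coin on every one of n draws is 0.95 to the power n, so the chance of hitting it at least once is 1 minus that. A minimal sketch, assuming each draw is independent (the coin goes back in the jar):

```python
def cumulative_chance(per_draw: float, draws: int) -> float:
    """Chance of at least one 'hit' across independent draws: 1 - (1 - p)^n."""
    return 1 - (1 - per_draw) ** draws

for n in (1, 2, 3, 10):
    print(n, "draws:", f"{cumulative_chance(0.05, n):.2%}")
```

By ten draws you're already at about 40%.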

The same problem occurs in statistics. The more comparisons you do, the higher the likelihood of getting a significant result through chance alone. Thus, we use a variety of techniques to make sure we're still looking at results which are unlikely to be found through chance alone. The method I use most often is a fairly conservative one (i.e., I'm more likely to discard a real difference as non-significant) called Bonferroni correction, which makes my significance criterion stricter for every comparison I plan to do: essentially adding 20 coins to the jar for every time I plan to draw. Others do the equivalent of adding 20 coins after every new draw.
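In the coin analogy, a jar holding 20 × m coins for m planned draws gives a per-draw chance of 1/(20m), i.e. 0.05/m, which is exactly the Bonferroni threshold. The sketch below (function names are my own) divides the 0.05 cut-off by the number of comparisons and checks the overall chance of at least one false positive, assuming the tests are independent and no real effects exist.

```python
def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons

def familywise_error(per_test_alpha: float, n_comparisons: int) -> float:
    """Chance of at least one false positive across independent tests with no real effects."""
    return 1 - (1 - per_test_alpha) ** n_comparisons

for m in (1, 3, 6, 10):
    corrected = bonferroni_alpha(0.05, m)
    print(
        f"{m} comparisons: threshold {corrected:.4f}, "
        f"uncorrected overall {familywise_error(0.05, m):.3f}, "
        f"corrected overall {familywise_error(corrected, m):.4f}"
    )
```

The corrected overall chance always stays at or just under 5%, which is what makes the method conservative: with many comparisons, each individual threshold becomes very strict.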

Now, where am I going with all this, other than giving some nifty insight into science and statistics?

Well, watch this and then come back:

The parts I'm thinking of start at about 1:20 and again at about 6:30. Coincidences seem marvellously unlikely to us, but many of them are simply due to multiple comparisons. The more times we 'compare' things, the more likely we are to get a match, no matter how unlikely that match is on any single comparison. The more people you meet, the more likely you are to meet someone else with a one-legged uncle named Charles. Spectacularly unlikely, yes, but if every person you meet counts as another comparison of means, or uncles, it becomes less and less unlikely.
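A classic illustration (not necessarily the one in the video) is the birthday problem: in a group of people, every pair is another comparison, so the number of comparisons grows much faster than the group. A minimal sketch, assuming 365 equally likely birthdays and ignoring leap years:

```python
def shared_birthday_chance(n_people: int, days: int = 365) -> float:
    """Chance that at least two of n people share a birthday."""
    p_all_different = 1.0
    for i in range(n_people):
        # The (i + 1)-th person must avoid the i birthdays already taken
        p_all_different *= (days - i) / days
    return 1 - p_all_different

for n in (10, 23, 50):
    pairs = n * (n - 1) // 2
    print(f"{n} people, {pairs} pairs: {shared_birthday_chance(n):.1%}")
```

With just 23 people there are 253 pairs, and the chance of a shared birthday is already a little over 50%.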

This generalizes quite a bit. Lightning striking a church? There is a lot of lightning, and a lot of churches. Ignoring any real effects (churches are often on hills, have spires with metal bits, etc), by chance alone you'd almost have to /expect/ it to happen to some church over time.

Many things that seem unlikely are unlikely only in isolation – given the large number of things on Earth, and the fact that things happen over time, seemingly unlikely things are going to happen fairly regularly, especially when viewed in retrospect.

We need to try to correct for this. And that's what science strives for.

Experimental Design: Correlational Studies and Causality

The last post reminded me of how easy it is to get bogged down in definitions. As one more comment on it, I had a short conversation with someone in which we found that scientific (or artistic) progress can be seen as progress in the dictionary sense because these fields a) involve a body of knowledge or a repertoire of skills and concepts, and b) have limited, self-defined values. Social progress is trickier, both because the term has been used in the past to justify the value system of those in power, and because society is such a broad concept that there are no initial, self-defined values equivalent to science’s value of expanded and corrected knowledge.

Anyways. On to the body of this post, which is a synthesis of material from my stats class and from this post on www.sciencebasedmedicine.org.

There are basically two different types of scientific studies, correlational and experimental.

Correlational studies are used to examine the relationship between two or more phenomena. These studies typically only measure phenomena, and make no attempt to manipulate them. Correlational studies are often considered useful mainly to guide future experimental studies, but they have their own value. They are used heavily in psychology, where it can be difficult to manipulate certain phenomena without affecting others, and where some useful manipulations are ethically prohibited.

A good example, given in the lecture, would be a study looking at socioeconomic status (SES) and severity of depressive symptoms. Proceeding on the assumption that you have already sorted out how you are defining and measuring SES and depression, you can gather some participants and figure out their SES and their symptoms, typically through self-report questionnaires, though you could use other sources as well. Most diagnoses of depression, for example, use reports from the patient, family, friends, their doctor, etc, instead of relying entirely on self-report.

Now that you have these two measures for all of your participants, you can plot SES against symptom severity on a graph and calculate the correlation coefficient. When looking at social factors like SES and depression, a moderate-to-strong correlation is usually one above 0.5 in size (whether positive or negative). The coefficient measures how tightly the points cluster around a straight line; it equals the slope of that line only when both variables are first converted to standard (z) scores. This is a very simple sort of study; most correlational studies are much more involved, as you will see when I eventually post on my own research.
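Here is a minimal sketch of the calculation, using made-up SES and symptom scores (the numbers and helper names are hypothetical). It computes Pearson’s r, the raw slope of the best-fit line, and the slope after converting both variables to z-scores, to show that only the last one matches r.

```python
from math import sqrt
from statistics import mean, pstdev

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def z_scores(values: list[float]) -> list[float]:
    """Convert values to standard scores (mean 0, standard deviation 1)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical data: SES on a 1-10 scale and a depression symptom score
ses = [2, 3, 4, 5, 6, 7, 8, 9]
symptoms = [30, 27, 29, 22, 20, 21, 15, 14]

print(f"r = {pearson_r(ses, symptoms):.2f}")
print(f"raw slope = {slope(ses, symptoms):.2f}")
print(f"slope on z-scores = {slope(z_scores(ses), z_scores(symptoms)):.2f}")
```

With these invented numbers r is about -0.96 (higher SES, fewer symptoms), while the raw slope is about -2.36 symptom points per SES point; the slope on z-scores matches r.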

Now, there are three major problems with inferring causality from correlational studies. Firstly, a correlation tells you nothing about the direction of cause and effect. Does someone have a lower SES because they are more depressed, and thus have a harder time at work? Or are they depressed because of all the stresses of low SES? It is tempting to say ‘both’, but this is not necessarily the case. The second problem is that both variables might in fact be correlated with a third variable. Perhaps SES correlates with years in school, and people who spend fewer years in school are more prone to depression. Lastly, correlations can be entirely coincidental. Thus, all correlations need to be looked at in light of prior plausibility. The price of tea in China may be strongly correlated with the number of newborns each month, but there is unlikely to be a direct causal relationship.
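The third-variable problem is easy to simulate. In the hypothetical model below, years of schooling drives both SES and symptoms, and SES has no direct effect on symptoms at all, yet the two still correlate at about -0.5. Partial correlation (one standard technique, not something from the lecture) asks what is left of that relationship once the linear influence of schooling is removed; here it drops to roughly zero. The model, noise levels and seed are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import fmean

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def partial_r(x: list[float], y: list[float], z: list[float]) -> float:
    """Correlation between x and y after removing the linear influence of z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

random.seed(1)
n = 20_000
# Hypothetical model: schooling raises SES and lowers symptoms; SES itself does nothing
schooling = [random.gauss(0, 1) for _ in range(n)]
ses = [s + random.gauss(0, 1) for s in schooling]
symptoms = [-s + random.gauss(0, 1) for s in schooling]

print(f"r(SES, symptoms) = {pearson_r(ses, symptoms):.2f}")
print(f"partial r, controlling for schooling = {partial_r(ses, symptoms, schooling):.2f}")
```

Of course, this only works if you thought to measure the third variable in the first place, which is part of why correlational findings need the cautious treatment below.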

So how can we determine causality in correlational studies?

We can’t, not without experimental evidence, but we can infer it using some guidelines. Two major ideas come from John Stuart Mill. The method of concomitant variation states that if two phenomena vary together in a particular way (such as rising together, or one rising whenever the other falls), then there is a causal link between them, though we cannot determine which is cause and which effect. This is a useful starting point, but it is susceptible to the second and third problems: some third factor may be at work, or it may be happenstance (though this is less likely under this method). The method of difference states that if a phenomenon occurs in one instance but not another, and the circumstances are the same in each instance except for one factor, that factor is part of the cause of the phenomenon. This is the origin of the control group in studies, something I will return to in a later post.

A more recent set of criteria for causality was created by Austin Bradford Hill in 1965. He gives 9 criteria to be considered before causality can be inferred from medical studies. The criteria, though, apply broadly to other fields of study.

1) Strength. How strong is the association between SES and symptom severity? If it is low, around 0.3, then a causal relationship is less likely than if it is closer to 0.8.

2) Consistency. How many studies have found the same correlation? If several other studies have found a low or no correlation between SES and symptom severity, one should be wary of assigning a causal relationship.

3) Specificity. This criterion is more applicable to medicine, but can be used here. If your SES measure includes a lot of things, such as schooling, income, employment, etc, then your causal relationship with symptom severity is much more confused than if SES is only family income.

4) Temporality. Another one which is difficult to apply to our example, this criterion states that cause should precede effect. A well-designed study could tease it out, though: if you measured the same participants at several points in time, you could see whether a drop in depression preceded a rise in SES, or vice versa.

5) Biological gradient. Better termed ‘effect gradient’ for our purposes, this means the size of the effect should change with the amount of the supposed cause (a dose-response relationship). It overlaps with the first criterion but is not quite the same: strength asks how large the association is, while the gradient asks whether more of the cause brings more of the effect. In a different setting, this could be an increase in cancer risk with an increase in radiation dose, which creates a stronger causal link than just radiation being associated with the incidence of cancer.

6) Plausibility. Can you explain your findings? Do they make sense? There are good reasons to suspect lower SES is associated with depression, but if our question was SES with hair colour, we should be cautious in inferring a causal relationship even in light of a fairly strong correlation. This is an extremely important criterion in science, one which is ignored too often, and prior plausibility is a topic that will come up frequently on this blog.

7) Coherence. Do the findings fit into our existing understanding? If there are studies showing people with low SES spend most days singing and dancing, could our findings be mistaken? Coherence is trickier, because it means evaluating existing evidence in light of your own. Perhaps your findings are correct and the previous studies have been biased or poorly framed. The stronger the evidence for the framework into which your results must fit, the more important this criterion becomes. This is often forgotten by those wishing to paint themselves as a modern-day Galileo.

8) Experimentation. This is still ultimately required, to move beyond inference to demonstrating an actual causal relationship and describing the direction of that relationship.

9) Analogy. Are there other studies which have found a variation in severity of depressive symptoms in other populations? Are people with lower education more depressed than those with more? Similar findings by analogy can support your own, though as with coherence one needs to be cautious.

I’ll just mention now – one cannot just apply these as a rubric for saying ‘yes, our findings suggest a causal relationship’. One needs to be careful to think each item through beforehand, and without #8, any definite conclusions are tricky. As well, just because you have a correlation does not mean you have found anything useful. Statistics do nothing to help bad science.

So what use is a correlation?

Well, say after everything you do manage to reasonably conclude there is a relationship between SES and severity of depressive symptoms. Even without establishing a direct causal link, you still have useful information. You know that people with low SES are more likely to have more severe depression – thus, those patients can be treated with priority.

Well, ideally. I’m sure that will come up later.

I’ve promised just in this post another three or four posts, but give me some time. I hope this has been interesting, and ideally a bit more focused than the last one. If you have any questions, or want to correct me on anything, please feel free to comment.
