Posts Tagged ‘experimental design’

Pilots.

So, I’m going to be running my first real experiment soon.

I’ve been in a strange position where I’ve designed studies and analyzed studies, but never actually run one. I’m a bit nervous, but luckily human research accounts for the chance that things will go wrong, and that there will be hitches, and that’s why pretty well everyone runs a pilot.

Pilot experiments come in a couple different flavours. Sometimes you aren’t sure about one of your technical parameters, or which specific type of stimuli to present. Maybe the literature doesn’t provide a good guide to how fast a target should move in a pursuit-rotor (follow the dot) task, or you’ve changed or invented a task and have no idea how well people will perform on it. You may want to avoid floor or ceiling effects, after all. And sometimes you need a couple data points to strengthen a funding application before you can actually afford the time or money to run the full study.

Other times you just want to run through the experiment a couple times to feel comfortable with it. I get the basic idea of repetitive transcranial magnetic stimulation, which this experiment uses, but I’ve never actually deactivated someone’s brain using it. This is where it’s helpful to have a science-positive friend who’s willing to let you zap their brain with a magnet. Or you can do what most labs do, and just try it out on your fellow lab-mates.

The primrose path to mad science.

14/12/2010 1 comment

So I’ve soured a bit on SVAR. I corrected my input and now my model doesn’t work, which is frustrating, and I don’t quite understand the math well enough to fix things without a good deal of trial and error. Still, were it not for the awesome folks at AFNI, I’d be further behind.

I was listening to a conversation in our lab, between one of my profs and a former student of his. They were talking about how a lot of speech or language theories rely on certain theoretical brain networks, and were grousing a bit about how few of these models reflected actual anatomy. In particular, certain regions which are thought to be connected in a network don’t actually have direct connections via white matter fibre tracts.

As you might recall, white matter in the brain, which generally forms the pith-looking part of the brain when you look at slices, is mostly made up of axons, the long wire-type bits of neurons which connect one neuron to other neurons. Thus, you have layers of grey matter, or neurons themselves, on the surface of the brain and axons under the surface, connecting neurons both between and within regions, as well as to other neurons deeper in the brain.

I was thinking about how you might find accurate fibre tracts, so you could know which regions are directly connected to others. One way is to insert tracers into cell bodies, wait until normal brain function has caused the tracer to be transported along those neurons’ axons, and then kill the subject and cut open the brain to see where the pathway leads. When doing research intended to apply to humans, this is often done in monkeys. Obviously, this is controversial enough in monkeys, and can’t be done in humans.

But monkey brains are not entirely like human brains. Sure, we’re both primates, but a lot of the functional difference between us and monkeys comes from our brain differences. There’s a lot of debate over which parts of the monkey brain are equivalent or “homologous” to which parts of the human brain. And that doesn’t even take into account that monkeys use different parts of their brains for some things than we do. So in humans, you can try various techniques and tricks.

A popular one is called “diffusion tensor imaging”, or DTI. This is a type of magnetic resonance imaging, which uses properties of water diffusion along axons to create images of fibre tracts. Don’t ask me the details – I’m still learning how to describe simple volumetric MRI. But part of the issue is it’s a difficult, representative reconstruction of actual fibres, and has certain limitations in how specifically it can describe the tracts.

So wouldn’t it be nice to trace them directly? I was thinking about this on the way home tonight. I mean, we do have extracted brains from corpses. You could stain a region and follow – but not really, because a brain needs to be fixed (hardened) in formaldehyde in order to retain any real structure after death. Okay, but presumably you could construct some artificial support, and maybe keep it in liquid. That’s just an engineering problem. That’s not a knock against engineers, by the way, but just a way of saying it’s possible barring the actual design and construction.

But the tracing methods I recall from undergrad need an active brain, which is transporting nutrients and the tracer down along axons. But… well, we could devote some research to temporarily sustaining a brain in some kind of pseudo-skull, in an oxygen-rich nutrient bath. No reason why it’s not doable, if complicated.

And that’s when it hit me.

I’d come from a fairly simple, abstract research question to putting living brains from corpses in jars.

And I believe, very much, that we are our brains, embodied in our flesh. A disembodied human brain would not be like a person – a body is a requisite part of that, to my mind – but it would have been a person. Imagine total sensory deprivation. Not just the mild sort done in experiments, but the sort where you literally cannot see, hear, smell, taste, or feel anything. At. All.

This never happens unless you are unconscious. There is always sensory feedback when we are the slightest bit awake and aware.

But I still think if you kept a brain alive, it would be alive. Conscious, in a way.

It really is a short step from scientist to mad scientist.

(Which is sort of cool in a way. Also – imagine the applications of living brains in jars! The possibilities are endless!)

Ahem. Right.

An exciting life

The discussions I had in my lab today:

Q: Should you align your functional images to your anatomicals, or vice-versa?
A: Anatomicals to functionals, since I’m looking at activity patterns.

Q: Why the hell is everything misaligned in the GUI now?
A: Who cares, once everything’s been aligned to standard space it won’t matter.

Q: Should you warp your images to standard space before or after convolving your hrf with a piece-wise linear spline and polynomial baseline model with autocorrelation matrix?
A: Further argument. Warping your betas afterwards means there’s less interpolation of data before the analysis, but doing it before makes things easier when doing lots of group comparisons. The argument is left unresolved.

Q: Should I be taking functional ROIs computed from a conjunction analysis of another study, or painting anatomical ROIs on my individual subjects?
A: Man, that last one sounds like a lot of work. We’ll try the first option for now.

It was actually all pretty exciting. I’m getting to know enough to have these discussions.

Multiple comparisons and coincidence

Statistics refers to various methods of looking at subsets of whole populations, called samples. Because it is usually impossible to use the entire population of whatever you are looking at (e.g., when polling you can’t call /every/ American), we use samples of those populations and then extrapolate our results from that sample.

In science as in polling, you do want to try to have a representative sample, or to be comparing similar groups, so that your sample resembles the population. Single traits in the population, like height, tend to be distributed along a bell curve, as below – there are a lot of people of middle height, with some very short and some very tall people.

That’s a ‘normal’ curve, where the mean (average) equals the mode (most frequent score). There’s some fancier stuff about how tall or spread-out the curve is, but that’s essentially it.

Now, most statistical tests that you do in science assume that whatever you’re measuring is normally distributed (as in that curve above). They then compare the means of the different groups in the experiment. See, there’s no way of knowing just from looking at two groups if they’re really that different, or if those differences are due only to chance. If you’ve measured, say, the reaction times of a medicated vs. a non-medicated group, you can’t just say ‘The medicated group had a mean reaction time of 500ms, and the non-medicated group had a mean reaction time of 450ms, therefore the medication slows reaction time.’ This might not be true – because of all the potential sources of error, the differences in reaction time can’t be solely attributed to the medication. What you need to know is /how/ different the reaction times are, and if you can say those differences are /significant/.

In the most general terms, what you would do here is find the difference between the two group averages, and then divide that by the total error measure of both groups. That’s a bit tricky to get into, so please for now take my word for it that you can get a mathematical representation of the error in the measurement (it’s like the +/- 0.05mL from high school science).
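To make that division a bit more concrete, here is a minimal sketch in Python. It assumes one particular version of the calculation (Welch's t statistic, which does not require the two groups to be equally spread out). The function name `welch_t` and the reaction times are made up for illustration and do not come from any real study.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Difference between the group means divided by their combined standard error."""
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Each group's squared standard error: sample variance over sample size.
    se_sq_a = statistics.variance(group_a) / len(group_a)
    se_sq_b = statistics.variance(group_b) / len(group_b)
    return mean_diff / math.sqrt(se_sq_a + se_sq_b)

# Hypothetical reaction times in milliseconds (illustration only).
medicated = [510, 495, 520, 480, 505, 490]
unmedicated = [455, 440, 470, 445, 450, 440]

t_score = welch_t(medicated, unmedicated)
print(f"t = {t_score:.2f}")
```

The further the score is from zero, in either direction, the less the difference looks like noise. Turning the score into a p-value is the table lookup step.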

If you’re doing this by hand, you then take your score and compare it to a table of scores, based on your sample size. Normally, what we use as the significance cut-off is a p-value of < 0.05. What this means is that our score falls into the top or bottom 2.5% on the curve above, and a difference as large as the one we found between our groups would be pretty unlikely to occur by chance alone. This is why polling data is reported as '19 times out of 20', since that leaves a 5% chance of the result being a coincidence.

Now, this is fine as it goes. However, if you want to compare more than two groups, things get tricky, because you'll often need to compare each group to one another, meaning instead of one comparison you'll have 3 (3 groups), 6 (4 groups), 10 (5 groups), and so on (in general, n groups give n(n-1)/2 pairwise comparisons).
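A quick way to check the number of pairwise comparisons is to list the pairs and count them. This is just a sketch; the function name `pairwise_comparisons` is my own.

```python
from itertools import combinations

def pairwise_comparisons(n_groups):
    """Number of distinct pairs you can form from n_groups groups."""
    return n_groups * (n_groups - 1) // 2

for n in range(2, 6):
    # Enumerate every pair of group labels and check the count against the formula.
    pairs = list(combinations(range(1, n + 1), 2))
    assert len(pairs) == pairwise_comparisons(n)
    print(n, "groups:", len(pairs), "comparisons")
```

For 2, 3, 4 and 5 groups this gives 1, 3, 6 and 10 comparisons.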

Now, if I ask you to pull one gold coin out of a jar of 19 brass ones, you have a 5% chance of getting the right one. And the next time I ask you to do it, you still have a 5% chance, assuming you put the coin you pulled out first back inside. But – the second time you do it you have a 9.75% /cumulative/ chance of drawing the gold coin. The third time, you have a 14.3% chance. And so on. So how do we keep the chance 5% each time?
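The cumulative numbers come from working out the chance of /never/ drawing the gold coin and subtracting that from 1. A minimal sketch, assuming the coin goes back in the jar after every draw (the function name is mine):

```python
def chance_of_at_least_one(p_single, draws):
    """Chance of at least one success in `draws` independent tries."""
    # (1 - p_single) ** draws is the chance of missing every single time.
    return 1 - (1 - p_single) ** draws

for draws in (1, 2, 3):
    pct = 100 * chance_of_at_least_one(0.05, draws)
    print(f"{draws} draw(s): {pct:.2f}%")
```

This gives 5%, 9.75% and about 14.26% for one, two and three draws.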

The same problem occurs during statistics. The more comparisons you do, the higher the likelihood of you getting a significant result through chance alone. Thus, we use a variety of techniques to make sure we're still looking at results which are unlikely to be found through chance alone. The method I use most often is a fairly conservative one (i.e., I'm more likely to discard a real difference as non-significant) called Bonferroni correction, which makes my significance criterion stricter for every comparison I do – essentially adding 20 coins for every time I plan to draw. Others do the equivalent of adding 20 coins after every new draw.
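In practice, Bonferroni correction just divides the significance cut-off by the number of comparisons you plan to make. Here is a rough sketch. The function names and p-values are invented for illustration, and the overall-error calculation assumes the comparisons are independent of each other.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison cut-off that keeps the overall error rate at or below alpha."""
    return alpha / n_comparisons

def familywise_error(per_test_alpha, n_comparisons):
    """Chance of at least one false alarm across independent comparisons."""
    return 1 - (1 - per_test_alpha) ** n_comparisons

# Hypothetical p-values from three planned comparisons (illustration only).
p_values = [0.004, 0.020, 0.049]
threshold = bonferroni_threshold(0.05, len(p_values))
survivors = [p <= threshold for p in p_values]

print(f"corrected cut-off: {threshold:.4f}")
print("still significant:", survivors)
print(f"uncorrected overall error: {familywise_error(0.05, 3):.4f}")
print(f"corrected overall error:   {familywise_error(threshold, 3):.4f}")
```

All three made-up p-values would pass the usual 0.05 cut-off, but only the first survives the corrected one. That is the conservative part.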

Now, where am I going with all this, other than giving some nifty insight into science and statistics?

Well, watch this and then come back:

From about 1:20 and again from 6:30 or so is what I'm thinking of. Coincidences seem marvellously unlikely to us, but the fact is that they are simply due to multiple comparisons. The more times we 'compare' things, the more likely we are to get a match, no matter how unlikely. The more people you meet, the more likely you are to meet someone else with a one-legged uncle named Charles. Spectacularly unlikely, yes, but if every person you meet counts as another comparison of means, or uncles, it becomes less and less unlikely.

This generalizes quite a bit. Lightning striking a church? There is a lot of lightning, and a lot of churches. Ignoring any real effects (churches are often on hills, have spires with metal bits, etc), by chance alone you'd almost have to /expect/ it to happen to some church over time.

Many things that seem unlikely are unlikely only in isolation – given the large number of things on Earth, and the fact that things happen over time, seemingly unlikely things are going to happen fairly regularly, especially when viewed in retrospect.

We need to try to correct for this. And that's what science strives for.

Experimental Design: Correlational Studies and Causality

Last post reminded me of how easy it is to get bogged down in definitions. As one more comment on it, I had a short conversation with someone in which we found that while scientific (or artistic) progress could be seen as progress in the dictionary sense, this is because they a) involve a body of knowledge or repertoire of skills and concepts, and b) have limited, self-defined values. Social progress is trickier, both because of the term’s use in the past to justify the value system of those in power and because society is such a broad concept that there are no initial, self-defined values equivalent to science’s value of expanded and corrected knowledge.

Anyways. On to the body of this post, which is a synthesis of things from my stats class and from this post from www.sciencebasedmedicine.org.

There are basically two different types of scientific studies, correlational and experimental.

Correlational studies are used to examine the relationship between two or more phenomena. These studies typically only measure phenomena, and make no attempt to manipulate them. Correlational studies are often considered useful mainly to guide future experimental studies, but they have their own value. They see a large amount of use in psychology, where it can be difficult to manipulate certain phenomena without affecting any others, and where some useful manipulations are ethically prohibited.

A good example, given in the lecture, would be a study looking at socioeconomic status (SES) and severity of depressive symptoms. Proceeding on the assumption that you have already sorted out how you are defining and measuring SES and depression, you can gather some participants and figure out their SES and their symptoms, typically through self-report questionnaires, though you could use other sources as well. Most diagnoses of depression, for example, use reports from the patient, family, friends, their doctor, etc, instead of relying entirely on self-report.

Now that you have these two measures for all of your participants, you can plot SES vs. symptom severity on a graph, as below, and calculate the correlation coefficient. When looking at social factors like SES and depression, a moderate-to-strong correlation is usually above 0.5. The coefficient is closely related to the slope of the line from the graph: it is what that slope would be if both measures were put on a standardized scale. This is a very simple sort of study – most correlational studies are much more involved, as you will see when I eventually post on my own research.
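For the curious, here is a small sketch of how the correlation coefficient relates to the slope of the best-fit line. The two only match once both measures are rescaled to standard units; otherwise the coefficient is the slope multiplied by the ratio of the two spreads. The scores and function names below are invented for illustration and are not real data.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def best_fit_slope(xs, ys):
    """Slope of the least-squares line predicting ys from xs."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / ss_x

# Hypothetical SES scores and symptom-severity scores (illustration only).
ses = [1, 2, 3, 4, 5, 6, 7, 8]
severity = [9, 7, 8, 6, 5, 5, 3, 2]

r = pearson_r(ses, severity)
slope = best_fit_slope(ses, severity)
# Rescaling the raw slope by the ratio of standard deviations recovers r.
rescaled = slope * statistics.stdev(ses) / statistics.stdev(severity)

print(f"r = {r:.3f}, raw slope = {slope:.3f}, rescaled slope = {rescaled:.3f}")
```

A negative coefficient here just means severity falls as SES rises in the made-up numbers.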

Now, there are three major problems with looking at causality from correlational studies. Firstly, they have no direction of cause-effect. Does someone have a lower SES because they are more depressed, and thus have a harder time at work? Or are they depressed because of all the stresses of low SES? It is tempting to say ‘both’, but this is not necessarily the case. The second problem is that both variables might in fact be correlated with a third variable. Perhaps SES correlates with years in school, and people who spend fewer years in school tend more to depression. Lastly, correlations can be entirely coincidental. Thus, all correlations need to be looked at in light of prior plausibility. The price of tea in China may be strongly correlated with the number of newborns each month, but there is unlikely to be a direct causal relationship.

So how can we determine causality in correlational studies?

We can’t, not without experimental evidence, but we can infer it using some guidelines. Two major ideas come from John Stuart Mill. The method of concomitant variation states that if any of the phenomena under study vary in a particular way (such as rising together, or one rises whenever the other falls), then there is a causal link, though we cannot determine which is cause and which effect. This is a useful starting point, but it is susceptible to the second and third problems; some third factor may be at work, or it may be happenstance (though this is less likely under this method). The method of difference states that if a phenomenon occurs in one instance, but not another, and the circumstances are the same in each instance except for one factor, that factor is part of the cause of the phenomenon. This is the origin of the control group in studies, something I will return to in a later post.

A more recent set of criteria for causality was created by Austin Bradford Hill in 1965. He gives a set of 9 criteria to be considered before causality can be inferred from medical studies. The criteria, though, have a good broad applicability to any field of study.

1) Strength. How strong is the association between SES and symptom severity? If it is low, around 0.3, then a causal relationship is less likely than if it is closer to 0.8.

2) Consistency. How many studies have found the same correlation? If several other studies have found a low or no correlation between SES and symptom severity, one should be wary of assigning a causal relationship.

3) Specificity. This criterion is more applicable to medicine, but can be used here. If your SES measure includes a lot of things, such as schooling, income, employment, etc, then your causal relationship with symptom severity is much more confused than if SES is only family income.

4) Temporality. Another one which is more difficult to apply to our example, this criterion states that cause should precede effect. A well-designed study could tease it out, though – if you took the measurements at several different times, you could see if a drop in depression heralded a rise in SES, or vice-versa.

5) Biological gradient. Better termed ‘Effect gradient’ for our purposes, this means a causal relationship should reflect the size of what is occurring, and it is closely related to the first criterion. In a different setting, this could be an increase in the severity of cancer with an increase in radiation dosage, which creates a stronger causal link than just radiation being associated with incidence of cancer.

6) Plausibility. Can you explain your findings? Do they make sense? There are good reasons to suspect lower SES is associated with depression, but if our question was SES with hair colour, we should be cautious in inferring a causal relationship even in light of a fairly strong correlation. This is an extremely important criterion in science, one which is ignored too often, and prior plausibility is a topic that will come up frequently on this blog.

7) Coherence. Do the findings fit into our existing understanding? If there are studies showing people with low SES spend most days singing and dancing, could our findings be mistaken? Coherence is trickier, because it means evaluating existing evidence in light of your own. Perhaps your findings are correct and the previous studies have been biased or poorly framed. The stronger the evidence for the framework into which your results must fit, the more important this criterion becomes. This is often forgotten by those wishing to paint themselves as a modern-day Galileo.

8) Experimentation. This is still ultimately required, to move beyond inference to demonstrating an actual causal relationship and describing the direction of that relationship.

9) Analogy. Are there other studies which have found a variation in severity of depressive symptoms in other populations? Are people with lower education more depressed than those with more? Similar findings by analogy can support your own, though as with coherence one needs to be cautious.

I’ll just mention now – one cannot just apply these as a rubric for saying ‘yes, our findings suggest a causal relationship’. One needs to be careful to think each item through beforehand, and without #8, any definite conclusions are tricky. As well, just because you have a correlation does not mean you have found anything useful. Statistics do nothing to help bad science.

So what use is a correlation?

Well, say after everything you do manage to reasonably conclude there is a relationship between SES and severity of depressive symptoms. Even without establishing a direct causal link, you still have useful information. You know that people with low SES are more likely to have more severe depression – thus, those patients can be treated with priority.

Well, ideally. I’m sure that will come up later.

I’ve promised just in this post another three or four posts, but give me some time. I hope this has been interesting, and ideally a bit more focused than the last one. If you have any questions, or want to correct me on anything, please feel free to comment.
