I have a confession: I find statistics fascinating. If you tell me something about the world, my first question is “how do you know?” And if you have some stats to back it up, I’m so much more likely to listen.
That said, I’m beginning to find that every time I need to actually use statistics in my research, the statistical tools that I’m equipped with never seem to be right for the task.
For instance, my boss recently wanted to know how much statistical power we had to detect a given effect size with a fixed sample size. Easy enough, you say (well, if you’re a statistician or epidemiologist, at least): just look up the formula, plug in your numbers and you’re good to go.
Well… of course, in the “real” world, it’s never that simple.
First, we wanted to compare two groups that were completely different sizes – while textbooks seem to only include formulae for comparing groups of the same size.
Second, instead of values distributed evenly(-ish) around a mean value (i.e. normally distributed), we expected a substantial number of observations to have very large values on the outcome of interest. Think of it like trying to test whether the heights of two groups of high school students differ, but having a random number of NBA players in each group – the NBA players are going to have a big effect on the average height if you calculate it using the mean. Researchers interested in our topic typically deal with this by comparing the median values of the outcome, rather than the mean values.
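To make the NBA analogy concrete, here is a tiny sketch in Python. The heights are invented purely for illustration (they are not from our data), and the variable names are just placeholders.

```python
import statistics

# Made-up heights in cm: six ordinary students plus two hypothetical NBA players.
students = [165, 168, 170, 172, 175, 178]
nba_players = [206, 211]
group = students + nba_players

# How far does each summary move once the two outliers join the group?
mean_shift = statistics.mean(group) - statistics.mean(students)
median_shift = statistics.median(group) - statistics.median(students)

print(f"mean moves by {mean_shift:.1f} cm")
print(f"median moves by {median_shift:.1f} cm")
```

With these particular numbers the mean jumps by roughly 9 cm while the median moves by only 2.5 cm, which is why the median is the more forgiving summary when a few extreme values are expected.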
Well, it turns out that formulae for power calculations for unbalanced samples are hard to find; formulae for power calculations for median values are even harder to find; and a formula for comparing medians with unbalanced samples turned out to be impossible to find (within my four-day deadline, at least!).
So what’s an intrepid young researcher to do? Improvise, of course!
I won’t bore you with the details (any more than I already have). But if you’re interested, or have a solution of your own, I’d love to hear from you in the comments section.
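For anyone who does want a flavour of what improvising can look like: one common workaround when no formula exists is to estimate power by simulation. The sketch below is an illustration under stated assumptions, not a record of the exact approach used here. It assumes the outcome is log-normally distributed (so a few very large values turn up), uses the rank-based Mann-Whitney U test as a stand-in for "comparing medians" (strictly, it tests for a shift in distribution), and relies on a normal approximation that ignores ties. The function names, group sizes and effect size are all hypothetical.

```python
import math
import random


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value via the normal approximation.

    Assumes continuous data with no ties and reasonably large groups.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Sum of the 1-based ranks belonging to the first group.
    rank_sum = sum(r for r, (_, g) in enumerate(pooled, start=1) if g == 0)
    u1 = rank_sum - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def simulate_power(n1, n2, shift, n_sims=2000, alpha=0.05, seed=1):
    """Estimate power as the share of simulated studies with p < alpha.

    `shift` is the difference in log-medians between the groups
    (an assumed effect size on the log scale).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a = [rng.lognormvariate(0.0, 1.0) for _ in range(n1)]
        b = [rng.lognormvariate(shift, 1.0) for _ in range(n2)]
        if mann_whitney_p(a, b) < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # Hypothetical unbalanced design: 30 observations versus 120.
    print(simulate_power(30, 120, shift=0.5))
```

The appeal of simulating is that unequal group sizes and skewed outcomes cost nothing extra: you change `n1`, `n2` or the generating distribution and rerun. The price is that the answer is only as good as the assumed distribution, so it is worth trying a few plausible ones.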