Monday 2 May 2011

How not to measure scientific productivity

Ever since funding agencies started demanding some way to measure a return on their investment, administrators have been coming up with ways to measure scientists' performance. And scientists have been coming up with ways to maximize their value under any particular measuring scheme. In every generation of this game of performance measurement, scientists produce carefully thought-out articles on how the current system simply doesn't capture scientists' work accurately.
A recent Physical Review E paper is one of these, showing that the raw citation count doesn't really capture a paper's importance, nor a researcher's true performance. Yeah, amazing, isn't it?
To quickly review: scientists do their research, but none of it counts until it is published in a peer reviewed journal. Being published indicates that a body of research has overcome the minor hurdle of peer review—basically, peer review is an attempt to spot methodological and/or logic failures, and, in general, it works marginally better than a random selection process.
After the work is published, if you are lucky, other scientists read your work. If their own work relies in some way on your conclusions or methodology, they will credit you with a citation. They may also credit you with a citation along the lines of "we will show that Lee et al. were talking a complete load of cobblers and you should ignore everything in their paper."
The administrators of universities and research institutions want to examine this process and rank their scientists. What possible methods could they use?
One way would be to count papers. When this was tried, researchers simply packaged their research results into smaller and smaller units in order to maximize publication counts. This is probably also where guest authorship—putting an author on a paper they didn't contribute to—started to appear.
Maybe we can take into account the impact factor of the journals the research is published in? Sure, but a 30-page paper detailing the workings of quantum gravity is never going to be published in a top-ranked journal, even though its authors may well be lauded in their field.
Well, then, let's let the scientists themselves decide by counting how many times a bit of research is cited in the literature—surely that's fair.
Not so, according to Radicchi and Castellano. They analyzed 95 percent of the papers published by the American Physical Society between 1985 and 2010. They divided the papers up according to the author-selected subfield classification. With the papers sorted by discipline, they looked at average citation rates as a function of field and time. Naturally, they found that older papers are cited more often (well, duh). They also show that the average number of citations varies over an order of magnitude, depending on which subfield the authors are active in. This is also no surprise.
Their solution: an author's performance should be normalized in a field- and author-number-dependent manner. This involves taking a paper's citation count and dividing it by the average citation count for that subfield and publication year, and then by the number of authors. Do that, and papers no longer accumulate worth simply through age, which is a good thing.
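To make the arithmetic concrete, here is a minimal sketch of that normalization, assuming a simple average per (subfield, year) bin. The paper IDs, subfields, and citation counts are invented for illustration and are not taken from the actual APS dataset.

```python
from collections import defaultdict

papers = [
    # (paper_id, subfield, year, n_authors, citations) -- made-up example data
    ("A", "cond-mat", 2005, 3, 120),
    ("B", "cond-mat", 2005, 2, 40),
    ("C", "nucl-th",  2005, 5, 12),
    ("D", "nucl-th",  2005, 4, 4),
]

# Step 1: average citation count for each (subfield, year) bin.
totals = defaultdict(lambda: [0, 0])  # (subfield, year) -> [sum, count]
for _, subfield, year, _, cites in papers:
    totals[(subfield, year)][0] += cites
    totals[(subfield, year)][1] += 1
averages = {key: s / n for key, (s, n) in totals.items()}

# Step 2: divide each paper's citations by that average and by the
# number of authors to get the normalized score.
for pid, subfield, year, n_authors, cites in papers:
    score = cites / (averages[(subfield, year)] * n_authors)
    print(f"{pid}: raw={cites:4d}  normalized={score:.2f}")
```

Run on the toy numbers above, the heavily cited condensed-matter paper and the modestly cited nuclear-theory paper end up with scores of the same order, which is the whole point of the normalization—and also why a 100-author collaboration paper gets hammered by the author-count divisor.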
But normalizing by author numbers is myopic. Consider the researchers at the LHC. Any paper on, say, the Higgs boson is probably going to have over 100 authors. The implication is that authorship of this important paper should be devalued simply because it took a big team to do the work. It's absurd to think that the act of writing the paper is divided up evenly. But it is equally absurd to think that the detector physicists could have done the work without the model makers, or the beam line physicists, etc. Nobody's work is 100 times less valuable simply because there were that many authors.
The justification for this, of course, is to put pressure on scientists not to inflate author lists. But this is no simple issue. For instance, the person who gets the funding and oversees the lab probably didn't twiddle any of the knobs. The research may have been an outgrowth of experiments he or she proposed, without being directly his or her own idea.
In an ideal world, this person should probably not be on the author list; yet, without them, there is no research. And science has no way other than authorship to recognize the contribution of senior scientists whose direct role in the research can be rather diffuse, but who are still critical to obtaining the funding used by those who did the actual work.
But what really gets me is the banality of the analysis. Normalizing the data is hardly original. You could argue that there is some surprise that such a simple solution "works." But the reason it works is that the statistics behind citation rates in each subfield are identical... Um, yeah, that is a shocker.
If I am so deeply unimpressed, why am I reporting on this? Because the mere fact that this paper is published and will be cited indicates the uselessness of the entire system, and no amount of normalization can fix something that is fundamentally flawed.
It also implies that we think administrators are stupid and count citations in a vacuum. In fact, they do not. Most are ex-scientists themselves, and, yes, at some point they ask for that number. But do they compare the solid-state physicist with a paper that has 30,000 citations to the nuclear physicist whose work has 100? Not directly, they don't. And certainly not without context.
Of course, now that I have written this, the comments section will fill with counterexamples of clueless administrators doing precisely what I have said they don't do. For all of you in that situation, you have my sympathy—a stupid administrator can make life very unpleasant—but to convince me that this is a general problem, you would have to show that your experience is typical.
