Wednesday, 30 June 2021

In praise of metrics during tenure review

Metrics, especially impact factor, have fallen badly out of favour as a mechanism for tenure review. There are good reasons for this: metrics have flaws, and journal impact factors clearly have flaws. It is important, however, to weigh up the pros and cons of the alternative systems being put in place, as they also have serious flaws.

To put my personal experience on the table: I've always worked in institutes with five-yearly rolling tenure. I've experienced two tenure reviews based on metrics and two based on soft measures. I've also been part of committees designing these systems at several institutes. I've seen colleagues hurt by metric-orientated systems, and colleagues hurt by soft measurement systems. There is no perfect system, but I think people seriously underestimate the potential harm of soft measurement systems.

Example of a metric-based system

When I first joined the VIB, they had a simple metric-based system. Over the course of 5 years, I was expected to publish 5 articles in journals with an impact factor over 10. I went into the system thinking that these objectives were close to unachievable, but the goals came with serious support that made them highly achievable.

For me, the single biggest advantage of the metric-based system was its transparency. It was not the system I would have designed, but I knew the goals and, more importantly, I could tell when I had reached them. Three years into my five-year term, I knew that I had met the objectives and that the five-yearly review would be fine. That gave me and my team a lot of peace of mind: we didn't need to stress about an unknowable outcome.

Example of a soft measurement system

The VIB later shifted to a system that is becoming more common, where output is assessed for scientific quality by the review panel rather than by metrics. The Babraham Institute, where I am now, uses a similar system. Different institutes have different expectations and assessment processes, but in effect these soft measurement systems all come down to a small review panel delivering a verdict on the quality of your science, with the instruction not to use metrics.

This style of assessment creates an unknown. You really don't know how the panel will judge your science until the day their verdict comes out. Certainly, such panels can save group leaders who would be hurt by metric-based systems, but equally they can fail group leaders who were productive, judging them more harshly, through biases introduced by the panel, than the manuscript reviewers did during peer review.

This in fact brings me to my central thesis: with either metrics or soft measurement systems, you end up having a small number of people read your papers and make their own judgement on the quality of the science. So let's compare how the two work in practice:

Metrics vs soft measurements

Under the metric-based system, my tenure reviewers were essentially the journal editors and external reviewers. To meet my metrics, I had to publish in journals with impact factors above 10, which gave me around 10 journals to aim at in my field. I had 62 articles during my first 5 years. If the average article went to two journals, each with an editor and 3 reviewers, that gives a pool of around 500 expert assessments (62 × 2 × 4 = 496) judging whether my work was of the quality and importance worthy of a major journal. There is almost certainly overlap in that pool, and I published a lot more than many starting PIs, but it is not unreasonable to think that 100 different experts weighed in. Were all of those reviews high quality? No, of course not. But I had the option to exclude particular reviewers, the reviewers could not have open conflicts of interest, the journal editor acted as an assessor of review quality, and I had the opportunity to rebut claims with data. Each individual manuscript review is reviewer roulette, a flawed process, but in aggregate it creates a body of work reviewed by experts in the field.
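The reviewer-pool estimate above is simple arithmetic; a minimal Python sketch makes its assumptions explicit: an average of two journals per article, and one editor plus three reviewers per submission. The function name is hypothetical. Note that the result counts assessments including overlap, so it is an upper bound on the number of distinct experts, not a count of them.

```python
# Back-of-the-envelope count of expert assessments behind a 5-year record
# under a metric-based system. Inputs mirror the assumptions in the text.

ARTICLES = 62               # articles published in the first 5 years
JOURNALS_PER_ARTICLE = 2    # assumed average number of journals per article
EDITORS_PER_SUBMISSION = 1
REVIEWERS_PER_SUBMISSION = 3


def expert_assessments(articles: int, journals_per_article: int,
                       editors: int, reviewers: int) -> int:
    """Total editor + reviewer assessments across all submissions (overlap included)."""
    submissions = articles * journals_per_article
    return submissions * (editors + reviewers)


total = expert_assessments(ARTICLES, JOURNALS_PER_ARTICLE,
                           EDITORS_PER_SUBMISSION, REVIEWERS_PER_SUBMISSION)
print(total)  # 496, i.e. "around 500"
```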

Consider now the soft measurement system. In my experience, institutes reviewed all PIs at the same time. Some institutes do this with an external jury of perhaps 10 individuals, of whom maybe only 1-3 are actually experts on your topic. Other institutes use an internal jury, perhaps 3-5 individuals in the most senior posts. In each case, an extremely narrow range of experts reviews a very large number of papers in a very short amount of time. In my latest review I had 79 articles over the prior 5 years. I doubt anyone actually read them all (I wouldn't expect them to). More realistically, I expect they read most of the titles, some of the abstracts, and perhaps 1-2 articles briefly. What would instead have heavily influenced the result is the panel's general opinion of my scientific quality, which is very dependent on the individuals involved. While both systems have treated me well, I have seen very productive scientists fall foul of this system simply because of major personality clashes with their head of department (who typically either selects the external board or chairs the internal jury). Indeed, I have seen PIs leave the institute rather than be reviewed under this system, and, in my experience, the system has been a heavier burden on women and immigrants.

Better metrics

As part of the board of the University of Leuven Department of Microbiology and Immunology, I helped to design a new system built as a composite of metrics. The idea was to keep the transparency and objectivity of metrics, but to use them responsibly and to ameliorate their flaws. The system essentially used a weighted points score built on different metrics. For publications from the prior 5 years, journal impact factor was used; for publications more than 5 years old, this was replaced by the actual citations of the article. Points were also given for teaching, Master's and PhD graduations, and various services to the institute. Again, each individual metric has inherent flaws, and the basket of metrics used could have been improved, but the ethos behind the system was that a portfolio of weighted metrics evens out some of the flaws while creating a transparent system.
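To make the idea of a weighted points score concrete, here is a minimal Python sketch of such a composite. The structure follows the description above (impact factor for recent papers, the article's own citations for older ones, plus points for teaching, graduations and service), but every weight, the exact 5-year cut-off rule, and all names (`WEIGHTS`, `Portfolio`, `portfolio_score`) are hypothetical illustrations, not the actual Leuven scheme.

```python
from dataclasses import dataclass, field

# Hypothetical weights for illustration only; the real system's weights are not given.
WEIGHTS = {
    "impact_factor": 1.0,       # points per unit of journal impact factor (recent papers)
    "citation": 0.1,            # points per citation (older papers)
    "teaching_hour": 0.05,
    "masters_graduation": 2.0,
    "phd_graduation": 5.0,
    "service_role": 3.0,
}
RECENT_YEARS = 5  # assumed cut-off: papers younger than this are scored by impact factor


@dataclass
class Publication:
    age_years: float
    journal_impact_factor: float
    citations: int


@dataclass
class Portfolio:
    publications: list[Publication] = field(default_factory=list)
    teaching_hours: float = 0.0
    masters_graduations: int = 0
    phd_graduations: int = 0
    service_roles: int = 0


def publication_points(pub: Publication) -> float:
    """Recent papers use the journal's impact factor; older papers use their own citations."""
    if pub.age_years < RECENT_YEARS:
        return WEIGHTS["impact_factor"] * pub.journal_impact_factor
    return WEIGHTS["citation"] * pub.citations


def portfolio_score(p: Portfolio) -> float:
    """Weighted sum across the whole basket of metrics."""
    return (sum(publication_points(pub) for pub in p.publications)
            + WEIGHTS["teaching_hour"] * p.teaching_hours
            + WEIGHTS["masters_graduation"] * p.masters_graduations
            + WEIGHTS["phd_graduation"] * p.phd_graduations
            + WEIGHTS["service_role"] * p.service_roles)


example = Portfolio(
    publications=[Publication(age_years=2, journal_impact_factor=12.0, citations=30),
                  Publication(age_years=7, journal_impact_factor=8.0, citations=150)],
    teaching_hours=40, masters_graduations=2, phd_graduations=1, service_roles=1,
)
print(round(portfolio_score(example), 2))  # 41.0
```

Keeping the weights in one table is the point of the design: the institute's values live in those numbers, and publishing them at the start of the review period is what makes the composite transparent.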

The path forward

I hope it is clear that I recognise the flaws present in metrics, but also that I consider metrics to confer transparency and to be a valuable safeguard against the inevitable political clashes that can drive decisions by small juries. In particular, metrics can safeguard junior investigators against the conflicts of interest that can dominate when a small internal jury has the power to judge the value of output. Just because metrics are flawed doesn't mean the alternatives are necessarily better.

In my ideal world (in the unlikely scenario that I ever become an institute director!), I would implement a two-stage review system using seven-year cycles. The first stage would be metric-based, using a portfolio of different metrics. These metrics would be in line with institute values, to drive the types of behaviour and output that are desired. The metrics would have provisions for parental or sick leave built into the system. They would be discussed with PIs at the very start of the review period, and then fixed. Everything would be above board, transparent, and realistic for PIs to achieve. Administration would track the metrics, eliminating the excess burden that constant reviewing places on scientists.

For PIs who didn't meet the metric-based criteria, a second system would kick in. This second system would be entirely metric-free and would instead focus on re-evaluating their contributions. By limiting this second evaluation to the edge cases, substantial resources could be invested to ensure that the re-evaluation was performed in as unbiased a manner as possible, with suitable safeguards. I would have a panel of 6 experts (paid for their time), 3 selected from a list proposed by the PI and 3 selected from a list proposed by the department head. Two internal senior staff would also sit on the panel, one selected by the PI and one by the department head. The panel would be given example portfolios of PIs who met the tenure-review criteria, to benchmark against. The PI would present their work and defend it. The panel would write a draft report and send it to the PI, who would then have the opportunity to rebut any points in the report, either in writing or as an oral defence, at the PI's choice. The panel would then decide whether the quality of the work met the institute's objectives.

I would argue that this compound system brings in the best of both worlds. For most PIs, the metric-based stage will bring transparency and reduce both stress and paperwork. PIs whose value the metrics don't adequately demonstrate get the detailed attention that is only possible when you commit serious resources to a review. Yes, it takes a lot of extra effort from the PI, the panel and the institute, which is why I don't propose running it for everyone.
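The routing logic of the proposed two-stage system can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the seven-year cycle and the panel composition (3 experts from each proposed list, plus one internal member chosen by each side) come from the text, while the pro-rata leave adjustment, taking the first eligible names from each list, and all function names are hypothetical placeholders for rules an institute would have to define.

```python
from dataclasses import dataclass

CYCLE_YEARS = 7          # review cycle length, from the text
EXTERNAL_PER_SIDE = 3    # experts drawn from each proposed list, from the text


def leave_adjusted_target(base_target: float, leave_years: float,
                          cycle_years: float = CYCLE_YEARS) -> float:
    """Hypothetical leave provision: scale the target to the time actually available."""
    active_years = min(max(cycle_years - leave_years, 0.0), cycle_years)
    return base_target * active_years / cycle_years


def stage_one(score: float, base_target: float, leave_years: float = 0.0) -> bool:
    """Stage 1: pass on metrics if the score meets the leave-adjusted target."""
    return score >= leave_adjusted_target(base_target, leave_years)


@dataclass
class Panel:
    external_from_pi: list[str]
    external_from_head: list[str]
    internal_from_pi: str
    internal_from_head: str

    def members(self) -> list[str]:
        return [*self.external_from_pi, *self.external_from_head,
                self.internal_from_pi, self.internal_from_head]


def assemble_panel(pi_list: list[str], head_list: list[str],
                   pi_internal: str, head_internal: str) -> Panel:
    """Stage 2 panel: 3 experts from each side's list and 1 internal member per side.
    Taking the first eligible names is a placeholder for a real selection rule."""
    pi_picks = pi_list[:EXTERNAL_PER_SIDE]
    head_picks = [e for e in head_list if e not in pi_picks][:EXTERNAL_PER_SIDE]
    if len(pi_picks) < EXTERNAL_PER_SIDE or len(head_picks) < EXTERNAL_PER_SIDE:
        raise ValueError("each side must propose at least three eligible experts")
    if pi_internal == head_internal:
        raise ValueError("the two internal members must be different people")
    return Panel(pi_picks, head_picks, pi_internal, head_internal)


def review(score: float, base_target: float, leave_years: float,
           pi_list: list[str], head_list: list[str],
           pi_internal: str, head_internal: str):
    """Run stage 1; only edge cases go to the resource-intensive stage 2 panel."""
    if stage_one(score, base_target, leave_years):
        return "pass on metrics", None
    panel = assemble_panel(pi_list, head_list, pi_internal, head_internal)
    # Presentation, draft report, PI rebuttal and the final decision are human
    # steps and are deliberately not modelled here.
    return "refer to panel", panel
```

For example, a PI with a score of 45 against a target of 50 who took one year of leave passes stage 1 (the adjusted target is 50 × 6/7, about 42.9), whereas the same score with no leave would be referred to an eight-member panel.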

TL;DR: it is all very well to celebrate when an institute says it is going to drop impact factors from its tenure assessment, but the reality is that the new systems put in place are often more political and subjective than the old ones. Thoughtful use of a balanced portfolio of metrics can actually improve the quality of tenure review while reducing the stress and administrative burden on PIs.

