First of all, a disclaimer. The opinions expressed here are mine and mine alone, and do not necessarily reflect the official opinions or policies of the NDCA.

Second, it is important to keep in mind that everything I’m saying here derives from LD. Policy pools are differently composed and differently managed from LD pools, and PF, of course, gets random judges.

What I wanted to do in this post, instead of talking about mutual preferences in a theoretical vacuum, was to look at exactly how mutual preferences have been working at tournaments. So I did some number crunching. Among the tournaments I tabbed last year with MJP, using first TRPC and then tabroom.com (which does a much better job with automated mutuality), are Yale, Bronx, Princeton and Columbia. These 4 tournaments add up to 2928 prelim rounds of Varsity LD. All were similarly divvied up into 6 tiers, 18% each of the first 5 and 8% strikes (plus varying numbers of conflicts). They were all large to very large fields, with judge pools hovering around 50% of the size of the field. I think that these numbers are large enough, and stable enough, for decent analysis.

Of the 2928 rounds:

2538 were paired 1-1. That is, 87%.

132 were 2-2. 5%.

This means that 93% (rounded) were either 1-1 or 2-2.

4% were 3-3. This means that 95% (taking into account the rounding) were 1-1, 2-2 or 3-3.

2% were 4-4; 1% were 5-5; 2% were 1-offs (usually 1-2, maybe lower); 1% were 2-offs (or worse).
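For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the shares from the two raw counts given above (2538 and 132 out of 2928). The other categories appear only as rounded percentages, so they are left out. The variable and function names are just illustrative. Note that these two counts alone come to a little over 91% combined, so the rounded figures should be read as approximate.

```python
# Raw counts as stated in the post; names here are illustrative only.
TOTAL_ROUNDS = 2928
counts = {"1-1": 2538, "2-2": 132}


def share(n, total=TOTAL_ROUNDS):
    """Return n as a percentage of total."""
    return 100.0 * n / total


# Percentage of all rounds for each mutual pairing with a stated count.
shares = {pairing: share(n) for pairing, n in counts.items()}

# Combined share of rounds that were either 1-1 or 2-2.
combined = share(sum(counts.values()))

for pairing, pct in shares.items():
    print(f"{pairing}: {pct:.1f}%")
print(f"1-1 or 2-2: {combined:.1f}%")
```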

Important note: Easily half of the “bad” pairings (3-3 or worse) are the result of teams’ being out of competition. As a rule, we would work toward ensuring that anyone down two or fewer got better prefs. The most egregious pairings are desperation ballot pushes, for which no system can be held accountable.

Minor note: Making it 6 equal tiers (more strikes) would have little statistical impact.

Another minor note: There are always some folks who don’t rank, but the quick comparisons I ran on the side suggest that more ranking would have a relatively small impact. And lately, not preffing is more and more of a rarity.

And here’s the bottom line: going by strict mutuality and 6 traditionally divided tiers, MJP in LD delivers better than 93% 1s and 2s in all in-competition rounds.

There is a lot of controversy over mutuality per se, or at least a lot of variation in how MJP is handled from tournament to tournament. (There is even a difference of opinion over whether to call it MJP or MPJ!) I have been arguing in favor of strict constructionism, on the assumption that equal desirability is the fairest way to adjudicate a round. The counterargument, which prefers the wrong side of a non-mutual 1-2 to a 3-3 or lower, values the judge’s ability to adjudicate over that judge’s favorability. It says that in a high-stakes round, a judge who might be more favorable to your opponent than to you is nevertheless preferable to one you perceive as less qualified. In other words, it assumes that anyone you’ve ranked a 3 or lower is simply not that qualified as a judge. (There are plenty of other reasons to rank a judge in your bottom half, but given their infrequency in the analysis above, they’re mostly beside the point.)

Those who know me know I’ve gotten my knickers in quite a twist over this 1-2 v. 3-3 argument. The evidence above demonstrates that I was silly to do so. At least in the LD world I’ve been inhabiting, it really isn’t an issue. In a world where there are 8-10 judges in each tier, i.e., 8-10 1s, 8-10 2s, etc., which is the usual world I’ve been tabbing for as many years as I’ve been using MJP, we can see that about 95% of all rounds are mutual 3-3s or better. 93% of all rounds are mutual 2-2s or better. 87% are 1-1s. When we get into the smaller numbers, we lose some of the qualitative value, but it’s no leap to assert that at least 90% of all rounds are 1-1 or 2-2.

My strict construction of mutuality seems pretty silly given how few rounds it actually affects. More to the point, it really does matter to the teams who make up that small number. A 3-3 may not be the most desirable assignment, obviously, but at some point one might have to debate in front of someone other than one’s most desirable judges. But 4s and 5s (assuming our tiers of 8-10) may not serve anyone’s interest. They are judges that both of the teams who get them have decidedly said they don’t want; why should my bullheadedness keep those debaters from improving their lot, if it can be done?

I concede.

However, the question arises: what to concede to? I’ve talked this over a little bit, and want to do more before coming down with a solution. The problem is that, despite all our attempts to avoid it, we still might get 4-4s or worse for in-contention teams going by strict mutuality. What do we do? We can give one of them the bad side of a 1-2, which some might see as the best solution. But as I’ve been told, if you’re on the wrong side of that, there might be quite a bit of distance in how the two of you have preffed that judge (your opponent’s #1 vs. your #20, for example), and no matter how you slice it, you’re debating in front of one of their 1s. One suggested solution is to look first for a 2-3. Being on the wrong side of a 2-3 is much less damaging, this argument goes, than being on the wrong side of a 1-2. In this case, you’re only debating in front of one of their 2s, who will presumably be less disposed to voting for their style than their 1s would be.
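To make the suggestion concrete, here is a rough sketch of how such a fallback order might look. The ordering in `PRIORITY` is only one reading of the discussion above (mutual 1-1, 2-2 and 3-3 first, then a 2-3, then a 1-2, and only then 4-4 or 5-5); it is an assumption for illustration, not anything TRPC or tabroom.com actually does, and `pick_judge` and the judge names are hypothetical.

```python
# Hypothetical priority order: one reading of the discussion, not an
# established tab-room rule. Each pair is (better rating, worse rating).
PRIORITY = [(1, 1), (2, 2), (3, 3), (2, 3), (1, 2), (4, 4), (5, 5)]


def pick_judge(candidates):
    """Pick the judge whose rating pair comes earliest in PRIORITY.

    candidates maps a judge name to (rating by team A, rating by team B).
    Returns the chosen judge's name, or None if no candidate's ratings
    match any pair listed in PRIORITY.
    """
    best_name, best_rank = None, len(PRIORITY)
    for name, (a, b) in candidates.items():
        # Order the pair so that (3, 2) and (2, 3) are treated alike.
        pair = (min(a, b), max(a, b))
        if pair in PRIORITY:
            rank = PRIORITY.index(pair)
            if rank < best_rank:
                best_name, best_rank = name, rank
    return best_name


# Example: no mutual 1-1, 2-2 or 3-3 is available, so the 2-3 is
# chosen ahead of both the 1-2 and the 4-4.
available = {"Judge X": (4, 4), "Judge Y": (1, 2), "Judge Z": (3, 2)}
print(pick_judge(available))
```

Moving `(1, 2)` ahead of `(2, 3)` in the list would give the other side of the argument, which is really the whole question.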

Interesting question, I think. I’m looking forward to hearing people’s opinions.

