Which current player is No. 2 among Test batsmen of all time when measured against the performances of the average player?

Joe Root does 22 runs better than the average replacement player. What about Steven Smith?

When players are compared, contextualisation is key. How well a player does in South Africa, England, New Zealand and Australia, for example, is a frequently raised question. How does a player perform when his team wins versus when it loses? Everyday debates among cricket fans often take the form "Player X played this unforgettable innings against this monster attack on this minefield. This conclusively proves that he can play here. What has your player Y done that can compare to this?"

The general point of contextualisation is to say: given a certain set of circumstances, how does a player perform compared to the average player? The VORP (Value Over Replacement Player) measure in baseball is a sporting example. As yet, cricket does not have a comparable conventionally accepted measure.

In this essay I attempt to devise a Runs Above Average Replacement (RAAR) measure for Test batsmen. It answers the following question: "Given the range of contexts faced by Test batsmen, if a given player were replaced by the average player, what would be the difference, in runs, in the player's contribution to the team?"

Broadly, two approaches are possible. In a maximalist approach, a large number of factors and multiple scales and classifications are used. For example, in addition to the three primary measures (runs, wickets and deliveries), secondary measures like wins, player-of-the-match prizes, and some categorical classification of pitches and decisions at the toss could be taken into account. The ICC's official player ratings and Anantha Narayanan's work in these pages offer examples of this approach.

The advantage of the maximalist approach is that it explicitly accounts for a large number of factors. The drawbacks are that, first, every criterion has to be justified, and this is often difficult, especially with secondary measures. Second, as more criteria are added, the risk increases that some factors are implicitly counted more than once. This means the intended weight of a factor is often not the same as the weight the factor actually receives in the calculation.

In a minimalist approach, such as the one adopted here, the intent is to make the smallest possible number of assumptions to define the context.

There are only two definite measures in a Test match - the number of wickets (40 in all, 20 per team) and the result. From a given team's point of view, exactly four results are possible in a Test that is not abandoned - win, loss, draw and tie. A team can lose anywhere from 0 to 20 wickets in a Test, and each of the four results can follow. A tie is possible in Tests where ten to 40 wickets fall. A draw is possible in Tests where 0 to 39 wickets fall. A winning or losing team can, in theory, lose between 0 and 20 wickets.
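The minimal category space described above - wickets lost by a team (0 to 20) crossed with the four possible results - can be enumerated in a couple of lines. This is an illustrative sketch, not the essay's own code; the tuple representation is an assumption:

```python
from itertools import product

# Each match context for one team is a (wickets_lost, result) pair:
# 21 possible wicket counts crossed with 4 possible results.
RESULTS = ("win", "loss", "draw", "tie")
categories = list(product(range(21), RESULTS))

print(len(categories))  # 84 possible categories in all
```

The count of 84 possible categories is what is later referred to as "over 80 categories", of which only 71 have actually occurred.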

If a team wins by an innings after scoring 550, then the playing conditions appear unusually batting-friendly for that team. For the opposing team that has lost 20 wickets for less than 550 runs, the conditions appear otherwise. The intuition here is that for a team, Tests of the same set (number of wickets, result) are generally more similar to each other than Tests of different sets (number of wickets, result) in terms of how hard it was for the team's players to bat and bowl in the Test.

In theory, a team can win a Test after losing 0 to 20 wickets. In practice, no team has won a Test after losing zero or one wicket in a match. Five have won after losing two wickets. Teams most commonly win after losing ten or 20 wickets. When Test wins are grouped by the number of wickets spent in the match, the average cost of a wicket declines steadily as the number of wickets lost rises. The average team that wins after losing ten wickets scores 44.7 runs per wicket, while the average team that wins after losing 20 wickets scores 27.5 runs per wicket. Teams that win after spending 15 wickets average 39.7 runs per wicket. Batting in these three types of Tests is a distinct proposition in each case. It is one thing to score a Test hundred in a total of 450. It is another to do it in a total of 270.
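The grouping step described above can be sketched in a few lines of Python. The match records below are invented placeholders; real figures would come from a database of Test scorecards:

```python
from collections import defaultdict

# Invented (wickets_lost, result, runs_scored) records, one per team
# per Test. Real data would come from Test scorecards.
records = [
    (10, "win", 450),
    (10, "win", 430),
    (15, "win", 600),
    (20, "win", 550),
    (20, "win", 550),
]

# Pool runs and wickets for each (wickets, result) category.
totals = defaultdict(lambda: [0, 0])  # key -> [total runs, total wickets]
for wickets, result, runs in records:
    totals[(wickets, result)][0] += runs
    totals[(wickets, result)][1] += wickets

# Representative average runs per wicket for each category.
avg_rpw = {key: runs / wkts for key, (runs, wkts) in totals.items()}
for key in sorted(avg_rpw):
    print(key, round(avg_rpw[key], 1))
```

With these made-up numbers the averages decline as wickets rise (44.0, 40.0, 27.5), mirroring the pattern reported in the text.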

In the case of a draw, parity between both teams is rare. The more common draw involves one team doing better than the other, but not well enough to win outright. There is an argument for treating "wickets, result, team" as the minimal category instead of "wickets, result", but in my view this goes against the spirit of the draw or tie result, which is, by its definition, shared by both teams.

It is exceedingly rare for a team to lose a Test match without having lost 20 wickets. Teams have lost Tests after losing 19 wickets 15 times, and after losing 18 wickets seven times. In all, teams have lost Tests after losing fewer than 20 wickets 33 times. In contrast, in 1585 defeats (98%) the losing team has lost 20 wickets. The average losing team that is bowled out twice scores 22 runs per wicket.

These averages vary by host country. The table below shows the average runs per wicket for teams that won a Test match in each host country after losing 20 wickets. This suggests that if the average runs per wicket in each of the more than 80 possible categories of Test (combinations of wickets lost and result, of which only 71 have occurred in Test history so far) is taken to represent that type of Test, then the representative figure should be biased towards each host country. It also indicates that other criteria, such as venue or opposition, could be meaningful biasing criteria. Here, the biasing is done simply by including the matches from the relevant host country twice.

The choice to simply repeat a record to achieve biasing is arbitrary. It could be argued that the record should be repeated twice or three times instead of just once.

The extent to which the biasing criterion reshapes the resulting record depends on the amount of information available. More than one biasing criterion can be used, and the more specific the biasing criteria, the smaller their influence: matches in country C are a subset of all matches, and matches at ground G are a subset of matches in country C, and matches at ground G against opponent O are a subset of matches at ground G. For example, 53 Test teams have won after losing 18 wickets in a Test. Thirteen of these wins have come in England, while only three have come in India. So if a team wins a Test at Lord's losing 18 wickets, the biased records will include the 53 base records, plus 13 England-specific records, plus four Lord's-specific records. If a team wins after losing 18 wickets at Chepauk, the biased record will be 53 base records, plus three India-specific records, plus two Chepauk-specific records.
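The biasing-by-repetition step described above amounts to pooling the base records with one extra copy of each country-specific record and one extra copy of each ground-specific record. A sketch, with invented run totals (only the pooling logic follows the method in the text):

```python
def biased_rpw(base, country_subset, ground_subset):
    """Average runs per wicket after biasing: every matching record
    counts once, host-country records a second time, and host-ground
    records a third time. Each record is a (runs, wickets) pair."""
    pooled = base + country_subset + ground_subset
    total_runs = sum(r for r, _ in pooled)
    total_wkts = sum(w for _, w in pooled)
    return total_runs / total_wkts

# Invented records for wins after losing 18 wickets: three base Tests,
# of which one was in the host country, at the host ground.
base = [(700, 18), (700, 18), (650, 18)]
in_country = [(650, 18)]  # subset of base, repeated as bias
at_ground = [(650, 18)]   # subset of in_country, repeated once more

print(round(biased_rpw(base, in_country, at_ground), 1))
```

Here the unbiased average would be 2050/54, or about 38.0 runs per wicket; the repeated local records pull the biased figure down towards the host-ground record, illustrating how more specific (smaller) subsets exert a proportionally smaller influence.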

The results presented here bias by host country and ground. Applied to the Melbourne Test between Australia and New Zealand last year, the runs per wicket for the average player (or average replacement) is 40.6 for Australia (which used up 15 wickets and won by 247 runs) and 22.2 for New Zealand (which was bowled out twice and lost). Steven Smith made 92 runs for two dismissals in the match, so his runs above average replacement for the match is (92 ÷ 2 - 40.6), or +5.4. Kane Williamson made 9 runs in his two innings, and his runs above average replacement for the match is -17.7.
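The worked example above reduces to a one-line calculation: runs divided by dismissals, minus the representative runs per wicket for the match context. A sketch using the figures quoted in the text:

```python
def raar_for_match(runs, dismissals, replacement_rpw):
    """Runs above average replacement for a single match: the player's
    runs per dismissal minus the representative runs per wicket."""
    return runs / dismissals - replacement_rpw

# Figures from the Melbourne Test example in the text.
print(round(raar_for_match(92, 2, 40.6), 1))  # Smith: 5.4
print(round(raar_for_match(9, 2, 22.2), 1))   # Williamson: -17.7
```

A career figure would simply aggregate this quantity over all of a player's matches and divide by the number of Tests.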

The table below shows the Runs Above Average Replacement per Test for the top 55 run-getters in Tests, from Sachin Tendulkar to Leonard Hutton. The figures in the last column are the representative runs per wicket for the average player in the Tests each player featured in.

Steven Smith has played in Tests where the average representative runs-per-wicket figure is 39, and in these Tests he has produced 30.2 runs per Test more than that. For Ricky Ponting, the representative figure was 43.1 runs per wicket, and he produced 15.6 runs more. The point here is not simply to say how much better each player has been than the average replacement, given the distribution of match contexts (wickets lost, result) that applied to the player's team. Crucially, the figure is not based only on the performance of, say, the West Indies side that Brian Lara was a member of in a given Test match; it is based on all Test sides in all Test cricket that are similar to that side. This distinguishes the measure from other approaches to contextualisation, which do not rely on a generalised formulation of resemblance between Test teams and matches.

Given the way the match contexts are defined here, a lower runs-per-wicket figure for the average replacement implies that the player in question played in a weaker team. This provides a way to compare the teams that different players played in at a glance. One surprise for me was to find that the average team Shivnarine Chanderpaul played in was marginally stronger than the average team Lara played in. This actually makes some sense because all teams Chanderpaul played in also involved Lara, while the converse is not true.

Finally, the method described here is systematically extensible to produce a maximalist contextualisation. Whether or not one chooses to do so depends on how one sees cricket.