The TrueSkill ranking system is a skill-based ranking system for Xbox Live developed at Microsoft Research. The purpose of a ranking system is to both identify and track the skills of gamers in a game (mode) in order to be able to match them into competitive matches. The TrueSkill ranking system only uses the final standings of all teams in a game in order to update the skill estimates (ranks) of all gamers playing in this game. Ranking systems have been proposed for many sports, but possibly the most prominent ranking system in use today is ELO.
So, what is so special about the TrueSkill ranking system? In short, the biggest difference from other ranking systems is that in the TrueSkill ranking system skill is characterized by two numbers: the average skill of the gamer (μ) and the degree of uncertainty in the gamer’s skill (σ).
The ranking system maintains a belief in every gamer’s skill using these two numbers. If the uncertainty is still high, the ranking system does not yet know the skill of the gamer precisely. In contrast, if the uncertainty is small, the ranking system has a strong belief that the skill of the gamer is close to the average skill.
The figure on the right-hand side shows a belief curve of the TrueSkill ranking system. For example, the green area is the belief of the TrueSkill ranking system that the gamer has a skill between levels 15 and 20.
Maintaining an uncertainty allows the system to make big changes to the skill estimates early on but small changes after a series of consistent games has been played. As a result, the TrueSkill ranking system can identify the skills of individual gamers from a very small number of games. The following table gives an idea of the average number of games per gamer that the system ideally needs to identify the skill level:
Game Mode | Number of Games per Gamer |
16 Players Free-For-All | 3 |
8 Players Free-For-All | 3 |
4 Players Free-For-All | 5 |
2 Players Free-For-All | 12 |
4 Teams/2 Players Per Team | 10 |
4 Teams/4 Players Per Team | 20 |
2 Teams/4 Players Per Team | 46 |
2 Teams/8 Players Per Team | 91 |
The actual number of games per gamer can be up to three times higher depending on several factors such as the variation of the performance per game, the availability of well-matched opponents, the chance of a draw, etc. If you want to learn more about how these numbers are calculated and how the TrueSkill ranking system identifies players’ skills, please read the Detailed Description of the TrueSkill™ Ranking Algorithm or see the Frequently Asked Questions.
If you play a ranked game on Xbox Live, the TrueSkill ranking system will compare your individual skill (the numbers μ and σ) with the skills of all the game hosts for that game mode on Xbox Live and automatically match you with players with skill similar to your own. But how can this be done when every player’s skill is represented by two numbers? The trick is to use the (hypothetical) chance of drawing with someone else: If you are likely to draw with another player then that player is a good match for you! Sounds simple? It is!
Most games have at their root a metric for judging whether the game’s goals have been met. In the case of matches involving two or more players (“multiplayer matches”), this often includes ways of ranking the skills of match participants. This encourages competition between players, both to “win” individual matches, and to have their overall skill level recognized and acknowledged in a broader community. Players may wish to evaluate their skills relative to people they know or relative to potential opponents they have never played, so they can arrange interesting matches. We term a match “uninteresting” if the chances of winning for the participating players are very unbalanced – very few people enjoy playing a match they cannot win or cannot lose. Conversely, matches which have a relatively even chance of any participant winning are deemed “interesting” matches.
Many ranking systems have been devised over the years to enable leagues to compare the relative skills of their members. A ranking system typically comprises three elements:
In particular, the ELO ranking system has been used successfully by a variety of leagues organized around two-player games, such as world football league, the US Chess Federation or the World Chess Federation, and a variety of others. In video games, however, many leagues have game modes with more than two players per match. ELO is not designed to work under these circumstances. In fact, no popular skill-based ranking system is available to support these games. Many one-off ranking systems have been built and are in use for these games, but none of them is general enough to be applied to such a great variety of games.
The TrueSkill ranking system is a skill-based ranking system designed to overcome the limitations of existing ranking systems, and to ensure that interesting matches can be reliably arranged within a league. It uses a technique called Bayesian inference for ranking players.
Rather than assuming a single fixed skill for each player, the system characterizes its belief using a bell-curve belief distribution (also referred to as Gaussian), which is uniquely described by its mean μ (pronounced [mjuː]) (“peak point”) and standard deviation σ (pronounced [sigma]) (“spread”). An exemplary belief is shown in the figure on the right. Note that the area under the skill belief distribution curve within a certain range corresponds to the belief that the player’s skill lies in that range. For example, the green area in the figure on the right is the belief that the player’s skill is between levels 15 and 20. As the system learns more about a player’s skill, σ tends to become smaller, more tightly bracketing that player’s skill. Another way of thinking about the μ and σ values is to consider them as the “average player skill belief” and the “uncertainty” associated with that assessment of their skill.
Since the TrueSkill ranking system uses a Gaussian belief distribution to characterize a player’s skill, almost all mean skills (that is, μ’s) lie within ±4 times the initial σ of the initial μ (more precisely, with probability 99.99%). Experimental data tracking roughly 650,000 players over 2.8 million games support this claim: not a single μ ever fell outside the range of ±4 times the initial σ, and 99.99% of the μ’s were even within ±3 times the initial σ.
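The belief that a skill lies in a range is simply a difference of two values of the Gaussian cumulative distribution function. The sketch below uses Python’s standard `statistics.NormalDist`; since the figure’s exact μ and σ are not stated, it assumes the new-player values μ = 25 and σ = 25/3 ≈ 8.333 quoted in the FAQ. It also checks the ±3σ and ±4σ coverage figures mentioned above.

```python
from statistics import NormalDist

def belief_in_range(mu: float, sigma: float, low: float, high: float) -> float:
    """Belief that skill lies in [low, high] under the Gaussian belief N(mu, sigma^2)."""
    belief = NormalDist(mu, sigma)
    return belief.cdf(high) - belief.cdf(low)

def coverage_within(k: float) -> float:
    """Probability that a Gaussian value lies within +/- k standard deviations of its mean."""
    std = NormalDist()  # standard normal N(0, 1)
    return std.cdf(k) - std.cdf(-k)

# Assumed belief: mu = 25, sigma = 25/3 (new-player defaults, not the figure's actual values).
print(f"P(15 <= skill <= 20) = {belief_in_range(25.0, 25 / 3, 15.0, 20.0):.3f}")
print(f"within +/-3 sigma: {coverage_within(3):.4%}, within +/-4 sigma: {coverage_within(4):.4%}")
```

With these assumed values the belief for levels 15 to 20 comes out at roughly 16%, and the ±4σ coverage is about 99.994%, which the text rounds to 99.99%.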
Interestingly, the TrueSkill ranking system can do all calculations using an initial uncertainty of 1, because μ and σ can then be scaled to any other range by simply multiplying them by a constant. For example, suppose all calculations are done with an initial μ of 3 and σ of 1. If one wishes to express a player’s skill as one of 50 “levels”, multiply μ and σ by 50/6 ≈ 8.33, because almost all μ’s lie within ±3 times the initial σ, a range that spans 6 units. This gives an initial μ of 25 and an initial σ of about 8.33.
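A minimal sketch of the rescaling, assuming a hypothetical `Params` record that also carries the performance spread β, the per-game uncertainty increase (called τ here) and the draw margin ε described later in this text. The unit-scale values chosen for β, τ and ε are assumptions. The point of the sketch is that every one of these quantities lives on the skill axis, so all of them must be multiplied by the same factor for the model’s predictions to stay unchanged.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Params:
    mu0: float      # initial mean skill
    sigma0: float   # initial skill uncertainty
    beta: float     # performance spread around skill (assumed value)
    tau: float      # per-game increase in uncertainty (assumed value)
    epsilon: float  # draw margin (assumed value)

    def scaled(self, factor: float) -> "Params":
        # All parameters are measured in skill units, so they scale together.
        return Params(*(factor * x for x in (self.mu0, self.sigma0, self.beta, self.tau, self.epsilon)))

unit = Params(mu0=3.0, sigma0=1.0, beta=0.5, tau=0.01, epsilon=0.0)
levels = 50
factor = levels / (6 * unit.sigma0)  # mu0 +/- 3*sigma0 spans 6 units, mapped onto 50 levels
print(unit.scaled(factor))
```

With the assumed unit values this yields μ0 = 25, σ0 ≈ 8.333, β ≈ 4.167 and τ ≈ 0.083.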
The intuition is that the greater the difference between two players’ μ values (assuming their σ values are similar), the greater the chance that the player with the higher μ value performs better in a game. This principle holds true in the TrueSkill ranking system. However, this does not mean that the players with the larger μ’s are always expected to win, but rather that their chance of winning is higher than that of the players with the smaller μ’s. The TrueSkill ranking system assumes that the performance in a single match varies around the skill of the player, and that the game outcome (relative ranking of all players participating in a game) is determined by their performance. Thus, the skill of a player in the TrueSkill ranking system can be thought of as the average performance of the player over a large number of games. The variation of the performance around the skill is, in principle, a configurable parameter of the TrueSkill ranking system.
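Under this performance model the chance that one player beats another has a closed form: the performance difference is Gaussian with mean μ_A – μ_B and variance 2β² + σ_A² + σ_B². A minimal sketch, assuming a performance spread β = 25/6 (an assumed value; the text only says it is configurable) and an optional draw margin ε:

```python
import math
from statistics import NormalDist

def win_probability(mu_a: float, sigma_a: float, mu_b: float, sigma_b: float,
                    beta: float = 25 / 6, epsilon: float = 0.0) -> float:
    """P(performance of A exceeds performance of B by more than the draw margin epsilon)."""
    # Performance difference ~ N(mu_a - mu_b, 2*beta^2 + sigma_a^2 + sigma_b^2).
    c = math.sqrt(2 * beta**2 + sigma_a**2 + sigma_b**2)
    return NormalDist().cdf((mu_a - mu_b - epsilon) / c)

print(win_probability(30.0, 1.0, 25.0, 1.0))  # higher mean, both beliefs fairly certain
print(win_probability(30.0, 8.0, 25.0, 8.0))  # same means, much less certain beliefs
```

With equal means and no draw margin the probability is exactly 1/2, and larger σ values pull any prediction toward 1/2.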
The TrueSkill ranking system bases its update of μ and σ on the game outcome (relative ranking of all teams) only; it merely assumes that the outcome is due to some unobserved performance that varies around the skill of a player. If one is playing a point-based game and the winner beats all the other players by a factor of ten, that player’s victory will be scored no differently than if they had only won by a single point. Every match provides the system with more information about each player’s skill, usually driving σ down.
Before determining the new skill beliefs of all participating players for a new game outcome, the TrueSkill ranking system assumes that the skill of each player may have changed slightly between the current game and the last game played by that player. The mathematical consequence of this assumption is that the skill uncertainty σ is slightly increased, by an amount that is, in principle, a configurable parameter of the TrueSkill ranking system. It is this parameter that both allows the TrueSkill system to track skill improvements of gamers over time and ensures that the skill uncertainty σ never decreases to zero (“maintaining momentum”).
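Because the assumed skill change is Gaussian noise added to the skill, the variances add: σ is replaced by √(σ² + τ²), where τ is this configurable parameter. The sketch below assumes τ = 25/300 and, purely for illustration, replaces a real game update by a hypothetical constant reduction that keeps 90% of the variance per game; it shows σ settling at a positive floor instead of shrinking to zero.

```python
import math

def add_dynamics(sigma: float, tau: float = 25 / 300) -> float:
    """Widen the skill belief before a game; variances add: sigma' = sqrt(sigma^2 + tau^2)."""
    return math.sqrt(sigma**2 + tau**2)

def steady_state_sigma(sigma0: float = 25 / 3, tau: float = 25 / 300,
                       kept_variance: float = 0.9, games: int = 2000) -> float:
    """Alternate drift and a hypothetical per-game variance reduction; return the final sigma."""
    sigma = sigma0
    for _ in range(games):
        sigma = add_dynamics(sigma, tau)          # belief widens between games
        sigma = math.sqrt(kept_variance) * sigma  # hypothetical: a game keeps 90% of the variance
    return sigma

print(steady_state_sigma())
```

Solving σ² = 0.9(σ² + τ²) gives the floor σ = 3τ = 0.25 for these assumed values.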
In order to determine the new skill beliefs of all the participating players for a new game outcome, the TrueSkill ranking system needs to determine the probability of the observed game outcome for given skills of the participating players and weight it by the probability of the corresponding skill beliefs. This is done by averaging over all possible performances (weighted by their probabilities) and deriving the game outcome from the performances: the player with the highest performance is the winner, the player with the second-highest performance is the first runner-up, and so on. If two players’ performances are very close together, then the TrueSkill ranking system considers the outcome between these two players a draw. The larger the margin that defines a draw in a given league, the more likely a draw is to occur, according to the TrueSkill ranking system. The size of this margin is a configurable parameter of the TrueSkill ranking system and is adjusted based on the game mode. For example, a street race in Project Gotham Racing 3 can never end in a draw (thus the parameter is set to zero), whereas a Capture-the-Flag game in Perfect Dark Zero can easily end in a draw.
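A draw margin can be set from a target draw probability for two equally skilled sides. A minimal sketch, assuming Gaussian performances with an assumed spread β = 25/6 per player and, as a simplification, ignoring the players’ skill uncertainty σ: the performance difference of the two sides is then N(0, nβ²) for n players in total, and a draw means the difference is at most ε in absolute value. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def draw_margin(draw_probability: float, beta: float = 25 / 6, total_players: int = 2) -> float:
    """Margin eps such that two equally skilled sides draw with the given probability.

    draw_probability = 2 * Phi(eps / (sqrt(total_players) * beta)) - 1, solved for eps.
    """
    return NormalDist().inv_cdf((draw_probability + 1) / 2) * math.sqrt(total_players) * beta

def draw_probability(eps: float, beta: float = 25 / 6, total_players: int = 2) -> float:
    """Inverse direction: chance of a draw between equally skilled sides for a given margin."""
    return 2 * NormalDist().cdf(eps / (math.sqrt(total_players) * beta)) - 1

print(draw_margin(0.0))   # a race that can never end in a draw: margin 0
print(draw_margin(0.25))  # a mode where draws are common
```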
By virtue of the above weighting technique (also known as Bayes’ Law), the system arrives at a new skill belief for every player participating in the game. These skill beliefs are no longer Gaussian. Hence, the TrueSkill ranking system determines the best Gaussian approximation. As a result, a player’s μ value increases for each opponent they outperformed and decreases for each opponent they lost against. The following table gives before and after values for μ and σ for each (hypothetical) participant in an 8-player match.
Name | Outcome | Pre-Game μ | Pre-Game σ | Post-Game μ | Post-Game σ |
Alice | 1st | 25 | 8.3 | 36.771 | 5.749 |
Bob | 2nd | 25 | 8.3 | 32.242 | 5.133 |
Chris | 3rd | 25 | 8.3 | 29.074 | 4.943 |
Darren | 4th | 25 | 8.3 | 26.322 | 4.874 |
Eve | 5th | 25 | 8.3 | 23.678 | 4.874 |
Fabien | 6th | 25 | 8.3 | 20.926 | 4.943 |
George | 7th | 25 | 8.3 | 17.758 | 5.133 |
Hillary | 8th | 25 | 8.3 | 13.229 | 5.749 |
One can see that the σ values (the uncertainty in each player’s skill) are lower after the match, substantially more so for the players on the 4th and 5th ranks (Darren and Eve). Those two players are “bracketed” by a maximal number of players in terms of defeat: they were defeated by 3 (or 4) players and they defeated 4 (or 3) other players. In contrast, the first player (Alice) is only known to be better than the 7 other players, which does not constrain her skill from above: she may be even better than level 36.771. This is reflected in her larger uncertainty of 5.749.
The simplest case for a TrueSkill ranking system update is a two-person match. Suppose we have players A(lice) and B(ob), with μ and σ values (μ_{A},σ_{A}) and (μ_{B},σ_{B}), respectively. Once the game has finished, the update algorithm determines the winner (Alice or Bob) and the loser (Bob or Alice) and applies the following update equations (we disregard the possibility of a draw for the sake of simplicity here):
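For reference, the two-player update in the form commonly given for TrueSkill, written with the symbols used in this section (β², ε, v, w and the normalizer c), reads as follows; treat it as a reconstruction from the standard formulation:

```latex
\begin{aligned}
c^2 &= 2\beta^2 + \sigma_{\text{winner}}^2 + \sigma_{\text{loser}}^2\\
\mu_{\text{winner}} &\leftarrow \mu_{\text{winner}} + \frac{\sigma_{\text{winner}}^2}{c}\, v\!\left(\frac{\mu_{\text{winner}}-\mu_{\text{loser}}}{c}, \frac{\varepsilon}{c}\right)\\
\mu_{\text{loser}} &\leftarrow \mu_{\text{loser}} - \frac{\sigma_{\text{loser}}^2}{c}\, v\!\left(\frac{\mu_{\text{winner}}-\mu_{\text{loser}}}{c}, \frac{\varepsilon}{c}\right)\\
\sigma_{\text{winner}}^2 &\leftarrow \sigma_{\text{winner}}^2\left[1 - \frac{\sigma_{\text{winner}}^2}{c^2}\, w\!\left(\frac{\mu_{\text{winner}}-\mu_{\text{loser}}}{c}, \frac{\varepsilon}{c}\right)\right]\\
\sigma_{\text{loser}}^2 &\leftarrow \sigma_{\text{loser}}^2\left[1 - \frac{\sigma_{\text{loser}}^2}{c^2}\, w\!\left(\frac{\mu_{\text{winner}}-\mu_{\text{loser}}}{c}, \frac{\varepsilon}{c}\right)\right]
\end{aligned}
```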
In these equations, the only parameter not yet introduced is β^{2}, the variance of the performance around the skill of each player. Moreover, ε is the aforementioned draw margin, which depends on the game mode. But what do the functions v(.,.) and w(.,.) look like? Instead of giving the exact definitions, let us have a look at plots of these functions for varying values of ε/c:
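For the win case (no draw), v and w have standard closed forms: v(t, ε) = N(t – ε)/Φ(t – ε) and w(t, ε) = v(t, ε)·(v(t, ε) + t – ε), where N and Φ are the standard normal density and cumulative distribution function; the draw case uses different expressions not shown here. A minimal sketch with hypothetical function names, using `math.erfc` so that Φ stays accurate far in the lower tail:

```python
import math

def _pdf(x: float) -> float:
    """Standard normal density N(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def _cdf(x: float) -> float:
    """Standard normal CDF Phi(x); erfc keeps precision far in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def v_win(t: float, eps: float) -> float:
    """Mean-update factor after a win: N(t - eps) / Phi(t - eps)."""
    x = t - eps
    return _pdf(x) / _cdf(x)

def w_win(t: float, eps: float) -> float:
    """Variance-update factor after a win: v * (v + t - eps), strictly between 0 and 1."""
    v = v_win(t, eps)
    return v * (v + t - eps)

for t in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(f"t={t:+.1f}  v={v_win(t, 0.0):.4f}  w={w_win(t, 0.0):.4f}")
```

Large positive t (an expected win) drives both factors toward 0, while large negative t (a surprising win) makes v grow roughly like ε/c – t and w approach 1, matching the observations that follow.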
There are a few observations about these update equations:
Win/Loss
If the winner had the much bigger mean skill relative to the total uncertainty (thus (μ_{winner}–μ_{loser}) ≫ ε), then a win cannot buy the winner extra mean skill points or remove any uncertainty. The opposite is true if the game outcome was surprising: if the winner had the smaller mean skill (thus (μ_{winner}–μ_{loser}) < ε), mean points proportional to μ_{loser}–μ_{winner} get added to the winner’s mean skill and subtracted from the loser’s.
Draw
If both players had similar mean skills upfront (thus |μ_{winner}–μ_{loser}| < ε), then both players are already close enough together and no mean skill update needs to be made; hence the uncertainty is not reduced. However, if the TrueSkill ranking system thought one player to be much stronger before the game (say, (μ_{winner}–μ_{loser}) > ε), then his mean skill will be decreased and the other player’s mean skill will be increased, which, in effect, brings their two mean skills closer together.
The mean skill update equations of the TrueSkill ranking system are similar to the update equations of the ELO algorithm. The key difference is that a variable K-factor is used for each player, depending mainly on the ratio of the uncertainties of the two players. Hence, playing against a very certain player in the TrueSkill ranking system allows an uncertain player to move up or down in larger steps than when playing against another uncertain player.
But how does the TrueSkill ranking system incorporate the game outcome of a team match? In this case, the team’s skill is assumed to be the sum of the skills of its players. The algorithm adds up the mean skills and the skill variances of the players in each of the two teams and uses the above equations, where (μ_{winner},σ^{2}_{winner}) and (μ_{loser},σ^{2}_{loser}) are the mean skills and skill variances of the winning and losing team, respectively.
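A minimal sketch of the two-team win update with hypothetical names, assuming draws are not modelled, a performance spread β = 25/6 per player (an assumed value), and that each player’s mean step is scaled by that player’s own variance, which is how the gain or loss is distributed among teammates:

```python
import math
from typing import List, Tuple

Rating = Tuple[float, float]  # (mu, sigma) of one player

def _pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def _cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2))  # erfc keeps precision in the lower tail

def v_win(t: float, eps: float) -> float:
    return _pdf(t - eps) / _cdf(t - eps)

def w_win(t: float, eps: float) -> float:
    v = v_win(t, eps)
    return v * (v + t - eps)

def update_win(winners: List[Rating], losers: List[Rating],
               beta: float = 25 / 6, eps: float = 0.0) -> Tuple[List[Rating], List[Rating]]:
    """Two-team update after `winners` beat `losers` (draws not modelled)."""
    # Team skill = sum of player skills: team means and variances add up.
    mu_w = sum(mu for mu, _ in winners)
    mu_l = sum(mu for mu, _ in losers)
    n_players = len(winners) + len(losers)
    # Each player adds beta^2 of performance noise (assumption) on top of skill variance.
    c2 = n_players * beta**2 + sum(s**2 for _, s in winners + losers)
    c = math.sqrt(c2)
    t, e = (mu_w - mu_l) / c, eps / c
    v, w = v_win(t, e), w_win(t, e)

    def shift(team: List[Rating], sign: int) -> List[Rating]:
        # Each player's step is scaled by their own variance: the variable "K-factor".
        return [(mu + sign * s**2 / c * v, s * math.sqrt(1 - s**2 / c2 * w)) for mu, s in team]

    return shift(winners, +1), shift(losers, -1)

new_winners, new_losers = update_win([(25.0, 25 / 3)], [(25.0, 25 / 3)])
print(new_winners, new_losers)
```

For one player per team this reduces to the two-player equations with c² = 2β² + σ²_winner + σ²_loser. Replacing the loser with a near-certain opponent (σ = 1) makes the uncertain winner’s step larger, which is the variable K-factor effect described above.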
The update equations for more than two teams cannot be written down in closed form, as they require numerical integration (the above plots have been obtained using the same numerical integration code). In this case, the TrueSkill ranking system iterates the two-team update equations between all teams on neighboring ranks, that is, the 1st versus the 2nd team, the 2nd team versus the 3rd team, and so on. If you want to learn more about this variant of the TrueSkill ranking algorithm, see the Detailed Description of the TrueSkill™ Ranking Algorithm.
Matchmaking is an important service provided by gaming leagues. It allows participants to find team-mates and opponents who are reasonably close to their own skill level. As a consequence, it is likely that the match will be interesting, as all participants have roughly the same chances of winning.
The TrueSkill ranking system’s skill beliefs are based upon probabilistic outcome models and thus enable players to be compared by their relative chance of drawing. The more even the skills of match participants, the more likely it is that this configuration of players will end up in a draw, and the more interesting and fun the match will be for every participant. For example, for two players A(lice) and B(ob) with skill beliefs (μ_{A},σ_{A}) and (μ_{B},σ_{B}), the (re-scaled) chance of drawing is given by:
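In the standard TrueSkill formulation, with the same β as in the update equations, this draw-based match quality for two players is commonly stated as:

```latex
q_{\text{draw}}(\beta^2,\mu_A,\mu_B,\sigma_A,\sigma_B)
  = \sqrt{\frac{2\beta^2}{2\beta^2+\sigma_A^2+\sigma_B^2}}\;
    \exp\!\left(-\frac{(\mu_A-\mu_B)^2}{2\,(2\beta^2+\sigma_A^2+\sigma_B^2)}\right)
```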
This number is always between 0 and 1 where 0 indicates the worst possible match and 1 the best possible match. Even if two players have identical μ values, uncertainty σ affects the quality of the match; if either of the σ values σ_{A} or σ_{B} is large, then the match quality criterion is significantly smaller than 1!
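The two-player match quality formula in code, as a minimal sketch with an assumed β = 25/6 (this text does not state β’s value) and a hypothetical function name:

```python
import math

def match_quality(mu_a: float, sigma_a: float, mu_b: float, sigma_b: float,
                  beta: float = 25 / 6) -> float:
    """Re-scaled draw probability in [0, 1]; 1 means a perfectly even, fully certain match."""
    denom = 2 * beta**2 + sigma_a**2 + sigma_b**2
    # First factor penalizes uncertainty, second penalizes the gap between the means.
    return math.sqrt(2 * beta**2 / denom) * math.exp(-(mu_a - mu_b) ** 2 / (2 * denom))

print(match_quality(25.0, 25 / 3, 25.0, 25 / 3))  # two brand-new players
print(match_quality(30.0, 1.0, 25.0, 1.0))        # well-known players, 5 points apart
```

Two brand-new players (μ = 25, σ = 25/3) score only √0.2 ≈ 0.447 under these assumptions, despite identical means, which illustrates the effect of large σ values described above.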
Using the two parameters μ and σ, which characterize a belief in a player’s skill, the TrueSkill ranking system ranks players using the so-called conservative skill estimate μ – k*σ. This estimate is called conservative because it is extremely likely that the player’s actual skill is higher than the estimate. The bigger the value of k, the more conservative the estimate; a common value of k is 3.
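A minimal sketch of the conservative estimate, with a hypothetical helper that reports how likely the actual skill is to exceed it under the Gaussian belief (for k = 3, about 99.87%). The example values reuse numbers from this text: a new player at μ = 25, σ ≈ 8.333, and Alice’s result from the calculator walkthrough.

```python
from statistics import NormalDist

def conservative_skill(mu: float, sigma: float, k: float = 3.0) -> float:
    """Conservative skill estimate mu - k*sigma used for ranking."""
    return mu - k * sigma

def prob_actual_above(k: float = 3.0) -> float:
    """Belief that the actual skill exceeds mu - k*sigma: Phi(k) for a Gaussian belief."""
    return NormalDist().cdf(k)

print(conservative_skill(25.0, 25 / 3))   # new player: about 0
print(conservative_skill(56.995, 1.901))  # Alice after 8 straight wins: about 51.292
print(prob_actual_above(3.0))
```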
We have set up eight hypothetical players (Alice, Bob, Chris, Darren, Eve, Fabian, George and Hillary) that can be arranged in any team configuration with the drop-down list. We use color names for the teams: red, blue, green, magenta, brown, black, orange, gray. The game outcome is specified by assigning teams to ranks; rank 1 is the winner and rank 8 is the lowest-scoring team. If two teams on consecutive ranks draw, just check the Draw checkbox. In the form-based version, your changes are not applied immediately; results on the right-hand side that are no longer valid are struck through and greyed out, and you need to press the Recalculate Skill Level Distribution button (form-based version only) to update them. The match quality is computed from the current team assignment and the players’ skills (Mu and Sigma) before the match is played, so the game outcome (Team Ranking/Game Outcome column) cannot change the match quality.
We have defined a number of team scenarios for you in the Select Game Mode drop-down list (at the bottom). If you wish to define your own team configurations, you can add or delete players (Add Player and Delete Player buttons). Also, if you want to quickly reset every player’s skill to the initial skill assigned when the TrueSkill ranking system sees you for the first time, you can press Reset Skills. If you want to try out a series of game outcomes, you can quickly copy all calculated skills into the skill input boxes on the left by pressing the After->Before (or Before <- After) button.
Finally, the TrueSkill ranking system allows you to set the chance of drawing for two equally skilled players in the Draw Probability text box. For example, for a Street Race type of game you should set 0%, whereas for a Capture-the-Flag type of game you may want to try 25%. Ok, enough said: Enjoy!
Ok, we claim that it takes only 8 straight wins in tightly matched 8-player Free-for-All games for a player to reach level 50. Here is the proof using the interactive rank calculator (either the form-based or the AJAX-based version). We will let Alice win all the games.
1: Initially, you should see the following screen. If not, then select ‘8 Free For All’ in the Select Game Mode drop-down list.
2: Since we are assuming a tightly matched game, we set the uncertainty of each of the other players to 1. Note that all the skill estimates and the match quality have been greyed out to indicate that they are no longer valid. Also, we did not need to change the team ranking: we want to assume that Alice won, and the red team, Alice’s team, is already assigned to the highest rank, rank 1.
3: Now you have to press the Recalculate Skill Level Distribution button.
4: Now you can already see the updated skill of each player on the right-hand side. Also, we see that this match would have had a high quality of roughly 40%. In order to simulate the next match, we press the After -> Before button. This copies all values (for example, Alice’s Mu=34.927 and Sigma=5.374) into the left-hand side. Note, again, that a click on the After->Before button greys out all results as they become invalid.
5: Again, we assume a tightly matched game: every player gets a Mu of 34.927, and every player’s Sigma except Alice’s is set to 1. This should give the following output.
6: Now we keep repeating steps 3, 4 and 5 seven more times, which gives the following seven screens.
7: There you have it! After the 8th tightly matched game, Alice has a conservative skill level of 56.995 – 3*1.901 = 51.292. Now try to find out how many games Alice would need to reach level 50 if she only played Head-to-Head.
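The closing exercise can be approximated with the two-player win update, under assumptions that differ from the interactive calculator: no draws (ε = 0), no skill drift between games, an assumed β = 25/6, and an opponent who always has Alice’s current μ and σ = 1, mirroring the “tightly matched” setup above. The function names are hypothetical, and the count it returns is only indicative.

```python
import math
from typing import Optional, Tuple

def _pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def _cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2))

def win_update(mu: float, sigma: float, opp_mu: float, opp_sigma: float,
               beta: float = 25 / 6) -> Tuple[float, float]:
    """Winner's new (mu, sigma) after one Head-to-Head win (no draws, no skill drift)."""
    c2 = 2 * beta**2 + sigma**2 + opp_sigma**2
    c = math.sqrt(c2)
    t = (mu - opp_mu) / c
    v = _pdf(t) / _cdf(t)
    w = v * (v + t)
    return mu + sigma**2 / c * v, sigma * math.sqrt(1 - sigma**2 / c2 * w)

def wins_to_reach(target: float = 50.0, mu: float = 25.0, sigma: float = 25 / 3,
                  opp_sigma: float = 1.0, k: float = 3.0, max_games: int = 10_000) -> Optional[int]:
    """Count straight wins until the conservative estimate mu - k*sigma reaches target."""
    for games in range(1, max_games + 1):
        # "Tightly matched": the opponent has Alice's current mu and a small fixed sigma.
        mu, sigma = win_update(mu, sigma, mu, opp_sigma)
        if mu - k * sigma >= target:
            return games
    return None

print(wins_to_reach())
```

Under these assumptions Head-to-Head play needs noticeably more than 8 straight wins, because each game carries less information than an 8-player Free-for-All.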
Here is a list of questions that gamers have sent us. We have grouped the questions into several categories linked in the right hand column of this page. If you do not find the answer to your question, simply send an Email to trueskill.
Q: Why is the ranking system called TrueSkill™ ranking system?
A: We decided to use this name because it reflects the defining feature of the ranking system: it quickly identifies a gamer’s true skill. The primary purpose of the TrueSkill system is to minimize the number of games necessary to find out a gamer’s skill in order to optimize matchmaking.
Q: How did you compute the average number of games until convergence for the TrueSkill ranking system?
A: One way to think about the TrueSkill ranking system is that it attempts to place each player on one of n skill levels (here n = 50). If each level is equally likely, a computer would need log_{2}(50) bits of information to uniquely encode the skill level of a player. Now, assume that 2 players play a Head-to-Head game. Disregarding draws, the game outcome can provide at most 1 bit of information (which of the two players was the winner). Since each of these games involves 2 players, the system needs 2*log_{2}(50) ≈ 11.3 Head-to-Head games per player. Note that the particular Head-to-Head games have to be chosen such that they actually carry one bit of information. Interestingly, every match-made game whose outcome is not predictable ahead of time is informative! In general, with k teams of m players each, one game outcome provides log_{2}(k!) bits but involves k*m players, so in the most general case the system needs k*m*log_{2}(n)/log_{2}(k!) games per player. This is the equation we used in the table, rounding up to whole games.
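The idealized formula can be checked directly. A minimal sketch with a hypothetical function name, assuming n = 50 skill levels and rounding up to whole games, which reproduces the games-per-gamer table given earlier:

```python
import math

def games_per_gamer(teams: int, players_per_team: int, levels: int = 50) -> int:
    """Idealized games per player: k*m*log2(n) / log2(k!), rounded up to whole games."""
    bits_per_game = math.log2(math.factorial(teams))  # information in one ranking of k teams
    players_per_game = teams * players_per_team
    return math.ceil(players_per_game * math.log2(levels) / bits_per_game)

MODES = {
    "16 Players Free-For-All": (16, 1),
    "8 Players Free-For-All": (8, 1),
    "4 Players Free-For-All": (4, 1),
    "2 Players Free-For-All": (2, 1),
    "4 Teams/2 Players Per Team": (4, 2),
    "4 Teams/4 Players Per Team": (4, 4),
    "2 Teams/4 Players Per Team": (2, 4),
    "2 Teams/8 Players Per Team": (2, 8),
}

for mode, (k, m) in MODES.items():
    print(f"{mode}: {games_per_gamer(k, m)}")
```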
Of course, this calculation is idealized. There are several factors that increase the number of games necessary:
But, there are also several factors that decrease the number of games necessary:
Overall, we observed in our experiments that the sum of these effects leads to an increase by a factor of 2 to 3 in the number of games necessary per gamer.
Q: What is the difference between skill and performance?
A: The TrueSkill ranking system implicitly uses a performance model that represents your (hypothetical) score in a particular game. Skill is the average performance. The TrueSkill ranking system maintains a belief in your skill and assumes that your performance in a particular game varies around your skill.
Q: The default TrueSkill of a new player is 25, right?
A: That’s not fully correct. The TrueSkill value that is displayed in the leaderboard is the conservative estimate of a player’s skill, computed from two hidden parameters that are used to track a player’s skill: the mean skill μ and the skill uncertainty σ. The TrueSkill value is then μ-3*σ. What is correct is that a new player is assigned a mean skill of μ=25 and a skill uncertainty of σ=8.333. Thus, the TrueSkill of a new player is 25-3*8.333 = 0. Note that these two choices for μ and σ effectively mean that a new player’s skill can be anywhere from 0 to 50, representing a state of complete uncertainty about their skill.
Q: How many games do I have to win before I go up one level?
A: This depends a lot on how many games you have already played, how many games your opponents have already played and what type of games you play. It is a strength of the TrueSkill ranking system to move you up very quickly early on but to reduce the step size of the updates after a series of consistent games. In general, the more people per team, the longer it takes to go up or down one level. But the more teams per game, the faster you can go up or down. Here is a list of game modes and the number of wins necessary before you go up a level (assuming you have already played a fair number of games; otherwise, you usually go up one level in one game).
Game Mode | Number of Wins to Go Up One Level |
8 Players Free-For-All | 3 |
4 Players Free-For-All | 4 |
2 Players Free-For-All | 7 |
4 Teams/2 Players per Team | 5 |
2 Teams/4 Players per Team | 10 |
Q: How many games do I have to lose before I go down one level?
A: These numbers are exactly the same as those given for the previous question. The TrueSkill ranking system has no preferred direction of changing the skill belief.
Q: I have been playing a lot of unranked training games and I think I am now a much more skilled player. Will the TrueSkill ranking system be able to identify my new, higher skills? If so, how many games do I have to play before the TrueSkill ranking system knows my new skill?
A: The TrueSkill ranking system assumes a small skill change between any two consecutive games in a game mode, so it is able to identify your new, higher skill. But if your skill has completely changed (you went from being the worst player in the world to being the best), then you would need to play a large number of games. We designed the system such that it would need between 50 and 100 games before it is able to track a substantial skill increase or decrease.
Q: If I understand the TrueSkill update formula correctly, then the change in μ is largest for the first few games and decreases over time. Thus, my first few games are the most important; if I lose these games, it will take the TrueSkill ranking system much longer to converge to my skill. Right?
A: Not exactly. It is correct that the change in μ gets smaller and smaller with every game played, regardless of whether you win or lose. However, the TrueSkill ranking system always weighs more recent game outcomes more heavily than older ones. Hence, when playing against a set of players of the same skill multiple times, a late win counts more than an early win. As an example, try the following in the interactive rank calculator (we will choose Alice for the analysis and assume a draw probability of 10%):
Scenario 1: One win followed by one loss: Final TrueSkill rank = 13
Scenario 2: One loss followed by one win: Final TrueSkill rank = 16
As you can see, winning the second game rather than the first actually resulted in a skill estimate ~2.5 levels higher than winning the first game and losing the second (to be precise, it is 2.586 = 26.293 – 23.707)! Note, however, that in this example the second game is not very well match-made. If all games are perfectly match-made, then the situation reverses. The reason is that the second game is lost against a stronger opposition or won against a weaker opposition. Try it out yourself in the interactive rank calculator.
Q: What other ranking systems are there?
A: It is impossible to enumerate all available ranking systems here. But, in order to illustrate the wide range of systems out there, let us give a few examples:
There is an interesting article Collective Choice: Competitive Rating Systems by Christopher Allen covering some of the above ranking systems.
Q: I am a chess player and I have played online chess at the Free Internet Chess Server. They use a system called Glicko which uses rating deviations. What is the relation between the TrueSkill ranking system and the Glicko ranking system?
A: The Glicko system was developed by Mark E. Glickman, chairman of the US Chess Federation (USCF) ratings committee. To the best of our knowledge, Glicko was the first Bayesian ranking system. Similarly to the TrueSkill ranking system, the Glicko system uses a Gaussian belief over a player’s skill which can be represented by two numbers: The mean skill and the variation of the skill (called rating deviation in the context of Glicko). There are a few differences between the TrueSkill ranking system and Glicko:
So, what is the difference from the Glicko system? Glicko was developed as an extension of ELO and was thus naturally limited to two-player matches which end in either win or loss. Glicko cannot update skill levels of players if they compete in multi-player events or even in teams. The logistic model would make it computationally expensive to deal with team and multi-player games. Moreover, chess is usually played in pre-set tournaments and thus matching the right opponents was not considered a relevant problem in Glicko. In contrast, the TrueSkill ranking system offers a way to measure the quality of a match between any set of players.
Q: I always play in the same team as my friend JoeDoe. Will the TrueSkill ranking system be able to differentiate between the two of us in terms of skill? In other words, is the TrueSkill ranking system capable of finding out that I am the more skilled of the two of us?
A: If you and your friend only ever play ranked team games together, then the TrueSkill ranking system cannot distinguish between the two of you; it always compares the teams’ skills (sums of the players’ skills in the teams) and ‘distributes’ the gain/loss in proportion to the individual players’ uncertainties (see detailed description). But note: if your friend also plays team games with anyone other than you, then the TrueSkill ranking system will be able to identify the more skilled player of the two of you. Also, if both of you always play together, you might consider forming a clan.
Q: Why does it take so many more games until convergence if I play a team game as opposed to a Free-for-All game?
A: The problem is that the outcome contains very little information about the individual players’ skills when only exploiting which of two teams wins or whether the two teams draw. This is effectively only up to about 1.6 bits of ‘information’ (log_{2}(3) ≈ 1.58 for the three possible outcomes win, loss and draw) that needs to be ‘shared’ between all players participating in the game. More specifically, consider these two scenarios:
Q: How will a team killer be ranked in the TrueSkill ranking system?
A: In the TrueSkill ranking system, the team skill is the sum of the skills of all players in the team. The TrueSkill ranking system has the potential to assign a negative skill to a player; if such players are added to a team, then the skill of the team goes down (because a team killer both reduces the team’s chance to score against the other team and might even inflict negative points by suicide). Fortunately, the TrueSkill ranking system’s matchmaking procedure will eventually ensure that team killers only play each other. And this can only be a good thing.
Q: I am playing a team game and all the players in my team drop out of the game. Of course, I lose the game. Will I lose as many skill points as all the people who left me standing in the rain?
A: Unfortunately, yes. All alternative options are possible exploits for cheating:
But: players who regularly drop out of a team will eventually be identified by the TrueSkill ranking system as having a negative impact on the team skill and will eventually be matched with other players who have a negative team impact. So, you should not see this happening too often if you are a player of average skill.
Q: You are saying that the TrueSkill ranking system assumes that the skill of a team is the sum of the skills of its players. I think this model is not appropriate: I am usually playing much better with people from my friends list rather than with random players. Will this assumption lead to incorrect rankings?
A: The assumption that the team skill is the sum of the skills of its players is exactly that: an assumption. The TrueSkill ranking system will use the assumption to adapt the skill points of individual players such that the team outcome can be best predicted based on the additive assumption. Provided that you and your friends also play team games with other players now and then, the TrueSkill ranking system will assign you a skill belief that is somewhere between the skill when you are playing with your friends and the skill when you are playing as an individual. So, in the worst case, where every other game is not with your friends, you are ranked slightly too high when you play with random team players and slightly too low when you play with your friends. But if you mostly play with your friends, the system will identify your skill correctly for most of your games.
Q: Why can two players in a party not be in two different teams?
A: This would open up the possibility to cheat. You could, for example, arrange to play each other and have your friend always forfeit the game. This would not boost you to the top of the league (try it out with our advanced interactive ranking calculator: press the After->Before button and Recalculate), but it would increase your skill level artificially. The TrueSkill ranking system always assumes that the game outcome is a result of your skills in the game and not of your skills at cheating.
Q: Does the TrueSkill ranking system reward individual players in a team game?
A: The only information the TrueSkill ranking system will process is:
The TrueSkill ranking system takes into account neither the exact underlying scores (flag captures, kills, time etc.) of each team nor how well each particular team member performed. As a consequence, the only way players can influence their skill updates is by increasing the probability that their team wins. Hence, “ball bitches”, “hill whores”, “flag fruits”, “territory twits”, and “bomb bastards” will hurt their individual TrueSkill ranks unless what they are doing helps their team. Obviously, it is difficult to update individual players’ skills from team results only. To understand the difficulty and its solution, consider the following analogy: Suppose you have four objects (players), each having an unknown weight (skill). Suppose further that you have a balance scale (game) to measure weight (skill) but are only ever allowed to put two objects on each side of the balance. If you always combine the same pairs of objects, the only information you can get is which pair is heavier. But if you recombine the objects into different pairs, you can find out about their individual weights. As a consequence, the TrueSkill ranking system will be able to find out about individual players’ skills from team outcomes, provided that players do not always play in one and the same team but in varying team combinations.
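The balance-scale analogy can be checked with a few lines of linear algebra. As a simplification that is not part of the TrueSkill ranking system itself, each weighing is treated here as an exact reading of the difference between the two pair sums (a real balance, like a game outcome, only says which side is heavier). The rank of the resulting system counts how many independent facts about the four weights the weighings can reveal; the helper names are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a small integer matrix via Gaussian elimination over fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate this column from every other row.
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def weighing(left, right, n=4):
    """Coefficient row for 'sum of left objects - sum of right objects'."""
    row = [0] * n
    for i in left:
        row[i] += 1
    for i in right:
        row[i] -= 1
    return row


A, B, C, D = range(4)
same_pairs = [weighing((A, B), (C, D))] * 3  # always the same two teams
mixed_pairs = [weighing((A, B), (C, D)),     # teams are recombined
               weighing((A, C), (B, D)),
               weighing((A, D), (B, C))]
print(rank(same_pairs))   # 1
print(rank(mixed_pairs))  # 3
```

Weighing the same pairs yields rank 1 however often it is repeated, while the three different pairings yield rank 3: every weight is pinned down relative to the others, and only the common level of all four stays open.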
Q: I bought a 360 for my son for Xmas, and both of us have become seriously addicted to Halo 3 on Xbox Live, particularly Team Slayer matches. Basing the skill change only on the team performance yields pretty counterintuitive results. For example, I often play a string of Team Slayer games where I am MVP (Most Valuable Player), which means I outscore everyone. But if my team loses those games, I gain no skill. Then, I can play poorly, but if my team wins I gain skill. This lack of feedback from individual performance is frustrating and makes your skill level beholden to the performance of the rest of your team, which is usually not under your control unless you explicitly team up with friends.
A: Great that you are enjoying your 360 and Halo 3.
The question you are asking has indeed been raised by quite a few people, and we have had many discussions about it. However, we always return to our point of view that in a team game the only way to assess someone’s skill towards the team objective is to consider the team objective only. Any auxiliary measurements such as the number of flags carried, the number of kills, the kill-death spread, etc., all have the problem that they can be exploited, thereby compromising the team objective and hence the spirit of the game. If flag carries matter, players will rush to the flag rather than defend their teammates or their own flag. Some may even kill the current flag carrier of their own team to get the flag. If it is the number of kills, people will mindlessly enter combat to maximize that metric. If it is the K-D spread, they may hold back at a time when they could have saved a teammate. Whichever metric you take, there will be people trying to optimize their score under that metric, and this will lead to distortions.
Another problem is, of course, that we would like to use the skill ratings for matchmaking. The current system essentially aims at a 50:50 win-loss ratio for each team. It is unclear how individual skill ratings based on individual achievements would change the calibration of such a system.
Of course, one might use a weighted combination of team and individual measurements. However, whenever individual measurements enter the equation there will be trouble; perhaps less trouble if their weight is smaller, but that is not really good enough.
Q: If the skill of every player is represented by two numbers, how is it possible to rank players in a leaderboard?
A: The TrueSkill ranking system uses the so-called conservative skill estimate, μ – 3σ, which is a low quantile of the belief distribution (about the 0.13% quantile for a Gaussian belief): it is extremely likely (to be precise, with a belief of more than 99%) that the player’s actual skill is higher than the conservative estimate. Have a look at the detailed description.
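A minimal sketch of the conservative estimate, assuming (as in the TrueSkill ranking system) a Gaussian belief N(μ, σ²) and the factor 3 used elsewhere in this FAQ. It uses Python’s `statistics.NormalDist` to compute how much belief lies above μ – 3σ; the function names and the example player are hypothetical.

```python
from statistics import NormalDist


def conservative_estimate(mu, sigma, k=3):
    """Leaderboard value: mean skill minus k standard deviations."""
    return mu - k * sigma


def belief_above(value, mu, sigma):
    """Belief that the true skill exceeds `value` under a Gaussian N(mu, sigma**2)."""
    return 1 - NormalDist(mu, sigma).cdf(value)


mu, sigma = 25.0, 4.0  # hypothetical player
low = conservative_estimate(mu, sigma)
print(low)                                     # 13.0
print(round(belief_above(low, mu, sigma), 4))  # 0.9987
```

The belief above the conservative estimate is about 99.87% whatever μ and σ are, so a player’s leaderboard value only climbs as σ shrinks or μ grows.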
Q: How can I become the top player in a leaderboard?
A: It’s simple: Win games! The TrueSkill ranking system matches you with people of similar skill so winning against them will always bring you up the leaderboard.
But, more seriously, in order to become a level 50 player in an 8-player Free-for-All game mode, you will only need to win 8 (tightly) match-made games in a row! If you do not believe that, try out our interactive rank calculator: always make sure that Alice wins the game and that all other 7 players have the same μ as Alice but a σ of 1 (you can use the After->Before button). After 8 games you should see that Alice’s μ is 56.995 and her σ is 1.901; hence, her conservative skill estimate would be 56.995 – 3 * 1.901 ≈ 51.3, that is, level 51! It may take a bit longer in reality because these calculations assume that there are always enough players of every playing strength available.
Q: Who is the better player: Someone with a large μ and a large σ or a small μ and a small σ?
A: The answer to this question is not straightforward. For someone with a large σ the TrueSkill ranking system is still uncertain about the skill. Thus, the player with the large μ and a large σ may be better. The best way to find out is to ask the player with the large σ to play more.
Q: I am a level 30 player with a σ of 5 and my friend is a level 28 player with a σ of 2. Why does the TrueSkill ranking system claim that my friend is better? At the end of the day, my level is higher!
A: That is correct. But, you have not played enough games yet for the TrueSkill ranking system to confidently know that you are better; so conservatively speaking, your level is probably 15 = 30 – 3 * 5 whereas your friend’s conservative estimate is level 22 = 28 – 3 * 2.
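The numbers in this answer can be reproduced, and complemented by the belief that you are in fact the stronger player, with a short sketch. It assumes that the two skill beliefs are independent Gaussians, so the skill difference is Gaussian with mean μ₁ – μ₂ and variance σ₁² + σ₂²; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_better(mu1, sigma1, mu2, sigma2):
    """Belief that player 1's skill exceeds player 2's, assuming
    independent Gaussian beliefs N(mu1, sigma1**2) and N(mu2, sigma2**2)."""
    return NormalDist().cdf((mu1 - mu2) / sqrt(sigma1**2 + sigma2**2))


you, friend = (30, 5), (28, 2)
conservative = [mu - 3 * sigma for mu, sigma in (you, friend)]
print(conservative)                # [15, 22]
print(prob_better(*you, *friend))  # roughly 0.64
```

Under these assumptions you are more likely than not the stronger player (about 64%), yet far from certain; the leaderboard deliberately ranks by the conservative estimate, which favors your friend until you have played more games.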
Q: A couple of days ago I managed to get into the top 350 (in PGR 3 online career) after winning probably 25 of 30 races, and that brought me up about 120 spots. Tonight I have had 5 races: 2 wins, 1 second, a 5th (got spun twice) and a 4th on one of the Vegas tracks. Because of this pathetic record (that is how the TrueSkill formula sees it) I have gone down 115 spots. How is it fair that 2 bad races dropped me down almost as many spots as 25 wins out of 30 races gained me?
A: There are two reasons that can cause this problem (although the latter is probably more responsible for this “phenomenon”):
Q: Well, there must be a bug in the system: I jumped into a 4-person race with 3 lower-ranked individuals, won the race, and my position in the league I was in dropped about 50 spots.
A: Surprisingly, this is not a bug and it happens when players with very small σ but widely varying μ get matched together (thanks to rugdivot for figuring this out).
So, what is going on here? Between any two games of a gamer, the TrueSkill ranking system assumes that the gamer’s true skill may have changed slightly, either up or down; this property is what allows the ranking system to adapt to a change in the skill of a gamer. Technically, this is achieved by a small increase in the σ of each participating gamer before the game outcome is incorporated. Usually, a game outcome provides enough information to reduce this increased uncertainty again. But in a badly matched game (such as the one described above) this is not the case: almost nothing can be learned about the winner from the game outcome, because it was already known before the game that the winner was ranked significantly higher than the gamers he has beaten. So, conservatively speaking, the winner’s skill might have slightly decreased: his μ hardly moves while his σ ends up slightly larger, so μ – 3σ goes down. Note that this can only happen if the gamer is not matched against opponents against whom he can “prove” to the TrueSkill ranking system that his skill has not changed.
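The effect can be reproduced with a minimal two-player version of the update. This is a sketch, not the production implementation: it follows the published two-player TrueSkill update equations for a win without draws (draw margin zero) and assumes the commonly quoted default parameters β = 25/6 for performance noise and τ = 25/300 for the per-game skill drift; neither value is stated in this FAQ, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Assumed parameters (commonly quoted TrueSkill defaults, not stated in this FAQ).
BETA = 25 / 6    # performance noise per player
TAU = 25 / 300   # skill drift allowed between two games

_STD = NormalDist()  # standard normal distribution


def update_win(winner, loser, beta=BETA, tau=TAU):
    """Two-player update after a win (no draws, draw margin 0).
    `winner` and `loser` are (mu, sigma) pairs; returns the new pairs.
    Sketch only: not numerically robust for extreme upsets."""
    (mu_w, s_w), (mu_l, s_l) = winner, loser
    # 1. Dynamics: widen both beliefs a little before looking at the outcome.
    var_w = s_w**2 + tau**2
    var_l = s_l**2 + tau**2
    # 2. Outcome: correction factors of the truncated Gaussian.
    c = sqrt(2 * beta**2 + var_w + var_l)
    t = (mu_w - mu_l) / c
    v = _STD.pdf(t) / _STD.cdf(t)  # mean shift: large for surprising wins
    w = v * (v + t)                # variance shrink factor
    mu_w += var_w / c * v
    mu_l -= var_l / c * v
    var_w *= 1 - var_w / c**2 * w
    var_l *= 1 - var_l / c**2 * w
    return (mu_w, sqrt(var_w)), (mu_l, sqrt(var_l))


def conservative(player):
    mu, sigma = player
    return mu - 3 * sigma


# Badly matched game: a settled strong player beats a settled weak one.
strong, weak = (40.0, 1.0), (20.0, 1.0)
new_strong, _ = update_win(strong, weak)
print(new_strong[0] > strong[0])                        # True: mu rises a tiny bit
print(conservative(new_strong) < conservative(strong))  # True: mu - 3*sigma drops
```

In the lopsided game the winner’s μ rises only marginally, while σ ends up above its pre-game value because the drift τ added more uncertainty than the predictable result removed; μ – 3σ therefore drops slightly. Between two equally rated players, the same win raises the conservative estimate noticeably.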
Q: In Dawn of War II, I won a game and went down in TrueSkill. What happened?
A: Usually your TrueSkill rises after a win – however, in Dawn of War II the displayed TrueSkill lags behind one game. (Thanks to CheeseNought for reporting the problem)
Q: Is it at all possible to view the TrueSkill rating of an individual Xbox Live Gamertag? Is there a website that I can go to, to see the ratings of people’s gamertags?
A: Most Xbox 360 games have a leaderboard function where you can find your TrueSkill; in fact, since May 2006 some games have also provided web access to gamers’ TrueSkill ratings. However, there are a few exceptions, most notably the game Call of Duty 2. At the moment, there is no way to find out about your TrueSkill in this game.
Q: My favorite game mode is Online Career in Project Gotham Racing 3. How can the TrueSkill ranking system find players of similar skill based on the chance of drawing when it is impossible to draw with someone else in a racing game?
A: When the TrueSkill ranking system computes the match quality of other players, it computes the (hypothetical) probability of a draw between you and every other player relative to the probability of a draw between two equally skilled players; this ensures that the ratio is always between 0 and 1. This number would depend on the draw margin, so the match-quality criterion of the TrueSkill ranking system actually computes this ratio in the limit of a draw margin of zero! This gives the match quality equation specified in the detailed description.
In other words: The TrueSkill ranking system is not taking into account the chance of drawing for a given game mode! Thus, it does not matter that your game mode has zero chance of drawing.
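For two players, the match quality described above can be sketched as follows. The formula is the two-player draw-probability ratio in the zero-draw-margin limit, as given in the TrueSkill detailed description; β = 25/6 and the newcomer’s σ = 25/3 are assumed defaults (not stated in this FAQ), and the function name is hypothetical. The example also illustrates the claim in the following answer: a newcomer is better matched with an established average player than with another newcomer.

```python
from math import exp, sqrt

BETA = 25 / 6  # assumed performance noise (common default, not given in this FAQ)


def match_quality(p1, p2, beta=BETA):
    """Two-player match quality: the draw probability relative to that of two
    equally skilled players, in the limit of zero draw margin. In (0, 1]."""
    (mu1, s1), (mu2, s2) = p1, p2
    denom = 2 * beta**2 + s1**2 + s2**2
    return sqrt(2 * beta**2 / denom) * exp(-(mu1 - mu2) ** 2 / (2 * denom))


newcomer = (25.0, 25 / 3)            # assumed initial belief
established_average = (25.0, 1.0)
other_newcomer = (25.0, 25 / 3)
print(match_quality(newcomer, established_average) >
      match_quality(newcomer, other_newcomer))  # True
```

The quality is highest for equal means and small σ, and it falls off both with the gap in μ and with growing uncertainty on either side, which is why no draw probability of the actual game mode enters the criterion.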
Q: I am playing my first ranked game in a game mode. Will I be matched more likely with another player new to the game mode or with someone else?
A: When you play your first ranked game in a game mode, the TrueSkill ranking system assigns you a mean skill level μ in the middle of the leaderboard but the maximal variance σ² of skills; it’s your first game, so the ranking system should reflect its lack of knowledge. Now, the TrueSkill matchmaking criterion takes its maximal value for other players with the same mean skill level μ but a small variance σ². Thus, if available, you will be matched with another player in the middle of the leaderboard but with a much smaller σ²: a player of established average skill.
Why is this better than matching you with someone else new to the game? Well, this other player may, in fact, be one of the most skilled players (who just happened not to have played the game mode yet) whereas you really are a beginner. Then, you two are (up to) 50 skill levels apart. Matching you with someone who is an established average player guarantees that your skill level gap is never bigger than 25 levels.
Q: I played my first game in PGR3 online career last night. I was matched with a couple of Level 22/Contender players. That does not seem right; what’s going on here?
A: The rank that is displayed in the PGR 3 online career lobby is “the conservative skill estimate”: with a chance of more than 99%, your skill is larger than this number. More specifically, the rank is computed as “mean skill – 3 * uncertainty” but, as far as TrueSkill is concerned, your skill is anywhere between “mean – 3 * uncertainty” and “mean + 3 * uncertainty”. So, when you are displayed as “Unranked”, your mean skill is really 25 and the uncertainty is so large that your skill can be anywhere between 0 and 50. However, in matchmaking you get matched with people based on your mean skill. Hence you will see large gaps in the matchmaking lobby. That does not mean you are matched badly, though. You are matched as well as possible given the information that TrueSkill has about you and the lobbies that are available to join when you request a match.
Q: In PGR3, I am having a hard time understanding why I (novice level 12) consistently get matched with players in the mid to high 20s. Yesterday I had to race a 29, a 22, and a 17, and that is just one example. It seems that the matching range is a little too liberal.
A: There are several effects that can lead to your observation:
One last note: Rest assured that once there are enough active Live players in your preferred game mode, matchmaking will become much tighter. Also, skill learning is not affected by a bad match; in fact, if you are matched with much stronger players, there is virtually nothing to lose with respect to your TrueSkill rating; the best thing that can happen is that you pull off a win and move up the skill leaderboard by a large amount.
Q: I am among the top 100 players in the world in my game mode. Why do I usually wait longer in the matchmaking lobby than my friend JoeDoe who is an average skill player?
A: This has an easy explanation: There are simply not enough players of your caliber available at any time! Remember that Xbox Live is a worldwide service, so there are perhaps only 1000 players that would be a perfect match for you, living in 24 different time zones. The only alternative is to match you with much less skilled players and sacrifice match quality for waiting time. And this would ruin both their and your experience on Xbox Live. You see: being a top player has its price!
For example, on the right-hand side you see a plot of the distribution of the mean skill levels μ for a popular Xbox Live game. As you can see, there are very few players at skill level 40 and above or at 5 and below, so the chance that an arbitrary other player who is online at the moment is a good match is much smaller. This results in the longer waiting time.
Q: I am a player with a mean skill of 30 and a skill variance σ² of 4 but my friend is only a player with a mean skill of 10 and a skill variance σ² of 2. If we play as a party, what people will we be matched with?
A: If you play as a party, the mean skill of every party member will be the average of all the mean skills and the skill variance is the average of the skill variances of all party members. Thus, for the purpose of matchmaking only, your mean skill will be 20 and your skill variance will be 3; the same is true for your friend. Hence, together you make a team of skill 40 = 2 * 20 with a joint skill variance of 6 = 4 + 2. But, when you finish a game the update will use your actual mean skill and skill variance; thus, your mean skill will grow/shrink faster (why?) depending on the outcome of the game.
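The party arithmetic in this answer is simple enough to write down directly. The sketch below only restates the averaging rule described above for matchmaking purposes, with beliefs given as (mean, variance); the function name is hypothetical, and the skill update after the game, which uses each member’s own values, is not modeled.

```python
def party_for_matchmaking(members):
    """Matchmaking view of a party: every member is treated as having the
    average mean and the average variance of the party.
    `members` is a list of (mu, variance) pairs.
    Returns (per-member view, additive team view)."""
    n = len(members)
    avg_mu = sum(mu for mu, _ in members) / n
    avg_var = sum(var for _, var in members) / n
    per_member = (avg_mu, avg_var)
    team = (n * avg_mu, n * avg_var)  # team skill is the sum over members
    return per_member, team


print(party_for_matchmaking([(30, 4), (10, 2)]))  # ((20.0, 3.0), (40.0, 6.0))
```

Because the team view only sums the averaged values, it matches the party’s true total (40 and 6) exactly; the averaging changes how each individual member looks to matchmaking, not what the party as a whole brings to the game.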
Q: I keep getting matched with people of higher TrueSkill and losing badly, which is very frustrating. Why does this happen?
A: There are several effects that may be at work here:
Q: Can the TrueSkill ranking system cope with handicapped games?
A: No. Among other things, this is something we are working on right now. The TrueSkill ranking system assumes that two equally skilled teams have the same chance of winning.
Q: Can the TrueSkill ranking system identify cheaters?
A: No. The only thing the TrueSkill ranking system can do is to track the plausibility of game outcomes. If you happen to play a lot of games whose outcomes are not very plausible, then this could raise concerns about you. But it could also mean that you are a very adaptive player whose skill is growing faster than the TrueSkill ranking system anticipated. And the last thing you want to be called then is a cheater!
Q: Can you please extend the rank calculator to more than 8 players? We are running a league and we would like to use TrueSkill to rank players on results based on player matches between our members. Some Xbox 360 games allow more than 8 players in non-ranked matches.
A: No, we currently have no plans to extend the rank calculator to more than 8 players; the user interface would become significantly more complex and would need a complete redesign to cope with this much information. If we revise this decision, we will let you know.
However, if you only want to rank games between 2 teams, we already have a solution for you: a Microsoft Excel spreadsheet which, together with the rank calculator, can compute the update for matches of up to 16 vs. 16 players by exploiting the property that the skill of a team is the sum of the skills of its players (once there is an Xbox 360 game with more than 32 players, we will update the spreadsheet to cope with even bigger teams). Here are the steps to follow:
The accuracy is only up to 3 digits but that should be sufficient for up to 1,000 players. If you intend to rank bigger leagues, please contact us directly at trueskill.
Q: I am interested in studying ranking systems. Do you have any real-world data for a comparative analysis?
A: Yes, we recently released the Halo 2 Beta game outcome dataset. We are very interested in your work and would like to learn about your results; please feel free to contact us at trueskill with any findings you have.
Q: I am a software developer and am eager to develop a small application that mimics your TrueSkill Rank Calculator. Would it be possible for you to provide me with an implementation of that application (since it was meant for research purposes, I do not see the harm) or at least pseudocode for its implementation?
A: We do not intend to make the source code of the TrueSkill Rank Calculator available in the near future. Of course, we would like to encourage you to pursue research in this subject area, so here is a list of pointers that might be of help (this list will be updated regularly as new material can be released):
Principal Researcher