The Persistent Problem of PvP Rating Exploits

Velidra sent me the link to the above video of a Destruction Warlock tearing apart battlegrounds with his bare hands. The guy takes on 3 Rogues at once and walks away the sole survivor. If I walk away from an encounter with a single Rogue, I usually count myself lucky. Three of them at once? Game over, man. Game over.

But not for Adouken.

I enjoy PvP videos. They usually make me feel bad about my own UI (how do they do that with so few addons?) but they make me feel great about the potential of my class, and I try to learn from them as best I can.

Videos naturally show a skewed version of a player’s skill, but that doesn’t mean that players who show off their skills in them are somehow faking it. They might not be that good all the time, but damn if they weren’t that good at some point, when the cameras were rolling. Odds are pretty high that they are that good, and that they operate at a high level of play all the time.

I don’t play anywhere near as well as you see Adouken play in that video – far from it. Watch that first segment and realize that he’s casting Nether Ward in between the time a spell is cast at him and the time it reaches him. You notice how it looks like he reflects Death Coils back at their caster? He’s casting his own Death Coil while his opponent’s spell is in the air. That’s awesome.

There is an objective difference in skill between Adouken’s player and me. While it may not be easy, surely, we can measure it somehow, right?

That’s where PvP ratings are supposed to come in and help us know the great from the good, the poor from the mediocre. The way they work is simple, at least in concept.

  • There are two numbers used in the rating system: Matchmaking Rating (also called MMR) and your PvP Rating. You have different values for each bracket.
  • Your Matchmaking Rating changes with every win and loss, and is used by the system to try to find a level of skill where you’ll win about 50% of the time. You can think of the MMR as measuring your aptitude, your potential rating.
  • Your PvP Rating is based upon your performance over time, changes slowly, and is what PvP achievements, gear rewards, and titles are based upon. PvP Rating, in theory, measures your performance over time.

The goal of the PvP rating system is to match you up with people of equal ability, not to allow you to win all the time.

That’s kinda weird, isn’t it? From a sport perspective, it would be really strange to have a system that wasn’t based on win/loss records (performance). But you also have different leagues and ways of stratifying talent that don’t exist in computer games – local, regional, and national competitions, playoffs, major, minor, and little leagues. So instead, the goal is to put a number on you and say, this is an arbitrary level you’re performing at.

All other things being equal, you should win about 50% of your matches against teams and people of similar PvP rank.

But that’s not how it works.

THE CURIOUS CASE OF THE MISSING MMRS

So a funny thing happened in the 4.2 Patch notes.

  • The individual Matchmaking Rating column has been removed from the Arena scoreboard.
  • The individual Matchmaking Rating column has been removed from the Rated Battleground scoreboard and replaced with a team Matchmaking Rating.

This is kind of curious, isn’t it? What’s going on here?

I’ve said several times that Blizzard is trying to encourage people to get into Rated Battlegrounds in patch 4.2, and that many of the changes are with this in mind. You might think that a change like this is just to make it so that people who join rBGs don’t see how outmatched they are and throw the match immediately.

While this makes a limited amount of sense, it’s not what’s going on. Yes, this is to try to make Rated Battlegrounds more fair, and therefore more attractive. But hiding the personal MMR is aimed at stopping a series of exploits people are using to get titles in both Arena and Rated Battlegrounds, exploits which are running rampant right now. The most common exploit involves using alts to boost the main character’s MMR, then winning enough games at the various high levels to get the desired titles.

If you’ve been comfortably playing in a lower Arena bracket, you may have noticed that the last 2 weeks have been rather… more painful than before.

You’re not imagining things.

THE PROBLEM WITH MMR, OR WHY THAT TEAM JUST STOMPED US

[Player]: How about a joke before you go?
[GM]: Your Arena rating.
[Player]: /facepalm

Consider the following facts about how MMR works.

  • Your team MMR (different from your rating, mind you) is equal to the average individual MMR of all the players on the team.
  • In the event of a win, individual MMR should go up, thereby raising the team MMR. Losses reduce MMR, but not as much as wins do.
  • Players on new teams start out with 1500 MMR.
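Here is a minimal sketch of the first and third facts in Python, assuming the team value really is a plain average. The names `team_mmr` and `STARTING_MMR` are mine for illustration, not anything from Blizzard.

```python
STARTING_MMR = 1500  # players on new teams start here, per the list above


def team_mmr(individual_mmrs):
    """Team MMR as the plain average of the members' individual MMRs."""
    return sum(individual_mmrs) / len(individual_mmrs)


# A brand new 3v3 team: everyone starts at the same value.
fresh_team = [STARTING_MMR] * 3
print(team_mmr(fresh_team))  # 1500.0

# Two established characters plus one fresh character.
mixed_team = [1800, 1800, STARTING_MMR]
print(team_mmr(mixed_team))  # 1700.0
```

The thing to notice is that one low individual value pulls the whole team number down, no matter how the other players perform.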

Let’s look at how this works out.

You start the season out with your two friends and start playing 3v3 on your mains. You win some, you lose some, but your individual MMR rises and falls together. If your MMR hits 1800, your teammates are also at 1800. Your MMR, and eventually your PvP rating, accurately reflect your team’s performance to date. Everything is rosy.

Now let’s say one of you has an alt you want to bring in. Maybe it’s because it’s a better comp, maybe it’s just time for a change. Now you’ve got two people at 1800 MMR and one person at 1500 MMR, so your team has an MMR of 1700. You’re facing teams which are a little worse than you were doing before, but maybe the alt is undergeared, so it balances out.

What’s interesting is that while it might balance out to fair matches in the 1700-1800 bracket, your individual MMRs are now going to be out of sync. The alt will always have a lower MMR than the other two main characters, and can never catch up.

Now let’s take a step back and change the conditions a little bit. You have a 3v3 team that starts fresh at 1500 and goes to 1800. Two of you drop your mains and swap to alts. Your team’s MMR is now 1600. You’re facing easier teams than you did at the 1800 bracket, so you win, even though the alts might be a little undergeared. They gain +200 MMR, you gain +200 MMR, you’re now at 2000 MMR, they’re at 1700 MMR – and your team is back at 1800 MMR.

With me so far? You’re still playing against 1800 MMR teams, but your personal MMR is 2000 and your team’s alts are at 1700.

Now, you swap to one of your alts, and one of your teammates swaps to their main. 1500 alt, 1700 alt, and 1800 main are now on a 1666 team. You play until your teammate’s main is at 2000. (You’d be at 1700, the second alt would be at 1900.)
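The two rounds above can be walked through in a few lines of Python. This is only a sketch of the arithmetic: it assumes the team value is a plain average and that everyone on the roster gains the same +200 during a winning stretch, and the character names and helper functions are made up for illustration.

```python
def team_mmr(mmrs):
    """Team MMR as the plain average of the individual MMRs."""
    mmrs = list(mmrs)
    return sum(mmrs) / len(mmrs)


def win_streak(roster, gain=200):
    """Assumption: every character on the roster gains the same amount."""
    return {name: mmr + gain for name, mmr in roster.items()}


# Round 1: your 1800 main plays with two fresh 1500 alts.
round1 = {"your_main": 1800, "alt_a": 1500, "alt_b": 1500}
start1 = team_mmr(round1.values())   # matched as a 1600 team
round1 = win_streak(round1)
end1 = team_mmr(round1.values())     # back to an 1800 team

# Round 2: you swap to a fresh alt, a teammate brings their 1800 main,
# and the other alt (now 1700) stays on the roster.
round2 = {"your_alt": 1500, "alt_b": round1["alt_b"], "teammate_main": 1800}
start2 = team_mmr(round2.values())   # about 1666
round2 = win_streak(round2)

print(start1, end1, round1)
print(round(start2, 1), round2)
```

Each round starts against easier opponents than the mains earned, and each round ends with a main sitting at 2000.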

You see where this is going, right?

By cycling through alts, teams are able to artificially boost the individual MMR of their main characters.

Now let’s take this a step further. The team cycles through once or twice, everyone’s mains are sitting around 2000 MMR. The alts are all around 1800, which is really where people’s skills are at.

So the team hops on their alts and loses every single match. Their MMR tanks. They go from an 1800 MMR team to a 500 MMR team in a night. Those characters have terrible MMR now, which is exactly what they want.

Because now, you have a crop of alts at 500 MMR to swap into a team with a 2000 MMR main. The team’s matchmaking rating is 1000, so they’re going to be facing significantly easier opponents. But they’re capable of playing at 1800 MMR, so they dominate. The main’s MMR shoots up to 3000+ while the alts are climbing back up to 1800.
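To put numbers on that mismatch, here is a small Python sketch. It assumes the plain-average team MMR described earlier; the 1800 "real skill" figure is the one used in this example, and the variable names are mine.

```python
def team_mmr(mmrs):
    """Team MMR as the plain average of the individual MMRs."""
    return sum(mmrs) / len(mmrs)


boosted_main = 2000
tanked_alts = [500, 500]
real_skill = 1800  # where these players actually perform, per the example

matchmaking_value = team_mmr([boosted_main] + tanked_alts)
skill_gap = real_skill - matchmaking_value

print(matchmaking_value)  # 1000.0
print(skill_gap)          # 800.0
```

An 800 point gap between how a team is matched and how it actually plays is the whole exploit in one number.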

And then once everyone’s mains have an MMR of 4000+, they all rejoin the team and play enough matches to bring their team rating – and therefore their PvP rating, which gives the Gladiator titles – up to the desired level. Yes, their MMR will fall from the heights it reached, but the PvP Rating will rise to meet it somewhere in the middle.

When that team shows up with perfect CC chains, huge burst damage and flawless target switching and stomps the 1000-rated group you and your friends put together to screw around on… they should never have been playing you in the first place.

BUT WAIT, RATED BATTLEGROUNDS ARE EVEN WORSE

You know why MMR boosting is an even bigger problem in Rated Battlegrounds? It’s not because they’re BGs, and it’s not because I am trying to pick a fight with rBGs this week.

No, it’s because:

  1. There are 10 people on your team, and
  2. Rewards are based on your individual MMR, not your team MMR.

Nice, huh?

Swapping alts (or even players who don’t care) in and out of BGs can be done like in Arenas, but it’s a little easier to boost MMR due to the number of low rated alts you can bring to the team. If you have 2 players at 1800 and 8 players at 1000, your team will be at 1160 MMR and (hopefully) get matched accordingly.
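The same averaging arithmetic scales up to a 10-player team. The sketch below assumes a plain average and uses the 1800 and 1000 values from the example above; `rbg_team` and the constants are illustrative names only.

```python
def team_mmr(mmrs):
    """Team MMR as the plain average of the individual MMRs."""
    return sum(mmrs) / len(mmrs)


TEAM_SIZE = 10
HIGH, LOW = 1800, 1000


def rbg_team(n_high):
    """A roster with n_high high-rated players; the rest are low-rated."""
    return [HIGH] * n_high + [LOW] * (TEAM_SIZE - n_high)


print(team_mmr(rbg_team(2)))  # 1160.0

# How the team value moves as more high-rated players join.
for n_high in range(1, TEAM_SIZE + 1):
    print(n_high, team_mmr(rbg_team(n_high)))
```

With eight slots available for low-rated characters, a couple of high-rated players barely move the team value.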

The coordination required to alt swap and lose MMRs is harder to do with 10 people than with 3. There’s a lot more time involved with Rated Battlegrounds, and the effort put forth by a low-rated character is often the same as (or more than) that of a high-rated one, but the high-rated one will get rewarded disproportionately to their efforts. While there is some alt-swapping going on, it’s not as easy as some other methods of boosting your MMR.

No, the best thing to do is to work with a strong group until you’re all up to a decent level – say 1800-2000 – and then PuG like crazy. Get into the worst groups you can find who still have a chance of winning, and play with them. This has the same effect as the alt-swapping MMR boost – when you win, you win big, when you lose, you don’t lose that much – with none of the headaches of having to swap alts yourself. You can go from PuG to PuG, increasing your MMR with each win. You may not win as consistently as you do with your set group, but you will get a great rating, which in turn gives you access to the PvP titles. You don’t even have to win any matches at your new MMR to get the titles, because nothing is based on your team’s MMR or rating – just your individual rating.

Remember back when you thought people’s rating really measured their skill?

/AFK FTW

At some point above, you probably wondered how people can preserve their ratings while losing.

Well, if you leave a match before it finishes, it doesn’t count. This is how win-trading works – people queue in off-hours, trying to get specific teams to match up against, and leave the match if it’s not them. When people leave the match as soon as things start going a little wrong? They’re leaving to preserve their MMR, which gets modified at the end of the match.

You didn’t think people were /afking because they were scared of you, right? 🙂

WIN TRADING

Another reason why people /afk out of an Arena (or Rated Battlegrounds, though I think this is less common) match is because they’re trying to trade wins with another team.

This often happens late at night, when there aren’t a lot of teams playing in the different brackets. It’s been a problem since Arenas started. If you can find a team who will throw the match for you, it’s a great way to get your PvP Rating to match your possibly inflated MMR.

I don’t have a lot to say about win trading. Don’t think it doesn’t happen, because it does.

WHY BLIZZARD IS HIDING INDIVIDUAL MMRS

Given that there are two different types of MMR inflation going on in both types of Rated PvP, you can start to see why Blizzard is trying to hide that value. It’s not going to prevent the problem from happening, especially not in Rated Battlegrounds, but it can reduce the precision with which people are doing it now. There will be more guesswork when exploiting, both in boosting and tanking individual MMRs.

There’s a concept in security circles called “Security through Obscurity,” which is a way of describing any security system that relies upon something being hidden for it to be secure. It’s usually treated as a bad thing, because once the hidden thing is found, the system that relies on it is completely insecure. In cryptography, if your sophisticated code algorithm uses a single seed to generate codes, once the seed is known your code is useless. In piracy, if you bury your gold but don’t put a lock on it, anyone who finds the gold can take it.

In other words, security through obscurity is generally not very secure.

There’s a temptation to say that hiding the MMRs is just that – not making the system any less susceptible to exploitation, just hiding the problem. People can still do the things they’re doing now. You are going to face teams who are boosting themselves, who have great gear and skilled players but are playing with an MMR well below their real skill, and you won’t be able to tell anymore.

But removing the data points does make it more difficult for the exploiters. Not a lot – not like a complete revamp of the MMR system would – but a bit. It’s a relatively simple change in terms of development time which will have some impact. That’s why it’s happening now.

I don’t really like this change, but I see that Blizzard has to do something.

Will teams still be able to boost their MMR into the stratosphere? You bet. As far as I can see, as long as the three conditions I laid out about the MMR system hold true, boosting is possible. You can’t have flexible teams and not have this kind of potential abuse. Will it be harder for other players to find out who is boosting? Yes, it will.

It’s not great. But it’s a start.

IT’S ALL RELATIVE

Man is the measure of all things.

-Protagoras

The interesting thing about the PvP Rating system, at least the Platonic ideal of the PvP rating system, is that it provides a way to compare people with very different character types. No matter what you play, or what your team is like, it should provide a relative measure against other players. The values are arbitrary and entirely dependent upon the actions of other players, as well as your own.

I think about other rating systems that assign a numeric value to your ability – college aptitude tests like the SAT/ACT, IQ tests, even professional placement exams – and they all measure ability based upon fixed criteria. Here is a test, there are right and wrong answers, how did you do? (Please note, I am an old fart, and I still think of the SAT as having all multiple-choice questions, none of this fancy writing stuff.)

Both types of test assign numeric values, which of course makes them more scientific.

But more than that, both purport to measure aptitude, but one is easy to game for your advantage, while the other is not. Why is that?

Take a look at the exploits again. Each one of them involves using other people. The system isn’t the problem, the people are. The system relies upon measuring you and your teammates, and your performance against other teams, which provides two places where it can be exploited.

Your opponents can really only modify your rating through throwing a match and win-trading, which is one kind of problem. You and your teammates can modify it through careful manipulation, boosting some characters, tanking the ratings of others, and preserving gains through /afking.

If these ratings were static and based upon some kind of objective performance, this kind of exploitation would not be possible. You can’t cheat an aptitude test by trying to throw off the bell curve and flooding the test pool with people who are going to score 0. You can’t get a 1600 on the SATs by being better than everyone else in your testing pool – you have to get every question right.

There are objective measurements of player skill, even in an environment soaked in relativity like PvP. Go back to the video at the top of the page. The player’s reaction time is faster than many others. They choose the right spells and abilities to succeed. They position themselves well, they use their abilities in the correct order. There is a measurable difference between that kind of play and my own, and that means we could construct a static test to measure it.

But static tests are hard. They have to be randomized, administered sparingly, maintained and updated. I don’t know how it would capture performance in the field fairly. I have only the vaguest ideas how a static PvP test would work. Perhaps like kata in martial arts, where mastery of a ritualized set of moves – perhaps a scripted PvP encounter for each class – is required to move to the next level?

That doesn’t feel much like PvP to me. PvP requires other players, living, breathing, thinking teammates and opponents.

And yet, as soon as we bring other people into our measure, we open the door for manipulating that rating.

SKILL > RATING

PvP Rating is not equal to skill. As much as we would like to have a system that really represents skill, the PvP Rating system is not it.

The more I look at how the PvP Rating system is being manipulated, the less I respect it. There are a lot of highly skilled players with high ratings, where ability and performance are in sync. But there are plenty of other teams that are taking shortcuts, who are going for the quickest way to their desired goal. They’ll stomp through the lower brackets while boosting a friend’s toon. The only incentives that aren’t about gaining the coveted rating are designed to get people into Rated Battlegrounds – everything else is about getting your numbers up.

Players who deliberately game the rating system sadly affect other players. A 2500 player playing in the 1250 range artificially depresses the ratings of people who would naturally be in the lower brackets. The upper brackets, in turn, get filled with people who have artificially inflated their ratings, giving the people who actually perform at that level easy opponents, inflating their ratings in turn.

The more players who game the system, the more imbalanced the brackets get.

And none of this is a reason to not play Arenas or Rated Battlegrounds.

  • Arenas remain the best place to learn how to win fights in PvP, period. (The only other activity that even comes close is dueling, which is really 1:1 Arena.) Yes, it’s a death match. Yes, there are strict limits about what you can and can’t use. Yes, you’re going to have unbalanced matches. Try to win them anyway. Learn from your losses.
  • Rated Battlegrounds delivered on their promise – they let you play BGs with the team composition you want against really good opponents. You have to win the individual fights, you have to execute a strategy, you have to do it against an organized opponent. Yes, you’re going to have unbalanced matches. So what? Get stronger.

As long as PvP Ratings are a relative measure, players will work together to game the system and artificially inflate their ratings. The exploits I’ve discussed are just some of the ways that players are trying to get around the system.

Is this cheating? Yep, you better believe it. Creative use of game mechanics, my foot.

But while it unbalances PvP, it’s not a reason to abandon Arenas and Rated Battlegrounds.

Skill is not equal to rating. Skill can’t be gamed, it can only be acquired through work and talent.

Screw your PvP Rating. Focus on improving your skill instead.

If you do that, all the exploits in the world won’t matter one bit.

Filed under Cynwise's Battlefield Manual

23 responses to “The Persistent Problem of PvP Rating Exploits”

  1. aaaaa

    Skill is not equal to rating. Skill can’t be gamed, it can only be acquired through work and talent.

    Screw your PvP Rating. Focus on improving your skill instead.

    If you do that, all the exploits in the world won’t matter one bit.

    WORD

  2. The Dewd

    This is one of the big reasons I don’t PvP anymore. I got my feet wet as a resto druid at 60 doing AB and AV and healing anything that was bigger and meaner than me. In BC I did some arenas because my guild was slow out of the gate and too small to do anything other than Kara. I had two choices as a feral – I could either get the weapon off Illhoof (not likely to happen at the rate we were progressing to him) or do arenas for the Season 2 weapon. I formed up a couple of teams, lost my 10 games a week (not on purpose), and got my beatstick.

    Nothing to me was more frustrating than getting absolutely STOMPED by a team that had no right to be playing against my group. I learned what I could and tried to improve but as a feral druid in Seasons 2 and 3, there wasn’t a lot of hope for me. And then, to add insult to injury, I learned that people were gaming the system just to get easier fights, etc.

    I keep thinking that I should try PvP again but at this point I’m so scarred and gun-shy that I can’t bring myself to do it. 😦

    • It’s been more forgiving this season, though not right now at the end of it. I didn’t like Arena at all in Wrath, so it’s a bit of a personal surprise that I like it now.

      Wait for Season 10, then give it a try!

  3. I love reading your posts, Cyn, you always have such interesting topics. 🙂

    This problem is clearly quite real, and I think the root is less the use of MMR vs. personal rating, and more how they measure the team’s MMR as a function of the individuals’ MMR. An arithmetic mean really doesn’t make sense for a couple of reasons, both of which you’ve touched on here:

    Firstly, if “skill” is a description of what a team can do, then an average isn’t a very good measure, because the amount a weak player detracts from the team seems quite a bit less than the amount a strong player contributes to it. You can see this in unrated battlegrounds all the time: One very skilled healer can, in many cases, turn a bunch of road-fighting crazies into a winning combination. One very strong flag carrier can make the difference between a midfield scramble and a three-cap victory. One diligent flag-room defender can be almost as effective as turtling. One weak player wasting his or her time running around in the midfield doesn’t cost the (unrated) group much at all.

    Secondly, using the average gives the wrong incentive to teams. If we take it as given that everyone wants to win as much as possible, then it makes sense you would want your team’s MMR to be as low as possible without sacrificing personal rating. The MMR games you’ve described here are the obvious, logical conclusion given the incentives built in to the MMR calculation.

    If they want a “quick fix”, my feeling is that instead of just hiding the MMR values (which, admittedly, does make it more tedious to game the ratings), they should change the metric. For example, perhaps they could set the team MMR to the maximum of the individuals’ MMR values instead of the average. This would be a powerful incentive for teams to keep their MMR values close together. Perhaps too powerful: It might make it hard for teams to replace somebody. Still, I think it would better align the players’ incentives with the desired outcome.

    If the maximum MMR is too extreme, another idea might be to use the median of the team’s values: Write down everybody’s MMR from highest to lowest, and pick the middle one. Now, if a team wants to game the system, they have to replace more than half their personnel with low-rated players. They could still game the system, but the cost of doing so would be enormous. Maybe a 3v3 team can afford to put in one low-ranked character, but can they afford to put in two? Can you still win with three low-ranked players in a 5v5 team? Maybe so; but at least it evens the odds a bit.

    Anyway, thank you again for an insightful and informative post!

  4. zwinglisblog

    I enjoyed BGs, but ended up going the PvE route for Wrath. I’m thinking about getting back into BGs, and I find your stuff helpful.

    What I mean is, if this kind of thing is happening, then I’ll be less likely to be frustrated when losing. If I know I’m getting beat by a superior opponent, I have a choice to make. To give up, or to make myself as annoying to them as possible.

    I usually choose option two. 🙂

    Zwingli

  5. Seca

    Any day you can quote Hudson is a good day. 🙂 Was a good read.

    The comment I was going to make has been covered very nicely by Lara. An average is a pretty rudimentary tool, and there are arguably better ones that can be applied here.

    I do question Blizz’s motives a little more than the post does. I see this less as an attempt to make it more difficult to cheat, and more a response to blog posts such as Gevlon’s suggesting it’s better (more efficient) to roll over when clearly mismatched.

  6. Mangara

    This was a great post and it gave me much more insight into the PvP rating systems. There is one thing I have to nitpick on however: you mention IQ as an example of a static test, which does not compare you to other people. While it’s true that the questions and answers are fixed, the translation of #correct answers -> IQ is adjusted every so often to keep the median IQ at 100 and the standard deviation at 15. If we were still using the same IQ tests as 100 years ago, the average IQ would be around 130.

    • This is a good, interesting point. I’ve had a few conversations on Twitter about how the SAT isn’t a static test anymore, but I’m an old fart so I remember it being much harder than it is now. IQ tests are a little different – they’re static during the time you take them, but are adjusted gradually over time. I figure it’s good enough for an illustrative example, if not 100% accurate.

      Thanks for the comment!

  7. HPvPV;dw
    (Horde PvP video; Didn’t watch.)

    All kidding aside, I’m not sure their attempts at trying to bring pvpers like myself back into the game are going to work. :/

  8. Me and my arena mate (Tuek) had just broken 1100 rating (not high but we’re new to arena and started the season late) – as soon as we broke that bracket things went to hell in a handbasket. I’m starting to wonder if you’ve identified the reason here cos things have pretty much eased off A LOT since the new season was announced

    • There was a mad dash during the last two weeks of Season 9 where Arena was just TERRIBLE. I’m so not looking forward to that again, but at least I know it’s not that I magically started sucking.

  9. Shimoda

    Not only is Adouken a totally awesome player, he can speak Chinese while he is doing it!! LOL

  10. When at about 1300 rating I came across a few teams with one 1900 and a couple of skilled 800’s, I threw arena in for the season. I had all my gear and had been playing for fun – but that wasn’t fun.

    Thanks for the clear explanation Cyn!

    • That’s the exact scenario that drove me to look into these exploits. We started getting stomped by teams that were WAY out of our league, and I wondered why they were doing it.

      Thanks!

  11. Clayton

    “Skill is not equal to rating. Skill can’t be gamed, it can only be acquired through work and talent.”

    No, skill is not equal to rating but it is not equal to equipment either. Equipment > Skill in the real world. In fact skill is more or less a myth.

    People talk about skill in PvP but in MMO’s it is simply not that relevant. It is more about having SUPERIOR equipment than anything. Yeah, decisions help but let me duel you in Season 9 stuff while you bring Season 1 stuff – lets see who wins.

    Classes also matter. For me WoW is never going to have good PvP because they allow Stealth and Pulls. That rules out good PvP in favor of legal griefing.

    My advice is just play to have fun and forget all this “skill” and rating nonsense because you are chasing a treasure that is not there

  12. Pingback: Let's talk arena - Page 4 - The Warlocks Den - WoW Warlock Discussions

  13. Pingback: The Changing Face of PvP Gear and the Rocky Road to Season 10 | Cynwise's Battlefield Manual

  14. I learned a lot of things from reading this post, not least of which was that you are much, much, much smarter than me. If I could add anything- jury’s out on that- perhaps it’s that people make manipulative opponents, and perhaps there is no perfect pvp system. What rules do you think would minimize that?

    • (To clarify, I meant what rules do you think would minimize the PvP exploits- I kept getting interrupted while writing that comment!)

      • I’m honestly not sure how I would revamp the system. Hiding your rating entirely until the end of the season is one way to eliminate the exploits. It would be hugely unpopular, but if you allowed players to still get gear and achievement when they hit a certain level (perhaps notify them only when they’ve hit 2200+), the end-of-season rush for ranking and titles wouldn’t be present. The problem is then that people are flying blind, so they don’t have feedback on if they’re improving or not.

        I like Lara’s suggestion of using a different mathematical formula for MMR, though there are a lot of details that would need to be worked out.

      • Actually, the math behind fixing MMR isn’t as bad as it seems: You want the MMR to be a function of the individual ratings, but an average doesn’t work because it lets you “hide” high individual ratings with low ones — that’s the origin of the exploit problem.

        Fortunately, though, there are lots of other statistics you could use. I personally like the idea of picking a single order statistic (e.g., maximum rating, second-highest-rating, median rating, etc.) because it gives you an easy knob to balance team flexibility (“we can swap out a player without completely bonking our rating”) against the incentive to cheat. Any order statistic at or above the median forces a team who wants to exploit MMR to replace more than half of their team with low-ranked characters, at which point it’s likely they will start losing, and the high-ranked ones will come down. If half isn’t enough, just bump it up a notch; if it’s too much, bump it down.

        You don’t want it tuned so tightly that nobody can have any fun, but right now I suspect enough Arena and RBG players are frustrated by being crushed due to rating exploits that even a tight tuning would probably be an improvement.

        The average is nice because it means the team MMR moves smoothly; an order-statistic metric would probably move more stepwise. If you don’t like that, you could also cut the incentive to cheat by discarding a few of the lowest-ranked characters in the group. So, maybe a 5s team would have an MMR computed as the average of the top 3 players. You’d still get the smooth changes, but you couldn’t game the ratings as much.
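For comparison, the alternatives raised in this thread (maximum, median, and an average of the top few players) can be set side by side with the plain average. This is only an illustrative Python sketch: the 5s roster is invented, the function names are hypothetical, and nothing here reflects how Blizzard actually computes team MMR.

```python
from statistics import median


def mean_mmr(mmrs):
    """The plain average, as described in the post."""
    return sum(mmrs) / len(mmrs)


def top_k_mean_mmr(mmrs, k):
    """Average of the k highest individual MMRs, ignoring the rest."""
    top = sorted(mmrs, reverse=True)[:k]
    return sum(top) / len(top)


# A hypothetical 5s team hiding three 2000 mains behind two tanked alts.
team = [2000, 2000, 2000, 500, 500]

results = {
    "mean": mean_mmr(team),
    "median": median(team),
    "max": max(team),
    "top-3 mean": top_k_mean_mmr(team, 3),
}

for name, value in results.items():
    print(f"{name:>10}: {value}")
```

For this roster the plain average comes out to 1400, while the median, the maximum, and the top-3 average all stay at 2000, so the two tanked alts no longer buy easier opponents.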

  15. Pingback: I don’t care about the color of you clothes « Armaggedon's coming!