Tuesday, December 1, 2015

The Optimal Number of Lands

Accepted wisdom is that 17 lands is the baseline for a limited 40-card deck. You increase or decrease based on curve, splash, and other considerations. But why is this accepted wisdom, and is it necessarily correct?

I have yet to see an actual rigorous mathematical argument in favor of 17 lands (if you know of one, please point me to it). My guess is that the best players arrived at it based on massive amounts of play. But large data sets have not readily been available for analysis until recently. Is it possible that the common wisdom isn't strictly correct? That we might be able to scrape a few more points of expected value (EV) out of the game by adjusting this baseline?

Let's look at some data first (thanks to Jack for churning these out):


So what we've got here is a table showing the chance of having a particular number of lands in your opening 7 cards depending on the number of lands in your 40-card deck. For example, the very first cell indicates 0.0258 for 0 lands in your opening hand with 15 lands in your deck. That means there's a 2.58% chance your opening hand will have no lands if you run 15 lands in your deck.
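For anyone who wants to reproduce the table, the numbers follow the hypergeometric distribution (drawing 7 cards without replacement from 40). Here is a minimal sketch in Python; the function name and the 15-to-19 land range are my own choices for illustration, not necessarily what was used to build the original table.

```python
from math import comb

def land_distribution(lands, deck_size=40, hand_size=7):
    """Return P(exactly k lands in the opening hand) for k = 0..hand_size."""
    total_hands = comb(deck_size, hand_size)
    return [
        comb(lands, k) * comb(deck_size - lands, hand_size - k) / total_hands
        for k in range(hand_size + 1)
    ]

# One row per land count, one column per number of lands in the opener.
# The 15..19 range is an assumption made for this sketch.
print("lands " + " ".join(f"{k:>7}" for k in range(8)))
for lands in range(15, 20):
    row = land_distribution(lands)
    print(f"{lands:>5} " + " ".join(f"{p:7.4f}" for p in row))
```

The first cell it prints for 15 lands rounds to 0.0258, matching the example above.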

I've highlighted the rows for 17 and 18 lands, since those are the ones I'm most interested in comparing. If we make the reasonable assumption that opening hands with 0, 1, 6, or 7 lands are almost always instant mulligans, then we can sum the probabilities for those cases and compare them. Those totals are on the right of the table.

So with 17 lands, we insta-mulligan 12.14% of the time.
With 18 lands, we insta-mulligan 10.47% of the time.

That's 1.67 percentage points fewer mulligans, which adds up over a large number of games. How badly does it hurt us to mulligan? Here are some numbers from the current draft format, Battle for Zendikar:


Your win% with a 7-card hand is very close to 50%, either on the play or the draw. But as we can see from this data, mulligans hurt. Bad. You lose 12% equity on your first mulligan, and almost 30% on your second. 

And this is with the new scry rule in effect. This suggests that the penalty for mulligans is so severe that adjustments to our deck that minimize mulligans are particularly valuable. More on this in a second. First let's look at a couple more interesting points from the first table.

With 17 lands, our chance of having 2 lands in our opening hand is 24.55%.
With 18 lands, that chance is 21.61%.

That's nearly 3 percentage points lower. 2-land hands are inherently risky, unless your curve is especially low. I keep most 2-landers, especially on the draw, but with 17 lands there is still about a 15% chance on the draw that you won't hit your third land drop on turn 3 (on the play, with one fewer draw, it is closer to 29%), which is often very punishing.
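The risk of stalling on two lands is easy to check. This is a rough sketch, assuming a 7-card keep with exactly 2 lands, no card draw or scry effects, and that "hitting the third land drop" just means drawing at least one land by turn 3. The function name is made up for this example.

```python
from math import comb

def miss_third_land(deck_lands, on_draw, deck_size=40, hand_size=7, lands_in_hand=2):
    """Chance that a hand with `lands_in_hand` lands draws no land by turn 3."""
    library = deck_size - hand_size            # 33 cards left after a 7-card keep
    lands_left = deck_lands - lands_in_hand    # lands still in the library
    draws = 3 if on_draw else 2                # on the play you skip the turn-1 draw
    return comb(library - lands_left, draws) / comb(library, draws)

for lands in (17, 18):
    print(
        f"{lands} lands: miss on the draw {miss_third_land(lands, True):.1%}, "
        f"on the play {miss_third_land(lands, False):.1%}"
    )
```

With 17 lands this gives roughly 15% on the draw and 29% on the play, so the extra land helps here too.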

Most players would agree that the best starting hands, on average, contain either exactly 3 or 4 lands. 

In a 17-land deck, openers with 3 or 4 lands make up 54.91% of our hands.
In an 18-land deck, openers with 3 or 4 lands make up 57.30% of our hands.

Another incremental but significant gain in value. What about 5-land openers, which, like 2-land openers, are questionable?

In a 17-land deck, openers with 5 lands make up 8.40%.
In an 18-land deck, openers with 5 lands make up 10.62%.

So we almost certainly lose some equity here. Most 5-land openers are relatively weak and risky. But this is the only category where we actually lose equity between 17 and 18 lands. 

With 18 lands in our deck, if we group 0-1-6-7 as a single category, we gain equity in every category except for 5-land openers.

0-1-6-7 lands: +1.67% 
2 lands: +2.94% 
3-4 lands: +2.39%
5 lands: -2.22%
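Those four numbers can be recomputed directly. The sketch below groups the opening-hand probabilities the same way and prints the raw change in frequency going from 17 to 18 lands. Note the sign convention differs from the list above: here a negative number just means that kind of hand shows up less often, which is good news for the 0-1-6-7 and 2-land groups. The exact values can also differ from mine by about 0.01 in the last digit, since I summed rounded table cells.

```python
from math import comb

def p_lands(k, lands, deck_size=40, hand_size=7):
    """P(exactly k lands in a 7-card opener) via the hypergeometric distribution."""
    return comb(lands, k) * comb(deck_size - lands, hand_size - k) / comb(deck_size, hand_size)

# Groupings used in the post: auto-mulligans, risky 2-landers, ideal hands, 5-landers.
CATEGORIES = {"0-1-6-7": (0, 1, 6, 7), "2": (2,), "3-4": (3, 4), "5": (5,)}

def category_shares(lands):
    return {name: sum(p_lands(k, lands) for k in ks) for name, ks in CATEGORIES.items()}

shares_17 = category_shares(17)
shares_18 = category_shares(18)
for name in CATEGORIES:
    change = shares_18[name] - shares_17[name]
    print(f"{name:>8} lands: {shares_17[name]:7.2%} -> {shares_18[name]:7.2%} ({change:+.2%})")
```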

So based on opening percentages and mulligans alone, running 18 lands seems better on net. You have small but significant gains in every category of openers but one. But what are the other costs associated with running 18 lands over 17? Flood, of course. But how much equity do we lose to flood by having 1 extra land and 1 fewer spell?

That's very difficult to measure. Ideally we'd have statistics on game and match win % based on the number of lands in a given player's deck. But the way data is gathered on MTGO, we can only see cards played out, not ones stuck in players' hands or libraries.

Another statistic we can use is the number of lands or spells on average drawn by a particular turn given a particular number of lands. I've seen this analysis before, but can't find a good example currently. If you find one, or can calculate it, let me know. I'm inclined to think the loss of EV due to flooding between 17 and 18 lands is smaller than the loss due to suboptimal opening hands, but I can't actually back that up with hard numbers.
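As a starting point for that calculation, here is a hedged sketch of the lands-by-turn math. It ignores mulligans, scry, and card draw, and the "flood" threshold (8 or more lands among the cards seen by turn 8 on the play) is an arbitrary cutoff chosen only for illustration. It says nothing about how much win percentage a flooded draw actually costs.

```python
from math import comb

DECK_SIZE = 40

def cards_seen(turn, on_play=True, hand_size=7):
    """Cards seen by a given turn, ignoring mulligans and extra card draw."""
    return hand_size + turn - (1 if on_play else 0)

def expected_lands(lands, seen):
    """Mean number of lands among `seen` cards (hypergeometric mean)."""
    return seen * lands / DECK_SIZE

def p_at_least(k, seen, lands):
    """P(at least k lands among `seen` cards)."""
    ways = sum(
        comb(lands, j) * comb(DECK_SIZE - lands, seen - j)
        for j in range(k, min(lands, seen) + 1)
    )
    return ways / comb(DECK_SIZE, seen)

seen = cards_seen(turn=8, on_play=True)  # 14 cards
for lands in (17, 18):
    print(
        f"{lands} lands: expect {expected_lands(lands, seen):.2f} lands by turn 8, "
        f"P(8+ lands) = {p_at_least(8, seen, lands):.1%}"
    )
```

The expected difference is small (5.95 versus 6.30 lands in 14 cards), but the tail probabilities are the part that matters for flooding, and those are what the second function is for.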

For me the main takeaway is this: Running 18 lands over 17 lands in your limited deck leads to significant gains in the average quality of your opening hands. However, it also probably reduces the quality of your draws in the mid to late game. Just how much is difficult to quantify.

Until someone can quantify that, I think it may actually be correct to run 18 lands in most draft decks, and I'm going to try it for a while to see how it works out.

Tuesday, February 11, 2014

Shuffling

Pile shuffling isn't shuffling.

I've heard this a lot, and it came up recently in a discussion thread I was reading. Randomization and fairness are important elements of any card game, including Magic. The goal of shuffling is twofold:

1) Every order of cards is equally likely.
2) Neither player has information about the order of the cards that they shouldn't.

The second goal follows somewhat from the first, but not entirely. For example, a player's shuffling technique could result in a properly randomized order of cards, but might have revealed information during the actual shuffling (e.g. by showing the bottom card to the shuffler during the process of shuffling).

To determine whether a given method for shuffling is randomizing well, we would need to generate some sample of randomized cards using the technique (the larger the sample the better), and analyze whether or not there is a systematic bias. That is, are certain orders of cards occurring more often than they should? If so, the shuffling method is not doing a good job. How well the method is performing depends on how systematic the bias is.

Riffle shuffling consists of dividing a deck of cards into two piles and interleaving them back into one pile.

Now then:


Wait, what? To the extent that a riffle shuffle meets the goals posted above, a single riffle shuffle is a valid part of a good shuffling method. However, a highly skilled shuffler can riffle shuffle such that they interleave the cards perfectly, giving them exact information about the location of the cards. If a player is skilled at deck manipulation, then riffle shuffling is just as invalid as pile shuffling.
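To make the "perfect interleave" point concrete, here is a small sketch of a perfect riffle (an out-shuffle, where the deck is cut exactly in half and the original top card stays on top). It assumes an even number of cards; the function names are just for this example.

```python
def perfect_riffle(deck):
    """Cut exactly in half and interleave one card at a time (top card stays on top)."""
    half = len(deck) // 2
    top_half, bottom_half = deck[:half], deck[half:]
    shuffled = []
    for a, b in zip(top_half, bottom_half):
        shuffled += [a, b]
    return shuffled

def shuffles_to_restore(deck_size):
    """Count perfect riffles needed to bring an ordered deck back to its start."""
    start = list(range(deck_size))
    deck, count = perfect_riffle(start), 1
    while deck != start:
        deck, count = perfect_riffle(deck), count + 1
    return count

print(shuffles_to_restore(52))  # standard deck
print(shuffles_to_restore(40))  # limited deck
```

Under this model a 52-card deck returns to its original order after 8 perfect riffles and a 40-card deck after 12, so at no point does the shuffler lose track of anything.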

But what about pile shuffling? Some say "it's just counting, not shuffling". Well, that's not true if, as part of your entire shuffling routine, the pile shuffle helps in meeting the goals above. It's true that if you pile shuffle in exactly the same way with the right number of repetitions, you will return to the same exact configuration. That doesn't mean that a single pile shuffle is not a valid part of an entire shuffling method, though. That means that a certain number of pile shuffles with no other actions is a very bad shuffling technique. Well, the same holds for riffle shuffling, if done in a particular way.

Suppose a player pile shuffles once, then riffle shuffles several times, then pile shuffles again. Assuming they do not know the starting configuration of the deck and are not a skilled deck manipulator, the pile shuffle will actually contribute to the overall shuffling method. A single pile shuffle ensures that cards that started out adjacent are no longer adjacent. If the player knew that before shuffling the top two cards were Islands, a pile shuffle would mean the player no longer knew this information (though they would know that the Islands are now roughly one pile's worth of cards apart, that is, about the deck size divided by the number of piles). But this information is quite a bit more difficult to hold in the working memory of your average player. In general, a pile shuffle is going to make it more difficult for a player to remember the locations of individual cards, even if they knew the starting configuration.
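Both properties of the pile shuffle (it separates neighbors, and it eventually repeats) can be checked with a simple model. This sketch assumes cards are dealt one at a time into piles, so each pile ends up in reverse order, and that the piles are then stacked in the order they were started. Real players pick piles up in all sorts of orders, so treat the exact gaps as a property of this model only.

```python
def pile_shuffle(deck, piles):
    """Deal one card at a time into `piles` piles, then stack the piles in order."""
    # deck[i::piles] is what lands in pile i; dealing reverses its order.
    stacks = [deck[i::piles][::-1] for i in range(piles)]
    return [card for stack in stacks for card in stack]

def gaps_between_old_neighbors(deck_size, piles):
    """Distance after one pile shuffle between cards that started out adjacent."""
    shuffled = pile_shuffle(list(range(deck_size)), piles)
    position = {card: i for i, card in enumerate(shuffled)}
    return [abs(position[c + 1] - position[c]) for c in range(deck_size - 1)]

def repetitions_to_restore(deck_size, piles):
    """How many identical pile shuffles bring the deck back to its starting order."""
    start = list(range(deck_size))
    deck, count = pile_shuffle(start, piles), 1
    while deck != start:
        deck, count = pile_shuffle(deck, piles), count + 1
    return count

gaps = gaps_between_old_neighbors(40, 5)
print("smallest gap between former neighbors:", min(gaps))
print("repetitions to restore the deck:", repetitions_to_restore(40, 5))
```

With 40 cards and 5 piles, former neighbors end up at least 8 cards apart (one pile's worth), and since the procedure is completely deterministic it must cycle back to the starting order eventually.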

In Magic, the pile shuffle has the added advantage that it does work as a count, making sure that you have the right number of cards in your deck. But it gets an unfair rap as "not real shuffling", probably because if done without any other actions or in a certain way it can lead to systematic bias and potentially give a player information about the configuration of the cards. But the same is true of riffle shuffling.

So stop your hating. Pile shuffling is a valid part of a good shuffling technique. Just don't pile shuffle exclusively and you'll be fine. I'm sure we could validate this premise with either experiments or simulation. Supposedly, seven riffle shuffles is the magic number for good randomization of a standard 52-card deck (the figure comes from a mathematical analysis by Bayer and Diaconis rather than from experiments). My guess is that replacing some of those riffle shuffles with pile shuffles would yield good results as well.
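Here is one way such a simulation could start. It is only a sketch: the imperfect riffle uses the Gilbert-Shannon-Reeds model (a common mathematical stand-in for a human riffle, not something measured from real players), the pile shuffle uses 5 piles dealt in a fixed order, and the only statistic tracked is where the original top card ends up. A routine can look fine on that one statistic and still be biased in other ways, so this is a sanity check rather than a proof.

```python
import random

def riffle(deck, rng):
    """One imperfect riffle under the Gilbert-Shannon-Reeds model."""
    cut = sum(rng.random() < 0.5 for _ in deck)  # binomial cut point
    left, right = deck[:cut], deck[cut:]
    out, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        n_left, n_right = len(left) - i, len(right) - j
        # Drop from a packet with probability proportional to its remaining size.
        if rng.random() * (n_left + n_right) < n_left:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out

def pile_shuffle(deck, piles=5):
    """Deal into piles one card at a time, then stack the piles in order."""
    stacks = [deck[i::piles][::-1] for i in range(piles)]
    return [card for stack in stacks for card in stack]

def top_card_bias(routine, deck_size=40, trials=2000, seed=1):
    """Largest gap between observed and uniform frequency of the top card's final slot."""
    rng = random.Random(seed)
    counts = [0] * deck_size
    for _ in range(trials):
        deck = list(range(deck_size))
        for step in routine:
            deck = riffle(deck, rng) if step == "riffle" else pile_shuffle(deck)
        counts[deck.index(0)] += 1
    return max(abs(c / trials - 1 / deck_size) for c in counts)

routines = {
    "1 riffle": ["riffle"],
    "7 riffles": ["riffle"] * 7,
    "pile, 5 riffles, pile": ["pile"] + ["riffle"] * 5 + ["pile"],
    "3 pile shuffles only": ["pile"] * 3,
}
results = {name: top_card_bias(routine) for name, routine in routines.items()}
for name, bias in results.items():
    print(f"{name:>22}: {bias:.3f}")
```

A value near 0 means the top card is landing everywhere about equally often. The piles-only routine is fully deterministic, so it sits at the worst possible value; the interesting comparison is between the all-riffle and mixed routines.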

Tuesday, December 10, 2013

Draft Format Season Lengths

There are things you don't notice in Magic until you've been playing regularly for a while. One is the length of a given draft format season (i.e. How long have we been drafting this set?). Sometimes you just get tired of drafting a given format for various reasons. Maybe you're not doing very well. Maybe you just don't like the set that much. Maybe it's just not a set that rewards tons of drafting.

Looking at the release dates of the last 10 major sets (I'm excluding special sets like Modern Masters), here's an infographic showing the lengths of each live draft season:

I came back to the game and started drafting heavily right at the beginning of Innistrad, so I didn't realize how long we drafted it. It was an awesome set to draft and I was still learning, so fatigue didn't really set in.

But you'll notice that seasons for the last set in the year, usually the first set of a block, are the longest, by nearly a month in most cases. This has a few ramifications.

One, at least at my local game store, it means that the number of drafters wanes as the season wears on. Some people draft to get cards for Standard, and midway through the season they've either drafted, bought, or traded for all the cards they need. Some people just get tired of the format. Innistrad and Theros are very rich draft sets, and I haven't gotten tired of Theros yet, but we've got nearly two more months of it. I think by the end of the year I will have gotten my fill.

Another by-product of the length of a season, at least for me, is how it affects my win percentage. I'm usually pretty early out of the gates. I listen to the LR set reviews before pre-releases if I can, and practice on sealed generators and/or draft simulators. I have a great win percentage at pre-releases and early in the format. But then something happens. Everybody else figures the format out. Halfway through the season, even the more inexperienced drafters have pretty solid card evaluation and know what the higher-tier decks are. A better player starts to lose equity, since one of the only areas they can gain an edge is during play.

I'm experiencing this right now. I'm on a horrible downswing in live play, getting completely blanked (zero wins) the last two weekly drafts at my local game store. It seems to be a perfect storm of drafts gone awry (weak packs, inconsistent signals, etc.), the weaker players getting stronger, and just some good old variance.

Last week, I lost all three of my matches 2-1, each deciding game super close. We also do within-pod pairings, meaning that we tend to get more pair-downs (2-0's playing against 1-1's), especially when people drop early.

My win percentage for Theros had been up around 80%, but it's now fallen to around 74%. Still good, but not killing it. In between those two horrible weeks at my LGS, I top-16'd a Pro Tour Qualifier in Austin, TX, getting 11th out of 154 players, and just barely getting knocked out of the top 8. So I know I'm still playing strong. But as the format slogs on, the weaker players have much more room to grow, and that's exactly what they're doing. Just means I have to try to improve proportionally, and try to eliminate even more of the inevitable mistakes in draft and play.

I also wonder if this season length is happenstance, or whether Wizards does this intentionally for some reason. For drafting, and probably constructed, the game would probably feel fresher if they shortened the release cycles for the year-end sets. Or they could try something akin to what they do on MTGO and bring back a flashback set to ease the fatigue.

It's kind of a shame. I think I'm going to get sick of Theros, but I think it's been a very fun draft format, so I don't want to run it into the ground. Long draft seasons should also be an opportunity to experiment with twists on formats (like back drafting), but those are likely to be even less popular than standard drafts. 

Ah well...buckle in for two more months of Theros, folks.

Magic Online Accepting Beta Testers

Magic Online is currently accepting applications for beta testers. I've been in the closed beta for over a year and a half now and it's pretty awesome. You get to play MtG for free (sort of). The cost is being vigilant in reporting bugs in the beta, but you're actually helping make MTGO better, so that should give you a warm and fuzzy feeling anyway.

Apply here.

Monday, December 9, 2013

Draft Format Attributes

A while back on Limited Resources they had on Brian David-Marshall, who talked about whether or not a given draft format was a prince or pauper format. This designation refers to the distribution of power among the cards in the format. A prince format is one in which the power level of the rarer cards is very high, so the format is very bomb-driven. Conversely, a pauper format is one where the power distribution of the cards is flatter (the bombs are less impactful, while the commons and uncommons are, on average, more impactful).

I've been drafting heavily for over two years now, and have thought a lot about various formats and some other attributes that define them. These categorizations might be able to help you adjust to a particular format, but in some cases they are simply qualitative measures that may make you favor a particular format over another.

Deep/Shallow

This designation is somewhat similar to prince/pauper, since it deals with the distribution of power level of cards in the set, but the depth of a particular format has to do less with the difference in power level between the bombs and the commons and more to do with how quickly the power level declines among the commons.

A deep format is one in which card quality remains relatively high even towards the end of packs, meaning there are probably still playable cards among the last three picks. A shallow format, however, sees a dramatic falloff in card quality soon after the best cards are taken, so that typically in the last half of a pack, finding desirable cards is very difficult.

An example of a deep format is Innistrad. In the same block, Avacyn Restored was a shallow format. I would often end an Innistrad draft with 27 or more playables, and the difficult decision was which cards to cut. AVR was the opposite, and I'd be scrambling for playables by the end of the draft. During deckbuilding I'd have to decide which cards were the best of the worst to include. Both situations require useful skills. They both involve maximizing the value of your pool, but shallow formats are more unforgiving, especially in the latter part of drafting if you're not mindful about what your deck needs and you panic and make bad decisions like moving into another color to strain your mana base.

Slow/Fast

Of course, some of these designations aren't mutually exclusive. When we talk about the speed of the format, we could be washing out information by just talking about the average speed. A given format may support decks at both ends of the spectrum, in which case talking about the average speed (e.g. on average most games end on turn 7.2) is not very useful.

The current format as of this writing, Theros, is a good example. It's not all that useful to talk about whether the format is fast or slow. What is useful to know is that the format supports some very fast decks, and so your grindy, powerful deck needs to be equipped to deal with them or you could be losing a lot of matchup-dependent games. For example, in one of my first Theros drafts, I had a very powerful blue-black deck, with four Grey Merchants. I knew the power of the card, even early in the format, and I expected to be able to come from behind against even the most aggressive starts. In round one, I faced a red heroic deck that turn 1'd an Akroan Crusader, then used two heroic triggers on turn 2, and was swinging for 10 on turn 3. I had no defenses in place and got completely run over.

In subsequent Theros drafts, when drafting a slower, more controlling deck, I began to prioritize early blockers in my colors, especially the two cheap deathtouch creatures, Baleful Eidolon and Sedge Scorpion. Cheap defenders like Returned Phalanx were also very important for fending off the super-aggressive decks and buying time to play very powerful cards like Grey Merchant.

In the last format, M14, the lower overall power of creatures meant that you were unlikely to be facing enormous pressure early, and people started to figure out that the expensive card-advantage spells like Opportunity were much better because of it. The most aggressive deck was probably slivers, but that was a very difficult archetype to draft, since the slivers were good on their own and it was difficult to draft a critical mass of them. So unlike Theros, making sure you had answers to very early aggression wasn't quite as important.

Balanced/Unbalanced

The MtG design team usually does a decent job of balancing the card quality among the colors, but it's a difficult thing to get just right and sometimes the format just isn't balanced. Avacyn Restored is a good example, with blue, green, and red being far better than white or black. Black was also in the unfortunate position of getting a rather weak mechanic (the so-called "loner" mechanic), in lieu of any soulbond, which was a very strong mechanic. Even though black had the best removal on average, and a few standout cards, it was usually avoided. This led to a sub-strategy where a careful drafter might be able to exploit black's undesirability and soak up all the black cards, making a reasonably powerful deck. Some players swore by this strategy, but I rarely saw it in action in practice. A lot of the better black cards, like Homicidal Seclusion and Killing Glare, were easily splashable, so strong drafts often cut them, making the lone black deck at the table significantly worse.

If there is a quality disparity, and you can identify it early, you might have a significant advantage over fellow drafters. As a format wears on, though, that advantage is going to diminish. There might be cycling strategies where, as in Avacyn, you can prioritize the weaker color in an effort to get a strong pool with the best cards in the worst color, but again, I have usually not found this to be a good strategy.

Constrained/Free

Some formats lead you down certain paths, and punish you for deviating from them. I call these formats "constrained". The more constrained, the fewer archetypes the format supports. The most obvious examples are the recent Return to Ravnica formats. The first two sets basically forced the drafter into one of the five guilds. If you were drafting RTR and decided to draft a non-guild color combination (such as UG), your deck was probably going to be worse than most guild decks, which were rewarded with powerful aligned gold cards and high synergy between the aligned colors. Same with the following set, Gatecrash.

Another constrained recent set was Modern Masters, which sported very rigidly-defined archetypes (Giants, Faeries, Affinity, Rebels, etc.). Deviating from one of the main archetypes usually led to sub-par pools, since the cards in those archetypes were highly synergistic.

A good example of a free format was M13, in which the colors were very evenly balanced and a strong deck could be built out of nearly every two-color combination. The result is a format that generally rewards repeat play. I found myself getting bored more quickly with the RTR drafts because there were basically five decks you could build, and you usually either got a very strong representative of that archetype, or you were getting cut and got a very bad deck in your guild.

Conclusion

One thing I didn't discuss was the average power level of the format (e.g. Cubes often have extremely high average power levels). Classifications based on relative power level (e.g. between the rares and commons as in prince/pauper, among the commons as in deep/shallow, or between colors as in balanced/unbalanced) are more useful.

I personally prefer deeper, balanced, freer formats, though I did enjoy drafting the RTR formats and Modern Masters. Avacyn Restored was probably my least favorite format over the last two years, since it was a very princely format (cards like Bonfire and Entreat were basically unbeatable), very shallow, and unbalanced.

The current format, Theros, is quite good, though. The average card quality is high, and it supports many different archetypes, of varying speeds. Grey Merchant should totally be an uncommon, though.

Wednesday, October 16, 2013

Theros: The Power of Four

So I've drafted Theros four times now. The last three drafts I've been lucky enough to have multiples of powerful cards, specifically four of each.

Draft #2: Four Grey Merchants.

This was still early enough in the format that people hadn't realized how nuts this card is. Now pretty much everyone knows. I could have had five, but I took Celestial Archon instead. Seemed good at the time, but in hindsight I think a 5th Merchant would have been better. This deck went 4-1, losing the first round to a ballistic heroic start both games. If I had any time at all to set up, this deck basically could not lose. I learned to value the 1/1 deathtouch in black and also Returned Phalanx, early blockers that buy me time to get into the midgame, where I basically cannot lose. Sadly, this was probably the best deck I'll ever draft in the format, and it got me 6th place and 3 packs.


Draft #3: Four Deathbellow Raiders.

I also had two Kragma Warcallers and 10 minotaurs total. That's a lot of beef. The deck smashed pretty hard and went undefeated.


Draft #4: Four Horizon Chimeras.

If you're the only UG drafter at the table, life is good. This card combined with lots of card draw makes life very difficult for your opponent. This deck went undefeated as well.


The Merchants and Raiders are common, but the Chimeras are uncommon. It's much more difficult to score multiples of uncommons, though if they're multicolor and that combination is open, it's much easier. I could see getting four Kragma Warcallers, since minotaurs seem to be a bit underdrafted. Battlewise Hoplite is another possibility.

My dream draft tonight is Wx heroic with four of these guys. But I've been pretty lucky so far and that's pretty unlikely to happen.

The format is better than I thought it would be. It can be swingy, but not nearly as bad as AVR. The removal is pretty bad, but cheap deathtouch and bounce keep things reasonably balanced.


Friday, July 5, 2013

Chandra, the Underwhelming

So here's the new Chandra:


Yawn. Conley Woods has an article today where he assesses her and actually thinks she's pretty good. I don't see it. The most recent planeswalker printed is Ral Zarek, who is also four mana and gives you a Lightning Bolt for -2. That's nice value. Ping player and creature for 1? No thank you. And the zero ability means you're not playing anything on curve. If you untap with her, her value goes up substantially, but the turn she comes into play she is very weak.

I know some people were hoping for a 3-mana Chandra, and also a playable one. Here would be my initial cut:


Her loyalty equals her casting cost, which is a good sign. Her plus ability is aggressive, as red should be. It synergizes with aggressive cards like Hellrider and gives you a dude as a possible sac outlet or excellent trade target. If you can somehow copy it via populate or another mechanic, you've got a blocker to protect her, but for the most part it's a balanced, standalone ability that works with red's flavor.

Her minus two is a solid, if unexciting, shock. If Zarek gets a bolt, surely Chandra should at least get a Shock. Like Liliana of the Veil's -2, it's potential immediate protection, but puts her at a very vulnerable 1. Seems like a decent tradeoff.

If this Chandra's flavor is spawning elementals, and if her +1 makes little guys, why not have her ultimate scorch everything and make Ball Lightnings out of the results? This last one should probably be tuned. Maybe it shouldn't do 6 to each player. Probably the elementals should be exiled at end of turn. But you get the idea. This is the kind of ability, both playwise and flavorwise, that I want to see on a Planeswalker. You probably win the game, and it's hyperaggressive, but a single Profit/Loss or Electrickery will wipe all your new dudes, so it's not necessarily an insta-win.

Unfortunately we're stuck with the clunky new Chandra. I don't think she's very good, except in Limited, which is what I mostly play. I could be wrong, but I don't think she'll see play in Standard.